Are you Paralyzed by a Hoard of Big Data?

Lured by the promise of big data benefits, many organizations are leveraging cheap storage to hoard vast amounts of structured and unstructured data. Without a clear framework for big data governance and use, businesses run the risk of becoming paralyzed under an unorganized jumble of data, much of which has become stale and past its expiration date. Stale data is toxic to your business – it could lead you into taking the wrong action based on data that is no longer relevant.

You know there’s valuable stuff in there, but the thought of wading through all THAT to find it stops you dead in your tracks. There goes your goal of business process improvement, which, according to a recent Informatica survey, most businesses cite as their number one big data initiative goal.

Just as the individual hoarder often requires a professional organizer to help them pare the hoard and institute acquisition and retention rules for preventing hoard-induced paralysis in the future, organizations should seek outside help when they find themselves unable to turn their data hoard into actionable information.

An effective big data strategy needs to include the following components:

  1. An appropriate toolset for analyzing big data and making it actionable by the right people. Avoid building an ivory tower big data bureaucracy, and remember, insight has to turn into action.
  2. A clear and flexible framework, such as social master data management, for integrating big data with enterprise applications, one that can quickly leverage new sources of information about your customers and your market.
  3. Information lifecycle management rules and practices, so that insight and action are based on relevant, rather than stale, information.
  4. Consideration of how the enterprise application portfolio might need to be refined to maximize the availability and relevance of big data. In today’s world, that will involve grappling with the flow of information between cloud and internally hosted applications as well.
  5. A comprehensive data security framework that defines who is entitled to use, change and delete the data, along with encryption requirements and any required upgrades in network security (see the sketch after this list).
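To make point 5 a bit more concrete, here is a minimal sketch of what a data entitlement check might look like. The roles and permissions are hypothetical; a real framework would sit behind your identity and access management system, cover encryption and network controls as well, and audit every decision.

```python
# Minimal sketch of a data entitlement check (roles and permissions are hypothetical).
from enum import Enum, auto

class Permission(Enum):
    READ = auto()
    UPDATE = auto()
    DELETE = auto()

# Hypothetical role-to-permission mapping; in practice this would come from
# your identity/access-management system, not a hard-coded dictionary.
ROLE_PERMISSIONS = {
    "analyst":      {Permission.READ},
    "data_steward": {Permission.READ, Permission.UPDATE},
    "data_owner":   {Permission.READ, Permission.UPDATE, Permission.DELETE},
}

def is_entitled(role: str, action: Permission) -> bool:
    """Return True if the given role is entitled to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

if __name__ == "__main__":
    print(is_entitled("analyst", Permission.DELETE))     # False
    print(is_entitled("data_owner", Permission.DELETE))  # True
```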

Get the picture? Your big data strategy isn’t just a data strategy. It has to be a comprehensive technology-process-people strategy.

All of these elements should, of course, be considered when building your big data business case and estimating return on investment.

How does a data-driven healthcare organization work?

As the pressure increases for accountability and transparency in healthcare organizations, the spotlight is squarely on data: how the organization gathers, validates, stores and reports it.  In addition, the increasing level of regulatory reporting is driving home the need to certify data – applying rigor and measurement to its quality, audit and lineage.  As a result, a healthcare organization must develop an Enterprise Information Management approach that zeros in on treating data as a strategic asset.  While treating data as an asset would seem obvious given the level of IT systems necessary to run a typical healthcare organization, the explosion in the volume of digital data collected and in the types of digital data (e.g. video, digital photos, audio files) has overwhelmed the ability to locate, analyze and organize it.

A typical example of this problem comes when an organization decides to implement Business Intelligence or performance indicators with an electronic dashboard.  There are many challenges in linking data sources to corporate performance measures.  When the same data element exists in multiple places (e.g. patient IDs, encounter events), there must be a decision about the authoritative source, or “single version of the truth.” Then there is the infamous data collision problem: Americans move around, and organizations end up with multiple addresses for what appears to be the same person, or worse yet, multiple lists of prescribed medications that don’t match.  Reconciling these discrepancies requires returning to the original source of information – the patient – to bring the data up to date.  Each of us can relate to filling out the form on the clipboard in the doctor’s office multiple times.  Finally, there is the problem of sparseness – we have part of the data for tracking performance, but not enough to complete the calculation.  The list of problems goes on, but it boils down to having the right data at the right time and using it in the right manner.

Wouldn’t the solution simply be to create an Enterprise Data Warehouse or Operational Data Store that holds all of the cleansed, de-duplicated, latest data elements?  Certainly!  But a big IF is coming: IF your organization has data governance to establish a framework for auditability of data; IF your organization can successfully map source application systems to the target enterprise store; IF your organization can establish master data management for all the key reference tables; IF your organization can agree on standard terminologies; and, most importantly, IF you can convince every employee who creates data that quality matters, not just today but always.

One solution is to apply a key idea that made the personal computer a success: build an abstraction layer.  The operating system of a personal computer achieves flexibility by hiding the complexity of different hardware from the casual user through a hardware abstraction layer that most of us think of as drivers.  Video, CD and USB drivers provide the modularity and flexibility that let the PC adapt and stay useful.  The same principle applies to data-driven healthcare organizations.  Most healthcare applications tout their ability to be the data warehouse solution.  However, the need for each application to improve over time introduces change and version-control issues, and thus instability, into the enterprise data warehouse.  Moving the data into an enterprise data warehouse instead creates the abstraction layer, and the extract, transform and load (ETL) process acts like the drivers in the PC example.  Then, as the healthcare applications change over time, they do not disrupt the Enterprise Data Warehouse, its related data marts and, most importantly, the performance management systems that run the business.  It is not always necessary to move the data in order to create the abstraction layer, but there are other benefits to that approach, including the retirement of legacy applications.
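To make the driver analogy concrete, here is a minimal sketch, in Python, of how an ETL layer can act as the “driver” between a changing source application and a stable warehouse schema. The source system, field names and target schema are all hypothetical.

```python
# Sketch: each source application gets its own "driver" (adapter) that maps
# its native records onto a stable warehouse schema, so application upgrades
# only require changing the adapter, not the warehouse or its data marts.
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    """Abstraction layer between a source application and the warehouse."""

    @abstractmethod
    def extract(self) -> list[dict]:
        ...

    @abstractmethod
    def transform(self, record: dict) -> dict:
        """Map a source-native record to the warehouse patient schema."""
        ...

class LegacyRegistrationAdapter(SourceAdapter):
    # Hypothetical legacy registration system with its own field names.
    def extract(self) -> list[dict]:
        return [{"PAT_NO": "000123", "LNAME": "SMITH", "DOB": "19650412"}]

    def transform(self, record: dict) -> dict:
        return {
            "patient_id": record["PAT_NO"].lstrip("0"),
            "last_name": record["LNAME"].title(),
            "birth_date": f'{record["DOB"][:4]}-{record["DOB"][4:6]}-{record["DOB"][6:]}',
        }

def load(adapters: list[SourceAdapter]) -> list[dict]:
    """Run every adapter and return warehouse-ready rows."""
    rows = []
    for adapter in adapters:
        rows.extend(adapter.transform(r) for r in adapter.extract())
    return rows

print(load([LegacyRegistrationAdapter()]))
```

When the legacy system is upgraded or replaced, only its adapter changes; the warehouse schema and everything downstream stay stable.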

In summary, a strong data-driven healthcare organization has to train its people on, and continually communicate, the importance of data as a support for performance management, and it must secure buy-in from the moment of data acquisition through the entire lifecycle of each key data element.  The pay-offs are big: revenue optimization, risk mitigation and elimination of redundant costs.  When a healthcare organization treats data as a strategic asset, it changes the outcome for everyone in the organization and restores trust and reliability for making key decisions.

Driving Value from Your Healthcare Analytics Program – Key Program Components

If you are a healthcare provider or payer organization contemplating an initial implementation of a Business Intelligence (BI) Analytics system, there are several areas to keep in mind as you plan your program.  The following key components appear in every successful BI Analytics program.  And the sooner you can bring focus and attention to these critical areas, the sooner you will improve your own chances for success.

Key Program Components

Last time we reviewed the primary, top-level technical building blocks.  However, the technical components are not the starting point for these solutions.  Technical form must follow business function.  The technical components come to life only when the primary mission and drivers of the specific enterprise are well understood.  And these must be further developed into a program for defining, designing, implementing and evangelizing the needs and capabilities of BI and related analytics tuned to the particular needs and readiness of the organization.

Key areas that require careful attention in every implementation include the following:

We have found that healthcare organizations (and solution vendors!) have contrasting opinions on how best to align the operational data store (ODS) and enterprise data warehouse (EDW) portions of their strategy with the needs of their key stakeholders and constituencies.  The “supply-driven” approach encourages a broad-based uptake of virtually all data that originates from one or more authoritative source systems, without any real pre-qualification of the usefulness of that information for a particular purpose.  This is the hope-laden “build it and they will come” strategy.  Conversely, the “demand-driven” approach encourages a particular focus on analytic objectives and scope, and uses this focus to concentrate the initial data uptake to satisfy a defined set of analytic subject areas and contexts.  The challenge here is not to focus the incoming data stream so narrowly that it limits related exploratory analysis.

For example, a supply-driven initiative might choose to tap into an existing enterprise application integration (EAI) bus and siphon all published HL7 messages into the EDW or ODS data collection pipe.  The proponents might reason that if these messages are being published on an enterprise bus, they should be generally useful; and if they are reasonably compliant with the HL7 RIM, their integration should be relatively straightforward.  However, their usefulness for a particular analytic purpose would still need to be investigated separately.
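As a rough illustration of what siphoning HL7 messages into a collection pipe involves, the sketch below flattens the pipe-delimited segments of a sample HL7 v2 ADT message into a staging record. It is deliberately simplified: the message content is invented, and a real feed would use a proper HL7 parsing library, handle escape sequences, and cover whatever message types your bus actually publishes.

```python
# Simplified sketch: flatten an HL7 v2 ADT message into a staging dict.
# Real messages need proper escape handling and a full HL7 parser.
SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|ADT1|HOSP|EDW|HOSP|202401150830||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
    "PV1|1|I|ICU^201^A",
])

def parse_hl7(message: str) -> dict:
    """Pull a handful of fields of analytic interest out of an HL7 v2 message."""
    segments = {line.split("|")[0]: line.split("|") for line in message.split("\r")}
    pid, pv1, msh = segments["PID"], segments["PV1"], segments["MSH"]
    return {
        "message_type": msh[8],           # e.g. ADT^A01 (admit)
        "medical_record_number": pid[3].split("^")[0],
        "patient_name": pid[5],
        "birth_date": pid[7],
        "patient_class": pv1[2],          # I = inpatient
        "location": pv1[3],
    }

print(parse_hl7(SAMPLE_ADT))
```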

Conversely, a demand-driven project might start with a required set of representative analytic question instances or archetypes, and drive the data sourcing effort backward toward the potentially diverging points of origin within the business operations.  For example, a surgical analytics platform to discern patterns between or among surgical cost components, OR schedule adherence, outcomes variability, payer mix, or the impact of specific material choices would depend on specific data elements that might originate from potentially disparate locations and settings.  The need here is to ensure that the data sets required to support the specific identified analyses are covered; but the collection strategy should not be so exclusive that it prevents exploration of unanticipated inquiries or analyses.

I’ll have a future blog topic on a methodology we have used successfully to progressively decompose, elaborate and refine stakeholder analytic needs into the data architecture needed to support them.

In many cases, a key objective for implementing healthcare analytics will be to bring focus to specific areas of enterprise operations: to drive improvements in quality, performance or outcomes; to drive down costs of service delivery; or to increase resource efficiency, productivity or throughput, while maintaining quality, cost and compliance.  A common element in all of these is a focus on process.  You must identify the specific processes (or workflows) that you wish to measure and monitor.  Any given process, however simple or complex, will have a finite number of “pulse points,” any one of which will provide a natural locus for control or analysis to inform decision makers about the state of operations and progress toward measured objectives or targets.  These loci become the raw data collection points, where the primary data elements and observations (and accompanying meta-data) are captured for downstream transformation and consumption.

For example, if a health system is trying to gain insight into opportunities for flexible scheduling of OR suites and surgical teams, the base-level data collection must probe into the start and stop times for each segment in the “setup and teardown” of a surgical case, and all the resource types and instances needed to support those processes.  Each individual process segment (e.g. OR ready/busy, patient in/out, anesthesia start/end, surgeon in/out, cut/close, PACU in/out) has distinct control loci, the measurement of which comprises the foundational data on which such analyses must be built.  You won’t gain visibility into optimization opportunities if you don’t measure the primary processes at sufficient granularity to facilitate inquiry and action.
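As an illustration of capturing those pulse points, here is a minimal sketch that turns raw timestamped events for a single case into per-segment durations. The event names, segment definitions and timestamps are hypothetical.

```python
# Sketch: compute per-segment durations for one surgical case from raw
# timestamped pulse-point events (hypothetical event names and times).
from datetime import datetime

events = {  # raw capture for one case
    "patient_in_room":  "2024-01-15 07:32",
    "anesthesia_start": "2024-01-15 07:40",
    "incision":         "2024-01-15 08:05",
    "closure":          "2024-01-15 09:47",
    "patient_out_room": "2024-01-15 10:05",
}

# Each analytic segment is defined by a start and end pulse point.
segments = {
    "pre_incision":    ("patient_in_room", "incision"),
    "cut_to_close":    ("incision", "closure"),
    "post_closure":    ("closure", "patient_out_room"),
    "total_room_time": ("patient_in_room", "patient_out_room"),
}

def minutes_between(start_key: str, end_key: str) -> float:
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(events[start_key], fmt)
    end = datetime.strptime(events[end_key], fmt)
    return (end - start).total_seconds() / 60

for name, (start, end) in segments.items():
    print(f"{name}: {minutes_between(start, end):.0f} minutes")
```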

Each pulse point reveals a critical success component in the overall operation.  Management must decide how each process will be measured, and how the specific data to be captured will enable both visibility and action: visibility into whether the critical process elements being performed are within tolerance and on target, or whether they are deviating from a standard or plan and require corrective action.  The information must also both enable and facilitate focused action that will bring performance and outcomes back into compliance with the desired or required standards or objectives.

A key aspect of metric design is defining the needed granularity and dimensionality.  The former ensures the proper focus and resolution on the action needed.  The latter facilitates traceability and exploration into the contexts in which performance and quality issues arise.  If any measured areas under-perform, the granularity and dimensionality will provide a focus for appropriate corrective actions.  If they achieve superior performance, they can be studied and characterized for possible designation as best practices.

For example, how does a surgical service line that does 2,500 total knees penetrate this monolithic volume and differentiate these cases in a way that enables usable insights and focused action?  The short answer is to characterize each instance to enable flexible-but-usable segmentation (and sub-segmentation); when a segment of interest is identified (under-performing, over-performing, or some other pattern), the n-tuple of categorical attributes that was used to establish the segment becomes a roadmap defining the context and setting for the action: either corrective action (e.g. for deviation from standard) or reinforcing action (e.g. for characterizing best practices).  So dimensions of surgical team, facility, care setting, procedure, implant type and model, supplier, starting ordinal position, day of week, and many others can be part of your surgical analytics metrics design.
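Here is a minimal sketch of that segmentation idea, using pandas with hypothetical columns and figures: grouping on the categorical dimensions yields both the segment metrics and the n-tuple of attribute values that points to where action is needed.

```python
# Sketch: segment total-knee cases by an n-tuple of categorical dimensions
# and surface the under- and over-performing segments (hypothetical data).
import pandas as pd

cases = pd.DataFrame([
    {"surgeon": "A", "facility": "Main", "implant": "X1", "day": "Mon", "cost": 11200, "minutes": 95},
    {"surgeon": "A", "facility": "Main", "implant": "X1", "day": "Fri", "cost": 12800, "minutes": 118},
    {"surgeon": "B", "facility": "East", "implant": "Z3", "day": "Mon", "cost": 9800,  "minutes": 82},
    {"surgeon": "B", "facility": "East", "implant": "Z3", "day": "Fri", "cost": 10100, "minutes": 88},
])

dimensions = ["surgeon", "facility", "implant", "day"]   # the n-tuple
segments = (
    cases.groupby(dimensions)
         .agg(cases=("cost", "size"), avg_cost=("cost", "mean"), avg_minutes=("minutes", "mean"))
         .reset_index()
)

# The dimension values on each row are the "roadmap" for corrective or
# reinforcing action on that segment.
print(segments.sort_values("avg_cost", ascending=False))
```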

Each metric must ultimately be deconstructed into the specific raw data elements, observations and quantities (and units) that are needed to support the computation of the corresponding metric.  This includes the definition, granularity and dimensionality of each data element; its point of origin in the operation and its position within the process to be measured; the required frequency for its capture and timeliness for its delivery; and the constraints on acceptable values or other quality standards to ensure that the data will reflect accurately the state of the operation or process, and will enable (and ideally facilitate) a focused response once its meaning is understood.

An interesting consideration is how to choose the source for a collected data element when multiple legitimate sources exist (this issue spills over into data governance; see below), and what rules are needed to arbitrate such conflicts.  Arbitration can be based on: whether each source is legitimately designated as authoritative; where each conflicting (or overlapping) data element (and its contents) resides in a life cycle that impacts its usability; what access controls or proprietary rights pertain to the specific instance of data consumption; and the purpose for or context in which the data element is obtained.  Resolving these conflicts is not always as simple as designating a single authoritative source.
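One way to express such arbitration rules is as an ordered precedence over the candidate sources. The sketch below, with hypothetical source names and rules, prefers the authoritative designation first, then an active life-cycle state, then recency.

```python
# Sketch: arbitrate among multiple candidate sources for the same data element
# using ordered, declarative rules (all source names and rules are hypothetical).
from datetime import date

candidates = [
    {"source": "registration", "authoritative": True,  "lifecycle": "active",   "as_of": date(2024, 1, 3),  "value": "123 Elm St"},
    {"source": "billing",      "authoritative": False, "lifecycle": "active",   "as_of": date(2024, 2, 10), "value": "99 Oak Ave"},
    {"source": "legacy_emr",   "authoritative": True,  "lifecycle": "archived", "as_of": date(2022, 6, 1),  "value": "77 Pine Rd"},
]

def arbitrate(candidates: list[dict]) -> dict:
    """Pick the winning record: authoritative first, active life cycle next,
    then the most recent as-of date."""
    return max(
        candidates,
        key=lambda c: (c["authoritative"], c["lifecycle"] == "active", c["as_of"]),
    )

print(arbitrate(candidates)["source"])   # -> "registration"
```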

Controlling data quality at its source is essential.  All downstream consumers and transformation operations depend critically on the quality of each data element at its point of origin or introduction into the data stream.  Data cleansing becomes much more problematic if it occurs downstream of the authoritative source, during subsequent data transformation or data presentation operations.  Doing so effectively allows data to “originate” at virtually any position in the data stream, making traceability and quality tracking more difficult and increasing the burden of holding the data that originates at these various points to the quality standard.  On the other hand, downstream consumers may have little or no influence or authority to impose data cleansing or capture constraints on those who actually collect the data.

Organizations are often unreceptive to the suggestion that their data may have quality issues.  “The data’s good.  It has to be; we run the business on it!”  Although this might be true, when you remove data from its primary operating context, and attempt to use it for different purposes such as aggregation, segmentation, forecasting and integrated analytics, problems with data quality rise to the surface and become visible.

Elements of data quality include accuracy, integrity, timeliness, timing and dynamics, clear semantics, and rules for capture, transformation and distribution.  Your strategy must include establishing, and then enforcing, the definitions, measures, policies and procedures needed to ensure that your data meets the necessary quality standards.
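As a small illustration of establishing and then enforcing such definitions and measures, here is a sketch of declarative quality rules applied to a captured record. The field names, rules and values are hypothetical.

```python
# Sketch: declarative data-quality rules evaluated against a captured record
# (field names and rules are hypothetical).
from datetime import datetime

RULES = {
    "patient_id":     lambda v: bool(v) and v.isdigit(),
    "birth_date":     lambda v: datetime.strptime(v, "%Y-%m-%d") <= datetime.now(),
    "encounter_type": lambda v: v in {"inpatient", "outpatient", "emergency"},
}

def quality_check(record: dict) -> list[str]:
    """Return the list of fields that fail their quality rule."""
    failures = []
    for field, rule in RULES.items():
        try:
            ok = rule(record.get(field, ""))
        except (ValueError, AttributeError):
            ok = False
        if not ok:
            failures.append(field)
    return failures

record = {"patient_id": "12345", "birth_date": "2030-01-01", "encounter_type": "inpatient"}
print(quality_check(record))   # -> ['birth_date'] (date is in the future)
```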

The data architecture must anticipate the structure and relationships of the primary data elements, including the required granularity, dimensionality, and alignment with other identifying or describing elements (e.g. master and reference data); and the nature and positioning of the transformation and consumption patterns within the various user bases.

For example, to analyze the range of variation in schedule integrity in our surgical services example, for each case we must capture micro-architectural elements such as the scheduled and actual start and end times for each critical participant and resource type (e.g. surgeon, anesthesiologist, patient, technician, facility, room, schedule block, equipment, supplies, medications, prior and following case), each of which becomes a dimension in the hierarchical analytic contexts that will reveal and help to characterize where under-performance or over-performance is occurring.  The corresponding macro-architectural components will address requirements such as scalability, the distinction between retrieval and occurrence latency, data volumes, data lineage and data delivery.
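Here is a sketch of those micro-architectural elements in practice: scheduled versus actual times per participant and resource for a single case, from which schedule-integrity variance can be computed. The resource types, times and column names are hypothetical.

```python
# Sketch: schedule-integrity variance per resource for one case
# (hypothetical resource types, times, and column names).
import pandas as pd

rows = pd.DataFrame([
    {"case_id": 1001, "resource": "surgeon",    "scheduled_start": "07:30", "actual_start": "07:52"},
    {"case_id": 1001, "resource": "anesthesia", "scheduled_start": "07:15", "actual_start": "07:20"},
    {"case_id": 1001, "resource": "room",       "scheduled_start": "07:00", "actual_start": "07:00"},
])

for col in ("scheduled_start", "actual_start"):
    rows[col] = pd.to_datetime(rows[col], format="%H:%M")

# Positive values mean a late start relative to schedule.
rows["start_variance_min"] = (rows["actual_start"] - rows["scheduled_start"]).dt.total_seconds() / 60
print(rows[["case_id", "resource", "start_variance_min"]])
```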

By the way: none of this presumes a “daily batch” system.  Your data architecture might need to anticipate and accommodate complex hybrid models for federating and staging incremental data sets to resolve unavoidable differences in arrival dynamics, granularity, dimensionality, key alignment, or perishability.  I’ll have another blog on this topic, separately.

You should definitely anticipate that the incorporation and integration of additional subject areas and data sets will increase the value of the data, in many instances far beyond the purpose for which it was originally collected.  As the awareness and use of this resource grows, both the value and the sensitivity attributed to these data will increase commensurately.  The primary purpose of data governance is to ensure that the highest quality data assets obtained from all relevant sources are available to all consumers who need them, after all the necessary controls have been put in place.

Key components of an effective strategy are: the recognition of data as an enterprise asset; the designation of authoritative sources; a commitment to data quality standards and processes; and the recognition that data proceeds through a life cycle of origination, transformation and distribution, with varying degrees of ownership, stewardship and guardianship, on its way to various consumers for various purposes.  Specific characteristics, such as the level of aggregation, the degree of protection required (e.g. PHI), the need for de-identification and re-identification, the designation of “snapshots” and “versions” of data sets, and the constraints imposed by proprietary rights, will all impact the policies and governance structures needed to ensure proper usage of this critical asset.
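To make the de-identification and re-identification point concrete, here is a minimal sketch that drops direct identifiers and replaces the patient identifier with a keyed hash, so that only holders of the guarded key table can re-identify. The field names are hypothetical, and this is not a complete HIPAA de-identification method (Safe Harbor or expert determination still applies).

```python
# Sketch: de-identify a record by dropping direct identifiers and replacing
# the patient ID with a salted hash; the lookup table allows re-identification
# by authorized users only. (Hypothetical fields; not a HIPAA-complete method.)
import hashlib

DIRECT_IDENTIFIERS = {"name", "address", "phone"}
SALT = "rotate-and-protect-this-secret"   # would live in a key vault, not in code

def deidentify(record: dict, key_table: dict) -> dict:
    token = hashlib.sha256((SALT + record["patient_id"]).encode()).hexdigest()[:16]
    key_table[token] = record["patient_id"]          # guarded re-identification map
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_token"] = token
    del cleaned["patient_id"]
    return cleaned

key_table: dict = {}
record = {"patient_id": "12345", "name": "Jane Doe", "address": "123 Elm St",
          "phone": "555-0100", "diagnosis_code": "E11.9"}
print(deidentify(record, key_table))
```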

Are you positioned for success?

Successful implementation of BI analytics requires more than a careful selection of technology platforms, tools and applications.  The selection of technical components should ideally follow the definition of the organization’s needs for these capabilities.  The program components outlined here are a good start on the journey to embedded analytics that proactively drive the desired improvements throughout your enterprise.

Data Darwinism – Capabilities that provide a competitive advantage

In my previous post, I introduced the concept of Data Darwinism, which holds that for a company to be the ‘king of the jungle’ (and remain so), it needs the ability to continually innovate.  Let’s be clear, though: innovation must be aligned with the strategic goals and objectives of the company.  The landscape is littered with examples of innovative ideas that didn’t have a market.

So that begs the question: “What are the behaviors and characteristics of companies that are at the top of the food chain?”  The answer to that question can go in many different directions.  With respect to Data Darwinism, the following hierarchy illustrates the categories of capabilities that an organization needs to demonstrate to truly become a dominant force.

Foundational

The impulse will be for an organization to jump immediately to implementing the capabilities they think will put them at the top of the pyramid.  While this is possible to a certain extent, you must put certain foundational capabilities in place to have a sustainable model.  Examples of capabilities at this level include data integration, data standardization, data quality and basic reporting.

Without clean, integrated, accurate data that is aligned with the intended business goals, the ability to implement the more advanced capabilities is severely limited.  This does not mean that all foundational capabilities must be implemented before moving on to the next level.  Quite the opposite, actually.  You must balance the need for the foundational components against the return that the more advanced capabilities will enable.

Transitional

Transitional capabilities are those that allow an organization to move from siloed, isolated, often duplicative efforts to a more ‘centralized’ platform from which to leverage its data.  Capabilities at this level of the hierarchy start to migrate toward an enterprise view of data and include such things as a more complete, integrated data set, increased collaboration, basic analytics and ‘coordinated governance’.

Again, you don’t need to fully instantiate the capabilities at this level before building capabilities at the next level.   It continues to be a balancing act.

Transformational

Transformational capabilities are those that allow a company to start to truly differentiate itself from the competition.  They don’t yet deliver the innovative capabilities that set a company head and shoulders above others, but rather set the stage for them.  This stage can be challenging for organizations, as it can require a significant change in mind-set compared to the way they currently conduct operations.  Capabilities at this level of the hierarchy include more advanced analytics (such as true data mining), targeted access to data by users, and ‘managed governance’.

Innovative

Innovative capabilities are those that truly set a company apart from its competitors.  They allow for innovative product offerings, unique ways of handling the customer experience and new ways of conducting business operations.  Amazon is a great example: its ability to customize the user experience and offer ‘recommendations’ based on a wealth of buying-trend data has set it apart from most other online retailers.  Capabilities at this level of the hierarchy include predictive analytics, enterprise governance and user self-service access to data.

The bottom line is that moving up the hierarchy requires vision, discipline and a pragmatic approach.   The journey is not always an easy one, but the rewards more than justify the effort.

Check back for the next installment of this series “Data Darwinism – Evolving Your Data Environment.”

Data Darwinism – Are you on the path to extinction?

Most people are familiar with Darwinism.  We’ve all heard the term survival of the fittest.   There is even a humorous take on the subject with the annual Darwin Awards, given to those individuals who have removed themselves from the gene pool through, shall we say, less than intelligent choices.

Businesses go through ups and downs, transformations, up-sizing/down-sizing, centralization/ decentralization, etc.   In other words, they are trying to adapt to the current and future events in order to grow.   Just as in the animal kingdom, some will survive and dominate, some will not fare as well.   In today’s challenging business environment, while many are trying to merely survive, others are prospering, growing and dominating.  

So what makes the difference between being the king of the jungle and being prey?  The ability to make the right decisions in the face of uncertainty.  This is often easier said than done.  However, at the core of making the best decisions is making sure you have the right data.  That brings us back to the topic at hand: Data Darwinism.  Data Darwinism can be defined as:

“The practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.”

When asked to assess where they are on the Data Darwinism continuum, many companies will say that they are at the top of the food chain, that they are very fast at getting data to make decisions, that they don’t see data as a problem, etc.   However, when truly asked to objectively evaluate their situation, they often come up with a very different, and often frightening, picture. 

It’s as simple as looking at your behavior when dealing with data:

If you find yourself exhibiting more of the behaviors on the left side of the picture above, you might be a candidate for the next Data Darwin Awards.

Check back for the next installment of this series “Data Darwinism – Capabilities that Provide a Competitive Advantage.”

Tackling the Tough One: Master Data Management for the Healthcare Enterprise

One of the big struggles in healthcare is the difficulty of Master Data Management.  A typical regional hospital organization can have upwards of 200 healthcare applications, multiple versions of systems and, of course, many, many “hidden” departmental applications.  In that situation, Master Data Management for the enterprise as a whole can seem like a daunting task.  Experience dictates that those who succeed in this effort start with one important weapon: data and application governance.

Data and application governance can often be compared to building police stations, but it is much more than that.  Governance in healthcare must begin with an understanding of data as an asset to the enterprise.  For example, developing an Enterprise Master Patient Index (EMPI) is creating a key asset for healthcare providers to verify the identity of a patient independent of how they enter the healthcare delivery system.  Patients are more than a surgical case, an outpatient visit or pharmacy visit.  Master data management in healthcare is the cornerstone of moving to treating patients across the entire continuum of care, independent of applications and location of care.  Bringing the ambulatory, acute care and home care settings into one view will provide assurance to patients that a healthcare organization is managing the entire enterprise.
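As a rough illustration of what an EMPI does under the hood, here is a simplified sketch of matching on normalized demographics. The weights and threshold are hypothetical; production EMPIs add probabilistic and phonetic matching, additional identifiers, and steward review queues for borderline scores.

```python
# Simplified sketch of EMPI-style matching: normalize demographics and score
# a candidate pair. (Weights and threshold are hypothetical; real EMPIs use
# probabilistic/phonetic matching and human review for borderline scores.)
def normalize(rec: dict) -> dict:
    return {
        "last":  rec["last_name"].strip().upper(),
        "first": rec["first_name"].strip().upper(),
        "dob":   rec["dob"].replace("-", ""),
        "sex":   rec["sex"].upper()[:1],
    }

def match_score(a: dict, b: dict) -> float:
    a, b = normalize(a), normalize(b)
    weights = {"last": 0.35, "first": 0.25, "dob": 0.30, "sex": 0.10}
    return sum(w for field, w in weights.items() if a[field] == b[field])

registration = {"last_name": "Smith", "first_name": "JANE", "dob": "1980-01-01", "sex": "F"}
lab_system   = {"last_name": "SMITH ", "first_name": "Jane", "dob": "19800101", "sex": "female"}

score = match_score(registration, lab_system)
print(score, "-> same person" if score >= 0.8 else "-> send to steward review")
```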

Tracking healthcare providers and their credentials across multiple hospitals, clinics and offices is another master data management challenge.  While there are specialized applications for managing doctors’ credentials, there are no enterprise-level views that encompass all types of healthcare professionals in a large healthcare organization and their respective certifications.  In addition, this provider provisioning should be closely aligned with security and access to protected health information.  A well-designed governance program can supervise the creation of this key master data and its integration across the organization.

An enterprise view of Master Data provides a core foundation for exploiting an organization’s data to its full potential and pays dividends well beyond the required investment.  Healthcare organizations are facing many upcoming challenges with reference data as a part of master data management, especially as the mandated change from ICD-9 to ICD-10 codes approaches.  Hierarchies are the magic behind business analytics – the ability to define roll-ups and drill-downs of information.  Core business concepts should be implemented as master data – how does the organization view itself?  The benefits of a carefully defined and well-governed master data management program are many: consistent reporting of trusted information, a common enterprise understanding of information, cost efficiencies from reliable data, improved decision making from trusted authoritative sources, and, most importantly in healthcare, improved quality of care.
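To illustrate why hierarchies are the magic behind roll-up and drill-down, here is a minimal sketch of a metric rolled up a centrally governed department hierarchy. The hierarchy and figures are hypothetical.

```python
# Sketch: roll a metric up a centrally governed department hierarchy
# (hypothetical hierarchy and figures).
import pandas as pd

# Reference data: department -> service line (the governed hierarchy).
hierarchy = {
    "Cardiology":      "Medicine",
    "Endocrinology":   "Medicine",
    "Orthopedics":     "Surgery",
    "General Surgery": "Surgery",
}

visits = pd.DataFrame([
    {"department": "Cardiology",      "visits": 420},
    {"department": "Endocrinology",   "visits": 180},
    {"department": "Orthopedics",     "visits": 310},
    {"department": "General Surgery", "visits": 250},
])

visits["service_line"] = visits["department"].map(hierarchy)
rollup = visits.groupby("service_line")["visits"].sum()
print(rollup)   # Medicine 600, Surgery 560 -> drill back down by department as needed
```

Because the hierarchy lives in governed reference data rather than in each report, every dashboard rolls up the same way.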

Data and application governance is the key to success with master data management.  Just like an inventory, key data elements, tables and reference data must be cataloged and carefully managed.  Master data must be guarded by three types of key people: a data owner, a data steward and a data guardian.  The data owner takes responsibility for the creation and maintenance of the key asset.  The data steward is the subject matter expert who determines the quality of the master data and its appropriate application and security.  Finally, the data guardian is the information technology professional who oversees the database, ensures proper back-up and recovery of the data assets and manages the delivery of the information.  In all three roles, accountability is important and is overseen by an enterprise information management (EIM) group composed of key data owners and executive IT management.

In summary, master data management provides the thread that ties all other data in the enterprise together.  It is worth the challenge to create, maintain and govern properly.  For success, pick the right people, understand the process and use a reliable technology.

10 things you can do right now to improve your Enterprise Business Intelligence (BI) capabilities

  1. Document examples of manualytics (manual analytics) activities to illustrate hidden fixed costs. Any BI investment initiative needs executive support and budget. You need to make a case for the investment to improve your BI capability and show the business an ROI (return on investment). The cost of the current manualytics activity needs to be documented to highlight the hidden fixed costs of the current way of doing business and help build consensus for making improvements.
  2. Identify manualytics processes to be moved to production and automated.
  3. Raise awareness of data as a corporate asset.
  4. Enlist and cultivate a C-level executive sponsor for your Enterprise BI effort.
  5. When the business asks a question that is difficult to answer, keep track of the level of effort expended to generate the information. How many analysts with spreadsheets are compiling information manually? When the answer’s accuracy is questioned, how much more time is spent proving the numbers are correct?
  6. Develop and document metadata wherever possible – build metadata requirements gathering into your SDLC, and create and standardize a process to capture table and column definitions and business logic in a standard format (see the sketch below). Get tribal knowledge documented so that the business can continue to operate if people leave or move on.
  7. Data Governance – develop a committee to work towards managing the data and IT assets of the organization.
  8. Create/assign data stewards for each source system to agree on service-level agreements and resolve data quality issues.
  9. Work to centralize your reference data – business hierarchies like department and product need to be centralized, agreed upon by all stakeholders – this is a task that can be driven by your corporate governance committee.
  10. Don’t boil the ocean – look for candidate pilot projects with a narrow scope to show quick wins to the business (90 days max).
  11. Work toward tool standardization – many organizations own one of each BI tool – work to standardize on one or two.
  12. Build a Center of Excellence around BI and ETL – work to centralize your internal expertise for BI and ETL.

Well, we ended up with twelve items, any one of which could fill a book or whitepaper and may be the subject of a future post.
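On item 6, a lightweight start does not require a full metadata tool; even a simple, version-controlled catalog of table and column definitions captures tribal knowledge before it walks out the door. Here is a minimal sketch, with hypothetical tables, definitions and stewards.

```python
# Sketch: a minimal metadata catalog capturing table/column definitions and
# business logic in a standard, version-controllable format
# (hypothetical tables, columns, and stewards).
from dataclasses import dataclass, asdict
import json

@dataclass
class ColumnDefinition:
    table: str
    column: str
    business_definition: str
    source_system: str
    steward: str

catalog = [
    ColumnDefinition("encounter_fact", "encounter_id",
                     "Unique identifier for a patient encounter", "ADT", "J. Rivera"),
    ColumnDefinition("encounter_fact", "net_revenue",
                     "Expected reimbursement after contractual adjustments", "Billing", "M. Chen"),
]

# Persist alongside the ETL code so definitions are reviewed with every change.
print(json.dumps([asdict(c) for c in catalog], indent=2))
```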

As we work with different organizations, similar themes emerge. Every organization is different and your road to BI maturity is different from other companies. Sometimes it pays to have a fresh set of eyes come in and survey your current state to get you started on the right foot.