Managing Data Integrity

When was the last time you looked at a view of data, report, or graph in CRM and said to yourself, “This doesn’t look right”? You’re not alone. Keeping data up-to-date is a common issue for many organizations. We rely on its accuracy for decision making. An example of decision-making from data is determining which resource to assign to a project. If the project pipeline is inaccurate, a more senior resource might get tied up in a smaller project when their skillset would have been better used on a more important project. Another example might be deciding to make an investment based on erroneous forecasts of that investment’s future.

When data is out-of-date and you recognize this, the risk of an inaccurate decision is diminished as you have the opportunity to contact the data owner(s) to get an update. When it goes unnoticed, the risk of bad decisions increases. While there are many reasons why data can get out of date, there is often one common root cause: the person responsible for entering the data did so incorrectly or failed to do so. Rather than demonizing a person, we can look to find ways to make it easier for the data to be kept up to date.

There are many factors that go into data integrity:

Does the responsible party for the data entry also own the information gathering mechanism?

This can manifest when there is a team assigned to a record or there is a disconnect and/or lag in the data gathering process. For example, if there is a government agency that only provides updates periodically, but management needs information more frequently, this can present a problem. Possible solutions:

  • One record – one owner. No team ownership of a record.
  • Talk with management about the data they want and the source if outside the direct control of the responsible party. Have an open dialogue if the data gathering mechanism is flawed or doesn’t meet the needs of management to decide on a best course of action.

Does data have to be kept up-to-date real time or can it be done periodically?

Not all decisions have to be made ad-hoc. Some decisions can be deferred, occurring weekly or monthly. It is important that an organization examine the risk associated with each data element. Data elements that feed high-risk areas, or decisions that must be made often, need frequent updates from their data owners. Those that carry less risk or are used less often need less emphasis on being kept up to date. Remember, at the end of the day, a person, somewhere, had to provide that data. As individuals, no one is perfect and it is unreasonable to expect perfection on every record, every field, every time. Prioritize!

Can data be automated?

There are many add-on tools available that automate data gathering. Many companies have created tools that, for example, go out to the web and pull in data updates related to a search topic. Consider installing or developing such tools where appropriate. This will reduce the need for a person in your organization to be assigned to this task. It will save time and money!

Consider using a tool’s workflow or a manually created workflow to help remind data owners to make updates.

Many data tools have built-in workflows. These can be used to set tasks or send a periodic email to data owners reminding them to update a record. An example might be to create a field called “Last update” which should be changed each time a person reviews the record to make updates to important fields. If this date is more than a week old, an email can be sent to the data owner. Where the tool offers no such workflow, data owners can set a recurring task or calendar item in their email application as a reminder. As a last resort, a sticky note on a physical calendar can do the trick!
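To make the “Last update” idea concrete, here is a minimal sketch in Python. It is only an illustration: the record layout, the owner addresses, and the `stale_records_by_owner` helper are all assumptions, not features of any particular CRM. It groups records that have gone more than a week without review by their owner, which is the list a reminder workflow would act on.

```python
from datetime import date, timedelta

# Hypothetical record shape: each record carries an owner email and a
# "last_update" date that the owner bumps after reviewing the record.
records = [
    {"id": 1, "owner": "ana@example.com", "last_update": date(2024, 5, 1)},
    {"id": 2, "owner": "raj@example.com", "last_update": date(2024, 5, 13)},
    {"id": 3, "owner": "ana@example.com", "last_update": date(2024, 4, 20)},
]

def stale_records_by_owner(records, today, max_age=timedelta(days=7)):
    """Group the ids of records older than max_age by their owner."""
    reminders = {}
    for record in records:
        if today - record["last_update"] > max_age:
            reminders.setdefault(record["owner"], []).append(record["id"])
    return reminders

reminders = stale_records_by_owner(records, today=date(2024, 5, 15))
for owner, ids in reminders.items():
    # A real workflow would send an email or create a task here.
    print(f"Remind {owner} to review records {ids}")
```

Grouping by owner means each person gets one reminder listing all of their stale records, rather than one message per record.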

Data is the life-blood of an organization. Keeping it up-to-date is important for decision making affecting both small and big outcomes. Most data comes from people. Help your people by setting up reasonable, sound business practices and processes around data integrity. This won’t prevent erroneous data entirely, but you’ll find less of it, and it will make work life much easier for you and your data owners. For a case study about how Edgewater has followed these practices, click here for more information.

Why EMRs Are Not Panaceas for Healthcare’s Data Problems

So, you’ve decided to go with Epic or Centricity or Cerner for your organization’s EMR.

Think your EMR is Hamlin’s Wizard Oil?

Good, the first tough decision is out of the way. If you’re a medium to large size healthcare organization, you likely allocated a few million to a few hundred million dollars on your implementation over five to ten years. I will acknowledge that this is a significant investment, probably one of the largest in your organization’s history (aside from a new expansion, but these implementations can easily surpass the cost of building a new hospital).  But I will ask: “Does that really mean the other initiatives you’ve been working on should suddenly be put on hold, take a back seat, or even cease to exist?” Absolutely not. The significant majority of healthcare organizations (save a few top performers) are already years, in some cases almost a decade, behind the rest of the world in adopting technology for improving the way healthcare is delivered. How do I know this? Well, you tell me, “What other industry continues to publicly have 100,000 mistakes a year?” Okay, glad we now agree. So, are you really going to argue with me that being single-threaded, with a narrow focus on a new system implementation, is the only thing your organization can be committed to? If your answer is yes, I have some Cher cassette tapes, a transistor radio, a mullet, and some knee highs that should suit you well in your outdated mentality.

An EMR implementation is a game-changer. Every single one of your clinical workflows will be adjusted, electronic documentation will become the standard, and clinicians will be held accountable like never before for their interaction with the new system. Yes, it depends on what modules you buy – Surgery, IP, OP, scheduling, billing, and the list goes on. But for those of us in the data integration world, trying every day to convince healthcare leaders that turning data into information should be top of mind, this boils down to one basic principle – you have added yet another source of data to your already complex, disparate application landscape. Is it a larger data source than most? Yes. But does this mean you treat it any differently when considering its impact on the larger need for real time, accurate integrated enterprise data analysis? No. Very much no. Does it also mean that your people are suddenly ready to embrace this new technology and leverage all of its benefits? Probably not. Why? Because an EMR, contrary to popular belief, is not a panacea for the personal accountability and data problems in healthcare:

  • If you want to analyze any of the data from your EMR you still need to pull it into an enterprise data model with a solid master data foundation and structure to accommodate a lot more data than will just come from the system (how about materials management, imaging, research, quality, risk?)
    • And please don’t tell me your EMR is also your data warehouse because then you’re in much worse shape than I thought…
  • You’re not all of a sudden reporting real time. It will still take you way too long to produce those quality reports, service line dashboards, or <insert report name here>. Yes, there is a real time feed available from the EMR back end database, but that doesn’t change the fact that there are still manual processes required for transforming some of this information, so a sound data quality and data governance strategy is critical BEFORE deploying such a huge, new system.

The list goes on. If you want to hear more, I’m armed to the teeth with examples of why an EMR implementation should be just that, a focused implementation. Yes, it will require more resources, time and commitment, but don’t lose sight of the fact that there are plenty more things you needed to do with your data before the EMR came, and the same will be the case once your frenzied EMR-centric mentality is gone.

Keeping the Black Swan at Bay

A recent article in the Harvard Business Review highlighted some alarming statistics on project failures. IT projects were overrunning their budgets by an average of 27%, but the real shocker was that one in six of these projects was over by 200% on average. They dubbed these epic failures the “black swans” of the project portfolio.

The article ends with some excellent advice on avoiding the black swan phenomenon, but the recommendations focus on two areas:

  • Assessments of the ability of the business to take a big hit
  • Sound project management practices such as breaking big projects down into smaller chunks, developing contingency plans, and embracing reference class forecasting.

We would like to add to this list a set of “big project readiness” tasks that offer additional prevention of your next big IT project becoming a black swan.

Project Management Readiness: If you don’t have seasoned PMs with successful big project experience on your team, you need to fill that staffing gap either permanently or with contract help for the big project. Yes, you need an internal PM even if the software vendor has their own PM.

Data Readiness:  Address your data quality issues now, and establish data ownership and data governance before you undertake the big project.

Process/organization/change management readiness: Are your current business processes well documented? Is the process scope of the big project defined correctly? Are process owners clearly identified?  Do you have the skills and framework for defining how the software may change your business processes, organization structure and headcounts? If not, you run a significant risk of failing to achieve anticipated ROI for this project. Do you have a robust corporate communication framework? Do you have the resources, skills and experience to develop and run training programs in house?

Let’s face it: experience matters. If you’re already struggling to recover from a technology black swan, you are at considerable risk for reproducing the same level of failure if you don’t undertake a radical overhaul of your approach by identifying and addressing every significant weakness in the areas noted above.

We have developed a project readiness assessment model that can help you understand your risks and develop an action plan for addressing them before you undertake anything as mission critical as an ERP replacement, CRM implementation, legacy modernization or other major technology project. If you have a big project on your radar (or already underway), contact us to schedule a pre-implementation readiness assessment.

Electronic Medical Records ≠ Accurate Data

As our healthcare systems race to implement Electronic Medical Records or EMRs, the amount of data that will be available and accessible for a single patient is about to explode.  “As genetic and genomic information becomes more readily available, we soon may have up to 1,000 health facts available for each particular patient,” notes Patrick Soon-Shiong, executive director of the UCLA Wireless Health Institute and executive chairman of Abraxis BioScience, Inc., a Los Angeles-based biotech firm dedicated to delivering therapeutics and technologies that treat cancer and other illnesses.  The challenge is clear: how can a healthcare organization manage the accuracy of 1,000 health facts?

As the volume of individual data elements expands to encompass 1,000 health facts per patient, there is an urgent need for electronic tools to manage the quality, timeliness and origination of those data.  One key example is simply making sure that each patient has a unique identifier with which to attach and connect the individual health facts.  This may seem like a mundane detail, but it is absolutely critical to uniquely identify and unambiguously associate each key health fact with the right patient, at the right time.  Whenever patients are admitted to a health system, they are typically assigned a unique medical record number that both clinicians and staff use to identify, track, and cross-reference their records.  Ideally, every patient receives a single, unique identifier.  Reality, however, tells a different story, because many patients wind up incorrectly possessing multiple medical record numbers,  while others wind up incorrectly sharing the same identifier.

These errors, known respectively as master person index (MPI) duplicates and overlays, can cause physicians and other caregivers to unknowingly make treatment decisions based on incomplete or inaccurate data, posing a serious risk to patient safety.  Thus, it is no wonder that improving the accuracy of patient identification repeatedly heads The Joint Commission’s national patient safety goals list on an annual basis.
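The two error types can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the registration rows are invented, and matching on a normalized name plus date of birth is a deliberately crude stand-in for the identity resolution a real MPI product performs. A duplicate shows up as one person holding several medical record numbers; an overlay shows up as one medical record number shared by several people.

```python
from collections import defaultdict

# Hypothetical registration rows: (medical record number, name, date of birth)
registrations = [
    ("MRN001", "Maria Lopez", "1980-02-14"),
    ("MRN002", "maria lopez ", "1980-02-14"),   # same person, second MRN
    ("MRN003", "John Smith", "1975-09-30"),
    ("MRN003", "Joan Smythe", "1991-01-05"),    # different person, same MRN
    ("MRN004", "Wei Chen", "1966-07-21"),
]

def identity_key(name, dob):
    """Crude identity key: case- and whitespace-normalized name plus DOB."""
    return (" ".join(name.lower().split()), dob)

def find_mpi_errors(rows):
    """Return (duplicates, overlays) found in the registration rows."""
    mrns_by_person = defaultdict(set)
    people_by_mrn = defaultdict(set)
    for mrn, name, dob in rows:
        key = identity_key(name, dob)
        mrns_by_person[key].add(mrn)
        people_by_mrn[mrn].add(key)
    # Duplicate: one person, more than one MRN
    duplicates = {p: sorted(m) for p, m in mrns_by_person.items() if len(m) > 1}
    # Overlay: one MRN, more than one person
    overlays = {m: sorted(p) for m, p in people_by_mrn.items() if len(p) > 1}
    return duplicates, overlays

duplicates, overlays = find_mpi_errors(registrations)
print("Duplicates:", duplicates)
print("Overlays:", overlays)
```

An exact-key comparison like this misses nicknames, misspellings, and name changes, so in practice it would flag only the easiest cases for review.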

Assembling an accurate, complete, longitudinal view of a patient’s record is comparable to assembling a giant jigsaw puzzle.  Pieces of that puzzle are scattered widely across the individual systems and points of patient contact within a complex web of hospitals, outpatient clinics, and physician offices.  Moreover, accurately linking them to their rightful owner requires the consolidation and correction of the aforementioned MPI errors.  To accomplish this task, every hospital nationwide must either implement an MPI solution directly, hire a third party to clean up “dirty” MPI and related data, or implement some other reliable and verifiable approach.  Otherwise, these fundamental uncertainties will continue to hamper the effective and efficient delivery of the core clinical services of the extended health system.

Unfortunately, this issue doesn’t simply require a one-time clean-up job for most healthcare systems.  The challenge of maintaining the data integrity of the MPI has just begun.  That’s because neither an identity resolution solution, nor an MPI software technology, nor a one-time clean-up will address the root causes of these MPI errors on their own.  In a great majority of cases, more fundamental issues underlie the MPI data issue, such as flawed registration procedures; inadequate or poorly trained staff; naming conventions that vary from one operational setting or culture to another; widespread use of nicknames; and even confusion caused by name changes due to marriages and divorces – or simple misspelling.

To address these challenges, institutions must combine both an MPI technology solution, which includes human intervention, and the reengineering of patient registration processes or other points of contact where patient demographics are captured or updated.  Unless these two elements are in place, providers’ ability to improve patient safety and quality of care will be impaired because the foundation underpinning the MPI will slowly deteriorate.

Another solution is the use of data profiling software tools.  These tools allow the identification of common patterns of data errors, including erroneous data entry, to focus and drive needed revisions or other improvements in business processes.  Effective data profiling tools can run automatically using business rules to focus on the exceptions of inaccurate data that need to be addressed.  As the number of individual health facts increases for each patient, the need for automating data accuracy will continue to grow, and the extended health system will need to address these issues.

When healthcare providers make critical patient care decisions, they need to have confidence in the accuracy and integrity of the electronic data.  Instead of a physician or nurse having to assemble and scan dozens of electronic patient records in order to catch a medication error or an overlooked allergy, these data profiling tools can scan thousands of records, apply business rules to identify the critical data inaccuracies, including missing or incomplete data elements, and notify the right people to take action to correct them.
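As a rough illustration of rule-driven profiling, the Python sketch below runs a handful of business rules over patient records and returns only the exceptions. The record fields, the plausibility range for weight, and the small allergy lookup table are all hypothetical and chosen for the example; commercial profiling tools expose their own rule languages.

```python
# Hypothetical patient records; field names are illustrative only.
patients = [
    {"id": "P1", "allergies": ["penicillin"], "medications": ["amoxicillin"], "weight_kg": 70},
    {"id": "P2", "allergies": [], "medications": ["ibuprofen"], "weight_kg": None},
    {"id": "P3", "allergies": ["latex"], "medications": [], "weight_kg": 640},
]

# Hypothetical lookup of medications that conflict with a recorded allergy
ALLERGY_CONFLICTS = {"penicillin": {"amoxicillin", "ampicillin"}}

# Each rule returns a message when the record violates it, otherwise None.
def missing_weight(p):
    if p["weight_kg"] is None:
        return "weight is missing"

def implausible_weight(p):
    w = p["weight_kg"]
    if w is not None and not (0.5 <= w <= 400):
        return f"weight {w} kg is outside the plausible range"

def allergy_conflict(p):
    for allergy in p["allergies"]:
        clash = ALLERGY_CONFLICTS.get(allergy, set()) & set(p["medications"])
        if clash:
            return f"{allergy} allergy conflicts with {sorted(clash)}"

RULES = [missing_weight, implausible_weight, allergy_conflict]

def profile(records, rules):
    """Apply every rule to every record and collect the exceptions."""
    exceptions = []
    for record in records:
        for rule in rules:
            message = rule(record)
            if message:
                exceptions.append((record["id"], message))
    return exceptions

exceptions = profile(patients, RULES)
for patient_id, message in exceptions:
    # A real tool would route this to the responsible data owner.
    print(patient_id, "->", message)
```

Keeping each rule as a separate function means new checks can be added to the list without touching the scanning loop.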

In the age of computer-based medical records, electronic data accuracy has become a key element in patient safety, as critical as data completeness.  What better way to manage data accuracy than with smart electronic tools for data profiling?  Who knows?  The life you save or improve may be your own.

How does a data-driven healthcare organization work?

As the pressure increases for accountability and transparency for healthcare organizations, the spotlight is squarely on data: how the organization gathers, validates, stores and reports it.  In addition, the increasing level of regulatory reporting is driving home a need for certifying data – applying rigor and measurement to its quality, audit, and lineage.  As a result, a healthcare organization must develop an Enterprise Information Management approach that zeros in on treating data as a strategic asset.  While treating data as an asset would seem to be obvious given the level of IT systems necessary to run a typical healthcare organization, the explosion in the volume and types of digital data collected (e.g. video, digital photos, audio files) has overwhelmed the ability to locate, analyze and organize it.

A typical example of this problem comes when an organization decides to implement Business Intelligence or performance indicators with an electronic dashboard.  There are many challenges in linking data sources to corporate performance measures.  When the same data element exists in multiple places, e.g. patient IDs or encounter events, there must be a decision about the authoritative source or “single version of the truth.” Then there is the infamous data collision problem: Americans move around and organizations end up with multiple addresses for what appears to be the same person, or worse yet, multiple lists of prescribed medications that don’t match.  Reconciling data discrepancies requires returning to the original source of information, the patient, to bring it to a current status.  Each of us can relate to filling out the form on the clipboard in the doctor’s office multiple times.  Finally, there is the problem of sparseness – we have part of the data for tracking performance but we don’t have enough for the calculation.  This problem can go on and on, but it boils down to having the right data, at the right time and using it in the right manner.

Wouldn’t the solution simply be to create an Enterprise Data Warehouse or Operational Data Store that has all of the cleansed, de-duplicated, latest data elements in it?  Certainly!  Big IF coming up: IF your organization has data governance to establish a framework for audit-ability of data; IF your organization can successfully map source application systems to the target enterprise store; IF your organization can establish master data management for all the key reference tables; IF your organization can agree on standard terminologies, and most importantly, IF you can convince every employee that creates data that quality matters, not just today but always.

One solution is to understand a key idea that made personal computers a success – build an abstraction layer.  The operating system of a personal computer established flexibility by hiding the complexity of different hardware items from the casual user through a hardware abstraction layer that most of us think of as drivers.  A video driver, a CD driver or a USB driver provides modularity and lets the PC adapt to new uses.  The same principle applies to data-driven healthcare organizations.  Most healthcare applications try to tout their ability to be the data warehouse solution.  However, the need for the application to improve over time introduces change and version control issues, and thus instability, in the enterprise data warehouse.  In response, moving the data into an enterprise data warehouse creates the abstraction layer and the extract, transform and load (ETL) process can act like the drivers in the PC example.  Then as the healthcare applications move through time, they do not disrupt the Enterprise Data Warehouse, its related data marts and, most importantly, the performance management systems that run the business.  It is not always necessary to move the data in order to create the abstraction layer, but there are other benefits to that approach including the retirement of legacy applications.
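The driver analogy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's ETL product: the canonical record shape, the two source formats (`billing_v1`, `billing_v2`), and all field names are invented. Each source application gets its own small transform, and everything downstream sees only the canonical shape.

```python
# Canonical warehouse record (an assumed shape for this sketch)
def to_canonical(patient_id, encounter_date, charge):
    return {"patient_id": patient_id, "encounter_date": encounter_date, "charge_usd": charge}

# One "driver" per source application version; names are hypothetical.
def from_billing_v1(row):
    return to_canonical(row["PatID"], row["DOS"], float(row["Amt"]))

def from_billing_v2(row):
    # The newer version renamed its fields and stores amounts in cents.
    return to_canonical(row["patient"]["id"], row["service_date"], row["amount_cents"] / 100)

DRIVERS = {"billing_v1": from_billing_v1, "billing_v2": from_billing_v2}

def load(source, rows, warehouse):
    """Transform rows with the driver for their source, then append them."""
    transform = DRIVERS[source]
    warehouse.extend(transform(r) for r in rows)

warehouse = []
load("billing_v1", [{"PatID": "P1", "DOS": "2024-03-01", "Amt": "125.50"}], warehouse)
load("billing_v2", [{"patient": {"id": "P2"}, "service_date": "2024-03-02", "amount_cents": 9900}], warehouse)

# Downstream reporting depends only on the canonical shape.
print(sum(r["charge_usd"] for r in warehouse))
```

When the source application changes again, only its driver needs to be rewritten; the warehouse and the reports built on it stay untouched.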

In summary, a strong data-driven healthcare organization has to train and communicate the importance of data as a support for performance management and get the buy-in from the moment of data acquisition through the entire lifecycle of that key data element.  The pay-offs are big: revenue optimization, risk mitigation and elimination of redundant costs.  When a healthcare organization focuses on treating data as a strategic asset, then it changes the outcome for everyone in the organization, and restores trust and reliability for making key decisions.

Workbenches, Lenses and Decisions, Oh My! A Data Quality Software Assessment


In a recent survey conducted by The Information Difference, the top three domains requiring data quality initiatives included the product domain, the financial domain, and the name/address domain.  This was surprising since most of the data quality vendors offer name and address matching features; however, few offer product specific features and even fewer offer a financial based set of features. The survey included twenty-seven questions that ranged from a ranking of organizational data quality estimates to data quality implementation specifics.   The survey contains thorough analysis spanning the data quality paradigm. One of the more telling questions in the survey was in reference to the vendor/tool selected by the organizations implementing data quality solutions.

After reading the response summary, it was clear that there was not a predominant choice.  As the survey points out, this could be a consequence of the rather large number of data quality tools available on the market. With so many data quality options, could it be that the data quality market has become so saturated that the difference between offerings has become obscured?

With this in mind, I have put together an assessment that analyzes how the features of two leading vendor offerings, Informatica and Oracle, address data quality issues in the enterprise.  The specific products involved are Informatica’s Data Quality Workbench and Oracle’s DataLens®.  While this assessment is limited in scope, it does correlate with two of the most popular data domains: product and name/address.

The Informatica Data Quality Workbench

The Informatica data quality product offering includes two products, Data Explorer and Data Quality Workbench; however, for the purposes of this assessment, only the Data Quality Workbench will be reviewed.  

The reason for this is that Data Explorer is primarily a profiling tool which provides insight to what data requires attention, whereas Data Quality Workbench is the tool that performs many of the quality enhancements. 

The Data Quality Workbench contains many features that enable the data quality analyst to enrich data; however, chief among these are the address validation and matching components.  

The address validation component utilizes a service provided by AddressDoctor®, a leader in global address validation.  This service validates addresses fed into the component in multiple ways such as street level, geocoding, and delivery point validation via their reference database which currently covers 240 countries and territories.  As a result, non-deliverable addresses are verified or corrected, increasing the success of operational initiatives such as sales, marketing, and customer service.  

In addition to the address components, there are also match components available designed to compare various types of strings such as numeric, character and variable character based.

The tool generates a score representing the degree to which the two strings are similar.  The higher the match score, the greater the likelihood that the two strings are a match. Potential matches are grouped, enabling manual or automated evaluation in the nomination of a master transaction.
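The general idea of a match score can be shown with Python's standard library. To be clear, this is not Informatica's algorithm: it uses `difflib.SequenceMatcher` as a stand-in similarity measure, and the 0.8 threshold and sample addresses are arbitrary choices for the example. Pairs that score above the threshold are collected as potential matches for later review.

```python
from difflib import SequenceMatcher
from itertools import combinations

def match_score(a, b):
    """Similarity in [0, 1]; 1.0 means the normalized strings are identical."""
    def normalize(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_pairs(values, threshold=0.8):
    """Return pairs scoring at or above the threshold, best matches first."""
    pairs = []
    for a, b in combinations(values, 2):
        score = match_score(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

addresses = ["123 Main Street", "123 Main St.", "9 Elm Road", "123 main street"]
pairs = candidate_pairs(addresses)
for a, b, score in pairs:
    print(f"{score:.2f}  {a!r} <-> {b!r}")
```

Where the threshold is set is a business decision: a lower value surfaces more candidates for manual review, while a higher value risks missing true matches.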

Oracle DataLens®

Formerly from Silver Creek Systems, DataLens® is a data quality engine built specifically for product integration and master data management.  Using semantic technology, DataLens is able to identify and correct errant product descriptions regardless of how the information is presented. This distinguishes it from most data cleansing products.

Based on specific contexts, such as manufacturing or pharmaceutical, DataLens® can recognize the meaning of values regardless of word order, spelling deviations or punctuation. DataLens® also enables on-the-fly classifications, such as Federal Supply Class, and language conversion abilities from any language to any language.

Oracle’s long term vision for DataLens® is a seamless integration with Oracle’s Product Management Hub which will allow organizations to centralize the management of product information from various sources.  This collaborative relationship will allow organizations to evaluate and, if necessary, standardize product descriptions as part of an enterprise data management and migration effort.

The Assessment

Now that we’ve covered the basics of these products, what conclusions can we draw?  Considering the native technologies built into each of these products, it is reasonable to conclude that there is little overlap between the two.  While both these products are excellent data quality tools, they are meant to address two distinct data quality domains.

With its address validation technology, Data Quality Workbench is primed for customer data integration (CDI), while DataLens’ imminent integration with Oracle’s Product Management Hub makes it a compelling choice for product information management (PIM).

Customer Data Integration (CDI)

CDI benefits organizations both large and small by enabling a “single view of the customer” and typically relies on name and address coupling in order to identify potential duplicate customer data. CDI is often associated with direct marketing campaigns, but also provides benefits in billing operations and customer service operations.

Informatica’s Data Quality Workbench is an appropriate selection for an organization looking to achieve any of the following objectives:

  1. Eliminate direct marketing mailings to undeliverable addresses
  2. Eliminate multiple direct marketing mailings to the same customer
  3. Eliminate multiple direct marketing mailings to the same household
  4. Eliminate erroneous billing activities due to customer/client duplication
  5. Eliminate erroneous billing activities due to undeliverable addresses
  6. Increase customer satisfaction by eliminating confusion caused by duplicate customer data
  7. Decrease resolution time for customer service incidents by eliminating duplicate customer data

Product Information Management (PIM)

PIM initiatives benefit organizations with multiple product lines and distributed order fulfillment operations.  They are frequently associated with supply chain operations in an effort to reduce product data variability and streamline product order fulfillment. PIM projects are rooted in data governance and rely on external reference data and business process rigor to implement.

Oracle’s DataLens is an appropriate selection for an organization looking to achieve any of the following objectives:

  1. Eliminate erroneous order fulfillment activities caused by stale or variant product information
  2. Eliminate incorrect billing due to discrepancies in product data
  3. Eliminate underutilization of warehouse inventory due to confusion on availability of product
  4. Eliminate confusion and delays at customs due to discrepancies in product weights and descriptions
  5. Eliminate reconciliation exercises associated with the remediation of product data
  6. Increase cross-sell for customers via aligned data on product usage
  7. Decrease errors resulting from poor data entry accuracy

Just as no two data quality projects are the same, neither are data quality software products. So while Oracle’s DataLens and Informatica’s Data Quality Workbench are both classified under the data quality software umbrella, they are so different in design and implementation that they cannot be thought of as interchangeable. Each tool enables the execution of information quality in data domains so distinct that it is important to understand this context prior to the investment of purchasing such a tool.

This further supports the need to perform an assessment of tool features aligned to the business need in the project planning phase in order to ensure full capitalization of the investment in the data quality initiative.

Driving Value from Your Healthcare Analytics Program – Key Program Components

If you are a healthcare provider or payer organization contemplating an initial implementation of a Business Intelligence (BI) Analytics system, there are several areas to keep in mind as you plan your program.  The following key components appear in every successful BI Analytics program.  And the sooner you can bring focus and attention to these critical areas, the sooner you will improve your own chances for success.

Key Program Components

Last time we reviewed the primary, top-level technical building blocks.  However, the technical components are not the starting point for these solutions.  Technical form must follow business function.  The technical components come to life only when the primary mission and drivers of the specific enterprise are well understood.  And these must be further developed into a program for defining, designing, implementing and evangelizing the needs and capabilities of BI and related analytics tuned to the particular needs and readiness of the organization.

Key areas that require careful attention in every implementation include the following:

We have found that healthcare organizations (and solution vendors!) have contrasting opinions on how best to align the operational data store (ODS) and enterprise data warehouse (EDW) portions of their strategy with the needs of their key stakeholders and constituencies.  The “supply-driven” approach encourages a broad-based uptake of virtually all data that originates from one or more authoritative source system, without any real pre-qualification of the usefulness of that information for a particular purpose.  This is the hope-laden “build it and they will come” strategy.  Conversely, the “demand-driven” approach encourages a particular focus on analytic objectives and scope, and uses this focus to concentrate the initial data uptake to satisfy a defined set of analytic subject areas and contexts.  The challenge here is to not so narrowly focus the incoming data stream that it limits related exploratory analysis.

For example, a supply-driven initiative might choose to tap into an existing enterprise application integration (EAI) bus and siphon all published HL7 messages into the EDW or ODS data collection pipe.  The proponents might reason that if these messages are being published on an enterprise bus, they should be generally useful; and if they are reasonably compliant with the HL7 RIM, their integration should be relatively straightforward.  However, their usefulness for a particular analytic purpose would still need to be investigated separately.

Conversely, a demand-driven project might start with a required set of representative analytic question instances or archetypes, and drive the data sourcing effort backward toward the potentially diverging points of origin within the business operations.  For example, a surgical analytics platform to discern patterns between or among surgical cost components, OR schedule adherence, outcomes variability, payer mix, or the impact of specific material choices would depend on specific data elements that might originate from potentially disparate locations and settings.  The need here is to ensure that the data sets required to support the specific identified analyses are covered; but the collection strategy should not be so exclusive that it prevents exploration of unanticipated inquiries or analyses.

I’ll have a future blog topic on a methodology we have used successfully to progressively decompose, elaborate and refine stakeholder analytic needs into the data architecture needed to support them.

In many cases, a key objective for implementing healthcare analytics will be to bring focus to specific areas of enterprise operations: to drive improvements in quality, performance or outcomes; to drive down costs of service delivery; or to increase resource efficiency, productivity or throughput, while maintaining quality, cost and compliance.  A common element in all of these is a focus on process.  You must identify the specific processes (or workflows) that you wish to measure and monitor.  Any given process, however simple or complex, will have a finite number of “pulse points,” any one of which will provide a natural locus for control or analysis to inform decision makers about the state of operations and progress toward measured objectives or targets.  These loci become the raw data collection points, where the primary data elements and observations (and accompanying meta-data) are captured for downstream transformation and consumption.

For example, if a health system is trying to gain insight into opportunities for flexible scheduling of OR suites and surgical teams, the base level data collection must probe into the start and stop times for each segment in the “setup and teardown” of a surgical case, and all the resource types and instances needed to support those processes.  Each individual process segment (e.g. OR ready/busy, patient in/out, anesthesia start/end, surgeon in/out, cut/close, PACU in/out) has distinct control loci, and measuring them yields the foundational data on which such analyses must be built.  You won’t gain visibility into optimization opportunities if you don’t measure the primary processes at sufficient granularity to facilitate inquiry and action.
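
As a minimal sketch of what measuring these segments could look like, the Python below derives segment durations from timestamped events for a single case.  The event names, the segment definitions and the timestamps are all hypothetical; a real system would take them from whatever the OR information system actually records.

```python
from datetime import datetime

# Hypothetical pulse-point timestamps for one surgical case.
case_events = {
    "patient_in": datetime(2024, 1, 8, 7, 30),
    "anesthesia_start": datetime(2024, 1, 8, 7, 40),
    "cut": datetime(2024, 1, 8, 8, 5),
    "close": datetime(2024, 1, 8, 9, 35),
    "patient_out": datetime(2024, 1, 8, 9, 50),
}

# Each segment is a (start event, end event) pair; the names are illustrative.
SEGMENTS = {
    "room_occupied": ("patient_in", "patient_out"),
    "pre_incision": ("patient_in", "cut"),
    "cut_to_close": ("cut", "close"),
}


def segment_minutes(events, segments=SEGMENTS):
    """Return minutes per segment, or None when a timestamp is missing."""
    result = {}
    for name, (start_key, end_key) in segments.items():
        start, end = events.get(start_key), events.get(end_key)
        if start is None or end is None:
            # A missing pulse point is itself a finding worth reporting.
            result[name] = None
        else:
            result[name] = (end - start).total_seconds() / 60
    return result


durations = segment_minutes(case_events)
```

Returning None for a segment with a missing timestamp, rather than skipping it, keeps gaps in the base data collection visible to whoever consumes the measure.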

Each pulse point reveals a critical success component in the overall operation.  Management must decide how each process will be measured, and how the specific data to be captured will enable both visibility and action.  Visibility means confirming that the specific critical process elements being performed are within tolerance and on target, or that they are deviating from a standard or plan and require corrective action.  And the information must both enable and facilitate focused action that will bring performance and outcomes back into compliance with the desired or required standards or objectives.

A key aspect of metric design is defining the needed granularity and dimensionality.  The former ensures the proper focus and resolution on the action needed.  The latter facilitates traceability and exploration into the contexts in which performance and quality issues arise.  If any measured areas under-perform, the granularity and dimensionality will provide a focus for appropriate corrective actions.  If they achieve superior performance, they can be studied and characterized for possible designation as best practices.

For example, how does a surgical services line that does 2500 total knees penetrate this monolithic volume and differentiate these cases in a way that enables usable insights and focused action?  The short answer is to characterize each instance to enable flexible-but-usable segmentation (and sub-segmentation); and when a segment of interest is identified (under-performing; over-performing; or some other pattern), the n-tuple of categorical attributes that was used to establish the segment becomes a roadmap defining the context and setting for the action: either corrective action (i.e. for deviation from standard) or reinforcing action (i.e. for characterizing best practices).  So, dimensions of surgical team, facility, care setting, procedure, implant type and model, supplier, starting ordinal position, day of week, and many others can be part of your surgical analytics metrics design.
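
The segmentation idea can be sketched in a few lines of Python.  This is only an illustration under assumed data: the record fields (facility, implant, day, minutes) and their values are hypothetical, and the measure could be any metric attached to a case.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical case records; field names and values are illustrative only.
cases = [
    {"facility": "North", "implant": "A", "day": "Mon", "minutes": 95},
    {"facility": "North", "implant": "A", "day": "Tue", "minutes": 105},
    {"facility": "North", "implant": "B", "day": "Mon", "minutes": 130},
    {"facility": "South", "implant": "A", "day": "Mon", "minutes": 90},
    {"facility": "South", "implant": "B", "day": "Fri", "minutes": 150},
]


def segment(records, dimensions, measure):
    """Group records by the n-tuple of dimension values.

    Returns {tuple_of_values: {"n": count, "mean": average of the measure}}.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        groups[key].append(rec[measure])
    return {key: {"n": len(vals), "mean": mean(vals)} for key, vals in groups.items()}


by_facility_implant = segment(cases, ("facility", "implant"), "minutes")
```

The key of each group is exactly the n-tuple described above, so a segment that stands out, such as ("South", "B") in this made-up data, already names the context in which to act.  Adding or removing a dimension only changes the tuple passed in.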

Each metric must ultimately be deconstructed into the specific raw data elements, observations and quantities (and units) that are needed to support the computation of the corresponding metric.  This includes the definition, granularity and dimensionality of each data element; its point of origin in the operation and its position within the process to be measured; the required frequency for its capture and timeliness for its delivery; and the constraints on acceptable values or other quality standards to ensure that the data will reflect accurately the state of the operation or process, and will enable (and ideally facilitate) a focused response once its meaning is understood.

An interesting consideration is how to choose the source for a collected data element when multiple legitimate sources exist, and what rules are needed to arbitrate such conflicts (this issue spills over into data governance; see below).  Arbitration can be based on: whether each source is legitimately designated as authoritative; where each conflicting (or overlapping) data element (and its contents) resides in a life cycle that impacts its usability; what access controls or proprietary rights pertain to the specific instance of data consumption; and the purpose for or context in which the data element is obtained.  Resolving these conflicts is not always as simple as designating a single authoritative source.
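
One way to make such arbitration rules explicit is to encode them as an ordered comparison.  The sketch below is an assumption-laden illustration, not a recommended policy: the Candidate fields, the life-cycle stages and their ranking, and the choice to weigh authoritative designation ahead of life-cycle stage are all hypothetical, and the purpose or context criterion is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str          # hypothetical source system name
    value: str
    authoritative: bool  # designated authoritative for this element?
    lifecycle: str       # assumed stages: "draft", "preliminary", "final"
    permitted: bool      # do access controls or rights allow this use?


# Assumed ordering of life-cycle stages; later stages are preferred.
LIFECYCLE_RANK = {"draft": 0, "preliminary": 1, "final": 2}


def arbitrate(candidates):
    """Pick one candidate: drop those not permitted for this use, then
    prefer authoritative sources, then the most mature life-cycle stage."""
    usable = [c for c in candidates if c.permitted]
    if not usable:
        return None
    return max(
        usable,
        key=lambda c: (c.authoritative, LIFECYCLE_RANK.get(c.lifecycle, -1)),
    )


candidates = [
    Candidate("EHR", "2024-01-08", True, "preliminary", True),
    Candidate("Billing", "2024-01-09", False, "final", True),
    Candidate("Registry", "2024-01-08", True, "final", False),
]
chosen = arbitrate(candidates)
```

With this made-up data the registry value is excluded because it is not permitted for the use at hand, and the EHR value wins over billing because authoritative designation is ranked first.  Reordering the tuple in the key changes the policy, which is the point: the rule is written down where it can be reviewed.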

Controlling data quality at its source is essential.  All downstream consumers and transformation operations are critically dependent on the quality of each data element at its point of origin or introduction into the data stream.  Data cleansing becomes much more problematic if it occurs downstream of the authoritative source, during subsequent data transformation or data presentation operations.  Doing so effectively allows data to “originate” at virtually any position in the data stream, making traceability and quality tracking more difficult, and increasing the burden of holding the data that originates at those various points to the quality standard.  On the other hand, downstream consumers may have little or no influence or authority to impose the data cleansing or capture constraints on those who actually collect the data.

Organizations are often unreceptive to the suggestion that their data may have quality issues.  “The data’s good.  It has to be; we run the business on it!”  Although this might be true, when you remove data from its primary operating context, and attempt to use it for different purposes such as aggregation, segmentation, forecasting and integrated analytics, problems with data quality rise to the surface and become visible.

Elements of data quality include: accuracy; integrity; timeliness; timing and dynamics; clear semantics; and rules for capture, transformation and distribution.  Your strategy must include establishing and then enforcing definitions, measures, policies and procedures to ensure that your data is meeting the necessary quality standards.
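
To show what a defined quality measure might look like in practice, here is a small Python sketch of two of them, completeness and timeliness.  The record layout, the field names (cut_time, captured_at) and the one-hour threshold are assumptions made for the example, not standards.

```python
from datetime import datetime, timedelta

# Hypothetical records; "captured_at" is when the observation was entered.
records = [
    {"case_id": 1, "cut_time": datetime(2024, 1, 8, 8, 5),
     "captured_at": datetime(2024, 1, 8, 8, 6)},
    {"case_id": 2, "cut_time": None,
     "captured_at": datetime(2024, 1, 8, 10, 0)},
    {"case_id": 3, "cut_time": datetime(2024, 1, 8, 12, 0),
     "captured_at": datetime(2024, 1, 9, 9, 0)},
]


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def timeliness(rows, field, max_lag):
    """Share of populated rows captured within max_lag of the event.

    Returns None when no row has the field populated.
    """
    populated = [r for r in rows if r[field] is not None]
    if not populated:
        return None
    on_time = [r for r in populated if r["captured_at"] - r[field] <= max_lag]
    return len(on_time) / len(populated)


cut_completeness = completeness(records, "cut_time")
cut_timeliness = timeliness(records, "cut_time", timedelta(hours=1))
```

Once a measure is expressed this way, the policy question becomes concrete: what score is acceptable for each element, and who is accountable when it falls short.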

The data architecture must anticipate the structure and relationships of the primary data elements, including the required granularity, dimensionality, and alignment with other identifying or describing elements (e.g. master and reference data); and the nature and positioning of the transformation and consumption patterns within the various user bases.

For example, to analyze the range in variation of maintaining schedule integrity in our surgical services example, for each case we must capture micro-architectural elements such as the scheduled and actual start and end times for each critical participant and resource type (e.g. surgeon, anesthesiologist, patient, technician, facility, room, schedule block, equipment, supplies, medications, prior and following case, etc.), each of which becomes a dimension in the hierarchical analytic contexts that will reveal and help to characterize where under-performance or over-performance are occurring.  The corresponding macro-architectural components will address requirements such as scalability, distinction between retrieval and occurrence latency, data volumes, data lineage, and data delivery.

By the way: none of this presumes a “daily batch” system.  Your data architecture might need to anticipate and accommodate complex hybrid models for federating and staging incremental data sets to resolve unavoidable differences in arrival dynamics, granularity, dimensionality, key alignment, or perishability.  I’ll have another blog on this topic, separately.

You should definitely anticipate that the incorporation and integration of additional subject areas and data sets will increase the value of the data; in many instances, far beyond that for which it was originally collected.  As the awareness and use of this resource begins to grow, both the value and sensitivity attributed to these data will increase commensurately.  The primary purpose of data governance is to ensure that the highest quality data assets obtained from all relevant sources are available to all consumers who need them, after all the necessary controls have been put in place.

Key components of an effective strategy are the recognition of data as an enterprise asset; the designation of authoritative sources; commitment to data quality standards and processes; and recognition of data proceeding through a life cycle of origination, transformation and distribution, with varying degrees of ownership, stewardship and guardianship, on its way to various consumers for various purposes.  Specific characteristics, such as the level of aggregation; the degree of protection required (e.g. PHI); the need for de-identification and re-identification; the designation of “snapshots” and “versions” of data sets; and the constraints imposed by proprietary rights, will all impact the policies and governance structures needed to ensure proper usage of this critical asset.

Are you positioned for success?

Successful implementation of BI analytics requires more than a careful selection of technology platforms, tools and applications.  The selection of technical components will ideally follow the definition of the organization’s needs for these capabilities.  The program components outlined here are a good start on the journey to embedded analytics, proactively driving the desired improvement throughout your enterprise.

Data Profiling: The BI Grail

In Healthcare analytics, as in analytics for virtually all other businesses, the landscape facing the Operations, Finance, Clinical, and other organizations within the enterprise is almost always populated by a rich variety of systems which are prospective sources for decision support analysis.   I propose that we insert into the discussion some ideas about the inarguable value of, first, data profiling, and second, a proactive data quality effort as part of any such undertaking.

Whether done from the ground up or when the scope of an already successful initial project is envisioned to expand significantly, all data integration/warehousing/business intelligence efforts benefit from the proper application of these disciplines and the actions taken based upon their findings, early, often, and as aggressively as possible.

I like to say sometimes that in data-centric applications, the framework and mechanisms which comprise a solution are actually even more abstract in some respects than traditional OLTP applications because, up to the point at which a dashboard or report is consumed by a user, the entire application virtually IS the data, sans bells, whistles, and widgets which are the more “material” aspects of GUI/OLTP development efforts:

  • Data entry applications, forms, websites, etc. all exist generally outside the reach of the project being undertaken.
  • Many assertions and assumptions are usually made about the quality of that data.
  • Many, if not most, of those turn out not to be true, or at least not entirely accurate, despite the very earnest efforts of all involved.

What this means in terms of risk to the project cannot be overstated.   Because that risk is largely unknown in most instances, it obviously can neither be qualified nor quantified.   It often turns what seems, on the face of it, to be a relatively simple “build machine X” with gear A, chain B, and axle C project into “build machine X” with gear A (with missing teeth), chain B (not missing any links but definitely rusty and needing some polishing), and axle C (which turns out not to even exist though it is much discussed, maligned, or even praised depending upon who is in the room and how big the company is).

Enter The Grail.   If there is a Grail in data integration and business intelligence, it may well be data profiling and quality management, on its own or as a precursor to true Master Data Management (if that hasn’t already become a forbidden term for your organization due to past failed tries at it).

Data Profiling gives us a pre-emptive strike against our preconceived notions about the quality and content of our data.   It gives us not only quantifiable metrics by which to measure and modify our judgement of the task before us, but frequently results in various business units spinning off immediately into the scramble to improve upon what they honestly did not realize was so flawed.
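
To make the idea of quantifiable metrics concrete, here is a minimal profiling sketch in Python.  It is not a substitute for a profiling tool; the rows, column names and values are invented for illustration, and treating an empty string as missing is an assumption.

```python
from collections import Counter

# Hypothetical extract from a source system; all values are made up.
rows = [
    {"mrn": "1001", "payer": "Medicare", "implant_model": "KX-2"},
    {"mrn": "1002", "payer": "medicare", "implant_model": ""},
    {"mrn": "1002", "payer": None, "implant_model": "KX-2"},
    {"mrn": "1004", "payer": "Aetna", "implant_model": None},
]


def profile(rows, column):
    """Summarize one column: row count, missing count, distinct values,
    values that occur more than once, and the most common values."""
    values = [r.get(column) for r in rows]
    # Assumption: None and empty string both count as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
        "top": counts.most_common(3),
    }


mrn_profile = profile(rows, "mrn")
payer_profile = profile(rows, "payer")
```

Even on four rows the profile surfaces the kinds of surprises described above: an identifier assumed to be unique appears twice, and the payer column holds "Medicare" and "medicare" as two distinct values.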

Data Quality efforts, following comprehensive profiling and any proactive quality correction which is possible, give a project the possibility of fixing problems without changing source systems per se, but before the business intelligence solution becomes either a burned out husk on the side of the EPM highway (failed because of poor data), or at the least a de facto data profiling tool in its own right, coughing out whatever data doesn’t work instead of serving its intended purpose: to deliver key business performance information based upon a solid data foundation in which all have confidence.

The return on investment for such an effort is measurable, sustainable, and so compelling as an argument that no serious BI undertaking, large or small, should go forward without it.   Whether in Healthcare, Financial Services, Manufacturing, or another vertical,  its value is, I submit, inarguable.

Data Darwinism – Evolving your data environment

In my previous posts, the concept of Data Darwinism was introduced, as well as the types of capabilities that allow a company to set itself apart from its competition.   Data Darwinism is the practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.   If you take an honest and objective look at how and why you are using data, you might find out that you are on the wrong side of the equation.  So the question is “how do I move up the food chain?”

The goal of evolving your data environment is to change from using your data in a reactionary manner and just trying to survive, to proactively using your data as a foundational component to constantly innovate to create a competitive advantage.

The plan is simple on the surface, but not always so easy in execution.   It requires an objective assessment of where you are compared to where you need to be, a plan/blueprint/roadmap to get from here to there, and flexible, iterative execution.


As mentioned before, taking an objective look at where you are compared to where you need to be is the first critical step.  This is often an interesting conversation among different parts of the organization that have competing interests and objectives. Many organizations can’t get past this first step. People get caught up in politics and self-interest and lose sight of the goal: to move the organization forward into a position of competitive advantage. Other organizations don’t have the in-house expertise or discipline to conduct the assessment. However, until this can be done, you remain vulnerable to other organizations that have moved past this step.


Great, now you’ve done the assessment, you know what your situation is and what your strengths and weaknesses are.  Without a roadmap of how to get to your data utopia, you’re going nowhere.   The roadmap is really a blueprint of inter-related capabilities that need to be implemented incrementally over time to constantly move the organization forward.   Now, I’ve seen this step end very badly for organizations that make some fundamental mistakes.  They try to do too much at once.  They make the roadmap too rigid to adapt to changing business needs.   They take a form over substance approach.  All these can be fatal to an organization.   The key to the roadmap is three-fold:

  • Flexible – This is not a sprint.   Evolving your data environment takes time.   Your business priorities will change, the external environment in which you operate will change, etc.   The roadmap needs to be flexible enough to enable it to adapt to these types of challenges.
  • Prioritized – There will be an impulse to move quickly and do everything at once.   That almost never works.   It is important to align the roadmap’s priorities with the overall priorities of the organization.
  • Realistic – Just as you had to take an objective, and possibly painful, look at where you were with respect to your data, you have to take a similar look at what can be done given any number of constraints all organizations face.   Funding, people, discipline, etc. are all factors that need to be considered when developing the roadmap.   In some cases, you might not have the internal skill sets necessary and have to leverage outside talent.   In other cases, you will have to implement new processes, organizational constructs and enabling technologies to enable the movement to a new level.  

Execute Iteratively

The capabilities you need to implement will build upon each other and it will take time for the organization to adapt to the changes.   Taking an iterative approach that focuses on building capabilities based on the organization’s business priorities will greatly increase your chance of success.  It also gives you a chance to evaluate the capabilities to see if they are working as anticipated and generating the expected returns.   Since you are taking an iterative approach, you have the opportunity to make the necessary changes to continue moving forward.

The path to innovation is not always an easy one.   It requires a solid, yet flexible, plan to get there and persistence to overcome the obstacles that you will encounter.   However, in the end, it’s a journey well worth the effort.

Data Darwinism – Capabilities that provide a competitive advantage

In my previous post, I introduced the concept of Data Darwinism, which states that for a company to be the ‘king of the jungle’ (and remain so), they need to have the ability to continually innovate.   Let’s be clear, though.   Innovation must be aligned with the strategic goals and objectives of the company.   The landscape is littered with examples of innovative ideas that didn’t have a market.  

So that raises the question “What are the behaviors and characteristics of companies that are at the top of the food chain?”    The answer to that question can go in many different directions.   With respect to Data Darwinism, the following hierarchy illustrates the categories of capabilities that an organization needs to demonstrate to truly become a dominant force.


The impulse will be for an organization to want to immediately jump to implementing capabilities that they think will allow them to be at the top of the pyramid.   And while this is possible to a certain extent, you must put in place certain foundational capabilities to have a sustainable model.     Examples of capabilities at this level include data integration, data standardization, data quality, and basic reporting.

Without clean, integrated, accurate data that is aligned with the intended business goals, the ability to implement the more advanced capabilities is severely limited.    This does not mean that all foundational capabilities must be implemented before moving on to the next level.  Quite the opposite actually.   You must balance the need for the foundational components with the return that the more advanced capabilities will enable.


Transitional capabilities are those that allow an organization to move from silo’d, isolated, often duplicative efforts to a more ‘centralized’ platform in which to leverage their data.    Capabilities at this level of the hierarchy start to migrate towards an enterprise view of data and include such things as a more complete, integrated data set, increased collaboration, basic analytics and ‘coordinated governance’.

Again, you don’t need to fully instantiate the capabilities at this level before building capabilities at the next level.   It continues to be a balancing act.


Transformational capabilities are those that allow the company to start to truly differentiate itself from its competition.   They don’t fully deliver the innovative capabilities that set it head and shoulders above other companies, but rather set the stage for such.   This stage can be challenging for organizations as it can require a significant change in mind-set compared to the current way the organization conducts its operations.   Capabilities at this level of the hierarchy include more advanced analytical capabilities (such as true data mining), targeted access to data by users, and ‘managed governance’.


Innovative capabilities are those that truly set a company apart from its competitors.   They allow for innovative product offerings, unique methods of handling the customer experience and new ways in which to conduct business operations.   Amazon is a great example of this.   Their ability to customize the user experience and offer ‘recommendations’ based on a wealth of user buying  trend data has set them apart from most other online retailers.    Capabilities at this level of the hierarchy include predictive analytics, enterprise governance and user self-service access to data.

The bottom line is that moving up the hierarchy requires vision, discipline and a pragmatic approach.   The journey is not always an easy one, but the rewards more than justify the effort.

Check back for the next installment of this series “Data Darwinism – Evolving Your Data Environment.”