Data Profiling: The BI Grail

In healthcare analytics, as in analytics for virtually every other business, the Operations, Finance, Clinical, and other organizations within the enterprise face a landscape populated by a rich variety of systems that are prospective sources for decision support analysis. I propose that we add to the discussion some thoughts on the inarguable value of, first, data profiling, and second, a proactive data quality effort as part of any such undertaking.

Whether a project is built from the ground up or the scope of an already successful initial effort is set to expand significantly, every data integration, warehousing, and business intelligence effort benefits from the proper application of these disciplines, and from acting on their findings, early, often, and as aggressively as possible.

I sometimes like to say that in data-centric applications, the framework and mechanisms that comprise a solution are in some respects even more abstract than those of traditional OLTP applications, because up to the point at which a dashboard or report is consumed by a user, the entire application virtually IS the data, sans the bells, whistles, and widgets that are the more “material” aspects of GUI/OLTP development efforts:

  • Data entry applications, forms, websites, etc. generally exist outside the reach of the project being undertaken.
  • Many assertions and assumptions are usually made about the quality of that data.
  • Many, if not most, of those turn out not to be true, or at least not entirely accurate, despite the very earnest efforts of all involved.

What this means in terms of risk to the project cannot be overstated. Because that risk is largely unknown in most instances, it can be neither qualified nor quantified. It often turns what seems, on the face of it, to be a relatively simple “build machine X” project, with gear A, chain B, and axle C, into “build machine X” with gear A (missing teeth), chain B (not missing any links, but definitely rusty and in need of polishing), and axle C (which turns out not to exist at all, though it is much discussed, maligned, or even praised, depending on who is in the room and how big the company is).

Enter the Grail. If there is a Grail in data integration and business intelligence, it may well be data profiling and quality management, on its own or as a precursor to true Master Data Management (if that has not already become a forbidden term in your organization because of past failed attempts at it).

Data profiling gives us a pre-emptive strike against our preconceived notions about the quality and content of our data. It gives us quantifiable metrics by which to measure and adjust our judgment of the task before us, and it frequently sends various business units scrambling off immediately to improve what they honestly did not realize was so flawed.
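To make that concrete, here is a minimal profiling sketch in Python with pandas; the column names and the toy extract are purely illustrative assumptions, not taken from any particular source system:

```python
# Minimal data-profiling sketch (pandas); the column names below are hypothetical.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column metrics: null percentage, distinct count, min and max."""
    rows = []
    for col in df.columns:
        non_null = df[col].dropna()
        rows.append({
            "column": col,
            "null_pct": round(100 * df[col].isna().mean(), 2),
            "distinct": int(non_null.nunique()),
            "min": non_null.min() if len(non_null) else None,
            "max": non_null.max() if len(non_null) else None,
        })
    return pd.DataFrame(rows)

# A tiny, made-up extract: the metrics immediately expose the missing and
# implausible weights (None, 0.0, 640.0) that assumptions said could not exist.
extract = pd.DataFrame({
    "patient_id": [101, 102, 103, 104],
    "weight_kg":  [72.5, None, 0.0, 640.0],
    "admit_date": ["2009-01-03", "2009-02-11", None, "2009-03-15"],
})
print(profile(extract))
```

Even on a four-row toy extract, the output surfaces the kinds of findings profiling delivers at scale: null rates, distinct counts, and suspicious extremes such as a weight of 0.0 or 640.0 kg.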

Data quality efforts, following comprehensive profiling and whatever proactive correction is possible, give a project the chance to fix problems without changing source systems per se, and before the business intelligence solution becomes either a burned-out husk on the side of the EPM highway (failed because of poor data) or, at best, a de facto data profiling tool in its own right, coughing out whatever data doesn’t work instead of serving its intended purpose: delivering key business performance information on a solid data foundation in which everyone has confidence.
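As a rough sketch of what “fixing problems without changing source systems” can look like in practice, quality rules can be applied in the integration layer as records are loaded; the field names, code set, and plausibility range below are illustrative assumptions rather than any published standard:

```python
# Sketch of quality rules applied during load, leaving the source system untouched.
# Field names and the plausibility range for weight are illustrative assumptions.

def apply_quality_rules(record: dict) -> tuple[dict, list[str]]:
    """Return a cleansed copy of the record plus a list of issues for follow-up."""
    cleansed, issues = dict(record), []

    # Rule 1: standardize a free-text gender field to a small code set.
    gender_map = {"m": "M", "male": "M", "f": "F", "female": "F"}
    raw = str(record.get("gender", "")).strip().lower()
    cleansed["gender"] = gender_map.get(raw, "U")
    if cleansed["gender"] == "U" and raw:
        issues.append(f"unmapped gender value: {record['gender']!r}")

    # Rule 2: flag implausible weights rather than silently dropping the row.
    weight = record.get("weight_kg")
    if weight is not None and not (1 <= weight <= 500):
        issues.append(f"weight_kg out of plausible range: {weight}")

    return cleansed, issues

record, issues = apply_quality_rules({"gender": "Female", "weight_kg": 640.0})
print(record, issues)
```

The point is not these particular rules but where they live: in the load process, where findings can also be routed back to source-system owners without blocking the BI effort.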

The return on investment for such an effort is measurable, sustainable, and so compelling an argument that no serious BI undertaking, large or small, should go forward without it. Whether in Healthcare, Financial Services, Manufacturing, or another vertical, its value is, I submit, inarguable.

Physicians Insist: Leave No Data Behind

“I want it all.” This sentiment is shared by nearly all of the clinicians we’ve met with, from the largest integrated health systems (IHS) to the smallest physician practices, in reference to what data they want access to once an aggregation solution like a data warehouse is implemented.  From discussions with organizations throughout the country and across care settings, we understand a problem that plagues many of these solutions: the disparity between what clinical users would like and what technical support staff can provide.

For instance, when building a Surgical Data Mart, an IHS can collect standard patient demographics from a number of its transactional systems. When asked, “Which ‘patient weight’ would you like to keep: the one from your OR system (Picis), your registration system (HBOC), or your EMR (Epic)?”, the doctors will, sure enough, respond, “all three”. Unfortunately, the doctors often do not consider the cost and effort of providing three versions of the same data element to end consumers before answering, “I want it all”. And therein lies our theory for accommodating this request: Leave No Data Behind. We are not alone in supporting this principle.
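As a sketch of what honoring “all three” might look like inside the mart, one option is to keep each source’s value alongside its lineage instead of electing a single survivor; the structure and field names here are illustrative assumptions, and only the source-system names come from the example above:

```python
# Sketch: keep every source's version of "patient weight" with its lineage,
# rather than electing a single survivor. Field names are illustrative.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PatientWeightObservation:
    patient_key: int
    weight_kg: float
    source_system: str      # e.g. "Picis" (OR), "HBOC" (registration), "Epic" (EMR)
    recorded_at: datetime

# All three observations land in the mart; consumers decide which to trust per question.
observations = [
    PatientWeightObservation(1001, 81.6, "Picis", datetime(2009, 3, 2, 7, 15)),
    PatientWeightObservation(1001, 82.0, "HBOC",  datetime(2009, 3, 1, 16, 40)),
    PatientWeightObservation(1001, 80.9, "Epic",  datetime(2009, 3, 2, 6, 55)),
]

# One possible downstream view: the most recently recorded weight per patient.
latest = max(observations, key=lambda o: o.recorded_at)
print(latest.source_system, latest.weight_kg)
```

Each downstream consumer can then apply its own survivorship rule (the most recent value, the OR-documented value, or all three side by side) without anything having been discarded during integration.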

By now you’ve all heard that Microsoft is making a play in healthcare with its Amalga platform. MS will continue its strategy of integrating expertise through acquisition, and so far it seems to be working. MS claims that an advantage of Amalga is its ability to store and manage a virtually unlimited amount of data associated with a patient encounter, across care settings and over time, for a truly horizontal and vertical view of the patient experience. Simply put, No Data Left Behind. The other major players (GE, Siemens, Google) are shoring up their offerings through partnerships that highlight the importance of access to, and management of, huge volumes of clinical and patient data.

Why is the concept of No Data Left Behind important? Clinicians have stated emphatically, “We do not know what questions we’ll be expected to answer in 3-5 years, whether based on new quality initiatives or regulatory compliance, and therefore we’d like all the raw and unfiltered data we can get.” Additionally, the recent popularity of clinical dashboards and alerts (or “interventional informatics”) in care settings further supports this claim. While alerts can be useful and can help prevent errors, decrease cost, and improve quality, studies suggest that the accuracy of alerts is critical for clinician acceptance; the type of alert and its placement and integration in the clinical workflow are also very important in determining its usefulness. As mentioned above, many organizations understand the need to accommodate the “I want it all” request, but few combine this with expertise in the aggregation, presentation, and appropriate distribution of this information for improved decision making and tangible quality, compliance, and bottom-line impacts. Fortunately, there are a few of us who have worked alongside institutions to help them evolve from theory to strategy to solution.

Providers must formulate a strategy to capitalize on the mountains of data that will come once the healthcare industry figures out how to integrate technology across its outdated, paper-laden landscape. Producers and payers must implement the proper technology and processes to consume this data via enterprise performance management front ends so that the entire value chain becomes more seamless. The emphasis on data presentation (think BI, alerting, and predictive analytics) continues to dominate the headlines and budget requests. Healthcare institutions, though, understand that these kinds of advanced analytics require the appropriate clinical and technical expertise to implement. Organizations, now more than ever, are embarking on this journey. We’ve had the opportunity to help overcome the challenges of siloed systems, latent data, and an incomplete view of the patient experience, helping institutions realize the promise of an EMR, the benefits of integrated data sets, and the decision-making power of consolidated, timely reporting. None of these initiatives will succeed, though, with incomplete data sets; a successful enterprise data strategy therefore always embraces the principle of “No Data Left Behind”.