Are you Paralyzed by a Hoard of Big Data?

Lured by the promise of big data benefits, many organizations are leveraging cheap storage to hoard vast amounts of structured and unstructured data. Without a clear framework for big data governance and use, businesses run the risk of becoming paralyzed under an unorganized jumble of data, much of which has become stale and past its expiration date. Stale data is toxic to your business – it could lead you into taking the wrong action based on data that is no longer relevant.

You know there’s valuable stuff in there, but the thought of wading through all THAT to find it stops you dead in your tracks. There goes your goal of business process improvement, which, according to a recent Informatica survey, most businesses cite as their number one Big Data Initiative goal.

Just as the individual hoarder often requires a professional organizer to help them pare the hoard and institute acquisition and retention rules for preventing hoard-induced paralysis in the future, organizations should seek outside help when they find themselves unable to turn their data hoard into actionable information.

An effective big data strategy needs to include the following components:

  1. An appropriate toolset for analyzing big data and making it actionable by the right people. Avoid building an ivory tower big data bureaucracy, and remember, insight has to turn into action.
  2. A clear and flexible framework, such as social master data management, for integrating big data with enterprise applications, one that can quickly leverage new sources of information about your customers and your market.
  3. Information lifecycle management rules and practices, so that insight and action are based on relevant, as opposed to stale, information.
  4. Consideration of how the enterprise application portfolio might need to be refined to maximize the availability and relevance of big data. In today’s world, that will involve grappling with the flow of information between cloud and internally hosted applications as well.
  5. A comprehensive data security framework that defines who is entitled to use, change and delete the data, along with encryption requirements and any required upgrades in network security.
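
As a rough illustration of the information lifecycle rules in item 3, the Python sketch below flags records that have outlived a retention window. It is only a sketch: the category names, the retention windows and the helper names (`is_stale`, `split_by_freshness`) are assumptions made up for this example, and a real policy would come from your own governance framework.

```python
from datetime import date, timedelta

# Hypothetical retention windows per data category (assumed values).
RETENTION_WINDOWS = {
    "web_clickstream": timedelta(days=90),
    "customer_profile": timedelta(days=730),
}


def is_stale(category, last_updated, today):
    """Return True when a record is older than its category's retention window.

    Categories without a rule are treated as stale so that ungoverned data
    is surfaced for review rather than silently trusted.
    """
    window = RETENTION_WINDOWS.get(category)
    if window is None:
        return True
    return (today - last_updated) > window


def split_by_freshness(records, today):
    """Partition (category, last_updated) records into fresh and stale lists."""
    fresh, stale = [], []
    for record in records:
        category, last_updated = record
        (stale if is_stale(category, last_updated, today) else fresh).append(record)
    return fresh, stale


# Made-up sample records, evaluated against a fixed date for repeatability.
records = [
    ("web_clickstream", date(2023, 9, 1)),
    ("customer_profile", date(2023, 1, 1)),
    ("legacy_export", date(2023, 12, 31)),
]
fresh, stale = split_by_freshness(records, today=date(2024, 1, 1))
```

Treating categories with no rule as stale is a design choice for this sketch: it pushes ungoverned data into review instead of letting it feed decisions by default.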

Get the picture? Your big data strategy isn’t just a data strategy. It has to be a comprehensive technology-process-people strategy.

All of these elements should, of course, be considered when building your big data business case and estimating return on investment.

Implementing Healthcare Service Lines – Inherent Data Management Challenges and How to Overcome Them

Service Line management provides the healthcare industry the ability to determine which of its diverse services are profitable and how the market share of a given service compares to competing providers.  Service Lines are typically limited to a handful of well-defined, mutually exclusive categories or groupings of individual services or interventions.  For example, a provider may choose to categorize clinical transactions (encounters, ambulatory or hospital visits, episodes, longitudinal courses of care) into Service Lines such as Oncology, Cardiovascular and Orthopedics.  Since no such Service Line designation exists in standard transactional encoding systems or taxonomies, Service Line assignments are derived based on attributes such as primary diagnosis codes, procedure codes, and other patient attributes such as age, gender or genetic characteristics, to name a few.

From a data management perspective, Service Line management presents a number of interesting challenges.  To illustrate these challenges, consider a large, multi-hospital, multi-specialty, multi-care-setting health system servicing millions of patients annually as it attempts to gain a single, consistent view of Service Lines across all facilities and settings.  Then consider the typical data management obstacles that arise in any one of these settings, such as poor data quality, potentially caused by inconsistent coding practices or lack of validation within and across the various source systems in use.  It turns out that for such a provider, the challenges are not insurmountable if the solution adheres to the proper architectural approach and design principles.

Here are some challenges to consider when implementing Service Lines within your healthcare organization:

  • It’s not likely that a single set of attributes will provide the flexibility, or for that matter the consistency across an enterprise, that will be needed to define a Service Line.  The approach will have to take into account that business users will almost reflexively define Service Lines on the basis of one or more complex clinical conditions, including the primary treatment modalities.  For example, Oncology = ICD-9 diagnosis codes BETWEEN 140.00 and 239.99 OR Service IN (RAD, ONC) OR Physician IN (Frankenstein, Zhivago) OR …
  • Service Lines will likely overlap on any given healthcare transaction, either within the Service Line or across Service Lines, as a consequence both of the inherently multi-disciplinary care that is delivered, and the traditional department or specialty alignment of critical staff.  A patient that has a primary diagnosis code of lung cancer may be discharged after having a lung transplant procedure by a cardiovascular or thoracic surgeon. The patient transaction is arguably a candidate for both Service Lines, but by definition, only one Service Line may prevail, depending upon the analytic objectives and context.
  • Given the often unavoidable Service Line overlap, conflicts within and across Service Lines must be resolved so that each transaction (case, episode, etc.) is ultimately assigned to one and only one Service Line.  However, there is value in capturing all Service Lines that apply to a given transaction, and specifically in giving users visibility into the reality that, given current operating definitions, overlap exists.  This both informs end users and enables them to take the overlap into consideration appropriately for the analytic objectives at hand.
  • Various models are possible for explicitly revealing and managing these important overlaps.  For example, the reporting structure might use a three-level hierarchy that explicitly represents and manages this overlap:  level 1 resolves conflicts at the health system level across all Service Lines; level 2 resolves conflicts between sub-categories within a Service Line; and level 3 allows overlap and may even accommodate multiple counting in well-circumscribed and special analytic contexts.
  • Organizations in the early stages of measuring performance by Service Lines might be motivated to influence the definition of any given Service Line to extend its reach by attaching to transactions currently falling into the ‘Other Service Line’; or by providing a more granular view (e.g. categories and sub-categories) of existing Service Lines.  To support the inevitable evolution of these definitions, Service Line definitions and specifications should be implemented as adaptable business rules, capable of being changed on a frequent basis.

To address these challenges, listed below are some technical implementation techniques you may want to consider:

  • Business rules should be implemented using a data model that allows for data-driven evaluation of rules, and should never be implemented as part of static code or ETL (extract, transform, load).  Given the volume of data that will be processed, rule evaluation should be done within the database engine as a join operation, rather than iterating record by record.
  • The complexity of the business rules and the frequency with which they change could drive a decision to use an off-the-shelf business rule tool.  Finding a tool that fits your implementation timeline and budget will improve the experience of the users, typically the financial planners and analysts who will create and modify the rules.  The key is to find a business rule tool that does not require rule processing to be programmatic or iterative.
  • When a business rule evaluates as applicable to a specific transaction, tag the transaction explicitly.  While the ultimate goal of the rules is to assign one and only one Service Line, tagging as a first step allows rules to be evaluated in isolation from one another and provides visibility to the level 2 and level 3 Service Line assignments described above.  A tag is equivalent to a provisional Service Line designation, although the ultimate Service Line assignment cannot occur until overlap conflict is resolved.
  • With individual transactions specifically tagged, conflicts that occur can be resolved at each level of the Service Line assignment hierarchy.  To achieve the desirable level 1, 2 and 3 hierarchy behavior, conflict is resolved in terms of the competing individual precedence assigned to each of the relevant business rules.  For example, a simple business rule precedence scheme for level 1 may be to state that where Oncology and Cardiovascular overlap, Cardiovascular always takes precedence.  Other strategies and specifications for resolving such conflicts can be implemented using a consistent representation.
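
The techniques above (rules stored as data, evaluation as a join, explicit tagging, then precedence-based resolution) can be sketched in a few lines of Python using the standard library's sqlite3 module. This is a minimal illustration under stated assumptions, not a reference implementation. The table and column names, the `CVS` service code, the sample transactions and the numeric precedence scheme are all hypothetical. The code range and the RAD/ONC service codes are borrowed from the Oncology example above, with the column assumed here to hold a diagnosis code. Each Service Line is assumed to carry a single, unique precedence.

```python
import sqlite3


def assign_service_lines(transactions, rules):
    """Tag transactions via a set-based join, then resolve overlap by precedence."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, dx_code REAL, service TEXT);
        -- Each rule row is data, not code: a diagnosis range and/or a service code.
        CREATE TABLE rules (
            service_line TEXT, precedence INTEGER,
            dx_low REAL, dx_high REAL, service TEXT
        );
        """
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", transactions)
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?, ?)", rules)

    # Step 1: tag. A single join evaluates every rule against every transaction.
    # A NULL rule column cannot satisfy its comparison, so each rule row may
    # constrain the diagnosis range, the service code, or both.
    conn.execute(
        """
        CREATE TABLE tags AS
        SELECT DISTINCT t.txn_id AS txn_id,
                        r.service_line AS service_line,
                        r.precedence AS precedence
        FROM transactions t
        JOIN rules r
          ON (t.dx_code BETWEEN r.dx_low AND r.dx_high)
          OR (t.service = r.service)
        """
    )
    tags = conn.execute(
        "SELECT txn_id, service_line FROM tags ORDER BY txn_id, precedence"
    ).fetchall()

    # Step 2: resolve. The lowest precedence number wins; transactions with
    # no tag at all fall into 'Other'.
    assignments = conn.execute(
        """
        SELECT t.txn_id, COALESCE(g.service_line, 'Other')
        FROM transactions t
        LEFT JOIN (SELECT txn_id, MIN(precedence) AS p FROM tags GROUP BY txn_id) m
          ON m.txn_id = t.txn_id
        LEFT JOIN tags g
          ON g.txn_id = m.txn_id AND g.precedence = m.p
        ORDER BY t.txn_id
        """
    ).fetchall()
    conn.close()
    return tags, assignments


# Made-up sample data: (txn_id, dx_code, service)
transactions = [(1, 162.9, "CVS"), (2, 174.9, "ONC"), (3, 410.0, "CVS"), (4, 715.0, "ORT")]
# (service_line, precedence, dx_low, dx_high, service); Cardiovascular outranks Oncology.
rules = [
    ("Oncology", 2, 140.0, 239.99, None),
    ("Oncology", 2, None, None, "ONC"),
    ("Oncology", 2, None, None, "RAD"),
    ("Cardiovascular", 1, None, None, "CVS"),
]
tags, assignments = assign_service_lines(transactions, rules)
```

In this sketch `tags` keeps every provisional designation, so transaction 1 appears under both Cardiovascular and Oncology, which is the overlap-visible view. `assignments` holds the single resolved Service Line per transaction. Changing a definition means editing rows in `rules`, not code.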

In summary, the key requirement for Service Line implementation is adaptability.  Implementing a data-driven platform capable of evaluating the complex business rules and supporting the different behaviors of Service Line tags by explicit representation within the relevant reporting hierarchies will allow your healthcare organization to constantly evolve and gain new insights on the performance and improvement opportunities of your Service Lines.  In the final analysis, that’s what it’s all about.

Data Darwinism – Evolving your data environment

In my previous posts, the concept of Data Darwinism was introduced, as well as the types of capabilities that allow a company to set itself apart from its competition.   Data Darwinism is the practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.   If you take an honest and objective look at how and why you are using data, you might find out that you are on the wrong side of the equation.  So the question is “how do I move up the food chain?”

The goal of evolving your data environment is to change from using your data in a reactionary manner and just trying to survive, to proactively using your data as a foundational component to constantly innovate to create a competitive advantage.

The plan is simple on the surface, but not always so easy in execution.   It requires an objective assessment of where you are compared to where you need to be, a plan/blueprint/roadmap to get from here to there, and flexible, iterative execution.


As mentioned before, taking an objective look at where you are compared to where you need to be is the first critical step.  This is often an interesting conversation among different parts of the organization that have competing interests and objectives. Many organizations can’t get past this first step. People get caught up in politics and self-interest and lose sight of the goal: to move the organization toward a competitive advantage. Other organizations don’t have the in-house expertise or discipline to conduct the assessment. However, until this can be done, you remain vulnerable to other organizations that have moved past this step.


Great, now you’ve done the assessment, you know what your situation is and what your strengths and weaknesses are.  Without a roadmap of how to get to your data utopia, you’re going nowhere.   The roadmap is really a blueprint of inter-related capabilities that need to be implemented incrementally over time to constantly move the organization forward.   Now, I’ve seen this step end very badly for organizations that make some fundamental mistakes.  They try to do too much at once.  They make the roadmap too rigid to adapt to changing business needs.   They take a form-over-substance approach.  All these can be fatal to an organization.   The key to the roadmap is three-fold:

  • Flexible – This is not a sprint.   Evolving your data environment takes time.   Your business priorities will change, the external environment in which you operate will change, etc.   The roadmap needs to be flexible enough to enable it to adapt to these types of challenges.
  • Prioritized – There will be an impulse to move quickly and do everything at once.   That almost never works.   It is important to align the roadmap’s priorities with the overall priorities of the organization.
  • Realistic – Just as you had to take an objective, and possibly painful, look at where you were with respect to your data, you have to take a similar look at what can be done given the constraints all organizations face.   Funding, people, discipline, etc. are all factors that need to be considered when developing the roadmap.   In some cases, you might not have the internal skill sets necessary and have to leverage outside talent.   In other cases, you will have to implement new processes, organizational constructs and enabling technologies to move to a new level.

Execute Iteratively

The capabilities you need to implement will build upon each other and it will take time for the organization to adapt to the changes.   Taking an iterative approach that focuses on building capabilities based on the organization’s business priorities will greatly increase your chance of success.  It also gives you a chance to evaluate the capabilities to see if they are working as anticipated and generating the expected returns.   Since you are taking an iterative approach, you have the opportunity to make the necessary changes to continue moving forward.

The path to innovation is not always an easy one.   It requires a solid, yet flexible, plan to get there and persistence to overcome the obstacles that you will encounter.   However, in the end, it’s a journey well worth the effort.

Data Darwinism – Capabilities that provide a competitive advantage

In my previous post, I introduced the concept of Data Darwinism, which states that for a company to be the ‘king of the jungle’ (and remain so), it needs to have the ability to continually innovate.   Let’s be clear, though.   Innovation must be aligned with the strategic goals and objectives of the company.   The landscape is littered with examples of innovative ideas that didn’t have a market.

So that raises the question “What are the behaviors and characteristics of companies that are at the top of the food chain?”    The answer to that question can go in many different directions.   With respect to Data Darwinism, the following hierarchy illustrates the categories of capabilities that an organization needs to demonstrate to truly become a dominant force.


The impulse will be for an organization to want to immediately jump to implementing capabilities that they think will allow them to be at the top of the pyramid.   And while this is possible to a certain extent, you must put in place certain foundational capabilities to have a sustainable model.     Examples of foundational capabilities include data integration, data standardization, data quality, and basic reporting.

Without clean, integrated, accurate data that is aligned with the intended business goals, the ability to implement the more advanced capabilities is severely limited.    This does not mean that all foundational capabilities must be implemented before moving on to the next level.  Quite the opposite actually.   You must balance the need for the foundational components with the return that the more advanced capabilities will enable.


Transitional capabilities are those that allow an organization to move from silo’d, isolated, often duplicative efforts to a more ‘centralized’ platform in which to leverage their data.    Capabilities at this level of the hierarchy start to migrate towards an enterprise view of data and include such things as a more complete, integrated data set, increased collaboration, basic analytics and ‘coordinated governance’.

Again, you don’t need to fully instantiate the capabilities at this level before building capabilities at the next level.   It continues to be a balancing act.


Transformational capabilities are those that allow the company to start to truly differentiate itself from its competition.   They don’t fully deliver the innovative capabilities that set it head and shoulders above other companies, but rather set the stage for such.   This stage can be challenging for organizations as it can require a significant change in mind-set compared to the current way the organization conducts its operations.   Capabilities at this level of the hierarchy include more advanced analytical capabilities (such as true data mining), targeted access to data by users, and ‘managed governance’.


Innovative capabilities are those that truly set a company apart from its competitors.   They allow for innovative product offerings, unique methods of handling the customer experience and new ways in which to conduct business operations.   Amazon is a great example of this.   Their ability to customize the user experience and offer ‘recommendations’ based on a wealth of user buying  trend data has set them apart from most other online retailers.    Capabilities at this level of the hierarchy include predictive analytics, enterprise governance and user self-service access to data.

The bottom line is that moving up the hierarchy requires vision, discipline and a pragmatic approach.   The journey is not always an easy one, but the rewards more than justify the effort.

Check back for the next installment of this series “Data Darwinism – Evolving Your Data Environment.”

Data Darwinism – Are you on the path to extinction?

Most people are familiar with Darwinism.  We’ve all heard the term survival of the fittest.   There is even a humorous take on the subject with the annual Darwin Awards, given to those individuals who have removed themselves from the gene pool through, shall we say, less than intelligent choices.

Businesses go through ups and downs, transformations, up-sizing/down-sizing, centralization/decentralization, etc.   In other words, they are trying to adapt to current and future events in order to grow.   Just as in the animal kingdom, some will survive and dominate, some will not fare as well.   In today’s challenging business environment, while many are trying to merely survive, others are prospering, growing and dominating.

So what makes the difference between being the king of the jungle or being prey?   The ability to make the right decisions in the face of uncertainty.     This is often easier said than done.   However, at the core of making the best decisions is making sure you have the right data.   That brings us back to the topic at hand:  Data Darwinism.   Data Darwinism can be defined as:

“The practice of using an organization’s data to survive, adapt, compete and innovate in a constantly changing and increasingly competitive business environment.”

When asked to assess where they are on the Data Darwinism continuum, many companies will say that they are at the top of the food chain, that they are very fast at getting data to make decisions, that they don’t see data as a problem, etc.   However, when truly asked to objectively evaluate their situation, they often come up with a very different, and often frightening, picture. 

  It’s as simple as looking at your behavior when dealing with data:

If you find yourself exhibiting more of the behaviors on the left side of the picture above, you might be a candidate for the next Data Darwin Awards.

Check back for the next installment of this series “Data Darwinism – Capabilities that Provide a Competitive Advantage.”

To Structure Or Not To Structure: That Is The Question

To structure or not to structure: that is the question: Whether it is nobler in the mind to suffer the slings and arrows of metadata, ontology, and sixth canonical normal form.  Or to take up arms against 30 years of data structure dogma and piety and, by opposing, convert to Web 2.0 search technology (and potentially ruin my career, the remaining shards anyway).  Lately, I have felt as torn as Hamlet; stay with my data heritage or end it all with radical Web 2.0 abandon.

I came out of the late 1970’s as a DBA (data base administrator), religiously putting flat files and hierarchical DBMSs (data base management systems, IMS specifically) to the sword and evangelizing the purity of the CODASYL model and teleprocessing systems.  Naturally, I was put to the sword in turn by Codd, Date, and relational DBMSs.  Later, we fought back with object oriented databases, but being older and wiser, detente reigned.  The only good data was analyzed and structured data, fourth normal form (sixth is extreme) at a minimum, all carefully placed in some DBMS so it could be transacted, searched, and reported. Ultimately, this drive to structured data has led to Business Intelligence (BI, oxymoron, like military intelligence), Corporate Performance Management (CPM) and Executive Dashboards (picture the Elmo dashboard toys you strap to the baby’s crib, spin it, ring it, beep it, Ha ha ha).

Like Galileo, unfortunately, I tasted some forbidden fruit and it has haunted me for years.  I was first taunted by the BLOB construct, which allowed unstructured data to be put in a data base container without the DBMS caring what it was; properly tagged, the unsearchable could be found, but it was still labor intensive.  My second taste came from being one of the sorry set of individuals to develop on Apple’s Newton platform (great haiku, bad handwriting recognition).  The development platform and runtime were a rich object soup giving incredible flexibility as to what constituted data and instruction.  Now, I am severely tempted by HTML and Search in the guise of Web 2.0 (tie me to the stake and light me up, I confess).

Building out Enterprise Data Models for the average corporation or, even more difficult, Biomedical Data Stores for life sciences is an extremely labor intensive, frustrating, and often futile endeavor.  The difficulty (cost, time) is directly correlated to the need for precise metadata and ontology.  Deriving, documenting, and retrofitting are massive efforts (and definitely not for the ADHD among us, who me?).  All of the investment is up front, before the first benefit can be realized (real scary career-wise).  However, this is the “right”, dogmatic, safe way to handle data.

This is why our data stores are embarrassing data dumps (landfills, complete with dozers and sea gulls).  It is the difficulty and cost of proper classification and maintenance of data in a structured environment that feeds this end.  Think of it as data entropy, devolving to the most basic disorganized state.  If this basic unstructured state is where data is going, why not just leave it in the “natural” state?  Use the human cognitive effort and Web 2.0 tools to promote the best and most useful data to the top of the heap and let the stuff of dubious integrity drop and disappear into the gravel in the bottom of the big data (fish) tank.  Rather than spending all that up front investment before the first benefit, the process would be one of steady refinement over time.

The raw data permeating the Web is greater than any structured data store and seems infinite in type and variety.  Like the ocean, people dip out what they require and what interests them with ever increasing success.  The rate of evolution of the supporting technology is astronomical. If we could put half the effort into molecule discovery that we put into Britney Spears antics, the world would be a much better place.

Data, The Ugly Stepsister of Web 2.0

The basket of technology comprising Web 2.0 is a wonderful thing and worthy of all of the press and commentary it receives, but what really scares me is the state of data in this new world.  Data sits in the basement of this wonderful technology edifice, ugly, dirty, surrounded by squalor, and chained in place.  It is much more fun to just buy the next storage array (disk is cheap, infinite, what power bill?) than it is to grind through it, clean it up, validate it, and ensure proper governance and ontology.

What is Web 2.0 for, if not to expose more content? And data is the ultimate content.  Knowing what is hiding in the basement, there are going to be a lot of embarrassed organizations (Lucy, you got some ‘splaining to do!).  Imagine how difficult it is going to be to link and synchronize content and data in the Web 2.0 environment.  Imagine explaining the project delays and failures of Web 2.0 initiatives when the beast in the basement gets a grip on them.

Normally, the technology will be blamed.  Nobody wants to admit they store the corporate crown jewels in the local landfill.  Nobody will buy the new products fast enough.  The server farms being built to support Cloud Computing will sit spinning and melting Arctic Ice in vain (Microsoft’s container-based approach is cool).  This could seriously impact the market capitalization of our top tech giants Microsoft, Oracle, Google, Amazon.  Oh no! It could crash the stock market and bring on tech and financial Armageddon given our weakened state!  Even worse, my own career is at stake!  The devil with them, they are all rolling in money, I could starve!

Now that I have my inner chimp back in the box, we need to put together a mitigation strategy to allow for a steady phased improvement of the data situation in tandem with Web 2.0 initiatives.  It is too much to expect anybody to clean up the toxic data dump in one sitting and we cannot tag Web 2.0 with the entire bill from years of neglect (just toss it in the basement, no one goes there).  If we do not ask IT to own up to the issue and instead allow projects to fail, senior management (fade to The Office) will assume the technology is at fault and will not allocate the resources needed to make this key technological transition.