During a recent informal forum (whose members shall remain nameless to protect my sorry existence a few more years), analytics projects came up as a topic. The question was a simple one. All of the industry analysts and surveys said analytic products and projects would be hot and would soak up the bulk of the meager discretionary funds made available to a CIO by his grateful company. If true, why were things so quiet? Why no “thundering” successes?
My answer was to put forward the “typical” project plan of a hypothetical predictive analytics project as a straw man to explore the topic:
- First, spend $50 to $100K on product selection.
- Second, hire a contractor skilled in the selected product and tell him you want a forecasting model for revenue and cost.
- The contractor says, “Fine, I’ll set up default questions. By the way, where is the data?”
- The contractor is pointed to the users. He successively moves down the organization until he reaches the hands-on user actually driving the applications and reporting (who ultimately fingers IT as the source of all data). On the way the contractor finds a fair amount of the data he needs in Excel spreadsheets and Access databases on the users’ PCs (at this point a CFO in the group hails me as Nostradamus because that is where his data resides).
- IT gets some extracts together containing the remaining data, which seems to meet the needs the contractor described as far as they can tell. Then IT hits the Staples Easy Button: got to get back to keeping the lights on and the mainline applications running!
- Contractor puts the extracts in the analytics product, does some back testing with whatever data he has, makes some neat graphics and charts, and declares victory.
- Senior management is thrilled. The application is quite cool and predicts last month spot on. Next month even looks close to the current Excel spreadsheet forecast.
- During the ensuing quarter, the cool charts and graphs look stranger and stranger until the model flames out with bizarre error messages.
- The conclusion is drawn that the technology is obviously not ready for prime time and that the lazy CIO should have warned us. It’s his problem and he should fix it. Isn’t that why we keep him around?
At this point there are a number of shaking heads and muffled chuckles; we have seen this passion play before. The problem is not any product’s fault or really any individual’s fault (it is that evil nobody again, the bane of my life). The problem lies in the project approach.
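For what it is worth, the “predicts last month spot on” trap is easy to demonstrate. The sketch below is only an illustration under my own assumptions: the revenue figures are made up, and the moving-average forecast is a deliberately naive stand-in I picked for brevity, not anything a particular product does. The point is simply that each month is scored using only the data that existed before it, which is a more honest test than admiring how well a model reproduces history it has already seen.

```python
from statistics import mean


def walk_forward_errors(series, window=3):
    """Forecast each point from the mean of the previous `window` points only.

    Returns the absolute percentage error for every forecasted point.
    Assumes no actual value is zero (made-up revenue figures never are).
    """
    errors = []
    for i in range(window, len(series)):
        forecast = mean(series[i - window:i])  # only data before point i
        actual = series[i]
        errors.append(abs(forecast - actual) / abs(actual))
    return errors


# Hypothetical monthly revenue, for illustration only.
monthly_revenue = [100, 102, 101, 105, 107, 110, 108, 112]
errors = walk_forward_errors(monthly_revenue)
print(f"months scored: {len(errors)}")
print(f"mean out-of-sample error: {mean(errors):.1%}")
```

If the out-of-sample error is much worse than the fit on last month, the neat charts deserve a second look before anybody declares victory.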
So what would a better approach be? The following straw man ensued from the discussion:
- First, in this case, skip the product selection. There are only two leading commercial products for predictive analytic modeling (SAS, SPSS). Flip a coin (if you have a three-headed coin, look at an open source solution, R or ESS). Maybe it’s already on your shelf; blow the dust off. Better yet, would a standard planning and budgeting package fit (Oracle/Hyperion)? The next step should give us that answer anyway, so there is no need to rush to buy. Vendors are always ready to sell you something (especially at month/quarter end: my, that big a discount!).
- Use the money saved for a strategic look at the questions that will be asked of the model: What are the key performance indicators for the industry? Are there any internal benchmarks, industry benchmarks or measures? Will any external data be needed to ensure optimal (correct?) answers to the projected questions?
- Now take this information and do some data analysis (much like dumpster diving). The goal is to find the correct data in a format that is properly governed and updated (no Excel or Access need apply). What matters is that every data input stays accurate over time. Remember our friend GIGO (I feel 20 years old all over again!). This should sound very much like a standard Data Quality and Governance Project (boring, but a necessary evil to prevent future embarrassment to the guilty).
- Once all of the data is dropped into a cozy data mart and the supporting extracts are targeted there, set up all production jobs to keep everything fresh.
- This is also a great time to give that contractor or consultant the questions and analysis done earlier, so they are at hand along with a companion sustainable data mart. Now iterations begin: computation, aggregation, correlation, derivation, deviation, visualization (Oh My!). The controlled environment holds everybody’s feet to the fire and provides excellent history to tune the model with.
- A reasonable model should result. Enjoy!
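The dumpster-diving step above can be made a little less painful with an automated gate on every model input. What follows is a minimal sketch, not a real governance tool: the `DataInput` record, the `check_input` function, the list of “ungoverned” file suffixes, and the 35-day freshness limit are all hypothetical choices of mine, and the sample inputs are invented.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical rule: desktop file types that count as ungoverned sources.
UNGOVERNED_SUFFIXES = (".xls", ".xlsx", ".mdb", ".accdb")


@dataclass
class DataInput:
    name: str
    source: str            # where the data lives (table name or file path)
    last_refreshed: date   # when the input was last updated
    rows: list = field(default_factory=list)  # list of dict records


def check_input(inp, required_fields, today, max_age_days=35):
    """Return a list of problems found with one model input (empty if clean)."""
    problems = []
    if inp.source.lower().endswith(UNGOVERNED_SUFFIXES):
        problems.append("ungoverned desktop source")
    if today - inp.last_refreshed > timedelta(days=max_age_days):
        problems.append(f"stale (last refreshed {inp.last_refreshed})")
    missing = sum(
        1 for row in inp.rows for f in required_fields if row.get(f) is None
    )
    if missing:
        problems.append(f"{missing} missing required value(s)")
    return problems


# Invented sample inputs, for illustration only.
today = date(2024, 3, 1)
inputs = [
    DataInput("gl_revenue", "warehouse.finance.gl_revenue", date(2024, 2, 29),
              [{"month": "2024-02", "revenue": 1200.0}]),
    DataInput("cfo_forecast", "C:/Users/cfo/forecast.xlsx", date(2023, 11, 15),
              [{"month": "2024-02", "revenue": None}]),
]
for inp in inputs:
    problems = check_input(inp, ["month", "revenue"], today)
    print(inp.name, "OK" if not problems else problems)
```

Run as part of the production jobs, a check like this flags the spreadsheet on somebody's PC before it quietly becomes the garbage in GIGO.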
No approach is perfect, and all have their risks, but this one has a better probability of success than most.