Analytics Series: Part 1

A View Beyond the EDRM


I recently came across a blog article by Josh Headley titled “Bacon, Eggs and a Tall Glass of Predictive Analytics.” In the article, Headley uses the logistical complexity of creating a “perfect” glass of orange juice to illustrate how commonplace predictive analytics has become. He notes that producers of morning favorites such as Minute Maid have turned to big data analytics, drawing on the massive volume of buying and drinking preference data generated by individual consumers. Minute Maid is not alone in taking advantage of analytics. In a recent New York Times article, Charles Duhigg described how Target’s marketing mailers, informed by big data analytics applied to individual buyers, began sending pregnancy-related coupons to the home of a teenager before she had even told her parents. The story has raised ethical questions and brought analytics to the front page.

In the insular world of eDiscovery, applying analytics to make predictions (i.e., predictive coding) may seem cutting edge, unproven and risky. And yet, whether you look at banking, pharmaceuticals, retail, food services or healthcare, analytics are not only present but integral to key functions within each vertical. Somewhere lost in the discussion of cutting-edge algorithms, lambda calculus and the merits of rule-based vs. iterative reinforcement-based analytics is one important fact: none of this is novel. These techniques have been used all around us, every day, for years in medical diagnostics, customer relationship management (CRM) tools, insurance, telecommunications and even your personal credit score.

Some History on Predictive Coding & Analytics Beyond the EDRM


The seminal text on predictive analytics (then called Exploratory Data Analysis) was published over three decades ago, and although it has found many applications since, the core tenets have remained the same. Instead of following data collection with an artificially imposed model based on case assumptions or best guesses and then trudging through the output manually, page by page, the process begins by analyzing a sample of the data with the goal of inferring which model or algorithm is most appropriate. That model is then refined with new parameters or rules as more of the data is analyzed, until it reaches maximum accuracy and can run across the full body of data with minimal or no supervision.

In “Competing on Analytics,” Thomas Davenport defines analytics simply as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.” In layman’s terms, it can be defined as “the analysis of data to draw hidden insights to aid decision-making.”

Once the concepts of predictive analytics were applied to computer-based analysis, the ambiguous creature known as “machine learning” was born. Although it sounds more like science fiction than science, machine learning is no enigma. The best-known explanation of what machine learning actually is comes from Tom Mitchell’s text “Machine Learning,” published in 1997: “A computer program is said to learn from experience (E) with respect to some class of tasks (T) and performance measure (P), if its performance at tasks in (T), as measured by (P), improves with experience (E).” In layman’s terms, machine learning techniques emulate human cognition and learn from training examples to better perform future tasks. Data analytics became a key tool in predicting the actions of consumers, patients and investors.
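To make the E, T and P of that definition concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical and invented for illustration: the “relevance score” from 0 to 99, the hidden cutoff of 62, the training scores and the midpoint rule the learner uses. The task (T) is deciding whether a score counts as relevant, the experience (E) is a growing set of labeled examples, and the performance measure (P) is accuracy on a fixed test set.

```python
# Toy illustration only: the cutoff, scores and learning rule are all assumptions.
TRUE_CUTOFF = 62  # hidden rule the learner has to discover


def true_label(score):
    """Ground truth for task T: a score at or above the cutoff is relevant."""
    return score >= TRUE_CUTOFF


def learn_cutoff(examples):
    """Guess the cutoff as the midpoint between the highest-scoring
    non-relevant example and the lowest-scoring relevant example."""
    negatives = [score for score, label in examples if not label]
    positives = [score for score, label in examples if label]
    return (max(negatives) + min(positives)) / 2


def accuracy(cutoff, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((score >= cutoff) == true_label(score) for score in test_points)
    return correct / len(test_points)


# Experience E: labeled examples, revealed to the learner a few at a time.
training_scores = [10, 90, 40, 70, 55, 65, 60, 63]
examples = [(score, true_label(score)) for score in training_scores]
test_points = list(range(100))

results = {}
for n in (2, 4, 6, 8):
    cutoff = learn_cutoff(examples[:n])
    results[n] = accuracy(cutoff, test_points)
    print(f"{n} examples -> learned cutoff {cutoff}, accuracy {results[n]:.2f}")
```

With this particular data the accuracy climbs from 0.88 with two examples to 1.00 with eight, which is exactly the “improves with experience” part of the definition. Real predictive coding tools work with far richer features and models, but the loop of train, measure and add more examples has the same shape.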


Why Use Analytics?

Whether you are looking at retail giants like Best Buy and Walmart, internet-dominant companies like Amazon, major financial institutions and hedge funds, or healthcare, analytics play a major role. And this is not a new trend. Look back at the example of a cup of OJ. It seems like a pretty simple task that a few taste tests could solve, right? As Headley noted, not exactly:

“Each orange has a profile containing 600 or more individually-identifiable flavors. Multiple varieties of oranges must be mixed in order to create a consistent flavor profile that is both favorable to consumers and which synchs with the 3-month orange growing season. If the perfect OJ blend contains 10 varieties of oranges, this is about 6,046,617,600 (with 18 more zeros) possible combinations of flavors. Even with only 3 different orange varieties in the mix, you’re looking at 216,000,000 combinations.”
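The figures in that quote can be checked with a few lines of Python. This is a sketch under one assumption: that the numbers come from a simple counting model in which each variety in the blend independently contributes one of 600 flavors, giving 600 raised to the number of varieties. The quote does not spell out the method, so the function name and the model itself are illustrative.

```python
FLAVORS_PER_ORANGE = 600  # "600 or more" flavors per orange, per the quote


def blend_combinations(varieties):
    """Count flavor combinations, assuming each variety independently
    contributes one of FLAVORS_PER_ORANGE flavors (an assumed model)."""
    return FLAVORS_PER_ORANGE ** varieties


three_varieties = blend_combinations(3)
ten_varieties = blend_combinations(10)

print(f"{three_varieties:,}")  # 216,000,000
print(f"{ten_varieties:,}")

# The quoted figure: 6,046,617,600 followed by 18 more zeros.
quoted_ten = 6_046_617_600 * 10**18
print(ten_varieties == quoted_ten)  # True
```

Under that assumption both quoted numbers check out: 600 cubed is 216,000,000, and 600 to the tenth power is 6,046,617,600 followed by 18 zeros.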

Obviously, more than a few panels of tasters are necessary to narrow this down to the perfect cup. And the story is similar across industry verticals. According to IBM, 90% of the data in the world has been produced in the last two years. The human mind is not capable of dealing with such large amounts of data, and this is especially true in business. Large businesses hold thousands of gigabytes of data containing billions of pieces of information, and they are adding to this corpus by the second. We need specialized tools and techniques to make sense of so much information. As such, analytics has become a crucial part of managing any business today.

Read More in Part 2

Cat Casey
