A lesson in traditional data history…
Once upon a time, we ran our businesses via mainframe computers. Essentially, these machines were a sort of massive server. Back then, traditional data meant files, lots and lots of files, because the databases we take for granted today, did not exist. These files, often referred to as data sets, had structure, were controlled by parameters, and had associated methods of access. Programmers were required to write specific, sophisticated code, using these access methods, to make applications work.
In other words, it was complicated.
During the 1970’s, scientists thought through a simpler, more organized approach called the relational model. This model could be accessed via a common method called structured query language or SQL. Soon, usage of the relational database management system (RDBMS) and SQL proliferated.
Though originally meant to support transaction processes, such as those found in an Enterprise Resource Planning (ERP), the RDBMS became a sort of Swiss Army Knife for all forms of data. In addition to transaction systems, the RDBMS eventually became the standard for analytics, with the advent of the data warehouse.
This wide-spread use proved to be both useful and challenging. Useful, because of familiarity; e.g. SQL. However, since the relational model was originally meant, and optimized for transaction systems. These systems stored and provided access to lots and lots of details, though typically in small doses, like to enter an order for a specific customer.
The following graphic depicts the very complicated, and often very expensive, process.
In the above data flow, relational data appears throughout. The challenge, especially in today’s world, is the fast and fluid requirements.
Analytics often requires summary information. Rather than a single customer order, one might want to see how much revenue a customer generates over a period of time. This analytic need requires numerous, perhaps thousands to millions, transactions to add up quickly because users do not like waiting. Unfortunately, users had to wait, often from minutes to hours, since the relational model was never intended to summarize transaction details so quickly.
At this stage, traditional data meant relational.
Enterprise BI tools were little help with speed, given most sourced data from relational. This analytic gap spurred a greater use of personal productivity tools like Microsoft Excel and Access. Users often rekeyed, copy and pasted, or imported volumes of transactions into these tools and created what they needed. PivotTables, VLOOKUP, and HLOOKUP became the go-to analytics for the business user.
Obtaining the supporting details was problematic unless the IT group could provide it regularly or one could find what was needed on existing reports. This less than systematic approach created an ungoverned, severely disjointed “solution” to analytics. Yet, users were happy, because they controlled, seemingly, their respective destinies.
The use of Excel for business intelligence continues. Why? Well, business users love and rely upon its ease of use. And yes, the ability to control. Unfortunately, using personal productivity tools for true enterprise analytics solves nothing. That said, observing business users and their chosen technologies provide valuable insight into what they need – ease of use and accessibility, from summary to detail.
Changing the status quo…
The first step in correction came with the “cube”. When E.F. Codd, the driving force behind the relational model, coined the term on-line analytical processing (OLAP) and wrote the “twelve laws of online analytical processing”, data marts transitioned from relational to multi-dimensional. In a nutshell, OLAP provides for a data structure that focused on data summarized by dimension; e.g. product, market, customer, time periods, etc. In the example below, 267 units of the Fruit Soda sold in California during January.
Definitions of dimensions vary and can be very technical. Simply put, dimensions provide a way to group and navigate data. For example, days to weeks to months to years. All are time. Some show details and some are summary. Another way to look at dimensions are by way of genealogy; i.e. family trees or relationships. Days are children to weeks, which are children to months, which are children to years, etc.
In many ways, cubes mimic what business users create with spreadsheets – summarized data from which to make decisions.
Building cubes can be quite simple. Today’s OLAP development tools gear to business users, rather than IT professionals. That said, populating cubes can be quite challenging. More times than not, the ERP systems are not structured in support of analytics. So, detailed data often transforms as it moves from ERP to OLAP. These transformations come via ETL technology; i.e. Extract Transform and Load. ETL requires a highly skilled, programmatic approach, meaning IT involvement. Further, the extraction or data movement happens periodically, providing the potential for a lack of synchronization.
What we know creates innovation…
In their day, mainframe computers were innovative. Typically, innovation comes from a need to improve. To improve what we know. We saw this as file systems became database management systems, and with the advent of cubes. For that matter, moving from paper-based to electronic spreadsheets was innovative.
A great challenge with cubes has been an inability to easily link back to detailed transactions. Often, business users require insight into what actually happened, in detail, as they review summary information. As mentioned, the linkage from relational detail to the summary cube is often, well, disconnected. We extract, transform, and load the data from detailed relational to cube and that lack of connection breeds challenges.
The next wave of innovation promises to solve this lack of connection. It promises to enable both details and summary. That innovation has begun.
The graphic below shows a more simplified, more progressive approach. The ETL processes have been eliminated. Instead of extracting subsets of data, all details come into an analytics platform, promoting ultimate visibility, from summary to transaction.
A new breed of data technology has arrived. The challenge will be to forget what we know and that can be difficult. We are comfortable with what we know.
Free eBook Download
In the eBook “Having a Conversation with Data”, learn what the current BI infrastructure has been and associated challenges with the traditional approach. How important the user experience is in order to best maximize data’s value to your organization. And how to gain a competitive advantage with modern analytics platforms. Click here to download.