In the beginning, there was nothing. Then, God created data. From that data, God created the Heavens and the Earth, or so it is told.
In reality, everything is data, and data is in everything. “Data” is defined as “a given thing, or a fact,” derived from the Latin word datum. When these facts are compiled and organized in orderly rows and columns, they are referred to as data files, and when these data files are matched with each other and stuffed together to be utilized, they are called databases. The purpose of collecting data and organizing them into logical structures is to preserve their historical values in order to understand them at a later date and perhaps to use them in a meaningful way to leverage the value of their information.
This is true of any data that has been captured from the beginning of time until just now. Books, pages of words, and the letters that form the words are both information and raw data, depending on the need. Predictive algorithms, formulae, coefficients, constants, and numeric values follow the same logical paths, although these haughty words are more often used to baffle and bullshit.
They are utilitarian in form but can be incredibly useful as objects of analysis. The fundamental questions of mankind, “who,” “what,” “where,” “when,” “why,” and “how,” require various forms of data to address any and all possible outcomes.
As we collectively prepare to dive deeper into this new world of AI, we must understand the utmost importance of “the data” in the processes. Whether the analysis is simple or incredibly complex, the significance of data, analysts, and computational methods must be understood and balanced. Of these three elements, to be sure, the greatest of these is data.
When we turn our attention to data in business processes, it is essential to understand that various “business operations” create these data points to describe the “decisions” and subsequent “performance of the business.” Our business operations create substantial amounts of meaningful and meaningless data points in an unending cycle, and it is often difficult to tell the difference between the two. Analytical capabilities, from the simplest to the most complex, are the tools used to help us compare and contrast the value of the data.
Before rushing into analytics, dear reader, you must understand that there are four immutable tenets of data that must be strictly adhered to; it is perilous to ignore any of them.
1. Your data needs to serve both operational and analytical pursuits.
Business operations and analytics must coexist with the same data and database structures but never on the same platforms. The reason is simple: analytics processes use too many computational resources to coexist alongside essential operational functions.
Protect your operational functions and analytical functions by hosting them on different equipment. This means the servers hosting and using the data should be mutually exclusive.
However, the content of the operational and analytical databases should be the same. They should be mirrored, record-for-record, updated, and synced as often as is practicable to capture the most recent activities and results of the business operations.
The analytics databases will be enhanced with external and internal data sources to perform analytical functions. As a result, they will grow many times the size of the operational datasets. Still, real-time mirroring needs to be maintained to ensure overall structural integrity.
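As a rough illustration of what "mirrored, record-for-record" can mean in practice, the sketch below compares a row count and a checksum for one table held in two databases. It assumes SQLite purely for convenience; the `loans` table, its columns, and the function names are hypothetical, and the check would apply only to the tables the two environments share, not to the extra sources added on the analytics side.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 digest) for a table, reading rows in key order."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    digest = hashlib.sha256()
    count = 0
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def is_mirrored(operational, analytical, table, key_column):
    """True when both copies hold the same records, record for record."""
    return (table_fingerprint(operational, table, key_column)
            == table_fingerprint(analytical, table, key_column))


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical 'loans' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO loans VALUES (?, ?)", rows)
    return conn


operational = make_db([(1, 2500.0), (2, 910.5)])
analytical = make_db([(1, 2500.0), (2, 910.5)])
stale = make_db([(1, 2500.0)])  # a copy that missed the latest sync

print(is_mirrored(operational, analytical, "loans", "loan_id"))  # True
print(is_mirrored(operational, stale, "loans", "loan_id"))       # False
```

A check like this could run after each sync; how often that happens is the "as often as is practicable" judgment call described above.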
2. Your data must be well documented in both operational and analytical states.
Each data element stored in a data file or database should be documented in a “data dictionary.”
A data dictionary sounds as dystopian as Orwellian newspeak; it is probably twice as dry in reality. This document outlines the definition, use, format, and properties of every variable in the database. It tells you how it is used in the operation, where it is created, what it should look like, and, most importantly, how it can be matched to other data points in the database (and how not to match it).
It is the most basic of all motherhood documents and processes. Sadly, it is rarely intact or even in existence. The effort involved in creating and updating the data dictionary is perceived to be great when, in reality, it is not.
Whether through a lack of resources, a lack of attention to detail, or a rate of change in business operations that has outpaced the internal knowledge of how data is created and interpreted, the documentation falls behind. There is usually a single person or small group who knows the data rules and how codes are interpreted. The organization runs a significant risk when these people leave and hand off processes to someone else.
Hire a temporary resource to build your company data dictionary and share the document broadly with the operations and analytical staff—money well spent.
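To make the idea less abstract, here is a minimal sketch of a data dictionary kept in a form that software can also read. Every name in it (the `DictionaryEntry` fields, the `account_status` codes, the join targets) is invented for illustration; a real dictionary would use your own variables and rules.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One row of a data dictionary. All field names here are illustrative."""
    name: str
    definition: str
    data_type: type
    source_system: str                                # where the value is created
    allowed_values: tuple = ()                        # empty means no fixed code list
    joins_to: list = field(default_factory=list)      # safe match keys
    do_not_join: list = field(default_factory=list)   # known bad matches


DATA_DICTIONARY = {
    "account_id": DictionaryEntry(
        name="account_id",
        definition="Unique identifier assigned when the account is opened",
        data_type=int,
        source_system="originations",
        joins_to=["payments.account_id"],
        do_not_join=["marketing.lead_id"],
    ),
    "account_status": DictionaryEntry(
        name="account_status",
        definition="Current servicing status of the account",
        data_type=str,
        source_system="servicing",
        allowed_values=("A", "C", "D"),  # hypothetical codes
    ),
}


def check_record(record, dictionary):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for column, value in record.items():
        entry = dictionary.get(column)
        if entry is None:
            problems.append(f"{column}: not documented")
        elif not isinstance(value, entry.data_type):
            problems.append(f"{column}: expected {entry.data_type.__name__}")
        elif entry.allowed_values and value not in entry.allowed_values:
            problems.append(f"{column}: unknown code {value!r}")
    return problems


print(check_record({"account_id": 1001, "account_status": "A"}, DATA_DICTIONARY))
print(check_record({"account_status": "X", "branch": 7}, DATA_DICTIONARY))
```

The first call prints an empty list; the second reports an unknown status code and an undocumented column. Keeping the dictionary in a structured form like this is one option among many; a well-maintained spreadsheet serves the same purpose for human readers.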
3. Your data must reside in environments that are both safe and secure.
Data security breaches do not just happen to other organizations anymore. According to LinkedIn, 60% of businesses have experienced at least one type of data security breach in the last two years, and 31% of those have experienced more than one. Not only is your Personally Identifiable Information (PII) at risk, but hackers and opportunists also covet your results and business performance.
Every database and directory should be securely encrypted “at rest,” that is, whenever it is saved on a disk drive, in a database, or otherwise committed to persistent storage. Protected data can wear different masks. Hashing replaces a value with a one-way fingerprint that cannot be read as text when a data set is opened and cannot be reversed, which suits values that only need to be compared or matched. Encryption, the more common approach, scrambles the data with a long secret key. Only computers with the correct secret key can unscramble the data into a usable format.
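The hashing "mask" can be sketched with Python's standard library alone. This example uses a keyed hash (HMAC with SHA-256), which is one common choice and an assumption on my part rather than a prescribed method; the record, the `mask_value` helper, and the inline key are all made up for illustration. Note that a hash is one-way: it hides the value but cannot be unscrambled. Reversible encryption at rest is normally handled by the database, the disk, or a vetted cryptography library, and is not shown here.

```python
import hashlib
import hmac
import secrets

# A long random secret key. In practice it would live in a key vault,
# not in source code; generating it inline is for illustration only.
SECRET_KEY = secrets.token_bytes(32)


def mask_value(value: str, key: bytes) -> str:
    """Replace a sensitive value with a keyed SHA-256 fingerprint (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


record = {"customer_name": "Pat Example", "ssn": "000-00-0000"}  # made-up data
masked = {**record, "ssn": mask_value(record["ssn"], SECRET_KEY)}

# The stored value is 64 hex characters and no longer readable as text.
print(len(masked["ssn"]))

# The same input and key always give the same fingerprint, so masked
# columns can still be matched to each other.
print(mask_value("000-00-0000", SECRET_KEY) == masked["ssn"])

# Without the right key, the fingerprint cannot be reproduced.
print(mask_value("000-00-0000", secrets.token_bytes(32)) == masked["ssn"])
```

The design trade-off is that masked values can still be joined across files while the original text stays hidden, which is useful for analytics copies that do not need the raw PII.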
Consistently and uniformly archive your databases. Data will change at incredibly high rates within your organization. Often, you cannot fully control when or how it gets updated. Even system administrators with the highest levels of security and integrity can mistakenly delete anything and everything. Proper backups and versioning can prevent calamitous incidents and even help restore prior states.
Keep archives in more than one location as well. Belts and suspenders.
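A belts-and-suspenders archive can be as simple as the sketch below: copy the file to more than one place under a versioned name, then confirm each copy matches the source. The folder names, the `archive` helper, and the version label are hypothetical, and a temporary directory stands in for what would really be separate sites or storage providers.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive(source: Path, destinations: list, version: str) -> list:
    """Copy source into every destination under a versioned name and verify it."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{source.stem}.{version}{source.suffix}"
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise IOError(f"backup at {target} does not match the source")
        copies.append(target)
    return copies


# Demonstration in a temporary directory; real destinations would be
# separate sites or storage providers.
workdir = Path(tempfile.mkdtemp())
database_file = workdir / "operations.db"
database_file.write_bytes(b"pretend this is a database export")

copies = archive(
    database_file,
    [workdir / "site_a", workdir / "site_b"],
    version="v1",
)
print([copy.name for copy in copies])
```

Because each run writes a new versioned file instead of overwriting the last one, an accidental deletion or bad update can be rolled back to an earlier copy.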
4. You must expect your data to flow like water in pipes through your enterprise.
This is the new expectation to set. As AI abruptly emerges, it will become the most important tenet over the next twenty years. The AI world will not demonstrate patience for slow-moving solutions or excuses that data sources are not identifiable or trapped.
Shallow data access and lack of data integrity will clog these essential pipes and bring all your business opportunities to a grinding halt. Those organizations that see the future as a completely connected ecosystem of enterprise data sources can move their intents and operations to new products, methods, and mindsets.
Many challenges in current data environments result from moving too quickly and using too little strategy. This problem, often called “Technical Debt,” rears its ugly head in many different places and ways (to be discussed in detail in an upcoming whitepaper). The challenge is to build solutions today that will not preclude you from opportunities tomorrow. Don’t choose quick and easy “fixes” today that will clog your pipes tomorrow.
Think about connecting the pipes of today with the sources of tomorrow.
So, here is some homework for you. Take some time to ask your organization's technology leaders these six questions. Consider their approach to the answers as well as the answers themselves.
- Do we have separate databases and servers that handle data tasks for our operational processes, analytics, and MIS?
- If so, are they mirrored to each other and routinely synced? How often?
- Do we collect data on every business operational platform? This includes originations, servicing, accounting, customer contact centers, marketing, etc.
- Do we have data dictionaries for every source? Who is responsible for keeping these documents updated?
- Where is our data stored? On-premises or in a cloud environment? Is it routinely backed up, and where?
- Do we hold our data encrypted when we store it? Is it safe and secure from potential external and internal threats?
As we progress down our path of discovery, we will next discuss “The Business Imperatives of AI (and any other type of analytics).”