Turning data into assets
We have entered the Information Age. In "Data Driven", Thomas C. Redman helps us understand the best data management practices and the major data quality problems that get in the way.
Redman sees three overlapping phases to the Information Age: the IT Buildout phase, the Data Quality phase and the Data Deployment phase. Many organisations are now nearing the end of the first phase, marked by the ubiquitous presence of advanced hardware and software. Next, a small but growing number of companies are entering a phase in which they devote attention to the quality of their data. Third, a very small number of companies have achieved control of data quality and are able to gain full usufruct of their data. The three phases of the Information Age then correspond to, one, accumulating data; two, retaining only data of usable quality; and, three, putting the quality data to best use.

The most common data problems
As companies enter the Data Quality phase they confront six fundamental problems. The first is unfindable data. Redman estimates that knowledge workers spend 30% of their time searching for data that they need but cannot find. These fruitless data searches waste time, sap morale and reduce productivity.
The second problem is incorrect data. Redman estimates that 10-25% of data is inaccurate, and even a much lower error rate can be misleading. Here’s the explanation. Start with the rule of ten: it costs ten times as much to complete a unit of work when the input data are defective as when they are correct ($10 to process an inaccurate order versus $1 for an accurate order). Then assume a 99% accuracy rate. Sounds fine? Not so fast. Assume the data concern orders that contain 12 data items each; for 10,000 orders that makes 120,000 data items. At a 1% error rate there will be 1,200 incorrect data items and so conceivably 1,200 improper orders (admittedly, a maximum, reached only if every error falls in a different order). Applying the rule of ten, the new order processing cost would be (8,800 x $1) + (1,200 x $10) = $20,800, versus $10,000 if error-free. The bad news is clear: a mere 1% error rate can more than double processing costs, an increase of 108% in this worst case.
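The order-cost arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name and parameters are hypothetical, and it assumes the worst case described in the text, in which every defective data item lands in a different order.

```python
def worst_case_processing_cost(orders, items_per_order, item_error_rate,
                               clean_cost=1.0, penalty_factor=10.0):
    """Worst-case cost of processing a batch of orders under the rule of ten.

    Assumption (from the text's worst case): each defective data item
    spoils a different order, so defective orders = defective items,
    capped at the total number of orders.
    """
    defective_items = round(orders * items_per_order * item_error_rate)
    defective_orders = min(orders, defective_items)
    clean_orders = orders - defective_orders
    return (clean_orders * clean_cost
            + defective_orders * clean_cost * penalty_factor)


# Figures taken from the example in the text.
error_free_cost = worst_case_processing_cost(10_000, 12, 0.0)
cost_at_one_percent = worst_case_processing_cost(10_000, 12, 0.01)
increase_pct = 100 * (cost_at_one_percent - error_free_cost) / error_free_cost

print(f"Error-free cost:   ${error_free_cost:,.0f}")
print(f"Cost at 1% errors: ${cost_at_one_percent:,.0f}")
print(f"Increase:          {increase_pct:.0f}%")
```

Changing `item_error_rate` or `penalty_factor` shows how sensitive the total is to each assumption; in practice several errors may fall in the same order, so the real figure would sit somewhere below this ceiling.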
Problem Number Three is poor data definition, like an insurance company's labelling a datum poetically as “scion” where the workaday term “heir” would be far easier to locate and retrieve. Due to poor data definition, data banks can turn out to be as useless, or even as counter-productive, as the Tower of Babel.
Four: data privacy and security. The importance of data privacy, and the consequent requirement to assure it, is illustrated by the need for companies to store age data for retirement planning decisions yet also observe the potentially conflicting need not to use that age data for promotion decisions. Data security - as distinct from data privacy - is an even more notorious problem, as the publicity surrounding identity theft in the US attests.

Five is excess data. Redman cites an Accenture survey in which 40% of IT managers complained of information overload. Storing data is cheap and becoming cheaper. But excess information makes it hard for managers to discriminate between more important information and less important information. Excess information is to information management what urgent-but-not-important tasks are to time management: it leads to deficient focus.
Six is related to Five: organisational confusion. Most companies are unable to answer the questions: What data do we have? Where are they? Which are the most important? What are they worth?
Best data habits
So what are the habits of the companies with the best data management? How do the best companies attempt to surmount the six fundamental data problems?
The first habit is to target the most important data needs of the most important customers. Organisations can’t meet all their customers’ needs because only the customer knows all of them. But, if the customer is known to be mainly interested in product features, then the supplier's focus should be on data about product design. If the customer is known to be very price-conscious, then the supplier should focus on data about costs through the value chain.
The second habit is to manage all critical sources of data, notably those of suppliers. This is demonstrably the case in just-in-time manufacturing, where manufacturers need to work closely with their parts suppliers. Redman suggests that companies need to implement a similar relationship with their data suppliers. Cutting-edge organisations inform suppliers of their data needs and provide feedback on the quality of data provided; this applies as much to internal departments acting as data creators as it does to external suppliers. Redman does make the point that a data quality program is often easier to implement externally than internally. Internal politics often make data quality reform difficult to implement, whereas relationships with external suppliers are less fraught with personality clashes, so data quality may be easier to achieve starting with them.
Another practice to keep in mind when thinking about data sourcing is effective data entry. The best companies devote attention to this somewhat neglected process, building in as many safeguards as possible, and rewarding efforts of employees who are helpful in this cause. To revert to Redman’s favourite metaphor, the best companies are making sure the data streams are clean in order not to end up having to clean up a dirty data lake.
A third habit is to measure quality at the source and in an actionable way. Unfortunately data, unlike manufactured products, do not have easily measured physical specifications. In Redman’s experience, the best companies cope by employing the techniques of quality control sampling. They take small samples of data records (say a sample of customer order records), and they sample continuously. They then compute the accuracy of the samples and plot this accuracy over time (a time series plot) to monitor the evolution of accuracy (hopefully its improvement). These plots are the data quality equivalents of the Shewhart charts used for manufacturing tolerances.
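As a rough illustration of this kind of monitoring, the sketch below computes the accuracy of each sample and flags samples that fall outside three-sigma control limits, in the manner of a standard proportion chart (p-chart). The choice of a p-chart, the function names and the weekly figures are all assumptions made for the example; the source does not specify which chart or formula Redman's companies use.

```python
import math


def sample_accuracy(checked_records):
    """Fraction of records in one sample that were verified as correct.

    `checked_records` is a list of booleans (True = record is correct).
    """
    return sum(checked_records) / len(checked_records)


def control_limits(accuracies, sample_size):
    """Three-sigma limits for a proportion chart with equal sample sizes."""
    centre = sum(accuracies) / len(accuracies)
    sigma = math.sqrt(centre * (1 - centre) / sample_size)
    lower = max(0.0, centre - 3 * sigma)
    upper = min(1.0, centre + 3 * sigma)
    return lower, centre, upper


def out_of_control(accuracies, sample_size):
    """Indices of samples whose accuracy falls outside the control limits."""
    lower, _, upper = control_limits(accuracies, sample_size)
    return [i for i, acc in enumerate(accuracies) if acc < lower or acc > upper]


# Hypothetical data: accuracy of five weekly samples of 50 order records each.
weekly_accuracy = [0.90, 0.92, 0.88, 0.94, 0.70]
flagged = out_of_control(weekly_accuracy, sample_size=50)
print("Weeks needing investigation:", flagged)
```

A flagged sample is a prompt to look for a cause at the source (a changed form, a new data supplier, a system migration), not a verdict in itself. In a real program the limits would usually be set from a stable baseline period and not from the same points being judged.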
A fourth habit is to set and achieve progressive targets for improvement. Redman notes that it can be counter-productive to set targets like “achieve a quality level of 99%”. If your initial error rate is in the 30-50% bracket, such an overly ambitious goal can be discouraging. So try a target of the form “reduce the error rate by x% every n months”. Just as in sampling, where you take small samples but do so continuously, you set achievable targets which you improve continuously. One of the best-practice companies cited by Redman, Tele-Tech, sets out to halve the error rate every year (so, 10% to 5% to 2.5%, etc.).
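To see what a target of the form “reduce the error rate by x% every n months” implies over time, the following sketch lists the successive targets and counts the periods needed to reach a given level. The function names are hypothetical, and the 40% starting point in the second call is just an illustrative value from the 30-50% bracket mentioned above.

```python
def error_rate_targets(initial_rate, reduction, periods):
    """Successive error-rate targets when the rate is cut by `reduction`
    (a fraction between 0 and 1) in every period."""
    targets = [initial_rate]
    for _ in range(periods):
        targets.append(targets[-1] * (1 - reduction))
    return targets


def periods_to_reach(initial_rate, reduction, goal_rate):
    """Number of periods until the target error rate is at or below `goal_rate`."""
    if not 0 < reduction <= 1:
        raise ValueError("reduction must be in the range (0, 1]")
    rate, periods = initial_rate, 0
    while rate > goal_rate:
        rate *= 1 - reduction
        periods += 1
    return periods


# Halving every year from 10%, as in the example in the text.
yearly_targets = error_rate_targets(0.10, 0.5, 3)
print([f"{rate:.2%}" for rate in yearly_targets])

# Starting from a 40% error rate, how many halvings to get to 1% or better?
print(periods_to_reach(0.40, 0.5, 0.01))
```

Seen this way, a 99% quality level is still reachable from a poor starting point, but as the end of a sequence of modest steps instead of a single leap.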
A fifth habit involves clear accountability for data. Organisations need to think more about who specifically manages their data. Since data are stored in computers, most organisations have entrusted the IT department with the data management responsibilities. Redman sees this as a trap. A handful of companies have created the post of Chief Data Officer (CDO); Redman cites JP Morgan as one of the few companies to have gone this route. Senior management involvement is crucial to the success of data quality programs and so Redman suggests the creation of a data council to assist that Chief Data Officer. This council of senior managers provides support, legitimacy and occasional cover for the CDO with the objective of building an organisation that, over time, manages data and information as professionally as it manages other assets.
The companies that have acquired these habits will be able to respond positively to the following strategic questions:
-Do leaders have the data they need to set strategy?
-Are the organisation’s data adequate to execute the strategy?
-Do the organisation's strategies fully utilise its data?