Data Quality: The Truth Isn’t Out There

This is not an uplifting article. It makes a dismal claim about a dull topic: namely, that the information on which banks base their decisions—information about exposures, risks and customers—suffers from serious shortcomings. The humdrum matter of poor data explains many of the problems banks experience in achieving their goals, including avoiding insolvency.

Crisis commentary has focused on fundamentals of morality and common sense that were allegedly discarded during the pre-crisis boom. Bankers were consumed by greed; investors were fooled by leverage; regulators were blinded by complexity. You know the story. However, one fundamental has been ignored. Banks can make good decisions only if they have collated data and translated it into useful information. Good managerial decisions require a foundation of accurate information. It is well documented that, pre-crisis, big decisions were made on little information. But it is less widely understood that this was often because the information was unavailable.

Risk management is built on three basic capabilities, each of which is a necessary precursor to the next:

1. Information: Knowing current positions or “exposures”

2. Measurement and forecasting: Understanding how and why the bank’s exposures and earnings might change

3. Management: Processes to bring exposures in line with an agreed risk appetite or tolerance

Post-crisis there has been much discussion of risk measurement and management but remarkably little of the data on which they depend. This is perverse. If, as we claim, the data on which risk measurement and management are based is seriously flawed, then material progress in these two areas is next to impossible.

Indeed, poor data can undermine not only risk management but most of the actions a bank takes. Be it a group-level strategy decision or something as specific as setting prices for consumer loans, the story is the same; the quality of any decision depends on both the skill of individuals involved and the information they rely on. If data quality is poor, the information will be poor and only luck can stop the decisions from being poor.

What is good data quality? Its hallmarks include accuracy at the point of entry, completeness of fields, congruency as it flows through the institution, consistency of interpretation and a stable approach to storage over time. Most importantly, high-quality information is well-structured to support business uses. For example, data processing systems often have no place for information that is relevant to the economics of transactions, such as the term of a collateral pledge. Even if the data is collected, it is not stored with the other relevant information and gets “lost in the system,” leading to errors in the bank’s understanding of its exposures.
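The completeness hallmark can be made concrete with a small check. The following is a minimal sketch in Python, not a description of any bank's actual system: the record layout, the field names (`collateral_pledge_end` and so on) and the sample loans are all hypothetical, chosen only to illustrate the collateral-pledge example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LoanRecord:
    # Hypothetical fields; a real loan system will differ.
    loan_id: str
    maturity: date
    collateral_id: Optional[str] = None
    collateral_pledge_end: Optional[date] = None  # term of the pledge


def completeness_issues(loan: LoanRecord) -> List[str]:
    """Return the data quality issues found in one loan record."""
    issues = []
    if loan.collateral_id is not None:
        if loan.collateral_pledge_end is None:
            # The loan looks secured, but nobody recorded for how long.
            issues.append("collateral pledge term missing")
        elif loan.collateral_pledge_end < loan.maturity:
            # The pledge lapses while the loan is still outstanding.
            issues.append("collateral pledge expires before loan maturity")
    return issues


# Illustrative sample data (invented for this sketch).
loans = [
    LoanRecord("L1", date(2030, 1, 1), "C1", date(2031, 1, 1)),
    LoanRecord("L2", date(2030, 1, 1), "C2", None),
    LoanRecord("L3", date(2030, 1, 1), "C3", date(2028, 6, 30)),
    LoanRecord("L4", date(2030, 1, 1)),  # unsecured, nothing to check
]
report = {loan.loan_id: completeness_issues(loan) for loan in loans}
```

A check like this only works if the data model has a field for the pledge term in the first place, which is the structural point made above.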

So, how bad is the information on which banks base their decisions? We cannot prove definitively that it is not fit for purpose, but we are prepared to state that there are serious problems at most institutions.

Prior to the crisis, institutions unwittingly generated concentrations, partly because different internal businesses ran separate systems that failed to identify where risk drivers were shared (e.g. exposures to the same counterparty, different risks to the same counterparty and so on). When executives ask how much exposure they have to an asset class, they normally receive a range of very different answers and no single conclusion is ever quite reached. Data paucity at some institutions is as simple as the inability to differentiate between OECD government debt (system identifier: “AAA bond”) and structured products (system identifier: “AAA bond”).
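The way separate systems can hide a concentration is easy to show in miniature. The sketch below is illustrative only: the two "systems", the counterparty names, the amounts and the limit are invented, and it assumes that both systems identify a counterparty by exactly the same name, which is often not true in practice and is itself a data quality problem.

```python
from collections import defaultdict

# Hypothetical extracts from two business-line systems: (counterparty, exposure).
lending_system = [("ACME Corp", 40.0), ("Globex", 25.0)]
derivatives_system = [("ACME Corp", 35.0), ("Initech", 10.0)]


def aggregate_exposures(*systems):
    """Sum exposures per counterparty across any number of source systems."""
    totals = defaultdict(float)
    for records in systems:
        for counterparty, amount in records:
            totals[counterparty] += amount
    return dict(totals)


def breaches(totals, limit):
    """Return the counterparties whose total exposure exceeds the limit."""
    return sorted(name for name, total in totals.items() if total > limit)


totals = aggregate_exposures(lending_system, derivatives_system)
over_limit = breaches(totals, limit=50.0)
```

Here no single system shows an exposure above the assumed limit of 50, yet the combined exposure to one counterparty is 75. Neither business line would see the concentration on its own.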

Compounding such structural data problems is the plethora of data errors or omissions. In our work with clients and their databases, we often encounter individual missing fields with direct financial implications exceeding $10,000. Consider a simple but common example: the omission of information about the term of a collateral pledge. This exposes banks to the risk that an apparently collateralized loan is in reality unsecured. In the event of a default, such an omission could cost the bank millions. This is but one of countless apparently trivial but cumulatively expensive and dramatic shortcomings in data quality.

Many executives were surprised by their exposures as the crisis unfolded. Yet the underlying problem is still to be solved. Most banks still devote considerable effort to manual “workarounds” or “clean ups” during their regular reporting cycles. Rational executives at most banks remain skeptical of the data presented to them. Strategic thinkers will ask themselves about the opportunity cost of lost customer insights or economic understanding. Little imagination is needed to see that radically different customer service propositions and operating models would be possible if data were of higher quality.

Given the centrality of information to banking, it may seem surprising that data quality at most banks remains poor. However, a brief examination of stakeholders’ incentives makes it less surprising:

Executives: Fixing data quality is difficult, expensive and time-consuming. Given the time horizon of most senior executives, other uses of capital and staff attention are likely to be more rewarding. Nor are data issues likely to be those that naturally concern such men and women. It is not a topic that one immediately associates with “Titans of Finance.”

Regulators: Given that banks struggle to assess data quality internally, third parties may find it impossible. Moreover, the existence of poor data undermines the regulatory measures that underlie prudential supervision. If information quality is an issue, regulators themselves have some tough questions to answer.

CIOs cannot be directly accountable for data quality. Technology functions take the data provided, collate it, transform it according to models provided by other functions (such as Risk, Finance and the business lines) and then funnel the outputs to their users. It is a “plumbing” job that cannot in itself improve the quality of the base data provided or the design of the models that transform the data into information. The CIO can (fairly) protest that he and his staff were “only following orders.”

Risk Managers have an incentive to ensure that the data is accurate as this will improve the quality of their decisions. However, two facts about modern risk management blur this incentive. The first is that the performance of risk managers is unobservable over any reasonable time frame. Their primary job is estimating “tail risk”—e.g. the size of annual losses with extremely low probabilities. Since annual losses with very low probability do not happen often, there is no way of measuring the accuracy of risk managers’ estimates, and thus the whole incentive area is a difficult one. From a different angle, many modern risk managers have committed themselves to mathematical techniques that are heavily reliant on data accuracy. If they become vocal about the poverty of the data with which they are working, what does that say about the value of what they do? A “quant” who values his or her role cannot be a fierce critic of the bank’s data quality.

Investors and analysts cannot touch or feel the internal data of institutions to get a handle on what is really going on. Agreeing that basic data is flawed does not fit well with their avowed, information-based strategies for delivering alpha.

Politicians, the press and public opinion are unlikely to recognize something as mundane as data quality as being a crisis cause. Greedy bankers make more emotionally satisfying culprits.

What can be done? Data quality programs start (and often end) with a framework consisting of measurement, governance, ownership, processes and organization. While these are undeniably important, implementing such frameworks has failed to improve information at many banks. It is striking that these “solutions” do not even attempt to address the incentive problems described above. The problem of collective myopia and incentives is replaced with a more tractable set of process-type issues. We wonder how many times the “tried and tested” approach must fail before more creative approaches are considered.

Building data quality into the DNA of an organization is not a short-term project that can be undertaken in any given operating function. Instead, radical changes are required to the culture, priorities and incentives of everyone from top-level executives through to branch staff. Graveyards packed with failed or aborted data quality projects demonstrate that bottom-up IT-led solutions are likely to prove costly, time-consuming and potentially futile diversions. Buy-in from the Executive Committee is necessary to ensure that the “life or death” influence that data quality has on strategy and operations is recognized. Progress has been made at the few banks that have grasped the nettle, understood the extent of the problem and dedicated resources and mind space from across the organization.

Metaphors aside, what can executives actually “do” differently? The following are critical:

Build a case for action: Quantify the cost of poor data, and the opportunity offered by better data, to underline the importance of the issue. FTE costs of manual data review will likely be visible to some degree; the potential benefit of better customer service or more precise credit decisions may not be.
Embrace the challenge: Critical issues pervasive to organizations are governed day-to-day by the executive, and data quality should be no exception. If group-wide financial planning is worthy of ongoing executive attention, then so is data quality.
Tackle incentives head on: Data quality is an incentives problem. For individuals who affect data quality (e.g. most executives and a majority of staff members), data quality should appear in incentive schemes, job descriptions and performance evaluations.

Articulate what good looks like: Most organizations have several data quality definitions, occupying dusty drawers. Internal marketing of a business-relevant definition helps to “sell” the issue and lays the groundwork for other initiatives.

Instill discipline over “proprietary” data: In a data quality vacuum, many functions in a bank will have their own “proprietary” version of the truth, undermining transparency and access to information. Phasing out “proprietary” data has to be done carefully (it was created for a reason), but the executive should set and stick to hard standards for data sources to meet, and these should include a single version of the truth.

Win something (for a change): Steps in a typical data program life cycle include “scope creep,” “overwhelmed” and “capitulation.” Maintaining focus on a small set of business-critical decisions, and proving that success is possible, helps to break the failure cycle.

Pessimists who propose that data quality cannot be improved face a stark choice. They must either continue “as is” or simplify the organization’s operations to reflect the poverty of their data. Choosing between “flying blind” and severely constraining your ambition is an ugly choice—and reason to resist the pessimistic view about the possibility of improving data quality.

The first step of treatment is recognizing you have a problem. It is not clear that a majority of stakeholders in banks have reached this stage yet. But information quality has become a CEO agenda item at some forward-thinking institutions. Will data quality be a differentiator of future performance? Alas, we do not have the data.