Today’s Chief Data Officers (CDOs) face a unique set of challenges, from developing analytics strategies and modernizing governance programs to creating a data-driven culture. In each of these ambitious tasks, they must balance two conflicting postures: a defensive position focused on risk management, regulatory compliance, data privacy, and security; and an offensive stance that includes developing data products, machine learning-enabled services, and insights that increase revenue and customer satisfaction.
The success of these increasingly business-critical initiatives often comes down to an organization’s ability to leverage cloud technology in a well-architected, automated, and controlled way, improving the reliability, availability, and observability of data assets. Without this cloud enablement, organizations can find it hard to operate, let alone innovate.
To illustrate this point, we often see organizations struggling in the wake of mergers and acquisitions to determine authoritative sources of data and to uphold the three ‘Cs’ of good data quality: completeness, consistency, and correctness. As a result, both decision makers and application development teams are often left with untrustworthy data, inconsistent insights, and debilitating wait times while central IT teams untangle complex data pipelines.
The solution? We believe that by proactively assessing their current data management maturity and leveraging automated compliance technologies, data leaders can avoid the project delays caused by untrustworthy data. The Cloud Data Management Capabilities (CDMC) Framework, developed and adopted by the world’s largest cloud data consumers, offers all organizations a clear set of maturity metrics with which to guide best-practice adoption.
Below are three ways organizations can use cloud technology to improve the management of their data and work toward compliance with regulatory requirements such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Comprehensive Capital Analysis and Review (CCAR), and the Health Insurance Portability and Accountability Act (HIPAA). This will enable them to deliver high-quality, trustworthy data to all their business teams.
1. Promote predictive, machine learning-based data quality and classification
Data assets can only be reliable and trustworthy if an organization can determine the authoritativeness, completeness, and accuracy of incoming data. Rather than maintaining complex, manually written quality rules, organizations should introduce quality measures and metrics and leverage machine learning to generate rules that adapt over time.
By moving from a fully manual data-labeling workflow to a supervised machine learning model, one regulatory supervisor we worked with was able to classify data at far greater scale, requiring human intervention only for training and testing activities. This freed their teams to concentrate on higher-value tasks.
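The pattern is straightforward to prototype. Below is a minimal sketch using scikit-learn; the classes, sample values, and review threshold are illustrative assumptions rather than the supervisor’s actual design. A classifier labels incoming columns and routes only low-confidence predictions to a human reviewer.

```python
# A minimal sketch of supervised data classification with scikit-learn.
# Labels, training examples, and the review threshold are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled training examples: column name plus sample values -> class.
training_text = [
    "email john.doe@example.com jane@corp.com",
    "account_balance 1042.50 87.20",
    "ssn 123-45-6789 987-65-4321",
    "order_status shipped pending delivered",
]
training_labels = ["PII", "FINANCIAL", "PII", "NON_SENSITIVE"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(training_text, training_labels)

def classify_column(text: str, review_threshold: float = 0.8):
    """Return (label, needs_human_review) for a new column description."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    # Route only low-confidence predictions to a human reviewer, so
    # analysts spend their time on edge cases rather than bulk labeling.
    return model.classes_[best], bool(probabilities[best] < review_threshold)

print(classify_column("customer_email alice@example.org bob@example.org"))
```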
2. Ensure data availability with automated data discovery and classification
As growing organizations rapidly onboard new data sources, finding the “right” data becomes difficult and often overburdens central IT teams, which struggle to understand the context of data assets. By leveraging cloud automation and event-driven architectures, data catalogs can be populated the moment data changes or moves within the organization, providing a unified, self-serve “data shopping” view for teams.
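As a sketch of the event-driven pattern: assume each storage event (for example, an object landing in a data lake bucket) triggers a small handler that upserts a catalog entry. The event shape, field names, and in-memory catalog below are illustrative stand-ins for a provider’s notification payload and a real metadata service.

```python
# A cloud-agnostic sketch of event-driven catalog population. In practice
# the event would come from a storage notification (e.g., an "object
# created" event invoking a serverless function), and the dict below would
# be a managed metadata service or enterprise data catalog.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    dataset: str
    location: str
    owner: str
    classification: str   # e.g., "PII" or "UNCLASSIFIED" -- assumed labels
    last_updated: str

catalog: dict[str, CatalogEntry] = {}   # keyed by dataset name

def on_data_event(event: dict) -> None:
    """Upsert a catalog entry whenever data is created or moved.

    The event fields mirror what storage notifications typically carry;
    the exact shape depends on the cloud provider.
    """
    entry = CatalogEntry(
        dataset=event["dataset"],
        location=event["location"],
        owner=event.get("owner", "unassigned"),          # surface unowned data
        classification=event.get("classification", "UNCLASSIFIED"),
        last_updated=datetime.now(timezone.utc).isoformat(),
    )
    catalog[entry.dataset] = entry   # the "data shopping" view stays current

on_data_event({"dataset": "customers", "location": "s3://lake/customers/",
               "owner": "crm-team", "classification": "PII"})
```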
For example, following a merger, one financial services organization was struggling to determine authoritative sources of customer data amid duplicated customer records with conflicting attributes. Automation tooling removed the need for their technology teams to untangle complex data pipelines by hand and improved data quality and lineage visibility. This gave decision makers and development teams more trustworthy data and more consistent insights, and reduced project delays caused by poor data.
3. Meet privacy obligations at scale with enhanced data observability
Businesses already utilizing cloud-based logging and monitoring for operational oversight should extend these services to cover data storage, movement, and usage. Tracking the lineage, retention, cross-border movement, and sensitivity of data assets is critical for regulatory compliance; mature data teams go further, proactively creating alerts and triggering actions when suspected policy violations are detected.
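A policy check of this kind can be expressed as a small function over data-movement log records. The sketch below assumes illustrative residency and retention rules; the regions, limits, and alert sink are our own examples, not a specific regulator’s requirements.

```python
# A minimal sketch of a policy check layered on top of data-movement logs.
# Allowed regions and retention limits are illustrative assumptions.
ALLOWED_REGIONS = {"PII": {"eu-west-1", "eu-central-1"}}  # e.g., EU residency
RETENTION_LIMIT_DAYS = {"PII": 365}

def check_movement(record: dict) -> list[str]:
    """Return policy violations found in one data-movement log record."""
    violations = []
    sensitivity = record["sensitivity"]
    allowed = ALLOWED_REGIONS.get(sensitivity)
    if allowed is not None and record["destination_region"] not in allowed:
        violations.append(
            f"{record['dataset']}: {sensitivity} data moved to "
            f"{record['destination_region']}, outside permitted regions"
        )
    limit = RETENTION_LIMIT_DAYS.get(sensitivity)
    if limit is not None and record["age_days"] > limit:
        violations.append(f"{record['dataset']}: retained beyond {limit} days")
    return violations

def alert(violations: list[str]) -> None:
    # Stand-in for paging, ticketing, or an automated remediation action.
    for violation in violations:
        print("POLICY ALERT:", violation)

alert(check_movement({"dataset": "customers", "sensitivity": "PII",
                      "destination_region": "us-east-1", "age_days": 400}))
```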
We recently partnered with a world-leading data provider to design and implement governance and monitoring technologies. Through better management of their data, they are now confident they can meet, and even exceed, compliance requirements across thousands of highly sensitive data assets.
The future of cloud data management
From increasing revenue to driving innovation, the cloud is radically changing how businesses operate. For businesses already innovating in the cloud, or planning to move data assets there, it’s important to avoid common pitfalls: degradation of data quality, lapses in security and privacy controls, and becoming lost in vast quantities of unknown and unowned data.
Taking a “compliance by design” approach in the earliest stages of strategic and technology projects is one way to address this. Likewise, by proactively assessing current data capabilities and leveraging cloud providers’ automation and observability services, data leaders can enhance decision-making and operational efficiency, and adapt more quickly to regulatory change.