Improving Healthcare Data Quality for the AI Era
Healthcare data quality has been a prime concern for several years. As healthcare AI applications grow, however, data quality has become more critical than ever.
Artificial intelligence is revolutionizing healthcare. Providers are better positioned to deliver personalized treatments, predict diseases before they occur, and save countless lives—if they have quality data to feed AI.
That’s because AI is only as good as the data it’s trained on. In healthcare, where decisions can mean the difference between life and death, the quality of that data becomes paramount.
It’s one of those challenges that keeps healthtech leaders up at night. As we learned at the recent LSX World Congress USA, it’s a big topic of conversation.
In this article, we’ll examine why quality healthcare data is essential, the cost of poor data quality, and how the METRIC framework may help ensure that healthcare AI applications are trained on, and operate from, high-quality data.
The Cost of Poor Healthcare Data Quality
Poor data quality can impact healthcare organizations in many ways. Incorrect or missing information can result in compromised patient care, delayed treatments, inaccurate diagnoses and treatment plans, wasted resources, increased healthcare costs, and—most importantly—adverse patient outcomes.
According to Gartner, poor data quality costs companies an average of $12.9 million annually. In healthcare, those costs aren’t just financial—they’re measured in human lives.
Building Trust in Healthcare AI
Trust is the cornerstone of healthcare. Patients trust their doctors, and doctors need to trust the tools they use. To earn that trust, healthcare AI tools must consistently deliver accurate, reliable results. And that all starts with high-quality data.
To address this critical need, researchers have developed the METRIC framework.
What is the METRIC framework?
The METRIC framework is a way to comprehensively assess healthcare data quality for use in medical AI models. Developed by German researchers Daniel Schwabe, Katinka Becker, Martin Seyferth, Andreas Klaß, and Tobias Schaeffter, this framework lays the foundation for trustworthy AI in medicine.
While several documentation efforts and frameworks for evaluating AI models already exist, Schwabe et al. felt that none of them comprehensively assess the content of data sets and their suitability for use in machine learning (ML).
In order to establish guidelines for trustworthy AI in medicine, they sought to identify which characteristics should be used to evaluate data quality. The result of their research is the METRIC framework.
The METRIC framework has five categories and 15 sub-dimensions that researchers and healthcare organizations can use to evaluate the quality of their data.
METRIC’s five categories are:
- Measurement Process
- Timeliness
- Representativeness
- Informativeness
- Consistency
Measurement Process
This category assesses the uncertainty introduced during data collection. Imagine we’re trying to teach an AI to understand brain activity from an EEG. If we train it only on perfectly quiet EEGs, it may give low-quality results when it encounters a real-world patient who moves a bit and introduces noise. Counterintuitively, a little noise in the training data can make the AI more robust and adaptable.
The measurement process category’s dimensions are device error, human-induced error, completeness, and source credibility.
Device error addresses the accuracy and precision of the data values originating from a sensor.
Human-induced error covers noisy labels, carelessness, and outliers to gauge how much error manual input introduces into the data.
Completeness is self-explanatory—how many entries are missing from a dataset?
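As a rough illustration, completeness can be quantified as the share of populated values per field. The records and field names below are hypothetical, not drawn from any real dataset:

```python
# Sketch: quantify completeness as the fraction of non-missing values
# per field across a set of (hypothetical) patient records.
records = [
    {"age": 67, "cholesterol": 210, "bp_systolic": 140},
    {"age": 54, "cholesterol": None, "bp_systolic": 128},
    {"age": 71, "cholesterol": None, "bp_systolic": None},
]

def completeness(records):
    """Return the share of non-missing values for each field."""
    fields = records[0].keys()
    return {
        f: sum(r[f] is not None for r in records) / len(records)
        for f in fields
    }

# age is fully populated; cholesterol is only one-third complete
print(completeness(records))
```

A real assessment would also account for fields that are missing entirely from some records, not just fields present with empty values.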
Source credibility is concerned with how truthful and believable the data is. What is the level of expertise of the data source? How traceable is it? Has the data been poisoned or intentionally falsified?
Timeliness
The timeliness cluster’s dimensions are age and currency. Is the age of the data appropriate for the application it’s being used for? Is it up to date?
It also ensures the data meets modern standards, such as current medical coding practices and indications for diagnosis.
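A simple age check makes the idea concrete. The two-year freshness window and the record dates below are hypothetical assumptions; an acceptable age depends entirely on the application:

```python
from datetime import date

# Sketch: flag records older than a (hypothetical) freshness window.
MAX_AGE_DAYS = 365 * 2  # assume two years is acceptable for this use case

record_dates = [date(2024, 3, 1), date(2019, 6, 15)]
today = date(2025, 1, 1)

stale = [d for d in record_dates if (today - d).days > MAX_AGE_DAYS]
print(stale)  # only the 2019 record exceeds the window
```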
Representativeness
Variety, depth of data, and target class balance comprise the three dimensions of the representativeness cluster.
Variety refers to not only the diversity of the demographics but also the variety of data sources.
Depth of data looks at the size of the dataset, its granularity, and its coverage (defined as “the extent to which relevant subsets of the dataset satisfy the dimensions ‘variety’ and ‘target class balance’”).
Target class balance addresses how similar the classes of a target variable are in size.
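One simple way to express this is the ratio of the rarest class to the most common one, where 1.0 means perfectly balanced. The labels below are made up for illustration:

```python
from collections import Counter

# Sketch: measure target class balance as the ratio of the rarest
# class count to the most common class count (1.0 = balanced).
labels = ["healthy"] * 90 + ["disease"] * 10  # hypothetical target labels

def class_balance(labels):
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())

print(round(class_balance(labels), 3))  # prints 0.111
```

A score this low suggests the minority class may need oversampling, reweighting, or more data collection before training.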
Informativeness
The informativeness cluster reminds us to be selective about the data we feed these AI systems. Its dimensions address the data’s understandability, redundancy, “informative missingness,” and feature importance.
Understandability is about the amount of ambiguity in the data and how easily it can be comprehended.
Redundancy is similarly intuitive: it covers data deduplication, conciseness, and uniqueness.
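A minimal facet of redundancy is checking for exact duplicate records. The record layout below (patient ID, visit date, modality) is a hypothetical example:

```python
# Sketch: flag exact duplicate records, one simple facet of the
# redundancy dimension (record layout is hypothetical).
rows = [
    ("P001", "2024-01-05", "EEG"),
    ("P002", "2024-01-06", "MRI"),
    ("P001", "2024-01-05", "EEG"),  # verbatim duplicate
]

unique = set(rows)
duplicates = len(rows) - len(unique)
print(f"{duplicates} duplicate record(s) out of {len(rows)}")
```

Real-world redundancy checks also need fuzzy matching, since near-duplicates (a typo in a date, a renamed field) rarely match exactly.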
That brings us to “informative missingness,” which almost sounds like something Dr. Seuss made up. The authors define it as “the extent to which missing data values provide useful information.” The example they give is if a patient’s EHR doesn’t have a cholesterol level recorded, it could indicate the doctor’s belief that the patient is at low risk for cardiovascular disease.
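In practice, one common way to preserve this signal, rather than the authors’ prescribed method, is to encode the absence of a value as an explicit indicator feature instead of silently imputing it. The field names here are hypothetical:

```python
# Sketch: encode "informative missingness" as an explicit indicator
# feature so the model can learn from the absence itself.
patients = [{"cholesterol": 250}, {"cholesterol": None}]

for p in patients:
    # 1 if the value was missing, 0 otherwise
    p["cholesterol_missing"] = int(p["cholesterol"] is None)
    if p["cholesterol"] is None:
        p["cholesterol"] = 0  # placeholder; the indicator keeps the signal

print(patients[1])  # {'cholesterol': 0, 'cholesterol_missing': 1}
```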
Feature importance, the final dimension of informativeness, addresses how beneficial certain data is. Knowing the patient’s insurance company doesn’t help predict their risk of stroke, for example.
Consistency
The consistency cluster is about making sure the data makes sense internally and in the context of the real world. It defines three types of data consistency: rule-based, logical, and distribution.
Rule-based consistency looks at the syntactic consistency and the data’s regulatory compliance.
Logical consistency measures the plausibility and semantic consistency of the data.
Distribution consistency looks at the homogeneity and distribution drift present in a dataset.
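As a crude sketch of drift detection, one can compare the mean of a feature in a newer cohort against a reference cohort, measured in units of the reference spread. The blood-pressure values below are invented, and real drift detection would use proper statistical tests rather than this shortcut:

```python
from statistics import mean, stdev

# Sketch: a crude drift check comparing a feature's mean in a
# reference cohort against a newer cohort (values are hypothetical).
reference = [120, 125, 118, 130, 122, 127]  # e.g. systolic BP, older data
current = [138, 142, 135, 140, 139, 144]    # newer data

def drift_score(ref, cur):
    """Shift of the new mean, in units of the reference stdev."""
    return abs(mean(cur) - mean(ref)) / stdev(ref)

score = drift_score(reference, current)
print(f"drift score: {score:.2f}")  # a score well above 1 signals drift
```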
The Path Forward for Healthcare AI
As we stand on the brink of an AI-driven healthcare revolution, we must remember that the quality of our data will determine the quality of our care. It’s not just about having more data—it’s about having better data.
The future of healthcare is bright, filled with possibilities that were once the stuff of science fiction. But to reach that future, we need to build on a solid foundation of high-quality data. Because when we get it right, we’re not just improving data quality—we’re improving lives.
Building software that makes people’s lives better is one of Taazaa’s core values. From healthcare software to help people recover from concussions or learn to walk again to home health and newborn assessment solutions, we design and create secure, compliant digital products for today’s healthcare providers.