How to maximize HEDIS scores with synthetic data
In the U.S. healthcare industry, the Healthcare Effectiveness Data and Information Set (HEDIS) serves as the primary report card for health plans. Developed and maintained by the National Committee for Quality Assurance (NCQA), HEDIS is a standardized suite of performance measures used by more than 90% of health plans to track the quality of care and service provided to members.
The stakes for these scores are high. For payers, HEDIS performance is a core component of Medicare Advantage Star Ratings. These ratings determine eligibility for Quality Bonus Payments (QBPs)—which can amount to millions of dollars annually—as well as a plan’s ability to offer competitive rebates. Poor performance can lead to financial penalties and loss of enrollment.
However, organizations face a significant data roadblock. Improving HEDIS scores requires analyzing massive volumes of sensitive patient data, including claims, lab results, and clinical notes. Accessing this data for development and testing is often blocked by stringent HIPAA compliance requirements. This creates friction: engineers need data to build tools that close care gaps, but legal restrictions prevent them from using real Protected Health Information (PHI).
Tonic.ai provides a solution to this deadlock by generating high-fidelity synthetic data. This allows developers to work with data that maintains the utility of production datasets without the inherent privacy risks.
Understanding HEDIS and the quest for 5-star ratings
HEDIS measures evaluate specific clinical processes and outcomes. A typical measure asks: “Of the members diagnosed with diabetes, what percentage received a hemoglobin A1c (HbA1c) test in the last year?” To maximize a score, a plan must prove that the numerator (patients who received the service) is as close to the denominator (eligible patients) as possible.
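The arithmetic behind the HbA1c example can be sketched in a few lines. This is a simplified illustration, not the NCQA specification: the Member record and its field names are hypothetical, and real measures add continuous-enrollment rules and exclusions that are omitted here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Member:
    # Hypothetical, simplified member record for illustration only.
    member_id: str
    has_diabetes: bool
    hba1c_test_dates: list[date] = field(default_factory=list)


def hba1c_testing_rate(members, year):
    """Return (numerator, denominator, rate) for a simplified HbA1c testing measure."""
    # Denominator: every member eligible for the measure.
    eligible = [m for m in members if m.has_diabetes]
    # Numerator: eligible members with at least one test in the measurement year.
    tested = [m for m in eligible if any(d.year == year for d in m.hba1c_test_dates)]
    rate = len(tested) / len(eligible) if eligible else 0.0
    return len(tested), len(eligible), rate


members = [
    Member("M1", True, [date(2024, 3, 14)]),
    Member("M2", True, []),
    Member("M3", False, [date(2024, 5, 2)]),   # tested, but not in the denominator
    Member("M4", True, [date(2023, 11, 30)]),  # tested outside the measurement year
]
numerator, denominator, rate = hba1c_testing_rate(members, 2024)
print(f"{numerator}/{denominator} = {rate:.1%}")
```

Only M1 counts toward the numerator, so the rate for this toy population is one in three. M2 and M4 are the members a plan would need to reach before year end.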
The domains of care
HEDIS includes more than 90 measures across several domains:
Preventive screening: Breast cancer screening, immunizations, and wellness visits.
Chronic condition management: Controlling high blood pressure and diabetes care.
Behavioral health: Follow-up after hospitalization for mental illness and antidepressant medication management.
Access/availability of care: Timeliness of prenatal and postpartum care.
The financial incentive
HEDIS scores are a major input to the CMS Star Rating system, and the financial implications hinge on a single threshold:
4 stars and above: Plans typically qualify for significant bonus payments and have their rebate percentage increased.
Below 4 stars: Plans lose access to these bonuses, making it difficult to offer the supplemental benefits required to attract and retain members.
Identifying care gaps
The most effective way to boost scores is through proactive care gap analysis. A care gap occurs when a patient is eligible for a service but hasn’t received it. Closing these gaps requires sophisticated software that can ingest disparate data feeds and alert providers or members in real time. Building these tools, however, is where the data bottleneck begins.
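At its simplest, a care gap is a set difference: the eligible members minus the members with a record of the service. A minimal sketch, assuming both lists have already been derived upstream and using made-up member IDs:

```python
def find_care_gaps(eligible_ids, completed_ids):
    """Return members who are eligible for a service but have no record of it."""
    return sorted(set(eligible_ids) - set(completed_ids))


eligible_for_screening = ["M1", "M2", "M3", "M4"]
screening_completed = ["M2", "M4", "M9"]  # M9 is not eligible, so it is ignored

gaps = find_care_gaps(eligible_for_screening, screening_completed)
print(gaps)  # members to flag for outreach
```

The hard part in practice is not this subtraction but producing trustworthy inputs to it, which is why the data sources matter so much.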
The HEDIS data logjam: complexity vs compliance
Software engineers and data scientists in the healthcare industry tasked with building HEDIS reporting engines face three primary hurdles:
1. Heterogeneous data sources
HEDIS reporting is not a single-source operation. It requires data from:
Claims databases: Billing codes that indicate procedures and diagnoses.
Electronic health records (EHRs): Detailed clinical results that claims might miss.
Lab feeds: Specific values (like blood glucose levels) necessary for outcome-based measures.
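To illustrate why all three feeds matter, the sketch below collects evidence of a service from each source and takes the union. The record layouts, the HBA1C-PROC code, and the field names are placeholders chosen for this example, not real billing codes or a real schema.

```python
def members_with_evidence(claims, ehr_results, lab_feed, procedure_code, test_name):
    """Collect member IDs with evidence of a service from any of three sources."""
    from_claims = {row["member_id"] for row in claims if row["code"] == procedure_code}
    from_ehr = {row["member_id"] for row in ehr_results if row["test"] == test_name}
    from_labs = {row["member_id"] for row in lab_feed if row["test"] == test_name}
    # A member counts if any single source documents the service.
    return from_claims | from_ehr | from_labs


claims = [{"member_id": "M1", "code": "HBA1C-PROC"}]           # placeholder code
ehr_results = [{"member_id": "M2", "test": "HbA1c"}]           # claim never filed
lab_feed = [
    {"member_id": "M3", "test": "HbA1c", "value": 6.8},
    {"member_id": "M4", "test": "LDL", "value": 110},          # different test
]

found = members_with_evidence(claims, ehr_results, lab_feed, "HBA1C-PROC", "HbA1c")
print(sorted(found))
```

A claims-only view would have credited one member out of three, which is the kind of undercount that heterogeneous ingestion is meant to prevent.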
2. The risks of using real PHI
Using production data in development or staging environments is a high-risk practice. A single breach of a database containing patient histories can lead to HIPAA fines and irreparable reputational damage. Consequently, access to real-world data is typically restricted.
3. Development friction
When developers are denied access to production-grade data, they often resort to manual dummy data that lacks the complexity of real clinical records. Alternatively, they wait months for legal de-identification approvals. This red tape slows the development of AI-driven tools meant to identify care gaps, leaving potential HEDIS points—and revenue—on the table.
Transforming HEDIS reporting with Tonic Structural
Tonic Structural solves the structured data bottleneck by providing high-fidelity masking and synthesis for relational databases.
High-fidelity masking
Structural transforms sensitive claims and lab data into synthetic versions that look and act like the original. It preserves referential integrity. If a patient record is linked to five different lab results in the source database, the synthetic version will maintain those same links. This allows engineers to test complex joins and queries without seeing real patient names or Social Security numbers.
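One generic way to keep links intact while hiding identifiers is deterministic pseudonymization: the same input always maps to the same token, so parent and child rows still join. The sketch below uses a keyed hash from the Python standard library. It illustrates the general idea only and is not a description of how Structural works internally or of its API; the key and table layouts are assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"  # assumption: a real key would live in a secrets manager


def pseudonymize(value: str) -> str:
    """Map an identifier to a stable token: same input, same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:12]


patients = [{"patient_id": "123-45-6789", "name": "Jane Doe"}]
lab_results = [
    {"patient_id": "123-45-6789", "test": "HbA1c", "value": 7.2},
    {"patient_id": "123-45-6789", "test": "LDL", "value": 110},
]

masked_patients = [
    {"patient_id": pseudonymize(p["patient_id"]), "name": "REDACTED"} for p in patients
]
masked_labs = [{**r, "patient_id": pseudonymize(r["patient_id"])} for r in lab_results]

# The join still works because parent and child keys were transformed identically.
joined = [r for r in masked_labs if r["patient_id"] == masked_patients[0]["patient_id"]]
print(len(joined))
```

Both lab rows still resolve to the masked patient, and the original identifier no longer appears in either table.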
Note on Digital Standards: While Structural supports traditional SQL-based claims and clinical data, it also provides native support for de-identifying FHIR (Fast Healthcare Interoperability Resources) data. Its JSON Document View allows developers to mask nested patient resources while keeping the schema valid for testing digital quality measures (dQMs).
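To make the idea of masking a nested resource concrete, here is a hand-rolled sketch over a small FHIR-style Patient document. It is not the JSON Document View and makes no claim about Structural's behavior; the replacement values are arbitrary, and the example only preserves the nesting rather than validating against the FHIR schema.

```python
import copy
import json

patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
    "birthDate": "1961-04-12",
    "address": [{"line": ["12 Elm St"], "city": "Springfield", "postalCode": "01101"}],
}


def mask_patient(resource):
    """Return a copy with direct identifiers replaced and the nesting left intact."""
    masked = copy.deepcopy(resource)
    for name in masked.get("name", []):
        name["family"] = "Synthetic"
        name["given"] = ["Patient" for _ in name.get("given", [])]
    for address in masked.get("address", []):
        address["line"] = ["1 Example Way" for _ in address.get("line", [])]
        address["city"] = "Anytown"
        address["postalCode"] = "00000"
    # Keep only the birth year so age-based measure logic can still be exercised.
    if "birthDate" in masked:
        masked["birthDate"] = masked["birthDate"][:4]
    return masked


masked = mask_patient(patient)
print(json.dumps(masked, indent=2))
```

The masked document has the same keys and array shapes as the input, so code that walks the resource keeps working while the identifying values are gone.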
Maintaining statistical utility
For HEDIS math to be useful, synthetic data must retain the statistical distribution of the original set. If 15% of your members have Type 2 Diabetes in production, the synthetic set should reflect that same 15%. This ensures that when developers test scoring logic or dashboards, the results mirror what will happen in production. Structural offers a comprehensive library of data generators to ensure that statistical distributions and relationships within production data are maintained.
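A team can check this property directly by comparing prevalence in the two datasets. The following is a minimal sketch with a hypothetical record layout and an arbitrary two-percentage-point tolerance; a fuller check would compare many distributions and their relationships.

```python
def prevalence(records, condition):
    """Share of records flagged with the given condition."""
    if not records:
        return 0.0
    return sum(1 for r in records if condition in r["conditions"]) / len(records)


def within_tolerance(production, synthetic, condition, tolerance=0.02):
    """True when the synthetic prevalence is within `tolerance` of production."""
    gap = abs(prevalence(production, condition) - prevalence(synthetic, condition))
    return gap <= tolerance


# Toy populations: 15% prevalence in production, 15.5% in the synthetic set.
production = [{"conditions": ["T2D"]}] * 15 + [{"conditions": []}] * 85
synthetic = [{"conditions": ["T2D"]}] * 31 + [{"conditions": []}] * 169

print(prevalence(production, "T2D"), prevalence(synthetic, "T2D"))
print(within_tolerance(production, synthetic, "T2D"))
```

A check like this can run as an automated test whenever the synthetic dataset is regenerated.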
Accelerating the SDLC
Structural offers features like subsetting, allowing developers to create smaller, portable versions of massive healthcare databases. These targeted datasets let engineers iterate at any time in isolated environments, accelerating the SDLC for quality improvement tools.
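The key constraint in subsetting is that child rows must follow their parents. Here is a generic sketch of that idea, assuming a two-table layout with a hypothetical member_id key; it does not represent Structural's subsetting algorithm.

```python
import random


def subset(members, claims, fraction, seed=0):
    """Sample a fraction of members and keep only the claims that belong to them."""
    rng = random.Random(seed)  # seeded so the subset is reproducible
    k = max(1, round(len(members) * fraction))
    chosen = rng.sample(members, k)
    chosen_ids = {m["member_id"] for m in chosen}
    # Child rows follow their parents, so no claim points at a missing member.
    kept_claims = [c for c in claims if c["member_id"] in chosen_ids]
    return chosen, kept_claims


members = [{"member_id": f"M{i}"} for i in range(10)]
claims = [{"claim_id": f"C{i}", "member_id": f"M{i % 10}"} for i in range(30)]

sub_members, sub_claims = subset(members, claims, fraction=0.3)
print(len(sub_members), len(sub_claims))
```

Sampling parents first and filtering children second avoids the orphaned rows that appear when each table is sampled independently.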
Unlocking clinical insights with Tonic Textual
While administrative claims provide some data, a significant portion of HEDIS proof is buried in unstructured doctor’s notes or clinical narratives. This is the hidden data problem.
The value of unstructured data
In many cases, a patient may have received a screening, but the claim was never filed or was coded incorrectly. The only evidence exists in the narrative text of an EHR. To capture this for HEDIS reporting, organizations use Natural Language Processing (NLP) to scan notes for proof of care.
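As a rough illustration of scanning notes for proof of care, the sketch below uses regular expressions as a stand-in for a trained NLP model. The keyword and negation lists are assumptions made for this example and are far too small for real use.

```python
import re

EVIDENCE = re.compile(r"\b(mammogram|mammography)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(declined|refused|not done|no record of)\b", re.IGNORECASE)


def has_screening_evidence(note: str) -> bool:
    """True if any sentence mentions the screening without a negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if EVIDENCE.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


notes = {
    "M1": "Pt reports mammogram completed at outside facility in March. Results normal.",
    "M2": "Discussed mammogram; patient declined.",
    "M3": "Follow-up for hypertension. BP 128/82.",
}
found = [member for member, text in notes.items() if has_screening_evidence(text)]
print(found)
```

Even this toy version shows why negation handling matters: counting the second note as evidence would inflate the numerator with care that never happened.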
Safeguarding clinical narratives
Training an NLP model requires access to thousands of clinical notes. However, these notes are saturated with PHI (names, addresses, specific dates). Tonic Textual uses proprietary Named Entity Recognition (NER) models to automatically detect and scrub these identifiers from clinical text.
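For contrast, here is what a purely pattern-based scrubber looks like. It is a deliberately simple stand-in, not NER and not Tonic Textual; the patterns and labels are assumptions for this example.

```python
import re

# Patterns for identifiers with a predictable shape. Free-text names and
# addresses have no such shape, which is the gap NER models are meant to fill.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 03/14/2024. SSN 123-45-6789, callback 555-867-5309."
scrubbed = scrub(note)
print(scrubbed)
```

A note containing "Jane Doe of 12 Elm St" would pass through this scrubber untouched, which shows the limits of rules alone on clinical text.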
AI training for HEDIS
By using Textual, data scientists can safely train Large Language Models (LLMs) or Retrieval-Augmented Generation (RAG) systems to analyze clinical notes. These models can then be deployed to identify care evidence that would have otherwise been missed. This directly boosts the numerator of HEDIS measures by finding hidden completions, leading to higher scores without changing the actual care delivered.
The future of digital HEDIS and synthetic data
The NCQA is currently transitioning toward Digital Quality Measures (dQMs) and the use of Electronic Clinical Data Systems (ECDS).
Automation and interoperability: The industry is moving away from manual chart reviews toward fully automated, interoperable systems based on the FHIR (Fast Healthcare Interoperability Resources) standard.
Predictive analytics: Future HEDIS success will rely on AI to predict which members are at the highest risk of missing a screening before the measurement year ends.
Future-proofing with Tonic.ai: As these digital standards evolve, teams can use Tonic Fabricate to generate fully synthetic clinical scenarios. This allows for testing edge cases and new dQM logic long before real-world data is available, ensuring the system is ready for the audit season.
Conclusion: build the tools to win the HEDIS race
Maximizing HEDIS scores is a data engineering challenge as much as a clinical one. While the goal is improved patient outcomes and higher Star Ratings, the primary hurdle is safe, rapid access to high-quality data.
Tonic.ai provides the infrastructure to streamline compliance and mitigate security risks. By using high-fidelity synthetic data, healthcare organizations can build, test, and deploy the AI-driven tools necessary to close care gaps and secure quality bonus payments.
Stop waiting for data access and start building with safe data. Book a demo to begin exploring the Tonic product suite today.
*** This is a Security Bloggers Network syndicated blog from Expert Insights on Synthetic Data from the Tonic.ai Blog authored by Expert Insights on Synthetic Data from the Tonic.ai Blog. Read the original post at: https://tonicfakedata.webflow.io/blog/maximize-hedis-scores-with-synthetic-data
