The Office of the National Coordinator for Health Information Technology (ONC), a division of the Department of Health and Human Services, has led and collaborated on many projects supporting the adoption and implementation of a patient-centered outcomes research (PCOR) data infrastructure. Projects funded by the Patient-Centered Outcomes Research Trust Fund, administered by the Assistant Secretary for Planning and Evaluation (ASPE), support the development of data capacity and infrastructure that can engage patients in health care decision-making and incorporate their responses into research. The Synthetic Health Data Challenge (Challenge) is an important component of the Synthetic Health Data Generation to Accelerate PCOR Project, through which ONC seeks to accelerate PCOR by furthering the development of Synthea™, a synthetic health data engine. The Challenge invites providers, researchers, and technology developers to develop innovative tools and resources that support validation and novel uses of synthetic data for PCOR researchers and/or health IT developers.
Clinical data are critical for patient-centered outcomes research (PCOR), which focuses on the effectiveness of prevention and treatment options. However, high-quality health care data are often difficult to access due to cost, patient consent, privacy concerns, or other legal or institutional review board (IRB) restrictions.
Synthetic health data can augment the PCOR infrastructure by providing researchers with a low-risk, readily available synthetic data source to complement their use of real clinical data. Early access to synthetic data while researchers await access to real clinical data may enhance their ability to test rigorous analyses and/or software systems that may generate relevant findings to inform health and treatment decisions.
Synthea is an open-source synthetic patient generator that models the medical history of synthetic patients. The resulting data are free from cost, privacy, and security restrictions and have the potential to support a variety of academic, research, industry, and government initiatives. Synthea can use publicly available health statistics and other research sources. Because the software uses publicly available statistical data to generate synthetic data sets, the barriers to resource availability and privacy concerns are lower than for other synthetic data generation technologies that rely on manipulating actual patient data. The software includes a temporal model that covers a patient’s entire lifetime instead of focusing on one health problem or disease recorded at a single point in time. As with other synthetic data sets, the synthetic data generated by Synthea must be validated to ensure they are clinically relevant and realistic.
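To make the idea of statistics-driven, lifetime-spanning generation and its validation more concrete, the following is a minimal Python sketch. It is not Synthea's implementation and does not use Synthea's API or module format. The age bands, the onset probabilities, the single generic "condition", and every function name are hypothetical placeholders chosen for illustration only; they are not real health statistics. The sketch walks each synthetic patient through life one year at a time, then checks that the lifetime prevalence in the generated cohort is close to what the input rates imply.

```python
import random

# Hypothetical annual onset probabilities by age band: (min_age, max_age, probability).
# Illustrative placeholders only; NOT real health statistics and NOT taken from Synthea.
ANNUAL_ONSET = [
    (0, 40, 0.001),
    (40, 65, 0.010),
    (65, 120, 0.020),
]


def onset_probability(age):
    """Return the assumed yearly onset probability for a given age."""
    for min_age, max_age, probability in ANNUAL_ONSET:
        if min_age <= age < max_age:
            return probability
    return 0.0


def simulate_patient(rng, max_age=80):
    """Walk one synthetic patient through life, one year per step.

    Returns a list of (age, event) tuples; empty if the condition never occurs.
    """
    history = []
    has_condition = False
    for age in range(max_age):
        if not has_condition and rng.random() < onset_probability(age):
            has_condition = True
            history.append((age, "condition_onset"))
    return history


def simulate_cohort(n, seed=0):
    """Generate n synthetic patient histories from a seeded random generator."""
    rng = random.Random(seed)
    return [simulate_patient(rng) for _ in range(n)]


def lifetime_prevalence(cohort):
    """Fraction of synthetic patients who ever developed the condition."""
    return sum(1 for history in cohort if history) / len(cohort)


def expected_lifetime_prevalence(max_age=80):
    """Prevalence implied by the input rates: 1 - P(no onset in any year)."""
    p_never = 1.0
    for age in range(max_age):
        p_never *= 1.0 - onset_probability(age)
    return 1.0 - p_never


if __name__ == "__main__":
    cohort = simulate_cohort(5000, seed=1)
    print("simulated:", round(lifetime_prevalence(cohort), 3))
    print("expected: ", round(expected_lifetime_prevalence(), 3))
```

Comparing a simulated rate against the rate implied by the inputs is only the simplest kind of check. Validating clinical relevance and realism, as described above, would also require comparison against independent reference data and clinical review.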
The Challenge seeks a wide range of innovators, researchers, and technology developers to create and test novel solutions that further develop the capabilities of Synthea and the synthetic data it generates.
The Challenge will be conducted in two (2) phases:
- Phase I – Proposal for Innovative Models: Participants will submit a written proposal describing their proposed solution, including methodology and intended outcomes. Selected Phase I proposals will proceed to Phase II. There is no limit to the number of qualified proposals that may be selected to move to Phase II; however, a minimum of four (4) qualified proposals is required for the Challenge to proceed to Phase II.
- Phase II – Prototype/Solution Development: Participants whose Phase I proposals are selected to proceed to Phase II will develop their prototype/solution in this phase.
Participants will propose a solution in one of two (2) Challenge categories.
- Category I – Enhancements to Synthea: Solutions in this category include, but are not limited to, development and/or enhancement of Synthea modules and development of solutions that enhance or address limitations of Synthea.
- Category II – Novel Uses of Synthea Generated Synthetic Data: Solutions in this category include, but are not limited to, novel uses of Synthea-generated data for research and technology development.