This NIH Long COVID Computational Challenge (L3C) seeks to spur and reward the development of AI/ML models to identify which patients infected with SARS-CoV-2 are likely to develop PASC/Long COVID.
The National Institutes of Health, through the Office of the Director, is launching the NIH Long COVID Computational Challenge (L3C). The overall prevalence of post-acute sequelae of SARS-CoV-2 (PASC), also known as Long COVID, is currently unknown, but there is growing evidence that more than half of COVID-19 survivors experience at least one symptom of PASC/Long COVID six months after recovery from the acute illness. Reports also reflect an underlying heterogeneity of symptoms, multi-organ involvement, and persistence of PASC/Long COVID in some patients. Research is ongoing to understand the prevalence, duration, and clinical outcomes of PASC/Long COVID. Fatigue, cognitive impairment, shortness of breath, and cardiac damage, among others, have been observed even in patients who had only mild initial disease.
The breadth and complexity of data created in today’s health care encounters require advanced analytics to extract meaning from longitudinal data on symptoms, laboratory results, images, functional tests, genomics, mobile health/wearable devices, written notes, electronic health records (EHR), and other relevant data types. Advances in software tools and computing capacity have enabled artificial intelligence (AI)/machine learning (ML) approaches, which increasingly demonstrate the potential to extract patient-level insight from large amounts of data and to improve understanding of the effects of SARS-CoV-2 on patients.
In addition to differentiating the diverse clinical manifestations of PASC/Long COVID and better defining its sub-types, identifying the prognostic factors that lead to PASC/Long COVID is essential for providers who wish to better predict and prevent it in their patients. Understanding risk factors may also help clinicians better understand the underlying etiology of PASC/Long COVID. Beyond individual features, clusters of pre-existing comorbidities, or combinations of those conditions with certain early medical management, may predict whether patients eventually experience PASC/Long COVID.
The primary objective of the Challenge is to spur and reward the development of AI/ML models and algorithms that serve as open-source tools for using structured medical records to identify which patients infected with SARS-CoV-2 have a high likelihood of developing PASC/Long COVID. This Challenge invites solutions that explore the probability of developing PASC/Long COVID among patients who have tested positive for SARS-CoV-2 in an outpatient or inpatient (ICU or non-ICU) setting. Models will be evaluated against patients who have the ICD-10 code U09.9 recorded in the dataset; that code serves as the true-positive label for PASC/Long COVID.
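The evaluation idea above, comparing model output with the U09.9 label, can be sketched as follows. This is a minimal illustration only: the Challenge text here does not name a scoring metric, so the 0.5 threshold, the function names, and the use of recall and precision are all assumptions made for the example.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count (tp, fp, fn, tn) for predicted probabilities against 0/1 labels.

    y_true: 1 if the patient has U09.9 recorded, else 0.
    y_prob: model-predicted probability of developing PASC/Long COVID.
    threshold: hypothetical decision cut-off, not specified by the Challenge.
    """
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_and_precision(tp, fp, fn):
    """Return (recall, precision); each is 0.0 when its denominator is zero."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy data, invented purely for illustration.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.8]
tp, fp, fn, tn = confusion_counts(labels, scores)
recall, precision = recall_and_precision(tp, fp, fn)
```

Because some patients without the code may still have undiagnosed PASC, counts of "false positives" computed this way should be read with caution.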
Depending on the success of the Challenge, unmet needs, and availability of funds, there is a possibility that NIH might announce future challenges or other funding opportunities to further develop these models and algorithms into clinical decision support tools for evaluating susceptibility to PASC/Long COVID as well as understanding the risk of specific clinical outcomes and the effectiveness of early interventions in preventing PASC/Long COVID.
The working definition of PASC/Long COVID continues to change with new evidence and is difficult to pin down given the multi-organ effects seen in this new disease (see PASC Fact Sheet (recovercovid.org)). An ICD-10 code, U09.9, was released for clinical use on October 1, 2021, and gives the medical community a starting point for classifying suspected PASC/Long COVID cases in a structured manner. While many patients in the National COVID Cohort Collaborative (N3C) Data Enclave are already identified by this code for reimbursement purposes, other N3C patients may have undiagnosed PASC/Long COVID and therefore lack the code. For this Challenge, the U09.9 ICD-10 code will be used to identify patients with PASC/Long COVID for model-training purposes. Top features that could be important predictors of developing PASC/Long COVID but have not previously been identified in the scientific literature will be considered as part of the qualitative metrics.
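As a rough illustration of the labeling rule described above, the sketch below assigns a binary training label to each SARS-CoV-2-positive patient based on whether a U09.9 code appears among their condition records. The row layout borrows the `person_id` and `condition_source_value` field names from the OMOP `condition_occurrence` table; the function name, the sample rows, and the assumption that the raw ICD-10 code is stored in `condition_source_value` are illustrative and may not match how the N3C Data Enclave actually exposes the data.

```python
PASC_ICD10_CODE = "U09.9"


def label_patients(covid_positive_ids, condition_rows):
    """Map each SARS-CoV-2-positive person_id to 1 (U09.9 recorded) or 0.

    condition_rows: iterable of dicts with "person_id" and
    "condition_source_value" keys (hypothetical in-memory layout).
    """
    coded = {
        row["person_id"]
        for row in condition_rows
        if (row.get("condition_source_value") or "").strip().upper() == PASC_ICD10_CODE
    }
    # A label of 0 means "no U09.9 recorded", not "confirmed free of PASC".
    return {pid: int(pid in coded) for pid in covid_positive_ids}


# Toy rows, invented purely for illustration.
rows = [
    {"person_id": 101, "condition_source_value": "U09.9"},
    {"person_id": 102, "condition_source_value": "J45.909"},
    {"person_id": 103, "condition_source_value": " u09.9 "},
    {"person_id": 104, "condition_source_value": None},
]
labels = label_patients([101, 102, 103, 104, 105], rows)
```

Patients labeled 0 here simply have no recorded code, which is why the text treats U09.9 as a starting point and not as a complete case definition.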
Challenge participants are expected to develop, train, and test models that help predict susceptibility to, and the likelihood of developing, PASC/Long COVID in patients with SARS-CoV-2 infection. Participants must use the de-identified electronic health record data available through the N3C Data Enclave of the National Center for Advancing Translational Sciences (NCATS). The Enclave is a central, harmonized data repository that represents electronic health records from over 74 health centers across the U.S. To protect patient privacy, the de-identified data provide information useful to researchers without revealing anything that could identify individual patients. The N3C Data Enclave uses the Observational Medical Outcomes Partnership (OMOP) common data model version 5.3 (https://ohdsi.github.io/CommonDataModel/cdm53.html), which facilitates reproducibility and interoperability.