The Bench to Bassinet (B2B) Pediatric Cardiac Genomics Consortium (PCGC) and Cardiovascular Development Data Resource Center (CDDRC) are establishing a Challenge Prize Program that will use PCGC-generated datasets available through the NHLBI BioData Catalyst ecosystem and Cardiovascular Development Consortium (CvDC)-generated datasets available through the CDDRC platform. The purpose of the Challenge Prize Program is to promote the application of computational analysis and machine learning approaches to hypothesis generation and testing, and to research tool development, in congenital heart disease research.
Congenital heart disease (CHD) affects approximately 40,000 infants in the United States each year and is one of the leading causes of infant mortality. Improvements in medical and surgical management over the past 20 years have produced a growing number of adults living with CHD. It is estimated that as many as 2 million adults and 800,000 children in the US are living with CHD. Outcomes have improved significantly for some patients with CHD, but mortality and morbidity for certain lesions remain unacceptably high. Many genetic contributions to the pathogenesis of CHD have been well established through animal models and human studies. However, the influence of genetics on clinical outcomes remains largely unexplored, and finding new therapeutic targets or novel approaches to prevention and risk stratification requires a deeper understanding of CHD genetics. Major barriers to genetic and genomic studies in CHD include the heterogeneity of the conditions, the complexity of the molecular pathways and networks involved, the small number of patients with any particular malformation at a single institution, the large number of possible causative genes, and the low frequency of causative variants.
Since its creation in 2009, the PCGC has explored the genetic underpinnings of CHD and has accumulated the largest collection to date of data and DNA from patients with CHD (>12,000 probands enrolled). The PCGC has determined the etiology, through various genetic mechanisms, of approximately 25% of previously unexplained CHD cases and has identified many new CHD-associated genes. Much of the phenotypic and genotypic data collected by the PCGC on this cohort is already available in dbGaP (phs001194 and its sub-studies, phs001735, and phs001138), and the consortium is committed to releasing all data through dbGaP and the NHLBI BioData Catalyst ecosystem. Available sequence data include whole exome sequencing (WES), whole genome sequencing (WGS), and molecular inversion probe sequencing, as well as genome-wide single nucleotide polymorphism (SNP) array data. The phenotypic data for this cross-sectional cohort were collected using electronic case report forms (eCRFs) and include detailed descriptions of cardiac lesions and extracardiac phenotypes, demographics, birth history, and other variables.
The CDDRC multi-omics datasets include RNA-seq, ChIP-seq, single-cell RNA-seq, Hi-C, WES, and WGS data from multiple organisms (mouse, zebrafish, human, and others), comprising hundreds of experiments and thousands of FASTQ and BAM files. Legacy data from the CvDC, as well as new incoming datasets, will be available. The CDDRC Mosaic platform allows analytic and visualization tools to be pointed at external data sources, with the long-term goal of integrating this cloud-compatible platform with the NHLBI BioData Catalyst ecosystem.
This challenge aims to reward innovative computational Challenge Solutions that use the large genomic and phenotypic datasets obtained by the PCGC and CDDRC to advance CHD research. Successful Challenge Solutions will be made freely and openly available to the research community. In addition to PCGC and CDDRC data, participants are strongly encouraged to take advantage of other NHLBI-funded datasets when developing their Challenge Solutions and are also welcome to bring other relevant data to their analyses.
Examples of potential Challenge Solutions include, but are not limited to:
· Develop an AI approach to identify causative variants of CHD and/or predict outcomes
· Develop and validate a strategy to harmonize non-PCGC datasets with PCGC datasets
· Develop polygenic risk scores for CHD
· Characterize phenotypic variation within a class of CHD variants
· Identify non-coding variants associated with CHD outcomes
· Build a pipeline that uses CDDRC data to validate human variants
· Adapt existing RNA-seq (bulk and single-cell) analytic and visualization tools to run on the CDDRC Mosaic platform
· Develop and incorporate tools for RNA structural and expression analyses
· Develop and incorporate tools for Hi-C data analysis
· Perform cross-species comparative analyses and visualization