EM-mosaic detects mosaic point mutations that contribute to congenital heart disease


Hsieh A, Morton SU, Willcox JAL, Gorham JM, Tai AC, Qi H, DePalma S, McKean D, Griffin E, Manheimer KB, Bernstein D, Kim RW, Newburger JW, Porter GA Jr, Srivastava D, Tristani-Firouzi M, Brueckner M, Lifton RP, Goldmuntz E, Gelb BD, Chung WK, Seidman CE, Seidman JG, Shen Y.
Genome Med. 2020 Apr 29;12(1):42. doi: 10.1186/s13073-020-00738-1.
PMID: 32349777



Background: The contribution of somatic mosaicism, or genetic mutations arising after oocyte fertilization, to congenital heart disease (CHD) is not well understood. Further, the relationship between mosaicism in blood and cardiovascular tissue has not been determined.

Methods: We developed a new computational method, EM-mosaic (Expectation-Maximization-based detection of mosaicism), to analyze mosaicism in exome sequences derived primarily from blood DNA of 2530 CHD proband-parent trios. To optimize this method, we measured mosaic detection power as a function of sequencing depth. In parallel, we analyzed our cohort using MosaicHunter, a Bayesian genotyping algorithm-based mosaic detection tool, and compared the two methods. The accuracy of these mosaic variant detection algorithms was assessed using an independent resequencing method. We then applied both methods to detect mosaicism in cardiac tissue-derived exome sequences of 66 participants for whom matched blood and heart tissue were available.

Results: EM-mosaic detected 326 mosaic mutations in blood and/or cardiac tissue DNA. Of the 309 detected in blood DNA, 85 of the 97 tested (88%) were independently confirmed, while 7 of the 17 candidates detected in cardiac tissue (41%) were confirmed. MosaicHunter detected an additional 64 mosaics; 23 of the 46 tested (50%) among 58 candidates from blood and 4 of the 6 candidates (67%) from cardiac tissue were confirmed. Twenty-five mosaic variants altered CHD-risk genes, affecting 1% of our cohort. Of these 25, all 22 candidates tested were confirmed. Variants predicted as damaging had higher variant allele fraction than benign variants, suggesting a role in CHD. The estimated true frequency of mosaic variants above 10% mosaicism was 0.14/person in blood and 0.21/person in cardiac tissue. Analysis of 66 individuals with matched cardiac tissue available revealed both tissue-specific and shared mosaicism, with shared mosaics generally having higher allele fraction.
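The percentages quoted in the Results can be rechecked from the counts given there. The short sketch below does only that arithmetic. Two readings are assumptions rather than statements from the abstract: that the "~5%" cardiac contribution refers to 17 of the 326 EM-mosaic calls, and that each of the 25 CHD-risk-gene variants sits in a different proband.

```python
# Counts copied from the Results paragraph: (confirmed, tested).
confirmation = {
    "EM-mosaic, blood": (85, 97),
    "EM-mosaic, cardiac tissue": (7, 17),
    "MosaicHunter, blood": (23, 46),
    "MosaicHunter, cardiac tissue": (4, 6),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to an integer."""
    return round(100 * numerator / denominator)


rates = {label: percent(c, t) for label, (c, t) in confirmation.items()}

# Assumed reading: cardiac-tissue share of all EM-mosaic calls (17 of 326).
cardiac_share = 100 * 17 / 326

# Assumed reading: one CHD-risk-gene mosaic per proband (25 of 2530 trios).
chd_gene_share = 100 * 25 / 2530

for label, rate in rates.items():
    print(f"{label}: {rate}%")
print(f"cardiac share of EM-mosaic calls: {cardiac_share:.1f}%")
print(f"probands with a CHD-risk-gene mosaic: {chd_gene_share:.2f}%")
```

Under these readings the rounded values agree with the 88%, 41%, 50%, and 67% confirmation rates and with the approximate 5% and 1% figures.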

Conclusions: We estimate that ~1% of CHD probands have a mosaic variant detectable in blood that could contribute to cardiac malformations, particularly those damaging variants with relatively higher allele fraction. Although blood is a readily available DNA source, the cardiac tissues analyzed contributed ~5% of the somatic mosaic variants identified, indicating the value of tissue mosaicism analyses.

Conflict of interest statement

The authors declare that they have no competing interests.

Fig. 1 Mosaic detection pipeline flowchart. Summary of approach for detecting mosaic variants in our cohort of n = 2530 CHD proband-parent trios. EM-mosaic flowchart (left). We first processed our SAMtools de novo calls using our upstream filters (n = 2396 sites passing all filters). We then applied the same upstream filters to the published dnSNVs from Jin et al. (n = 2650 sites passing all filters) before finally taking the union of these two call sets (n = 3192). High-confidence mosaics (n = 309) were defined as mosaics passing IGV inspection and having posterior odds > 10. Italicized text indicates which filters removed candidate mosaic variants called by MosaicHunter but not by EM-mosaic. MosaicHunter workflow (right). Quality control filters excluded any sites that were (1) present in ExAC, (2) G>T with Nalt < 10, or (3) parent Nalt > 2. Outliers were defined as probands carrying more than 20 mosaics, or non-unique sites. We also removed sites called as germline by GATK Haplotype Caller. High-confidence mosaics (n = 116) were defined as having a likelihood ratio > 80 and affecting coding regions excluding MUC/HLA genes. Italicized text indicates which filters removed variants called by EM-mosaic but not by MosaicHunter.
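The MosaicHunter-side filters named in this caption can be written as a few predicates. The following is a minimal sketch, not the authors' pipeline: the `Candidate` record and all of its field names are hypothetical, "parent Nalt" is assumed to mean the larger alternate-read count of the two parents, MUC/HLA genes are assumed to be recognizable by gene-symbol prefix, and the order of the steps is a guess. The non-unique-site and GATK germline filters are left out because the caption does not give enough detail to sketch them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate mosaic site."""
    proband_id: str
    gene: str
    ref: str
    alt: str
    n_alt: int             # alternate-allele reads in the proband
    parent_n_alt: int      # assumed: larger alt-read count of the two parents
    in_exac: bool
    is_coding: bool
    likelihood_ratio: float


def passes_qc(c):
    """Apply the three quality-control exclusions listed in the caption."""
    if c.in_exac:
        return False
    if c.ref == "G" and c.alt == "T" and c.n_alt < 10:
        return False
    if c.parent_n_alt > 2:
        return False
    return True


def is_high_confidence(c):
    """Likelihood ratio > 80, coding, and not in a MUC/HLA gene (prefix match assumed)."""
    excluded_gene = c.gene.startswith(("MUC", "HLA"))
    return c.likelihood_ratio > 80 and c.is_coding and not excluded_gene


def select_high_confidence(candidates, max_per_proband=20):
    """Run QC, drop outlier probands (more than 20 mosaics), then keep high-confidence sites."""
    kept = [c for c in candidates if passes_qc(c)]
    per_proband = Counter(c.proband_id for c in kept)
    kept = [c for c in kept if per_proband[c.proband_id] <= max_per_proband]
    return [c for c in kept if is_high_confidence(c)]
```

Keeping each rule in its own small function makes it easy to report which filter removed a given site, which is the information the italicized text in the flowchart conveys.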

Fig. 2 Mosaic detection by Expectation-Maximization. a Expectation-Maximization (EM) estimation to decompose the variant allele fraction (VAF) distribution of our input variants into mosaic and germline distributions. The EM-estimated prior mosaic fraction was 12.15% and the mean of the mosaic VAF distribution was 0.15. b Read depth vs. VAF distribution of individual variants. The blue line denotes mean VAF (0.49) and the red lines denote the 95% confidence interval under our Beta-Binomial model. Mosaic variants are defined as sites with posterior odds > 10, corresponding to a false discovery rate of 9.1%. Germline variants are represented in black and mosaic variants are represented in red. c Estimated mosaic detection power as a function of average sample depth for values between 40× and 500×
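To make the decomposition in panel a concrete, here is a minimal Expectation-Maximization sketch for a two-component mixture of read counts. It is an illustration under stated assumptions, not the EM-mosaic implementation: it swaps the Beta-Binomial model named in the caption for a plain binomial to stay short, the function names and starting values are invented, and the read counts in `toy_sites` are made-up illustrative numbers. The last lines show one way to read the caption's threshold: a site with posterior odds of exactly 10 has a 1/(1 + 10) posterior probability of being germline, which is about 9.1%.

```python
import math


def log_posterior_odds(alt, depth, pi, p_mosaic, p_germline):
    """Log of P(mosaic | reads) / P(germline | reads) under a binomial read model.

    The binomial coefficient is the same in both terms and cancels.
    """
    log_m = (math.log(pi) + alt * math.log(p_mosaic)
             + (depth - alt) * math.log(1 - p_mosaic))
    log_g = (math.log(1 - pi) + alt * math.log(p_germline)
             + (depth - alt) * math.log(1 - p_germline))
    return log_m - log_g


def sigmoid(x):
    """Numerically safe logistic function: log odds -> probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_em(sites, pi=0.1, p_mosaic=0.2, p_germline=0.5, n_iter=50):
    """Estimate the mosaic fraction and the two mean VAFs from (alt, depth) pairs."""
    eps = 1e-9

    def clamp(v):
        return min(max(v, eps), 1 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each site is mosaic.
        resp = [sigmoid(log_posterior_odds(a, d, pi, p_mosaic, p_germline))
                for a, d in sites]
        # M-step: re-estimate the mixing fraction and the component VAFs.
        pi = clamp(sum(resp) / len(sites))
        m_alt = sum(r * a for r, (a, d) in zip(resp, sites))
        m_depth = sum(r * d for r, (a, d) in zip(resp, sites))
        g_alt = sum((1 - r) * a for r, (a, d) in zip(resp, sites))
        g_depth = sum((1 - r) * d for r, (a, d) in zip(resp, sites))
        p_mosaic = clamp(m_alt / max(m_depth, eps))
        p_germline = clamp(g_alt / max(g_depth, eps))
    return pi, p_mosaic, p_germline


# Made-up (alt_reads, depth) pairs: eight germline-like, then three mosaic-like.
toy_sites = [(50, 100), (48, 100), (52, 100), (45, 90), (55, 110), (49, 100),
             (51, 100), (47, 95), (15, 100), (12, 100), (18, 110)]

pi_hat, p_mos_hat, p_germ_hat = fit_em(toy_sites)

ODDS_THRESHOLD = 10
calls = [log_posterior_odds(a, d, pi_hat, p_mos_hat, p_germ_hat)
         > math.log(ODDS_THRESHOLD) for a, d in toy_sites]

# Worst-case posterior probability of germline for a site called at the threshold.
fdr_bound = 1 / (1 + ODDS_THRESHOLD)
```

Working in log odds avoids dividing by a posterior that has rounded to zero at high depth, where the two components are very well separated.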

Fig. 3 Mutation spectrum of detected germline and mosaic variants. Rates of specific mutations were compared in a germline, b blood mosaic, and c cardiac tissue mosaic variants. Transitions predominated in both variant sets

Fig. 4 Validated mosaics detected in probands with matched blood and cardiovascular tissue samples available. Validation VAF from blood compared to validation VAF from cardiovascular tissue demonstrated tissue-specific mosaicism (red) as well as shared mosaicism (blue). Predicted effect of mosaic variants corresponds to marker shape

Fig. 5 Damaging mosaics in CHD-related genes have higher variant allele fraction than likely benign mosaics. a Among the 76 mosaics in CHD-related genes, likely damaging variants have a higher VAF than likely benign (Mann-Whitney U p = 0.001). b Among the 233 mosaics in other (non-CHD-related) genes, there is no difference in VAF based on predicted effect (p = 0.985)
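The comparison in this figure uses the Mann-Whitney U test, which ranks values rather than assuming a distribution for VAF. As a rough sketch of what the statistic measures, the code below counts, over all pairs, how often a value from one group exceeds a value from the other. The VAF lists are made-up illustrative numbers, not data from the study, and the sketch computes only U and a pairwise effect size, not the p value reported in the caption.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs where x exceeds y, ties counted as half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical VAFs, for illustration only.
damaging_vaf = [0.30, 0.25, 0.22, 0.18]
benign_vaf = [0.10, 0.12, 0.20, 0.08]

u_stat = mann_whitney_u(damaging_vaf, benign_vaf)

# Fraction of (damaging, benign) pairs in which the damaging VAF is higher.
effect_size = u_stat / (len(damaging_vaf) * len(benign_vaf))
```

An effect size near 0.5 would mean the two groups are interleaved, as panel b reports for non-CHD-related genes; values near 1 mean the first group is consistently higher.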


source: https://pubmed.ncbi.nlm.nih.gov/32349777