A Diploid Assembly-based Benchmark for Variants in the Major Histocompatibility Complex
Chen-Shan Chin, Justin Wagner, Qiandong Zeng, Erik Garrison, Shilpa Garg, Arkarachai Fungtammasan, Mikko Rautiainen, Tobias Marschall, Alexander T. Dilthey, Justin M. Zook
Received Date: 21st December 2019
We develop the first human genome benchmark derived from a diploid assembly for the openly consented Genome in a Bottle/Personal Genome Project Ashkenazi son (HG002). As a proof-of-principle, we focus on a medically important, highly variable, 5 million base-pair region, the Major Histocompatibility Complex (MHC). Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct base-level accurate, phased de novo assemblies from the reads. We assemble a single haplotig (haplotype-specific contig) for each haplotype, and align reads back to each assembled haplotig to identify two regions of lower confidence. We align the haplotigs to the reference, call phased small and structural variants, and define the first small variant benchmark for the MHC, covering 21,496 small variants in 4.58 million base-pairs (92% of the MHC). The assembly-based benchmark is 99.95% concordant with a draft mapping-based benchmark from the same long and linked reads within both benchmark regions, but covers 50% more variants outside the mapping-based benchmark regions. The haplotigs and variant calls are completely concordant with phased clinical HLA types for HG002. This benchmark reliably identifies false positives and false negatives from mapping-based callsets, and enables performance assessment in regions with much denser, more complex variation than regions covered by previous benchmarks. These methods demonstrate a path towards future diploid assembly-based benchmarks for other complex regions of the genome.
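To illustrate how a benchmark of this kind can identify false positives and false negatives in a callset, the following is a minimal sketch, not the authors' pipeline. It assumes variants are compared by exact match on position and alleles, and that only variants inside the benchmark regions are counted. The names `Variant`, `in_regions`, and `compare`, and all coordinates and alleles, are hypothetical and chosen for illustration. Real comparisons also need to reconcile different representations of the same variant, which this sketch does not attempt.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def in_regions(variant, regions):
    """Return True if the variant position falls in any benchmark region.

    Regions are (chrom, start, end) tuples in 0-based, half-open
    BED-style coordinates, so 1-based position p is inside when
    start < p <= end.
    """
    return any(
        chrom == variant.chrom and start < variant.pos <= end
        for chrom, start, end in regions
    )


def compare(benchmark, query, regions):
    """Count TP, FP, and FN by exact match, restricted to benchmark regions."""
    truth = {v for v in benchmark if in_regions(v, regions)}
    calls = {v for v in query if in_regions(v, regions)}
    tp = len(truth & calls)   # called and present in the benchmark
    fp = len(calls - truth)   # called but absent from the benchmark
    fn = len(truth - calls)   # in the benchmark but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy data (hypothetical): one benchmark region on chr6.
regions = [("chr6", 100, 200)]
benchmark = [
    Variant("chr6", 150, "A", "G"),
    Variant("chr6", 180, "C", "T"),
    Variant("chr6", 250, "G", "A"),   # outside the region, ignored
]
query = [
    Variant("chr6", 150, "A", "G"),   # matches the benchmark
    Variant("chr6", 160, "T", "C"),   # not in the benchmark
    Variant("chr6", 250, "G", "A"),   # outside the region, ignored
]

result = compare(benchmark, query, regions)
print(result)
```

Restricting both sets to the benchmark regions is the key design choice here: calls outside those regions are neither rewarded nor penalized, because the benchmark makes no claim about them.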
Read in full at bioRxiv.
This is an abstract of a preprint hosted on an independent third-party site. It has not been peer reviewed but is currently under consideration at Nature Communications.