In silico prediction of high-resolution Hi-C interaction matrices
Shilu Zhang, Deborah Chasman, Sara Knaack, and Sushmita Roy
Received Date: 31st August 2018
The three-dimensional organization of the genome plays an important role in gene regulation by en- abling distal sequence elements to control the expression level of genes hundreds of kilobases away. Hi-C is a powerful genome-wide technique to measure the contact count of pairs of genomic loci needed to study three-dimensional organization. Due to experimental costs high resolution Hi-C datasets are available only for a handful of cell lines. Computational prediction of Hi-C contact counts can offer a scalable and inexpensive approach to examine three-dimensional genome organization across many cellular contexts. Here we present HiC-Reg, a novel approach to predict contact counts from one-dimensional regulatory signals such as epigenetic marks and regulatory protein binding. HiC-Reg exploits the signal from the region spanning two interacting regions and from across multiple cell lines to generalize to new contexts. Using existing feature importance measures and a new matrix factorization based approach, we found CTCF and chromatin marks, especially repressive and elongation marks, as important for pre- dictive performance. Predicted counts from HiC-Reg identify topologically associated domains as well as significant interactions that are enriched for CTCF bi-directional motifs and agree well with interac- tions identified from complementary long-range interaction assays. Taken together, HiC-Reg provides a powerful framework to generate high-resolution profiles of contact counts that can be used to study individual locus level interactions as well as higher-order organizational units of the genome.
Read in full at bioRxiv.
This is an abstract of a preprint hosted on an independent third party site. It has not been peer reviewed but is currently under consideration at Nature Communications.