Sequence-level instructions direct transcription at polyT short tandem repeats

Chloe Bessiere, Manu Saraswat, Mathys Grapotte, Christophe Menichelli, Jordan A. Ramilowski, Jessica Severin, Yoshihide Hayashizaki, Masayoshi Itoh, Akira Hasegawa, Harukazu Suzuki, Piero Carninci, Michiel J.L. de Hoon, Wyeth W. Wasserman, Laurent Brehelin & Charles-Henri Lecellier

Jun 04, 2019

Received Date: 20 May 2019

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. To determine whether these unconventional TSSs, sometimes referred to as 'transcriptional noise' or 'junk', are relevant nonetheless, we look for novel and conserved regulatory motifs located in their vicinity. We show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines. Biochemical and genetic evidence further demonstrate that several of these CAGE peaks correspond to TSSs of mostly sense and intronic non-coding RNAs, whose transcription rate can be predicted with ~81% accuracy by a sequence-based deep learning model. Excitingly, our model further predicts that genetic variants linked to human diseases affect this STR-associated transcription. Together, our results extend the repertoire of non-coding transcription and provide a valuable resource for future studies of complex traits.

Read in full at bioRxiv.

This is an abstract of a preprint hosted on an independent third party site. It has not been peer reviewed but is currently under consideration at Nature Communications.

Nature Communications

Nature Research, Springer Nature