Abstract:
We present a method to detect copy number variants (CNVs) that are differentially present between two groups of sequenced samples. We use a multi-tape finite-state transducer where the emitted read depth is conditioned on the mappability and GC-content of all reads that could cover a given base position. In this model, the read depth within a region is a mixture of binomials, which in simulations matches the read depth more closely than the often-used negative binomial distribution. The method analyzes all samples simultaneously, preserving uncertainty as to the breakpoints and magnitude of CNVs present in an individual when it identifies CNVs differentially present between the two groups. This unified approach outperforms alternative methods that execute these tasks serially, first identifying copy number variants in individuals and then identifying which variants are consistently correlated with a trait of interest. We apply this transducer method to identify CNVs that are recurrently associated with postglacial adaptation of marine Threespine Stickleback (Gasterosteus aculeatus) to freshwater. We identify 6664 regions of the stickleback genome, totaling 1.7 Mbp, which show consistent copy number differences between multiple marine and freshwater populations. These deletions and duplications affect both protein-coding genes and cis-regulatory elements, including a noncoding intronic telencephalon enhancer of DCHS1. The functions of the genes near or included within the 6664 CNVs are enriched for immunity and muscle development, as well as head and limb morphology. These functions match consistent phenotypic differences that have evolved repeatedly between marine and freshwater stickleback populations.
While freshwater stickleback have been iteratively derived from ancestral marine populations that are thought to have been relatively static, we show that freshwater stickleback populations can also act as reservoirs for ancient sequences that are conserved to other teleosts, but largely missing from marine stickleback due to recent selective sweeps in marine populations.
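
As a rough illustration of the mixture-of-binomials idea mentioned in the abstract, the sketch below builds a small mixture of binomial distributions and compares it with a single binomial of the same mean. It is not the transducer model itself. The number of candidate reads `n`, the component weights, and the per-read sampling probabilities are made-up values standing in for classes of reads with different GC-content and mappability. The point is only that mixing binomials with different success probabilities yields more variance than one binomial can express, which is the kind of overdispersion usually handled with a negative binomial.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mixture_pmf(k, n, weights, probs):
    """Weighted sum of binomial pmfs that share n but differ in p."""
    return sum(w * binom_pmf(k, n, p) for w, p in zip(weights, probs))


def mean_var(pmf_values):
    """Mean and variance of a distribution given as a list indexed by count."""
    mean = sum(k * q for k, q in enumerate(pmf_values))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf_values))
    return mean, var


# Hypothetical values, chosen only for illustration.
n = 100                       # candidate reads that could cover a position
weights = [0.5, 0.3, 0.2]     # share of positions in each read class
probs = [0.10, 0.25, 0.40]    # per-read sampling probability in each class

mix = [mixture_pmf(k, n, weights, probs) for k in range(n + 1)]

# Single binomial matched to the mixture's mean, for comparison.
p_bar = sum(w * p for w, p in zip(weights, probs))
single = [binom_pmf(k, n, p_bar) for k in range(n + 1)]

mix_mean, mix_var = mean_var(mix)
single_mean, single_var = mean_var(single)

print(f"mixture:  mean={mix_mean:.3f}  variance={mix_var:.3f}")
print(f"binomial: mean={single_mean:.3f}  variance={single_var:.3f}")
```

Both distributions have the same mean by construction, while the mixture's variance is larger by n(n-1) times the variance of the component probabilities.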