


Poster

Multiple reference genomes and transcriptomes for Arabidopsis thaliana

MPS-Authors

Stegle, O
Department Molecular Biology, Max Planck Institute for Developmental Biology, Max Planck Society;

Citation

Drewe, P., Stegle, O., Behr, J., Clark, R., Rätsch, G., & Mott, R. (2011). Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Poster presented at 4th Berlin Summer Meeting: “Computational & Experimental Molecular Biology Meet”, Berlin, Germany.


Cite as: https://hdl.handle.net/21.11116/0000-000C-292D-5
Abstract
We have sequenced the genomes of 18 inbred accessions of Arabidopsis thaliana at ~40x coverage using paired-end Illumina sequencing with different insert sizes. We developed an assembly pipeline that uses iterative read mapping and de novo assembly to accurately recover genome sequences, with an error rate close to 1 in 10 kb in single-copy regions of the genome and 1 in 1 kb in repetitive or transposon-rich loci, as assessed with independent data. Naive projection of the coordinates of the 27,416 protein-coding genes in the reference annotation onto the 18 genomes predicted large-effect disruptions in 8,652 (32%), suggesting that A. thaliana can survive prevalent gene disruptions. We developed a pipeline for de novo annotation combining computational gene prediction and RNA-seq data from plant seedlings. We re-annotated each genome and found that, whilst there is considerable variation in gene structure, compensating changes help to ensure that many altered transcripts still retain function. In total, 8,757 genes had at least one additional or modified transcript in at least one accession. We also investigated transcript diversity in relation to the variation in their 40,578 inferred protein sequences, finding 3,840 (9.5%) proteins with less than 50% amino-acid sequence identity to the corresponding TAIR10 proteins. Protein diversity varied across gene models, and isoforms with severe disruptions occurred at low frequency. To complement the genotype-focused analysis, we investigated quantitative transcriptome variation using RNA-seq. We found 20,963 (78%) of all protein-coding genes to be expressed in at least one strain, with 9,360 (45%) exhibiting significant variation between strains. To map causal variants affecting gene expression, we identified variants associated with expression polymorphisms near 941 (10%) of the differentially expressed genes.
These candidate cis-eQTLs are tightly mapped, and analysis of the location of eQTLs relative to local gene models revealed an excess of associations in regulatory regions.
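The abstract does not describe how the reference annotation was projected onto the new genomes. The following is a purely illustrative sketch, not the authors' pipeline. A naive projection can shift each reference coordinate by the net length of the indels upstream of it, then flag coding sequences whose projected length differs from the reference by a non-multiple of three (a frameshift, one kind of large-effect disruption). The indel list format, the function names `project` and `is_frameshift`, and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def project(pos, indels):
    """Map a 0-based reference coordinate onto an accession (hypothetical helper).

    `indels` is a sorted list of (reference_position, length_change) pairs:
    positive for insertions, negative for deletions. The coordinate is shifted
    by the net length change of every indel starting before `pos`. Coordinates
    falling inside deletions get no special handling, hence "naive".
    """
    starts = [p for p, _ in indels]
    offsets = [0, *accumulate(d for _, d in indels)]  # cumulative net shift
    return pos + offsets[bisect_left(starts, pos)]


def is_frameshift(cds_exons, indels):
    """True if the projected coding length changes by a non-multiple of 3.

    `cds_exons` is a list of half-open (start, end) reference intervals.
    Only frameshifts are detected; premature stop codons, lost start codons
    and splice-site changes are ignored in this sketch.
    """
    ref_len = sum(end - start for start, end in cds_exons)
    new_len = sum(project(end, indels) - project(start, indels)
                  for start, end in cds_exons)
    return (new_len - ref_len) % 3 != 0


if __name__ == "__main__":
    indels = [(100, -2), (250, 3), (400, 1)]  # hypothetical variants
    cds = [(50, 150), (200, 300)]             # hypothetical two-exon CDS
    print(project(150, indels))        # 148
    print(is_frameshift(cds, indels))  # True: net change is +1 bp
```

In this toy case the 2 bp deletion in the first exon and the 3 bp insertion in the second give a net +1 bp change, so the projected CDS is out of frame. An in-frame 3 bp insertion on its own would not be flagged.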