  Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity

Jaegle, B., Pisupati, H., Soto-Jiménez, L., Burns, R., Rabanal, F., & Nordborg, M. (2023). Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity. Genome Biology, 24(1): 44. doi:10.1186/s13059-023-02875-3.

 Creators:
Jaegle, B, Author
Pisupati, H, Author
Soto-Jiménez, LM, Author
Burns, R, Author
Rabanal, FA [1], Author
Nordborg, M, Author
Affiliations:
[1] Department Molecular Biology, Max Planck Institute for Developmental Biology, Max Planck Society, ou_3375790

Content

Free keywords: -
 Abstract: Background: It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesized that these SNPs reflect cryptic copy number variation.
Results: The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the position of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples include an annotated gene and a nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism.
Conclusions: Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, and the realization that neither gene- nor transposon-annotation necessarily tells us what is actually mobile in the genome, suggest that future analyses based on independently assembled genomes will be very informative.

Details

Language(s):
 Dates: 2023-03
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.1186/s13059-023-02875-3
PMID: 36895055
 Degree: -

Source 1

Title: Genome Biology
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: London : BioMed Central Ltd.
Pages: 19
Volume / Issue: 24 (1)
Sequence Number: 44
Start / End Page: -
Identifier: ISSN: 1465-6906
CoNE: https://pure.mpg.de/cone/journals/resource/1000000000224390_1