Released

Journal Article

Marker discovery in the large

MPS-Authors
Mourato,  Beatriz Vieira
IMPRS for Evolutionary Biology, Max Planck Institute for Evolutionary Biology, Max Planck Society;
Research Group Bioinformatics (Haubold), Max Planck Institute for Evolutionary Biology, Max Planck Society;

Tsers,  Ivan       
Research Group Bioinformatics (Haubold), Max Planck Institute for Evolutionary Biology, Max Planck Society;

Haubold,  Bernhard
Research Group Bioinformatics (Haubold), Max Planck Institute for Evolutionary Biology, Max Planck Society;

Citation

Mourato, B. V., Tsers, I., Denker, S., Klötzl, F., & Haubold, B. (submitted). Marker discovery in the large.


Cite as: https://hdl.handle.net/21.11116/0000-000E-751E-E
Abstract
Motivation: PCR markers are routinely constructed by taking the regions common to the genomes of a target organism and subtracting the regions found in the targets’ closest relatives, their neighbors. This approach is implemented in the package Fur, which quickly finds all regions common to a set of target genomes that are absent from their neighbors. The original Fur required memory proportional to the total size of the neighborhood, which does not scale well.
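
The idea of subtracting neighbor regions from regions common to all targets can be illustrated with a toy sketch. This is not Fur's algorithm: it works on short exact substrings (k-mers) rather than genomic regions, and the function names, the choice of k, and the example sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_markers(targets, neighbors, k):
    """Return k-mers found in every target and in no neighbor.

    Toy stand-in for "regions common to the targets minus regions
    found in the neighbors"; hypothetical helper, not part of Fur.
    """
    if not targets:
        raise ValueError("at least one target sequence is required")
    # Step 1: keep only what all targets share.
    common = set.intersection(*(kmers(t, k) for t in targets))
    # Step 2: subtract whatever also occurs in a neighbor,
    # handling the neighbors one at a time.
    for neighbor in neighbors:
        common -= kmers(neighbor, k)
    return common


if __name__ == "__main__":
    # Made-up sequences, chosen only to keep the example small.
    targets = ["ACGTTGCA", "TACGTTGC"]
    neighbors = ["ACGTAAAA", "GGGTTGCC"]
    print(sorted(candidate_markers(targets, neighbors, 4)))  # ['CGTT']
```

In this example the targets share four 4-mers, three of which also occur in a neighbor, so a single candidate remains.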

Results: Our new version of Fur only requires memory proportional to the longest neighbor. In spite of its greater memory efficiency, the new Fur remains fast and is accurate. We demonstrate this through application to simulated sequences and comparison to an efficient alternative. Then we use the new Fur to extract markers from 118 reference bacteria, some of which have hundreds of sequenced target genomes and over 1000 neighbor genomes. We pick the best primers from the ten most sequenced taxa and show their excellent in silico sensitivity and specificity. In six taxa the marker amplicons intersect genes encoding a “hypothetical protein”, underscoring the importance of hypothesis-free marker discovery from whole genome sequences. To make this feasible, we also introduce software for automatically finding target and neighbor genomes and for assessing markers.
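
As a rough illustration of how in silico sensitivity and specificity might be computed for a primer pair, the sketch below assumes the standard definitions: sensitivity is the fraction of target genomes that are amplified, and specificity is the fraction of neighbor genomes that are not. The abstract does not state the exact definitions or the assessment software's interface, so the function name and the example values are hypothetical.

```python
def sensitivity_specificity(target_hits, neighbor_hits):
    """Return (sensitivity, specificity) for one primer pair.

    target_hits[i] is True if target genome i is amplified in silico;
    neighbor_hits[j] is True if neighbor genome j is amplified.
    Assumes the standard definitions; hypothetical helper.
    """
    if not target_hits or not neighbor_hits:
        raise ValueError("both genome sets must be non-empty")
    true_pos = sum(target_hits)                 # targets amplified
    false_neg = len(target_hits) - true_pos     # targets missed
    false_pos = sum(neighbor_hits)              # neighbors amplified
    true_neg = len(neighbor_hits) - false_pos   # neighbors correctly silent
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up outcomes: 3 of 4 targets amplified, 1 of 5 neighbors amplified.
    sens, spec = sensitivity_specificity(
        [True, True, True, False],
        [False, False, False, False, True],
    )
    print(sens, spec)  # 0.75 0.8
```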