Abstract:
Motivation: PCR markers are routinely constructed by taking regions common to the genomes of a target organism and subtracting the regions found in the targets’ closest relatives, their neighbors. This approach is implemented in the package Fur, which quickly finds all regions common to a set of target genomes that are absent from their neighbors. The original Fur required memory proportional to the size of the entire neighborhood, which scales poorly as neighborhoods grow to thousands of genomes.
Results: Our new version of Fur requires memory proportional only to the size of the longest neighbor. Despite this greater memory efficiency, the new Fur remains fast and accurate. We demonstrate this by applying it to simulated sequences and by comparing it to an efficient alternative. We then use the new Fur to extract markers from 118 reference bacteria, some of which have hundreds of sequenced target genomes and over 1000 neighbor genomes. We pick the best primers for the ten most sequenced taxa and show their excellent in silico sensitivity and specificity. In six taxa the marker amplicons intersect genes encoding a “hypothetical protein”, underscoring the importance of hypothesis-free marker discovery from whole genome sequences. To make this feasible, we also introduce software for automatically finding target and neighbor genomes and for assessing markers.
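As a rough illustration of the target-minus-neighbor idea, and of why processing neighbors one at a time bounds memory by the longest neighbor, the Python sketch below works on exact k-mers. This is an assumption made for illustration only: it is not Fur's algorithm, which need not use k-mers at all, and the function names (`kmers`, `common_to_targets`, `subtract_neighbors`, `marker_regions`) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_to_targets(targets: Iterable[str], k: int) -> Set[str]:
    """Return the k-mers present in every target genome."""
    it = iter(targets)
    try:
        common = kmers(next(it), k)
    except StopIteration:
        raise ValueError("at least one target genome is required")
    for t in it:
        common &= kmers(t, k)
    return common


def subtract_neighbors(candidates: Set[str], neighbors: Iterable[str], k: int) -> Set[str]:
    """Remove every candidate k-mer that occurs in any neighbor.

    Neighbors are consumed one at a time (e.g. from a generator), so apart
    from the candidate set only the current neighbor's k-mers are held in
    memory: extra memory tracks the longest neighbor, not the neighborhood.
    """
    remaining = set(candidates)  # copy so the caller's set is not mutated
    for nb in neighbors:
        remaining -= kmers(nb, k)
        if not remaining:  # nothing left to subtract from
            break
    return remaining


def marker_regions(ref: str, survivors: Set[str], k: int) -> List[Tuple[int, int]]:
    """Merge overlapping or abutting surviving k-mers into half-open intervals on ref."""
    regions: List[Tuple[int, int]] = []
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] in survivors:
            if regions and i <= regions[-1][1]:
                regions[-1] = (regions[-1][0], i + k)  # extend current region
            else:
                regions.append((i, i + k))  # start a new region
    return regions


if __name__ == "__main__":
    targets = ["ACGTACGTTT", "GGACGTACGTTTCC"]
    neighbors = ["ACGTACAA"]  # in practice, a lazy stream of genomes
    common = common_to_targets(targets, 4)
    survivors = subtract_neighbors(common, iter(neighbors), 4)
    regions = marker_regions(targets[0], survivors, 4)
    print(regions)  # [(3, 10)]
    print([targets[0][s:e] for s, e in regions])  # ['TACGTTT']
```

Passing neighbors as an iterator rather than a list is the point of the sketch: each neighbor genome can be read, subtracted, and discarded before the next one is loaded.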