Free keywords:
Microarray; oligo selection; probe design; suffix array; matching statistics; longest common substring; longest common factor
Abstract:
We present a fast method that selects oligonucleotide probes (such as DNA 25-mers) for microarray experiments on a truly large scale. For example, reliable oligos for human genes can be found within four days, a speedup of one to two orders of magnitude compared to previous approaches. This speed is attained by using the longest common substring as a specificity measure for candidate oligos. We present a space- and time-efficient algorithm, based on a suffix array with additional information, to compute matching statistics (lengths of longest matches) between all candidate oligos and all remaining sequences. With the matching statistics available, we show how to incorporate constraints such as oligo length, melting temperature, and self-complementarity into the selection process at a postprocessing stage. As a result, we can now design custom oligos for any sequenced genome, just as the technology for on-site chip synthesis is becoming increasingly mature.
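To make the specificity measure concrete, the following is a minimal brute-force sketch of how matching statistics can be turned into a longest-common-factor filter for candidate oligos. It is not the suffix-array algorithm described in the abstract; it uses plain substring search and is only practical for toy inputs. The function names, the `max_lcf` threshold, and the example sequences are hypothetical and chosen purely for illustration.

```python
def matching_statistics(query, background):
    """For each position i in query, return the length of the longest
    prefix of query[i:] that occurs as a substring of background.

    Brute-force illustration only (assumption: small inputs); the paper
    computes these values with an enhanced suffix array instead.
    """
    ms = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while (i + length < len(query)
               and query[i:i + length + 1] in background):
            length += 1
        ms.append(length)
    return ms


def longest_common_factor(ms, start, oligo_len):
    """Length of the longest substring shared by the oligo
    query[start:start + oligo_len] and the background, derived from the
    matching statistics ms of the full query."""
    end = start + oligo_len
    # A match starting at i is clipped at the end of the oligo.
    return max(min(ms[i], end - i) for i in range(start, end))


def select_oligos(target, background, oligo_len, max_lcf):
    """Return (start, oligo) pairs whose longest common factor with the
    background is at most max_lcf (a hypothetical specificity cutoff)."""
    ms = matching_statistics(target, background)
    return [
        (start, target[start:start + oligo_len])
        for start in range(len(target) - oligo_len + 1)
        if longest_common_factor(ms, start, oligo_len) <= max_lcf
    ]


if __name__ == "__main__":
    target = "ACGTACGGTT"      # toy target sequence
    background = "TTACGTAA"    # toy stand-in for all remaining sequences
    print(matching_statistics(target, background))
    print(select_oligos(target, background, oligo_len=4, max_lcf=2))
```

On these toy sequences, only the 4-mers starting at positions 5 and 6 (`CGGT` and `GGTT`) share no factor longer than 2 with the background. Further constraints mentioned in the abstract, such as melting temperature or self-complementarity, would be applied as additional filters on the returned candidates.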