Abstract:
Motivation: The number of single nucleotide polymorphisms (SNPs) detectable in an alignment is a function of the length and the number of the aligned sequences. The latter is called the sample size. However, a typical alignment, for instance one obtained from a BLAST search of a query sequence against an EST database, does not cover the query sequence evenly. Therefore, it is usually not clear what the actual sample size is.
Results: We present a method to calculate the effective sample size, called n(eff), for a given BLAST alignment. This method takes into account that multiple coverage contributes only logarithmically to the SNP yield of a given sequence stretch. We show that the effective sample size n(eff) is usually much smaller than would be expected for a given amount of coverage and illustrate this with two typical examples.
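The abstract does not state the formula behind n(eff), so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes that the expected SNP yield at a position covered by c sequences is proportional to Watterson's harmonic sum a_c = 1 + 1/2 + ... + 1/(c-1), which grows roughly like ln(c). Under that assumption, n(eff) is taken to be the smallest uniform sample size whose per-site yield matches the average yield of the uneven alignment. The function names and the per-position coverage list are hypothetical.

```python
def harmonic(n: int) -> float:
    """Watterson's a_n = sum_{i=1}^{n-1} 1/i (0 for n <= 1).

    Assumption: SNP yield per site is proportional to a_n.
    """
    return sum(1.0 / i for i in range(1, n))


def effective_sample_size(coverage: list[int]) -> int:
    """Smallest n such that a_n is at least the mean of a_c over all positions.

    `coverage` is a hypothetical input: the number of aligned sequences
    at each position of the query.
    """
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    target = sum(harmonic(c) for c in coverage) / len(coverage)
    n, a_n = 1, 0.0  # a_1 = 0: a single sequence reveals no SNPs
    # Small tolerance guards against floating-point round-off in the mean.
    while a_n < target - 1e-12:
        a_n += 1.0 / n  # a_{n+1} = a_n + 1/n
        n += 1
    return n


# Half of the query is covered by 2 sequences, the other half by 20.
uneven = [2] * 50 + [20] * 50
mean_coverage = sum(uneven) / len(uneven)
n_eff = effective_sample_size(uneven)
```

In this toy alignment the mean coverage is 11 sequences, yet the sketch gives an effective sample size of 6, because the deeply covered half adds only logarithmically to the expected yield.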