Genome comparison without alignment using shortest unique substrings

Haubold, B.; Pierstorff, N.; Möller, F.; Wiehe, T.

doi:10.1186/1471-2105-6-123

Haubold, B., Pierstorff, N., Möller, F., & Wiehe, T. (2005). Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics, 6. doi:10.1186/1471-2105-6-123.

Item is released

Basic information

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0010-0FC5-F
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0010-0FC6-D

Genre: Journal article

Creators

Creators:
Haubold, B.¹, Author
Pierstorff, N.², Author
Möller, F.², Author
Wiehe, T.², Author

Affiliations:
¹ External, ou_persistent22
² External, ou_persistent22

Content

Keywords: -

Abstract:

Background: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.

Results: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes.

Conclusion: We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at http://adenine.biz.fh-weihenstephan.de/shustring/.
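To make the definition of a shortest unique substring concrete, the following is a minimal Python sketch. It is not the authors' shustring program and does not use generalized suffix trees; it swaps in brute-force k-mer counting, which is only practical for short sequences. The function name `shortest_unique_substrings` and the example sequences are hypothetical, chosen for illustration.

```python
from collections import Counter


def shortest_unique_substrings(seq):
    """Return (k, hits) for a single sequence.

    k is the smallest length at which some substring of seq occurs
    exactly once; hits lists every (start, substring) pair of that
    length that is unique. Brute force, not a suffix tree.
    """
    n = len(seq)
    for k in range(1, n + 1):
        # Count every k-mer, including overlapping occurrences.
        counts = Counter(seq[i:i + k] for i in range(n - k + 1))
        hits = [
            (i, seq[i:i + k])
            for i in range(n - k + 1)
            if counts[seq[i:i + k]] == 1
        ]
        if hits:
            # No shorter substring is unique, so these cannot be
            # reduced in length without losing uniqueness.
            return k, hits
    return 0, []  # only reached for an empty sequence


if __name__ == "__main__":
    print(shortest_unique_substrings("AACCAACC"))
```

In `AACCAACC` both single nucleotides occur four times, so no substring of length 1 is unique; among the substrings of length 2, only `CA` (starting at index 3) occurs once. Each pass over the sequence costs time proportional to its length times k, which is why a suffix-tree approach like the one the abstract describes is needed at genome scale.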

Details

Dates: Created: 2005; Published: 2005

Publication status: Published

Pages: -

Publishing info: -

Table of contents: -

Review type: -

Identifiers (DOI, ISBN, etc.): ISI: 000230143100001
DOI: 10.1186/1471-2105-6-123

Degree: -

Source 1

Title: BMC Bioinformatics

Source genre: Journal

Publisher, place: BioMed Central

Pages: -
Volume / Issue: 6
Sequence number: -
Start / End page: -
Identifiers (ISBN, ISSN, DOI, etc.): ISSN: 1471-2105
CoNE: https://pure.mpg.de/cone/journals/resource/111000136905000
