Spaced words and kmacs: Fast alignment-free sequence comparison based on 
inexact word matches.

Horwege, S.; Lindner, S.; Boden, M.; Hatje, K.; Kollmar, M.; Leimeister, C. A.; Morgenstern, B.

doi:10.1093/nar/gku398

Horwege, S., Lindner, S., Boden, M., Hatje, K., Kollmar, M., Leimeister, C. A., et al. (2014). Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches. Nucleic Acids Research, 42(W1), W7-W11. doi:10.1093/nar/gku398.

Item status: Released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-0023-C6E4-C
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0027-CC8A-1

Genre: Journal article

Files

2053244.pdf (publisher version), 576 KB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-0024-1B91-1

Name:
2053244.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright info:
-

License:
-

External references

External reference:
http://nar.oxfordjournals.org/content/42/W1/W7.full.pdf+html (publisher version), open access status unknown

Description:
-

OA status:

Creators

Creators:
Horwege, S., Author
Lindner, S., Author
Boden, M., Author
Hatje, K.¹, Author
Kollmar, M.¹, Author
Leimeister, C. A., Author
Morgenstern, B., Author

Affiliations:
¹ Research Group of Systems Biology of Motor Proteins, MPI for biophysical chemistry, Max Planck Society, ou_578570

Content

Keywords: -

Abstract: In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing 'don't care' or 'wildcard' symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their different spaced-word composition. Our second approach defines the distance between two sequences by estimating for each position in the first sequence the length of the longest substring at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction.
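
Illustration: the two ideas in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. The binary pattern string ('1' for a match position, '0' for a don't-care position), the choice of Euclidean distance, and the brute-force search are assumptions made for illustration. The abstract only says that "various distance measures" can be used, and that the second method estimates the substring lengths rather than computing them exhaustively.

```python
from collections import Counter
from math import sqrt


def spaced_word_freqs(seq, pattern):
    """Relative frequencies of spaced words in seq.

    pattern is a string over {'1', '0'}: '1' marks a match position,
    '0' a don't-care position (assumed encoding, for illustration only).
    Characters at don't-care positions are simply dropped from the word.
    """
    span = len(pattern)
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter(
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    )
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def euclidean_distance(freq_a, freq_b):
    """One possible distance on spaced-word compositions (an assumption)."""
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, pattern):
    """Matrix of pairwise distances for a list of sequences."""
    freqs = [spaced_word_freqs(s, pattern) for s in seqs]
    return [[euclidean_distance(fa, fb) for fb in freqs] for fa in freqs]


def longest_k_mismatch_at(a, i, b, k):
    """Length of the longest substring of a starting at position i that
    occurs somewhere in b with at most k mismatches.

    Exact brute force over all start positions in b; it shows the
    quantity being estimated, not a fast way to obtain it.
    """
    best = 0
    for j in range(len(b)):
        mismatches = 0
        length = 0
        while i + length < len(a) and j + length < len(b):
            if a[i + length] != b[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best
```

For example, `distance_matrix(["ACGTACGT", "ACCTACGA", "TTTTGGGG"], "1101")` returns a symmetric 3 x 3 matrix with zeros on the diagonal, which could be passed to a clustering or distance-based tree-building routine.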

Details

Language(s): eng - English

Date: Published online: 2014-05-14; Issued: 2014-07-01

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: Peer review

Identifiers: DOI: 10.1093/nar/gku398

Degree type: -

Title: Nucleic Acids Research

Source genre: Journal

Creators:

Affiliations:

Place, publisher, edition: -

Pages: -
Volume / Issue: 42 (W1)
Article number: -
Start / end page: W7 - W11
Identifier: -
