  Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches.

Horwege, S., Lindner, S., Boden, M., Hatje, K., Kollmar, M., Leimeister, C. A., et al. (2014). Spaced words and kmacs: Fast alignment-free sequence comparison based on inexact word matches. Nucleic Acids Research, 42(W1), W7-W11. doi:10.1093/nar/gku398.

Basic

Item Permalink: http://hdl.handle.net/11858/00-001M-0000-0023-C6E4-C
Version Permalink: http://hdl.handle.net/11858/00-001M-0000-0027-CC8A-1
Genre: Journal Article

Files

2053244.pdf (Publisher version), 576 KB
Name: 2053244.pdf
Description: -
Visibility: Public
MIME type: application/pdf
Copyright date: -
Copyright info: -
License: -

Creators

 Creators:
Horwege, S., Author
Lindner, S., Author
Boden, M., Author
Hatje, K. [1], Author
Kollmar, M. [1], Author
Leimeister, C. A., Author
Morgenstern, B., Author
Affiliations:
[1] Research Group of Systems Biology of Motor Proteins, MPI for Biophysical Chemistry, Max Planck Society, ou_578570

Content

Free keywords: -
 Abstract: In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing 'don't care' or 'wildcard' symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their different spaced-word composition. Our second approach defines the distance between two sequences by estimating for each position in the first sequence the length of the longest substring at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction.
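The abstract describes both measures only in words. The following Python sketch is a minimal, brute-force illustration of the two quantities it names, written for this page and not taken from the Spaced Words or kmacs web tools. The pattern "1101", the choice of Euclidean distance, the example sequences, and all function names are assumptions for illustration. In the pattern, "1" marks a match position that is kept and "0" marks a don't-care position that is skipped.

```python
import math
from collections import Counter


def spaced_word_frequencies(seq, pattern):
    """Relative frequencies of the spaced words of `seq` under a binary pattern.

    Assumed convention: '1' = match position (kept), '0' = don't-care (skipped).
    """
    span = len(pattern)
    match_positions = [j for j, c in enumerate(pattern) if c == "1"]
    # One spaced word per window of length `span`, keeping only match positions.
    counts = Counter(
        "".join(seq[i + j] for j in match_positions)
        for i in range(len(seq) - span + 1)
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def euclidean_distance(f1, f2):
    """Euclidean distance between two sparse frequency vectors stored as dicts."""
    keys = set(f1) | set(f2)
    # math.fsum is correctly rounded, so the result does not depend on key order.
    return math.sqrt(math.fsum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in keys))


def k_mismatch_lengths(s1, s2, k):
    """For each position i of s1, the length of the longest substring starting
    at i that occurs somewhere in s2 with at most k mismatches (brute force)."""
    lengths = []
    for i in range(len(s1)):
        best = 0
        for j in range(len(s2)):
            mismatches = length = 0
            # Extend the alignment of s1[i:] against s2[j:] until the (k+1)-th mismatch.
            while i + length < len(s1) and j + length < len(s2):
                if s1[i + length] != s2[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        lengths.append(best)
    return lengths


def average_k_mismatch_length(s1, s2, k):
    """Mean of the per-position lengths from k_mismatch_lengths."""
    lengths = k_mismatch_lengths(s1, s2, k)
    return sum(lengths) / len(lengths) if lengths else 0.0


def distance_matrix(items, dist):
    """Pairwise matrix for any distance function dist(a, b)."""
    return [[dist(a, b) for b in items] for a in items]


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTTCGTAC", "TTGCATGCAA"]  # hypothetical input
    pattern = "1101"  # hypothetical spaced-word pattern
    freqs = [spaced_word_frequencies(s, pattern) for s in seqs]
    for row in distance_matrix(freqs, euclidean_distance):
        print(["%.3f" % d for d in row])
    print(average_k_mismatch_length(seqs[0], seqs[1], k=1))
```

The exhaustive search over all position pairs makes this sketch practical only for short sequences; the published web tools are built to be fast on real data and use their own algorithms. How kmacs converts the average k-mismatch length into a final distance value is not reproduced here.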

Details

Language(s): eng - English
 Dates: 2014-05-14, 2014-07-01
 Publication Status: Published in print
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Method: Peer
 Identifiers: DOI: 10.1093/nar/gku398
 Degree: -

Source 1

Title: Nucleic Acids Research
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 42 (W1)
Sequence Number: -
Start / End Page: W7 - W11
Identifier: -