  FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

Waldmann, J., Gerken, J., Hankeln, W., Schweer, T., & Glöckner, F. (2014). FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences. BMC Research Notes, 7: 365, pp. 1-4.

Files

Waldmann14.pdf (Publisher version), 306KB
Name: Waldmann14.pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

 Creators:
Waldmann, J. (1), Author
Gerken, J. (1), Author
Hankeln, W. (1), Author
Schweer, T. (1), Author
Glöckner, F.O. (1), Author
Affiliations:
(1) Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society, ou_2481697

Content

Free keywords: -
 Abstract: Background:
Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. A comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are limited.
Findings:
FastaValidator is a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end-users, FastaValidator includes an interactive, out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines.
Conclusions:
The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.

Details

Language(s): eng - English
 Dates: 2014-06-14
 Publication Status: Issued
 Pages: 4
 Publishing info: -
 Table of Contents: -
 Rev. Type: Internal
 Identifiers: eDoc: 700987
 Degree: -

Source 1

Title: BMC Research Notes
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 7
Sequence Number: 365
Start / End Page: 1 - 4
Identifier: -