Journal Article

FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

MPS-Authors
Waldmann,  J.
Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society;

Gerken,  J.
Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society;

Hankeln,  W.
Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society;

Schweer,  T.
Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society;

Glöckner,  F.O.
Microbial Genomics Group, Department of Molecular Ecology, Max Planck Institute for Marine Microbiology, Max Planck Society;

Fulltext (public)

Waldmann14.pdf
(Publisher version), 306KB

Citation

Waldmann, J., Gerken, J., Hankeln, W., Schweer, T., & Glöckner, F. (2014). FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences. BMC Research Notes, 7: 365, pp. 1-4.


Cite as: https://hdl.handle.net/21.11116/0000-0001-C561-3
Abstract
Background:
Advances in sequencing technologies challenge the efficient import and validation of FASTA-formatted sequence data, which remains a prerequisite for most bioinformatics tools and pipelines. A comparative analysis of the commonly used Bio* frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are limited.
Findings:
FastaValidator is a platform-independent, standardized, lightweight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end users, FastaValidator includes an interactive, out-of-the-box validation of FASTA-formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines.
Conclusions:
The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel next-generation sequencing (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA-formatted sequence data.