  Robust cross-platform workflows: how technical and scientific communities collaborate to develop, test and share best practices for data analysis

Möller, S., Prescott, S. W., Wirzenius, L., Reinholdtsen, P., Chapman, B., Prins, P., et al. (2017). Robust cross-platform workflows: how technical and scientific communities collaborate to develop, test and share best practices for data analysis. Data Science and Engineering, 2, 232-244. doi:10.1007/s41019-017-0050-4.

Creators

 Creators:
Möller, Steffen, Author
Prescott, Stuart W., Author
Wirzenius, Lars, Author
Reinholdtsen, Petter, Author
Chapman, Brad, Author
Prins, Pjotr, Author
Soiland-Reyes, Stian, Author
Klötzl, Fabian [1], Author
Bagnacani, Andrea, Author
Kalaš, Matúš, Author
Tille, Andreas, Author
Crusoe, Michael R., Author
Affiliations:
[1] Research Group Bioinformatics, Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Max Planck Society

Content

Free keywords: Continuous integration testing; Common workflow language; Container; Software distribution; Automated installation
 Abstract: Information integration and workflow technologies for data analysis have always been major fields of investigation in bioinformatics. A range of popular workflow suites are available to support analyses in computational biology. Commercial providers tend to offer prepared applications remote to their clients. However, for most academic environments with local expertise, novel data collection techniques or novel data analysis, it is essential to have all the flexibility of open-source tools and open-source workflow descriptions. Workflows in data-driven science such as computational biology have considerably gained in complexity. New tools or new releases with additional features arrive at an enormous pace, and new reference data or concepts for quality control are emerging. A well-abstracted workflow and the exchange of the same across work groups have an enormous impact on the efficiency of research and the further development of the field. High-throughput sequencing adds to the avalanche of data available in the field; efficient computation and, in particular, parallel execution motivate the transition from traditional scripts and Makefiles to workflows. We here review the extant software development and distribution model with a focus on the role of integration testing and discuss the effect of common workflow language on distributions of open-source scientific software to swiftly and reliably provide the tools demanded for the execution of such formally described workflows. It is contended that, alleviated from technical differences for the execution on local machines, clusters or the cloud, communities also gain the technical means to test workflow-driven interaction across several software packages.

Details

Language(s): eng - English
 Dates: 2017
 Publication Status: Issued
 Identifiers: URI: https://doi.org/10.1007/s41019-017-0050-4
Other: Möller2017
DOI: 10.1007/s41019-017-0050-4

Source 1

Title: Data Science and Engineering
Source Genre: Journal
Publ. Info: Springer Berlin Heidelberg
Volume / Issue: 2
Start / End Page: 232 - 244
Identifiers: Other: 2364-1185
Other: 2364-1541
CoNE: https://pure.mpg.de/cone/journals/resource/2364-1185