


Poster

DataJoint: Managing Big Scientific Data Using Matlab or Python

MPS-Authors

Ecker, A
Research Group Computational Vision and Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Department Physiology of Cognitive Processes, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Reimer, J., Yatsenko, D., Ecker, A., Walker, E., Sinz, F., Berens, P., et al. (2016). DataJoint: Managing Big Scientific Data Using Matlab or Python. Poster presented at AREADNE 2016: Research in Encoding And Decoding of Neural Ensembles, Santorini, Greece.


Cite as: https://hdl.handle.net/21.11116/0000-0000-7B76-2
Abstract
The rise of big data in modern research poses serious challenges for data management: Large and intricate datasets from diverse instrumentation must be precisely aligned, annotated, and organized in a flexible way that allows swift exploration and analysis. Data management should guarantee the consistency of intermediate results in multi-step processing pipelines, so that changes in one part automatically propagate to all downstream results. Finally, data organization should be self-documenting, so that lab members and collaborators can access the data with minimal effort, even years after it was collected. Although high levels of data integrity are expected, research teams have diverse backgrounds, are geographically dispersed, and rarely have a primary interest in data science. The challenges posed by large, complex data sets may be new to biologists, but they have been addressed quite successfully in other contexts by relational databases, which provide a principled approach to effective data management. DataJoint is an open-source framework that provides a clean implementation of core concepts of the relational data model to facilitate multi-user access, efficient queries, distributed computing, and cascading dependencies across multiple data modalities. Critically, although DataJoint relies on an established relational database management system (MySQL) as its backend, data access and manipulation are performed transparently in the commonly used languages MATLAB or Python, and DataJoint can be integrated into new and existing analyses written in these languages with minimal effort or additional training. DataJoint is not limited to particular file formats, acquisition systems, or data modalities and can be quickly adapted to new experimental designs. DataJoint and related resources are available at http://datajoint.github.com.
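The idea of cascading dependencies, where changes upstream propagate to all downstream results, can be illustrated with a small in-memory sketch. This is not DataJoint and does not reproduce its API or its MySQL backend; every name below (`Table`, `Computed`, `populate`, the example tables and attributes) is hypothetical. As a simplifying assumption, every table in the toy pipeline shares one primary key (for example, a session number).

```python
class Table:
    """Hypothetical toy table: rows keyed by a shared primary key."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.children = []
        self.rows = {}
        # Register this table as a dependent of each parent.
        for parent in self.parents:
            parent.children.append(self)

    def insert(self, key, **attrs):
        # Referential integrity: the key must already exist in every parent.
        for parent in self.parents:
            if key not in parent.rows:
                raise KeyError(f"{key!r} not in parent table {parent.name}")
        self.rows[key] = attrs

    def delete(self, key):
        # Cascade: remove dependent rows downstream before this row.
        for child in self.children:
            child.delete(key)
        self.rows.pop(key, None)


class Computed(Table):
    """Hypothetical derived table whose rows are computed from its parents."""

    def __init__(self, name, parents, make):
        super().__init__(name, parents)
        self.make = make

    def populate(self):
        # Compute rows for keys present in all parents but missing here.
        ready = set.intersection(*(set(p.rows) for p in self.parents))
        for key in sorted(ready - set(self.rows)):
            self.insert(key, **self.make(key))


session = Table("Session")
recording = Table("Recording", parents=[session])
rates = Computed(
    "SpikeRate",
    parents=[recording],
    make=lambda key: {
        "rate_hz": len(recording.rows[key]["spike_times"])
        / recording.rows[key]["duration_s"]
    },
)

session.insert(1, animal="m01")
session.insert(2, animal="m02")
recording.insert(1, spike_times=[0.1, 0.4, 0.9], duration_s=1.5)
recording.insert(2, spike_times=[0.2, 0.3], duration_s=2.0)
rates.populate()
print(rates.rows)  # {1: {'rate_hz': 2.0}, 2: {'rate_hz': 1.0}}

session.delete(1)  # cascades to Recording and SpikeRate
print(sorted(recording.rows), sorted(rates.rows))  # [2] [2]
```

Deleting a session removes every result derived from it, so stale downstream values cannot survive an upstream correction; re-inserting the corrected data and calling `populate` again recomputes only the missing entries.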