Abstract
This workshop introduces a set of tools and resources for plant research data analysis and management that were developed in the MAdLand context. All of them are open, FAIR, and free to use.

Galaxy is a free, open-source, web-based platform for data-intensive life science research. It offers thousands of popular data analysis tools, enabling researchers without coding expertise to perform computational analyses. Galaxy also promotes collaboration and knowledge sharing: its accessibility, reproducibility, and data-sharing features encourage researchers to exchange and publish data analysis protocols, reproduce results, and foster an open science culture within the community. We will introduce the European Galaxy server (https://usegalaxy.eu), which is managed by the University of Freiburg and freely available to all researchers. The Galaxy Training Network (GTN) offers over 400 comprehensive tutorials across a wide range of scientific topics, which walk learners step by step through real-world analyses in Galaxy.

MAdLandDB (Genome Zoo) is a comprehensive protein database accessible through the Galaxy web-based platform (https://usegalaxy.eu). Focusing on non-seed plants and streptophyte algae, it provides non-redundant, reliable protein sequences that can be searched with BLAST and Diamond to address comparative and evolutionary questions in plant biology. The database contains 21 million sequences from more than 600 species belonging to various lineages and, for comparative purposes, also includes fungi, animals, phylogenetically diverse algae, bacteria, and archaea. It is actively developed and maintained within the MAdLand framework. We also offer a guided tutorial within the GTN on performing sequence similarity searches against MAdLandDB, which helps users navigate and analyze the database efficiently.

TAPscan v4 is an advanced tool for genome-wide annotation of plant transcription-associated proteins (TAPs), comprising transcription factors (TFs) and transcriptional regulators (TRs). For v4, we updated the web interface at https://tapscan.plantcode.cup.uni-freiburg.de, which provides an in-depth view of the distribution of 138 TAP families (including 23 subfamilies) across all species in MAdLandDB. TAPscan v4 has also been integrated into Galaxy, so that users can analyze their own datasets.

In addition, we plan to convert PEATmoss (https://peatmoss.plantcode.cup.uni-freiburg.de/), which currently hosts expression data for Physcomitrium patens, into the MAdLand Expression Atlas. This new resource will include expression data for additional MAdLand species such as Anthoceros agrestis, Chara braunii, and Spirogyra pratensis.

MAdLand also contributes to the DataPLANT Research Data Management (RDM) platform. This platform supports the generation of ARCs (Annotated Research Contexts), which encapsulate all research data, including raw data and metadata, in a FAIR, open, and standardized format. ARCs are an RO-Crate (Research Object Crate; https://www.researchobject.org/ro-crate/) implementation and provide an easy-to-use way to re-analyze data generated by other labs, e.g. in Galaxy. Conversely, analyses performed in Galaxy can be exported directly as RO-Crates.

In this workshop, we will introduce these MAdLand resources for plant research, show how to access them using Galaxy, and explain how to create ARCs for your own research data, through a combination of slide presentations and live demonstrations.