The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond

Michael Banf and Thomas Hartwig

Date: 2021-12-21

Keywords: scalable gene regulatory network inference; randomized algorithms; multi-omics data integration

Abstract

Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, and epigenetic mechanisms. Transcriptional misregulation, e.g., caused by mutations in regulatory sequences, has been shown to be responsible for a wide range of diseases, including cancer and developmental and neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can handle the growing number of large-scale, high-resolution biological datasets. In particular, such approaches must be able to efficiently integrate and exploit the biological and technological heterogeneity of these datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. For example, one of the top-performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model.

In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, particularly as ever more large-scale biological experiments are conducted and datasets collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists. Our aim is to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic regulatory events will be a fundamental step towards understanding the mechanisms of life and, eventually, developing efficient therapies to treat and cure diseases.