  Algorithms for finding RNA sequence-structure motifs

Winkler, J. (2023). Algorithms for finding RNA sequence-structure motifs. PhD Thesis. doi:10.17169/refubium-40588.

Creators

 Creators:
Winkler, Jörg (1, 2), Author
Reinert, Knut (3), Referee
Affiliations:
1. Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society, ou_1433547
2. Fachbereich Mathematik und Informatik der Freien Universität Berlin, ou_persistent22
3. Efficient Algorithms for Omics Data (Knut Reinert), Max Planck Fellow Group, Max Planck Institute for Molecular Genetics, Max Planck Society, ou_2385698

Content

Free keywords: Bioinformatics; RNA secondary structure; Alignment algorithm
 Abstract: The function of non-coding RNA sequences is largely determined by their spatial conformation. This is the secondary structure of the molecule, which is formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. Essential tasks for discovering yet unknown RNA families and inferring their possible functions are the structural alignment of RNAs and the subsequent search for the derived structural motifs. These tasks demand substantial computational resources, especially when aligning many long sequences, and therefore require efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains pseudoknots, which are overlapping interactions that add complexity to the analysis and are often ignored in available software.

In this thesis, I present LaRA 2 and MaRs, two SeqAn-based software tools that implement algorithms for finding sequence-structure motifs in genomic sequences. In contrast to other programs, my tools can handle arbitrary pseudoknots. They use multithreading for parallel execution and are implemented in modern C++ code for maximal longevity and performance.

LaRA 2 is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. It uses a new heuristic for computing a lower bound on the solution and employs vectorization techniques to speed up the time-critical parts of the algorithm.

MaRs can be applied in a workflow right after LaRA 2 and derives sequence-structure motifs from the structural alignments. The motifs are descriptors of the RNA sequences' properties and drive the search for homologs in genomic sequences. MaRs employs a bidirectional index on the genomic sequences and an optimized multithreaded search strategy for finding matches quickly. The use of a thread pool, effective pruning strategies, and a low memory footprint ensure that MaRs is capable of processing extremely large data sets.
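
To illustrate the term "pseudoknot" used in the abstract: a secondary structure can be written as a list of base pairs (i, j), and two pairs (i, j) and (k, l) overlap when i < k < j < l. The Python sketch below checks a structure for such crossing pairs. It is an illustration under these assumptions only, not code from LaRA 2 or MaRs (which are implemented in C++); the names crossing_pairs and has_pseudoknot are hypothetical.

```python
from typing import Iterable, List, Tuple

BasePair = Tuple[int, int]


def crossing_pairs(pairs: Iterable[BasePair]) -> List[Tuple[BasePair, BasePair]]:
    """Return every two base pairs (i, j), (k, l) that cross, i.e. i < k < j < l.

    Hypothetical helper for illustration; positions are 0-based sequence indices.
    """
    # Store each pair as (opening, closing) and sort by opening position.
    normalized = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for idx, (i, j) in enumerate(normalized):
        for k, l in normalized[idx + 1:]:
            if k >= j:
                # Pairs are sorted by opening position, so all remaining
                # pairs open at or after j and cannot cross (i, j).
                break
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs: Iterable[BasePair]) -> bool:
    """True if at least two base pairs cross (the structure is not fully nested)."""
    return bool(crossing_pairs(pairs))


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]   # hairpin stem: pairs are nested
    knotted = [(0, 5), (1, 4), (3, 8)]  # (3, 8) crosses both other pairs
    print(has_pseudoknot(nested))
    print(crossing_pairs(knotted))
```

The quadratic scan is chosen for readability; it is sufficient for single structures, but it says nothing about how the thesis tools represent or score pseudoknots.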

Details

Language(s): eng - English
 Dates: 2023, 2023-09-18
 Publication Status: Published online
 Pages: XIII, 137 pp.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.17169/refubium-40588
URN: urn:nbn:de:kobv:188-refubium-40867-6
 Degree: PhD
