Establishing graph based pan-genomics in Arabidopsis thaliana

Kubica, C

Kubica, C. (2024). Establishing graph based pan-genomics in Arabidopsis thaliana. PhD Thesis, Eberhard-Karls-Universität, Tübingen, Germany.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0010-0626-D
Version Permalink: https://hdl.handle.net/21.11116/0000-0010-0627-C

Genre: Thesis

Creators

Creators:
Kubica, C¹, Author

Affiliations:
¹ Department Molecular Biology, Max Planck Institute for Biology Tübingen, Max Planck Society, ou_3371687

Content

Free keywords: -

Abstract: Reference genomes are foundational to modern genetics, yet by their nature, they cannot capture the full extent of genetic diversity within a species. Representing the genetic potential of an entire species as a single linear sequence introduces inherent biases in all subsequent analyses. This reference bias has long been acknowledged, but only recent advances in sequencing technology have made it possible to address it effectively. The advent of long-read sequencing and the generation of multiple genome assemblies for the same species have enabled a more comprehensive exploration of intraspecies genome variation. These new technologies have also made possible the implementation of a longstanding concept: the genome graph. It integrates multiple reference genomes into a single data structure, offering a better representation of sequence diversity than linear references can provide.
In my work, I apply the genome graph approach to Arabidopsis thaliana by constructing a complex genome graph from six highly contiguous, de novo assembled genomes, each annotated through the novel pan-genome-aware annotation pipeline auto-ant. I demonstrate that building such a graph is not only theoretically possible, but also practically feasible. The resulting graph captures the complete pan-genome of the input assemblies, including sequences absent from the current linear reference genome. Using the reference-agnostic variant detection algorithm panSV, I am able to access this graph-based pan-genome. Furthermore, I show that short-read alignments to this genome graph are feasible and exhibit reduced reference bias due to the expanded reference structure. Additionally, even a graph constructed from only seven genomes proves capable of representing the broader pan-genome of a larger mapping population.
Although the method is in need of further development and improvements, I have made a first case for the use of highly complex graphs in plant species.

Details

Language(s):

Dates: Published Online: 2024-11-12; Date issued: 2024

Publication Status: Issued

Pages: -

Publishing info: Tübingen, Germany : Eberhard-Karls-Universität

Table of Contents: -

Rev. Type: -

Identifiers: -

Degree: PhD
