PINAPL: A Flexible Pipeline for the Detection of Novel Genes in Annotated Genomes

Förster, Leo

Released

Thesis

Citation

Förster, L. (2018). PINAPL: A Flexible Pipeline for the Detection of Novel Genes in Annotated Genomes. Master Thesis, Universität Hamburg, Hamburg.

Cite as: https://hdl.handle.net/21.11116/0000-0005-05FF-7

Abstract

As sequencing technologies expand, more individual genomes and transcriptomes become available and can be used to refine existing species annotations. Annotations are currently generated by automated pipelines and maintained by manual curation. These pipelines are limited in their ability to generalise across all domains of life. Even highly accurate pipelines produce erroneous predictions due to the large number of genes in most genomes.
PINAPL is a software suite designed to identify unannotated genes in annotated genomes. Evidence-based gene predictions are compared to existing annotations to highlight genes not present in a species' annotation. Evidence is collected that can help to identify a gene as plausibly existing in that species. These data are visualized interactively to facilitate manual curation of results.
PINAPL identifies a number of well-supported, unannotated genes, including RAG2 in cod, KNL1 in cow, and DOLK in stickleback. These genes show highly significant BLAST hits to known genes that have hundreds of orthologs, yet they are missing from the respective genome annotations. Each of these genes plays an important role in cell biology and is linked to severe knockdown phenotypes. It is likely that these genes were missed by automated annotation methods.
PINAPL offers a comprehensive tool for detecting, visualizing, and curating unannotated genes in annotated genomes. The genes identified through PINAPL can be used to improve existing annotations to better represent an organism's biology, benefitting experimental and analytical work carried out based on that annotation.
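
The comparison step described in the abstract (evidence-based predictions checked against an existing annotation) can be illustrated with a minimal sketch. This is not PINAPL's implementation: it assumes each gene is reduced to a simple interval on a named sequence, and the `Gene` type, the function names, and all coordinates below are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """A gene reduced to a closed interval on one sequence (hypothetical model)."""
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Gene, b: Gene) -> bool:
    """True if both genes lie on the same sequence and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_unannotated(predictions: List[Gene], annotations: List[Gene]) -> List[Gene]:
    """Return the predictions that overlap no annotated gene."""
    return [
        p for p in predictions
        if not any(overlaps(p, a) for a in annotations)
    ]


# Made-up coordinates, for illustration only.
annotated = [
    Gene("geneA", "chr1", 100, 500),
    Gene("geneB", "chr2", 50, 300),
]
predicted = [
    Gene("pred1", "chr1", 450, 900),    # overlaps geneA, so already annotated
    Gene("pred2", "chr1", 1200, 1800),  # no overlap: candidate
    Gene("pred3", "chr3", 50, 300),     # different sequence: candidate
]

candidates = find_unannotated(predicted, annotated)
print([g.name for g in candidates])
```

The pairwise check is quadratic and ignores strand and exon structure; it only shows the idea of flagging predictions with no counterpart in the annotation. Supporting evidence such as BLAST hits or ortholog counts, which the abstract mentions, would be attached to each candidate in a separate step.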