Abstract:
Author summary: The emerging field of paleo-metagenomics (i.e., metagenomics from ancient DNA) holds great promise for novel discoveries in fields as diverse as pathogen evolution and paleoenvironmental reconstruction. However, there is presently a lack of computational methods for species identification from microbial communities in both degraded and non-degraded DNA material. Here, we present “HAYSTAC”, a user-friendly software package that implements a novel probabilistic model for species identification in metagenomic data obtained from both degraded and non-degraded DNA material. Through extensive benchmarking, we show that HAYSTAC can be used for accurately profiling community composition, as well as for direct hypothesis testing for the presence of extremely low-abundance taxa, in complex metagenomic samples. In analyses of simulated and publicly available datasets, HAYSTAC consistently produced the lowest number of false positive identifications during taxonomic profiling, produced robust results when databases of restricted size were used, and showed increased sensitivity for pathogen detection compared to other specialist methods. The newly proposed probabilistic model and the software that implements it can have a substantial impact on robust and rapid pathogen discovery in degraded or shallowly sequenced metagenomic samples while optimising the use of computational resources.