  Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification

Borry, M. (2019). Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification. The Journal of Open Source Software, 01540. doi:10.21105/joss.01540.

Basic

Genre: Journal Article

Files

shh2440.pdf (Publisher version), 148KB
Name: shh2440.pdf
Description: OA
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

 Creators: Borry, Maxime (1), Author
Affiliations: (1) Archaeogenetics, Max Planck Institute for the Science of Human History, Max Planck Society, ou_2074310

Content

Free keywords: -
 Abstract: SourcePredict is a Python package, distributed through Conda, that classifies and predicts the origin of metagenomic samples given a reference dataset of known origins, a problem also known as source tracking. DNA shotgun sequencing of human, animal, and environmental samples has opened up new doors to explore the diversity of life in these different environments, a field known as metagenomics (Hugenholtz & Tyson, 2008). One aspect of metagenomics is investigating the community composition of organisms within a sequencing sample with tools known as taxonomic classifiers, such as Kraken (Wood & Salzberg, 2014). In cases where the origin of a metagenomic sample, its source, is unknown, predicting and/or confirming the source is often part of the research question. For example, in microbial archaeology, it is sometimes necessary to rely on metagenomics to validate the source of paleofaeces. Using samples of known sources, a reference dataset can be established with the taxonomic composition of the samples, i.e., the organisms identified in the samples, as features, and the sources of the samples as class labels. With this reference dataset, a machine learning algorithm can be trained to predict the source of unknown samples (sinks) from their taxonomic composition. Other tools for predicting the source of a sample already exist, such as SourceTracker (Knights et al., 2011), which employs Gibbs sampling. However, the Sourcepredict results are more easily interpreted since the samples are embedded in a human-observable low-dimensional space. This embedding is performed by a dimension reduction algorithm followed by K-Nearest-Neighbours (KNN) classification.
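
The classification step described in the abstract can be illustrated with a minimal sketch. It is not Sourcepredict's actual code: it assumes the samples have already been embedded in a two-dimensional space (the dimension reduction step is omitted), and the function name `knn_source_proportions`, the coordinates, and the source labels are all hypothetical, chosen only for illustration. The share of each source among the k nearest reference samples is reported, and the most frequent source is taken as the prediction.

```python
from collections import Counter
from math import dist


def knn_source_proportions(reference, sink, k=3):
    """Return the share of each source label among the k nearest reference samples.

    reference: list of (coordinates, source_label) pairs in the embedded space.
    sink: coordinates of the sample whose source is unknown.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference samples")
    # Sort reference samples by Euclidean distance to the sink, keep the k closest.
    nearest = sorted(reference, key=lambda item: dist(item[0], sink))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}


# Hypothetical embedded coordinates and source labels, invented for illustration.
reference = [
    ((0.0, 0.1), "human"),
    ((0.2, 0.0), "human"),
    ((0.1, 0.3), "human"),
    ((5.0, 5.1), "dog"),
    ((5.2, 4.9), "dog"),
    ((9.0, 0.2), "soil"),
    ((9.1, 0.0), "soil"),
]

proportions = knn_source_proportions(reference, sink=(0.3, 0.2), k=3)
predicted = max(proportions, key=proportions.get)
print(predicted, proportions)
```

Because the neighbours are found in a low-dimensional space, the same coordinates can be plotted directly, which is the interpretability advantage the abstract refers to.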

Details

Language(s): eng - English
 Dates: 2019-09-04
 Publication Status: Published online
 Pages: 3
 Publishing info: -
 Table of Contents: Summary
Method
- Prediction of the proportion of unknown sources
- Prediction of the proportion of known sources
- Combining unknown and source proportions
 Rev. Type: Peer
 Identifiers: DOI: 10.21105/joss.01540
Other: shh2440
 Degree: -

Source 1

Title: The Journal of Open Source Software
  Other : Journal of Open Source Software
  Abbreviation : JOSS
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: -
Sequence Number: 01540
Start / End Page: -
Identifier: ISSN: 2475-9066
CoNE: https://pure.mpg.de/cone/journals/resource/2475-9066