In silico phenotyping via co-training for improved phenotype prediction from genotype

Roqueiro, Damian; Witteveen, Menno J.; Anttila, Verneri; Terwindt, Gisela M.; van den Maagdenberg, Arn; Borgwardt, Karsten

doi:10.1093/bioinformatics/btv254



Roqueiro, D., Witteveen, M. J., Anttila, V., Terwindt, G. M., van den Maagdenberg, A., & Borgwardt, K. (2015). In silico phenotyping via co-training for improved phenotype prediction from genotype. Bioinformatics, 31(12), i303-i310. doi:10.1093/bioinformatics/btv254.

Item is Released


Basic


Item Permalink: https://hdl.handle.net/21.11116/0000-000C-F2F7-C
Version Permalink: https://hdl.handle.net/21.11116/0000-000C-F2F8-B

Genre: Journal Article


Locators


Locator:
https://bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html (Any fulltext) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators


Creators:
Roqueiro, Damian, Author
Witteveen, Menno J., Author
Anttila, Verneri, Author
Terwindt, Gisela M., Author
van den Maagdenberg, Arn, Author
Borgwardt, Karsten¹, Author

Affiliations:
¹ ETH Zürich, ou_persistent22

Content


Free keywords: -

Abstract:
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction.
Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium.
Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction.
Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html
Contact: karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch
Supplementary information: Supplementary data are available at Bioinformatics online.
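The idea in the abstract, using a second class of information to label patients whose phenotype is missing and then enlarging the genotype training set, can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the URL given under Availability). It is a single-round, one-directional simplification of co-training, and everything in it is an assumption made for illustration: the nearest-centroid classifier, the `min_margin` confidence threshold, the label strings `"MA"` and `"MO"`, and the toy feature values.

```python
# Hypothetical sketch of in silico phenotyping; NOT the authors' code.
# Classifier, threshold, labels, and data are illustrative assumptions.

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Toy classifier: assign the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in sorted(set(y))
        }
        return self

    def predict_with_margin(self, x):
        # Margin = gap between the two closest centroids (needs >= 2 classes).
        ranked = sorted((sq_dist(x, c), label) for label, c in self.centroids.items())
        (best, label), (runner_up, _) = ranked[0], ranked[1]
        return label, runner_up - best

    def predict(self, x):
        return self.predict_with_margin(x)[0]

def impute_phenotypes(clin_labeled, y_labeled, clin_unlabeled, min_margin):
    """Label unlabeled patients from the clinical view, keeping confident calls only."""
    clinical_model = NearestCentroid().fit(clin_labeled, y_labeled)
    imputed = []
    for i, x in enumerate(clin_unlabeled):
        label, margin = clinical_model.predict_with_margin(x)
        if margin >= min_margin:
            imputed.append((i, label))
    return imputed

# Toy data: two views (clinical, genotype) of the same patients.
y_labeled = ["MO", "MO", "MA", "MA"]  # hypothetical label codes
clin_labeled = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
geno_labeled = [[0, 0, 1], [0, 1, 1], [2, 1, 0], [2, 2, 0]]
clin_unlabeled = [[0.1, 0.1], [1.0, 1.0], [0.5, 0.5]]
geno_unlabeled = [[0, 0, 2], [2, 2, 1], [1, 1, 1]]

# Step 1: impute phenotypes from the clinical view.
imputed = impute_phenotypes(clin_labeled, y_labeled, clin_unlabeled, min_margin=0.5)

# Step 2: augment the genotype training set with the imputed patients.
geno_train = geno_labeled + [geno_unlabeled[i] for i, _ in imputed]
y_train = y_labeled + [label for _, label in imputed]

# Step 3: train the genotype-based predictor on the enlarged set.
genotype_model = NearestCentroid().fit(geno_train, y_train)
```

In this toy setting the third unlabeled patient lies almost midway between the two clinical centroids, so it falls below the margin threshold and is left out of the augmented training set. Full co-training would typically iterate and let each view label examples for the other; the sketch stops after one round to keep the data flow visible.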

Details


Language(s):

Dates: Published Online: 2015-06-15
Date issued: 2015

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: Peer

Identifiers: DOI: 10.1093/bioinformatics/btv254
ISSN: 1367-4811, 1367-4803

Degree: -


Source 1


Title: Bioinformatics

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: -

Pages: -
Volume / Issue: 31 (12)
Sequence Number: -
Start / End Page: i303 - i310
Identifier: -