An efficient data partitioning to improve classification performance while 
keeping parameters interpretable

Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul

doi:10.1371/journal.pone.0161788

Korjus, K., Hebart, M. N., & Vicente, R. (2016). An efficient data partitioning to improve classification performance while keeping parameters interpretable. PLoS One, 11(8): e0161788. doi:10.1371/journal.pone.0161788.

Item is released

Basic data

Record permalink: https://hdl.handle.net/21.11116/0000-0005-210D-8
Version permalink: https://hdl.handle.net/21.11116/0000-0005-210E-7

Genre: Journal article

Files

Korjus_2016.PDF (publisher version), 2 MB

File permalink:
https://hdl.handle.net/21.11116/0000-0005-210F-6

Name:
Korjus_2016.PDF

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright info:
-

License:
-

External references

Creators

Creators:
Korjus, Kristjan¹, Author
Hebart, Martin N.¹, Author
Vicente, Raul¹, Author

Affiliations:
¹ External Organizations, ou_persistent22

Content

Keywords: -

Abstract: Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
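
The standard approach that the abstract uses as its baseline (cross-validation to choose a parameter, then one evaluation on a separate test set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' proposed "Cross-validation and cross-testing" procedure, whose details are not given in this record. The 1-D k-nearest-neighbour classifier, the synthetic data, and every function name (`knn_predict`, `k_fold`, `cross_validate_then_test`, `make_data`) are hypothetical choices made for the example.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (labels are 0 or 1)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def accuracy(train, held_out, k):
    """Fraction of held-out points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def k_fold(data, n_folds):
    """Yield (train, validation) pairs; fold i holds every n_folds-th item."""
    for i in range(n_folds):
        validation = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, validation


def cross_validate_then_test(data, candidate_ks, n_folds=5, test_fraction=0.3, seed=0):
    """Hold out a test set, pick k by cross-validation on the rest, test once."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    n_test = int(len(data) * test_fraction)
    test, rest = data[:n_test], data[n_test:]

    def cv_score(k):
        # Mean validation accuracy over the folds; the test set is never used here.
        scores = [accuracy(tr, va, k) for tr, va in k_fold(rest, n_folds)]
        return sum(scores) / len(scores)

    best_k = max(candidate_ks, key=cv_score)
    # Single evaluation on the untouched test set with the selected parameter.
    return best_k, accuracy(rest, test, best_k)


def make_data(n, seed=1):
    """Synthetic data (an assumption): two Gaussian classes with means 0 and 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


best_k, test_acc = cross_validate_then_test(make_data(200), candidate_ks=[1, 3, 5, 7])
print(f"selected k = {best_k}, test accuracy = {test_acc:.2f}")
```

In this sketch the test points contribute nothing to parameter selection or model fitting, which is the cost side of the trade-off the abstract describes for small data sets.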

Details

Language(s): eng - English

Date: Submitted: 2016-05-09; Accepted: 2016-08-11; Published online: 2016-08-26

Publication status: Published online

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: Peer review

Identifiers: DOI: 10.1371/journal.pone.0161788
PMID: 27564393
PMC: PMC5001642
Other: eCollection 2016

Degree type: -

Source 1

Title: PLoS One

Source genre: Journal

Creators:

Affiliations:

Place, publisher, edition: San Francisco, CA : Public Library of Science

Pages: -
Volume / Issue: 11 (8)
Article number: e0161788
Start / end page: -
Identifier: ISSN: 1932-6203
CoNE: https://pure.mpg.de/cone/journals/resource/1000000000277850