Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

Boley, Mario; Goldsmith, Bryan; Ghiringhelli, Luca M.; Vreeken, Jilles

doi:10.1007/s10618-017-0520-3

Boley, M., Goldsmith, B., Ghiringhelli, L. M., & Vreeken, J. (2017). Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery. Data Mining and Knowledge Discovery, 31(5), 1391-1418. doi:10.1007/s10618-017-0520-3.

Item is released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-002D-99F7-B
Version permalink: https://hdl.handle.net/21.11116/0000-0003-F6EF-B

Genre: Journal article

Files

s10618-017-0520-3.pdf (Publisher version), 2MB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-002D-F21E-9

Name:
s10618-017-0520-3.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / Checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
2017

Copyright Info:
© The Author(s)

License:
http://creativecommons.org/licenses/by/3.0/

External references

Creators

Creators:
Boley, Mario¹, Author
Goldsmith, Bryan², Author
Ghiringhelli, Luca M.², Author
Vreeken, Jilles¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
² Theory, Fritz Haber Institute, Max Planck Society, ou_634547

Content

Keywords: -

Abstract: Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the mean absolute deviation from the median, this novel approach yields a linear-time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
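To make the objective class in the abstract concrete, here is a minimal Python sketch under stated assumptions. The function `objective` and its exponent `a` are hypothetical, illustrative choices that merely satisfy the stated monotonicity properties (non-decreasing in subgroup size, non-increasing in the mean absolute deviation from the median); they are not the objective used in the paper. `tight_estimate_bruteforce` is likewise a hypothetical helper: it computes a tight optimistic estimate under the usual definition (the best objective value of any subset of the subgroup) by plain enumeration, which takes exponential time and is not the linear-time algorithm the authors describe.

```python
from itertools import combinations
from statistics import median


def amd(values):
    """Mean absolute deviation of `values` from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def objective(sub, population, a=0.5):
    """Hypothetical dispersion-corrected objective (illustrative only):
    relative size ** a  *  median shift  *  relative dispersion factor.
    All factors are non-negative, so the value is non-decreasing in size
    and non-increasing in the subgroup's AMD around its median."""
    if not sub:
        return 0.0
    coverage = (len(sub) / len(population)) ** a
    # Only upward shifts of the median count as interesting here (assumption).
    shift = max(0.0, median(sub) - median(population))
    pop_amd = amd(population)
    # Penalize dispersion relative to the population; clip at 0.
    dispersion_factor = max(0.0, 1.0 - amd(sub) / pop_amd) if pop_amd > 0 else 1.0
    return coverage * shift * dispersion_factor


def tight_estimate_bruteforce(sub, population, a=0.5):
    """Tight optimistic estimate by definition: the best objective value
    over all non-empty subsets of `sub`. Exponential time, tiny inputs only."""
    best = 0.0
    for k in range(1, len(sub) + 1):
        for r in combinations(sub, k):
            best = max(best, objective(list(r), population, a))
    return best


if __name__ == "__main__":
    population = [1, 2, 3, 4, 10, 11, 12, 30]
    subgroup = [10, 11, 12, 30]
    print(objective(subgroup, population))
    print(tight_estimate_bruteforce(subgroup, population))
```

In this toy run the subgroup scores about 0.66, while its estimate (about 8.13) is attained by the single value 30, a perfectly consistent but tiny subset. In branch-and-bound search, a candidate whose optimistic estimate does not exceed the best value found so far can be pruned, because every refinement selects a subset of it; the point of the paper is to compute such tight estimates efficiently rather than by enumeration.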

Details

Language(s): eng - English

Dates: Submitted: 2017-06-28; Accepted: 2017-06-12; Published online: 2017-09; Issued: 2017-01-19

Publication status: Published

Pages: 28

Place, publisher, edition: -

Table of contents: -

Review type: Peer review

Identifiers: DOI: 10.1007/s10618-017-0520-3

Degree: -

Project name: NoMaD - The Novel Materials Discovery Laboratory

Grant ID: 676580

Funding program: Horizon 2020 (H2020)

Funding organization: European Commission (EC)

Source 1

Title: Data Mining and Knowledge Discovery

Source genre: Journal

Creators:

Affiliations:

Place, publisher, edition: London: Springer

Pages: 28
Volume / Issue: 31 (5)
Sequence number: -
Start / End page: 1391 - 1418
Identifier: -
