  Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

Kalofolias, J., Boley, M., & Vreeken, J. (2017). Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups. Retrieved from http://arxiv.org/abs/1709.07941.

Files

arXiv:1709.07941.pdf (Preprint), 508KB
Name:
arXiv:1709.07941.pdf
Description:
File downloaded from arXiv at 2017-10-13 11:13. To appear in ICDM17.
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]

Creators

Creators:
Kalofolias, Janis (1), Author
Boley, Mario (1), Author
Vreeken, Jilles (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: Computer Science, Databases, cs.DB; Computer Science, Artificial Intelligence, cs.AI
Abstract: Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, the distribution of this control variable within the subgroup is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-$k$ subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.

Details

Language(s): eng - English
 Dates: 2017-09-22; 2017
 Publication Status: Published online
 Pages: 10 p.
 Identifiers: arXiv: 1709.07941
URI: http://arxiv.org/abs/1709.07941
BibTex Citekey: Kalofolias_arXiv2017