Near-optimal supervised feature selection among frequent subgraphs

Thoma, M; Cheng, H; Gretton, A; Han, J; Kriegel, H-P; Smola, AJ; Song, L; Yu, PS; Yan, X; Borgwardt, KM

doi:10.1137/1.9781611972795.92

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H.-P., Smola, A., et al. (2009). Near-optimal supervised feature selection among frequent subgraphs. In H. Park, S. Parthasarathy, & H. Liu (Eds.), 9th SIAM Conference on Data Mining (SDM 2009) (pp. 1076-1087). Society for Industrial and Applied Mathematics: Philadelphia, PA, USA.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0010-7A2A-7

Version Permalink: https://hdl.handle.net/21.11116/0000-0010-7A2B-6

Genre: Conference Paper

Creators

Creators:
Thoma, M, Author
Cheng, H, Author
Gretton, A, Author
Han, J, Author
Kriegel, H-P, Author
Smola, AJ, Author
Song, L, Author
Yu, PS, Author
Yan, X, Author
Borgwardt, KM¹, Author

Affiliations:
¹ Department Molecular Biology, Max Planck Institute for Developmental Biology, Max Planck Society, ou_3375790

Content

Free keywords: -

Abstract: Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, while only a few of them possess enough discriminative power to be useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that a near-optimal solution can be obtained by greedy feature selection. Second, this submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it can help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.
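The greedy selection idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state CORK's quality criterion, so the criterion below is an assumption made for illustration only: it counts "correspondences", meaning pairs of graphs from different classes whose binary feature vectors agree on every selected subgraph feature, and greedy selection tries to drive that count down. The names `correspondences`, `greedy_select`, `X`, and `y` are hypothetical, and the sketch does not include the gSpan integration or the pruning described in the paper.

```python
from collections import Counter


def correspondences(X, y, selected):
    """Count cross-class pairs that agree on all selected features.

    Assumed stand-in criterion (not taken from the abstract): fewer
    correspondences means the selected features separate the classes better.
    """
    pos, neg = Counter(), Counter()
    for row, label in zip(X, y):
        key = tuple(row[j] for j in selected)
        if label == 1:
            pos[key] += 1
        else:
            neg[key] += 1
    # Each positive/negative pair sharing the same key is one correspondence.
    return sum(count * neg[key] for key, count in pos.items())


def greedy_select(X, y, k):
    """Greedily add the feature that most reduces the correspondence count."""
    selected = []
    remaining = set(range(len(X[0])))
    current = correspondences(X, y, selected)
    while remaining and len(selected) < k:
        # Ties are broken in favour of the lowest feature index.
        best = min(sorted(remaining),
                   key=lambda j: correspondences(X, y, selected + [j]))
        best_score = correspondences(X, y, selected + [best])
        if best_score >= current:
            break  # no remaining feature improves the criterion
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected


# Toy data: 4 graphs, 3 binary "contains subgraph j" indicators each.
X = [[1, 0, 1],
     [1, 1, 1],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]  # class labels

chosen = greedy_select(X, y, k=2)
print(chosen, correspondences(X, y, chosen))
```

In this toy dataset, feature 0 alone separates the two classes, so the greedy loop stops after one pick. For monotone submodular objectives in general, greedy selection of this kind is known to reach at least a (1 - 1/e) fraction of the optimum; whether that bound applies depends on the criterion actually used.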

Details

Dates: Date issued: 2009-05

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1137/1.9781611972795.92

Degree: -

Event

Title: 9th SIAM Conference on Data Mining (SDM 2009)

Place of Event: Sparks, NV, USA

Start-/End Date: 2009-04-30 - 2009-05-02

Source 1

Title: 9th SIAM Conference on Data Mining (SDM 2009)

Source Genre: Proceedings

Creator(s):
Park, H, Editor
Parthasarathy, S, Editor
Liu, H, Editor

Affiliations:
-

Publ. Info: Society for Industrial and Applied Mathematics: Philadelphia, PA, USA

Start / End Page: 1076 - 1087

Identifier: ISBN: 978-1-615-67109-0