Capricorn: An Algorithm for Subtropical Matrix Factorization

Karaev, Sanjar; Miettinen, Pauli

doi:10.1137/1.9781611974348.79

Karaev, S., & Miettinen, P. (2016). Capricorn: An Algorithm for Subtropical Matrix Factorization. In S. Chawla Venkatasubramanian & W. Meira (Eds.), Proceedings of the Sixteenth SIAM International Conference on Data Mining (pp. 702-710). Philadelphia, PA: SIAM. doi:10.1137/1.9781611974348.79.

Item is public

Basic information

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0029-542F-3
Version permalink: https://hdl.handle.net/11858/00-001M-0000-002B-A932-E

Resource type: Conference paper

LaTeX : Capricorn: {An} Algorithm for Subtropical Matrix Factorization

Files

Creators

Creators:
Karaev, Sanjar¹, Author
Miettinen, Pauli¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: -

Abstract: Finding patterns in binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF) have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e., when relatively many 1s are removed from the data: tiling can only model additive noise, while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels, and its performance on real-world datasets confirms our hypothesis of the high numbers of missing observations in the real-world data.
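
To make the abstract's distinction between additive and destructive noise concrete, the following is a minimal Python sketch. It is not the Nassau algorithm and does not compute any description length; it only builds a Boolean matrix product from two small factor matrices and counts the two kinds of disagreement between an observed matrix and that product. The function names (`boolean_product`, `count_noise`) and the toy matrices are hypothetical and chosen purely for illustration.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is 1 if, for some k,
    both B[i][k] and C[k][j] are 1; otherwise it is 0."""
    inner = len(C)
    return [
        [int(any(B[i][k] and C[k][j] for k in range(inner)))
         for j in range(len(C[0]))]
        for i in range(len(B))
    ]


def count_noise(observed, reconstructed):
    """Count cells where the observed data disagrees with the reconstruction.

    additive:    observed 1 where the reconstruction has 0 (spurious 1)
    destructive: observed 0 where the reconstruction has 1 (missing 1)
    """
    additive = 0
    destructive = 0
    for obs_row, rec_row in zip(observed, reconstructed):
        for o, r in zip(obs_row, rec_row):
            if o and not r:
                additive += 1
            elif r and not o:
                destructive += 1
    return additive, destructive


# Hypothetical toy factors: two non-overlapping 2x2 blocks of 1s.
B = [[1, 0], [1, 0], [0, 1], [0, 1]]
C = [[1, 1, 0, 0], [0, 0, 1, 1]]
clean = boolean_product(B, C)

# Perturb a copy of the clean matrix (assumed noise, for illustration only).
observed = [row[:] for row in clean]
observed[0][1] = 0  # destructive: a true 1 is not observed
observed[3][2] = 0  # destructive: a true 1 is not observed
observed[1][3] = 1  # additive: a spurious 1 appears

additive, destructive = count_noise(observed, clean)
print("additive:", additive, "destructive:", destructive)
```

In this toy setup the observed matrix has more missing 1s than spurious 1s, which is the regime the abstract describes as typical for presence/absence data.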

Details

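Language: eng - English

Dates: Accepted: 2016; Published online: 2016; Published: 2016

Publication status: Published

Pages: -

Publishing info: -

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): BibTeX citekey: karaev16capricorn
DOI: 10.1137/1.9781611974348.79

Degree: -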

Legal case

Project information

Source 1

Title: Proceedings of the Sixteenth SIAM International Conference on Data Mining

Abbreviation: SDM 2016

Subtitle: Miami, Florida, USA, May 5 - May 7, 2016

Source type: Conference proceedings

Creators and editors:
Chawla Venkatasubramanian, Sanjay¹, Editor
Meira, Wagner¹, Editor

Affiliations:
¹ External Organizations, ou_persistent22

Publisher, place: Philadelphia, PA : SIAM

Pages: -
Volume / Issue: -
Sequence number: -
Start / End page: 702 - 710
Identifiers (ISBN, ISSN, DOI, etc.): ISBN: 978-1-61197-434-8
