User Manual Privacy Policy Disclaimer Contact us
  Advanced SearchBrowse





Matrix factorization over diods and its applications in data mining


Karaev,  Sanjar
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available

Karaev, S. (2019). Matrix factorization over diods and its applications in data mining. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-28661.

Cite as: http://hdl.handle.net/21.11116/0000-0005-4369-A
Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often allow to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse (“negative numbers”) and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features, and, in particular, it allows to better deal with the above mentioned pattern breaking problem. We consider different types of dioids, that range from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the most well known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used – the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.