Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length

Marx, Alexander; Vreeken, Jilles

Marx, A., & Vreeken, J. (2017). Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length. Retrieved from http://arxiv.org/abs/1702.06385.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-002D-90EF-3
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-002D-90F0-E

Genre: Paper

Files

arXiv:1702.06385.pdf (Preprint), 2MB

File Permalink:
https://hdl.handle.net/11858/00-001M-0000-002D-90F1-C

Name:
arXiv:1702.06385.pdf

Description:
File downloaded from arXiv at 2017-07-10 12:12

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
http://arxiv.org/help/license

Creators

Creators:
Marx, Alexander¹, Author
Vreeken, Jilles¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: Statistics, Machine Learning, stat.ML, Computer Science, Learning, cs.LG

Abstract: Given data over the joint distribution of two univariate or multivariate random variables $X$ and $Y$ of mixed or single type data, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. We take an information-theoretic approach, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. For practical inference, we propose a score for causal models for mixed-type data based on the Minimum Description Length (MDL) principle. In particular, we model dependencies between $X$ and $Y$ using classification and regression trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a fast greedy algorithm to infer the most likely causal direction directly from the data. Empirical evaluation on synthetic, benchmark, and real-world data shows that Crack reliably and with high accuracy infers the correct causal direction on both univariate and multivariate cause-effect pairs over both single and mixed-type data.
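
The idea of comparing description lengths in both directions can be illustrated with a minimal sketch. This is not the Crack algorithm or its score: it assumes two univariate numeric variables on comparable scales, a Gaussian code for residuals at a fixed resolution, a flat cost of 64 bits per leaf, and a greedy piecewise-constant regression tree. All names (`gaussian_bits`, `leaf_bits`, `tree_bits`, `direction`, `LEAF_BITS`) are hypothetical and chosen only for illustration.

```python
import math

LEAF_BITS = 64.0  # assumption: flat cost for storing one leaf's mean and variance


def gaussian_bits(values, resolution=0.01):
    """Bits to encode values with a fitted Gaussian at a fixed resolution."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    # Floor the variance so the code length stays finite and positive.
    var = max(sum((v - mean) ** 2 for v in values) / n, resolution ** 2)
    return 0.5 * n * math.log2(2 * math.pi * math.e * var) - n * math.log2(resolution)


def leaf_bits(values, resolution=0.01):
    """Cost of one leaf: its parameters plus the data it covers."""
    return LEAF_BITS + gaussian_bits(values, resolution)


def tree_bits(xs, ys, resolution=0.01, min_leaf=5):
    """Greedy estimate of the bits needed to encode ys given xs.

    A split on xs is accepted only if it shortens the total code length.
    """
    n = len(ys)
    no_split = leaf_bits(ys, resolution)
    if n < 2 * min_leaf:
        return no_split
    order = sorted(range(n), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    split_bits = 1.0 + math.log2(n)  # assumption: split flag plus choice of cut point
    best_cost, best_cut = no_split, None
    for cut in range(min_leaf, n - min_leaf + 1):
        if sx[cut - 1] == sx[cut]:
            continue  # cannot separate equal x values
        cost = (split_bits
                + leaf_bits(sy[:cut], resolution)
                + leaf_bits(sy[cut:], resolution))
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    if best_cut is None:
        return no_split
    return (split_bits
            + tree_bits(sx[:best_cut], sy[:best_cut], resolution, min_leaf)
            + tree_bits(sx[best_cut:], sy[best_cut:], resolution, min_leaf))


def direction(xs, ys, resolution=0.01):
    """Compare L(X) + L(Y|X) against L(Y) + L(X|Y) and report the shorter one."""
    x_to_y = leaf_bits(xs, resolution) + tree_bits(xs, ys, resolution)
    y_to_x = leaf_bits(ys, resolution) + tree_bits(ys, xs, resolution)
    if x_to_y < y_to_x:
        label = "X -> Y"
    elif y_to_x < x_to_y:
        label = "Y -> X"
    else:
        label = "undecided"
    return label, x_to_y, y_to_x
```

Calling `direction(xs, ys)` on two equal-length lists of floats returns the preferred label together with both total code lengths in bits. The model costs matter here: with data costs alone, the two directions would always give the same total, so any asymmetry in this sketch comes from how cheaply each conditional tree can be described.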

Details

Language(s): eng - English

Dates: Created: 2017-02-21
Published Online: 2017

Publication Status: Published online

Pages: 16 p.

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: arXiv: 1702.06385
URI: http://arxiv.org/abs/1702.06385
BibTex Citekey: DBLP:journals/corr/MarxV17

Degree: -
