On the Complexity of Gene Expression Classification Data Sets

Lorena, Ana C.; Costa, Ivan G.; de Souto, Marcilio C. P.

doi:10.1109/HIS.2008.163

Released

Conference Paper

MPS-Authors

Costa, Ivan G.
Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society;

Citation

Lorena, A. C., Costa, I. G., & de Souto, M. C. P. (2008). On the Complexity of Gene Expression Classification Data Sets. In Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on (pp. 825-830). IEEE.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-7F09-F

Abstract

One of the main computational tasks involving gene expression data is the construction of classifiers (models), often via some machine learning (ML) technique applied to given data sets, to automatically discriminate expression patterns of cancer (tumor) tissues from those of normal tissues, or among cancer subtypes. A distinctive characteristic of these data sets is their high dimensionality combined with a small number of data items. This characteristic makes the induction of accurate ML models difficult (e.g., it can lead to model overfitting). In this context, we present an empirical study of the complexity of the classification task posed by cancer-related gene expression data sets. To do so, we measure the complexity of the ML models used to classify the tumors. The results indicate that most of these data sets can be effectively discriminated by a simple linear function.
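The overfitting risk mentioned in the abstract can be illustrated with a small sketch. This is not the authors' procedure or data: it assumes synthetic Gaussian "expression" values, purely random labels, and a plain perceptron as a stand-in for "a simple linear function". All function names and the sizes used (20 samples, 200 features) are hypothetical choices for illustration. With far more features than samples, a linear model can fit even random labels perfectly, so a zero training error alone says little about how well the model generalizes.

```python
import random


def make_random_data(n_samples, n_features, seed):
    """Synthetic stand-in for expression data: Gaussian features, random labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [rng.choice([-1, 1]) for _ in range(n_samples)]
    return X, y


def score(w, b, x):
    """Value of the linear function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_perceptron(X, y, max_epochs=1000):
    """Classic perceptron updates; stops after an epoch without mistakes."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) <= 0:
                # Misclassified: move the hyperplane toward this sample.
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def error_rate(X, y, w, b):
    """Fraction of samples on the wrong side of the hyperplane."""
    wrong = sum(1 for xi, yi in zip(X, y) if yi * score(w, b, xi) <= 0)
    return wrong / len(y)


# Few samples, many features (assumed sizes, mimicking the regime described).
X_train, y_train = make_random_data(n_samples=20, n_features=200, seed=0)
w, b = train_perceptron(X_train, y_train)
train_error = error_rate(X_train, y_train, w, b)

# Fresh random data: the labels carry no signal, so errors hover near chance.
X_test, y_test = make_random_data(n_samples=200, n_features=200, seed=1)
test_error = error_rate(X_test, y_test, w, b)

print("training error:", train_error)
print("test error:", test_error)
```

In this sketch the training error reaches zero although the labels are noise, while the error on fresh data stays near chance level. This is why linear separability of a small, high-dimensional data set needs to be backed by held-out evaluation before it is read as evidence of a simple underlying problem.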