Combination Methods for Automatic Document Organization

Siersdorfer, Stefan

doi:10.22028/D291-23769

Siersdorfer, S. (2005). Combination Methods for Automatic Document Organization. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-23769.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-24FD-3
Version Permalink: https://hdl.handle.net/21.11116/0000-000C-7AD5-B

Genre: Thesis

Files

phd05siers.pdf (Any fulltext), 2MB

File Permalink:
-

Name:
phd05siers.pdf

Description:
-

OA-Status:

Visibility:
Restricted (Max Planck Institute for Informatics, MSIN)

MIME-Type / Checksum:
application/pdf

Technical Metadata:

Copyright Date:
-

Copyright Info:
-

License:
-

Locators

Locator:
http://scidok.sulb.uni-saarland.de/volltexte/2006/495/ (Any fulltext) Open Access Green

Description:
-

OA-Status:
Green

Locator:
http://scidok.sulb.uni-saarland.de/doku/lic_ohne_pod.php?la=de (Copyright transfer agreement) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators

Creators:
Siersdorfer, Stefan (1, 2), Author
Weikum, Gerhard (1), Advisor

Affiliations:
1. Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
2. International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551

Content

Free keywords: -

Abstract: Automatic document classification and clustering are useful for a wide range of applications such as organizing Web, intranet, or portal pages into topic directories, filtering news feeds or mail, focused crawling on the Web or in intranets, and many more. This thesis presents ensemble-based meta methods for supervised classification. In addition, we show how these techniques can be carried forward to clustering based on unsupervised learning (i.e., automatic structuring of document corpora without training data). The algorithms are applied in a restrictive manner, i.e., by leaving out some 'uncertain' documents (rather than assigning them to inappropriate topics or clusters with low confidence). We show how restrictive meta methods can be used to combine different document representations in the context of Web document classification and author recognition. As another application for meta methods, we study the combination of different information sources in distributed environments, such as peer-to-peer information systems. Furthermore, we address the problem of semi-supervised classification on document collections using retraining.

Details

Language(s): eng - English

Dates: Modified: 2006-02-09; Accepted: 2005-08-26; Published Online: 2005; Date issued: 2005

Publication Status: Issued

Pages: -

Publishing info: Saarbrücken : Universität des Saarlandes

Table of Contents: -

Rev. Type: -

Identifiers: eDoc: 278869
Other: Local-ID: C1256DBF005F876D-FB4676D1A2860172C12570D10032E9D9-Siersdorfer2005
DOI: 10.22028/D291-23769
URN: urn:nbn:de:bsz:291-scidok-4956
Other: hdl:20.500.11880/23825

Degree: PhD
