Knowledge-driven Entity Recognition and Disambiguation in Biomedical Text

Siu, Amy

doi:10.22028/D291-26790

Siu, A. (2017). Knowledge-driven Entity Recognition and Disambiguation in Biomedical Text. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-26790.

Item status: Released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-002D-DD18-E
Version permalink: https://hdl.handle.net/21.11116/0000-0001-A79D-2

Genre: Thesis

Files

PhD_thesis_Siu.pdf (any full text), 2 MB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-002D-DD1A-A

Name:
PhD_thesis_Siu.pdf

Description:
-

OA status:
Not specified

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright info:
-

License:
-

External references

External reference:
https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26803 (any full text), Open Access: Green

Description:
-

OA status:
Green

Creators

Creators:
Siu, Amy^{1, 2}, Author
Weikum, Gerhard^{3}, Advisor
Berberich, Klaus^{3}, Referee
Leser, Ulf^{4}, Referee

Affiliations:
1. Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society, ou_40046
2. International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
3. Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
4. External Organizations, ou_persistent22

Content

Keywords: -

Abstract: Entity recognition and disambiguation (ERD) for the biomedical domain are notoriously difficult problems due to the variety of entities and their often long names in many variations. Existing works focus heavily on the molecular level in two ways. First, they target scientific literature as the input text genre. Second, they target single, highly specialized entity types such as chemicals, genes, and proteins. However, a wealth of biomedical information is also buried in the vast universe of Web content. In order to fully utilize all the information available, there is a need to tap into Web content as an additional input. Moreover, there is a need to cater for other entity types such as symptoms and risk factors since Web content focuses on consumer health. The goal of this thesis is to investigate ERD methods that are applicable to all entity types in scientific literature as well as Web content. In addition, we focus on under-explored aspects of the biomedical ERD problems -- scalability, long noun phrases, and out-of-knowledge base (OOKB) entities. This thesis makes four main contributions, all of which leverage knowledge in UMLS (Unified Medical Language System), the largest and most authoritative knowledge base (KB) of the biomedical domain. The first contribution is a fast dictionary lookup method for entity recognition that maximizes throughput while balancing the loss of precision and recall. The second contribution is a semantic type classification method targeting common words in long noun phrases. We develop a custom set of semantic types to capture word usages; besides biomedical usage, these types also cope with non-biomedical usage and the case of generic, non-informative usage. The third contribution is a fast heuristics method for entity disambiguation in MEDLINE abstracts, again maximizing throughput but this time maintaining accuracy. The fourth contribution is a corpus-driven entity disambiguation method that addresses OOKB entities. The method first captures the entities expressed in a corpus as latent representations that comprise in-KB and OOKB entities alike before performing entity disambiguation.

Details

Language(s): eng - English

Dates: Accepted: 2017-09-04; Published online: 2017; Published: 2017

Publication status: Published

Pages: 169 p.

Place, publisher, edition: Saarbrücken : Universität des Saarlandes

Table of contents: -

Review type: -

Identifiers: BibTex Citekey: siuphd17
DOI: 10.22028/D291-26790

Degree type: Doctoral thesis
