An Approach for Weakly-Supervised Deep Information Retrieval

MacAvaney, Sean; Hui, Kai; Yates, Andrew

MacAvaney, S., Hui, K., & Yates, A. (2017). An Approach for Weakly-Supervised Deep Information Retrieval. Retrieved from http://arxiv.org/abs/1707.00189.

Item is released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-002E-06C5-C
Version permalink: https://hdl.handle.net/11858/00-001M-0000-002E-06C6-A

Genre: Research paper

Files

arXiv:1707.00189.pdf (Preprint), 632KB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-002E-06C7-8

Name:
arXiv:1707.00189.pdf

Description:
File downloaded from arXiv at 2017-10-13 12:03.
Neu-IR 2017 SIGIR Workshop on Neural Information Retrieval

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
-

Copyright Info:
-

License:
http://arxiv.org/help/license

External references

Creators

Creators:
MacAvaney, Sean¹, Author
Hui, Kai², Author
Yates, Andrew², Author

Affiliations:
¹ External Organizations, ou_persistent22
² Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: Computer Science, Information Retrieval, cs.IR, Computer Science, Computation and Language, cs.CL

Abstract: Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that, given a weak training set of pseudo-queries, documents, and relevance information, filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
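The abstract names unsupervised ranking heuristics as one kind of filter over weak (headline, article) pairs. The sketch below illustrates what such a filter might look like, assuming Okapi BM25 as the heuristic and a simple rank cutoff: a headline's own article is kept as a positive only if BM25 ranks it near the top, and lower-ranked articles serve as negatives. The names `tokenize`, `BM25`, `filter_pairs`, the cutoffs `top_k` and `neg_start`, and the toy headlines are all hypothetical; this is not the authors' filter and does not implement their interaction-similarity measure.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would also
    # strip punctuation, stem, and remove stop words.
    return text.lower().split()


class BM25:
    """Okapi BM25 scorer over a small in-memory corpus (k1, b are common defaults)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        self.tfs = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF, log(1 + ...), so term weights are never negative.
        self.idf = {t: math.log(1 + (self.n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        # Document indices by descending score; sorted() is stable, so ties keep corpus order.
        return sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)


def filter_pairs(headlines, articles, top_k=1, neg_start=1):
    """Hypothetical rank-based filter for weak (headline, article) pairs.

    Headline i is implicitly relevant to article i. The pair is kept as a
    positive only if BM25 ranks article i within the top_k for headline i;
    negatives are the other articles at rank position neg_start or lower.
    Returns (query_index, positive_index, negative_indices) tuples.
    """
    bm25 = BM25(articles)
    kept = []
    for qi, headline in enumerate(headlines):
        ranking = bm25.rank(headline)
        if qi in ranking[:top_k]:
            negatives = [d for d in ranking[neg_start:] if d != qi]
            kept.append((qi, qi, negatives))
    return kept


# Toy, made-up data for illustration only.
headlines = ["stocks rally on earnings", "storm hits coast", "local team wins final"]
articles = [
    "stocks rally as strong earnings lift markets",
    "a storm hits the coast causing floods",
    "weather forecast calls for sun all week",
]
pairs = filter_pairs(headlines, articles)
print(pairs)  # [(0, 0, [1, 2]), (1, 1, [0, 2])]
```

In this toy run the third headline shares no terms with its paired article, so the filter discards that weak pair. A plain rank cutoff is only one possible heuristic; on a real news corpus the cutoffs would need tuning.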

Details

Language(s): eng - English

Dates: Created: 2017-07-01; Modified: 2017-07-24; Published online: 2017

Publication status: Published online

Pages: 5 p.

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: arXiv: 1707.00189
URI: http://arxiv.org/abs/1707.00189
BibTex Citekey: MacAvaney_arXiv2017

Degree type: -