  Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text

Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

Basic

Genre: Conference Paper

Files

Mai_Galke_Scherp_2018_Using deep learning for....pdf (Publisher version), 2MB
Name: Mai_Galke_Scherp_2018_Using deep learning for....pdf
Description: -
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

Creators:
Mai, Florian, Author
Galke, Lukas (1), Author
Scherp, Ansgar, Author
Affiliations:
(1) Kiel University, Kiel, Germany, ou_persistent22

Content

Free keywords: -
Abstract: For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. It is therefore desirable to have text mining and text classification algorithms that already perform well on the title alone. So far, classification performance on titles has not been competitive with performance on full-texts when the same number of training samples is used. However, title data is much easier to obtain in large quantities and to use for training than full-text data. In this paper, we investigate how models trained on increasing amounts of title data compare to models trained on a constant number of full-texts. We evaluate this question on large-scale datasets from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by factors of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.

Details

Language(s): eng - English
 Dates: 2018
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1145/3197026.3197039
 Degree: -

Event

Title: Joint Conference on Digital Libraries (JCDL 2018)
Place of Event: Fort Worth, TX, USA
Start-/End Date: 2018-06-03 - 2018-06-06

Source 1

Title: JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
Source Genre: Proceedings
 Creator(s):
Chen, J., Editor
Gonçalves, M. A., Editor
Allen, J. M., Editor
Fox, E. A., Editor
Kan, M.-Y., Editor
Petras, V., Editor
Affiliations:
-
Publ. Info: New York : ACM
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 169 - 178
Identifier: ISBN: 978-1-4503-5178-2