date: 2022-10-20T14:03:33Z
pdf:unmappedUnicodeCharsPerPage: 0
pdf:PDFVersion: 1.7
pdf:docinfo:title: CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
xmp:CreatorTool: LaTeX with hyperref
Keywords: sphaera; object detection; historical illustrations; digital humanities; artificial intelligence; dataset
access_permission:modify_annotations: true
access_permission:can_print_degraded: true
subject: Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research has yet to reap the benefits of such advances. This is generally due to the scarcity of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on the YOLO architecture and publish it through a publicly available web service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies.
dc:creator: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
dcterms:created: 2022-10-20T13:53:03Z
Last-Modified: 2022-10-20T14:03:33Z
dcterms:modified: 2022-10-20T14:03:33Z
dc:format: application/pdf; version=1.7
title: CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
Last-Save-Date: 2022-10-20T14:03:33Z
pdf:docinfo:creator_tool: LaTeX with hyperref
access_permission:fill_in_form: true
pdf:docinfo:keywords: sphaera; object detection; historical illustrations; digital humanities; artificial intelligence; dataset
pdf:docinfo:modified: 2022-10-20T14:03:33Z
meta:save-date: 2022-10-20T14:03:33Z
pdf:encrypted: false
dc:title: CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
modified: 2022-10-20T14:03:33Z
Content-Type: application/pdf
pdf:docinfo:creator: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
X-Parsed-By: org.apache.tika.parser.DefaultParser
creator: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
meta:author: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
dc:subject: sphaera; object detection; historical illustrations; digital humanities; artificial intelligence; dataset
meta:creation-date: 2022-10-20T13:53:03Z
created: 2022-10-20T13:53:03Z
access_permission:extract_for_accessibility: true
access_permission:assemble_document: true
xmpTPg:NPages: 18
Creation-Date: 2022-10-20T13:53:03Z
pdf:charsPerPage: 3979
access_permission:extract_content: true
access_permission:can_print: true
meta:keyword: sphaera; object detection; historical illustrations; digital humanities; artificial intelligence; dataset
Author: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
producer: pdfTeX-1.40.21
access_permission:can_modify: true
pdf:docinfo:producer: pdfTeX-1.40.21
pdf:docinfo:created: 2022-10-20T13:53:03Z