date: 2022-10-20T14:03:33Z
pdf:unmappedUnicodeCharsPerPage: 0
pdf:PDFVersion: 1.7
pdf:docinfo:title: CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
xmp:CreatorTool: LaTeX with hyperref
Keywords: sphaera; object detection; historical illustrations; digital humanities; artificial intelligence; dataset
access_permission:modify_annotations: true
access_permission:can_print_degraded: true
subject: Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on YOLO architecture and publish it through a publicly available web-service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies.
dc:creator: Jochen Büttner, Julius Martinetz, Hassan El-Hajj and Matteo Valleriani
dcterms:created: 2022-10-20T13:53:03Z
dcterms:modified: 2022-10-20T14:03:33Z
dc:format: application/pdf; version=1.7
Content-Type: application/pdf
X-Parsed-By: org.apache.tika.parser.DefaultParser
producer: pdfTeX-1.40.21
xmpTPg:NPages: 18
pdf:charsPerPage: 3979
pdf:encrypted: false
access_permission:fill_in_form: true
access_permission:extract_for_accessibility: true
access_permission:assemble_document: true
access_permission:extract_content: true
access_permission:can_print: true
access_permission:can_modify: true