Journal Article

Image Classification for Historical Documents: A Study on Chinese Local Gazetteers

MPS-Authors

Chen, Shih-Pei
Department Artifacts, Action, Knowledge, Max Planck Institute for the History of Science, Max Planck Society

Citation

Chen, J.-A., Hou, J.-C., Tsai, R.-T.-H., Liao, H.-M., Chen, S.-P., & Chang, M.-C. (2024). Image Classification for Historical Documents: A Study on Chinese Local Gazetteers. Digital Scholarship in the Humanities, 39(1), 61-73. doi:10.1093/llc/fqad065.


Cite as: https://hdl.handle.net/21.11116/0000-000F-BC97-3
Abstract
We present a novel approach for automatically classifying illustrations from historical Chinese local gazetteers using modern deep learning techniques. Our goal is to facilitate the digital organization and study of the large number of local gazetteers that have been digitized. We evaluate the performance of eight state-of-the-art deep neural networks on a dataset of 4,309 manually labeled and organized images of Chinese local gazetteer illustrations, grouped by content into three coarse categories and nine fine classes. In our experiments, DaViT achieved the highest classification accuracy, 93.9 per cent, with an F1-score of 90.6 per cent. These results demonstrate that deep learning models can accurately recognize and categorize historical local gazetteer illustrations. We also developed a user-friendly web service that gives researchers easy access to the trained models. Because the method can be extended to other collections of scanned documents beyond Chinese local gazetteers, it contributes to the study of visual materials in the arts and history within the digital humanities. The dataset used in this study is publicly available and can be used for further research in the field.
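The abstract reports both accuracy and F1-score, and the two can diverge when classes are unevenly sized. The following is a minimal sketch of how the two metrics are computed for a multi-class problem. It assumes macro-averaged F1 (the abstract does not state which averaging was used), and the class names and labels are invented toy data, not taken from the study's dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 8 "map" images and 2 "portrait" images.
y_true = ["map"] * 8 + ["portrait"] * 2
# The classifier mislabels one of the two portraits as a map.
y_pred = ["map"] * 8 + ["map", "portrait"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy case a single error on the small class costs one tenth of the accuracy but a third of that class's F1, so the macro F1 falls below the accuracy. A gap in that direction is what one would expect on a dataset with classes of unequal size, although the abstract does not say whether that explains the reported figures.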