  CLDFBench: Give your cross-linguistic data a lift

Forkel, R., & List, J.-M. (2020). CLDFBench: Give your cross-linguistic data a lift. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, et al. (Eds.), Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 6995-7002). Paris: European Language Resources Association (ELRA). doi:10.17613/8t0e-w639.

Basic

Genre: Conference Paper

Files

shh2600.pdf (Publisher version), 2MB
Name: shh2600.pdf
Description: OA
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Creators:
Forkel, Robert (1), Author
List, Johann-Mattis (2), Author
Affiliations:
(1) Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Max Planck Society, ou_2074311
(2) CALC, Max Planck Institute for the Science of Human History, Max Planck Society, ou_2385703

Content

Free keywords: cross-linguistic data, retro-standardization, data curation
Abstract: While the amount of cross-linguistic data is constantly increasing, most datasets produced today and in the past cannot be considered FAIR (findable, accessible, interoperable, and reproducible). To remedy this and to increase the comparability of cross-linguistic resources, it is not enough to set up standards and best practices for data to be collected in the future. We also need consistent workflows for the “retro-standardization” of data that has been published during the past decades and centuries. With the Cross-Linguistic Data Formats initiative, first standards for cross-linguistic data have been presented and successfully tested. So far, however, CLDF creation was hampered by the fact that it required a considerable degree of computational proficiency. With cldfbench, we introduce a framework for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing a consistent, reproducible workflow that rigorously supports version control and long-term archiving of research data and code. The framework is distributed in the form of a Python package along with usage information and examples for best practice. This study introduces the new framework and illustrates how it can be applied by showing how a resource containing structural and lexical data for Sinitic languages can be efficiently retro-standardized and analyzed.

Details

Language(s): eng - English
Dates: 2020-05-19; 2020
 Publication Status: Issued
 Pages: 8
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.17613/8t0e-w639
Other: shh2600
 Degree: -

Event

Title: 12th Conference on Language Resources and Evaluation [postponed due to Corona]
Place of Event: Marseille
Start-/End Date: 2020-05-11 - 2020-05-16

Project information

Project name: CALC
Grant ID: 715618
Funding program: Horizon 2020 (H2020)
Funding organization: European Commission (EC)

Source 1

Title: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
Source Genre: Proceedings
 Creator(s):
Calzolari, Nicoletta, Editor
Béchet, Frédéric, Editor
Blache, Philippe, Editor
Choukri, Khalid, Editor
Cieri, Christopher, Editor
Declerck, Thierry, Editor
Goggi, Sara, Editor
Isahara, Hitoshi, Editor
Maegaard, Bente, Editor
Mariani, Joseph, Editor
Mazo, Hélène, Editor
Moreno, Asuncion, Editor
Odijk, Jan, Editor
Piperidis, Stelios, Editor
Affiliations: -
Publ. Info: Paris : European Language Resources Association (ELRA)
Pages: 7251
Volume / Issue: -
Sequence Number: -
Start / End Page: 6995 - 7002
Identifier: ISBN: 979-10-95546-34-4