The Language Archiving Technology domain

Koenig, Alexander; Ringersma, Jacquelijn; Trilsbeek, Paul

Conference Paper

MPS-Authors

Koenig, Alexander
Technical Group, MPI for Psycholinguistics, Max Planck Society;

Ringersma, Jacquelijn
Technical Group, MPI for Psycholinguistics, Max Planck Society;

Trilsbeek, Paul
Technical Group, MPI for Psycholinguistics, Max Planck Society;

Fulltext (public)

ltc-105-koenig.pdf
(Any fulltext), 206KB

Citation

Koenig, A., Ringersma, J., & Trilsbeek, P. (2009). The Language Archiving Technology domain. In Z. Vetulani (Ed.), Human Language Technologies as a Challenge for Computer Science and Linguistics (pp. 295-299).

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-4177-E

Abstract

The Max Planck Institute for Psycholinguistics (MPI) manages an archive of linguistic research data that currently holds almost 20 terabytes. In addition to in-house researchers, external projects also store their data in the archive, most notably the Documentation of Endangered Languages (DoBeS) projects. The archive is available online and can be accessed by anyone with Internet access. To manage this large amount of data, the MPI's technical group has developed a software suite called Language Archiving Technology (LAT), which helps researchers and archive managers manage the data and helps users enrich their primary data with additional layers. All the MPI software is Java-based and developed according to open source principles (GNU, 2007). All three major operating systems (Windows, Linux, MacOS) are supported, and the software works similarly on all of them. Because the archive is online, many of the tools, especially those for accessing the data, are browser-based. Some of these browser-based tools use Adobe Flex to create visually appealing GUIs. The LAT suite is a complete set of management and enrichment tools, and because the tools interact with one another, the result is a complete LAT software domain. Over the last 10 years, this domain has proven its functionality and usefulness, and it is now being deployed to servers at other institutions. This deployment is an important step in getting the archived resources back to the members of the speech communities whose languages are documented. In this paper we give an overview of the tools of the LAT suite and describe their functionality and role in the integrated process of archiving, managing and enriching linguistic data.