  Using lexical language models to detect borrowings in monolingual wordlists

Miller, J. E., Tresoldi, T., Zariquiey, R., Castañón, C. A. B., Morozova, N., & List, J.-M. (2020). Using lexical language models to detect borrowings in monolingual wordlists. PLoS One, 0242709. doi:10.1371/journal.pone.0242709.

Files

Miller_Using_PLoSOne_2020.pdf (Publisher version), 3MB
Name: Miller_Using_PLoSOne_2020.pdf
Description: -
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata: -
Copyright Date: 2020
Copyright Info: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Creators

Creators:
Miller, John E., Author
Tresoldi, Tiago [1], Author
Zariquiey, Roberto, Author
Castañón, César A. Beltrán, Author
Morozova, Natalia [2], Author
List, Johann-Mattis [1], Author
Affiliations:
[1] CALC, Max Planck Institute for the Science of Human History, Max Planck Society, ou_2385703
[2] Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Max Planck Society, ou_2074311

Content

Free keywords: -
 Abstract: Native speakers are often assumed to be efficient in identifying whether a word in their language has been borrowed, even when they do not have direct knowledge of the donor language from which it was taken. To detect borrowings, speakers make use of various strategies, often in combination, relying on clues such as semantics of the words in question, phonology and phonotactics. Computationally, phonology and phonotactics can be modeled with support of Markov n-gram models or -- as a more recent technique -- recurrent neural network models. Based on a substantially revised dataset in which lexical borrowings have been thoroughly annotated for 41 different languages of a large typological diversity, we use these models to conduct a series of experiments to investigate their performance in borrowing detection using only information from monolingual wordlists. Their performance is in many cases unsatisfying, but becomes more promising for strata where there is a significant ratio of borrowings and when most borrowings originate from a dominant donor language. The recurrent neural network performs marginally better overall in both realistic studies and artificial experiments, and holds out the most promise for continued improvement and innovation in lexical borrowing detection. Phonology and phonotactics, as operationalized in our lexical language models, are only a part of the multiple clues speakers use to detect borrowings. While improving our current methods will result in better borrowing detection, what is needed are more integrated approaches that also take into account multilingual and cross-linguistic information for a proper automated borrowing detection.
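
As a rough illustration of the Markov n-gram approach the abstract mentions, the sketch below trains a bigram model on segmented words assumed to be inherited and flags a word as a likely borrowing when its average per-transition entropy exceeds a threshold. This is not the authors' implementation: the function names, the add-one smoothing, the toy wordlist, and the threshold of 3.0 bits are all assumptions made for the example.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical word-boundary symbols


def train_bigram(words):
    """Count sound bigrams over segmented words padded with boundaries."""
    bigrams, contexts, vocab = Counter(), Counter(), {END}
    for segments in words:
        padded = [START, *segments, END]
        vocab.update(segments)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def entropy(word, model):
    """Average negative log2 probability per transition (add-one smoothing)."""
    bigrams, contexts, vocab = model
    padded = [START, *word, END]
    size = len(vocab) + 1  # one extra slot stands in for any unseen segment
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + size)
        total -= math.log2(p)
    return total / (len(padded) - 1)


def is_borrowed(word, model, threshold):
    """Flag a word whose entropy under the native model exceeds the threshold."""
    return entropy(word, model) > threshold


# Toy wordlist (invented for illustration): simple CV-shaped "native" words.
native = [list(w) for w in ["pata", "kami", "tupa", "mika", "napu"]]
model = train_bigram(native)

for candidate in ["pata", "strak"]:
    score = entropy(list(candidate), model)
    print(candidate, round(score, 2), is_borrowed(list(candidate), model, 3.0))
```

With this toy data, the consonant clusters in "strak" never occur in the training words, so it scores a higher entropy than "pata" and is the only one of the two flagged. In practice the threshold would need to be tuned on annotated data, and a single threshold on a native-only model is just one of several possible decision procedures.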

Details

Language(s): eng - English
 Dates: 2020-12-09
 Publication Status: Published online
 Pages: 39
 Publishing info: -
 Table of Contents:
Introduction
- Problem and motivation
- State of the art

Materials and methods
- Materials
- Lexical language models
- Bag of sounds
- Markov Model
- Recurrent neural network
- Decision procedures
- Assessing detection performance
- Experiments and studies
- Implementation

Results
- Detection of artificially seeded borrowings
- Cross validation of borrowing detection on real language data
- Factors that influence borrowing detection performance
- Comparing entropy distributions to investigate the performance of the Markov Model and Neural Network methods

Discussion
- Artificially seeded borrowings
- Cross validation of borrowing detection methods
- Factors determining borrowing detection performance
- Detecting borrowings from a single donor language
- Comparing entropy distributions

Conclusion
 Rev. Type: Peer
 Identifiers: DOI: 10.1371/journal.pone.0242709
 Degree: -

Project information

Project name: CALC
Grant ID: 715618
Funding program: Horizon 2020 (H2020)
Funding organization: European Commission (EC)

Source 1

Title: PLoS One
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: San Francisco, CA : Public Library of Science
Pages: -
Volume / Issue: -
Sequence Number: 0242709
Start / End Page: -
Identifier: ISSN: 1932-6203