English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Journal Article

Pitfalls in corpus research

MPS-Authors

Ernestus,  Mirjam
Language Comprehension Group, MPI for Psycholinguistics, Max Planck Society;
Center for Language Studies, external;
Decoding Continuous Speech, MPI for Psycholinguistics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

Rietveld_2004_pitfalls.pdf
(Publisher version), 166KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Rietveld, T., Van Hout, R., & Ernestus, M. (2004). Pitfalls in corpus research. Computers and the Humanities, 38(4), 343-362. doi:10.1007/s10579-004-1919-1.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-1762-B
Abstract
This paper discusses some pitfalls in corpus research and suggests solutions on the basis of examples and computer simulations. We first address reliability problems in language transcriptions, agreement between transcribers, and how disagreements can be dealt with. We then show that the frequencies of occurrence obtained from a corpus cannot always be analyzed with the traditional X2 test, as corpus data are often not sequentially independent and unit independent. Next, we stress the relevance of the power of statistical tests, and the sizes of statistically significant effects. Finally, we point out that a t-test based on log odds often provides a better alternative to a X2 analysis based on frequency counts.