Pitfalls in corpus research

Rietveld, Toni; Van Hout, Roeland; Ernestus, Mirjam

doi:10.1007/s10579-004-1919-1

Released

Journal Article

MPS-Authors

Ernestus, Mirjam
Language Comprehension Group, MPI for Psycholinguistics, Max Planck Society;
Center for Language Studies, external;
Decoding Continuous Speech, MPI for Psycholinguistics, Max Planck Society;

Fulltext (public)

Rietveld_2004_pitfalls.pdf
(Publisher version), 166KB

Supplementary Material (public)

There is no public supplementary material available

Citation

Rietveld, T., Van Hout, R., & Ernestus, M. (2004). Pitfalls in corpus research. Computers and the Humanities, 38(4), 343-362. doi:10.1007/s10579-004-1919-1.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-1762-B

Abstract

This paper discusses some pitfalls in corpus research and suggests solutions on the basis of examples and computer simulations. We first address reliability problems in language transcriptions, agreement between transcribers, and how disagreements can be dealt with. We then show that the frequencies of occurrence obtained from a corpus cannot always be analyzed with the traditional χ² test, as corpus data are often not sequentially independent and unit independent. Next, we stress the relevance of the power of statistical tests, and the sizes of statistically significant effects. Finally, we point out that a t-test based on log odds often provides a better alternative to a χ² analysis based on frequency counts.
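
The contrast drawn in the last sentence of the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. The per-speaker counts are invented for illustration, the 0.5 correction in the log odds is an assumption made here to avoid division by zero, and Welch's variant of the t-test is one possible choice among several. The sketch pools the counts of each group into a 2x2 table for a Pearson χ² statistic, then computes a t statistic on per-speaker log odds, where each speaker contributes a single observation.

```python
import math
from statistics import mean, variance


def log_odds(hits, misses):
    """Empirical log odds; the 0.5 correction is an assumption of this sketch."""
    return math.log((hits + 0.5) / (misses + 0.5))


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pooled_chi2(group_a, group_b):
    """Pearson chi-square (1 df, no continuity correction) on pooled counts."""
    a = sum(h for h, _ in group_a)
    b = sum(m for _, m in group_a)
    c = sum(h for h, _ in group_b)
    d = sum(m for _, m in group_b)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical (hits, misses) per speaker; one speaker dominates group A.
GROUP_A = [(90, 10), (5, 15), (4, 16), (6, 14)]
GROUP_B = [(6, 14), (5, 15), (7, 13), (4, 16)]

chi2, p = pooled_chi2(GROUP_A, GROUP_B)
t, df = welch_t(
    [log_odds(h, m) for h, m in GROUP_A],
    [log_odds(h, m) for h, m in GROUP_B],
)
print(f"pooled chi2 = {chi2:.2f}, p = {p:.4g}")
print(f"Welch t on per-speaker log odds = {t:.2f}, df = {df:.1f}")
```

With these invented counts, a single prolific speaker drives the pooled χ² statistic well above the conventional 1-df critical value of 3.84, whereas the t statistic on per-speaker log odds stays small, because that speaker counts as one unit rather than as 100 independent tokens.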