
Item Details

  Automatic segmentation of speech articulators from real-time midsagittal MRI based on supervised learning.

Labrunie, M., Badin, P., Voit, D., Joseph, A. A., Frahm, J., Lamalle, L., Vilain, C., & Boe, L. J. (2018). Automatic segmentation of speech articulators from real-time midsagittal MRI based on supervised learning. Speech Communication, 99, 27-46. doi:10.1016/j.specom.2018.02.004.


Basic Information

Item Permalink: https://hdl.handle.net/21.11116/0000-0001-EF9B-4
Version Permalink: https://hdl.handle.net/21.11116/0000-0001-EFA4-9
Item type: Journal Article

Files

2632784.pdf (Publisher version), 4MB
 
File Permalink: -
File name: 2632784.pdf
Description: -
OA-Status: -
Visibility: Restricted (Max Planck Society (every institute))
MIME type / checksum: application/pdf
Technical metadata: -
Copyright date: -
Copyright info: -
CC license: -
2632784_Suppl.htm (Supplementary material), 76KB
File Permalink: https://hdl.handle.net/21.11116/0000-0001-EFA3-A
File name: 2632784_Suppl.htm
Description: -
OA-Status: -
Visibility: Public
MIME type / checksum: text/html / [MD5]
Technical metadata: -
Copyright date: -
Copyright info: -
CC license: -

Related URLs


Creators

Creators:
Labrunie, M., Author
Badin, P., Author
Voit, D.1, Author
Joseph, A. A.1, Author
Frahm, J.1, Author
Lamalle, L., Author
Vilain, C., Author
Boe, L. J., Author
Affiliations:
1 Biomedical NMR Research GmbH, MPI for biophysical chemistry, Max Planck Society, ou_578634

Content

Keywords: Real-time MRI; Speech articulation; Articulator segmentation; Multiple Linear Regression; Active Shape Models; Shape Particle Filtering
Abstract: Speech production mechanisms can be characterized at a peripheral level by both their acoustic and articulatory traces along time. Researchers have thus devoted considerable effort to measuring articulation. Thanks to the spectacular progress accomplished in the last decade, real-time Magnetic Resonance Imaging (RT-MRI) now offers frame rates closer than ever to those achieved by electromagnetic articulography or ultrasound echography, while providing very detailed geometrical information about the whole vocal tract. RT-MRI has thus become indispensable for the study of speech articulators' movements. However, making efficient use of large sets of images to characterize and model speech tasks requires automatic methods that segment the articulators from these images with sufficient accuracy. The present article describes our approach to developing, based on supervised machine learning techniques, an automatic segmentation method that offers various useful features: (1) the capability of dealing with individual articulators independently; (2) ensuring that the hard palate, jaw and hyoid bone are adequately tracked as rigid structures; (3) delivering contours for a full set of articulators, including the epiglottis and the back of the larynx, which partly reflects the vocal fold abduction / adduction state; (4) dealing more explicitly, and thus more accurately, with contact between articulators; and (5) reaching an accuracy better than one millimeter. The main contributions of this work are the following. We have recorded the first large database of high-quality RT-MRI midsagittal images for a French speaker. We have manually segmented the main speech articulators (jaw, lips, tongue, velum, hyoid, larynx, etc.) for a small training set of about 60 images selected by hierarchical clustering to represent the whole corpus as faithfully as possible. We have used these data to train various image and contour models for developing automatic articulatory segmentation methods. The first method, based on Multiple Linear Regression, predicts the contour coordinates from the image pixel intensities with a Mean Sum of Distances (MSD) segmentation error over all articulators of 0.91 mm, computed with a Leave-One-Out Cross Validation procedure on the training set. Another method, based on Shape Particle Filtering, reaches an MSD error of 0.66 mm. Finally, the modified version of Active Shape Models (mASM) explored in this study gives an MSD error of a mere 0.55 mm (0.68 mm for the tongue). These results demonstrate that this mASM approach performs better than state-of-the-art methods, though at the cost of the manual segmentation of the training set. The same method used on other MRI data leads to similar errors, which testifies to its robustness. The large quantity of contour data that can be obtained with this automatic segmentation method opens the way to various fruitful perspectives in speech: establishing more elaborate articulatory models, analyzing coarticulation and articulatory variability or invariance more finely, implementing machine learning methods for articulatory speaker normalization or adaptation, or illustrating adequate or prototypical articulatory gestures for applications in the domains of speech therapy and second language pronunciation training.
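
The first method above (Multiple Linear Regression from image pixel intensities to contour coordinates, scored by an MSD error under Leave-One-Out Cross Validation) is simple enough to sketch in code. The following is a minimal illustration under assumed conditions, not the authors' implementation: the array shapes, the synthetic data, and the simplified point-to-point MSD (the paper's Mean Sum of Distances is defined between whole contours, typically via closest-point distances) are all assumptions.

import numpy as np

def fit_mlr(X, Y):
    """Least-squares fit of Y ~ [1, X] @ W (multiple linear regression)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_mlr(W, X):
    """Predict flattened contour coordinates from pixel intensities."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return Xb @ W

def msd(pred, truth):
    """Simplified MSD: mean Euclidean distance between corresponding
    contour points, in the same units as the coordinates (here mm)."""
    return np.linalg.norm(pred - truth, axis=1).mean()

def loocv_msd(X, Y, n_points):
    """Leave-one-out cross validation of the MLR contour predictor.

    X: (n_images, n_pixels) vectorized midsagittal pixel intensities.
    Y: (n_images, 2 * n_points) flattened (x, y) contour coordinates.
    """
    n = X.shape[0]
    errors = []
    for i in range(n):
        keep = np.arange(n) != i                    # hold image i out
        W = fit_mlr(X[keep], Y[keep])
        pred = predict_mlr(W, X[i:i + 1])[0].reshape(n_points, 2)
        errors.append(msd(pred, Y[i].reshape(n_points, 2)))
    return float(np.mean(errors))

if __name__ == "__main__":
    # Synthetic stand-in for the ~60 manually segmented training images.
    rng = np.random.default_rng(0)
    n_images, n_pixels, n_points = 60, 32 * 32, 40
    X = rng.normal(size=(n_images, n_pixels))
    Y = rng.normal(size=(n_images, 2 * n_points))
    print(f"LOOCV MSD (synthetic data): {loocv_msd(X, Y, n_points):.2f}")

With far more pixels than training images the least-squares system is under-determined; numpy's lstsq then returns the minimum-norm solution, and in practice one would likely reduce the pixel vector first (e.g. by PCA) before the regression. None of these design choices are taken from the paper.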

Details

Language: eng - English
Dates: 2018-03-01 / 2018-05
Publication status: Published
Pages: -
Publishing info: -
Table of contents: -
Review: Peer reviewed
Identifiers (DOI, ISBN, etc.): DOI: 10.1016/j.specom.2018.02.004
Degree: -

Related Events

Legal Case

Project information


Publication 1

Title: Speech Communication
Type: Journal
Authors / Editors: -
Affiliations: -
Publisher, Place: -
Pages: -
Volume / Issue: 99
Sequence number: -
Start / End page: 27 - 46
Identifier (ISBN, ISSN, DOI, etc.): -