Method for Constructing Formants for Studying Phonetic Characteristics of Vowels
Keywords:
phonetics, phonology, formants, acoustic analysis, digital signal processing, vowels, Russian

Abstract
This article presents the results of applying a method for obtaining the formant components of vowel phonemes to a corpus of professional reading in Russian. The paper reviews the existing lines of development in methods for obtaining formant characteristics of vowels in different languages, and surveys the extent to which formant patterns are used in speech technology and natural language processing. On the CORPRES corpus of professional reading, formant data were obtained for 351,929 realizations of vowel phonemes produced by 8 speakers. The data are grouped according to the symbols of the real transcription performed by phoneticians during the segmentation of the corpus. Formant planes show the distribution of vowel allophones over the first two formants for all speakers. The variability of formant characteristics in the corpus is illustrated with the pre-tonic and post-tonic allophones of one male speaker. The article also presents results attesting to the difference between rounded realizations of unstressed /i/ and /a/, which both naive listeners and expert phoneticians perceive as /u/. The experimental material consisted of recordings of one male announcer reading specially selected sentences designed to control for various linguistic factors. Analysis of the formant components of these vowels showed that their first-formant values are close to those of the stressed vowel /u/ for this speaker, and their degree of closure corresponds to that of /u/. The second-formant values of the [u] vowels that were expected to be realized as [i] and [a] differ: they are more fronted than /u/.
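The abstract does not spell out the signal-processing pipeline behind these measurements, so the sketch below shows one standard technique for obtaining F1/F2 values of the kind discussed above: autocorrelation LPC followed by root-solving of the prediction polynomial. This is an illustrative assumption, not the authors' method; the file name, frame position, pre-emphasis factor, LPC-order heuristic, and bandwidth threshold are hypothetical defaults rather than parameters of the CORPRES study.

import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_formants(frame, sr, order=None, max_bw=400.0):
    """Estimate formant frequencies (Hz) of one voiced frame via LPC roots."""
    if order is None:
        order = 2 + sr // 1000                    # rule of thumb: ~2 poles per kHz
    frame = lfilter([1.0, -0.97], [1.0], frame)   # pre-emphasis flattens spectral tilt
    frame = frame * np.hamming(len(frame))
    # Autocorrelation sequence r[0..N-1] of the windowed frame.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Yule-Walker normal equations R a = -r yield the prediction coefficients.
    a = np.concatenate(([1.0], solve_toeplitz(r[:order], -r[1:order + 1])))
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2.0 * np.pi)  # pole angle -> resonance frequency
    bands = -np.log(np.abs(roots)) * sr / np.pi   # pole radius -> bandwidth
    # Keep sharp resonances above ~90 Hz; these are the formant candidates.
    keep = (freqs > 90.0) & (bands < max_bw)
    return np.sort(freqs[keep])

# Usage on a hypothetical mono recording: F1/F2 of a 30 ms frame cut from a vowel.
sr, sig = wavfile.read("vowel_u.wav")
frame = sig[:int(0.03 * sr)].astype(float)
f1, f2 = lpc_formants(frame, sr)[:2]
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")

Extracting such F1/F2 pairs frame by frame and grouping them by the transcription symbol of the segment they fall in is essentially what the formant planes described above visualize.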
Copyright (c) Vera Vyacheslavovna Evdokimova, Daniil Aleksandrovich Kocharov, Pavel Anatolyevich Skrelin

This work is licensed under a Creative Commons Attribution 4.0 International License.