Speech Technologies in Multimodal Interfaces

Karpov; Ronzhin; Li; Shalin

doi:10.15622/sp.2.13

Karpov, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Ronzhin, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Li, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Shalin, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences


Abstract

This article gives a brief overview of existing architectures of multimodal interfaces and systems that use speech as one of the main components of information input. It outlines the main differences between unimodal and multimodal interfaces and the possible ways of combining information from different modalities. It also describes application areas of multimodal interfaces in both current and prospective human-machine interaction systems.

