Multimodal Interaction Design, Implementation and Evaluation
Spoken natural language may appeal to users in the general public since it is the main modality used, together with pointing gestures or gaze, in face-to-face human communication. Our work on multimodal human-computer interaction is based on the following two observations. The first is that speech- and gesture-based multimodality has been studied extensively, from both a software and an ergonomic point of view, whereas speech plus graphics as an output form of multimodality has attracted fewer research studies, especially regarding the utility and usability of speech as a supplementary modality to graphics.
Moreover, pointing hand gestures have the same limited expressive power as gaze in some contexts of use, namely the selection of objects on very large displays (e.g., electronic blackboards, reality centres or caves) or in 3D environments. In these interaction environments, both modalities can only specify directions if they are used spontaneously, as in real life. Our current work on multimodality addresses the following three issues:
- How to design oral messages that help visual search in cluttered displays?
- How to design multimodal command languages that use information on spontaneous or controlled gaze movements to disambiguate oral commands, especially those including deictic phrases?
Oral assistance to visual search
Concerning the effectiveness of oral support to visual search, the detailed presentation of our first study has been accepted as a chapter in a collective scientific book published by Kluwer [39]. This study focused on determining whether oral information on the location of a visual target in a complex, cluttered display could improve the efficiency of its identification (accuracy and selection times). Targets were either familiar (visual presentation of the isolated target prior to scene display) or unfamiliar (oral characterisation of the target only, prior to scene display), monomodal (visual or oral) or multimodal (visual and oral).
This initial study was followed up, last year, by two more ambitious experimental studies. The second study, which involved 24 participants, investigated the influence of oral help messages on the speed and accuracy of visual target detection. Message content merely specified the location of the target as one of nine pre-defined areas on the screen. The effectiveness of this form of oral assistance was assessed for various display spatial layouts. 3600 photographs of real landscapes, people and objects were selected from a database of over 6000 items, then formatted and divided into 120 thematically homogeneous collections (30 photographs per collection). These collections were displayed using four spatial layouts (40 collections/scenes per spatial layout): elliptical, radial, matrix-like and random. To refine the results on participants’ performances (especially target detection accuracy and selection time), we performed a complementary third study in which the eye movements of 5 participants were recorded (using an ASL-501 eye tracker) during visual tasks similar to those performed in the second study, save for the presence of oral assistance. Eye movements were analysed using specific software. Results of both studies are detailed in Suzanne Kieffer’s PhD thesis [15]; see also [19] for the second study, and [27] and [28] for the third one. These results are part of our contribution to the Micromégas Project (a three-year research project, starting July 2003, in collaboration with the In Situ team at UR INRIA-Futurs (Orsay) and the Laboratoire de la Perception et du Mouvement in Marseille; it benefits from national support: ACI “Masses de données”, 1st call, 2003).
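To make the form of the oral help messages concrete, the following minimal sketch maps a target position to one of nine screen areas and produces a location message. It is an illustration only, not the software used in the experiments: the 3 x 3 grid partition, the area labels, the message wording and the names `Screen`, `area_of` and `location_message` are all assumptions, since the report states only that messages specified one of nine pre-defined areas.

```python
from dataclasses import dataclass

# Hypothetical labels for a 3 x 3 partition of the screen. The report only
# says "nine pre-defined areas"; the regular grid is an assumption.
ROWS = ("top", "middle", "bottom")
COLS = ("left", "centre", "right")


@dataclass(frozen=True)
class Screen:
    width: int   # in pixels
    height: int  # in pixels


def area_of(x: float, y: float, screen: Screen) -> tuple[int, int]:
    """Return the (row, col) indices of the grid cell containing point (x, y).

    The origin is the top-left corner of the screen, as in most GUI toolkits.
    """
    if not (0 <= x < screen.width and 0 <= y < screen.height):
        raise ValueError("target lies outside the screen")
    # min(...) guards against floating-point rounding at the far edges.
    col = min(2, int(3 * x / screen.width))
    row = min(2, int(3 * y / screen.height))
    return row, col


def location_message(x: float, y: float, screen: Screen) -> str:
    """Build a short message (hypothetical wording) naming the target's area."""
    row, col = area_of(x, y, screen)
    if (row, col) == (1, 1):
        return "The target is in the centre of the screen."
    if row == 1:
        return f"The target is on the {COLS[col]} of the screen."
    if col == 1:
        return f"The target is at the {ROWS[row]} of the screen."
    return f"The target is at the {ROWS[row]} {COLS[col]} of the screen."
```

For instance, `location_message(100, 50, Screen(1280, 1024))` would name the top left area. In an actual set-up, the returned string would be passed to a speech synthesiser or matched to a pre-recorded message before the scene is displayed.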