The counterpart of multi-modal interaction is multi-modal computing, which enhances the ability of computer systems to acquire, process and present different modes of data efficiently and robustly. Such systems have several aims: to analyse and interpret multi-modal information even when it is large, scattered, noisy and possibly incomplete; to organize the gathered knowledge to enable powerful querying; and to produce convincing visual output that displays complex information in real time.
Designing systems that can interpret multi-modal information is a task with many component parts.
Acquiring, organizing and retrieving multi-modal information. Searching digital documents today relies on the use of keywords and simple text descriptions. Media such as video, image and audio files are searchable only through manually created annotations, which is restrictive and can bias certain types of search. Although many types of online resources are available for both professional and casual users, there is little integration among the different sources and formats.
In the future, knowledge will be automatically acquired, categorized and continuously maintained by a suite of methods that can process natural language [1], and recognize and analyse video content [2]. These systems will also be able to perform other functions to improve organization, such as inferring relationships between pieces of information, and using context to extract the meanings of ambiguous words (semantic disambiguation; Fig. 1) [3]. Science and engineering, most notably medicine and the life sciences [4], will particularly benefit from these applications as the number and range of scientific publications grow.
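To make the idea of semantic disambiguation concrete, the following is a minimal sketch in the spirit of the simplified Lesk approach, which picks the sense whose dictionary gloss shares the most words with the surrounding context. It is an illustration only and not the method of the work cited above: the sense inventory `SENSES`, its glosses, the stopword list and the function names are all hypothetical and were written for this example.

```python
# Toy sense inventory. The glosses are hypothetical and written for this
# illustration only; a real system would draw them from a lexical resource.
SENSES = {
    "bank": {
        "financial institution":
            "an institution that accepts deposits and lends money to customers",
        "river edge":
            "the sloping land beside a river or lake where water meets the shore",
    }
}

# Small, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "that",
             "in", "on", "at", "by", "where", "we", "she"}


def tokens(text):
    """Lowercase the text, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(word, sentence):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence) - {word}
    scores = {
        sense: len(context & tokens(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed in the inventory.
    return max(scores, key=scores.get)


river = disambiguate("bank", "She sat on the bank of the river and watched the water")
money = disambiguate("bank", "The bank lends money to small customers")
```

Here `river` resolves to the "river edge" sense because the context shares "river" and "water" with that gloss, whereas `money` resolves to "financial institution" through "lends", "money" and "customers". Word overlap is a deliberately crude measure of context; it is used here only because it keeps the sketch short and self-contained.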
Realistic virtual environments. The goal is to create virtual environments for enhanced presentation of multi-modal data. The visual aspect can be programmed from first principles [5] or can incorporate sophisticated processing of existing footage such as static images, video or three-dimensional scans [6]. These methods require techniques from computer graphics, image processing, computer vision, and combinatorial and geometric computing [7] to generate large-scale, integrated, physically accurate and visually rich virtual environments.
A related requirement is for the creation of human-like virtual characters that look and speak realistically, show convincing emotions and mimic the behaviour of real people. Virtual characters provide a powerful and intuitive interface through which to present complex multi-modal data, and can be used to populate virtual reality environments.