Mirroring human-to-human communication. One approach to accessing stored information is to design a system that interacts with users in a way that mirrors human behaviour and dialogue. A system that recreates natural, everyday person-to-person communication, in which both the system and the human user draw on the same spectrum of modalities for input and output, is said to be symmetric [8]. A good example involves drivers and passengers travelling in a car: rather than diverting their attention to operate advanced car services (for example, satellite navigation, entertainment or four-wheel drive), they could reach these services easily through a natural interface that combines voice commands with predictive algorithms. Such technologies would build computational models of the current task and its context, including the user’s state and cognitive load, to understand the user’s needs and provide appropriate multi-modal responses.
Autonomous infrastructure. Designing dependable, autonomous multi-modal systems is only part of the task: they must also be supported by platforms that organize themselves and operate independently across a range of infrastructures, thereby providing reliable computing and communication anytime and anywhere. Manual intervention in these systems should be limited to installing and replacing hardware components.
Such systems will be capable of delivering personalized, relevant and timely information and communication. At the same time, they must respect users’ legitimate privacy concerns while still holding users accountable for their actions. Platforms of this kind are a prerequisite for the objectives stated above.
The multi-modal future is already around us in the form of smartphones, global positioning systems and even hyper-realistic computer games; in the coming years, these will become even more commonplace and be available anytime and anywhere. In our vision, these systems will be self-organizing and autonomous, using natural interfaces to provide personalized information quickly and accurately, yet they must also respect users’ legitimate privacy concerns [9,10]. The priority is to develop principles for the design and operation of such systems, so that they manage the huge amounts of multi-modal information safely and securely.
Researchers at the Max Planck Institute for Informatics have recently developed a new marker-less approach to capturing complex human performances (spatio-temporally coherent geometry, motion and texture) from multi-view video (de Aguiar, E. et al. ACM TOG 27 (3), 2008). A novel approach to building comprehensive knowledge bases that tap the deepest online information sources and relationships, in order to answer questions beyond the reach of today’s keyword-based search engines, has also been proposed (Weikum, G. et al. Comm. ACM 52 (4), 2009).