What Can We Make with These? How Multimodal Interaction Could Transform the Way Human Translators Work and Live

Multimodal interaction, the essence of interactive translation dictation, has literally taken me around the world: from Glasgow to Tokyo, and from Istanbul to Seattle. Among the many perks of having been part of the ICMI community for several years was an escorted VIP tour of Microsoft Research, just outside Seattle (USA), in 2015, including a visit to its Envisioning Centre.

I was able to see with my own eyes (and operate with my own voice and hands) the various prototypes you can see in this video. The Centre “is all about imagining how technology could be used to make life easier and more enjoyable, sometimes in small ways and sometimes in revolutionary ones.”

Inside this Envisioning Centre, multimodality is all around. Indeed, among the expected (and demonstrated) advantages of multimodal systems is easier and more enjoyable interaction with the technologies we use to live, work and play.

What is Multimodal Interaction?

For me, multimodal interaction means multiple ways of interacting with computational devices and applications. The interaction modes in multimodal systems often involve natural forms of human behaviour, e.g., voice, touch, gaze, hand gestures and body movement. The user often has the option to combine these modes or choose between them, depending on the task being performed or the conditions of the interaction (e.g., indoors vs. outdoors, on the go, etc.). Multimodal interaction is about natural human behaviour, choice and flexibility.

Take one example from the video: “What can I make with this?” (said while showing the system a chili pepper). The system is multimodal in that it processes two combined input signals: the user’s voice query, and the shape, colour and size of the food item. It uses both speech-recognition and object-recognition technology. Monomodal voice queries such as “What can I make with a chili pepper?” or “Show me recipes that use chili pepper” would also have been possible, but the shorter multimodal command felt more natural and spontaneous, and it returned accurate results.

We Have Researched Alternatives to Keyboard-and-mouse PCs for as Long as We Have Had Them

The first efforts to develop and test multimodal systems took place during the 1980s and 1990s. This work coincided with ongoing efforts to improve speech recognition technology and integrate it into different professional domains and applications.

One of the earliest works on multimodal interaction is probably Bolt’s famous “Put that there” experiment in 1980, which examined some of the advantages of combining input modes such as voice and hand gestures. In his seminal paper, Bolt describes an experiment in which the user points at objects while issuing voice commands to create, modify and move them. Saying “put that there” while pointing is easier and more effective than saying, for instance, “put the small purple circle at the top right corner of the screen”. The hundreds of studies that followed have demonstrated the many advantages of multimodal systems over monomodal systems and keyboard-based interfaces, from both the users’ perspective and a computational perspective: from increased productivity and flexibility to a better user experience, and from higher accuracy to lower latency.

Improving Support for Human Translators’ Cognition and Performance

The major impetus for developing multimodal interfaces has been the practical demands of mobile use. Today, among the most concrete examples of commercial multimodal interfaces are tablets and smartphones, which are basically voice-and-touch-enabled (but also include other recognition-based technologies such as gesture, gaze and facial recognition).

However, the interest in multimodal interaction goes well beyond mobility:

“Ultimately, multimodal interfaces are just one part of the larger movement to establish richer communications interfaces, ones that can expand existing computational functionality and also improve support for human cognition and performance […]. One major goal of such interfaces is to reduce cognitive load and improve communicative and ideational fluency.” (Oviatt, 2012)

Such support for improved cognition and performance is what human translators need from their technology tools, not the opposite. At InTr Technologies, inspired by successful case studies of multimodal interaction in many professional domains and day-to-day applications (and by what we have experienced with our own senses), we have investigated, and continue to investigate, how multimodal input could enhance human translators’ capabilities and improve their performance and ideational fluency.

How Do You Envision the Future of Translator-computer Interaction?

If you are a professional translator and, like hundreds of thousands of others around the world, are frustrated with the paraphernalia in your toolbox, or even experience physical discomfort or illness in the traditional desktop PC environment, you can organically introduce multimodal interaction into your work and harness its power and benefits. You may draw some inspiration from how tech giants envision human-computer interaction in the years to come. (Try searching “future vision 2020” on YouTube and you will find many more examples of what our work and life could look like in the next decade or so.) Much like artificial intelligence, most of these “futuristic” technologies are disruptive rather than evolutionary. They do, however, give us hints about other ways we could work and collaborate in translation, beyond the traditional picture of the lone individual typing and clicking all day long in front of a desktop or laptop computer, with a pile of dictionaries.

You may have a dream work environment in mind. You may already have examples of successful multimodal interaction in translation (say, if you use voice-recognition software, cloud-based applications and/or mobile devices while you work). Or you may simply be curious to learn how translators have used multimodal interaction in a lab or a real-life work environment, or what our research has in store for you. If you are any of these translators, or a stakeholder in the language services industry, talk to us: in English, French, Spanish, Portuguese, Catalan or Italian! Recognition-based technologies are increasingly robust, mobile devices ever more ubiquitous, and cloud services more reliable. We have the ingredients on the table. Now, what can we make with these?


Lean-thinking the translation process: Typing is wasteful

I trust the header image and title of this post tell the story. Here is, however, a little more meat for avid, hungry readers.

One of the business strategies executives and entrepreneurs in all industries are being taught nowadays is to adopt the lean-thinking philosophy. Lean is about creating “processes that need less human effort, less space, less capital, and less time to make products and services at far less costs and with much fewer defects, compared with traditional business systems” (lean.org).

Lean proposes a new way of organising human activities to deliver more benefits to society and more value to individuals. It aims to reduce or, better still, eradicate waste, that is, any activity or step in a process that does not add value. When wasteful elements are removed from a production line, for example, employees can focus their skills and time on quality work.

Lessons from the Sensei masters

The lean-thinking philosophy was inspired by the Japanese car manufacturing industry. In the 1950s, “Sensei” masters in lean thinking would challenge Toyota line managers, for instance, to look differently at their practices by focusing on:

–The workplace: observing current workflows and work conditions. This is a mark of respect for employees and an opportunity to add value by implementing their ideas and initiatives, instead of attempting to create value through prescribed work.

–Customer and employee satisfaction, understanding that it is built in at every step of the process.

–“Kaizen”. In Japanese, it literally means “change for the better.” Perfection is sought through a commitment to improve things step by step, seeking one hundred 1% improvements rather than one 100% leap forward.

Getting rid of what doesn’t add value

Among the mantras of the lean movement is “eradicate waste”. For businesses, this means understanding their processes and eliminating all the barriers that slow down or impede the flow. The translation industry has strived tirelessly to eradicate waste in many ways. In fact, the very reason translation memory systems came into being some three decades ago was the three-R philosophy: reduce, reuse and recycle. Since then, enormous progress has been made in developing add-in applications designed to increase productivity and reduce the time and effort human translators spend on different tasks.

However, one element of the overall translation process has been overlooked so far: the use of the traditional (mechanical) keyboard as the one and only text input device. I certainly would not set a keyboard on fire and then photograph it; many have done that already. Nonetheless, I am one of the few who have investigated how human translators use the keyboard-and-mouse interface. Beyond translation, hundreds of thinkers and researchers have demonstrated that the traditional PC environment is a major impediment to cognitive performance, creativity and productivity. This is particularly true for natural-language communication tasks.

I have personally observed that almost 10% of the typing activity by translators involves hitting the space bar, and that deleting, correcting typos, and navigating the cursor around using the arrow keys account for 15-35% of keystrokes.
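Figures like these come from keystroke logging. As a minimal sketch, not necessarily the procedure behind the observations above, and assuming a log that records one key name per keystroke with hypothetical labels such as "Space" and "Backspace", the two shares could be computed like this:

```python
from collections import Counter

# Hypothetical key labels; a real keystroke logger may name keys differently.
EDIT_AND_NAV_KEYS = {"Backspace", "Delete", "Left", "Right", "Up", "Down"}


def keystroke_shares(log):
    """Return (space share, editing/navigation share) for a list of key names,
    where each entry in `log` is one keystroke."""
    counts = Counter(log)  # missing keys count as 0
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    space = counts["Space"]
    edit_nav = sum(counts[key] for key in EDIT_AND_NAV_KEYS)
    return space / total, edit_nav / total


# Typing "le chat" with one typo ("char") corrected by Backspace.
log = ["l", "e", "Space", "c", "h", "a", "r", "Backspace", "t"]
space_share, edit_share = keystroke_shares(log)
print(f"{space_share:.1%} space, {edit_share:.1%} editing/navigation")
# prints "11.1% space, 11.1% editing/navigation"
```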

This is a wasteful practice begging for a lean solution.

I have also observed that translators are three to seven times slower when they type than when they speak or read a text aloud naturally (French has been the most keyboard-unfriendly language in my investigations so far!).

Interactive translation dictation: A lean solution just around the corner

To eradicate waste and add more value for human translators, InTr Technologies is lean-thinking the translation process by introducing the human voice as the primary input mode in translation. Beyond other natural language processing applications that have benefited the industry and improved translators’ productivity for decades, today’s voice-enabled applications are robust and particularly attractive for the industry. Speech is possibly the oldest tool humans have used to communicate, and it is still the most natural way to do so. A quote I read recently inspired me, and it should inspire translators, translation project managers and other stakeholders in the industry to start thinking differently:

“The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn” – Alvin Toffler.

Effectively integrating voice recognition technology into the translation process will necessarily mean unlearning how to type, learning to use voice applications effectively and relearning how to dictate translations, as many translators did in the days before PCs. It is time to bid farewell to the mechanical keyboard and begin adopting “a leaner way of thinking” in translation: to eradicate the wasteful actions in the translation process that add no value, and to empower human translators through interactive translation dictation. It is possible to start now! Ask us how…

Interactive translation dictation is finally here!

In 2018, it is time to say “Farewell to the keyboard!” and to usher in the great return of orality. In the years to come, interactive translation dictation will become the norm in the professional translation sector.

To type the name Vigneault, I need ten keystrokes: two for the capital V and eight more, one for each of the remaining letters. If I want to be polite, “Monsieur Vigneault” takes twenty keystrokes, space included.
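The keystroke arithmetic in this example can be checked with a short sketch. It assumes a simplified keyboard model, not stated in the original, in which a capital letter costs two presses (Shift plus the letter) and every other character, including the space, costs one; accented letters, which may need extra presses on some layouts, are ignored.

```python
def count_keystrokes(text):
    """Count the keystrokes needed to type `text`, assuming each uppercase
    letter takes two presses (Shift + letter) and any other character one."""
    return sum(2 if ch.isupper() else 1 for ch in text)


print(count_keystrokes("Vigneault"))           # 10
print(count_keystrokes("Monsieur Vigneault"))  # 20
```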

For more than half a century, translators have been able to dictate their texts rather than type them. Yet dictated translation and dictation tools, which were very common in the translation world of the 1960s and 1970s, are now viewed with suspicion by practising translators and by translation schools. Most translators stick to traditional methods because they have never given dictated translation the chance to prove itself. Nevertheless, a significant number of translators around the world dictate their translations today.

I dictate my own translations, and I have been fortunate to meet several other translators who dictated for decades, or who still dictate. None of them deny that dictated translation can double, or even triple, one’s productivity. It lets translators concentrate on interlinguistic transfer and produce higher-quality translations. It helps prevent health problems associated with office work and helps translators stay fit. It offers translators greater professional satisfaction and a better quality of life.

The great return of dictated translation, in a new form

The many testimonials from translators who dictate their translations, along with the scientific research of the last four decades, inspired us to develop interactive translation dictation at InTr Technologies. Interactive translation dictation revives dictated translation as it was practised half a century ago, while integrating the best of today’s interactive, multimodal and cloud-based technologies, including speech recognition and mobile, touchscreen devices.

Interactive translation dictation has the potential to become one of the most efficient and ergonomic working techniques in the future of the profession. It has the advantage of working across a wide range of language combinations. It can be used anywhere, anytime, from your computer or your mobile devices.

Beyond increased productivity

Interactive translation dictation will bring a new dimension to communication in the globalized world. First, it will help erase the boundary between translation and interpreting by promoting a hybrid practice. On the one hand, it will give interpreters the opportunity to work on written translation projects without suffering the pains of the keyboard. On the other hand, it will give translators the opportunity to acquire oral skills so that they can work occasionally as interpreters.

Interactive translation dictation will also foster the acquisition of interlinguistic transfer skills among second-language learners and attract thousands of new translation and interpreting students to the Grandes Écoles and universities. It thus emerges as an essential solution for meeting the growing demand for professional translation in the age of digital technology and globalization. Finally, interactive translation dictation will enable speakers of languages that have no writing system to access translated content, which is essential for their understanding of the world, their education and their sustainable development, and to share their literary and cultural heritage. It can thus contribute to the survival of endangered languages and strengthen linguistic diversity on our planet.

Farewell to the keyboard and to the automation of the art of translation

Language is, in its essence, a matter of orality. Translation is, in its essence, a matter of interlinguistic and intercultural understanding and communication, not a competition of keystrokes per minute or words per day.

I do not see the translation profession as threatened by machine translation or artificial intelligence (although the latter plays an important role in the design of emerging interactive translation dictation tools). When it comes to understanding the subtleties of the world’s roughly 7,000 languages and their cultural nuances, humans still far outperform machines. I see interactive translation dictation as the real hope: the true way to produce superior-quality human translations at a speed very close to the speed of our thought.