    Audiovisual translation is a branch of translation studies concerned with the transfer of multimodal and multimedial texts into another language and/or culture. Audiovisual texts are multimodal inasmuch as their production and interpretation rely on the combined deployment of a wide range of semiotic resources or ‘modes’. Major meaning-making modes in audiovisual texts include language, image, music, colour and perspective. Audiovisual texts are multimedial in so far as this panoply of semiotic modes is delivered to the viewer through various media in a synchronized manner, with the screen playing a coordinating role in the presentation process.

    Since the 1970s, screen-based texts have become increasingly ubiquitous. Scholars have been quick to bring the investigation of new textual manifestations – ranging from software to videogames – into the remit of audiovisual translation research, thus extending the boundaries of this area of study. Chaume (2004) has documented the successive stages of this expansion by looking at the terms used to designate this field of enquiry during the period in question. Considering that the mainstream forms of audiovisual translation – i.e. subtitling and dubbing – were born on the back of sound motion pictures, it is only natural that the terms ‘film dubbing’ and ‘film translation’ came to feature prominently in early scholarly work. The subsequent emergence of television as a mass medium of communication and entertainment provided new avenues for the dissemination of translated audiovisual texts, with labels such as ‘film and TV translation’ and ‘media translation’ gaining visibility in the literature. The most recent developments relate to the exponential growth in the volume of audiovisual texts produced by and for electronic and digital media. Terms like ‘screen translation’ and ‘multimedia translation’ illustrate the extent to which audiovisual translation has outgrown its core domain of enquiry and annexed neighbouring fields under an all-inclusive research agenda.

    The genealogy of audiovisual translation

    Even during the silent film era, exporting films to foreign markets involved some form of interlingual mediation. The turn of the twentieth century witnessed the incorporation of written language into the conglomerate of film semiotics in the form of intertitles. The use of these texts placed between film frames grew in parallel with the emergence of increasingly complex filmic narratives. Intertitles situated the action in a specific temporal and spatial setting, provided viewers with insights into the characters’ inner thoughts and helped them negotiate the discrepancies between screen time and real time, during a period when filmic techniques were rudimentary. Removing the original intertitles and inserting a new set of target language texts back into the film was all that was required to exploit it commercially in a foreign market. But intertitles also served as the springboard for the development of new forms of audiovisual translation. In-house commentators were employed to fill the same gaps as the intertitles, although these entertainers often sought to enhance the viewer’s experience by spreading gossip about the film stars or even explaining how the projector worked. The national industries of the USA and a number of European countries thrived on this absence of linguistic barriers to film exports until the aftermath of World War I took its toll on the financial capability of European industry to fund new projects. By the early 1920s, American films had come to secure a dominant market share throughout Europe, pushing some national film industries (e.g. British and Italian) close to the brink of collapse.

    According to Forbes and Street (2000), the advent of sound in the late 1920s put a temporary end to the American domination of European film industries, as the big studios suddenly became unable to satisfy the demand of European audiences for films spoken in their native languages. Experimental attempts to appeal to local European sensibilities – e.g. the ‘multilingual filming method’ and the ‘dunning process’ – failed to win back the market share the American industry had lost, and it soon became obvious that new forms of audiovisual translation were required if that industry was to reassert its former dominance. During the second half of the 1920s, technological developments made it possible to ‘revoice’ certain fragments of dialogue or edit the sound of scenes that had been shot in noisy environments through a process known as ‘post-synchronization’. Despite being conceived as a means of improving the quality of an original recording, post-synchronized revoicing was soon used to replace the source dialogue with a translated version, and is therefore acknowledged as the immediate forerunner of dubbing as we know it today. Concurrent advances in the manipulation of celluloid films during the 1920s allowed distributors to superimpose titles straight onto the film strip images through optical and mechanical means. By the late 1920s it had become customary to use this evolved version of the primitive intertitles to provide a translation of the source dialogue in synchrony with the relevant fragment of speech, thus paving the way for the development of modern subtitling.

    The perfection of these new techniques and their acceptance by European audiences ended the moratorium on American control of European markets, with American films regaining a market share of 70 per cent in Europe and Latin America by 1937. This second wave of domination was regarded as a threat not only to the sustainability of Europe’s national film industries, but also to their respective languages, cultures and political regimes – in the mid-1930s, the latter ranged from democratic systems to fascist dictatorships. The multiplicity of European interests and ideologies would soon lead each country to adopt its own protectionist measures and/or censorship mechanisms, which were, in many cases, enforced through the choice of specific policies and forms of audiovisual translation. Despite these efforts, and apart from brief exceptional periods like World War II, these dynamics of domination were to remain unchanged.


    A typology of subtitling procedures

    Subtitling consists of the production of snippets of written text (subtitles, or captions in American English) to be superimposed on visual footage – normally near the bottom of the frame – while an audiovisual text is projected, played or broadcast. In so far as it involves a shift from a spoken to a written medium, subtitling has been defined as a ‘diasemiotic’ or ‘intermodal’ form of audiovisual translation. Interlingual subtitles provide viewers with a written rendition of the source text speech, whether dialogue or narration, in their own language. In communities where at least two languages coexist, bilingual subtitles deliver two language versions of the same source fragment, one in each of the two constitutive lines of the subtitle.

    Each of the fragments into which subtitlers divide the speech for the purposes of translation must be delivered concurrently with its written rendition in the target language via the subtitle. And given that ‘people generally speak much faster than they read, subtitling inevitably involves … technical constraints of shortage of screen space and lack of time’. Subtitles composed according to widely accepted spatial parameters contain a maximum of two lines of text, each accommodating up to 35 characters. The actual number of characters that can be used in each subtitle then depends on the duration of the corresponding speech unit.
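
    The interplay between the spatial ceiling and the time available can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than an industry formula: the two-line, 35-character limit comes from the parameters above, whereas the reading rate (`ASSUMED_CHARS_PER_SECOND`) and the function name `char_budget` are hypothetical and are not drawn from the source or from any standard.

```python
# Spatial parameters taken from the text: two lines of up to 35 characters.
MAX_LINES = 2
MAX_CHARS_PER_LINE = 35

# Hypothetical reading rate, chosen only for illustration; the source does
# not state one, and real guidelines vary by audience and medium.
ASSUMED_CHARS_PER_SECOND = 15.0


def char_budget(duration_seconds: float,
                chars_per_second: float = ASSUMED_CHARS_PER_SECOND) -> int:
    """Return the number of characters available to a single subtitle.

    The budget is the smaller of two caps: the spatial cap (lines times
    characters per line) and the temporal cap (how much text the assumed
    reader can take in while the speech unit lasts).
    """
    if duration_seconds <= 0:
        return 0
    spatial_cap = MAX_LINES * MAX_CHARS_PER_LINE
    temporal_cap = int(duration_seconds * chars_per_second)
    return min(spatial_cap, temporal_cap)


if __name__ == "__main__":
    for seconds in (1.0, 2.0, 4.0, 10.0):
        print(f"{seconds:>4.1f} s -> {char_budget(seconds)} characters")
```

    Under these assumptions a short speech unit is limited by time, while a long one is limited by the space available on screen; changing the assumed reading rate shifts the point at which the spatial ceiling takes over.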

    Since the 1970s, we have witnessed the proliferation of intralingual subtitles, which are composed in the same language as the source text speech. Intralingual subtitles were traditionally addressed at minority audiences, such as immigrants wishing to develop their proficiency in the language of the host community or viewers requiring written support to fully understand certain audiovisual texts shot in nonstandard dialects of their native language. However, intralingual subtitling has now become almost synonymous with subtitling for the deaf and hard of hearing in the audiovisual marketplace, where accessibility-friendly initiatives are receiving increasing attention. Subtitles for the hard of hearing provide a text display of the speech but also incorporate descriptions of sound features which are not accessible to this audience. To compensate for their higher density, this type of subtitle complies with specific conventions in terms of timing, text positioning and use of colours. Although subtitles for the deaf were for a long time restricted to films and programmes recorded in advance, the development of real time or live subtitling technologies, ranging from the stenograph and stenotyping methods to speech recognition systems, has increased the accessibility of live news, live chat shows and reality TV to the deaf community.

    Historically, the terms ‘interlingual’ and ‘intralingual subtitles’ correlated with open and closed subtitles, respectively. Interlingual subtitles have tended to be printed on the actual film, thus becoming part of the audiovisual text itself. Given that they are visually present throughout the screening and universally accessible to all viewers (except for the visually impaired), interlingual subtitles are said to be open. Intralingual subtitles, however, have tended to be encoded in the broadcast signal using a number of technologies, mainly teletext. They are known as ‘closed subtitles’ because they are accessible only to viewers whose television sets are equipped with the relevant decoder and who choose to display them on the screen while watching the programme. The advent of DVD and digital television represents a departure from this tradition as both media provide viewers with closed intralingual and interlingual subtitles, normally in more than one language.

    The subtitling process

    The subtitler’s basic working materials have traditionally included a time-coded VHS copy of the source film or programme and a ‘dialogue list’; i.e. an enhanced post-production script containing a transcription of the dialogue, a description of relevant visual information and sometimes notes for the translator. The text is typically subjected to a ‘spotting’ process, during which the dialogue is divided into segments that are time-cued individually. Each dialogue segment is then translated or transcribed in compliance with certain segmentation and editing conventions, including time–space correlation standards. The output of this process, normally an electronic list of spotted subtitles, is then returned to the commissioner of the translation. In recent years, increased circulation of audiovisual texts in digital format and the development of dedicated software applications have brought about important changes in the subtitling process. Although these new technologies are not necessarily available to all freelance professionals, they now allow subtitlers to complete a project – including the actual transference of subtitles onto the text – using a standard computer.
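
    The spotting stage described above might be modelled as follows. This is a minimal sketch, not a description of any actual subtitling package: the `Cue` record, the timecode layout and the helper names are all assumptions made for illustration, and only the two-line, 35-character ceiling is taken from the spatial parameters mentioned earlier in this entry.

```python
from dataclasses import dataclass
import textwrap

MAX_LINES = 2             # spatial parameter quoted earlier in the entry
MAX_CHARS_PER_LINE = 35   # spatial parameter quoted earlier in the entry


@dataclass
class Cue:
    """One spotted dialogue segment (hypothetical record layout)."""
    start: float  # in-time, in seconds
    end: float    # out-time, in seconds
    text: str     # translated or transcribed subtitle text


def format_timecode(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm (an assumed, generic layout)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def layout(cue: Cue) -> list[str]:
    """Wrap the cue text and reject it if it needs more than two lines."""
    lines = textwrap.wrap(cue.text, width=MAX_CHARS_PER_LINE)
    if len(lines) > MAX_LINES:
        raise ValueError(f"cue needs {len(lines)} lines; condense the text")
    return lines


def render(cues: list[Cue]) -> str:
    """Produce a plain-text list of numbered, time-cued subtitles."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        header = (f"{number}  {format_timecode(cue.start)}"
                  f" -> {format_timecode(cue.end)}")
        blocks.append("\n".join([header, *layout(cue)]))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render([Cue(83.5, 86.0, "Where were you last night?")]))
```

    Raising an error when a segment overflows, instead of truncating it, reflects the fact that condensation is an editorial decision for the subtitler and not something a tool should do silently.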

    Advantages and limitations of subtitling

    Empirical evidence suggests that subtitles can deliver 43 per cent less text than the spoken dialogue they derive from. Given the constraints arising from the synchronous alignment between spoken sound and written subtitles that the industry requires, subtitlers are expected to prioritize the overall communicative intention of an utterance over the semantics of its individual lexical constituents. Deleting, condensing and adapting the source speech are thus some of the most common subtitling strategies deployed by professionals. Under such tight medium-related constraints, subtitling is claimed to foster cultural and linguistic standardization by ironing non-mainstream identities – and their individual speech styles – out of the translated narrative. Pragmatically, this streamlining process can affect, for instance, the impression that viewers form of characters in terms of friendliness. In terms of Venuti’s ‘domestication/foreignization’ dichotomy, the subtitling process typically leads to the domestication of the source dialogue and the effacement of the translator.

    Subtitling can be viewed as a form of ‘overt translation’ since it allows viewers to access the original speech. Effectively, this empowers viewers who have some knowledge of the source language but are unaware of how the subtitler’s work is conditioned by media-related constraints to monitor and criticize the translation. Criticisms are often levelled at subtitling because it represents an intrusion on the image and its processing requires a relatively intensive cognitive effort on the part of the viewer, thus detracting from the overall viewing experience. On the positive side, advocates of subtitling highlight the fact that it respects the aesthetic and artistic integrity of the original text. The viewer’s exposure to a foreign language has also been found to promote the target audience’s interest in other cultures. And finally, subtitling is a comparatively cheap and fast form of audiovisual translation.


    Revoicing

    Although there is a lack of consensus on the scope of the term ‘revoicing’, it technically designates a range of oral language transfer procedures: voice-over, narration, audio description, free commentary, simultaneous interpreting and lip-synchronized dubbing. In practice, ‘revoicing’ tends to encompass all these procedures, except for lip-synchronized dubbing, which is commonly referred to as ‘dubbing’. Although all these methods involve a greater or lesser degree of synchronization between soundtrack and on-screen images, the need for synchronization is particularly important in the case of dubbing.

    Voice-over or ‘half-dubbing’ is a method that involves pre-recorded revoicing: after a few seconds in which the original sound is fully audible, the volume is lowered and the voice reading the translation becomes prominent. This combination of realism (the original sound remains available in the acoustic background throughout) and almost full translation of the original text makes voice-over particularly suitable for interviews, documentaries and other programmes which do not require lip synchronization. Voice-over is also used today to translate feature films for some small markets in Europe and Asia because it is substantially cheaper than dubbing.

    Although it is not always pre-recorded, narration has been defined as ‘an extended voice-over’. This form of oral transfer aims to provide a summarized but faithful and carefully scripted rendition of the original speech, and its delivery is carefully timed to avoid any clash with the visual syntax of the programme. In recent years, a very specific form of pre-recorded, mostly intralingual narration has become increasingly important to ensure the accessibility of audiovisual products to the visually impaired: this is known as audio description. An audio description is a spoken account of those visual aspects of a film which play a role in conveying its plot, rather than a translation of linguistic content. The voice of an audio describer delivers this additional narrative between stretches of dialogue, hence the importance of engaging in a delicate balancing exercise to establish what the needs of the spectator may be, and to ensure the audience is not overburdened with excessive information.

    As opposed to these pre-recorded transfer methods, other forms of revoicing are performed on the spot by interpreters, presenters or commentators by superimposing their voices over the original sound. Free commentary, for example, involves adapting the source speech to meet the needs of the target audience, rather than attempting to convey its content faithfully. Commentaries are commonly used to broadcast high-profile events with a spontaneous tone. Simultaneous interpreting is typically carried out in the context of film festivals when time and budget constraints do not allow for a more elaborate form of oral or written language transfer. Interpreters may translate with or without scripts and dub the voices of the whole cast of characters featuring in the film.

    Lip-synchronized dubbing

    Lip-synchronized (or lip-sync) dubbing is one of the two dominant forms of film translation, the other being interlingual subtitling. In the field of audiovisual translation, dubbing denotes the re-recording of the original voice track in the target language using dubbing actors’ voices; the dubbed dialogue aims to recreate the dynamics of the original, particularly in terms of delivery pace and lip movements. Regarded by some as the supreme and most comprehensive form of translation, dubbing ‘requires a complex juggling of semantic content, cadence of language and technical prosody … while bowing to the prosaic constraints of the medium itself’. In the last three decades, there have been several attempts to map out the set of variables moulding this transfer method, mainly by diluting the importance of lip synchrony proper within a wider range of synchrony requirements. These new and more elaborate models of dubbing synchrony advocate the need to match other features of the original film which contribute to characterization or artistic idiosyncrasy. At any rate, the relative weighting of lip matching vis-à-vis other types of synchrony depends on the target market, with American audiences, for example, being more demanding than Italians in this respect.

    The lip-sync dubbing process

    The translation of a source language dialogue list is one of the earliest stages in the dubbing process. Although access to a working copy of the film is crucial for translators to verify non-verbal information and make appropriate decisions on aspects such as register or pragmatic intention, such a copy is not always made available to them. The translators’ participation in the dubbing process often ends with the production of a dialogue list in the target language; in practice, translators do not concern themselves with lip movements as they usually lack experience in dialogue adaptation and adjustment techniques. A ‘dubbing writer’ who is adept at lip reading but not always familiar with the source language takes over at this point to ‘detect’ the text. This involves identifying those sounds delivered by screen actors in close-up shots that will require maximum synchrony on the part of dubbing actors and marking their presence on the relevant frames of the film strip. Once the adaptation is ready, the film dialogue is divided into passages, called ‘loops’ or ‘takes’, whose length depends on the country where the dubbed version is produced. These takes become the working units during the revoicing of the dialogue track, which is carried out under the supervision of a dubbing director and a sound engineer. The involvement of so many professionals in the dubbing process explains why this form of audiovisual translation is up to fifteen times more expensive than subtitling. The actual translation and adaptation of the dialogue amounts to only 10 per cent of the overall cost, although this depends on the genre – with action and humour films being the cheapest and most expensive, respectively.

    Advantages and limitations of lip-sync dubbing

    Dubbing allows viewers to watch a film or programme without dividing their attention between the images and the written translation. This reduces the amount of processing effort required on the part of the audience and makes dubbing the most effective method to translate programmes addressed at children or viewers with a restricted degree of literacy. In so far as dubbing is a spoken translation of an oral source text, it is possible for the target text to convey more of the information contained in its source counterpart. Also, dubbing allows for the reproduction of the original dialogue’s interactional dynamics, including stretches of overlapping speech and most other prosodic features. On the negative side, dubbing is expensive and time-consuming. Furthermore, it tends to draw on a restricted range of voices to which viewers may become overexposed over a number of years, which detracts from the authenticity of the dubbed film. In relation to the translation process itself, the concern of dubbing practitioners with synchronization and the take-based approach to the revoicing process have often resulted in a ‘compartmentalization’ of the source text. This adherence to the constraints of micro-equivalence often proves detrimental to the ‘naturalness’ and ‘contextual appropriateness’ of the translated dialogue. It is also held accountable for most of the so-called ‘universals’ of dubbed language, including its failure to portray sociolinguistic variation and its overall tendency towards cultural neutralization. The transmission of culture-specific terms and values in dubbed audiovisual texts remains a highly problematic issue. In principle, the revoicing of the dialogue allows for an easy domestication of the original text, including the replacement of source cultural references by their naturalizing counterparts, i.e. their functional equivalents in the target viewer’s cognitive environment. However, these attempts to maintain the illusion of authenticity may backfire and damage the commercial success of the dubbed product when the foreign language and culture draw attention to themselves, e.g. through poor synchronization of mouth movements or the reliance on culturally idiosyncratic visuals.

    Translation in the audiovisual marketplace

    Lip-synchronized dubbing, the most expensive method of audiovisual translation, has traditionally been the preferred option in countries with a single linguistic community – and hence a large potential market to secure a sizeable return on the investment. In some cases (e.g. France), the dissemination of a single dubbed version across the length and breadth of the national territory has been instrumental in achieving linguistic uniformity, to the detriment of regional dialects or minority languages. In Germany, Italy and Spain, by contrast, the predominance of dubbing in the 1930s and 1940s was fostered by fascist regimes. Revoicing a whole film became an effective instrument of censorship, enabling the removal of inconvenient references to facts and values that clashed with the official doctrine. Voice-over became the transfer method of choice in most Soviet bloc countries and in some Asian markets (e.g. Thailand), either because the national language was unchallenged or because budget constraints made the cost of lip-sync dubbing simply prohibitive. Subtitling, for its part, thrived in a group of rich and highly literate countries with small audiovisual markets (Scandinavian countries) and bilingual communities (the Netherlands and Belgium), as well as in other states with lower literacy rates but much poorer economies (Portugal, Greece, Iran and most Arab countries), for which other forms of audiovisual translation were unaffordable.

    Until the mid-1990s, the audiovisual marketplace remained divided into two major clusters: subtitling versus dubbing countries. Since then, however, we have witnessed a series of changes in the audiovisual landscape, including the ever growing volume of programmes and broadcast outlets, the development of digitization techniques and the emergence of new patterns in the distribution and consumption of audiovisual products. These changes have contributed to blurring the lines between the formerly opposing camps: in any given market, ‘dominant’ or traditional forms of audiovisual transfer now coexist with other ‘challenging’ or less widespread types. The combined use of several established methods within a single programme is one development that continues to contribute to the hybridization of the media industry worldwide (ibid.).

    Research in the field of audiovisual translation

    Although the available body of research on audiovisual translation has grown exponentially in the last two decades, scholars have tended to gravitate to a small range of issues, including the effects of medium-related constraints on the translator’s discretion, transfer errors arising from the search for synchronization and the failure of translated dialogue to recreate social and geographic variation. Luyken et al.’s concerns over the lack of systematic theorization (1991:165) and Fawcett’s warnings against the excessive degree of anecdotalism and prescriptivism in audiovisual translation scholarship (1996:65–9) continued to resonate in subsequent work (e.g. Chaume 2002).

    On the basis of a relatively small number of experimental studies on viewers’ processing habits, reading strategies or reception patterns (e.g. d’Ydewalle et al. 1987; Gottlieb 1995; de Linde and Kay 1999; Fuentes 2001), some researchers have sought to articulate frameworks of rules, time-space correlations and mediation priorities for subtitling and dubbing practitioners. Such frameworks of seemingly undisputed assumptions on viewers’ needs require systematic validation and updating, particularly in view of the increasing ubiquity of screen-based texts in everyday life and the ongoing fragmentation of audiences into specialized niches (Pérez González 2008). The need for robust insights into the perceptual and cognitive dimension of audiovisual translation, however, has been overshadowed in the early part of the twenty-first century by technological developments in the field, including speech-recognition techniques (Eugeni 2007) as well as the use of corpora and translation memory tools (Armstrong et al. 2006; see COMPUTER-AIDED TRANSLATION); these developments seek to respond to the industry’s demand for fast delivery of automated output.

    Audiovisual translation scholars have relied heavily on descriptive translation studies, under the umbrella of both polysystem and norm theories. In their attempt to understand what guides the choice of translation strategies, specialists have examined the status of the source and target cultures vis-à-vis one another within the global audiovisual arena (Delabastita 1990); explored how the interaction of power, prestige and other market factors within a given country has led to the dominance of a specific form of audiovisual transfer (Lambert and Delabastita 1996; Karamitroglou 2000); and looked into the universality of certain filmic rhetorical devices (Cattrysse 2004). A plethora of studies has drawn on these same theories to identify the operational norms that guide the actual transfer of textual material in the main forms of audiovisual transfer. Some of these studies have resulted in descriptions of widely accepted translation standards (Karamitroglou 1998), techniques and strategies (Díaz Cintas and Remael 2007). A descriptive agenda also informs a series of new corpus-based studies of dubbed language, which seek to demonstrate the limited influence of the source text on the configuration of emerging target text norms (Pavesi 2005).

    Against the backdrop of increased attention to processes of contextualization, recent publications on audiovisual translation have drawn on theories from neighbouring disciplines, including pragmatics (Hatim and Mason 1997; Kovačič 1994) and gender studies (Baumgarten 2005). As in other fields of translation studies, researchers have also investigated the impact of clashes of ideology and power differentials on dubbed or subtitled dialogue (Ballester 1995, 2001; Remael 2003) and looked at the translator’s mediation in terms of domesticating and foreignizing strategies (Ulrych 2000; Fawcett 2003). Amateur subtitling cultures such as fansubbing (Pérez González 2006b) – which emerged as a result of the increasing compartmentalization of subtitling audiences – represent an extreme example of foreignization, known as ‘abusive subtitling’ (Nornes 1999). Amateur translators exploit traditional meaning-making codes in a creative manner and criss-cross the traditional boundaries between linguistic and visual semiotics in innovative ways, thus paving the way for new research informed by multimedia theory (Pérez González 2008).

    Source: Routledge Encyclopedia of Translation Studies
