Burch, Sound

Bresson’s remark in his notes that “a sound always evokes an image; an image never evokes a sound” is often quoted. Burch interprets this to mean that Bresson believes sound represents a “greater realism” and is hence more evocative. The remark thus raises two issues: first, whether sound is in fact more realistic than the image; second, if it is so in a certain context (for instance, in Bresson’s work), whether it thereby becomes more evocative.

On the first issue, Bresson is evidently not alone in this line of argument. The belief that recorded sound is indistinguishable from the original has been a popular one ever since the earliest years of sound cinema. Béla Balázs, for example, states:

What we hear from the screen is not an image of the sound, but the sound itself which the sound camera has recorded and reproduced again…there is no difference in dimension and reality between the original sound and the recorded and reproduced sound. (Balázs 216)

Other similar observations include:

…in a photograph, the original is as present as it ever was. Sound can be perfectly copied…the record reproduces the sound. (Cavell 20)

And it is true that in cinema—as in all talking machines-one does not hear an image of the sound, but the sounds themselves. Even if the procedure for recording the sounds and playing them back deforms them, they are reproduced and not copied. (Baudry 47)

Auditory aspects, provided that the recording is well done, undergo no appreciable loss in relation to the corresponding sound in the real world: in principle, nothing distinguishes a gunshot heard in a film from a gunshot heard in the street. (Metz)

Whether the phrase is “perfectly copied,” “reproduced and not copied,” or “undergo no appreciable loss,” all these arguments manifest a strong belief in a “greater realism” that is “inherent” in sound. Interestingly, as James Lastra has noted, contemporary film scholars tend to adopt a different perspective when writing about sound. For Alan Williams, sound “is never the literal, original sound that is reproduced in the recording, but one perspective on it, a sample, a reading of it.” Rick Altman also stresses that recordings have “only partial correspondence to the original event.” Thomas Levin claims, more forcefully, that “familiarity has dulled the capacity to recognize the violence done to sound by recording.”

This apparent theoretical discrepancy demands an explanation. The simplest one runs as follows. Sound, like the image, undergoes a “projection” from the three-dimensional space where it originates onto the two-dimensional medium where it is recorded (phonograph record, magnetic tape, compact disc, etc.). Although sound is projected again, through speakers, into a three-dimensional space, this space is not the one it purports to represent. However, because sound is always perceived three-dimensionally, we have the illusion that it is reproduced in all its fidelity: that sound does not cheat us, that what we hear is identical to the original.

To conduct an analytical examination à la Noël Carroll of what Lastra describes as an obsession of theoretical discussion, one that has “returned again and again…to a single central problem,” we need to distinguish sound as an event from sound as a sign. In light of this distinction, Metz’s claim makes perfect sense, since “against the definition of sound as an essentially unrepeatable event, Metz describes sound as an eminently repeatable and intelligible structure,” and “for Metz, it is not a question of there being no literal difference between street and theater, but rather there being no difference in meaning.” Lastra’s argument, it seems to me, is supported by the standard practice of films shot without synchronous sound. Their soundtracks are made almost exclusively with props rather than the real thing, which shows that the audience does not care about the difference and, most of the time, cannot tell it. Another example given by Chion also shows that even when a sound is first perceived as an event, it quickly transforms itself into a sign, a shortcut between cause and effect.

Sound is recorded, yet it is not a recording in the ordinary sense of the word. Sound participates in the film’s meaning-making process by supplying the missing part of the representation of “reality.” Owing to its technical specificities, however, it has enjoyed, ever since its inception, a greater freedom from the diegetic space. Sound functions, indeed, as a sign, for which what it signifies matters more than how it signifies.

Yet when Bazin talks about the ontological significance of the image, we wonder why the same cannot be said of sound. A woman’s face on screen always belongs to a real woman, but a gunshot does not belong to any particular gun, and most probably not to a gun at all. Why must a face be an individual face, while a gunshot is understood only as “a gunshot”? If the ways in which an image is exploited carry ideological implications, why do similar manipulations of the soundtrack, or even more violent ones, pass unnoticed?

The innocent microphone: selectivity, recognizability and intelligibility

By now the camera’s manipulations have probably become quite obvious to most attentive viewers. The dominant system, that is, the Standard Version (Bordwell) divested of its moral lessons, essentially expresses a wish to maximize narrative at the expense of everything else. To say that the camera is designed to be transparent actually means three things: first, its presence should not be noticed; second, it should follow the spectator’s point of interest; third, it should grant the spectator visual pleasure of one kind or another.

Understandably, sound in the classical system operates under almost exactly the same guidelines. Yet while the aesthetics of the transparent camera has somewhat worn out through over-analysis, the innocence of the microphone remains largely intact. The spectator (or auditor) tends to believe that what they hear is what is there to hear, or at least what needs to be heard. To unveil this myth properly, an analogy between microphone and camera can be very useful.

The movie business has many conventions, and the most basic ones do not even lie within the medium itself. When a film is experienced in a theater, at least two constants are involved: the distance and angle of a given viewer to the screen, and a set of distances and angles to the speakers. Whatever these distances and angles are, they remain constant throughout the film (unless the viewer changes seats). This complete immobility and constancy contrasts with the film’s inner mobility: the constantly varying distances and angles of camera and microphone relative to their objects.

Any event in the narrative can be recorded in infinitely many ways. Arnheim deduces from this that the aesthetics of cinema necessarily rests on the distortions that arise from projecting a three-dimensional world onto a two-dimensional screen. Crudely speaking, this set of “aesthetics” covers only part of what cinema is capable of, for there is naturally another set of aesthetic possibilities in montage and other post-production processes. In production itself, however, two things must be decided as early as possible: where to put the camera, and where to put the microphone.

As points immersed in a three-dimensional world, the camera and the microphone function in strikingly similar ways. Both are directional receptors; that is, they capture information along a specific direction in the three-dimensional environment. There are nevertheless differences, some technical, some aesthetic, and the technical difference is fundamental. The camera is equipped with a rigid frame: a piece of information is either on screen or off screen. The microphone, by contrast, has no rigid sound envelope. It picks up sounds from all directions, including reflections, and within its receptive range its sensitivity varies with angle along a continuous curve, without any sharp boundary. It is worth noting here that, compared with the camera, the human eye has a much broader field of view. As with hearing, visual information at the periphery loses its precision and becomes blurred. Adopting an Arnheimian argument, we could say that this fundamental difference gives rise to the various aesthetic possibilities by which the microphone corroborates or contradicts the camera.

Let us take an example in which a most obvious contradiction passes unnoticed: two people engaged in conversation on the street. In Annie Hall (1977), for instance, Alvy Singer and his friend Rob walk through the streets of New York City, carrying on a conversation. The typical street noise is considerably lowered. The dialogue is also presented at a constant volume, even though the speakers move from the vanishing point (and are off-screen at first) to within arm’s reach of the camera. Here the sound clearly does not replicate the environment as faithfully as the camera does, and it can hence be said to manifest a lesser degree of realism than the images. To make the conversation intelligible, the actual volume of the traffic has to be suppressed. For the same reason, the volume of the dialogue has to be raised and normalized. We might even conclude that this is not synchronous sound at all, and in numerous cases such a conclusion is fairly safe. In fact, the demand for intelligible dialogue generally mandates ersatz solutions. If synchronous sound cannot achieve satisfactory intelligibility, the dialogue must be recorded in the studio and later mixed with the ambient sound. For the sake of intelligibility, we must sacrifice not only the authenticity of the sound but also the unity of camera and microphone within a diegetic space. The microphone on the set is rendered useless, so to speak, by its inability to pick the human voice out of the surrounding chaos.

Noël Burch argues from this that the camera is in fact, like its little brother the microphone, a nonselective device (see his example of the surface of the pinball machine). This nonselectivity constitutes a similarity between them, which he takes to be the essential relationship between sound and image. The microphone’s failure to discriminate can therefore be regarded as its very virtue, since it posits the necessity of a mise-en-son, where “an overall musical orchestration of all the distinctive elements of the soundtrack seems to be imperative, in somewhat the same manner that the way in which a visual image is perceived demands that constant attention be paid to the total visual composition.” (Burch 92)

Burch’s argument is admittedly justifiable in that it grounds the practice of sound mixing in the selective nature of human perception, now widely known as the cocktail party effect. He observes that a conversation inside a car poses no intelligibility problem for its participants despite interference from the motor, the wind, and sometimes the radio. This is because the human ear can tune in to a specific frequency and, in effect, amplify the channel it intends to receive. The microphone, on the other hand, nonselectively jumbles all these sounds together and so makes comprehension difficult. The separate recording of sound sources and their later mixing and remixing are therefore justified, since they work in the same direction as the human ear.

To theorize this distinction, Burch establishes two pairs of contrasting terms: seeing and looking, recording and hearing. For Burch (34-35), “to ‘look’ has to do with a mental process, whereas to ‘see’ has to do with the physiology of the eye.” And most importantly, “when we view a film, as when we view a painting or a photograph, seeing is no longer dependent on looking, as is nearly always the case in a real-life situation; the selectivity involved in looking no longer affects the nonselectivity involved in seeing in the slightest.”

This distinction has far-reaching implications and, as in Arnheim, the potential to define the specificity of the cinematic machine. First, for both Arnheim and Burch, the cognitive and mental activities involved in watching a film are the same as those involved in everyday perception, since one and the same set of faculties deals with both. Second, for both, perceptual laws must be considered first. Only then can one discuss, if it is really needed, some heightened state of consciousness or mystical comprehension leading to artistic expressivity. Finally, for both, cinematic specificity arises from the differences between our perception of the everyday world and our perception of the filmic medium.

This approach, which Edward Branigan (152) calls bottom-up, is the opposite of the top-down approaches that characterize semiotics, psychoanalysis, socio-cultural studies, textual analysis, and many others. David Bordwell conveniently groups these under the label of SLAB theory (after Saussure, Lacan, Althusser, and Barthes). In these theories, high-level schematic patterns such as self-identification, memory, expectation, and purpose come first and dictate how films mean. Burch, at least in his early writings, traces a path from how cinema is perceived (phenomenology) to what cinema should be like (aesthetics) and finally to what cinema is (ontology).

What I would like to do here, therefore, is adopt a bottom-up approach, starting from the question of the microphone’s selectivity, and address several issues that Burch leaves unclarified. As we saw above, the nonselectivity of the microphone forces the sound engineer to record sounds separately, assigning new priorities within the overall aural diegetic space. These priorities fall into three categories: recognizability, legibility, and intelligibility.

Burch’s notion of recognizability refers to circumstances in which the question “what is it really?” arises. It is on this ground that he argues that “an extreme auditory close-up of a drop of water dripping into a sink is as difficult to recognize for what it really is as an extreme visual close-up of the joint of a woman’s thumb.” (91)

Michel Chion extends this line of analogy by citing the word “knife” in Hitchcock’s Blackmail (1929), arguing that the word functions as a sort of aural close-up. Despite the apparent similarity of the two analogies, a subtle difference exists that leads to different aesthetic considerations. For recognizability, acknowledging the source suffices; legibility, however, demands that what the source produces be clearly manifested. In the instance in question, every phrase except the word “knife” is blurred, yet we still acknowledge these utterances as words. We have no difficulty recognizing them as human speech, even though we may not know exactly what they refer to. Another example can be found in Chris Marker’s La Jetée (1962), where German whispering, illegible even to a native speaker, can be heard in many scenes and contributes to an overall atmosphere of disquiet. These two instances illustrate how filmmakers can manipulate the legibility of speech by setting up an acoustic barrier to its exact meaning. We should not forget, however, that there is also a natural barrier, the linguistic barrier, which constantly affects a film’s intelligibility.

For a message to function, it must be delivered properly, that is, in a legible form. It also requires a decoding capacity that is purely intellectual. Intelligibility demands the intellect and, in our case, a certain linguistic capacity. I am thinking here of films that deliberately mix foreign languages without subtitles. Three types of mixture can be distinguished. The first, exemplified by Sternberg’s Anatahan (1953) and the films that follow its example, relies on a voice-over to explain things. This gesture simultaneously brings the narrator (Sternberg himself) closer and pushes the foreign-language speakers further away. The second possibility, understandably common in European films, is to let the dialogue explain itself, using the part of the dialogue we can understand to fill in the blanks left by the other parts (the same technique used for telephone conversations). Unlike the first case, however, this kind of language mixture does not impose a linguistic priority but rather reflects a reality of cultural plurality[1]. The third case is much rarer: the foreign language is not explained at all, and there is no way to figure it out, at least not within the film. The best examples are Robbe-Grillet’s L’Immortelle and Duras’s India Song. In the first film, Turkish and Greek are heard without any clue to their meaning, visual or verbal. This is intended to convey the very sense of foreignness felt by a French speaker who finds himself in Istanbul. In Duras’s film, the foreign language is supposed to be Laotian. In Robbe-Grillet’s case, the meaning of the foreign conversation is finally revealed in the published scenario. Duras’s published text of India Song, by contrast, offers no translation of the beggar woman’s monologue, leaving it a permanent mystery for critics and audiences alike.

The following table summarizes our categorization so far:

Film production | Film spectator | Type of manipulation
 | | “Perfect” manipulation
 | | Linguistic barrier
 | | Acoustic barrier
 | | No manipulation

Several points remain to be explained. One is our attitude toward so-called “acoustic manipulation,” in which the utterance is distorted, prolonged, blurred, or reverberated in a thousand ways. Ironically, its opposite, what I call “acoustic de-manipulation,” is often perceived as no less stylized. In Godard’s 2 ou 3 choses que je sais d’elle, for instance, people talk in a café. The content of their dialogue is, as always in a Godard film, very important, but it is constantly and purposefully disturbed by the noise of a pinball machine. Industry convention holds that dialogue and background noise should be recorded separately and then mixed, with the latter’s volume lowered. Disregarding this convention, Godard challenges the notion of verbal legibility from another direction: he reveals manipulation by not manipulating, demonstrating the inevitable difficulties of contemporary communication and articulation. On this point, Burch comments:

Godard, who is quite interested in sound interference of this sort, often records similar scenes in synchronous sound (or recreates the same effect in a studio), doubtless to make us aware of the effort our ear must make to understand whatever message is being transmitted. (Burch 1981, 101)

I cite Burch at length because what he proposed in Theory of Film Practice regarding the sound-image paradigm has remained lamentably unexplored, with practically no follow-up. I consider this the most revolutionary aspect of his theoretical contribution. For example, Burch (91) puts forward the following idea:

“The essential nature of the relationship between sound and image is due not to the difference between them, but rather to the similarity between them.” It is owing to this similarity that sound and image can form a unitary presence perceived by the spectator. The analysis of this unitary presence, then, concerns what Branigan (153) calls “intermediate forms.” These are situated between “the technical specifications of the filmmaking equipment,” “the photographic qualities or more generally, the pictorial qualities of its product,” and “the high-level communicative ‘intentions’ of a filmmaker.”

Burch’s theoretical stance can be appreciated in two ways. On the one hand, he advances the bottom-up approach, aligning himself with Arnheim and the early Eisenstein (the Eisenstein of attractions, of organic montage). In doing so he opposes the top-down approaches that dominated film theory for over two decades. On the other hand, he downplays, indeed ignores, the existential link magnified by followers of the realist tradition (Bazin, Kracauer). For Burch, “everything projected on a film screen has exactly the same intrinsic ‘reality’, the same ‘presence’….all the elements in a given film image are perceived as equal in importance.” This rejection of even cinema’s potential to reproduce reality is significant. The same stance helps explain Robbe-Grillet’s hostility toward Bazin and the Cahiers du cinéma critics, especially Truffaut. In an interview with Fragola, Robbe-Grillet expressed his contempt.

As a film theoretician, Burch shares with Robbe-Grillet a fundamental understanding of the cinematic medium, namely of its aesthetic goal. In evaluating Burch’s theoretical contribution, Branigan (153) makes the following observation. I would not hesitate to adopt it as an exact description of Robbe-Grillet’s formalist approach: “certain brief disconcerting moments, ambiguities, retrospective interpretations, hesitation, denials, misjudgments, uncertainties, and indeterminacies that arise in the viewing of film may serve to draw further attention to the constitutive power of form and hence, for Burch, such moments will become an important aesthetic goal for the medium in its effort to achieve an awareness of form.”



