Noira-Blanchè-Rougi: The issues of sound

The unique event theory

In response to the traditional view (see my Burch, Sound) that sound manifests a higher degree of realism, or a lesser degree of manipulation, Altman and others propose to regard sound as a unique event. To put it simply, sound is unique because it is a material thing. Here I would like to take up this line of argument and explore some further theoretical possibilities.

In Altman’s perspective, sound is essentially a spatially and temporally continuous phenomenon. Yet in analysis, restricted by the available vocabulary, it is often reduced to a much simpler model with a limited set of coordinates. In Bordwell’s film textbook, for instance, the acoustic properties of a sound are listed as loudness, pitch, and timbre. This of course does not mean that these three parameters are all that we can attribute to a sound, but that they are the ones of most interest to an aesthetic study of sound. Interestingly, all three terms are borrowed directly from music, and treating them as the primary sources of aesthetic consideration for a film sound implies an ideal situation in which all film sounds can be treated as musical notes functioning in an orchestral work: the film soundtrack. In reality, unfortunately, no film sound can be sufficiently defined by loudness, pitch and timbre; nor is any of these parameters the one that contributes most to the aesthetic use of sound in a film.

Musical notes are a selected series of discrete, abstract sounds that bear distinctive properties. They function in a self-referential way and are completely subject to their own arbitrary formal rules of harmony, regardless of the physical effects they are capable of producing. There is a considerable difference between what is notated on a score and what it actually sounds like. The loudness of a note varies of course from performance to performance, and the timbre from instrument to instrument. Even the pitch, which appears to be the only parameter of a note that has a “fixed” value, may correspond to different sounds. The history of pitch standards in Western music is notorious for its variance as to the exact frequency of a given note. The A above middle C, for instance, is now set at 440 Hz, known as concert pitch[1]. But historically, it has been associated with a range of frequencies from 400 Hz to 451 Hz.

Obviously, this variance does not affect the ratio between a note and its neighbor in a scale; nor does it change the fact that, in a system of equal temperament, a note has double the frequency of the same note an octave below. The musical notation system and its terminology (loudness, pitch, timbre) owe their validity to the fact that musical harmony relies on relative positions defined within a self-enclosed structure. Conversely, a natural sound is sampled directly from the real world and defies any rigorous definition of this nature. A natural sound is named as it is because its sound pattern can be properly recognized. If a musical note and a natural sound can indeed be compared, it is due to a cognitive process that turns the multiple, vague, concrete and three-dimensional sound of the real world into a single, distinctive, abstract and one-dimensional musical note. And vice versa: as a musical note is played, it ceases to be an abstract pitch value and turns into a concrete happening with a certain loudness (decided by the performer) and a certain timbre (decided by the instrument). This bi-directional process is more or less similar to the abstraction from the infinite instances of “three” to the concept of threeness in mathematics, and the application of that concept back to the real world.
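The point about relative ratios can be illustrated with a small sketch. It assumes twelve-tone equal temperament, in which each semitone multiplies the frequency by the twelfth root of two; the function name `note_frequency` and its parameters are hypothetical, chosen only for this illustration. Whatever frequency is assigned to the A above middle C, the octave and semitone ratios come out the same.

```python
# Hypothetical helper, assuming twelve-tone equal temperament:
# each semitone step multiplies the frequency by 2 ** (1 / 12).
def note_frequency(semitones_from_a, a_hz=440.0):
    """Frequency of the note `semitones_from_a` semitones away from the reference A."""
    return a_hz * 2 ** (semitones_from_a / 12)

# Three reference pitches for the A above middle C: the historical
# extremes mentioned in the text and the modern concert pitch.
for a_hz in (400.0, 440.0, 451.0):
    octave_below = note_frequency(-12, a_hz)
    semitone_above = note_frequency(1, a_hz)
    # The absolute frequencies differ, but both ratios are identical
    # for every choice of reference pitch.
    print(a_hz, a_hz / octave_below, round(semitone_above / a_hz, 4))
```

The absolute values shift with the chosen standard, while the structure of the scale, which is all the notation records, does not.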

Musical notes are indeed like mathematics: they represent an idealization that disregards several important aspects we cannot afford to ignore in film sound. One of these is perception. In film, the perception of a sound is shaped by the sound reproduction equipment and the acoustic environment where the sound is heard. This applies to musical performance as well. Even if music successfully rids itself of three-dimensionality in its notation system, the phenomenon returns in full force when the music is played, recorded, or heard, always in a three-dimensional space. Live performances of the same score can vary greatly with their locations: the acoustic differences between concert hall, stadium, living room and outdoor park make the same piece barely recognizable[2].

Second, unlike any natural sound, which is spontaneously produced and is thus always associated with an event, a “do” does not necessarily signify anything. It does not signify a change in the natural order, nor does it have any intrinsic emotional value. A musical note does not correspond to anything concrete, whereas a sound in a film always does. Hence, a large portion of the manipulability of film sound relies on the association or dissociation of a sound with the visual.

Third, what we typically identify as a single film sound is in fact fundamentally composite in nature. A “single” sound normally comprises multiple frequencies organized in a recognizable spatial-temporal pattern. These patterns, rhythmic or melodic, are essential to our perception of the sound. In fact, every recognizable sound manifests a particular pattern, without which the said sound ceases to be what it is to our understanding. Furthermore, a sound as we perceive it can be divided into several stages: the initial stage of sound production is often termed the attack; it is followed by a stage of sustain and, finally, of decay. All these stages, and their shortening or prolonging, contribute significantly to the overall character of the sound we experience. Here is an example, taken from Altman’s book. The word “Boo” appears to be singular in its signification and hence in its application. Yet in practice its effectiveness varies greatly according to its context (where we are and what we are doing at the moment), its producer (a child, an adult, a family member or a total stranger), and the distance between the booer and booee (imagine someone booing from ten meters away or right beside your ear). The surprise effect of the sound “Boo” relies heavily on these parameters, which at times render it successful and at other times unsatisfactory, as one would experience in real life.
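The attack, sustain and decay stages mentioned above can be sketched as a simple amplitude envelope. This is only a minimal illustration under stated assumptions: the stages are modeled as straight lines, and the function name `envelope` and the default durations are hypothetical rather than taken from Altman.

```python
# Hypothetical three-stage envelope; linear ramps are an assumption
# made for simplicity, not a model of any particular sound.
def envelope(t, attack=0.05, sustain=0.30, decay=0.20):
    """Relative amplitude (0.0 to 1.0) of a sound at time t, in seconds."""
    if t < 0:
        return 0.0
    if t < attack:                      # attack: amplitude rises from 0 to 1
        return t / attack
    if t < attack + sustain:            # sustain: amplitude holds at 1
        return 1.0
    end = attack + sustain + decay
    if t < end:                         # decay: amplitude falls back to 0
        return (end - t) / decay
    return 0.0

# The same instant, 20 ms after onset, under a slow and a fast attack:
# shortening the attack alone changes how abrupt the sound is.
print(envelope(0.02, attack=0.05))
print(envelope(0.02, attack=0.01))
```

Changing any one of the three durations while leaving the others untouched yields a different sound shape, which is the sense in which each stage contributes to the character of the whole.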

Theoretically any sound will function this way, although admittedly in most circumstances this would be an exaggeration. There are sounds that are conventionally associated with distance, either nearness or farness. The best example of the former is the human heartbeat, which is often used to signify a subjective presence: I am. A rooster’s crow, on the other hand, draws a vivid picture of the landscape, of the environment in which the protagonist finds him/herself. Hence, in the grammar of cinema, a heartbeat is equivalent to a close-up, whereas the rooster is often associated with an establishing shot.

The sound with the images

As we come to realize from the above analysis, a sound has an intrinsic material heterogeneity (Altman) that distinguishes it from musical notes. In many ways this heterogeneity provides not only the recognizability of the said sound, but also a narrative, an implication of its emotional denotation. The orchestration of sounds in a soundtrack, then, has to take into consideration the harmonizing or conflicting effects that arise when multiple sounds are juxtaposed. In addition to the heterogeneity and recognizable pattern constituted by a single sound, a sound score develops a higher level of heterogeneity and pattern, which is fully capable of rendering a complex structure. The theoretical potentiality of an independent soundtrack stems from this very fact.

We speak of the “soundtrack”; that is, a complication is to be expected when the above observation is placed in the context of a film. In film the problem of sound is “worsened” by the further fact that film sound is designed to accompany images: although a soundtrack can exist independently, it functions better when associated with images. Musical notes, on the other hand, need no images. In fact, should a musical piece be associated with one or a series of images, we often feel that such an interpretative attempt ultimately damages the “pureness”, that is, the formal integrity, of the said musical piece.

The ways in which a sound can accompany an image are hard to classify in words. But amongst infinite possibilities, the accompaniment is said to be successful only when it creates meaning. Usually, meaning is generated when one can associate one thing with another. But the reverse is also true: dissociation generates meaning too, albeit in a peculiar way. Hence when we say the accompaniment of sound to images creates meaning, it can work either way: to corroborate or to contradict. If, then, in an audio-visual system sound and images are constantly and systematically set up as confirmation and contradiction, we say that the sound is in a contrapuntal position to the images, and that the use of the audio-visual composite is “dialectical”.

Ultimately, the intelligibility of the image and that of the sound are not entirely equivalent. Human beings lack a rigorous capacity to distinguish sounds. Hence aural intelligibility often requires a high contrast between the main element and its surroundings. Visual intelligibility, on the other hand, is achieved without much trouble. In fact, apart from close-ups, putting the character in the exact center of the screen is often undesirable, as it is perceived as an unnecessary emphasis. A part of a familiar object in view (for example, a hand) is usually sufficient to imply the whole (a person). But a part of a familiar sound? Unlikely. If a man is shown out of focus, or partially off screen, his visual presence is nevertheless recognizable. Now imagine the same person’s voice muffled or distorted: would it still be easily recognizable? Or if the camera is in violent forward motion, we understand that a character is running. What aural means could convey this movement? Heartbeat. Footsteps. Events exterior to the movement itself. While the camera manifests the movement directly, the sound only implies it, conveying it in a figurative way.

Theoretically, the visual channel is able to provide more space for aesthetic considerations than the aural channel does because its intelligibility is easier to reach. The only way to raise the aural bar is to add the missing spatial dimension. But if the recording and remixing process transforms the original event space into a variation of it, the ultimate playback of the recorded sound adds further confusion by adding another, altogether irrelevant acoustic dimension to the original. If the former process can be regarded as primarily subtractive (from 3D to 2D), the latter is primarily additive (from 2D to 3D): the putative diegetic sound space is overlapped and masked by a second spatial dimension, that of the actual space in which the auditors find themselves. This sometimes makes the final sound effect difficult to control, since the acoustic properties of an auditorium are not easily changed or adapted to all kinds of film sounds.

In the early years of cinema, since the film itself was silent and accompanied by a piano or an orchestra, the theater designed its acoustic environment to resemble that of a music hall. Moreover, since the type of music was inherently late nineteenth-century romantic, it demanded the full range of a big orchestra and choir and benefited from ample reverberation. Theater owners thus happily opted for a baroque style of decoration with multiple levels, private boxes, rows of fluted columns and endless plaster molding, which provided at once a complex visual pattern for the eye to dwell on and the desired acoustic properties. With the advent of film sound, that is, of dialogue, the whole situation changed. For the primary property of dialogue is its intelligibility, and reverberation is its natural enemy. The said theater owners had no option but to hasten to remove all that was fancy in the hall and to add large blocks of plain-looking sound-absorbing material. The result of this reform is our “modern” style of theater.

Will this sort of renovation (or any ongoing one) enable a better aural spatial realism for film exhibition? Most probably not. For what makes the human ear capable of picking out the right frequency in a car and yet fail in the theater? It is not that the various sounds differ in their frequency, or loudness, or timbre, or any other applicable acoustic property, but that the listener can locate the sound source and turn his head (with its two attached sound receptors) towards it. The auditor in the theater cannot do this because the spatial dimension attached to the original sound is not reproduced in the theater. Even if the theater is equipped with spatially oriented speakers, the information needed is eliminated precisely in the practice of separate recording and mixing. In other words, sound mixing practice cannot produce, and has never been intended to produce, an authentic rendition of the diegetic aural space (see Charles O’Brien). In this respect an aural spatial realism is fairly hopeless.

The deteriorating indexicality

Human beings are equipped with a pair of eyes and a pair of ears. By contrast, we generally use only one camera and one microphone on the set. If in practice more than one camera and more than one microphone are used, they are not designed to function in the way the paired eyes and ears do. What is the implication of this seemingly obvious fact? The design purposes of the double receptors in human beings are multiple: they provide redundancy of information, greater robustness of the system and, most importantly, a mechanism to detect motion and to estimate the relative distance of things. Now comes the theater, where our spectator is completely immobile. If the spectator is to identify with the placement and movement of the camera and the microphone, as an early filmgoer credulously does, he will certainly feel uneasy, because his body is telling him that he is not moving at all! The spectator is thus “trained” only by repeated viewing experiences to understand that one is not obliged to react in a theater, that what happens in front of his eyes can actually take place a hundred thousand miles away, that there is no need either to get away from an oncoming train or to pull out his revolver when somebody is aiming at him, and so on, just as a baby in the cradle has to learn how to interact with the world around him.

Furthermore, exactly because the audience is deprived of the spatial information that is habitual in real life, it is rendered more susceptible to the influence of the image. A baby cries. When the ears fail to locate the source, we have to accept what the screen offers as a solution. If, by any chance, the screen does not show a crying baby, which would immediately assure us that the sound has been located, how can we possibly determine whether this baby is entirely imaginary (an illusion) or cries just outside the frame?[3]

Ears are originally “designed” to complement the limited field of vision, to provide a 360-degree awareness of the environment in which the living creature finds itself. Sound therefore conveys a sense of directness and often provokes an immediate response. Yet when the audience learn from the theater that such immediacy is not present, or even not wanted (it is advisable to remain in one’s seat during the screening), they cease to remain alert and relax into their armchairs. An animal in the forest startles at the tiniest sound close to it, yet nowadays the bellow of King Kong can barely keep the eyes of the audience open. Sound has turned figurative, so to speak, as it presents a narrative event without the implication that an immediate reaction is needed.

Again, as we are deprived of the ability to evaluate the source of a sound, a baby seen crying on the screen is not necessarily the same one heard on the soundtrack. The cries of a baby, like many other sounds, lack a pattern distinct enough to be recognized as belonging only to a certain individual or circumstance. Traffic noise, gunshots and car horns are all examples of such generality. In practice, these sounds are deployed as pure symbols: here is the baby, here is the traffic! Therefore Metz’s comments on the fidelity of sound reproduction in film should not be regarded as purely illusory (Metz is certainly aware that real gunshots are very different from the ones we hear in the theater); the truth in them is that a gunshot heard in a film is a symbol of the real gunshot: it is immediately recognizable for what it signifies, not for what it is. It is true that in conventional film grammar, all sounds, except voices, are more often representational than authentic. They serve their purpose of advancing the narrative without much concern for aural actuality. We are often presented with sounds that do not change their volume or other acoustic properties to correspond to the movement of their sources or of the “point d’écoute”. In Psycho (1960), for instance, as Marion attempts to get the attention of the motel manager, Norman Bates (played by Anthony Perkins), she sounds the car horn. However, when the point of view changes from inside the car (where Marion is) to the outside, to a higher level amid the heavy rain, we do not perceive any change in the acoustic effect. Even for Hitchcock, who is fully aware of the extreme manipulability of the soundtrack, the actuality of the sound is sometimes completely ignored: the horn in question is there not to reproduce a horn in all its acoustic parameters, but to signify, to deliver a message.

As a matter of fact, sound usually is, or at least has become, much more general than the image in its association with narrative and emotional motifs. Gerald Mast, after enumerating various clichés of mood music, admits, “the amazing thing about these clichés is that, as predictable as they are, they often succeed at imparting exactly the right mood for a scene or emotional effect. And with surprising effectiveness.” (212) The television business has pushed this innocence to a ridiculous extreme, again with surprising effectiveness. The laughs and claps that are emitted every 3 seconds in many comic soap operas endow the otherwise extremely bland mise-en-scène and lighting with a faked emotional intensity, which apparently works for the mass audience but is loathed by serious filmmakers like Woody Allen.

Unexpectedly, the sound’s generality is achieved along another line, where indexicality is no longer needed. Naturally, the deterioration of indexicality is also manifested in the visual domain. But here sound departs much earlier and goes much further than the image: while computer-generated graphics will still need many years to catch up with photographic representation, sound has never been the sole result of a straightforward indexical recording, and sound production has already become, in some high-end studios, a completely artificial process. “Even the most inexpensive films feature soundtracks that are no longer primarily recorded,” observes Rick Altman, and “the electronic revolution has now made it possible to produce all the music and effects for a film sound track without recording a single cricket or musical instrument.” (44) If a filmmaker does go to the length of recording all the sounds needed in a certain scene, that is probably not because it is aesthetically more desirable, but because it is more economical.

The loss of indexicality provokes different reactions in sound and image. It is a widely expressed opinion that the increasing use of digital color grading in films is a betrayal of indexicality, and hence of authenticity. The same people seem not to have been seriously bothered by the same problem in the soundtrack. Sound is allowed to be fake while the image certainly is not.

Final words

According to Altman, film critics ostensibly ignore the multiplicity of the film text: the multiple versions of a film, created to meet diverse social and industrial needs (censorship, standardized length, colorization, foreign-language dubbing), are often defined as variations of an “original” or “definitive” version; the exhibition spaces where a film is projected are also treated as neutral, as if they were not capable of producing perceptible differences in the spectator. It is by these arguments that Altman ultimately proposes “cinema as an event”, which aims to turn the traditional core of the concentric film universe (text, production, reception, culture) into the void of a donut-shaped “event” floating amid the cultural plasma. While I do believe “cinema as an event” is an interesting notion and a potentially fruitful approach, I do not think the “multiplicity of the film text” is by any means ignored. The contrary is probably true: so broad a multiplicity has been assigned to any film text that interpretation has given way to pure enumeration, where any particular meaning is no better than the others…

[1] This standard was taken up only “recently” by the International Organization for Standardization in 1955 (and was reaffirmed in 1975) as ISO 16.

[2] I do not want to overemphasize this unrecognizability since, for musical notes, a slight change in frequency does not necessarily lead to a perceived change of pitch. Human perception allows for a range or, to put it another way, is imprecise in nature. And this range varies from person to person.

[3] Michel Fano’s sound score is based on the idea of constantly establishing and denying such a link. For instance, a sound of shattering glass is normally associated with the image of broken glass. But apart from this obvious possibility, Fano also associates it with the image of unbroken glass, or with an entirely irrelevant scene. The point is not to affirm or deny the association, but, through a ludic application, to make the audience aware of the fictive nature of this kind of association. It is at this level that Fano’s practice is completely in accordance with Robbe-Grillet’s narrative and montage.

Thursday, January 17, 2008
