Issues of automatic shot analysis

 

Haven’t blogged for a long time…but, as they say, forgive him, since he knows not what he is doing.

 

Last time I visited Cinemetrics (two years ago), it was not impressive. Through the diligent promotion of Yuri Tsivian, however, the site is now showing some prosperity (as Yuri puts it, “a crop of good news”). I see a lot of familiar names there. Although some faculty members I talked to expressed reservations (which are basically mine too), it seems that the database is steadily growing and that these human efforts will eventually amount to something.

A very important issue, as I used to believe, is to have software that can scan a video file and turn out data. We are glad to learn that several such programs already exist. But so far the problem has been one of accuracy.

Here I want to take another look at the problem, namely, at the legitimacy of the object of examination.

First, the shot structure. It is somehow believed that a film is constructed from elementary units such as shots. The shot is not an imaginary thing. It is determined and therefore justified by the nature of mechanical reproduction through an optical device. A shot is taken before any critical intervention. A shot takes shape before we can even look at it. In this sense a shot has an integrity we are powerless to deny. However, let us not forget that a shot is also broken up in the editing process. If a shot is broken into three pieces and intercut with three pieces of another shot, do we say there are two shots, or six? Ontologically speaking, two; but in shot analysis, we say six. Therefore, the ontological justification doesn’t hold.
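The two-or-six question can be made concrete with a small sketch. It is only an illustration, not part of any existing shot-analysis tool: the function name `count_shots` and the take labels "A" and "B" are made up for the example.

```python
from itertools import groupby

def count_shots(edit_list):
    """Return (camera_takes, edited_segments) for a list of take labels.

    edit_list gives, in screen order, the camera take each piece comes from.
    The labels are hypothetical; any hashable value would do.
    """
    # Distinct takes: the "ontological" count of shots as they were filmed.
    camera_takes = len(set(edit_list))
    # Unbroken runs of one take between cuts: the count used in shot analysis.
    edited_segments = sum(1 for _ in groupby(edit_list))
    return camera_takes, edited_segments

# Two takes, A and B, each cut into three pieces and intercut.
print(count_shots(["A", "B", "A", "B", "A", "B"]))  # prints (2, 6)
```

The same footage gives two different numbers depending on which definition of "shot" one counts by.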

Second, the most important feature of a shot, as it is understood today, is its scale. How do we define it? The shot scale is determined by the closeness of the object to the camera. But if a shot has no human in it, how do we get a sense of its scale? We may say that it is determined by the real depth of the shot, that is, by how far it is from the camera to its object. So the term scale is, after all, a misnomer. What we are looking at is actually a distance. This, again, goes back to the European and American way of defining shots: close-up means the object is close; long shot means the distance is long.

The problem of shot scale is twofold. First, and this is the reason why software cannot determine it accurately at the moment, the software does not know the depth of the camera’s object. And this is a problem not only for the machine but also for the human viewer. In deep-space staging one does not know which object is the object of the camera, i.e., whether to classify the shot as a cu or an ls. Any such knowledge would be an interpretation, running the danger of losing objectivity.

Second, shot scale used to consist of only three categories: close-up, medium shot and long shot. The Cinemetrics system refines it by adding another five: bcu, mcu, mls, fs and other. Here a problem arises: while we can easily choose between ls and cu, the difference between ms and fs is sometimes not very clear, especially if one does not have the time to decide immediately while viewing the sequence. Moreover, in order to achieve maximum accuracy, wouldn’t it be more desirable to have the actual distance instead of these confusing approximations?
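A rough sketch of why a measured distance carries more information than a label. The distance thresholds below are invented purely for illustration; they are not Cinemetrics definitions, and the function `scale_label` is hypothetical.

```python
import bisect

# HYPOTHETICAL upper bounds in metres for each label (assumed for the example,
# not taken from Cinemetrics or any standard).
BOUNDS = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]
LABELS = ["bcu", "cu", "mcu", "ms", "mls", "fs", "ls"]

def scale_label(distance_m):
    """Map a camera-to-object distance in metres to a shot-scale label."""
    return LABELS[bisect.bisect_right(BOUNDS, distance_m)]

for d in (2.9, 3.1, 4.9):
    print(d, scale_label(d))
```

With these assumed thresholds, 2.9 m and 3.1 m receive different labels although they are almost the same distance, while 3.1 m and 4.9 m receive the same label although they are far apart. Wherever the boundaries are drawn, the categories blur differences that the distance itself would keep.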

Third, when it comes to the analysis of style, I am no longer sure that the shot is the most useful term for our understanding of it. What is more interesting, as I see it, is to understand how the visual information changes. That the visual information changes is what distinguishes cinema from a still painting. I see four kinds of change: shot change (montage), camera movement, object movement inside a fixed frame, and special effects (including lighting and other effects). All of these contribute to visual change, yet only the shot change is acknowledged.

If we are interested in quantifying film style, can we really expect to achieve any interesting result without accounting for three fourths of the visual changes? Naturally, although these are functionally equivalent as visual changes, they manifest different kinds of stylistic choice. Whether to use a moving camera or a cut is something that a filmmaker has to resolve in his everyday experience. Without this difference, we wouldn’t be able to talk about stylistics. Yet on another level, I would suggest that these differences are indeed secondary. I say secondary in terms of the degree of variation. Complete stasis is most different from either a rapid camera movement or a rapid cut; the difference between the latter two is, in this sense, comparatively secondary. One might immediately ask: is this system capable of handling the subtleties that make a master a master? I say definitely, yes. But yes in the sense that this subtlety is reflected, and even in a most accurate way, in the statistics. Of course, statistics is a study of a group of samples, and it is only meaningful if it remains so. A master’s statistics may differ little, quantitatively, from an amateur’s, but alas, we already know that statistics is not useful for telling masters from amateurs.
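One possible way to treat all four kinds of change as a single quantity is to measure how much the image differs from one frame to the next. The following is only a sketch under that assumption, not the system I have in mind and not an existing tool; `change_rate` and the toy frames are made up for the example.

```python
def change_rate(frames):
    """Mean absolute pixel difference between consecutive frames.

    frames: list of equally sized frames, each a flat list of grey values.
    Returns one number per transition; 0 means complete stasis.
    The measure does not ask whether the change comes from a cut, a camera
    movement, a moving object or an effect.
    """
    rates = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(abs(a - b) for a, b in zip(prev, curr))
        rates.append(total / len(curr))
    return rates

# Toy 4-pixel frames: stasis, then a small movement, then a cut.
still = [10, 10, 10, 10]
moved = [10, 10, 10, 50]
cut = [200, 200, 200, 200]
print(change_rate([still, still, moved, cut]))  # prints [0.0, 10.0, 180.0]
```

Such a number records the degree of variation only. Which stylistic choice produced the change would have to be recorded separately.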

Next time I will try to say more about how I conceive this system of visual variations.



2 comments:

Thoth Harris said...

Dong wrote: "First, the shot structure. It is somehow believed that a film is constructed from elementary units such as shots. The shot is not an imaginary thing. It is determined and therefore justified by the nature of mechanical reproduction through an optical device. A shot is taken before any critical intervention. A shot takes shape before we can even look at it. In this sense a shot has an integrity we are powerless to deny. However, let us not forget that a shot is also broken up in the editing process."

Right, and so I agree with you, Dong, but it gets even more wonky, does it not? Broken by what, and by what kind of process? A human one. But it isn't even that simple. We are dealing with variations of chance (which, in its pure form, both does and doesn't exist) and its influence on events, on the precise nanosecond at which the shot is broken. The determinacy of the human beings making the film, and their intervention in this configuration of chance, play their part as well. Pier Paolo Pasolini railed against the lionization of the sequence, because, with the sequence shot, there is an even more constant editing process taking place as well. What is editing? Choice? Or the choice of shots? I believe it is many things. I believe that if you compare the walking sequences in Béla Tarr's Werckmeister Harmonies with the walking sequences in John Boorman's Point Blank, you will indeed find a vast chasm of aesthetic differences. This is the comparison of an idea, for sure. But you don't have merely the differences of ideas, but also the differences between the speed of the footsteps and the timbres of those footsteps, and between the speeds of cutting away (which, in the case of Boorman's film, is intense and quick, as opposed to Tarr's less montage-based cutting, which is instead one of a camera approaching, retreating, and stalking at the characters' sides). Both are fascinating approaches with completely different emotional, visceral, and, if you will, intellectual effects.

Dong Liang said...

Hi Thoth, nice to hear from you! Hope you are doing well.
As I said, I believe this system of automatic shot analysis will soon be in place for us to use, and it will benefit some of our studies.
But it will NOT be able to address the kind of subtleties that you describe.
