Printed from www.flong.com/texts/interviews/interview_72dpi/
Contents © 2020 Golan Levin and Collaborators

Golan Levin and Collaborators

Interviews and Dialogues

Interview by Jan Rikus Hillman for 72dpi / DE:BUG 56

Golan Levin, September 2001.

How did the AVES and Scribble projects come about?
The Audiovisual Environment Suite was produced in support of my Master's thesis at the MIT Media Laboratory. I was studying under John Maeda in his Aesthetics and Computation Group, and I had started off my graduate studies with a series of strictly silent experiments, heavily influenced by Scott Snibbe's "Motion Phone," into the real-time creation and manipulation of animated graphics. In the middle of my graduate studies, John got bored of the things I was making and prompted me to attack the really difficult problem—creating systems which could control sound at the same time. When my software received an award in the 2000 Ars Electronica festival, it was John Maeda and Gerfried Stocker [director of the Ars Electronica Festival] who independently suggested that I use the software to do a live performance. That's how Scribble was born; I called up the best composers I know—Greg Shakar and Scott Gibbons—and begged them to help me compose a half-hour live performance using this weird software I had created.

What was your basic intention?
My basic intention behind the Audiovisual Environment Suite was to design a system which would make possible the simultaneous performance of animated image and sound. I chose to implement such a system by making use of the digital computer's capacity to synthesize graphics and sound in response to real-time, high-bandwidth gestural inputs. I'm not the first person to attempt to design such a system; instead, my goal was to raise a provocative new set of questions and answers about the power, beauty, sophistication and personality that it is possible for an audiovisual instrument to have.

What were your sources of inspiration?
My sources of inspiration for the Audiovisual Environment Suite and the Scribble performance lay deep in the extensive history of attempts to synchronize abstract image and sound, variously known as ocular music, visual music, or color music. This particular endeavor has a history that spans several centuries of work by dozens of gifted practitioners. For example, the earliest known device for performing visual music was built in 1734 by a Jesuit priest and mathematician, Father Louis-Bertrand Castel. Castel's Ocular Harpsichord coupled the action of a harpsichord to the movement of transparent tapes, whose colors were believed by Castel to correspond to the notes of the occidental musical scale. In 1789, Erasmus Darwin suggested that visual music could be produced by projecting light from oil lamps through colored liquids. Thereafter followed a steady development of audiovisual instruments, employing a wide range of technologies and materials: Frederic Kastner's 1869 Pyrophone, for example, opened flaming gas jets into crystal tubes to create both sound and image, while an 1877 device by Bainbridge Bishop sat atop a pipe organ and produced light with a high-voltage electric arc. But apart from these, two twentieth-century instruments were particularly inspirational to the AVES work: Thomas Wilfred's Clavilux (1919), and Oskar Fischinger's Lumigraph (1948), both of which achieved considerable critical acclaim through international high-art performances. Both were optomechanical; the Clavilux filtered light through several stages of multicolored glass disks, while the Lumigraph interrupted colored beams of light with a flexible fabric surface. Naturally, the computer has had a great impact on the field of visual music, as it removes many of the tradeoffs that constrain the design of physical machines.
The three most important inspirations to my work, in the computational domain, were "Timepaint" by John Maeda, the "Motion Phone" by Scott Snibbe, and Music Insects (later sold as SimTunes) by Toshio Iwai, all of which were developed in the early 1990s. John Maeda's Timepaint is a delicate illustration of the dynamic process by which apparently static marks are made: by extending a gesture's temporal record into the third dimension, Maeda's work can flip between a flat animated composition and a volumetric diagram of temporality. Snibbe's Motion Phone is an application for interactively authoring dynamic animations; it accretes recordings of gestures into an abstract animation loop, creating lively and rhythmic patterns of colorful triangles, squares, circles and lines. Iwai's Music Insects, on the other hand, is a paint program in which the pixels deposited by the user operate as scorelike elements in a music-producing simulation.

What's special and unique about the project?
One particular contribution of the Audiovisual Environment Suite was that I attempted to classify the interface metaphors currently used by designers of audiovisual systems, and in response create and identify new ones. So, for example, over the past few years, metaphors for relating sound to image in interactive graphical environments have coalesced into three basic conventions: "timeline" metaphors, "control panel" metaphors, and "interactive widget" metaphors. MIDI sequencers and audio-editing programs, for example, typically use a diagrammatic score or "timeline" metaphor, in which a pitch or amplitude ordinate is plotted against an abscissa of time. Many software synthesizers, on the other hand, have adopted a "control panel" metaphor in which a screen full of knobs, dials, sliders and buttons—in imitation of classic analog hardware devices—provides precise control of a sound's parameters. Finally, some designers have experimented with an "interactive objects" metaphor, in which the properties of one or more reactive virtual widgets are mapped onto generated sounds. Unfortunately, none of these metaphors suited my goal, which was to create an extremely flexible visual performance system coincident with an equally expressive musical performance system, and not merely a GUI for a musical instrument. In order to do this, I introduced the metaphor of an inexhaustible and dynamic audiovisual "substance," which could be freely deposited, controlled, manipulated and deleted by the user's gestures. Another thing which could be considered unique about the Audiovisual Environment Suite was that I prohibited myself from using sprites and MIDI samples. The infinite plasticity of a synthetic canvas demanded that any sonic counterpart to it be equally infinite in its possibilities. And this could only occur if the system's model of audiovisual synthesis ultimately afforded the expressive control of every single sound sample and every single pixel.
To provide any less—by resorting to a model based on the mixing or filtering of canned sound loops or sprites, for example—would merely create a toy instrument whose expressive depth would be drastically attenuated and explicitly curtailed from the outset. So I settled on a methodology in which I coded software synthesizers from scratch, exposing expressive hooks into their inner mechanisms along the way. Using the lowest-level synthesis techniques, such as granular synthesis and waveshaping synthesis, allowed the sound and image in the software systems to be tightly linked, commensurately malleable, and deeply plastic. Something which I think is very important and unique about the Scribble performance was that everything we did was created and performed totally live, in real time, on the spot—there were no canned graphics, sequences, or otherwise pre-composed materials involved. It meant that, from the standpoint of the performer, there was much more at stake, and much more that could go wrong—there was no sequencer or backup tape that we could rely on to do our performance for us. As a result, this represented, for me, the first time that I felt legitimate doing electronic performance on stage.
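The interview contains no source code, but the granular synthesis technique named above can be sketched briefly. The following Python is a generic illustration under my own assumptions (the function name, grain sizes, and sine-wave source material are invented here, not taken from AVES):

```python
import numpy as np

def granulate(source, grain_len=512, hop=128, n_grains=200, seed=0):
    """Rearrange short windowed 'grains' of a source waveform into a
    new texture: the essence of granular synthesis."""
    rng = np.random.default_rng(seed)
    window = np.hanning(grain_len)               # smooth each grain's edges
    out = np.zeros(hop * n_grains + grain_len)
    for i in range(n_grains):
        start = rng.integers(0, len(source) - grain_len)  # random read point
        grain = source[start:start + grain_len] * window
        pos = i * hop
        out[pos:pos + grain_len] += grain        # overlap-add into the output
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out       # normalize to [-1, 1]

# A one-second 220 Hz sine tone at 8 kHz serves as raw material.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
texture = granulate(tone)
```

In an instrument of the kind described, gestural parameters such as speed, pressure, and position would presumably be mapped onto hooks like grain length, density, and read position; that mapping is exactly the "expressive control of every single sound sample" the answer refers to.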

Did you encounter any difficulties?
The most difficult thing of all was learning how to write my own software.

What does it mean to work with moving images?
The answer is in your question, because working with moving images is a meaning-making activity. But what that meaning might be, depends on who is doing the communicating and what they are interested in saying. There is now such an enormous variety of contexts for the production and reception of moving images—such as traditional Hollywood cinema, documentary forms, video art, character animation, abstract animation, animated banner advertisements, computer games, and visual instruments, to name a few—that it is impossible to pinpoint a single meaning for this set of activities, other than that it has become one of our most important, interesting (and occasionally controversial) means of communication.

Which skills are required to create moving images?
People have been making moving images for more than two hundred years, using hundreds of different technologies. There have been so many techniques and communications strategies and motivations behind this sort of cultural production, that I think it's difficult to identify any specific set of technical skills behind it. The first animation device we know of was the Thaumatrope, invented in England in 1809—this was basically two drawings, like an open eye and a shut eye, mounted back-to-back on a stick... when you turned the stick, the drawings would appear to animate. So you might think drawing is a necessary skill... but plenty of video artists and filmmakers today are quite content just to point a camera at something and hit the 'record' button. Oskar Fischinger made animated films in the 1930s by progressively slicing blocks of colored wax, one frame of film per slice... and plenty of artists today use pure code or software animation systems to generate sequences of digital images. But if I really had to try, I'd coin a new word like 'spatiotemporal design' to describe the skill that each of these kinds of artists has in common. It's a kind of design which must consider how to construct or fill space, over time, and (in the interactive realm) taking contingency and conditionality into consideration.

How do these images come alive?
It may sound like a platitude, but I think it's essential to be tuned in to our greatest teacher, nature itself. The classic Disney animators spent hundreds of hours studying slow-motion movies of animals walking. Nick Park spends all his time twisting his face into weird positions, in order to understand how to shape his claymations. It's very easy to make things move mechanically, but quite another matter to produce organically plausible motion, even for something as simple as an abstract circle. In my own work, I rely a lot on the motion capture of human gestures as a rich source of input and inspiration. But other techniques work well too: the recognition that the world can be represented as an assembly of masses and springs and dampers is a critical intuition in the creation of most contemporary computer graphics.
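The mass-spring-damper intuition mentioned above can be made concrete in a few lines of code. This is a minimal sketch with arbitrary constants, not code from any particular system:

```python
def spring_step(x, v, target, k=40.0, c=6.0, dt=1.0 / 60.0):
    """One semi-implicit Euler step of a unit mass on a damped spring."""
    a = -k * (x - target) - c * v   # spring force toward target, plus damping
    v = v + a * dt
    x = x + v * dt
    return x, v

# Pull a point from rest at 0.0 toward a target at 1.0, at 60 steps per second.
x, v = 0.0, 0.0
positions = []
for _ in range(600):                # ten simulated seconds
    x, v = spring_step(x, v, target=1.0)
    positions.append(x)
# The point overshoots slightly, oscillates, and settles at the target.
```

Because the spring here is underdamped, the motion overshoots and settles with a decaying wobble, which reads as far more organic than a purely mechanical linear interpolation to the same target.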

How would you define your style?
In the end I hope that my work will not be thought of as conforming to any commonly understood style, e.g. "rave graphics" or some such. Styles conceived in this way are much more like fashions: one's work is only relevant for a short time before it is copied widely and rendered obsolete. I'd much prefer the possibility that someone could look at one of my pieces, and somehow detect the unique mark of my own "hand" in it. One time I was surfing the web and I came across a rather remarkable Java applet on the site of some large company. Somehow I immediately knew—I can't specifically explain how—that it was made by Maeda. I knew the tiger by the mark of his claw, so to speak. I think Maeda's accomplishment is particularly special when you consider that he was able to do this entirely with code. I hope my own work, more and more, will come to have this quality. Of course, I'd also like my work to exist timelessly outside the continuum of visual culture, but I fully concede that this is folly, especially in light of the extremely rapid pace at which the culture itself is evolving.

If I had to qualify the features which I strive for in my own work, I'd say grace, whimsy, fluidity, and open-endedness are some of the most important. A few people have said that my work is visibly inflected by the school in which I was educated, namely Maeda's Aesthetics and Computation Group at the MIT Media Lab—certainly I've been influenced by the value they place on technical rigor and visual polish. But it's my hope that my style is something all my own, which exceeds the sum of the above with something else that is personal and intangible.

Do your products reflect your philosophy? Could you give any specific examples?
One of my most closely held personal tenets, and perhaps the most stringent criterion by which I evaluate my own products, is that an interactive experience should be able to completely absorb my attention (as a user) for at least five minutes. That is, if I can't design an interaction to hold my attention for five minutes, then I must either scrap the piece or head back to the code. This criterion is harder to achieve than it might sound, partially because I have an extremely short attention span. It's also difficult because an interactive system must engage its participants in an entirely different way than, say, a movie or television show.

For example, it's easy to keep a viewer sedated for five minutes in front of video-based material—practically any video clip of food or explosions or naked people will do. I'm not saying that it's easy to make good video; just that it's easy for people to get absorbed into the video medium. And thus many so-called interactive multimedia experiences provide just that: you click, and you see some absorbing chunk of video; you click again, and you advance to the next one. These systems are fine for conveying the narrative sorts of things conventionally seen on film or TV, but they miss a huge opportunity in terms of the unique malleability and plasticity of computational communication. Here I have to credit Brenda Laurel for first pointing out to me that it is much more interesting to use a computer as an *instrument* or *game* than as a *record player*. We already have lots of different kinds of record players, but none becomes coupled to us so closely or satisfactorily as when we play an instrument or game. When we choose the goal of involving a user in the tight cybernetic feedback loop of some such kind of *activity*, the rewards for both user and designer are completely different, much more personal, and thus much more involving.

In creating experiences that enable this sort of engagement, I've been tremendously influenced by Marshall McLuhan's distinction between what he termed "hot" and "cool" media. To McLuhan, "hot" media are high-definition, high-resolution experiences that are "well-filled with data," while "cool" media are low-definition experiences that leave a great deal of information to be filled in by the mind of the viewer or listener. Photography and film are hot media, for example, while cartoons and telephony are cool. McLuhan's definitions establish a strongly inverse link between the "temperature" of a medium and the degree to which it invites or requires audience participation: hot media demand little completion by their audience, while cool media, "with their promise of depth involvement and integral expression," are highly participatory.

The particular goal of the Scribble systems was to build sophisticated cool media for interactive communication and personal expression. In doing so, I interpreted McLuhan's specification for cool media—that they demand "completion by a participant"—quite literally. The notable property of cool media, I believe, is that they blur the distinctions we make between subject and object, enabling the completion of each by the other. An example of such a subject/object distinction is that between author and authored, the blurring of which, according to psychologist Mihaly Csikszentmihalyi, is critical to the Zen-like experience of creative flow. Another such distinction is that between sender and recipient, to whose dissolution, wrote the philosopher Georges Bataille, we owe the delight of communication itself. These thoughts contain the seed of my five-minute engagement criterion: instead of judging a system by asking, "for how long can I suspend my disbelief in it?", I developed the questions: "for how long can I feel it to be a seamless extension of myself?" and "to what depth can I feel connected to another person through it?" And in trying to create systems which answered these questions, I came to rely less and less on the presentation of pre-prepared content, and more and more on the development of intuitive and engaging rule-systems.

Do your designs follow a certain ideal?
My most stringent criterion for my work is that every element or aspect of a given piece conveys the sense that it is a totally essential component. Ideally, this means two things: that no element of the finished artwork can be taken away without destroying the piece, and (complementarily), that if any element of a piece can be removed without harming the artwork, then that element *must* be removed. The end result of this design process is not necessarily minimalistic—some very complex and elaborate patterns can result from it—but it is quite spare, and free from decoration. I've heard this particular ideal restated in a variety of ways by others whom I respect; the graphic designer Alexander Gelman in particular has written an entire book on this philosophy which he calls "Subtraction". Basically I continually ask myself: what are the fewest and simplest means necessary to communicate a given idea? Is every conceivable aspect of the piece somehow incontrovertibly motivated by aesthetic necessity, or could one imagine the piece existing in any different way? By restraining myself from adding too much to a piece, it becomes easier for viewers and users to project themselves into it, and I think this is of paramount importance for interactive artworks.

A second ideal for which I've striven in the design of my interactive systems is that they be "instantly knowable, yet infinitely expressible." By "instantly knowable," I mean that no instructions or explanations ought to be necessary for a novice user: the mechanisms of control are laid bare to the intuition, and thus the system's operation is self-revealing. By "infinitely expressible," on the other hand, I mean that the system has an inexhaustible expressive range, which, like a good instrument, requires a lifetime to master. Of course, the reward of infinite expressibility is that different users can develop unique styles or creative "voices" in that medium, and ultimately arrive at a new means for expressing and discovering themselves. There are some great real-world examples of systems which have these properties, such as the piano or the pencil: although any four-year-old can discover their basic principles of operation, an adult can just as well spend fifty years practicing at them, and still feel like there remains more that can be expressed through them. Now, most software systems are either easy to learn, or extremely powerful—but they are almost never both, because this requires that their rules of operation are simple, yet also afford a boundless space of possible outcomes. This is difficult, and nearly contradictory, but I feel it is essential for interactive software to achieve this. So one of my principal ideals has been the design of systems that possess these qualities, of simplicity, possibility, and transparency.

On the web there are two main currents: work designed with digital movie technology and Flash animations.  What are the differences and advantages of DV vs. Flash?
As it happens, I don't work with either, so perhaps I'm in a good position to critique both. First let me say that digital video as an online medium is a logical extension of a hundred years of film history, and it's here to stay. The ways in which people construct film and communicate through it have remained largely unchanged since the early 20th century, when heroes like Eisenstein and Kuleshov worked out the fundamental formal elements of the medium—e.g. "how do cinematic cuts work?", "when should I use a close-up?", etcetera. What has drastically changed, of course, is *what* people are now interested in saying with the medium, and *who* is saying it. The internet has obviously radically changed both, and now the answer is that *everyone* is putting up video of themselves, and most of what they're broadcasting is either porn, police beatings, or exploding Twinkies. Each of which, I might add, is an essential expression, not only of the author but also the culture at large.

Now Flash, on the other hand, is a historic accident. People needed a technology for communicating animations to one another over the Net, and Macromedia happened to have the best available answer. There's nothing quintessentially or teleologically incontrovertible about Flash; it's just that animated GIFs were too big and nonreactive, and Java failed to catch on (it was too difficult to learn, Microsoft killed it, and it never ran well on Macintosh computers). By doing the hard work necessary to ensure that Flash worked everywhere, Macromedia was able to beat out the competing standards. Nowadays Flash is *the* ubiquitous technology for nearly any kind of animated graphic on the web. While there's nothing intrinsically wrong with Flash, there's nothing intrinsically right about it, either. For the sake of simplicity, Macromedia elected to give designers creative control of only a tiny fragment of what is possible on the computer. Although this is a sound business decision, the result is that the imaginations of most online designers are now limited to what Flash can do. And with such a small palette to work from, most Flash designs look the same, and taste the same, and generally break little new ground. One of my goals with my own online work, done largely in Java, is to provide a few examples of what can be done when one thinks outside the box of the dominant toolset.

How would you describe your work process? (e.g. do you work with storyboards?)
I don't work with storyboards, chiefly because my animation work is non-narrative. Instead, I generally make a few sketches on paper, work out the necessary equations, and then I make a Java applet which implements the design. If I like the applet, which doesn't always happen, then from there I'll often develop it into a full-screen .exe version, using C and OpenGL.

Which software and hardware do you use?
I write my own software for everything I do. Generally I use Java to make things that I expect to share over the internet, and C for installations or for computers that I configure myself. I try to write my own software whether I'm creating a static, dynamic or interactive work—in all cases, I think it yields more personal results. A few years ago I was the world's biggest Mac devotee, but I switched to Windows when I got tired of fatally crashing my machine every few minutes. I'd love to get a Mac—their hardware is very good, and I'm optimistic about OSX—but I'm still waiting to see whether their Java implementation has caught up.

In terms of production - what gave you the biggest headaches?
The most difficult part of any project is, for me, integrating new components into a system. Macromedia Director is a great system, because it can glue so many different kinds of components together: video, sound, MIDI, networking, serial communications, etc. But I generally don't work in Director—its graphics toolkit rarely satisfies my needs—so I frequently end up doing all of this plumbing myself. In the case of the Scribble software systems, this meant connecting an OpenGL graphics system to Microsoft's DirectSound synthesis interface. All of the hooks and connections are documented, but there are a lot of little things to get right, and none of the tricks for achieving optimum performance are documented. So this sort of thing is a pain.

There are very few examples for the amalgamation of film and non-linear structures.  What is so difficult about creating a new narrative framework on the basis of film for the internet?
My opinion is certain to irritate many people, but I happen to believe that narrative media and interactive media are fundamentally incompatible. If I had to soften this statement, then I'd at least say that narrativity and interactivity are opposite forces. For me the question centers around how one conceives of free will: either you can choose your own fate, or you can't. So of course there are some interesting compromises which arise; often these work extremely well by separating the time-scales at which the narrative and interactive forces operate. A classic example is the typical Nintendo game in which the moment-to-moment interactions of a given battle are quite interactive, but in the longest time-scale, somehow we always, inexorably, rescue the princess.

The best non-linear video-based media I've seen was the "Portable Effects" system produced a few years ago by Rachel Strickland at Interval Research in Palo Alto. This system could best be described as "interactive cinéma vérité". Strickland, who is an odd combination of architect and documentary film-maker, had collected hundreds of hours of interviews with people as they emptied out the contents of their bags and pockets. People would describe all of the items they carried, why they carried these things, and how they organized them spatially and conceptually in their bags. The variety of interviewees was pretty huge—everything from an eight-year-old schoolgirl to an arctic explorer—and the content was extremely interesting, as it really shed new light on strategies of vernacular design. Strickland and her team then annotated the content of these videos, in order to create an interactive software system that allowed its users to experience and browse the interviews topically and thematically. The really stunning thing was that the system would automatically and seamlessly edit the interviews together, on the fly and just-in-time, according to the specific topical thread that the user was interested in pursuing. The result was a hypertextual cinematic form which was completely narrative in the short time-scale of the individual video clips, but completely interactive in the longer time-scale of how these clips were sequenced or assembled—the opposite, then, of the Nintendo game.
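The core mechanism described, assembling a clip sequence just-in-time from thematic annotations, can be sketched very simply. The clip data, tags, and function below are invented for illustration; Strickland's actual annotation scheme and sequencing logic were surely far richer:

```python
def assemble_thread(clips, topic):
    """Return a just-in-time sequence for one topical thread:
    every clip annotated with the topic, in archive order."""
    return [c["id"] for c in clips if topic in c["tags"]]

# A toy archive of annotated interview clips (hypothetical names).
clips = [
    {"id": "schoolgirl_keys", "tags": {"keys", "ritual"}},
    {"id": "explorer_knife",  "tags": {"tools", "survival"}},
    {"id": "commuter_wallet", "tags": {"keys", "money"}},
]

assemble_thread(clips, "keys")   # -> ["schoolgirl_keys", "commuter_wallet"]
```

The interactivity lives entirely in the sequencing: each individual clip plays back as fixed, linear narrative, while the topic the user pursues determines which clips get stitched together next.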

Strickland was able to succeed, in part, because her database of thousands of video clips was both full of rich content as well as richly connected by thousands of thematic annotations. You need a video database to have this degree of density in order for a user to have a satisfactorily individual interactive experience with those materials. At the same time, the internet still isn't ready to serve up the gigabytes of video which constituted her archive. But that kind of bandwidth and storage is only a few years away, and inevitable. I hope the Portable Effects project is able to make it onto the web when the internet is ready.


Many designers took the step from print to WWW and now to motion graphics.  Why does this seem like the natural next step?
This has to do with larger shifts, I believe, in the dominant modes of cultural expression, and the technologies available for such expression, in our visual culture. I'm not really qualified to answer this question—this is really one for the sociologists, media theorists, and historians, if you ask me—but we used to make indentations in clay tablets when it was meaningful and expedient for us to do so, and now new technologies have supplanted this. I'm sure plenty of folks don't agree that motion graphics is a natural next step, just as some people continue to practice stonecutting to this day. I might add that a carved stone is still millions of times more likely to last a hundred years than any video or Flash animation described in this book.

What is the underlying strength of combining animation and interaction?
The answer is short and simple: people can engage in a feedback loop of astonishing responsivity and unparalleled flow.

How important is sound in this context?
A successful filmmaker once told me that the secret to good film is the sound. Sound does more than reinforce action; it establishes an entire setting. David Bordwell discusses this at length in his book, "Film Art".

Is the web a good medium for moving image-content?
We have an essential need to share images and representations: such messages and symbols are one of the chief products, and functions, and fuels, of society. The web is *precisely* the best medium for moving-image content, because it is visual, and radically shared. If it is not yet so in practice, we shall see it become so when technological bandwidth and storage are able to meet our hunger.

How do you envisage the future of moving images?
How will moving images conquer society? Moving images already conquered us, in the 20th century. In the 21st century, we will conquer them back.

Will people relive and play their dreams and wishes on a holodeck?
I think people will be sadly disappointed if they try: even the tiniest reminder that the holodeck is synthetic will make the whole experience seem tragically pathetic. In general, people are most comfortable when they deal with highly abstracted cartoons, or highly realistic photographs. But there's a basic psychological principle which kicks in when a representation of a person is *almost* realistic, but not quite: we get unnerved and revolted. It's not just about the visual rendering, either, but also in the way a character moves and talks. We might accept such representations for games, but as for dreams and wishes... I think we just have too much grey matter devoted to making sense out of our impressions of people, to suspend our disbelief when we enter the psychological realm of unconscious desires. We might accept such representations in movies, but I think it will be a very long time before we prefer computer simulations to dreams, drugs or reality itself.