Printed from www.flong.com/texts/reports/report_yellowtail/
Contents © 2020 Golan Levin and Collaborators

Project Report for Yellowtail

The following text explains the design and mechanism of two interactive drawing applications, Curly (1998-99) and Yellowtail (1999-2000), which can be experienced here. This material is excerpted from my Master's thesis, Painterly Interfaces for Audiovisual Performance (MIT, 2000). The complete thesis document, which discusses these works and many other audiovisual software systems, is available here.

Yellowtail: Animated, Real-Time "Pattern Playback"
Excerpt from Painterly Interfaces for Audiovisual Performance,
Golan Levin, 2000: Chapter 3, Section 2.1.

1. Origins

Yellowtail was my first experiment in the design of an environment for the simultaneous creation of both sound and image. It evolved out of an earlier silent piece, called Curly, which I developed in September of 1998. Although Yellowtail eventually fell short of achieving a wholly painterly interface for real-time audiovisual performance (its final design is fundamentally diagrammatic), it was nevertheless an important milestone in the evolution of my thesis work. As we shall see, Yellowtail served as the model against which the goals of this thesis developed in contradistinction.


Figure 52. A screenshot from Curly, developed in September 1998.

Curly, Yellowtail's progenitor, was a reactive paint system in which a user's linear marks transformed into an animated display of lively, worm-like lines. After the user deposited a mark, the system would procedurally displace that mark end-over-end, making possible the simultaneous specification of both a line's shape and its quality of movement. Straight marks would move along the direction of their own principal axes, while circular marks would chase their own tails. Marks with more irregular shapes would move in similarly irregular, but nonetheless rhythmic, patterns. Curly's screen space obeyed periodic (toroidal-topology) boundary conditions, such that marks which crossed the edge of the screen would reemerge on the screen's opposite side, rather than disappearing altogether. The user could select between two styles of motion with different buttons on the pointing device: the CURLY_TRAVELLING style, in which the marks would travel across the screen, and the CURLY_STATIONARY style, in which the marks would animate in place.
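The end-over-end displacement of a travelling mark can be illustrated with a minimal Python sketch. It is one plausible reading of the behavior described above, not the original implementation: the names `advance_mark` and `wrap_to_canvas` are hypothetical, a mark is assumed to be a list of (x, y) points, and the toroidal wrap is assumed to be applied only at drawing time so that the offsets between neighboring points stay intact.

```python
def advance_mark(points):
    """Advance a mark by one step, end-over-end (hypothetical helper).

    The oldest segment (between the first two points) is removed from the
    tail and re-attached at the head, so the mark's segments are recycled
    in order while the mark as a whole moves forward.
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    step_x, step_y = x1 - x0, y1 - y0          # offset of the oldest segment
    head_x, head_y = points[-1]
    return points[1:] + [(head_x + step_x, head_y + step_y)]


def wrap_to_canvas(points, width, height):
    """Map unwrapped coordinates onto a toroidal canvas for drawing."""
    return [(x % width, y % height) for x, y in points]


# A straight mark slides along its own axis.
straight = [(0, 0), (1, 0), (2, 0)]
moved = advance_mark(straight)

# A closed square loop has four segments; after four steps it is back
# where it started, i.e. it "chases its own tail" in place.
loop = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
cycled = loop
for _ in range(4):
    cycled = advance_mark(cycled)
```

Under these assumptions, a mark with n segments reproduces its original shape every n steps, offset by the vector from its first point to its last. This is consistent with the description of the CURLY_TRAVELLING style as movement along an axis defined by the mark's endpoints.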

Figure 53. The marks in Curly can obey one of two different styles of animation. On the left is the CURLY_TRAVELLING style, in which a mark propagates along an axis of movement defined by its endpoints. On the right is the CURLY_STATIONARY style, in which a mark animates in place by cycling its shape through the stationary positions initially established by its original endpoints.


Figure 54. The evolution of a CURLY_TRAVELLING gesture as it progresses down the display.

No randomness was employed in the procedural animation of the "curlies." Instead, their animated behavior is strictly determined by the shape and speed of the mark when it was drawn. Nevertheless, because each line repeats according to its own natural period, the complex phase relationships of the different marks produce the effect of an ever-changing yet coherent animated texture.

2. Sonification

In June of 1999, I had the idea of sonifying Curly by treating its animating canvas as an "inverse spectrogram." Ordinarily, a spectrogram is a diagrammatic image used to visualize the frequency content of sound data. In a typical spectrogram, Short-Time Fourier Transforms (STFT) are applied to very short, successive windows of a waveform, re-expressing the time-domain information of each window as components in the frequency domain. Transforms from adjacent windows of sound data are then rendered side by side to create an image of the sound's frequency content versus time.
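As an illustration of the analysis direction, the following Python sketch computes spectrogram magnitudes with a naive discrete Fourier transform. It is a teaching sketch under stated assumptions, not a description of any particular tool: the function name `stft_magnitudes`, the window size, the hop size, and the choice of a Hann window are all illustrative, and a practical implementation would use an FFT.

```python
import cmath
import math


def stft_magnitudes(samples, window_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per analysis window."""
    # Hann window (an assumed choice) to taper each segment's edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        segment = [samples[start + n] * window[n] for n in range(window_size)]
        frame = []
        for k in range(window_size // 2 + 1):      # non-negative frequencies
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            frame.append(abs(acc))
        frames.append(frame)
    return frames


# A sinusoid that completes exactly 8 cycles per 64-sample window.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft_magnitudes(tone)
peak_bins = [frame.index(max(frame)) for frame in frames]
```

Each inner list corresponds to one slice of the picture along the time axis. For the test tone, the strongest bin in every frame is bin 8, so the rendered image would show a single line of constant frequency.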

Spectrograms were originally developed to analyze sounds, such as speech, but took on provocative new possibilities when used in reverse, as a means of synthesizing sound. This technique, called pattern playback, was first developed by the speech researcher Frank Cooper in the early 1950s [Cooper 1953]. Cooper showed that it was possible to draw a pattern of paint splotches on plastic, and then use a machine of his own design to play back the sound. This made it possible for his lab to do many psychoacoustic experiments, and it also helped validate the use of a spectrogram as an analysis tool [Slaney 1995]. Cooper's machine used an array of light sources, each modulated at one of the fifty harmonics of 120 Hz, to illuminate a strip of acetate tape. Patterns were painted on the film, and the light that was reflected from the pattern was transformed by photoresistors into a varying voltage and then amplified for auditory playback. The result, according to Cooper, was "highly intelligible" speech [Slaney 1995].

Figure 55. "Pattern Playback" machine for hand-painted spectrograms made by Frank Cooper in the early 1950s.

Since then, a number of researchers and companies have developed spectrogram-based drawing systems for the analysis and resynthesis of sound. In these systems, a digital image representing the intensity of different audio frequencies over time is used as a "score" for an additive or inverse-FFT synthesizer (a sound synthesizer in which a large number of weighted sinusoids are summed to produce complex tones). Examples of such systems include John Strawn's eMerge (1985), Gerhard Eckel's SpecDraw (1990), B. Holloway's LemurEdit (1993), and Malcolm Slaney's Pattern Playback Plugins (1995), the last of which embedded sound spectrogram technologies in an Adobe Photoshop plugin [Roads 1996, Slaney 1995]. Perhaps the most popular spectrogram resynthesizer, however, is UI Software's Metasynth [UI Software 1998], which merges an additive sound synthesis engine with a variety of spectrogram-specific image editing tools and filters.


Figure 56. The score interface from UI Software's Metasynth, a spectrogram-based synthesizer [UI Software 1998].

As powerful as such systems are, I felt that they could be improved or extended in two important ways. First, none of the pattern playback systems was designed with the capacity to support real-time performance. In all cases, including Metasynth, the metaphor of interaction has been modeled after that of a traditional music sequencer: users paint into the spectrogram, click on the tapedeck-style "play" button, evaluate the sonic results, stop the playback, and then paint some more. This slow feedback loop of painting and audition is suitable for a meticulous style of composition, but makes improvisatory performance difficult or impossible. In sonifying Curly with pattern playback technology, I sought to collapse the duration of this feedback loop in order to produce an effective simultaneity of creation and evaluation. To this end, I borrowed a technique discussed in Chapter Two, in which a looping computer score has the capacity to be modified at the same time that it plays back its contents.

In order to support real-time sound performance, a square spectrogram patch was added to Curly in the center of its canvas. The pixels of the screen's frame buffer coinciding with the location of this patch are fetched at frequent and regular intervals by an additive synthesizer; sound is then generated by mapping the brightnesses of pixel columns in the patch's frame buffer to the individual amplitudes of a bank of additive synthesis oscillators. As a result, any of the drawn marks which happen to intersect or occupy this patch immediately result in auditory events. With the addition of pattern playback sound generation and a minor visual redesign, this new version of Curly was renamed Yellowtail.
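The mapping from pixel brightness to oscillator amplitude can be sketched in Python as follows. This is a minimal illustration rather than Yellowtail's actual synthesizer: the function name `render_row`, the 8-bit brightness range, the sample rate, the block length, and the normalization by the number of oscillators are all assumptions.

```python
import math


def render_row(brightness_row, frequencies_hz, phases,
               sample_rate=44100, num_samples=256):
    """Synthesize one block of audio from one row of patch pixels.

    Column i of the row sets the amplitude of oscillator i. `phases` is
    updated in place so that successive blocks join without clicks.
    """
    amplitudes = [b / 255.0 for b in brightness_row]   # assumes 0..255 pixels
    scale = 1.0 / len(brightness_row)                  # keeps the sum within [-1, 1]
    block = []
    for _ in range(num_samples):
        sample = 0.0
        for i, amplitude in enumerate(amplitudes):
            sample += amplitude * math.sin(phases[i])
            phases[i] += 2.0 * math.pi * frequencies_hz[i] / sample_rate
        block.append(sample * scale)
    return block


frequencies = [220.0, 440.0, 880.0, 1760.0]   # illustrative oscillator bank
phases = [0.0] * len(frequencies)

silent = render_row([0, 0, 0, 0], frequencies, phases)    # no mark in the row
tone = render_row([0, 255, 0, 0], frequencies, phases)    # a mark in column 1
```

Calling such a function repeatedly, once for each row that the patch reads from the frame buffer, would turn the marks currently inside the patch into sound as they are drawn and animated.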

Figure 57. The spectrogram interface patch in Yellowtail. A horizontal line called the current time indicator sweeps the patch periodically from bottom to top. At any given moment this indicator may or may not intersect a row of pixels which belong to one of the user's animating marks. Each of the columns of pixels directs the amplitude of a given sinusoidal oscillator in an additive (Fourier) synthesizer. The greater a pixel's intensity, the more of its corresponding oscillator is heard in the final sound. The oscillators are arranged in order of exponentially increasing pitch from left to right, such that the spectrogram's width spans about six octaves.
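The exponential arrangement of oscillator pitches across the patch can be expressed in a few lines of Python. The six-octave span comes from the description above; the function name `column_frequencies`, the lowest frequency of 55 Hz, and the column count are assumptions chosen only for illustration.

```python
def column_frequencies(num_columns, lowest_hz=55.0, octaves=6.0):
    """Assign a frequency to each patch column, left to right.

    Frequencies grow exponentially, so equal horizontal distances
    correspond to equal musical intervals.
    """
    if num_columns < 2:
        raise ValueError("need at least two columns to span a pitch range")
    return [lowest_hz * 2.0 ** (octaves * i / (num_columns - 1))
            for i in range(num_columns)]


# With 73 columns over 6 octaves, adjacent columns are one semitone apart.
freqs = column_frequencies(73)
```

In this configuration every twelfth column doubles the frequency, and the rightmost column sounds 64 times (2 to the power 6) higher than the leftmost.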

The second major extension I wished to make to pattern-playback systems was the idea of using an animated image instead of a static one. Even a system which permits real-time score manipulation and playback can yield tiresome results if the score's inherently static nature produces unchanging sounds when looped. An animated spectrogram image, by contrast, held the potential to create manageable variability in both sound and image. The dynamic nature of the Curly animation algorithm provided a ready solution. If the canvas were filled with CURLY_TRAVELLING marks, then the marks would intersect the spectrogram patch at seemingly stochastic intervals, forming a texture of controllably irregular tones and chirps. If, on the other hand, a CURLY_STATIONARY mark were placed into the spectrogram patch, the result would be a periodic event which sounded different every time it was played, yet whose variability was governed by precise bounds set by the user.

In addition to these two key innovations in animated pattern playback, three small design features of Yellowtail are also worthy of mention: its performance grids, its adjustable sound context, and its use of image processing techniques. The first of these, performance grids, refers to a means by which the user's gestures could optionally "snap" to specific quantization grids in the horizontal (pitch) or vertical (temporal) axes. The benefit of this feature is that users can choose to conform Yellowtail's otherwise continuous sound-space into the more discretized sound-space generally characteristic of music. Marks which are conformed to the vertical quantization grid, for example, only make sound at regular divisions of common meter, producing rhythmic noises in a manner similar to a drum machine. Marks which are conformed to the horizontal grid, on the other hand, are restricted to the nearest pitch in an equal-tempered chromatic scale.
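A performance grid of this kind amounts to rounding a coordinate to the nearest grid line. The Python sketch below illustrates the idea under assumed values: the helper name `snap_to_grid`, the patch dimensions, and the number of temporal divisions are hypothetical, and only the six-octave pitch span is taken from the description of the patch.

```python
def snap_to_grid(value, spacing):
    """Round a coordinate to the nearest multiple of `spacing`."""
    return round(value / spacing) * spacing


PATCH_WIDTH = 360.0      # assumed patch width in pixels
PATCH_HEIGHT = 256.0     # assumed patch height in pixels
OCTAVES = 6.0            # pitch span of the patch
BEAT_DIVISIONS = 16      # assumed number of rhythmic divisions per sweep

# Horizontal (pitch) grid: one grid line per equal-tempered semitone.
semitone_spacing = PATCH_WIDTH / (OCTAVES * 12.0)
# Vertical (temporal) grid: one grid line per rhythmic division.
beat_spacing = PATCH_HEIGHT / BEAT_DIVISIONS

snapped_x = snap_to_grid(127.0, semitone_spacing)
snapped_y = snap_to_grid(100.0, beat_spacing)
```

Because pitch increases exponentially across the patch, evenly spaced vertical grid lines correspond to the evenly spaced semitones of a chromatic scale, and no logarithm is needed at snapping time.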

A second interesting feature of Yellowtail is its adjustable sound context, in which its spectrogram patch can be picked up by the user and moved around. Originally, it was seen as a shortcoming that the spectrogram patch, owing to limitations of the computer's speed, could not occupy the entire screen space. Interestingly, however, this technological constraint eventually provided a valuable design opportunity for enhancing the system's expressivity. By grabbing the patch itself and dragging it around, the user can treat it as a mobile "sound lens" and thereby "listen" to different regions of the visual composition. Smaller movements of the patch, such as small left-to-right adjustments, make possible the musical transposition of the marks contained within it, while large translations permit dramatic and instantaneous shifts in context.
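The reason a horizontal nudge of the patch acts as a transposition follows from the exponential pitch axis: shifting every mark by the same number of columns multiplies every frequency by the same ratio. The following Python sketch makes the arithmetic explicit. The function names and the patch width are assumptions, as is the sign convention that dragging the patch to the right moves the marks leftward within it and therefore lowers their pitch.

```python
def transposition_semitones(patch_dx, patch_width, octaves=6.0):
    """Semitone shift heard when the patch is dragged `patch_dx` pixels right."""
    return -12.0 * octaves * patch_dx / patch_width


def frequency_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


# With an assumed 360-pixel-wide patch, 5 pixels correspond to one semitone.
shift = transposition_semitones(5.0, 360.0)
ratio = frequency_ratio(shift)
```

Every mark inside the patch is scaled by the same ratio, so the intervals between marks are preserved and only the overall register changes.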

Figure 58. A screenshot from Yellowtail, showing its square spectrogram patch in the center, with its horizontal current time indicator. The user's marks have been blurred by Yellowtail's real-time convolution filter, described below.

A third special feature of Yellowtail is the option it provides of applying a real-time 2D convolution operation to the pixels in the spectrogram patch. [Note: this feature only exists in the Silicon Graphics/IRIX version of Yellowtail.] Presently, only one convolution kernel is provided, namely a low-pass filter. The effects of this image processing technique are a substantial blurring of the image, combined with a frame-to-frame temporal persistence similar to video feedback or retinal afterimages. The convolution filter produces soft and attractive visual results, but is also especially noteworthy for the corresponding changes it precipitates in the audio synthesized from the spectrogram patch. When the blurring convolution is enabled, the audio acquires an otherworldly, cavernous, deeply reverberant quality.
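A low-pass convolution of this kind can be sketched in Python with a 3x3 box filter. The kernel actually used in Yellowtail is not specified here, so the box kernel, the zero padding at the edges, and the name `box_blur` are assumptions. Applying the filter repeatedly to its own output stands in for the frame-to-frame feedback that produces the temporal persistence.

```python
def box_blur(image):
    """Apply a 3x3 averaging (low-pass) kernel with zero padding."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
            blurred[y][x] = total / 9.0
    return blurred


# A single bright pixel in the middle of an otherwise dark 5x5 patch.
patch = [[0.0] * 5 for _ in range(5)]
patch[2][2] = 9.0

once = box_blur(patch)     # energy spreads to the 8 neighboring pixels
twice = box_blur(once)     # feeding the result back spreads it further
```

One plausible reading of the audible effect is that blurring along the time axis stretches each sonic event into a gradual onset and decay, while blurring along the pitch axis spreads energy into neighboring oscillators; together with the feedback, this would account for the reverberant quality described above.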

3. Discussion

The two most important contributions of Yellowtail are that it (1) permits the real-time creation and performance of spectrographic image patterns, and furthermore that it (2) permits the use of a dynamically animated image, and not just a static image, as the raw material for pattern playback. The combination of these two ideas yields an audiovisual instrument which not only affords an unusual quality and high degree of control over the spectral content of sound, but also makes it possible for this spectral information to gradually (or abruptly) evolve over time in a manner programmed by the user's gestural movements.

It was during the course of developing and critiquing Yellowtail that the primary objective of this thesis, the design of a painterly interface metaphor for audiovisual performance, was crystallized for the first time. Several shortcomings of Yellowtail, in particular, led to the articulation of this goal. I was first struck by the manner in which the painterly visual space of Curly had become conceptually overridden by the addition of Yellowtail's diagrammatic spectrogram patch. I quickly realized that, however its means might differ, Yellowtail's basic metaphor for creating sound was no more novel than that of a traditional score or sequencer. Moreover, its spectrogram's arbitrary mapping between dimensions of sound and image, namely, {X=pitch, Y=time}, had the effect of introducing non-isomorphisms into the pictorial plane. Thus the right half of the screen became the privileged location of high-pitched sounds, while visual artifacts situated in the left half of the screen became inextricably bound to low pitches. Such a deliberate and arbitrary non-isomorphism may be a standard device in the visual language of diagrammatic information visualizations, but was, I felt, poorly suited to the compositional language of abstract cinema which had motivated the work since the beginning, and which I wished to preserve.

Another important shortcoming of Yellowtail, from the perspective of a painterly audiovisual creation space, was that its spectrogram interface became an extraneous visual and syntactic intervention in the landscape of an otherwise vibrantly calligraphed surface. To elaborate, I consider the patch to be an extraneous visual element in the image plane, because it is not itself generated by Yellowtail's user, but instead exists as an a priori feature of the software environment. It is, simply, an unrequested component of the visual space, whose continual presence is irrelevant to the user's visual composition, yet irrevocably a part of it; it is as if, in some hypothetical world, every fresh sheet of drawing paper arrived pre-marked with an indelible square in its center. The spectrogram patch is also a syntactic intervention because it functionally segregates the surface of the screen into "pixels which make sound" (the marks inside the patch), "pixels which don't make sound" (marks outside the patch), and, disturbingly, "pixels which signify the presence of an agent which operates on others to produce sound" (the pixels which represent the patch).

Yellowtail succeeds in producing an environment in which there is an "unlimited amount of audiovisual substance," but this substance only obeys the strict and conventional laws of the language of diagrams. In the work that followed, I sought to design audiovisual performance systems situated within the expanded visual syntax of abstract cinema.


  • Frank S. Cooper, "Some Instrumental Aids to Research on Speech," Report on the Fourth Annual Round Table Meeting on Linguistics and Language Teaching, Georgetown University Press, pp. 46-53, 1953.
  • Curtis Roads, The Computer Music Tutorial. Cambridge, MA: MIT Press, 1996.
  • Malcolm Slaney, "Pattern Playback from 1950 to 1995," Proceedings of the 1995 IEEE Systems, Man and Cybernetics Conference, October 22-25, 1995, Vancouver, Canada.
  • UI Software, Metasynth, 1998.