Brian Whitman - July 2005


Some time ago, Joe Milazzo did a little piece on Eigenradio, an MIT music broadcasting project to which Bagatellen has long provided a link. After reading Joe's profile, I checked out the site and was hooked immediately, spacing out to its mysteriously concocted, gnarly mish-mashes at least weekly. So I was naturally very sorry when Eigenradio went offline recently. Wanting to share my grief with the family, I hunted down Brian Whitman, the "music analyst/synthesist" responsible for both the theoretical science behind this computer-processed radio music and its implementation. After condoling, I asked Brian a few questions, not only about that site but about his research into musical essences and preferences generally. (A number of his papers are available online through his website.) Brian's responses indicate the depth to which he has thought about matters that Bagatellen contributors like me are quite content to muse aloud about at length, using as few brain cells as is humanly possible. I here reproduce our e-confab:

When I listened to Eigenradio, I was struck by the enveloping Ivesian thing, you know, watching three marching bands approach from different directions, hearing nine practice rooms at once from the hall of a music building, that sort of thing. But I take it that while the Ivesian complexity is there, Eigenradio wasn't simply a matter of pouring fifteen broadcasts into one sonic soup tureen: there was some careful reduction of broth going on there. Can you describe this process for the layman? Was some computer literally sucking in a bunch of radio broadcasts and spewing out Eigenradio tunes 24-7? How long did the transformations take? Were commercials and/or news filtered out?

The webcast of Eigenradio did involve a computer (well, eight of them) sucking in a bunch of broadcasts and spewing back Eigenradio. In between the radio recording and the synthesis, a lot of stuff goes on: the stream is first segmented into songs (attempts to separate speech and music are made, but I never explicitly check), and then, after a set of songs is recorded, we queue them up into batches. Normally I try to take sixty minutes of music down to about three or four minutes.
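(As an illustration of the front end he describes, here is a minimal Python sketch that splits a long recording into "songs" wherever the signal goes quiet for a stretch and queues the pieces as a batch. The sample rate, frame size, and silence threshold below are invented, and the real system's segmentation is surely more sophisticated than this.)

```python
# Hypothetical sketch of the recording front end: split a long broadcast into
# segments separated by near-silence, and queue the segments for reduction.
import numpy as np

def split_on_silence(signal, frame=2048, quiet_db=-40.0):
    """Split a 1-D signal into segments separated by low-energy frames."""
    n_frames = len(signal) // frame
    frames = signal[:n_frames * frame].reshape(n_frames, frame)
    rms_db = 20 * np.log10(np.sqrt((frames ** 2).mean(axis=1)) + 1e-12)
    loud = rms_db > quiet_db                    # True wherever something is playing
    segments, start = [], None
    for i, is_loud in enumerate(loud):
        if is_loud and start is None:
            start = i * frame                   # a new "song" begins
        elif not is_loud and start is not None:
            segments.append(signal[start:i * frame])
            start = None
    if start is not None:
        segments.append(signal[start:])
    return segments

# Fake "broadcast": two bursts of noise separated by a second of silence.
rng = np.random.default_rng(0)
sr = 22050
broadcast = np.concatenate([rng.normal(0, 0.3, sr * 5), np.zeros(sr),
                            rng.normal(0, 0.3, sr * 7), np.zeros(sr)])
batch = split_on_silence(broadcast)
print(len(batch), "segments queued for reduction")
```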

The serious processing happens in this reduction. Here's an example: you don't know what a sparrow is, so you go to the library, open a bird book, and see three hundred sparrow pictures. You start noticing things they share and things that differentiate them from other birds and from trees, grass, buildings. Sparrows have certain types of feathers; their beaks angle a certain way. What you're doing is looking at a large set of perceptual input and distilling it down from hundreds of pictures to a small set of unary "distinctive properties." Once you find a small set of these properties, you can use it as a) a "classifier," with which you can tell whether something is a sparrow or not, and also b) a "synthesizer" that, given some range of filled-in distinctive properties, can generate an approximation of a sparrow. You can synthesize sparrows endlessly because you know what makes a sparrow a sparrow and also how sparrows differ among themselves. Computers are pretty good at this, given constraints.

For music, same thing, different domain. Instead of pixels from pictures, the computer is looking at features derived from the audio signal. After listening to 60 minutes of audio, the eigenanalysis starts: some dominant components are identified, and the computer resynthesizes a smaller piece by mixing these dominant components from different pieces to create a new whole. We had a bank of eight computers doing the work for us, and I'd guess it took about ten minutes to render every hour of Eigenradio. Not too intense, but again, this is sixteen processors' worth of work.
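(To make the eigenanalysis step concrete: the sketch below, on synthetic data, stacks per-frame audio features into a matrix, pulls out the dominant components with a singular value decomposition, and rebuilds an approximation from just a few of them. The feature count, frame count, and underlying "musical ideas" are all invented; this is not the actual Eigenradio code.)

```python
# Minimal sketch of eigenanalysis on audio features (synthetic data only).
# Rows are analysis frames, columns stand in for per-frame features such as
# spectral centroid, loudness, and so on.
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(600, 4))                  # four underlying "musical ideas"
mixing = rng.normal(size=(4, 40))                   # how each idea shows up in 40 features
frames = latent @ mixing + 0.1 * rng.normal(size=(600, 40))
frames -= frames.mean(axis=0)                       # center each feature column

# The right singular vectors are the dominant components, the "distinctive
# properties" of the whole hour of music.
U, S, Vt = np.linalg.svd(frames, full_matrices=False)

k = 4
reduced = U[:, :k] * S[:k]                          # each frame described by 4 numbers
approx = reduced @ Vt[:k]                           # resynthesis: rebuild frames from them

# Repetition is "optimized away": the frames keep replaying the same few ideas,
# so a handful of components reconstructs nearly everything.
err = np.linalg.norm(frames - approx) / np.linalg.norm(frames)
print(f"kept {k} of 40 components, relative reconstruction error {err:.2f}")
```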

Fundamentally, and I mentioned this on the original website, if you took a bunch of music and asked it, "Music, what are you, really?" you'd hear Eigenradio singing back at you. It's the computer's idea of what music really is. Note that it's pretty different from what we would think, and the reason I think it was so successful is that it was a cheap way to hear music from a completely different perspective. For example, we like periodicity and repetition in our music; a computer (eigenanalysis) hates it and 'optimizes it away'--why would it want to store two copies of the same thing?

"A Singular Christmas" was very similar except a) we used only Christmas music for the input, not general radio streams, and b) instead of synthesizing the dominant components directly we found the closest matching acoustic ("real") sound from a large database of instrument samples. As a result the Christmas record is more 'listenable' but shares the same mathematical lineage as the radio stream. “A Singular Christmas” was far, far more successful than Eigenradio in terms of listenership and press / publicity. It was broadcast on BBC Radio on Christmas Eve. I counted roughly 600,000 listens in the space of two weeks, MIT ended up turning off access to the files for awhile.

I'm not surprised to hear you say that there was some difference in the manner of putting "A Singular Christmas" together: it seems like something that modern classical listeners might be more comfortable with than most randomly chosen Eigenradio broadcasts, which seem clearly "noisier." But I don't understand the methodological difference entirely. Is it that the database you pulled the instrument samples from didn't contain as many pitchless (or 'grungy'?) sounds as might be heard on a regular radio broadcast? How did the program decide which samples to pull from this database?

After you find the dominant components of a song (let's say they were tempo, periodicity, spectral entropy, mean power density in 80-90Hz, etc.) and the dynamic range of each component that defines the "song space," your task is then to generate the songs with the widest range of characterization--the songs that should represent all other possible songs through some combination. The problem is that when you resynthesize, you need to find a path in a random field that a) reflects the range of components your analysis chose and also b) considers the short- and long-timescale expectations of the human auditory system. If you compressed a song as a ZIP file, for example, the computer could 'listen' to the compressed version and make the same judgments about content as it would about the original file. But if you tried listening to the ZIP file, it'd be garbage to you, even though all the information is still there. Eigenradio made little to no allowance for this effect; it was fundamentally supposed to represent what computers find beautiful about music by finding what's known as the 'minimum description' of the perception.

"A Singular Christmas" was the same as Eigenradio up to the point of resynthesis. We took a bunch of Christmas songs, ran the Eigenradio "component finding" algorithm on them, and were left with the components and characteristics. But instead of just quickly spinning through the possibilities at some fixed rate (the main culprit in Eigenradio's drill-in-a-tunnel acoustic aesthetic), after we determined the components we did a search through "real sounds" to find the ones that best matched each chosen component. This is a nice and easy arbitration between computer and human demands.

The "real sounds" come from a database that I maintain for a bunch of projects: internally it's called "all Possible Sounds" and comprises about 200 gigabytes of instrument and effect samples. It's all in a database with an acoustic-similarity back end, so I can take the dominant component from a group of music and, instead of synthesizing it rubber-bass style, ask the database to find the best N samples that match it in time, timbre and tone. Other things we can do with this database include resynthesizing a single song using only the components of all (e.g.) banjo sounds, etc.
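(The "best N samples" lookup he describes is essentially a nearest-neighbor search over a feature database. Here is a toy version, with an invented five-dimensional feature space standing in for the time/timbre/tone description and random vectors standing in for the "all Possible Sounds" entries.)

```python
# Toy nearest-neighbor lookup standing in for the sample-matching step: given a
# target component described by a feature vector, return the N database samples
# whose features are closest. The feature space and database here are synthetic.
import numpy as np

rng = np.random.default_rng(2)
database = rng.normal(size=(5000, 5))         # 5000 instrument/effect samples, 5 features each
names = [f"sample_{i:04d}.wav" for i in range(len(database))]

def best_matches(target, feats, n=3):
    """Indices of the n samples closest to `target` in Euclidean distance."""
    dists = np.linalg.norm(feats - target, axis=1)
    return np.argsort(dists)[:n]

component = rng.normal(size=5)                # a dominant component from the eigenanalysis
for i in best_matches(component, database):
    print(names[i])
```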

What connections, if any, do you see between your written research on musical tastes and Eigenradio-type extractions of what might be called "musical essences"? Can you imagine an Eigenradio II that extracts very different "essences," thereby producing wildly different music? Does something important follow from that? What could one (possibly) infer about a group which always preferred Eigen I to Eigen II, or vice versa?

There is a direct link between Eigenradio and music retrieval work in that the analysis used to find the 'essence' of music is almost exactly the same up to the point of resynthesis. Before you try to get a machine to learn something, you want to eliminate redundancy and find covariance among variables in the perceptual feature space. From a music retrieval scientist's standpoint, Eigenradio is what the computer hears when it tells you that the Kinks and Sugarplastic share some traits.

Doing preference studies on synthesized Eigenmusic is a level of indirection that I am not comfortable taking right now!!

In your research there seems to be a hunt for objective measures of taste similarity. I've occasionally opined that a simple ranking system (say, of 1-4 stars), combined with a historical database of recording names followed by each reviewer's stars, would be much preferable to the linguistic comparisons with various animals and foods and the obligatory gushing that constitute ordinary music reviews. I mean, what use is it to me to know that Jones likes or hates recording X if I have no idea whether I generally like the same sort of stuff Jones likes? I suppose there is some entertainment value both in having pieces compared with various colors and in the one-upmanship of the hunt for the most fervent rave or nastiest pan ever, but from a consumer-advice point of view, it would seem that a knowledge of reviewer preferences and a thumbs up or down would be more helpful. Is that the sort of thing you're discussing in your research on taste and "automatic reviews"?

Well, first, let me disclaim that I am not looking for 'objective' measures of similarity but 'predictive' ones: given a community and a piece of audio, how will they respond? Objective would imply that there's only one answer.

The automatic record review project was a study to see if there was "learnable" language in record reviews. If a reviewer calls two records "slow and plodding" and there's something in the actual content of the records that matches, we can start to understand what 'slow and plodding' means. But of course that's hardly ever the case--how many adjectives can you name that are objectively informative? Review language is usually far removed from the signal it refers to. So for that project we took the chatty gorilla of pop writing--Pitchfork--and pitted it against the boring, reliable standby, the All Music Guide. Obviously AMG trounces Pitchfork in both correlation of words to audio (i.e., they use language that can easily describe music) and correlation of "star rating" to audio (there is some underlying feature of the music that contributes to the rating).
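(A rough way to picture "learnable" review language: for each adjective, measure whether the records it gets applied to actually line up with anything in the audio. The sketch below does this on synthetic data with a simple correlation score; it only gestures at the published method.)

```python
# Rough illustration of "learnable" review language: can a word like "plodding"
# be predicted from audio features at all? Synthetic data only.
import numpy as np

rng = np.random.default_rng(3)
n_records = 400
audio = rng.normal(size=(n_records, 10))               # invented audio features per record

# Pretend "plodding" really does track a slow-tempo feature (column 0),
# while "angular" is applied essentially at random.
plodding = (audio[:, 0] + 0.5 * rng.normal(size=n_records)) < -0.5
angular = rng.random(n_records) < 0.3

def grounding_score(word_labels, feats):
    """Correlation between the word's usage and its best single audio feature."""
    corrs = [abs(np.corrcoef(feats[:, j], word_labels)[0, 1]) for j in range(feats.shape[1])]
    return max(corrs)

print("plodding:", round(grounding_score(plodding, audio), 2))   # grounded: clearly nonzero
print("angular: ", round(grounding_score(angular, audio), 2))    # ungrounded: near zero
```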

But those things have little to do with how well the reviews help people find music. Pitchfork and others are helping consumers find music, plain and simple, no matter how inconsistent my algorithms say they are. So who's more worthless--the trend-reflecting, wavering music criticism site that helps people find new music, or the scientist who proves that the site is trend-reflecting and wavering?

The recommendation-from-reviews scheme you describe is collaborative filtering, my #1 enemy. Go to Amazon and you'll see an entire industry devoted to convincing you to buy something because some stranger you agree with also bought it. You're really clustering random consumers into rock critics, and I can't say I would trust self-selected music writers to lead the marketing statistics just yet. The fundamental problem with collaborative filtering approaches is the popularity effect: for something to get noticed, people have to notice it first. Critics don't review every CDR and promo package that crosses their desk; selection is biased towards already popular or familiar music, so a feedback loop occurs that is more dangerous to this industry than BitTorrent and Soulseek put together.
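(The popularity effect falls straight out of the arithmetic of co-occurrence-based recommenders: an item nobody has bought never co-occurs with anything, so it can never be recommended. A toy demonstration on synthetic purchase data:)

```python
# Toy item-item collaborative filter on synthetic purchase data, showing the
# popularity effect: a brand-new record with no purchase history can never be
# recommended, no matter how good it is.
import numpy as np

rng = np.random.default_rng(4)
n_users, n_items = 1000, 50
popularity = np.linspace(0.4, 0.0, n_items)            # item 0 is a hit, item 49 is brand new
purchases = rng.random((n_users, n_items)) < popularity

cooc = purchases.T.astype(float) @ purchases.astype(float)   # item-item co-purchase counts
np.fill_diagonal(cooc, 0)

def recommend(item, counts, n=3):
    """Items most often bought together with `item`."""
    return np.argsort(counts[item])[::-1][:n]

print("bought with item 0 :", recommend(0, cooc))      # other popular items
print("bought with item 49:", recommend(49, cooc))     # all counts are zero: the output is arbitrary
```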

The way to escape is to be smart about it: have something that has nothing better to do (a computer) make some first-order predictions about preference based on content and contextual analysis. If all of a sudden ILM lights up with talk about a new artist, even if there's no audio yet available, we should pay attention. If the automatic listening systems light up with similarity matches, we should pay attention to that too. Treat all of these new ways of discovering "hidden connections" as just another source of information to make filtering decisions with.

I’m not sure I understand why you believe your manner of predicting someone’s likelihood of enjoyment is more reliable or otherwise preferable to Amazon-type "A & B both like it" schemas--so long as the latter utilize sufficient data. Isn't there any hope that the Amazon model, instead of moving from a single joint purchase to a recommendation as it now does, could eventually make suggestions on the basis of huge preference correlations and start reaching excellent success levels?

Sure, if Amazon catalogued more than just sales, of course they'd do better. But I was responding to the underlying problem: you're predicting preferences by studying data that already exists. With that model, you can't predict a response to completely new data, and that is public enemy #1 for niche/independent music trying to take hold. Economies of scale dictate that the more sales or reviews something has, the more likely it is to be recommended, and with higher accuracy. My hatred of these systems is purely high-level; I am sure there are tons of things that can be done to help CF do better, like using text and finer-grained preferences and so on. There are many startups attempting just that, and I wish them luck. But going down that path will just make popular records get recommended with some higher amount of success. We don't need that; we have entire industries of marketing and radio ensuring that will happen for all eternity. Why don't we take advantage of the technology to help level the playing field for new artists instead of just using it as a cheap advertising tool?

Consider the following hypothetical (based loosely on a recent argument between a Bagatellen reviewer and a saxophonist over whether a recording made by the latter sounds anything like Mats Gustafsson, and whether any such alleged similarity matters anyhow):

Suppose someone, X, reports that he finds a certain recording of Y's music "really bad." When Y asks him for his reasons for this judgment, X responds, "No reason, really. I just found it boring. Uncompelling." Then Y responds by asking whether X realized when listening that each tune on the disc is based on a Fibonacci series, that each spells "Bach" repeatedly in some language or other, that 100 hungry Chilean Indians are fed by the proceeds from his disc, and that, if listened to through specially filtered headphones, the music therein will provide a cure for both insomnia and colon cancer. Finally, Y wonders aloud whether X is sufficiently familiar with Anthony Braxton, Gregorian chant, Nurse With Wound and Onkyo to comment authoritatively on his work.

What the hell is going on here? Is there any value at all to that sort of colloquy? Can/should it make anybody change his/her mind? Does it provide anything that a thumbs up/thumbs down list doesn't?

In the real world, does anyone ever hear music with absolutely no context? Even if you hear something on the radio by accident, you make an assumption about the artist simply because they have music on the radio. You also know the station and the time of day, maybe the song that came after it. The way I see it, there are two types of context here: the type the artist tries to enforce on you (their "story," their marketing/background--think M.I.A. here), and the type the listener applies to the music (girl trouble, general preference, mood). The annoying bit for those of us who want to help people find music is that it's very hard to predict either type of context with any accuracy. I build systems that read music blogs, monitor chat rooms, and check music news sites every day, and we're getting there, but how do you really automatically detect that the song "Shipbuilding" is about the Falklands War? Or that Oval made a custom CD player, or that Jason Falkner left Jellyfish in the middle of "She Still Loves Him" and they kept his last solo on the tape, etc., etc.? It goes on forever. This stuff is crucial, and yes, definitely, it affects people whether it should or not. Trends, hype, marketing, stories, bio, celebrity, relationships, collaborations--all are just as important as the bits flying off the plastic. To some listeners they're more important.

No one can seriously say that they could like a song when devoid of all external forces. It wouldn't be music if they could.

Did MIT ever worry about the legal ramifications of Eigen-style analysis and re-synthesis when Eigenradio was being proposed? Was there always complete confidence that these re-broadcasts were within the "fair use" exceptions to copyright protections? From your own (non-lawyer) perspective, is Eigenradio-style re-parsing more like re-arranging molecules than measures--even if whole words or phrases may be sometimes decipherable?

We were given free rein at the lab; there's somewhat of a 'don't ask, don't tell' policy, within reason. I was secretly hoping that I'd get in trouble--I would have gladly put off my dissertation a semester just to sit in a courtroom playing Eigenradio samples for the jury with the RIAA across the table. Sadly, nothing bad at all happened. It is nice to think about legality and fair use when music experiments are involved, but in reality it's only something you can guess at until you get sued--there's never a right answer in these things.

My favorite concordant example is the James Newton/Beastie Boys "Choir"/"Pass the Mic" case, where Newton sued Mario C et al. for lifting not the sounds or signal of "Choir" (because they paid for and licensed that) but the meaning of the sound under it ("The four black women singing in a church in rural Arkansas..." according to the deposition) and his style of playing ("flute overblow harmonics"). Newton had a gorgeous unintentional non sequitur in a letter relating to the case: "There is a spectrograph that moves wildly when my multiphonics are played"--as if this singular spectrograph, alone in a field in Arkansas, carried the secret of new forms of musical expression. Newton wanted to own it.

The point is that you can rip music apart into sine waves at parameterized phase and magnitude using a process most computers are heavily optimized for (the FFT), and then you can put those sine waves in a blender, wire them to a fluorescent light, quantize them at the rate of rainfall in your hometown, etc., and you could still be stealing the original piece: you always start somewhere. But what pisses off copyright holders is if you make something that bites on the "meaning" of the song: its intended effect and audience reaction. Obviously the most sought-after form of meaning is popularity and sales, and that's where the lawsuits come in. I love the Newton case because you could tell he was so angry that the Beastie Boys took his story and screwed into it a forgettable, overdriven anthem for a Jeep Wrangler in the school parking lot... In Eigenradio's case, we completely remove the story and don't replace it or reappropriate it. We're safe.
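(The "rip it apart into sine waves" step is literally what a forward and inverse FFT provide. The sketch below decomposes a two-tone signal, shuffles its frequency content--the "blender"--and resynthesizes something unrecognizable that nonetheless contains exactly the same raw material; the signal and sample rate are invented for the demonstration.)

```python
# Minimal sketch of the "rip it into sine waves, put them in a blender" idea:
# FFT a signal into per-frequency complex amplitudes, rearrange them, and
# resynthesize with the inverse FFT.
import numpy as np

rng = np.random.default_rng(5)
sr = 8000                                        # one second of audio at 8 kHz
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 330 * t)

spectrum = np.fft.rfft(signal)                   # sine waves at parameterized phase and magnitude
blender = spectrum.copy()
interior = blender[1:-1]                         # skip the DC and Nyquist bins
blender[1:-1] = interior[rng.permutation(len(interior))]   # the "blender": move sines to new bins
blended = np.fft.irfft(blender, n=len(signal))   # resynthesize something unrecognizable

# Parseval's theorem: the raw material (energy) is all still there, only rearranged.
print(round(float(np.sum(signal ** 2)), 1), round(float(np.sum(blended ** 2)), 1))
```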

I understand you're a performer as well as a research scientist. Tell me about the sort of music you enjoy making/listening to. Are there connections here between either your research on musical taste or Eigenradio or both?

I performed as "Blitter" under my brother Keith (Hrvatski)'s tutelage for a short period, from 1998 to 2001. I soon got sick of watching people (and myself) behind laptop screens doing not much of anything. So since 2002 or so I've been concentrating on this "process music" instead, which is great, as it's impossible to play live. Besides Eigenradio and "A Singular Christmas," I hope to have a full-length out soon on some vanity label documenting my experiments in this field. The problem is that every time I buckle down to record something, I find a better-sounding way to do it: "Never make it, it's changing out from under me..." The forever-lost title of this record is "Sofia Safari," so when you see this on a shelf you'll know I've stopped trying to fix things.

You mentioned to me that you're no longer connected with MIT. What are you up to now? Is there any chance that Eigenradio will go back on the air?

I received my Ph.D. from MIT in May 2005 or so and have started a company (The Echo Nest Corporation) with fellow music analysis/synthesis scientist Tristan Jehan. Now that I have lawyers and investors I can't talk so loudly. Needless to say, the company is 'future music' oriented and it should be a very interesting next few months.

The processes and ideas behind Eigenradio are certainly continuing. Besides the aforementioned full-length, there will absolutely be more Eigenradio-related projects in the near future. Stay tuned...

Walter Horn

Posted by derek on July 26, 2005 2:59 PM
Comments

nice

Posted by: Michael Schaumann at July 26, 2005 5:23 PM

Fascinating. Thanks Walt and Brian for expanding on the whole Eigenradio process, which had heretofore looked pretty opaque to me. Or at least more like the classic black box than a flow chart.

And I had no idea Brian was related to Keith Fullerton (Hrvatski).

(Great discussion of intellectual property wonkery, too.)

Looking forward to those upcoming projects.

Posted by: Joe Milazzo at July 27, 2005 7:05 AM




