Vector Faculty take new musical style transfer model to ICLR
By Ian Gormely
Artificial intelligence, particularly the fields of machine learning and deep learning, is disrupting nearly every sector imaginable, even the world of art. Still, many artists are embracing the technology for the new creative opportunities it brings.
“The camera didn’t make people stop painting,” notes Sageev Oore, a Vector Institute Faculty Member, Associate Professor of Computer Science at Dalhousie University, and jazz pianist, “but it did change what people focused on.”
Oore and fellow Faculty Member Roger Grosse, along with a team of Vector-affiliated students including Sicong Huang, Qiyang Li, Cem Anil, and Xuchan Bao, are among the small but growing number of people exploring the intersection of AI and music. TimbreTron, a musical style transfer model they unveiled in their recent research paper “TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer,” is their proof of concept.
The paper, which Grosse and Oore are presenting at this month’s International Conference on Learning Representations (ICLR), one of the world’s top machine learning conferences, details a method to “take a musical recording played by one instrument and make it sound like it was played by a different instrument,” says Grosse, “while preserving as much as possible about the content including the pitch, the rhythm and, to some degree, the expressiveness.”
Timbre, the sound of a given instrument, is notoriously hard to model. But Oore, Grosse, and their team circumvented the problem by transforming the audio waveform of a piano piece into an image, specifically a constant-Q transform (CQT) spectrogram. Using a style transfer model called CycleGAN, they turned the piano spectrogram into a harpsichord spectrogram of the same piece. They then used Google DeepMind’s WaveNet model to convert the result back into an audio waveform, except that what was once a piano now sounds like a harpsichord. The system also allows users to change a piece’s tempo without altering the pitch (avoiding the “chipmunk effect”) or change the pitch without affecting the tempo.
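For readers curious about what “turning audio into an image” looks like in practice, here is a minimal Python sketch of the first step only, a constant-Q transform. It is an illustration under stated assumptions, not the TimbreTron code: the function name naive_cqt and all of its parameters are hypothetical, the windows are left-aligned for simplicity, and real projects would more likely rely on an optimized audio library. The CycleGAN and WaveNet stages are not shown.

```python
import numpy as np

def naive_cqt(signal, sample_rate, f_min=110.0, bins_per_octave=12,
              n_bins=48, hop=256):
    """Return (magnitudes, centre_frequencies) for a simple constant-Q transform.

    Rows are log-spaced frequency bins (12 per octave by default) and
    columns are time frames, so the result can be viewed as an image.
    Illustrative only: slow, unoptimized, and not taken from the paper.
    """
    # Q is the constant ratio of centre frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    # Lower notes need longer analysis windows to keep Q constant.
    win_lengths = np.ceil(q * sample_rate / freqs).astype(int)
    longest = int(win_lengths.max())
    if len(signal) < longest:
        raise ValueError("signal is shorter than the longest analysis window")

    starts = np.arange(0, len(signal) - longest + 1, hop)
    spec = np.zeros((n_bins, len(starts)))
    for k in range(n_bins):
        n = int(win_lengths[k])
        t = np.arange(n)
        # Windowed complex sinusoid tuned to this bin's centre frequency.
        kernel = np.hanning(n) * np.exp(-2j * np.pi * freqs[k] * t / sample_rate) / n
        for j, s in enumerate(starts):
            # Window starts at the frame position (left-aligned, not centred).
            spec[k, j] = np.abs(np.dot(signal[s:s + n], kernel))
    return spec, freqs

# Two one-second test tones an octave apart (A4 = 440 Hz, A5 = 880 Hz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec_a4, freqs = naive_cqt(np.sin(2 * np.pi * 440.0 * t), sample_rate)
spec_a5, _ = naive_cqt(np.sin(2 * np.pi * 880.0 * t), sample_rate)

# Strongest row of each "image", averaged over time.
peak_a4 = int(spec_a4.mean(axis=1).argmax())
peak_a5 = int(spec_a5.mean(axis=1).argmax())
print(peak_a4, peak_a5, peak_a5 - peak_a4)
```

With these settings the two tones peak exactly 12 rows apart, one row per semitone. That log-spaced layout is what distinguishes a CQT from an ordinary spectrogram: a change in pitch shows up as a vertical shift in the image rather than a stretch.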
The project originated with Huang, who wanted to work on a music-related AI project. At the time, the CycleGAN model was new and “seemed like a natural thing to try,” recalls Grosse, who doesn’t consider himself a musician. He brought in Oore, who had already done work combining music and machine learning, including a stint at Google’s Magenta project for incorporating machine learning into creative fields. “This is really up his alley.”
Given his dual interests in the project, Oore unsurprisingly has differing, though complementary, reasons for wanting to take part. His computer scientist side is interested in the amount of control programmers are able to exert when recreating audio and where the limits lie. “We understand more about the audio space and we understand more about the neural net systems for controlling and generating an audio space.”
That said, “from a creative tool point-of-view, the really interesting thing is breaking the tool,” says Oore, recalling something Doug Eck at Magenta often says. Pitch-correction software like Auto-Tune was originally marketed as a way to digitally “fix” out-of-tune vocals. But artists from Cher to T-Pain were more interested in the unnatural ways it could alter the human voice. Oore is similarly curious to hear other sounds TimbreTron might generate. “If it doesn’t produce exactly a piano sound, but it produces something that’s like a cross between a harpsichord and a piano, that might be cooler.”