Illustration: Harry Campbell
“I am just a soul trapped in this circuit.” The voice singing these lyrics is raw, sad, and steeped in blue notes. An acoustic guitar rattles in the background, punctuating the vocal phrases with tasteful notes. But there is no human being behind that voice, and there are no hands on the guitar. In fact, there is no guitar. In 15 seconds, this authentic, even moving blues song was generated by the latest AI model from a startup called Suno. All it took to conjure it out of thin air was a simple text prompt: “Solo Acoustic Mississippi Delta Blues About Sad AI.” To be as precise as possible, the song is the work of two AI models working together: Suno's model creates all the music itself, and it calls on OpenAI's ChatGPT to generate the lyrics and the title, “Soul of the Machine.”
Online, Suno's creations have begun to draw reactions along the lines of “Is this real?” As this particular track plays over a Sonos speaker in a conference room at Suno's temporary headquarters, just off Harvard University's campus in Cambridge, Massachusetts, even some of the people behind the technology seem a little unnerved. There is nervous laughter, alongside murmurs of “Oh my god” and “Oh boy.” It's mid-February, and we're testing the new model, V3, which is still a few weeks from release. In this case, it took just three attempts to get that startling result. The first two were decent, but after a simple adjustment to my prompt (co-founder Keenan Freyberg suggested adding the word “Mississippi”), the result was something far more uncanny.
In the past year alone, generative AI has made great strides in producing credible text, images (via services like Midjourney), and even video, especially with OpenAI's new Sora tool. But audio, and music in particular, has lagged behind. Suno appears to be cracking the code of AI music, and its founders' ambitions are nearly limitless: they imagine a world of radically democratized music making. Mikey Shulman, the most vocal of the co-founders, is a boyishly charming, backpack-wearing 37-year-old with a Ph.D. in physics from Harvard University. He envisions a billion people around the world paying $10 a month to create songs with Suno. The fact that music listeners currently vastly outnumber music makers is “very lopsided,” he argues, and he sees Suno as poised to correct that perceived imbalance.
So far, most AI-generated art has been kitsch at best, like the surreal sci-fi junk, heavy on form-fitting spacesuits, that so many Midjourney users seem intent on producing. But “Soul of the Machine” feels like something different. It's the most powerful and unsettling AI creation I've come across in any medium. Its very existence feels like a rift in reality, at once awe-inspiring and faintly unclean. I keep thinking of a quote from Arthur C. Clarke that seems tailor-made for the generative AI era. A few weeks after returning from Cambridge, I sent the song to Living Colour guitarist Vernon Reid, who has been outspoken about both the dangers and the possibilities of AI music. He said he was “surprised, shocked and horrified” by the song's “disturbing realism.” “The long-held dystopian ideal of separating the difficult, troublesome, undesirable, and despised human race from its creative output is on the horizon,” he wrote, adding that an AI singing the blues is problematic, given the form's ties to historical human trauma and enslavement.
Suno is only two years old. Its co-founders, Shulman, Freyberg, Georg Kucsko, and Martin Camacho, all machine learning experts, worked together until 2022 at another Cambridge company, Kensho Technologies, which focuses on finding AI solutions to complex business problems. Shulman and Camacho are both musicians who used to jam together during their Kensho days. At Kensho, the four worked on transcription technology for capturing the earnings calls of publicly traded companies, a difficult task given the poor sound quality, the heavy jargon, and the mix of accents.
Along the way, Shulman and his colleagues fell in love with the untapped potential of AI audio. In AI research, he says, “Audio in general lags far behind images and text. We have a lot to learn from the community about how these models work and scale.”
The same interests could have led Suno's founders somewhere very different. They always intended to end up with a music product, but their early brainstorming included ideas for a hearing aid and even for detecting malfunctioning machinery through audio analysis. Instead, their first release was a text-to-speech program called Bark. When they surveyed early Bark users, it became clear that what people really wanted was a music generator. “So we started some initial experiments, and they seemed promising,” Shulman says.
Suno uses the same general approach as large language models like ChatGPT, which break human language down into discrete segments known as tokens, absorb its millions of usages, styles, and structures, and then reassemble it on demand. But audio, and music in particular, is almost unfathomably more complex, which is why, just last year, AI music experts told Rolling Stone that a service as capable as Suno might be years away. “Audio is not discrete like words,” Shulman says. “It's a wave. It's a continuous signal.” The sampling rate of high-quality audio is typically 44 kHz or 48 kHz, he adds, which means “48,000 tokens per second.” “That's a big problem, right? So we need to find a way to compress it down into something more reasonable.” But how? “A lot of work, a lot of heuristics, a lot of other kinds of tricks and models, and so on. I don't think we're anywhere near the end.” Eventually, Suno wants to find alternatives to the text-prompt interface and add more advanced, intuitive inputs; generating songs based on users' own songs is one idea Shulman has.
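To put Shulman's figure in perspective, here is a rough back-of-envelope sketch in Python. It treats every raw audio sample as one token, as his remark does, and asks how many samples each token would have to summarize to reach a smaller budget. The function names and the target of 50 tokens per second are illustrative assumptions, not anything Suno has disclosed about its models.

```python
# Back-of-envelope arithmetic for the sampling rates quoted in the article.
# Assumption: one raw sample counts as one "token", following Shulman's remark.
# The 50 tokens-per-second target is a hypothetical figure chosen for illustration.


def raw_samples(sample_rate_hz: int, seconds: float) -> int:
    """Number of raw samples in a mono clip of the given length."""
    return int(sample_rate_hz * seconds)


def samples_per_token(sample_rate_hz: int, target_tokens_per_second: int) -> float:
    """How many raw samples each token must summarize to hit the target rate."""
    return sample_rate_hz / target_tokens_per_second


if __name__ == "__main__":
    song_seconds = 180  # a three-minute song
    for rate in (44_000, 48_000):
        total = raw_samples(rate, song_seconds)
        factor = samples_per_token(rate, 50)
        print(
            f"{rate} Hz: {total:,} samples in a 3-minute song; "
            f"{factor:.0f} samples per token at 50 tokens/s"
        )
```

At 48 kHz, a three-minute mono clip works out to 8,640,000 raw samples, which gives a sense of why some form of compression is needed before a language-model-style approach becomes practical.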
OpenAI is facing multiple lawsuits over ChatGPT's use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno's founders decline to detail what data they feed into their own models, revealing only that the ability to generate convincing human vocals comes in part from training on recordings of speech in addition to music. “Naked speech helps you learn the difficult characteristics of the human voice,” says Shulman.
One of Suno's early investors is Antonio Rodriguez, a partner at the venture capital firm Matrix. Rodriguez had previously funded only one music venture, the music classification company The Echo Nest, which Spotify acquired to power its algorithms. In Suno's case, Rodriguez came on board before it was even clear what the product would be. “I backed the team,” says Rodriguez, who exudes the confidence of a man who has won more than a few of his bets. “I know the team, especially Mikey, so I would have supported him in almost anything that was legal. He's very creative.”
Rodriguez says he invested in Suno fully aware that music labels and publishers could sue, which he calls “a risk I had to take when I invested in the company. We are the ones who can get sued right behind them. … Honestly, if this company had had deals with the labels when it started, I probably wouldn't have invested. I think it needed to create this product without any constraints.” (A spokesperson for Universal Music Group, which has taken a proactive stance on AI, did not respond to a request for comment.)
Suno has been in contact with the major labels and professes respect for artists and intellectual property. Its tool doesn't let you request a specific artist's style in a prompt, and it doesn't use real artists' voices. Many of Suno's employees are musicians; the office has a piano and guitars, and framed photographs of classical composers hang on the walls. The founders show none of the overt hostility toward the music business that characterized, say, Napster before the lawsuits that killed it. “By the way, that doesn't mean we won't get sued,” Rodriguez adds. It just means, he says, that the company isn't going to take a defiant attitude.
Rodriguez sees Suno as a radically capable and easy-to-use musical instrument, and believes it could bring music making to everyone in much the same way that camera phones and Instagram democratized photography. The idea, he says, is to “raise the bar once again for how many people are allowed to be content creators rather than consumers of content on the internet.” He and the founders go so far as to suggest that Suno could attract a user base bigger than Spotify's. If that prospect is hard to wrap your head around, Rodriguez says, that's a good thing: it just means the idea is “seemingly stupid” in exactly the way that tends to attract him as an investor. “All of our great companies have a combination of great people and something that just looks stupid until it turns out it's not stupid,” he says.
Well before Suno's arrival, musicians, producers, and songwriters were vocally concerned that AI could upend their business. “Music made by humans driven by extraordinary circumstances, by those who have suffered and struggled to advance their craft, will have to contend with the full-scale automation of the precious art they fought to achieve,” Reid wrote. But Suno's founders argue there's little to fear, offering the analogy that people still read even though they are able to write. “The way we think about this is we're going to get a billion people more hooked on music than they are today,” Shulman says. “If people become more obsessed with music, more focused on creating, and develop clearer tastes, this is clearly good for artists. It's artist-friendly. We're not trying to replace artists.”
Suno is focused solely on reaching music fans who want to make songs for fun, but it could still cause serious disruption along the way. In the short term, the human creators most directly at risk are likely those working in high-margin areas such as songs written for advertising or TV shows. Lucas Keller, founder of the management company Milk & Honey, says the market for popular songs will remain unaffected. “But for the rest, yeah, it could definitely hurt their business,” he says. “Eventually, I think a lot of advertising agencies, movie studios, networks, etc., won't need to get a license.”
With no hard-and-fast rules for AI-generated content, there's also the prospect of a world in which users of Suno-like models flood streaming services with millions of robotic creations. “Spotify might one day say, ‘You can't do that,’” Shulman says, noting that for now Suno users seem mostly interested in texting their songs to a few friends.
Suno currently has about 12 employees, but it plans to expand and is building a larger, permanent headquarters on the top floor of the same building as its temporary office. As we tour the still-unfinished floor, Shulman shows off an area that will become a full recording studio. Given what Suno can do, why does it need one? “It's pretty much a listening room,” he admits. “We want a good acoustic environment. But we all also enjoy making music without AI.”
So far, Suno's biggest potential competitor appears to be Google's Dream Track, which has obtained licenses that let users create their own songs with famous voices, such as Charlie Puth's, through a similar prompt-based interface. But Dream Track has been released only to a small group of test users, and the samples made public so far don't sound nearly as impressive as Suno's, famous voices notwithstanding. “I don't think creating new Billy Joel songs is the way people want to interact with music in the future with the help of AI,” Shulman says. “If you think about how you actually want people to be making music in five years' time, it's something that doesn't exist yet. It's something that's in their head.”