Stable Audio 2.0, Stability AI's audio generation model, now lets users upload their own audio samples and transform them with prompts to create AI-generated songs. Don't expect a Grammy winner just yet, though.
The first version of Stable Audio, released in September 2023, offered only up to 90 seconds of generated audio to some paying users, which meant it could only produce short sound clips for experimentation. Stable Audio 2.0 generates a complete three-minute clip, the length of most radio-friendly songs. Any audio users upload must be copyright-free.
Unlike Voice Engine, OpenAI's audio generation model, which is available only to a limited number of users, Stability AI has made Stable Audio available to the public for free through its website and, eventually, through its API.
Stability AI says one of the big differences between Stable Audio 2.0 and its previous versions is the ability to create songs that sound like songs, complete with an intro, progression, and outro.
The company let me play around with Stable Audio a bit to see how it worked, but I'm still a long way from channeling my inner Beyoncé. Let's just say we have a long way to go. With the prompt "Folk-pop songs with an American feel" (I'm talking Americana, to be clear), Stable Audio generated a song that, in parts, sounds like it belongs on my "Mountain Vibes Listening Wednesday Morning" Spotify playlist. But it also added what I think are supposed to be vocals. Another Verge reporter said it sounds like a whale's cry, and worried that they may have accidentally invited something into the house.
New features in Stable Audio 2.0 let users customize their projects by adjusting the strength of the prompt (i.e., how closely it should be followed) and how much it changes the uploaded audio, so in theory you can tweak the output to better suit your taste. Users can also add sound effects such as crowd roars and keyboard taps.
Strange Gregorian whale noises aside, it's no surprise that AI-generated songs still feel soulless and weird; my colleague Wes Davis ruminated on this after hearing a song made by Suno. Other companies like Meta and Google are also working on AI audio generation but have not released their models to the public, gathering feedback from developers as they work on the soulless-sound problem.
Stability AI said in a press release that Stable Audio is trained on data from AudioSparx, which has a library of over 800,000 audio files. Stability AI says artists on AudioSparx were allowed to opt out of having their material used to train the model. One of the reasons Ed Newton-Rex, Stability AI's former VP of audio, left the company shortly after Stable Audio launched was its training on copyrighted audio. For this version, Stability AI says it partnered with Audible Magic and used its content recognition technology to track and block copyrighted material from entering the platform.
Stable Audio 2.0 is better than previous versions at making songs sound like songs, but it's not quite there yet. And if the model insists on adding vocals, perhaps the next version will include more discernible language.