Every time Madonna sings her 1980s hit “La Isla Bonita” on her concert tour, a video of swirling sunset-colored clouds is projected on a giant arena screen behind her.
To achieve that ethereal look, the pop legend turned to the still-emerging field of generative artificial intelligence: text-to-video tools. Type in words like “surreal cloud sunset” or “jungle waterfall at dawn” and they create an instant video.
Following in the footsteps of AI chatbots and still-image generators, some AI video enthusiasts believe this emerging technology could one day transform entertainment, letting viewers choose their own movies with customizable storylines and endings. But there is a long road ahead, and many ethical pitfalls along the way.
For early adopters like Madonna, who has long pushed the boundaries of art, it was more of an experiment. She ditched the visuals used in an earlier version of the “La Isla Bonita” number, which had relied on more conventional computer graphics to evoke a tropical mood.
“We tried CGI, and she didn't like it because it looked pretty bland and cheap,” said Sasha Kashuha, content director for Madonna's Celebration Tour, which runs until late April. “And we decided to try AI.”
OpenAI, the creator of ChatGPT, recently gave a glimpse of what sophisticated text-to-video technology might look like when it showed off Sora, a new tool that is not yet publicly available. Madonna's team tried another product, from New York-based startup Runway, which helped pioneer the technology when it released its first public text-to-video model last March. The company released a more advanced “Gen-2” version in June.
Runway CEO Cristóbal Valenzuela said that while some people see these tools as “magical devices that let you type in a word and somehow conjure exactly what was in your head,” the most effective use right now is by creative professionals looking for an upgrade to the decades-old digital editing software they already use.
He said Runway cannot yet produce a full-length documentary, but it may be useful for generating background videos or B-roll, the supporting shots and scenes that help tell a story.
“This probably saves us about a week of work,” Valenzuela said. “The common thread in many use cases is that people are using it as a way to extend or speed up something they were able to do before.”
Runway's target customers include major streaming companies, production companies, post-production companies, visual effects companies, marketing teams and advertising agencies: “a lot of people who make a living creating content,” Valenzuela said.
Dangers await. Without effective safeguards, AI video generators could threaten democracy with convincing “deepfake” videos of events that never happened or, as is already the case with AI image generators, flood the internet with fake pornographic scenes depicting real people's faces. Under pressure from regulators, major tech companies have promised to watermark AI-generated output to help identify what is real.
There are also copyright concerns about the troves of videos and images the AI systems are trained on (neither Runway nor OpenAI discloses its data sources), and conflicts are brewing over the extent to which the tools unfairly reproduce trademarked works. And there are fears that video-generating machines could at some point replace human jobs and artistry.
For now, the longest AI-generated video clips are still measured in seconds and can contain obvious glitches, such as jerky movements and distorted hands and fingers. Alexander Waibel, a computer science professor at Carnegie Mellon University who has researched AI since the 1970s, said fixing that “requires more data and more training,” along with the computing power such training depends on.
“Now we can say, ‘Make me a video of a rabbit dressed as Napoleon walking through New York City,’” Waibel said. “It knows what New York City looks like, what a rabbit looks like and what Napoleon looks like.”
That's great, he said, but it's still a long way from creating a compelling story.
Before releasing its first-generation model last year, Runway's claim to AI fame was as a co-developer of the image generator Stable Diffusion, whose development was later taken over by London-based company Stability AI.
The underlying “diffusion model” technology behind the leading AI generators for images and video is trained by mapping noise, or random data, onto images, effectively destroying the original image and then learning to predict what the original looked like. It borrows an idea from physics that can describe, for instance, how a gas diffuses outward.
“What the diffusion model does is reverse that process,” said Phillip Isola, an associate professor of computer science at the Massachusetts Institute of Technology. “It takes the randomness and turns it back into content. That's how you go from randomness to content, and that's how you can create novel videos.”
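The destroy-then-reverse process Isola describes can be sketched in a few lines of code. This is a toy illustration only, not code from Runway, OpenAI, or any production diffusion model: it runs the forward half of the process, blending an “image” with Gaussian noise step by step, which is exactly the corruption a trained diffusion model learns to undo.

```python
# Toy illustration of forward diffusion: an "image" is gradually
# destroyed by blending in random noise. A real diffusion model is
# trained to reverse this, predicting the clean image from the noisy one.
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(image, steps, alpha=0.95):
    """Blend the image with Gaussian noise for `steps` iterations."""
    x = np.asarray(image, dtype=float).copy()
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * noise
    return x

image = np.ones((8, 8))            # stand-in for a real picture
slightly_noisy = forward_diffuse(image, steps=1)
mostly_noise = forward_diffuse(image, steps=200)

# The more steps, the less the result resembles the original.
print(np.abs(slightly_noisy - image).mean() < np.abs(mostly_noise - image).mean())
```

After enough steps the output is indistinguishable from pure noise; generation runs the learned reverse direction, starting from noise and arriving at a plausible image or video frame.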
Generating video is more complex than generating still images because it must account for temporal dynamics, or how the elements of a video change over time and across successive frames, said Daniela Rus, another MIT professor, who directs the school's Computer Science and Artificial Intelligence Laboratory.
Rus said that because “multiple frames have to be processed and generated for every second of video,” the required computing resources are “significantly higher than for still image generation.”
Still, that isn't stopping deep-pocketed tech companies from trying to outdo one another in showing off longer, higher-quality AI video generation. Requiring a written description to make an image appear was just the start. Google recently demonstrated a new project called Genie that can transform photographs and even sketches into “infinitely diverse” explorable video game worlds.
In the near term, AI-generated videos are likely to show up in marketing and educational content, providing a cheaper alternative to producing original footage or acquiring stock video, said Aditi Singh, a Cleveland State University researcher who has surveyed the text-to-video market.
When Madonna first talked to her team about AI, “the main intention wasn't, ‘Oh look, this is an AI video,’” said Kashuha, the creative director.
“She asked, ‘Can we use one of these AI tools to make the image crisper, to look more contemporary and high-resolution?’” Kashuha said. “She loves incorporating new technology and new kinds of visual elements.”
Longer AI-generated films are already being made. Runway hosts an annual AI film festival to showcase such works. But whether human audiences will choose to watch them remains to be seen.
“I still believe in humans,” said Waibel, the CMU professor.
————
Associated Press journalist Joseph B. Frederick contributed to this report.