Remember when AI art generators became widely available in 2022 and the internet was suddenly full of uncanny images that looked super cool at a glance but fell apart on close inspection? Be prepared for the same thing to happen again. This time, though, it's video.
Last week, OpenAI announced Sora, a generative AI model that produces videos from simple prompts. It's not yet available to the public, but CEO Sam Altman showed off the feature by taking requests on X (formerly Twitter). Users sent in short prompts like "a monkey playing chess in the park" or "a bicycle race in which various animals compete on the sea." The results are creepy, enchanting, bizarre, beautiful, and have prompted the usual cycle of commentary.
Some people insist Sora will have harmful effects, warning of a coming "wave of disinformation." But while I (and many experts) believe that future, more powerful AI systems pose very serious risks, the claim that any particular model will unleash a wave of disinformation has so far not been borne out.
Others point to Sora's many flaws as evidence of the technology's fundamental limitations. This was a mistake people made with image generation models, and I suspect it will continue to be a mistake. As my colleague A.W. Ohlheiser pointed out, "Just as DALL-E and ChatGPT have improved over time, so will Sora."
Both the bullish and bearish predictions may still come true. But if people on all sides of the debate thought more seriously about all the things we've been proven wrong about in recent years, the conversations around Sora and AI would become more productive.
What DALL-E 2 and Midjourney can teach us about Sora
Two years ago, OpenAI introduced DALL-E 2, a model that generates still images from text prompts. Its high-resolution, fantastical images quickly spread on social media, with people asking: Is this real art? Fake art? A threat to artists? A tool for artists? A disinformation machine? Two years later, a little retrospective is worthwhile if we want our views of Sora to age better.
The release of DALL-E 2 came just a few months before two popular competitors, Midjourney and Stable Diffusion. Each had its strengths and weaknesses: DALL-E 2 created more photorealistic images and followed prompts a little more closely, while Midjourney was more "artistic." Together, they made AI art available to millions of people at the click of a button.
Much of the social impact of generative AI at the time came not directly from DALL-E 2, but from the wave of image models it ushered in. Similarly, we can expect that the important question about Sora is not just what Sora can do, but what Sora's imitators and competitors can do.
Many believed that DALL-E and its competitors heralded a flood of deepfake propaganda and fraud that would threaten our democracy. Such an effect may someday emerge, but those warnings appear to have been premature. Analyst Peter Carlyon wrote in December that the impact of deepfakes on democracy "always seems to be just around the corner," and that most propaganda remains of the boring variety: statements taken out of context, for example, or images of one conflict shared and incorrectly labeled as another.
Perhaps at some point this will change, but we should be humble about claiming that Sora will be that change. We don't need deepfakes to lie to people, and deepfakes remain a relatively expensive method. (AI generation is fairly cheap in general, but producing something specific and convincing is much more expensive.)
But where it's most important for me to remember the past two years of AI history is when I read criticism that Sora's videos are clumsy, stiff, inhuman, or obviously flawed. It's true, they are. Sora "does not accurately model the physics of many basic interactions," OpenAI's research release acknowledged, adding that it has problems with cause and effect, left-right confusion, and tracking trajectories.
Of course, much the same criticism was leveled at DALL-E 2 and Midjourney, at least at first. Early coverage of DALL-E 2 highlighted its failures, such as creating terrifying monsters whenever a scene required more than one character, or giving people claws instead of hands. AI experts argued that its inability to handle "compositionality" (instructions about how to compose the elements of a scene) reflected a fundamental flaw in the technology.
But in practice, the models have gotten better at following very specific prompts, and users have gotten better at writing them, so it's now possible to create images with complex, detailed scenes. Nearly all of the notorious flaws were fixed in last year's updates to DALL-E 3 and Midjourney. Current image generators handle hands and crowds just fine.
In the period between DALL-E 2 and Sora, AI image generation grew from a party trick into a large-scale industry. Many things you can't do with DALL-E 2, you can do with DALL-E 3. And even when DALL-E 3 can't do something, its competitors often can. That's an important perspective to keep in mind when reading predictions about Sora. We're probably looking at the early stages of a major new capability that could be used for both good and malicious purposes. It's possible to oversell that, but it's also very easy to undersell it.
Rather than getting too attached to a particular view of what Sora and its successors can and cannot do, it's worth acknowledging real uncertainty about where this is headed. It's much easier to say "this technology will keep advancing by leaps and bounds" than to speculate specifically about what that advancement will look like.
A version of this story originally appeared in the Future Perfect newsletter. Sign up here!