A video clip of a WSJ interview with OpenAI CTO Mira Murati went viral on social media for the wrong reasons. Murati sat down with the newspaper's Joanna Stern earlier this week to discuss OpenAI's new text-to-video tool, Sora, but was decidedly less clear when it came to answering questions about the data the tool was trained on.
When asked what kind of data the company used in Sora, Murati said it stuck to “publicly available data and licensed data.”
Stern then asked specifically where that data came from: “So what about YouTube videos?”
In response, Murati looked confused and said, “I don't know.”
Stern pressed further, asking, “Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them.”
Murati replied, “I'm not actually sure about that,” adding that if such videos were publicly available they might have been used, but “I wasn't sure about that.”
She closed her answer by saying, “I won't go into details of the data used, but it is publicly available or licensed data.”
Murati's response, which suggested confusion about what “publicly available data” actually means, a refusal to answer the questions clearly, and possibly a lack of knowledge, has drawn a backlash online.
Where AI companies source their training data is a legal gray area. Several authors and media publishers have already sued OpenAI, alleging that it used their copyrighted works without permission to train its AI chatbot, ChatGPT.