By Parmy Olson
A few weeks ago, OpenAI's chief technology officer was asked whether her company had used YouTube videos to train its AI systems. At first, she looked blank. Then she grimaced. In the end, Mira Murati gave an answer that sidestepped the messy, secretive world in which she and other technology companies operate: “Actually, I'm not sure about that.”
According to a New York Times report, OpenAI did in fact train its AI on “more than 1 million hours of YouTube videos,” using a speech recognition tool called Whisper. All of the transcribed conversational text was used to train GPT-4, the flagship large language model behind ChatGPT.
Big tech companies competing to build ever-better AI models have reached a point where there are fewer and fewer places left to look for data on the public web, and pulling text from YouTube video transcripts suggests OpenAI has been digging between the proverbial couch cushions, even if it means breaking someone else's rules. It is very possible that this happened. Neal Mohan, CEO of YouTube, told Bloomberg News last week that if OpenAI had used YouTube videos to improve its AI, it would be a “clear violation” of YouTube's terms of service. Asked whether OpenAI may have violated these rules, a spokesperson for the AI company said it used “public information that is freely and openly available on the internet.”
Still, it's hard to imagine a feud breaking out between OpenAI and Google over this. Google, for one, can hardly complain about data being taken when its entire business is built on collecting the personal data of billions of consumers, often on an alarming scale. Google has also collected transcripts from some YouTube videos to train its own AI models, Mohan told Bloomberg.
Data collection is so ingrained in the business models of companies like Google and Meta Platforms Inc. that the ethics of exploiting people's creative work without consent or compensation often seems to be the undiscussed elephant in the room. According to the Times, when Meta's lawyers recently raised the ethical concerns of scraping artists' intellectual property, they were met with silence. The Times added that Meta executives had considered acquiring a book publisher like Simon & Schuster to gain access to higher-quality books, but decided that securing licenses would take too long.