What happens when artificial intelligence analyzes human thinking about AI and copyright?
While tech giants and startups alike are advancing the development of AI models, the legal landscape remains full of uncertainty when it comes to current and future rules regarding AI and copyright. Earlier this week, U.S. Representative Adam Schiff introduced new legislation that would require AI companies to disclose AI training content, including text, images, music, and video. Meanwhile, more writers, musicians and people in other creative professions are also speaking out. Last week, 200 musicians, including Billie Eilish, Jason Isbell, Nicki Minaj and Bon Jovi, signed an open letter calling on companies to protect artists from the “predatory use of AI.”
The U.S. Patent and Trademark Office is also weighing new rules related to AI and copyright, issuing guidance in February and again this week. It is considering whether works made with AI can be copyright protected, and whether AI systems should be allowed to train on content that is already protected. As part of the rulemaking process, the USPTO received approximately 10,000 comments from a variety of stakeholders, including businesses, AI experts, artists, and organizations, expressing a wide range of views on AI and intellectual property.
The huge pile of comments raised a broader question: what does an AI model recognize when it analyzes an entire collection of human comments about copyright? The themes within the commentary paint a picture of what different stakeholders want from the USPTO. Will it help?
To better understand this sentiment, Digiday collaborated with AI company IV.AI to investigate the comments using a form of AI called natural language processing (NLP), which analyzes language to identify patterns in words and phrases and derive meaning from text. Founded in 2016, IV.AI helps leading brands use AI to make decisions, find insights in unstructured data, and deploy AI within their businesses. Brands that have partnered with the Los Angeles-based company include Netflix, Estée Lauder, Aeromexico, Sony, and Capital One.
To frame its analysis, IV.AI focused on the four key questions the USPTO posed in its request for comments: training AI with copyrighted material, copyright eligibility for AI-generated content, liability for infringement by AI-generated content, and the legal treatment of AI output that mimics the style and identity of a human artist.
While many of the comments reflect general concerns about human creative rights, they also reflect how businesses, individuals, and organizations are thinking about long-term ownership of content and data. Just as social media companies learned from user-generated data, many AI companies are now doing the same by training their AI models on content posted across various platforms.
“It’s interesting when it comes to all these different [companies] that were already working on other people’s ideas,” said Vince Lynch, CEO and co-founder of IV.AI. “The same thing applies to social media platforms. They all learn from all the data we create and profit from it just by giving us the space to write… They’re [saying], ‘This is our data,’ but it wasn’t actually your data to begin with… Everyone continues to milk the general hoi polloi of humanity.”
A number of macro and micro themes emerged from the analysis. Many of the comments referred to some form of fraud, using words like “theft,” “infringement,” “plagiarism,” “blackmail,” and “devaluation.” Another theme IV.AI noticed was the number of requests within the comments, expressed with words like “consent,” “compensation,” “permission,” “protection,” and “incentive.”
The submissions also address what is at stake in the future of AI and copyright: what this technology means for human creativity, original works, and their creators.
To understand the sentiment of the posts, IV.AI had its AI model examine the first 500 words of each one and found that 74% of comments were identified as negative. The remaining 26% were identified as more positive, primarily because commenters expressed hope that new regulations might help address concerns about AI and copyright.
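The truncate-and-classify step described above can be illustrated with a minimal sketch. IV.AI's actual model is not public, so the keyword lexicon and sample comments below are purely hypothetical stand-ins; the only detail taken from the article is the 500-word cutoff.

```python
# Hypothetical sketch: truncate each comment to its first 500 words,
# then label it with a toy keyword lexicon (a stand-in for a real model).

NEGATIVE = {"theft", "infringement", "plagiarism", "stolen"}
POSITIVE = {"hope", "protect", "support", "benefit"}

def truncate_words(text: str, limit: int = 500) -> str:
    """Keep only the first `limit` words of a comment."""
    return " ".join(text.split()[:limit])

def classify(comment: str) -> str:
    """Label a comment 'negative' if negative keywords outnumber positive ones."""
    words = [w.strip('.,!?"') for w in truncate_words(comment).lower().split()]
    neg = sum(w in NEGATIVE for w in words)
    pos = sum(w in POSITIVE for w in words)
    return "negative" if neg > pos else "positive"

comments = [
    "Training on my art without consent is theft and plagiarism.",
    "I hope these rules protect working artists.",
]
share_negative = sum(classify(c) == "negative" for c in comments) / len(comments)
```

A production system would use a trained sentiment model rather than keyword counts, but the truncation step and the negative/positive split follow the same shape.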
Many of the comments came from artists, writers, and musicians worried about their content being scraped by AI models without their consent or compensation. Voice actors expressed concerns about losing their jobs to AI. Fanfiction writers pointed out that while they aren't allowed to make money from their work, AI models might scrape it and profit from it. One of the more notable findings, according to IV.AI, is that more than 400 posts came from members of the Writers Guild of America, and many of those statements appear to have been copied and pasted from a template provided by the WGA.
The most popular unigrams identified were words such as “AI,” “work,” and “copyright.” Among trigrams, the most popular was “training AI models,” followed by other phrases related to training AI, copyright, and content. The phrase “without permission” appeared nearly 900 times, “theft” nearly 1,300 times, and “replacing human creativity” nearly 500 times.
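Counting unigrams and trigrams like the ones IV.AI reported is a matter of sliding a fixed-size window over each tokenized comment. This is a generic sketch, not IV.AI's pipeline, and the toy comments stand in for the roughly 10,000 USPTO submissions:

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return every consecutive n-word sequence in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy comments standing in for the real submissions.
comments = [
    "training ai models without permission is theft",
    "stop training ai models on our creative work",
]

tokenized = [c.split() for c in comments]
unigrams = Counter(g for toks in tokenized for g in ngrams(toks, 1))
trigrams = Counter(g for toks in tokenized for g in ngrams(toks, 3))

print(trigrams.most_common(1))  # the most frequent three-word phrase
```

Run over the toy data, the top trigram is “training ai models,” which appears in both comments; over the real corpus, the same counting surfaces the frequencies quoted above.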
IV.AI also identified key themes based on the most frequently used terms and word sequences. By identifying patterns and relationships between words, the company was able to extract meaningful topics from the comments. For example, the analysis revealed that the terms “infringement” and “copyright” frequently appeared together, indicating that copyright infringement was an important topic in the responses. The analysis also clustered related topics, such as the use of copyrighted content in training models, whether AI-generated content is copyrightable, and questions of liability for AI-related copyright infringement.
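Co-occurrence extraction of the kind described above can be sketched by counting, for a small vocabulary of terms of interest, how often each pair turns up in the same comment. The vocabulary and comments here are illustrative only:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(comments: list[str], vocab: set[str]) -> Counter:
    """Count how often each pair of vocabulary terms appears in the same comment."""
    pairs = Counter()
    for text in comments:
        words = set(text.lower().split())
        present = sorted(vocab & words)           # vocab terms found in this comment
        pairs.update(combinations(present, 2))    # every unordered pair of them
    return pairs

comments = [
    "copyright infringement by ai models",
    "copyright protection matters to artists",
    "infringement liability and copyright questions",
]
counts = cooccurrence(comments, {"copyright", "infringement", "liability"})
```

On the toy data, “copyright” and “infringement” co-occur in two of the three comments, so that pair gets the highest count, mirroring the pattern IV.AI found in the real responses.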
The most mentioned company was Google, with 183 mentions, followed by Disney (138), Adobe (95), Amazon (95), YouTube (73), Microsoft (42), Netflix (31), and Instagram (30). The most mentioned platform was ChatGPT, which came up 319 times. Others included Midjourney (204), Stable Diffusion (136), Photoshop (94), DALL-E (57), DeviantArt (48), and Stability AI (44). Tools built to protect artists from AI also received dozens of mentions, including Glaze (39) and Nightshade (26).
Submissions came from hundreds of companies, ranging from technology giants to startups and content companies, including Qualcomm, Meta, Yelp, Adobe, Microsoft, OpenAI, Cohere, Getty Images, Shutterstock, The New York Times, and National Public Radio. Others came from the Recording Academy, the Motion Picture Association of America, and various publishers. Brands such as The Knot, the NFL, and Duolingo also weighed in.
What does NLP reveal about AI-related copyright litigation?
IV.AI also analyzed AI- and copyright-related lawsuits against companies like OpenAI. It used NLP on several initial complaints, including those filed by The New York Times, Getty Images, publishers, and author groups, to identify frequently used terms and phrases tied to key themes such as “copyright infringement.” IV.AI also observed how the frequency of certain terms, such as “Getty Images” and “Microsoft,” changed depending on the context of the document. This analysis helped pinpoint which common topics and terminology matter most in legal arguments about AI technologies, and provided insight into areas of concern or interest in these cases.
Other AI companies are also using their own AI models to identify AI-generated content and to help publishers trying to block unauthorized scraping of their content by AI crawlers. One startup, Originality.AI, has created a dashboard that tracks how many top websites have blocked AI web crawlers from various AI companies. Among the top 1,000 websites by traffic, 34% block OpenAI's GPTBot, 19% block Google's Google-Extended, 11% block the nonprofit Common Crawl's crawler, and just 5% block Anthropic's.
It is also worth noting which websites have blocked or allowed the different crawlers. For example, YouTube allows all four, while Facebook and Instagram block OpenAI's and Google's. Amazon, meanwhile, blocks OpenAI's and Common Crawl's crawlers but allows those from Anthropic and Google.
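Blocking decisions like these are typically expressed in a site's robots.txt file. The user-agent tokens below are ones these companies have published (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI-training control; Anthropic has used anthropic-ai, among others), but the allow/block policy shown is a hypothetical example, not any specific site's configuration:

```
# Hypothetical robots.txt: block three AI crawlers, allow Google's.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Allow: /
```

Note that robots.txt is advisory: it only works when a crawler chooses to honor it, which is part of why dashboards like Originality.AI's track the blocks rather than actual crawl traffic.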
“Google-Extended is really interesting,” said Jon Gillham, founder and CEO of Originality.AI. “Why is it blocked about a third as often as GPTBot? Is Google using its potential monopoly power in search to gain an unfair advantage in the emerging field of AI?”
Another AI startup, Patronus AI, has built a tool called Copyright Catcher that detects the potential for different LLMs to reproduce copyrighted content. Last month, the startup's initial results found that OpenAI's GPT-4 generated copyrighted content on 44% of prompts, Mistral AI's model on 22%, Anthropic's on 8%, and Llama 2 on 10%. Patronus co-founder Anand Kannappan said companies whose models accidentally output copyrighted content are still putting their brands and reputations at risk.
“Many companies still feel very uncomfortable because they don't know where the responsibility actually lies: who is at risk, who is responsible,” he said. “…If you are a user of an underlying model and you end up accidentally outputting copyrighted content, it will still endanger your brand or put your company's reputation at risk. So even if it's not a legal issue, there are other types of issues that most companies don't want to get involved [in].”