How to Use Licensing Models for AI Training Data for Generative AI and Beyond

Licensing between generative AI developers and rights-holding content companies is likely to become more common. That would be true even if AI companies prevail in the many copyright cases now pending in U.S. courts.

First, there is no guarantee that every application of generative AI systems will be deemed lawful under U.S. copyright law.

In many U.S. AI lawsuits alleging copyright infringement, generative AI companies have argued that using copyrighted works to train models falls under the fair use exception in copyright law. The argument relies heavily on a 2015 court ruling that Google's digitization of millions of books was fair use.

However, fair use determinations are fact-specific: courts apply a four-factor test that considers, among other things, the purpose of the use and its effect on the market for the original work. Google's digitization of books to display short snippets in response to search queries passed the Second Circuit's fair use analysis, but a court may not find every AI model use case equally fair.

Second, courts will take time to resolve these cases, and they may not move fast enough for AI to win the trust of businesses and achieve broad adoption.

It remains unclear where U.S. courts will ultimately land in the ongoing generative AI litigation. What is more predictable is that the courts will take time to get there. The Google Books fair use case took more than a decade to resolve. Moreover, these decisions by definition make law only for the United States and do not resolve any of the cross-border issues swirling around AI.

Customers of generative AI software want assurance that they can leverage AI models without incurring legal liability. Approved training datasets with documented and traceable provenance stand out as the quickest way to provide that assurance.

Developers' vested interest in fostering customer trust will therefore play a pivotal role in the upcoming transition to licensing third-party training data and paying corresponding license fees.

Potential Gen AI Licensing Structures
Several licensing models are likely to come into common use for generative AI. Two are already in operation: direct licenses negotiated one-on-one and direct licenses obtained through commercial aggregators. Direct licensing will almost certainly remain and coexist with additional models as they emerge.

Meanwhile, the U.S. generative AI market does not yet have collective, compulsory, or judicial-settlement licensing models. However, one or more of these may materialize as a result of court proceedings or regulatory review.

The ability to leverage output without incurring legal liability remains a concern for generative AI customers. Collective or compulsory models may address only training data. In contrast, both forms of direct licensing (and, to some degree, judicial settlements) give AI companies a better opportunity to negotiate rights-holder-approved guidelines for output use, providing greater protection for AI model users who want to leverage the output produced.

Depending on your position, potential goals for a gen AI licensing model for training data include one or more of the following:

Maximize the pool of high-quality training data: A compulsory licensing model, with licensing terms and fees mandated by law, is the most likely to let AI developers take full advantage of the training materials available to them. Compulsory licenses permit the use of copyrighted material without the owner's explicit consent and give owners no ability to opt out, making this the least popular model among rights holders. Compulsory licenses would likely address specific content areas rather than cover all types of content.

Opt-out licenses generate a larger pool than opt-in licenses. However, both pose an outreach challenge: making all eligible rights holders aware of the license and encouraging them to take the steps required to participate and receive compensation.

Ensure training data is accessible to both small and large AI developers: Accessibility is driven largely by license fees and, to some extent, by the transparency of license terms. In an industry characterized by high fees and licensing agreements with undisclosed terms, AI developers with limited funding and limited market intelligence will find it even harder to negotiate training data deals.

One-to-one direct licensing lacks transparency and can act as a barrier to entry by limiting purchasing opportunities for less well-funded developers. In contrast, collective and compulsory licensing models offer more balance in pricing, transparency, and fair treatment across different classes of both AI companies and rights holders.

Direct licenses through commercial aggregators fall in between. Compared with one-to-one direct licensing, aggregation can democratize purchasing power by increasing transparency and streamlining administrative requirements, while ideally lowering license fees to a level that remains fair to rights holders.

Ensure fair license fee compensation for both small and large rights holders: The direct licensing model offers large rights holders the best opportunity to maximize licensing revenue. The downside of one-on-one negotiations and, to a lesser extent, of direct licensing through commercial aggregators is that negotiating a large number of contracts increases the administrative burden on AI companies. Because AI companies want to minimize that burden, direct licensing models can keep small rights holders from participating fully.

Judicial-settlement, collective, and compulsory models have the potential to distribute licensing revenue and opportunities more equitably. However, small rights holders often feel that their share of collective license compensation is insufficient, while rights holders as a whole often feel that government-set compulsory license fees are too low.

Minimize license administration burden and costs: A one-to-one direct licensing model increases the administrative burden and tends to leave out content from rights holders with small catalogs. Direct licensing typically carries the highest administrative burden per license agreement, while other licensing models come with registration, record-keeping, and/or reporting obligations and may involve periodic court or regulatory proceedings to set fees and other terms.

While licensing standards of any form are still being established, interested parties can take the following steps:

Generative AI model developers and their investors: Companies offering commercial AI products and seeking market share should, at a minimum, consider limiting their operations to models trained on sanctioned content with documented and traceable provenance.

Categories of sanctioned training data include owned material (a party's own original or commissioned content), licensed material, public domain material (which, notably, is not the same as publicly available material), and open-source material (used in accordance with its license terms).

AI companies can also use dataset licensing relationships with rights holders to expand the scope of the license and output rights available to end users of their AI models. For example, Stability AI reports that it trained its text-to-music generator Stable Audio on 800,000 tracks acquired through a partnership with the music library AudioSparx. As part of the partnership agreement, Stability AI secured the rights for Stable Audio customers to use the generated output in certain commercial contexts.

Sanctioned datasets are a competitive advantage when pursuing a customer base that is increasingly concerned about generative AI risks. Giving customers clear, reliable guidance on the lawful use of AI model outputs strengthens that advantage considerably. The key to these initial license agreements is to build in enough flexibility to respond to evolving legal, regulatory, and market conditions.

Companies looking to leverage AI-generated output: Distributing AI-generated output calls for the same scrutiny and rights review applied to third-party images, music, footage, quotes, and other assets that are produced for or incorporated into creative works. (See the November 2023 VIP+ special report "Film and TV Rights Handling.")

If you rely on a third party for your generative AI capabilities, a provider with a sanctioned dataset is the most likely to be able to supply the information you need for a reliable rights review. A certification program recently launched by Fairly Trained, a nonprofit founded by former Stability AI executive Ed Newton-Rex, now lets the market identify AI companies that rely on sanctioned training data and maintain proper records. Similar certification mechanisms are likely to emerge.

Rights holders seeking additional revenue by making their content available for gen AI: Most publicized training data licensing deals involve large rights holders such as the Associated Press, Getty Images, and AudioSparx. Big AI companies want millions or even billions of data points and find it too administratively burdensome to buy small catalogs from many rights holders.

As a result, prominent licensing agreements may continue to favor rights holders with large catalogs, leaving small and medium-sized catalog owners with less attractive licensing options. Small and medium-sized rights holders may want to consider licensing opportunities for specialized generative AI applications. In that setting, a proprietary content library, even a small one, may represent a larger share of the value of the whole dataset and may command higher license fees.

Variety VIP+ investigates gen AI from every angle.
