Generative AI is a marvel. Is it also built on theft?
Generative artificial intelligence (AI) has caused a creative explosion of new writing, music, images and video. The internet is alive with AI-made content, while markets fizz with AI-inspired investment. OpenAI, which makes perhaps the most advanced generative-AI models, is valued at nearly $90bn; Microsoft, its partner, has become the world’s most valuable company, with a market capitalisation of $3.2trn.
But some wonder how creative the technology really is—and whether those cashing in have fairly compensated those on whose work the models were trained. ChatGPT, made by OpenAI, can be coaxed into regurgitating long newspaper articles that it appears to have memorised. Claude, a chatbot made by Anthropic, can be made to repeat lyrics from well-known songs. Stable Diffusion, made by Stability AI, reproduces features of others’ images, including the watermark of Getty, on whose archive it was trained.
To those who hold the rights to these creative works, generative AI is an outrage—and perhaps an opportunity. A frenzy of litigation and dealmaking is under way, as rights-holders angle for compensation for providing the fuel on which the machines of the future are run. For the AI model-makers it is an anxious period, notes Dan Hunter, a professor of law at King’s College London. “They have created an amazing edifice that’s built on a foundation of sand.”
The sincerest form of flattery
AIs are trained on vast quantities of human-made work, from novels to photos and songs. These training data are broken down into “tokens”—numerical representations of bits of text, image or sound—and the model learns by trial and error how tokens are normally combined. Following a prompt from a user, a trained model can then make creations of its own. More and better training data means better outputs.
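The idea of tokens can be illustrated with a toy sketch. It assumes whole words as tokens and simple counting of which token follows which; real generative models use sub-word tokens and learn by adjusting billions of parameters, so this shows the principle only, not how any company's model works. All function names here are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (an assumption: real models use sub-word pieces)."""
    return text.lower().split()

def build_vocab(tokens):
    """Give each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def train_bigrams(ids):
    """Count how often each token id is immediately followed by each other id."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(ids, ids[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, token_id):
    """Return the id seen most often after token_id, or None if it was never followed."""
    followers = counts.get(token_id)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
tokens = tokenize(corpus)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]          # the numerical representation of the text
counts = train_bigrams(ids)               # the "learning" step, reduced to counting
id_to_token = {i: t for t, i in vocab.items()}
predicted = id_to_token[most_likely_next(counts, vocab["the"])]
print(ids)
print(predicted)
```

In this tiny corpus "the" is followed by "cat" twice and "mat" once, so the sketch predicts "cat". More text gives the counts more to go on, which is a crude analogue of why more and better training data improves outputs.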
Many AI companies have become cagey about what data their models are trained on, citing competitive confidentiality (and, their detractors suspect, fear of legal action). But it is widely acknowledged that, at least in their early stages, many hoovered up data that was subject to copyright. OpenAI’s past disclosures show that its GPT-3 model was trained on sources including the Common Crawl, a scraping of the open internet which includes masses of copyrighted data. Most of its rivals are thought to have taken a similar approach.
The tech firms argue that there is nothing wrong with using others’ data simply to train their models. Absorbing copyrighted works and then creating original ones is, after all, what humans do. Those who own the rights say there is a difference. “I’ve ingested all this incredible music and then I create from it,” says Harvey Mason Jr, a songwriter and chief executive of the Recording Academy, which represents musicians. “But the difference is, I’m a human, and as a human, I want to protect humans…I have no problem with a little bit of a double standard.” Roger Lynch, chief executive of Condé Nast, which owns titles such as Vogue and the New Yorker, told a Senate hearing in January that today’s generative-AI tools were “built with stolen goods”. AI companies “are spending literally billions of dollars on computer chips and energy, but they’re unwilling to put a similar investment into content”, complains Craig Peters, chief executive of Getty.
Media companies were badly burned by an earlier era of the internet. Publishers’ advertising revenue drained away to search engines and social networks, while record companies’ music was illegally shared on applications like Napster. The content-makers are determined not to be caught out again. Publishers (including The Economist) are blocking AI companies’ automated “crawlers” from scraping words from their websites: nearly half of the most popular news websites block OpenAI’s bots, according to a ten-country survey by Oxford University’s Reuters Institute in February. Record companies have told music-streaming services to stop AI companies from scraping their tunes. There is widespread irritation that tech firms are again seeking forgiveness rather than permission. “A $90bn valuation pays for a lot of lawyering,” says Mr Hunter. “That’s the business plan.”
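The article does not say how publishers block crawlers. One common mechanism is a robots.txt file, which well-behaved crawlers consult before fetching pages. The sketch below assumes that mechanism and uses Python's standard-library parser; the rules and the news.example address are made up for illustration, while GPTBot is the user-agent name OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://news.example/articles/1"   # placeholder address
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", url)
browser_allowed = is_allowed(ROBOTS_TXT, "ExampleBrowser", url)
print(gptbot_allowed, browser_allowed)
```

A robots.txt file is a request rather than a lock: it works only if the crawler chooses to honour it.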
The lawyering is now happening. The biggest rights-holders in various creative industries are leading the charge. The New York Times, the world’s largest newspaper by number of subscribers, is suing OpenAI and Microsoft for infringing the copyright of 3m of its articles. Universal Music Group, the largest record company, is suing Anthropic for using its song lyrics without permission. Getty, one of the biggest image libraries, is suing Stability AI for copying its images (as well as misusing its trademark). All four tech firms deny wrongdoing.
In America the tech companies are relying on the legal concept of fair use, which provides broad exemptions from the country’s otherwise-ferocious copyright laws. They have an encouraging precedent in the form of a ruling on Google Books in 2015. Then, the Authors Guild sued the search company for scanning copyrighted books without permission. But a court found that Google’s use of the material—making books searchable, but showing only small extracts—was sufficiently “transformative” to be considered fair use. Generative-AI firms argue that their use of the copyrighted material is similarly transformative. Rights-holders, meanwhile, are pinning their hopes on a Supreme Court judgment last year which tightened the definition of transformativeness, with its ruling that a series of artworks by Andy Warhol, which had altered a copyrighted photograph of Prince, a pop star, were insufficiently transformative to constitute fair use.
Not all media types enjoy equal protection. Copyright law covers creative expression, rather than ideas or information. This means that computer code, for example, is only thinly protected, since it is mostly functional rather than expressive, says Matthew Sag, who teaches law at Emory University in Atlanta. (A group of programmers are aiming to test this idea in court, claiming that Microsoft’s GitHub Copilot and OpenAI’s Codex infringed their copyright by training on their work.) News can be tricky to protect for the same reason: the information within a scoop cannot itself be copyrighted. Newspapers in America were not covered by copyright at all until 1909, notes Jeff Jarvis, a journalist and author. Before then, many employed a “scissors editor” to literally cut and paste from rival titles.
At the other end of the spectrum, image-rights holders are better protected. AI models struggle to avoid learning how to draw copyrightable characters—the “Snoopy problem”, as Mr Sag calls it, referring to the cartoon beagle. Model-makers can try to stop their AIs drawing infringing images by blocking certain prompts, but they often fail. At The Economist’s prompting, Microsoft’s image creator, based on OpenAI’s Dall-E, happily drew images of “Captain America smoking a Marlboro” and “The Little Mermaid drinking Guinness”, despite lacking express permission from the brands in question. (Artists and organisations can report any concerns via an online form, says a Microsoft spokesman.) Musicians are also on relatively strong ground: music copyright in America is strictly enforced, with artists requiring licences even for short samples. Perhaps for this reason, many AI companies have been cautious in releasing their music-making models.
Outside America, the legal climate is mostly harsher for tech firms. The European Union, home to Mistral, a hot French AI company, has a limited copyright exception for data-mining, but no broad fair-use defence. Much the same is true in Britain, where Getty has brought its case against Stability AI, which is based in London (and had hoped to fight the lawsuit in America). Some jurisdictions offer safer havens. Israel and Japan, for instance, have copyright laws that are friendly for AI training. Tech companies hint at the potential threat to American business, should the country’s courts take a tough line. OpenAI says of its dispute with the New York Times that its use of copyrighted training data is “critical for US competitiveness”.
Rights-holders bridle at the notion that America should lower its protections to the level of other jurisdictions just to keep the tech business around. One describes it as unAmerican. But it is one reason why the big cases may end up being decided in favour of the AI companies. Courts may rule that models should not have trained on certain data, or that they committed too much to memory, says Mr Sag. “But I don’t believe any US court is going to reject the big fair-use argument. Partly because I think it’s a good argument. And partly because, if they do, we’re just sending a great American industry to Israel or Japan or the EU.”
Copyrights, copywrongs
While the lawyers sharpen their arguments, deals are being done. In some cases, suing is being used as leverage. “Lawsuits are negotiation by other means,” admits a party to one case. Even once trained, AIs need ongoing access to human-made content to stay up-to-date, and some rights-holders have done deals to keep them supplied with fresh material. OpenAI says it has sealed about a dozen licensing deals, with “many more” in the works. Partners so far include the Associated Press, Axel Springer (owner of Bild and Politico), Le Monde and Spain’s Prisa Media.
Rupert Murdoch’s News Corp, which owns the Wall Street Journal and Sun among other titles, said in February that it was in “advanced negotiations” with unnamed tech firms. “Courtship is preferable to courtrooms—we are wooing, not suing,” said its chief executive, Robert Thompson, who praised Sam Altman, OpenAI’s boss. Shutterstock, a photo library, has licensed its archive to both OpenAI and Meta, the social-media empire that is pouring resources into AI. Reddit and Tumblr, online forums, are reportedly licensing their content to AI firms as well. (The Economist Group, our parent company, has not taken a public position on whether it will license our work.)
Most rights-holders are privately pessimistic. A survey of media executives in 56 countries by the Reuters Institute found that 48% expected there to be “very little” money from AI licensing deals. Even the biggest publishers have not made a fortune. Axel Springer, which reported revenue of €3.9bn ($4.1bn) in 2022, will reportedly earn “tens of millions of euros” from its three-year deal with OpenAI. “There is not a big licensing opportunity. I don’t think the aim of [the AI models] is to provide alternatives to news,” says Alice Enders of Enders Analysis, a media-research firm. The licensing deals on offer are “anaemic”, says Mr Peters of Getty. “When companies are…saying, ‘We don’t need to license this content, we have full rights to scrape it,’ I think it definitely diminishes their motivations to come together and negotiate fair economics.”
Some owners of copyrighted material are therefore going it alone. Getty last year launched its own generative AI, in partnership with Nvidia, a chipmaker. Getty’s image-maker has been trained only on Getty’s own library, making it “commercially safe” and “worry-free”, the company promises. It plans to launch an AI video-maker this year, powered by Nvidia and Runway, another AI firm. As well as removing copyright risk, Getty has weeded out anything else that could get its customers into trouble with IP lawyers: brands, personalities and many less obvious things, from tattoo designs to firework displays. Only a small percentage of Getty’s subscribers have tried out the tools so far, the firm admits. But Mr Peters hopes that recurring revenue from the service will eventually exceed the “one-time royalty windfall” of a licensing deal.
A number of news publishers have reached a similar conclusion. Bloomberg said last year that it had trained an AI on its proprietary data and text. Schibsted, a big Norwegian publisher, is leading an effort to create a Norwegian-language model, using its content and that of other media companies. Others have set up chatbots. Last month the Financial Times unveiled Ask FT, which lets readers interrogate the paper’s archive. The San Francisco Chronicle’s Chowbot, launched in February, lets readers seek out the city’s best tacos or clam chowder, based on the paper’s restaurant reviews. The BBC said last month that it was exploring developing AI tools around its 100-year archive “in partnership or unilaterally”. Most big publications, including The Economist, are experimenting behind the scenes.
It is too early to say if audiences will take to such formats. Specialised AI tools may also find it hard to compete with the best generalist ones. OpenAI’s ChatGPT outperforms Bloomberg’s AI even on finance-specific tasks, according to a paper last year by researchers at Queen’s University, in Canada, and JPMorgan Chase, a bank. But licensing content to tech firms has its own risks, points out James Grimmelmann of Cornell University. Rights-holders “have to be thinking very hard about the degree to which this is being used to train their replacements”.
The new questions raised by AI may lead to new laws. “We’re stretching current laws about as far as they can go to adapt to this,” says Mr Grimmelmann. Tennessee last month passed the Ensuring Likeness Voice and Image Security (ELVIS) Act, banning unauthorised deepfakes in the state. But Congress seems more likely to let the courts sort it out. Some European politicians want to tighten up the law in favour of rights-holders; the EU’s directive on digital copyright was passed in 2019, when generative AI was not a thing. “There is no way the Europeans would pass [such a directive] today,” says Mr Sag.
Another question is whether copyright will extend to AI-made content. So far judges have been of the view that works created by AI are not themselves copyrightable. In August an American federal court ruled that “human authorship is a bedrock requirement of copyright”, dismissing a request by a computer scientist to copyright a work of art he had created using AI. This may change as AIs create a growing share of the world’s content. It took several decades of photography for courts to recognise that the person who took a picture could claim copyright over the image.
The current moment recalls a different legal case earlier this century. A wildlife photographer tried to claim copyright over photographs that macaque monkeys had taken of themselves, using a camera he had set up in an Indonesian jungle. A judge ruled that because the claimant had not taken the photos himself, no one owned the copyright. (A petition by an animal-rights group to grant the right to the monkeys was dismissed.) Generative AI promises to fill the world with content that lacks a human author, and therefore has no copyright protection, says Mr Hunter of King’s College. “We’re about to move into the infinite-monkey-selfie era.”
© 2024, The Economist Newspaper Limited. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com