At least 10% of research may already be co-authored by AI

Published

September 23, 2024

Dripp

Ever more readers of scientific papers are asking whether what they are reading was written by a human or a machine. Large language models (LLMs) are now more than good enough to help write a scientific paper. They can breathe life into dense scientific prose and speed up the drafting process, especially for non-native English speakers. Such use also comes with risks: LLMs are particularly susceptible to reproducing biases, for example, and can churn out vast amounts of plausible nonsense. Just how widespread an issue this is, though, has been unclear.

In a preprint posted recently on arXiv, researchers based at the University of Tübingen in Germany and Northwestern University in America provide some clarity. Their research, which has not yet been peer-reviewed, suggests that at least one in ten new scientific papers contains material produced by an LLM. That means over 100,000 such papers will be published this year alone. And that is a lower bound. In some fields, such as computer science, over 20% of research abstracts are estimated to contain LLM-generated text. Among papers from Chinese computer scientists, the figure is one in three.

Spotting LLM-generated text is not easy. Researchers have typically relied on one of two methods: detection algorithms trained to identify the tell-tale rhythms of human prose, and a more straightforward hunt for suspicious words disproportionately favoured by LLMs, such as “pivotal” or “realm”. Both approaches rely on “ground truth” data: one pile of texts written by humans and one written by machines. These are surprisingly hard to collect: both human- and machine-generated text change over time, as languages evolve and models update. Moreover, researchers typically collect LLM text by prompting these models themselves, and the way they do so may be different from how scientists behave.

The latest research by Dmitry Kobak, at the University of Tübingen, and his colleagues, shows a third way, bypassing the need for ground-truth data altogether. The team’s method is inspired by demographic work on excess deaths, which allows mortality associated with an event to be ascertained by looking at differences between expected and observed death counts. Just as the excess-deaths method looks for abnormal death rates, their excess-vocabulary method looks for abnormal word use. Specifically, the researchers were looking for words that appeared in scientific abstracts with a significantly greater frequency than predicted by that in the existing literature (see chart 1). The corpus which they chose to analyse consisted of the abstracts of virtually all English-language papers available on PubMed, a search engine for biomedical research, published between January 2010 and March 2024, some 14.2m in all.
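To make the excess-vocabulary idea concrete, here is a minimal sketch in Python. It assumes a simple linear extrapolation from two baseline years as the "expected" frequency; the preprint's exact projection may differ, and the function names and toy numbers below are hypothetical illustrations, not figures from the study.

```python
def expected_frequency(freq_by_year, target_year, baseline_years):
    """Project a word's frequency to target_year by extending the straight
    line through two baseline years (an assumed, simplified baseline)."""
    first, last = baseline_years
    slope = (freq_by_year[last] - freq_by_year[first]) / (last - first)
    projected = freq_by_year[last] + slope * (target_year - last)
    # A frequency cannot be negative, so clamp a falling trend at zero.
    return max(projected, 0.0)


def excess_usage(freq_by_year, target_year, baseline_years):
    """Return (gap, ratio): observed minus expected, and observed over expected."""
    expected = expected_frequency(freq_by_year, target_year, baseline_years)
    observed = freq_by_year[target_year]
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Toy numbers (made up): share of abstracts containing one word, by year.
toy_freq = {2021: 0.0010, 2022: 0.0012, 2024: 0.0100}
gap, ratio = excess_usage(toy_freq, 2024, (2021, 2022))
print(f"excess gap: {gap:.4f}, excess ratio: {ratio:.2f}")
```

A word counts as "excess" in this sketch when the gap or the ratio is well above what the pre-LLM trend would predict; where to set those thresholds is a judgment call that the sketch leaves open.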

The researchers found that in most years, word usage was relatively stable: in no year from 2013-19 did a word increase in frequency beyond expectation by more than 1%. That changed in 2020, when “SARS”, “coronavirus”, “pandemic”, “disease”, “patients” and “severe” all exploded. (Covid-related words continued to merit abnormally high usage until 2022.)

By early 2024, about a year after LLMs like ChatGPT had become widely available, a different set of words took off. Of the 774 words whose use increased significantly between 2013 and 2024, 329 took off in the first three months of 2024. Fully 280 of these were related to style, rather than subject matter. Notable examples include: “delves”, “potential”, “intricate”, “meticulously”, “crucial”, “significant”, and “insights” (see chart 2).

The most likely reason for such increases, say the researchers, is help from LLMs. When they estimated the share of abstracts which used at least one of the excess words (omitting words which are widely used anyway), they found that at least 10% probably had LLM input. As PubMed indexes about 1.5m papers annually, that would mean that more than 150,000 papers per year are currently written with LLM assistance.
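The arithmetic behind that estimate can be sketched as follows. This is an illustration only: it assumes the lower bound is the observed share of abstracts containing at least one marker word minus the share expected without LLMs, and the helper names, marker words and sample abstracts are hypothetical. Only the 10% share and the 1.5m annual papers come from the article.

```python
import re


def share_with_any(abstracts, marker_words):
    """Fraction of abstracts that contain at least one marker word."""
    markers = {word.lower() for word in marker_words}
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & markers:
            hits += 1
    return hits / len(abstracts)


def excess_share(observed_share, expected_share):
    """Lower-bound share of LLM-assisted abstracts (never below zero)."""
    return max(observed_share - expected_share, 0.0)


def papers_per_year(share, papers_indexed_per_year):
    """Scale a share of abstracts to an annual paper count."""
    return share * papers_indexed_per_year


# Hypothetical sample: two of the four abstracts use a marker word.
sample = [
    "This study delves into protein folding.",
    "We measured soil moisture at twelve sites.",
    "An intricate signalling pathway is described.",
    "Results were compared across three cohorts.",
]
print(share_with_any(sample, ["delves", "intricate"]))

# Figures quoted in the article: a 10% share of about 1.5m papers a year.
print(round(papers_per_year(0.10, 1_500_000)))
```

Because an LLM-assisted abstract need not contain any marker word at all, a count of this kind can only understate the true share, which is why the figure is described as a lower bound.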

This seems to be more widespread in some fields than others. The researchers found that computer science had the most use, at over 20%, whereas ecology had the least, with a lower bound below 5%. There was also variation by geography: scientists from Taiwan, South Korea, Indonesia and China were the most frequent users, and those from Britain and New Zealand used them least (see chart 3). (Researchers from other English-speaking countries also deployed LLMs infrequently.) Different journals also yielded different results. Those in the Nature family, as well as other prestigious publications like Science and Cell, appear to have a low LLM-assistance rate (below 10%), while Sensors (a journal about, unimaginatively, sensors) exceeded 24%.

The excess-vocabulary method’s results are roughly consistent with those from older detection algorithms, which looked at smaller samples from more limited sources. For instance, in a preprint released in April 2024, a team at Stanford found that 17.5% of sentences in computer-science abstracts were likely to be LLM-generated. They also found a lower prevalence in Nature publications and mathematics papers (LLMs are terrible at maths). The excess vocabulary identified also fits with existing lists of suspicious words.

Such results should not be overly surprising. Researchers routinely acknowledge the use of LLMs to write papers. In one survey of 1,600 researchers conducted in September 2023, over 25% told Nature they used LLMs to write manuscripts. The largest benefit identified by the interviewees, many of whom studied or used AI in their own work, was to help with editing and translation for those who did not have English as their first language. Faster and easier coding came joint second, together with the simplification of administrative tasks; summarising or trawling the scientific literature; and, tellingly, speeding up the writing of research manuscripts.

For all these benefits, using LLMs to write manuscripts is not without risks. Scientific papers rely on the precise communication of uncertainty, for example, which is an area where the capabilities of LLMs remain murky. Hallucination—whereby LLMs confidently assert fantasies—remains common, as does a tendency to regurgitate other people’s words, verbatim and without attribution.

Studies also indicate that LLMs preferentially cite other papers that are highly cited in a field, potentially reinforcing existing biases and limiting creativity. As algorithms, they also cannot be listed as authors on a paper or held accountable for the errors they introduce. Perhaps most worrying, the speed at which LLMs can churn out prose risks flooding the scientific world with low-quality publications.

Academic policies on LLM use are in flux. Some journals ban it outright. Others have changed their minds. Up until November 2023, Science labelled all LLM text as plagiarism, saying: “Ultimately the product must come from—and be expressed by—the wonderful computers in our heads.” They have since amended their policy: LLM text is now permitted if detailed notes on how they were used are provided in the method section of papers, as well as in accompanying cover letters. Nature and Cell also allow its use, as long as it is acknowledged clearly.

How enforceable such policies will be is not clear. For now, no reliable method exists to flush out LLM prose. Even the excess-vocabulary method, though useful at spotting large-scale trends, cannot tell if a specific abstract had LLM input. And researchers need only avoid certain words to evade detection altogether. As the new preprint puts it, these are challenges that must be meticulously delved into.
