Researchers are figuring out how large language models work


Published September 23, 2024

Dripp

LLMs are built using a technique called deep learning, in which a network of billions of neurons, simulated in software and modelled on the structure of the human brain, is exposed to trillions of examples of something to discover inherent patterns. Trained on text strings, LLMs can hold conversations, generate text in a variety of styles, write software code, translate between languages and more besides.

Models are essentially grown, rather than designed, says Josh Batson, a researcher at Anthropic, an AI startup. Because LLMs are not explicitly programmed, nobody is entirely sure why they have such extraordinary abilities. Nor is anyone sure why LLMs sometimes misbehave, or give wrong or made-up answers, known as “hallucinations”. LLMs really are black boxes. This is worrying, given that they and other deep-learning systems are starting to be used for all kinds of things, from offering customer support to preparing document summaries to writing software code.

It would be helpful to be able to poke around inside an LLM to see what is going on, just as it is possible, given the right tools, to do with a car engine or a microprocessor. Being able to understand a model’s inner workings in bottom-up, forensic detail is called “mechanistic interpretability”. But it is a daunting task for networks with billions of internal neurons. That has not stopped people trying, including Dr Batson and his colleagues. In a paper published in May 2024, they explained how they have gained new insight into the workings of one of Anthropic’s LLMs.

One might think individual neurons inside an LLM would correspond to specific words. Unfortunately, things are not that simple. Instead, individual words or concepts are associated with the activation of complex patterns of neurons, and individual neurons may be activated by many different words or concepts. This problem was pointed out in earlier work by researchers at Anthropic, published in 2022. They proposed—and subsequently tried—various workarounds, achieving good results on very small language models in 2023 with a so-called “sparse autoencoder”. In their latest results they have scaled up this approach to work with Claude 3 Sonnet, a full-sized LLM.

A sparse autoencoder is, essentially, a second, smaller neural network that is trained on the activity of an LLM, looking for distinct patterns in activity when “sparse” (ie, very small) groups of its neurons fire together. Once many such patterns, known as features, have been identified, the researchers can determine which words trigger which features. The Anthropic team found individual features that corresponded to specific cities, people, animals and chemical elements, as well as higher-level concepts such as transport infrastructure, famous female tennis players, or the notion of secrecy. They performed this exercise three times, identifying 1m, 4m and, on the last go, 34m features within the Sonnet LLM.
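The idea in the paragraph above can be sketched in a few lines of Python. This is a toy illustration, not Anthropic’s implementation: the class name, the layer sizes, the random initialisation and the `l1_coeff` penalty weight are all assumptions made for the example. It shows two ingredients commonly used in a sparse autoencoder: an encoder that turns a vector of model activity into non-negative feature activations, and a loss that rewards accurate reconstruction while penalising the total amount of feature activity.

```python
import random


def relu(x):
    """Pass positive values through and clip negatives to zero."""
    return x if x > 0.0 else 0.0


class ToySparseAutoencoder:
    """Hypothetical toy model: activity -> feature activations -> reconstructed activity."""

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real system would learn these by training.
        self.encoder = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
                        for _ in range(n_features)]
        self.encoder_bias = [0.0] * n_features
        self.decoder = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                        for _ in range(n_inputs)]
        self.decoder_bias = [0.0] * n_inputs

    def encode(self, activations):
        """Map model activity to non-negative feature activations."""
        return [relu(sum(w * a for w, a in zip(row, activations)) + b)
                for row, b in zip(self.encoder, self.encoder_bias)]

    def decode(self, features):
        """Rebuild the model activity from the feature activations."""
        return [sum(w * f for w, f in zip(row, features)) + b
                for row, b in zip(self.decoder, self.decoder_bias)]

    def loss(self, activations, l1_coeff=0.01):
        """Mean squared reconstruction error plus an L1 sparsity penalty."""
        features = self.encode(activations)
        rebuilt = self.decode(features)
        mse = sum((r - a) ** 2 for r, a in zip(rebuilt, activations)) / len(activations)
        sparsity = sum(features)  # equals the L1 norm because features are >= 0
        return mse + l1_coeff * sparsity


sae = ToySparseAutoencoder(n_inputs=4, n_features=16)
activity = [0.5, -1.2, 0.3, 0.8]  # made-up stand-in for an LLM's internal activity
print(len(sae.encode(activity)), sae.loss(activity))
```

Training, which would adjust the weights by gradient descent on this loss, is left out to keep the sketch short.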

The result is a sort of mind-map of the LLM, showing a small fraction of the concepts it has learned about from its training data. Places in the San Francisco Bay Area that are close geographically are also “close” to each other in the concept space, as are related concepts, such as diseases or emotions. “This is exciting because we have a partial conceptual map, a hazy one, of what’s happening,” says Dr Batson. “And that’s the starting point—we can enrich that map and branch out from there.”

Focus the mind

As well as seeing parts of the LLM light up, as it were, in response to specific concepts, it is also possible to change its behaviour by manipulating individual features. Anthropic tested this idea by “spiking” (ie, turning up) a feature associated with the Golden Gate Bridge. The result was a version of Claude that was obsessed with the bridge, and mentioned it at any opportunity. When asked how to spend $10, for example, it suggested paying the toll and driving over the bridge; when asked to write a love story, it made up one about a lovelorn car that could not wait to cross it.
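One way to picture “spiking” a feature is as an edit made in feature space, followed by decoding back into model activity. The sketch below is a hypothetical illustration, not Anthropic’s code: the function names, the three-feature layout and the decoder directions are invented for the example. It clamps one feature to a high value and shows that the rebuilt activity shifts along that feature’s direction.

```python
def clamp_feature(features, index, value):
    """Return a copy of the feature activations with one feature forced to `value`."""
    steered = list(features)
    steered[index] = value
    return steered


def decode(features, directions, bias):
    """Rebuild activity as the bias plus each feature's activation times its direction."""
    activity = list(bias)
    for activation, direction in zip(features, directions):
        for i, weight in enumerate(direction):
            activity[i] += activation * weight
    return activity


# Hypothetical decoder: three features, each pointing along a direction
# in a two-dimensional activity space.
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0]

features = [0.2, 0.0, 0.1]  # feature 2 stands in for the "bridge" feature
steered = clamp_feature(features, index=2, value=10.0)

baseline_activity = decode(features, directions, bias)
steered_activity = decode(steered, directions, bias)
print(baseline_activity, steered_activity)
```

In this toy setting the steered activity is dominated by the clamped feature’s direction, which is the intuition behind a model that keeps returning to one topic.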

That may sound silly, but the same principle could be used to discourage the model from talking about particular topics, such as bioweapons production. “AI safety is a major goal here,” says Dr Batson. It can also be applied to behaviours. By tuning specific features, models could be made more or less sycophantic, empathetic or deceptive. Might a feature emerge that corresponds to the tendency to hallucinate? “We didn’t find a smoking gun,” says Dr Batson. Whether hallucinations have an identifiable mechanism or signature is, he says, a “million-dollar question”. And it is one addressed, by another group of researchers, in a new paper in Nature.

Sebastian Farquhar and colleagues at the University of Oxford used a measure called “semantic entropy” to assess whether a statement from an LLM is likely to be a hallucination or not. Their technique is quite straightforward: essentially, an LLM is given the same prompt several times, and its answers are then clustered by “semantic similarity” (ie, according to their meaning). The researchers’ hunch was that the “entropy” of these answers—in other words, the degree of inconsistency—corresponds to the LLM’s uncertainty, and thus the likelihood of hallucination. If all its answers are essentially variations on a theme, they are probably not hallucinations (though they may still be incorrect).
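The procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Oxford group’s code: the function names are invented, the entropy is computed from cluster frequencies alone, and `toy_same_meaning` is a crude stand-in (case-insensitive string comparison) for whatever judgment of semantic similarity a real system would use.

```python
import math


def cluster_by_meaning(answers, same_meaning):
    """Group answers so that each cluster holds answers judged to mean the same thing."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no existing cluster matched
    return clusters


def semantic_entropy(answers, same_meaning):
    """Entropy (in nats) of the distribution of answers over meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return sum((len(c) / total) * math.log(total / len(c)) for c in clusters)


def toy_same_meaning(a, b):
    """Hypothetical stand-in: treat answers as equivalent if they match ignoring case and full stops."""
    return a.strip(" .").lower() == b.strip(" .").lower()


consistent = ["Portugal", "portugal.", "Portugal"]   # one meaning, repeated
varied = ["answer A", "answer B", "answer C"]        # placeholders for three different meanings

print(semantic_entropy(consistent, toy_same_meaning))
print(semantic_entropy(varied, toy_same_meaning))
```

Consistent answers fall into a single cluster and score zero, while answers that all disagree score the maximum for that number of samples, which is the signal the researchers associate with likely hallucination.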

In one example, the Oxford group asked an LLM which country is associated with fado music, and it consistently replied that fado is the national music of Portugal, which is correct and not a hallucination. But when asked about the function of a protein called StarD10, the model gave several wildly different answers, which suggests hallucination. (The researchers prefer the term “confabulation”, a subset of hallucinations they define as “arbitrary and incorrect generations”.) Overall, this approach was able to distinguish between accurate statements and hallucinations 79% of the time, ten percentage points better than previous methods. This work is complementary, in many ways, to Anthropic’s.

Others have also been lifting the lid on LLMs: the “superalignment” team at OpenAI, maker of GPT-4 and ChatGPT, released its own paper on sparse autoencoders in June, though the team has now been dissolved after several researchers left the firm. But the OpenAI paper contained some innovative ideas, says Dr Batson. “We are really happy to see groups all over, working to understand models better,” he says. “We want everybody doing it.”
