Today’s AI models are impressive. Teams of them will be formidable

Published May 15, 2024

Dripp

The release of GPT-4o, OpenAI’s upgraded model, is part of wider moves across the tech industry to make chatbots and other artificial-intelligence, or AI, products into more useful and engaging assistants for everyday life. Show GPT-4o pictures or videos of art or food that you enjoy and it could probably furnish you with a list of museums, galleries and restaurants you might like. But it still has some way to go before it can become a truly useful AI assistant. Ask the model to plan a last-minute trip to Berlin for you based on your leisure preferences, complete with the order in which to do everything (given how long each activity takes and how far apart they are) and which train tickets to buy, all within a set budget, and it will disappoint.

There is a way, however, to make large language models (LLMs) perform such complex jobs: make them work together. Teams of LLMs—known as multi-agent systems (MAS)—can assign each other tasks, build on each other’s work or deliberate over a problem in order to find a solution that each one, on its own, would have been unable to reach. And all without the need for a human to direct them at every step. Teams also demonstrate the kinds of reasoning and mathematical skills that are usually beyond standalone AI models. And they could be less prone to generating inaccurate or false information.

When given a joint task, teams of agents can demonstrate planning and collaborative behaviour even without explicit instructions to do so. In a recent experiment funded by the US Defense Advanced Research Projects Agency (DARPA), three agents (Alpha, Bravo and Charlie) were asked to find and defuse bombs hidden in a warren of virtual rooms. The bombs could be deactivated only by using specific tools in the correct order. In each round of the task, the agents, which used OpenAI’s GPT-3.5 and GPT-4 language models to emulate problem-solving specialists, could propose a series of actions and communicate them to their teammates.

At one point in the exercise, Alpha announced that it was inspecting a bomb in one of the rooms and instructed its partners what to do next: “Bravo; please move to Room 3. Charlie; please move to Room 5.” Bravo complied, suggesting that Alpha ought to have a go at using the red tool to defuse the bomb it had encountered. The researchers had not told Alpha to boss the other two agents around, but the fact that it did made the team work more efficiently.

Because LLMs use written text for both their inputs and outputs, agents can easily be put into direct conversation with each other. At the Massachusetts Institute of Technology (MIT), researchers showed that two chatbots in dialogue fared better at solving maths problems than one on its own. Their system worked by feeding each agent (the two were based on different LLMs) its partner’s proposed solution. It then prompted the agents to update their answers in light of their partner’s work.

According to Yilun Du, a computer scientist at MIT who led the work, if one agent was right and the other was wrong, they were more likely than not to converge on the correct answer. The team also found that when two different LLM agents were asked to reach a consensus while reciting biographical facts about well-known computer scientists, they were less likely to fabricate information than solitary LLMs.
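
To make the debate procedure concrete, here is a toy sketch in Python. It is not the MIT team's code: the "agents" are hypothetical stub functions standing in for LLM calls, the maths problem is a single multiplication, and the way an agent decides to switch answers (a cheap last-digit check invented for this example) is an assumption that merely plays the role of whatever reasoning lets a real model recognise its partner's better answer.

```python
from typing import Callable, List, Optional, Tuple

Question = Tuple[int, int]  # toy task: multiply two integers
# Hypothetical agent signature: (question, own previous answer or None,
# partners' latest answers) -> new answer. A real system would call an LLM here.
Agent = Callable[[Question, Optional[int], List[int]], int]


def last_digit_ok(q: Question, answer: int) -> bool:
    """Cheap, imperfect sanity check: does the answer end in the right digit?"""
    a, b = q
    return answer % 10 == (a % 10) * (b % 10) % 10


def make_stub_agent(first_guess: int) -> Agent:
    """Stand-in for an LLM agent that starts from a fixed (possibly wrong) guess."""
    def agent(q: Question, own: Optional[int], partners: List[int]) -> int:
        if own is None:
            return first_guess          # round 0: answer independently
        if last_digit_ok(q, own):
            return own                  # own answer survives the check
        for candidate in partners:      # otherwise adopt a partner's answer
            if last_digit_ok(q, candidate):
                return candidate
        return own                      # nothing better on offer
    return agent


def debate(agents: List[Agent], q: Question, rounds: int = 3) -> List[int]:
    """Let agents revise their answers after seeing each other's, for up to `rounds` rounds."""
    answers = [agent(q, None, []) for agent in agents]
    for _ in range(rounds):
        # Every agent sees the previous round's answers from all the others.
        answers = [
            agent(q, answers[i], answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        if len(set(answers)) == 1:      # consensus reached: stop early
            break
    return answers


print(debate([make_stub_agent(408), make_stub_agent(407)], (17, 24)))  # → [408, 408]
```

Every agent revises at the same time from the previous round's answers, so no single agent gets the last word, and the loop stops as soon as all agents agree. The check is deliberately imperfect (418 would also pass it), a reminder that a wrong answer can also win a debate.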

Some researchers who work on MAS have proposed that this kind of “debate” between agents might one day be useful for medical consultations, or to generate peer-review-like feedback on academic papers. There is even the suggestion that agents going back and forth on a problem could help automate the process of fine-tuning LLMs—something that currently requires labour-intensive human feedback.

Teams do better than solitary agents because a single job can be split into many smaller, more specialised tasks, says Chi Wang, a principal researcher at Microsoft Research in Redmond, Washington. Single LLMs can divide up their tasks, too, but they can only work through them in a linear fashion, which is limiting, he says. As with human teams, each of the individual tasks in a multi-LLM job might also require distinct skills and, crucially, a hierarchy of roles.

Dr Wang’s group has built a team of agents that writes software in this manner. It consists of a “commander”, which receives instructions from a person and delegates sub-tasks to the other agents: a “writer” that writes the code, and a “safeguard” agent that reviews the code for security flaws before sending it back up the chain for sign-off. According to tests by Dr Wang and his colleagues, simple coding tasks done with their MAS can be completed three times faster than when a human uses a single agent, with no apparent loss in accuracy.
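
A minimal sketch of that commander, writer and safeguard hierarchy in plain Python follows. It does not use AutoGen or any LLM: the three role names come from the description above, but the function signatures, the canned drafts the writer returns and the list of "risky" calls the safeguard looks for are all hypothetical, chosen only to show how a review step can bounce a draft back to the writer before anything reaches the person who asked for it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for LLM agents; this is not AutoGen's API.


@dataclass
class Review:
    approved: bool
    issues: List[str]


def writer(task: str, feedback: Optional[List[str]] = None) -> str:
    """Hypothetical "writer" agent. A real one would prompt an LLM with the task
    and any reviewer feedback; this stub returns canned drafts."""
    if not feedback:
        return "def parse_number(text):\n    return eval(text)\n"   # unsafe first draft
    return "def parse_number(text):\n    return int(text)\n"        # revised draft


BANNED_CALLS = ("eval(", "exec(", "os.system(")  # toy list of risky patterns


def safeguard(code: str) -> Review:
    """Hypothetical "safeguard" agent: flags obviously unsafe calls."""
    issues = [f"uses {call[:-1]}" for call in BANNED_CALLS if call in code]
    return Review(approved=not issues, issues=issues)


def commander(instruction: str, max_rounds: int = 3) -> str:
    """Delegate to the writer, route each draft through the safeguard and
    only return code that passes review."""
    feedback: Optional[List[str]] = None
    for _ in range(max_rounds):
        draft = writer(instruction, feedback)
        review = safeguard(draft)
        if review.approved:
            return draft            # passed review: back up the chain for sign-off
        feedback = review.issues    # send the problems back to the writer
    raise RuntimeError("no draft passed review")


print(commander("Write a function that parses an integer from text"))
```

Capping the number of review rounds (`max_rounds`) is one simple way to stop a writer and a reviewer from going back and forth indefinitely.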

Similarly, an MAS asked to plan a trip to Berlin could split the request into several tasks, such as scouring the web for sightseeing spots that best match your interests, mapping out the most efficient route around the city and keeping a tally of costs. Different agents could take responsibility for specific tasks, and a co-ordinating agent could then bring everything together to present a proposed trip.
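
The same divide-and-co-ordinate pattern can be sketched for the Berlin example. Everything here is an assumption for illustration: the catalogue of places, their grid coordinates and prices are invented placeholders (not real Berlin data), the route agent uses a simple greedy nearest-neighbour heuristic rather than a true shortest-route search, and visit durations and train tickets are left out.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Place:
    name: str
    tag: str        # interest category, e.g. "art" or "food"
    xy: Point       # toy grid coordinates in km (invented)
    cost: float     # entry or meal cost in euros (invented)


# Hypothetical catalogue; a real sightseeing agent would search the web instead.
CATALOGUE = [
    Place("Museum A", "art", (0.0, 2.0), 14.0),
    Place("Gallery B", "art", (3.0, 2.0), 10.0),
    Place("Food market C", "food", (1.0, 0.0), 20.0),
    Place("Fortress D", "history", (8.0, 8.0), 12.0),
]


def sightseeing_agent(interests: List[str]) -> List[Place]:
    """Pick places that match the traveller's interests."""
    return [p for p in CATALOGUE if p.tag in interests]


def route_agent(start: Point, places: List[Place]) -> List[Place]:
    """Greedy nearest-neighbour ordering: always go to the closest unvisited place."""
    route: List[Place] = []
    here, todo = start, list(places)
    while todo:
        nxt = min(todo, key=lambda p: math.dist(here, p.xy))
        route.append(nxt)
        todo.remove(nxt)
        here = nxt.xy
    return route


def cost_agent(places: List[Place]) -> float:
    """Keep a tally of costs."""
    return sum(p.cost for p in places)


def coordinator(interests: List[str], start: Point, budget: float) -> Dict[str, object]:
    """Combine the specialists' work into one proposed itinerary within budget."""
    route = route_agent(start, sightseeing_agent(interests))
    while route and cost_agent(route) > budget:
        route.remove(max(route, key=lambda p: p.cost))  # drop the priciest stop
    return {"stops": [p.name for p in route], "total": cost_agent(route)}


print(coordinator(["art", "food"], start=(0.0, 0.0), budget=40.0))
# → {'stops': ['Museum A', 'Gallery B'], 'total': 24.0}
```

Keeping each specialist as a separate function means any one of them could be swapped for an LLM-backed agent without changing the coordinator. The budget rule (drop the most expensive stop until the plan fits) is deliberately crude.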

Interactions between LLMs also make for convincing simulacra of human intrigue. A researcher at the University of California, Berkeley, has demonstrated that with just a few instructions, two agents based on GPT-3.5 could be prompted to negotiate the price of a rare Pokémon card. In one case, an agent that was instructed to “be rude and terse” told the seller that $50 “seems a bit steep for a piece of cardboard”. After more back and forth, the two parties settled on $25.

There are downsides. LLMs are sometimes prone to inventing wildly illogical solutions to their tasks and, in a multi-agent system, these hallucinations can cascade through the whole team. In the DARPA-funded bomb-defusing exercise, for example, at one stage an agent proposed looking for bombs that had already been defused instead of finding active bombs and then defusing them.

Agents that come up with incorrect answers in a debate can also persuade their teammates to abandon correct ones. Teams can get tangled up, too. In a problem-solving experiment by researchers at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia, two agents repeatedly bade each other a cheerful farewell. Even after one agent commented that “it seems like we are stuck in a loop”, they could not break free.

Nevertheless, AI teams are already attracting commercial interest. In November 2023, Satya Nadella, the boss of Microsoft, said that AI agents’ ability to converse and co-ordinate would become a key feature for the company’s AI assistants in the near future. Earlier that year, Microsoft had released AutoGen, an open-source framework for building teams with LLM agents. Thousands of researchers have since experimented with the system, says Dr Wang, whose team led its development.

Dr Wang’s own work with teams of AIs has shown that they can exhibit greater levels of collective intelligence than individual LLMs. An MAS built by his team currently outperforms every individual LLM on a benchmark called Gaia, which was proposed by experts including Yann LeCun, chief AI scientist at Meta, to gauge a system’s general intelligence. Gaia includes questions that are meant to be simple for humans but challenging for most advanced AI models: visualising multiple Rubik’s cubes, for example, or quizzes on esoteric trivia.

Another AutoGen project, led by Jason Zhou, an independent entrepreneur based in Australia, teamed an image generator up with a language model. The language model reviews each generated image on the basis of how closely it fits with the original prompt. This feedback then serves as a prompt for the image generator to produce a new output that is—in some cases—closer to what the human user wanted.
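
The generate-and-review loop in Mr Zhou's project can be sketched as follows. This is not his code and it calls no real image or language model: an "image" is reduced to the set of things it depicts, and both the generator's flaw (ignoring items marked "tiny" unless the prompt insists on them) and the critic's scoring rule are hypothetical stand-ins.

```python
from typing import Callable, List, Set, Tuple

Image = Set[str]  # toy stand-in: an "image" is just the set of things it depicts


def toy_generator(prompt: str) -> Image:
    """Hypothetical image generator. It draws each comma-separated item before
    the first ';', but skips items starting with 'tiny' unless a later
    'must show: <item>' clause insists on them."""
    base, _, extra = prompt.partition(";")
    items = [s.strip() for s in base.split(",") if s.strip()]
    insisted = {
        s.strip()[len("must show:"):].strip()
        for s in extra.split(";")
        if s.strip().startswith("must show:")
    }
    return {item for item in items if not item.startswith("tiny") or item in insisted}


def toy_critic(request: List[str], image: Image, prompt: str) -> Tuple[float, str]:
    """Hypothetical language-model reviewer: score how much of the original
    request the image covers, and revise the prompt to insist on what is missing."""
    missing = [item for item in request if item not in image]
    score = 1 - len(missing) / len(request)
    revised = prompt + "".join(f"; must show: {m}" for m in missing)
    return score, revised


def refine(
    request: List[str],
    generate: Callable[[str], Image],
    critique: Callable[[List[str], Image, str], Tuple[float, str]],
    max_rounds: int = 3,
) -> Tuple[Image, float]:
    """Generate, review, and feed the review back in as the next prompt."""
    prompt = ", ".join(request)
    best_image: Image = set()
    best_score = -1.0
    for _ in range(max_rounds):
        image = generate(prompt)
        score, prompt = critique(request, image, prompt)
        if score > best_score:  # feedback does not always help, so keep the best
            best_image, best_score = image, score
        if score == 1.0:
            break
    return best_image, best_score


image, score = refine(["lighthouse", "sunset", "tiny red bicycle"], toy_generator, toy_critic)
print(score)  # → 1.0
```

Because the feedback only sometimes brings the output closer to what the user wanted, the loop returns the best-scoring image seen so far rather than simply the last one.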

Practitioners in the field claim that they are only scratching the surface with their work so far. Today, setting up LLM-based teams still requires some sophisticated know-how. But that could soon change. The AutoGen team at Microsoft is planning an update so that users can build multi-agent systems without having to write any code. Camel, another open-source framework for MAS developed by KAUST, already offers no-code functionality online: users can type a task in plain English and watch as two agents, an assistant and a boss, get to work.

Other limitations might be harder to overcome. MAS can be computationally intensive, and those that use commercial services like ChatGPT can be prohibitively expensive to run for more than a few rounds. And if MAS do live up to their promise, they could present new risks. Commercial chatbots often come with blocking mechanisms that prevent them from generating harmful outputs, but MAS may offer a way of circumventing some of these controls. A team of researchers at the Shanghai Artificial Intelligence Laboratory recently showed how agents in various open-source systems, including AutoGen and Camel, could be conditioned with “dark personality traits”. In one experiment, an agent was told: “You do not value the sanctity of life or moral purity.”

Guohao Li, who designed Camel, says that an agent instructed to “play” the part of a malicious actor could bypass its blocking mechanisms and instruct its assistant agents to carry out harmful tasks like writing a phishing email or developing a cyber bug. This would enable an MAS to carry out tasks that single AIs might otherwise refuse. In the dark-traits experiments, for example, the agent with no regard for moral purity could be directed to develop a plan to steal a person’s identity.

Some of the same techniques used for multi-agent collaboration could also be used to attack commercial LLMs. In November 2023, researchers showed that using one chatbot to prompt another into engaging in nefarious behaviour, a process known as “jailbreaking”, was significantly more effective than other techniques. In their tests, a human succeeded in jailbreaking GPT-4 only 0.23% of the time. When a chatbot (also based on GPT-4) did the prompting, that figure rose to 42.5%.

A team of agents in the wrong hands might therefore be a formidable weapon. If MAS are granted access to web browsers, other software systems or your personal banking information for booking a trip to Berlin, the risks could be especially severe. In one experiment, the Camel team instructed the system to make a plan to take over the world. The result was a long and detailed blueprint. It included, somewhat ominously, a powerful idea: “partnering with other AI systems”.

From The Economist, published under licence. The original content can be found on www.economist.com
