Is OpenAI exaggerating the powers of its new ChatGPT Agent?

Published

July 20, 2025

Dripp

OpenAI has flagged its new ChatGPT Agent as high-risk under its safety framework, warning that it could potentially be used to create dangerous biological or chemical substances. Is this just marketing hype, timed to build momentum for the launch of GPT-5, or a sign that AI agents are genuinely becoming more powerful and autonomous, akin to the agents who protect the computer-generated world of The Matrix?

What is ChatGPT Agent?

Say you want to rearrange your calendar, find a doctor and schedule an appointment, or research competitors and deliver a report. ChatGPT Agent can now do it for you.

The agent can browse websites, run code, analyse data, and even create slide decks or spreadsheets, all based on your instructions. It combines the strengths of two earlier OpenAI tools, Operator (which could navigate the web) and deep research (which could analyse and summarise information), in a single system. You stay in control throughout: ChatGPT asks for permission before doing anything important, and you can stop or take over at any time. This new capability is available to Pro, Plus, and Team users through the tools dropdown.

How does it work?

ChatGPT Agent uses a powerful set of tools to complete tasks, including a visual browser to interact with websites like a human, a text-based browser for reasoning-heavy searches, a terminal for code execution, and direct application programming interface (API) access.

It can also connect to apps such as Gmail or GitHub to fetch relevant information. You can log in to websites within the agent’s browser, allowing it to dig deeper into personalised content. All of this runs on its own virtual computer, which keeps track of context even across multiple tools.

The agent can switch between browsers, download and edit files, and adapt its methods to complete tasks quickly and accurately. It’s built for back-and-forth collaboration—you can step in anytime to guide or change the task, and ChatGPT can ask for more input when needed. If a task takes time, you’ll get updates and a notification on your phone once it’s done.

Has OpenAI tested its performance?

OpenAI said that on Humanity’s Last Exam (HLE), which tests expert-level reasoning across subjects, ChatGPT Agent achieved a new high score of 41.6, rising to 44.4 when multiple attempts were run in parallel and the most confident response was selected. On FrontierMath, the toughest known math benchmark, the agent scored 27.4% using tools such as a code-executing terminal, far ahead of previous models.
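
The "multiple attempts in parallel, most confident response selected" setup can be illustrated with a minimal Python sketch. OpenAI has not described in this article how confidence is measured, so the `confidence` field, the `fake_solve` stand-in for a model call, and the canned answers below are all assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Attempt(NamedTuple):
    answer: str
    confidence: float  # hypothetical self-reported score between 0 and 1


def best_of_parallel(solve: Callable[[int], Attempt], n_attempts: int) -> Attempt:
    """Run n_attempts independent attempts in parallel and keep the most confident."""
    with ThreadPoolExecutor(max_workers=n_attempts) as pool:
        attempts = list(pool.map(solve, range(n_attempts)))
    return max(attempts, key=lambda attempt: attempt.confidence)


def fake_solve(seed: int) -> Attempt:
    # Stand-in for a real model call: canned answers with made-up confidences.
    canned = [Attempt("A", 0.55), Attempt("B", 0.81), Attempt("A", 0.62)]
    return canned[seed % len(canned)]


best = best_of_parallel(fake_solve, n_attempts=3)
print(best.answer, best.confidence)  # prints: B 0.81
```

Note that the most confident answer is not guaranteed to be the correct one; selection of this kind only helps to the extent that confidence tracks accuracy.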

In real-world tasks, ChatGPT Agent performs at or above human levels in about half of the cases, based on OpenAI’s internal evaluations. These tasks include building financial models, analysing competitors, and identifying suitable sites for green hydrogen projects.

ChatGPT Agent also outperforms others on specialised tests such as DSBench for data science and SpreadsheetBench for spreadsheet editing (45.5% vs Copilot Excel’s 20.0%). On BrowseComp and WebArena, which test browsing skills, the agent achieves the highest scores to date, according to OpenAI.

What are some of the things it can do?

Consider the case of travel planning. The agent won’t just suggest ideas; it can also navigate booking websites, fill out forms, and even make reservations once you give it permission.

You can also ask it to read your emails, find meeting invitations, and automatically schedule appointments in your calendar, or even draft and send follow-up emails. This level of coordination typically required juggling between apps, but the agent manages it in a single conversational flow.

Another example involves shopping and price comparison. You can tell the agent to “order the best-reviewed smartphone under ₹15,000”, and it can search online stores, compare prices and reviews, and proceed to checkout on a preferred platform. Customer support and task automation are other examples, where the agent is used to troubleshoot an issue, log into support portals, and even file return or refund requests.

How are AI agents typically built?

Unlike basic chatbots, AI agents are autonomous systems that can plan, reason, and complete complex, multi-step tasks such as coding, data analysis, or generating reports with minimal input.

They are built by combining ways to take in information, think, and take action. Developers begin by deciding what the agent should do, after which the agent collects data such as text or images from its environment. AI agents use large language models (LLMs) like GPT-4 as their core “brain”, which allows them to understand and respond to natural language instructions.

To allow AI agents to take action, developers connect the LLM to things like a web browser, code editor, calculator, and APIs for services such as Gmail or Slack. Frameworks like LangChain help integrate these parts, and keep track of information. Some AI agents learn from experience and get better over time. Testing and careful setup make sure they work well and follow rules.
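
The pattern described above, a "brain" that decides the next step and a set of connected tools that carry it out, can be sketched in a few lines of Python. This is a toy illustration rather than how ChatGPT Agent or LangChain is implemented: `scripted_planner` is a hypothetical stand-in for the LLM, and the two tools are simple local functions instead of a real browser or API.

```python
import operator
from typing import Callable, Dict, List, Tuple

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calculator(expression: str) -> str:
    """Tool: evaluate a simple 'number operator number' expression."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


def word_count(text: str) -> str:
    """Tool: count whitespace-separated words."""
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "word_count": word_count,
}

Step = Tuple[str, str]  # (tool name or "finish", argument)


def scripted_planner(goal: str, history: List[str]) -> Step:
    # Hypothetical stand-in for the LLM: chooses the next step from what it has seen.
    if not history:
        return ("calculator", "45.5 - 20.0")
    return ("finish", f"The gap is {history[-1]} percentage points.")


def run_agent(goal: str, planner: Callable[[str, List[str]], Step],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Loop: ask the planner for a step, run the tool, record the result."""
    history: List[str] = []
    for _ in range(max_steps):
        action, argument = planner(goal, history)
        if action == "finish":
            return argument
        history.append(tools[action](argument))
    return "Stopped: step limit reached."


answer = run_agent("How far ahead is the agent on SpreadsheetBench?",
                   scripted_planner, TOOLS)
print(answer)  # prints: The gap is 25.5 percentage points.
```

The `max_steps` cap is a design choice worth noting: without it, a planner that never returns "finish" would keep calling tools indefinitely.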

Does ChatGPT Agent have credible competition?

Google’s Project Astra, part of its Gemini AI line, is developing a multimodal assistant that can see, hear, and respond in real time. Gemini CLI is an open-source AI agent that brings Google’s Gemini model directly to the terminal for fast, lightweight access. It integrates with Gemini Code Assist, offering developers on all plans AI-powered coding in both VS Code and the command line.

Microsoft is embedding Copilot into Windows, Office, and Teams, giving its agent access to workflows, system controls, and productivity tools, soon enhanced by a dedicated Copilot Runtime.

Meta is building more socially focused agents within messaging and the metaverse, which could evolve into utility tools.

Apple is revamping Siri through Apple Intelligence, combining GPT-level reasoning with strict privacy features and deep on-device integration.

Other smart agents include Oracle’s Miracle Agent, IBM’s Watson tools, Agentforce from Salesforce, Anthropic’s Claude 3.5, and Perplexity AI’s action-oriented agents through its Comet project, blending search with agentic behaviour.

The competitive advantage, though, may go to companies that can integrate these AI agents into everyday applications and call for action with a single, unified tool – a task that ChatGPT Agent has demonstrated.

Why did OpenAI warn that ChatGPT Agent could be used to trigger biological warfare?

OpenAI claimed ChatGPT Agent’s superior capabilities could, in theory, be misused to help someone create dangerous biological or chemical substances. However, it clarified that there was no solid evidence it could actually do so.

Regardless, OpenAI is activating the highest level of safety measures under its internal ‘preparedness framework’. These include thorough threat modeling to anticipate potential misuse, special training to ensure the model refuses harmful requests, and constant monitoring using automated systems that watch for risky behaviour. There are also clear procedures in place for suspicious activity.

Should we take this risk seriously?

Ja-Nae Duane, an AI expert, MIT Research Fellow, and co-author of SuperShifts, said the more autonomous the agent, the more permissions and access rights it would require. For example, buying a dress requires wallet access; scheduling an event requires calendar and contact list access.

“While standard ChatGPT already presents privacy risks, the risks from ChatGPT Agent are exponentially higher because people will be granting it access rights to external tools containing personal information (like calendar, email, wallet, and more). There’s a significant gap between the pace of AI development and AI literacy; many people haven’t even fully understood ChatGPT’s existing privacy risks, and now they’re being introduced to a feature with exponentially more risks,” she said.

Duane added that the key risks included data leaks, mistaken actions, prompt injection, and account compromise, especially when handling sensitive information. Malicious actors, she warned, could exploit agents by manipulating inputs, abusing tool access, stealing credentials, or poisoning data to bias outputs. Poor third-party integrations and an over-reliance on them could worsen the impact, while the agent’s “black box” nature would make it hard to trace errors, she added. In the wrong hands, these agents could be weaponised for fraud, phishing, or even to generate malware.

What are the other concern areas for enterprises?

Developers are increasingly deploying AI agents across IT, customer service, and enterprise workflows. According to Nasscom, 46% of Indian firms are experimenting with these agents, particularly in IT, HR, and finance, while manufacturing leads in robotics, quality control, and automation.

Beyond concerns around hallucinations, security, privacy, and copyright or intellectual property (IP) violations, a key challenge for businesses is ensuring a return on investment. Gartner noted that many so-called agentic use cases could be handled by simpler tools and predicted that more than 40% of such projects would be scrapped by 2027 over high costs, unclear value, or inadequate risk controls.

Of the thousands of vendors in this space, only around 130 are seen as credible; many engage in “agent washing” by repackaging chatbots, robotic process automation (RPA), or basic assistants as autonomous agents. Nasscom corroborated these concerns, highlighting that 62% of enterprises were still only testing agents in-house.

Why is ‘humans-in-the-loop’ a must?

OpenAI CEO Sam Altman advised granting agents only the minimum access needed for each task, not blanket permissions. Nasscom believes that to scale responsibly, enterprises must prioritise human-AI collaboration, trust, and data readiness. It has recommended firms adopt AI agents with a “human-in-the-loop” approach, reflecting the need for oversight and contextual judgment.
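
A minimal Python sketch shows how "minimum access per task" and "human-in-the-loop" might fit together. The task-to-access mapping follows the examples Duane gave earlier (wallet for buying a dress, calendar and contacts for scheduling an event); the scope names, the `NEEDS_APPROVAL` set, and the `authorise` function are hypothetical and do not reflect any actual OpenAI or Nasscom mechanism.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical scope names; each task is granted only what it needs.
TASK_SCOPES: Dict[str, FrozenSet[str]] = {
    "buy_dress": frozenset({"wallet"}),
    "schedule_event": frozenset({"calendar", "contacts"}),
}

# Assumption: scopes sensitive enough to require a person's explicit sign-off.
NEEDS_APPROVAL: FrozenSet[str] = frozenset({"wallet"})


def authorise(task: str, requested_scope: str,
              approve: Callable[[str, str], bool]) -> bool:
    """Allow a scope only if the task needs it; ask a human before sensitive ones."""
    if requested_scope not in TASK_SCOPES.get(task, frozenset()):
        return False  # least privilege: this task has no need for the scope
    if requested_scope in NEEDS_APPROVAL:
        return approve(task, requested_scope)  # human-in-the-loop check
    return True


def always_yes(task: str, scope: str) -> bool:
    return True


def always_no(task: str, scope: str) -> bool:
    return False


print(authorise("schedule_event", "calendar", always_no))  # True: no sign-off needed
print(authorise("schedule_event", "wallet", always_yes))   # False: not needed for task
print(authorise("buy_dress", "wallet", always_no))         # False: human declined
print(authorise("buy_dress", "wallet", always_yes))        # True: human approved
```

In a real deployment the `approve` callback would prompt the user rather than return a fixed value, which is where the oversight and contextual judgment come in.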

According to Duane, users must understand both the tool’s strengths and its limits, especially when handling sensitive data. Caution is key, as misuse could have serious consequences. She also emphasised the importance of AI literacy, noting that AI was evolving far faster than most people’s understanding of how to use it responsibly.
