{"id":393295,"date":"2025-05-26T19:50:26","date_gmt":"2025-05-26T14:20:26","guid":{"rendered":"https:\/\/dripp.zone\/news\/anthropic-ceo-ai-could-be-more-factually-reliable-than-people-in-structured-tasks-crypto-news\/"},"modified":"2025-05-26T20:01:29","modified_gmt":"2025-05-26T14:31:29","slug":"anthropic-ceo-ai-could-be-more-factually-reliable-than-people-in-structured-tasks-crypto-news","status":"publish","type":"post","link":"https:\/\/dripp.zone\/news\/anthropic-ceo-ai-could-be-more-factually-reliable-than-people-in-structured-tasks-crypto-news\/","title":{"rendered":"Anthropic CEO: AI could be more factually reliable than people in structured tasks &#8211; Crypto News"},"content":{"rendered":"<div id=\"article-index-0\">\n<p>Artificial intelligence may now surpass humans in factual accuracy\u2014at least in certain structured scenarios\u2014according to Anthropic CEO Dario Amodei. Speaking at two major tech events this month, VivaTech 2025 in Paris and the inaugural <i>Code With Claude<\/i> developer day, Amodei asserted that modern AI models, including the newly launched Claude 4 series, may hallucinate less often than people when answering well-defined factual questions, <i>Business Today<\/i> reported.<\/p>\n<\/div>\n<div id=\"article-index-1\">\n<p><a rel=\"nofollow\" target=\"_blank\" class=\"backlink\" href=\"https:\/\/www.livemint.com\/technology\/tech-news\/anthropic-unveils-claude-opus-4-and-sonnet-4-featuring-whistleblowing-capability-what-it-means-for-users-11747998632322.html\" data-vars-page-type=\"story\" data-vars-link-type=\"Manual\">Hallucination<\/a>, in the context of AI, refers to the tendency of models to confidently produce inaccurate or fabricated information, the report added. This longstanding flaw has raised concerns in fields such as journalism, medicine, and law. 
However, Amodei\u2019s remarks suggest that the tables may be turning\u2014at least in controlled conditions.<\/p>\n<\/div>\n<div id=\"article-index-2\">\n<p>\u201cIf you define hallucination as confidently stating something incorrect, humans actually do that quite frequently,\u201d Amodei said during his keynote at VivaTech. He cited internal testing that showed <a rel=\"nofollow\" target=\"_blank\" class=\"backlink\" href=\"https:\/\/www.livemint.com\/technology\/tech-news\/anthropic-claude-now-fuels-genetic-research-and-drug-discovery-firm-announces-up-to-20-000-in-ai-credits-how-to-apply-11746611518168.html\" data-vars-page-type=\"story\" data-vars-link-type=\"Manual\">Claude 3.5<\/a> outperforming human participants on structured factual quizzes. The results, he claimed, demonstrate a notable shift in reliability when it comes to straightforward question-answer tasks.<\/p>\n<\/div>\n<div id=\"article-index-3\">\n<p>Reportedly, at the developer-focused <i>Code With Claude<\/i> event, where Anthropic introduced the Claude Opus 4 and Claude Sonnet 4 models, Amodei reiterated his stance. \u201cIt really depends on how you measure it,\u201d he noted. \u201cBut I suspect that AI models probably hallucinate less than humans, though when they do, the mistakes are often more surprising.\u201d<\/p>\n<\/div>\n<div id=\"article-index-5\">\n<p>The newly unveiled <a rel=\"nofollow\" target=\"_blank\" class=\"backlink\" href=\"https:\/\/www.livemint.com\/companies\/news\/anthropic-claude-copyright-lawsuit-ai-training-data-copyright-universal-music-anthropic-court-ai-lyrics-copyright-11742998317096.html\" data-vars-page-type=\"story\" data-vars-link-type=\"Manual\" data-vars-anchor-text=\"Claude 4\">Claude 4<\/a> models reflect Anthropic\u2019s latest advances in the pursuit of artificial general intelligence (AGI), boasting improved capabilities in long-term memory, coding, writing, and tool integration. 
Of particular note, Claude Sonnet 4 achieved a 72.7 per cent score on the SWE-Bench software engineering benchmark, surpassing previous models and setting a new industry standard.<\/p>\n<\/div>\n<div id=\"article-index-6\">\n<p>However, Amodei was quick to acknowledge that hallucinations have not been eradicated. In unstructured or open-ended conversations, even state-of-the-art models remain vulnerable to error. The CEO stressed that context, prompt design, and domain-specific application heavily influence a model\u2019s accuracy, particularly in high-stakes settings like legal filings or healthcare.<\/p>\n<\/div>\n<div id=\"article-index-7\">\n<p>His remarks follow a recent legal incident involving Anthropic\u2019s chatbot, where the AI cited a non-existent case during a lawsuit filed by music publishers. The error led to an apology from the company&#8217;s legal team, reinforcing the ongoing challenge of ensuring factual consistency in real-world use.<\/p>\n<\/div>\n<div id=\"article-index-8\">\n<p>Amodei also reportedly highlighted the lack of clear, industry-wide metrics for hallucination. \u201cYou can\u2019t fix what you don\u2019t measure precisely,\u201d he cautioned, calling for standardised definitions and evaluation frameworks to track and mitigate <a rel=\"nofollow\" target=\"_blank\" class=\"backlink\" href=\"https:\/\/www.livemint.com\/companies\/news\/anthropic-claude-copyright-lawsuit-ai-training-data-copyright-universal-music-anthropic-court-ai-lyrics-copyright-11742998317096.html\" data-vars-page-type=\"story\" data-vars-link-type=\"Manual\" data-vars-anchor-text=\"AI errors\">AI errors<\/a>.<\/p>\n<\/div>\n
Speaking at two major tech events this month, VivaTech 2025 in Paris and the inaugural Code With Claude developer day, Amodei asserted that modern AI models, including the newly launched Claude 4 series, may hallucinate less often [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":393299,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[9379,36127,36121,36118,36114,36126,36124,36123,22919,36100,36106,36110,36108,36111,36116,36119,11622,13778,36090,9493,284,263,262,22731,36092,36087,36102,36103,36093,33549,260,259,36097,258,36089,265,202,36112,261,36098,36105,36095,264],"class_list":["post-393295","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-metaverse","tag-agi","tag-ai-accuracy-standards","tag-ai-benchmark-testing","tag-ai-coding-performance","tag-ai-context-sensitivity","tag-ai-errors-in-legal-use","tag-ai-evaluation-frameworks","tag-ai-factual-consistency","tag-ai-hallucination","tag-ai-hallucination-metrics","tag-ai-in-journalism","tag-ai-in-law","tag-ai-in-medicine","tag-ai-legal-errors","tag-ai-memory-capabilities","tag-ai-reliability","tag-ai-vs-humans","tag-anthropic-ai","tag-anthropic-ceo","tag-artificial-general-intelligence","tag-artificial-intelligence","tag-axie-infinity","tag-axs","tag-claude-3-5","tag-claude-4-models","tag-claude-4-series","tag-claude-opus-4","tag-claude-sonnet-4","tag-code-with-claude","tag-dario-amodei","tag-decentraland","tag-facebook","tag-factual-accuracy-ai","tag-game","tag-hallucination-in-ai","tag-mark-zuckerberg","tag-nft","tag-prompt-design","tag-sandbox","tag-structured-factual-questions","tag-swe-bench-benchmark","tag-vivatech-2025","tag-vr"],"_links":{"self":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/393295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/drip
p.zone\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/comments?post=393295"}],"version-history":[{"count":1,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/393295\/revisions"}],"predecessor-version":[{"id":393301,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/393295\/revisions\/393301"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/media\/393299"}],"wp:attachment":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/media?parent=393295"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/categories?post=393295"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/tags?post=393295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}