Meta’s Llama 2: Why open-source LLMs are the joker in the generative AI pack
LLaMA requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases”, according to Meta (AP)



By Leslie D’Monte

Updated: 19 Jul 2023, 02:21 PM IST

The generative AI race just got hotter with Meta releasing the second version of its free, open-source large language model, Llama 2, for research and commercial use. The move provides an alternative to pricey proprietary offerings such as OpenAI’s ChatGPT Plus and Google’s Bard, and gives a boost to open-source LLMs.

Developers began flocking to LLaMA, Meta’s open-source LLM released in February (https://ai.meta.com/blog/large-language-model-llama-meta-ai/); Meta says researchers made more than 100,000 requests for Llama 1. LLaMA requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases”, according to the company. Meta made LLaMA available in several sizes (7B, 13B, 33B and 65B parameters, where B stands for billion) and also shared a LLaMA model card detailing how it built the model, in contrast with the lack of transparency at OpenAI.

OpenAI’s Generative Pre-trained Transformer 3 (GPT-3), by contrast, has 175 billion parameters, while GPT-4 was rumored to have launched with 100 trillion parameters, a claim that was dismissed by OpenAI CEO Sam Altman. Foundation models train on large sets of unlabelled data, which makes them well suited to being fine-tuned for a variety of tasks. For instance, ChatGPT, based on GPT-3.5, was trained on 570GB of text data from the internet containing hundreds of billions of words, harvested from books, articles, websites and social media.

However, according to Meta, smaller models trained on more tokens (pieces of words) are easier to re-train and fine-tune for specific product use cases. Meta says it trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens; its smallest model, LLaMA 7B, was trained on one trillion tokens. Like other LLMs, LLaMA takes a sequence of words as input and predicts the next word, generating text recursively. Meta says it chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets, to train LLaMA.
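
To illustrate the next-word loop described above, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name is an assumption: the Llama 2 weights on the Hub are gated and require approval from Meta, and any causal language model (such as "gpt2") can be substituted to experiment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated checkpoint; request access from Meta first, or swap in "gpt2" to test the idea.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"  # device_map needs the accelerate package
)

prompt = "Open-source language models are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate recursively: each step predicts one more token, which is
# appended to the input before the next prediction.
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits            # scores over the whole vocabulary
        next_id = logits[:, -1, :].argmax(dim=-1)   # greedy pick of the likeliest next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```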

The newly-released Llama 2, according to Meta, is a collection of pretrained and fine-tuned LLMs ranging from 7 billion to 70 billion parameters. Meta has also released Llama 2-Chat, a fine-tuned version of Llama 2 optimized for dialogue, in the same parameter range. Meta claims these models “have demonstrated their competitiveness with existing open-source chat models, as well as competence that is equivalent to some proprietary models on evaluation sets we examined” but acknowledges that they still lag behind models such as OpenAI’s GPT-4.

One may note, though, that the scraping of data has become a thorny issue and the subject of several class-action lawsuits. In a 157-page class action lawsuit filed on June 28 in the US District Court, Northern District of California, the plaintiffs alleged that the defendants engaged in “unlawful and harmful conduct in developing, marketing, and operating their AI products including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”, which use “stolen private information” from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge, and continue to do so to develop and train the products (https://www.livemint.com/news/india/why-is-musk-angry-and-why-is-openai-being-sued-11688448802931.html).

Meta says Llama 2 has been trained on a mix of data from publicly available sources, which does not include data from Meta’s products or services. The company adds that it has made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals. Llama 2 was trained on 2 trillion tokens of data “as this provides a good performance–cost trade-off, up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations”, according to Meta. It adds, however, that since the training corpus was mostly in English, the model may not be suitable for use in other languages.

The new Llama 2 model will be distributed by Microsoft through its Azure cloud service and will run on the Windows operating system (https://www.livemint.com/ai/artificial-intelligence/meta-joins-hands-with-microsoft-for-its-latest-ai-model-llama-2-likely-to-beat-chatgpt-and-bard-11689698965564.html). It is also available on Amazon Web Services (AWS), Hugging Face and other providers, Meta’s chief AI scientist Yann LeCun tweeted soon after the release.

According to Jim Fan, senior AI scientist at Nvidia, Llama 2 is likely to cost a little over $20 million to train. He believes that Meta has done “an incredible service to the community” by releasing the model with a commercially-friendly license. “AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower,” Fan tweeted after Llama 2’s release.

Fan also complimented Meta on the human study it conducted to evaluate the model: Meta’s team ran a human study on 4,000 prompts to evaluate Llama 2’s helpfulness. “I trust these real human ratings more than academic benchmarks, because they typically capture the ‘in-the-wild vibe’ better,” said Fan. He added, though, that Llama 2 is not yet as good as GPT-3.5, mainly because of its weak coding abilities, but noted that “Meta’s team goes above and beyond on AI safety issues. In fact, almost half of the paper is talking about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts!” According to Fan, Llama 2 will dramatically boost multimodal AI and robotics research.

In my earlier column titled ‘Five trends that may change the course of Generative AI models (https://www.livemint.com/mint-top-newsletter/techtalk12052023.html)’, I had spoken about the rise of smaller open-source large language models (LLMs). Big tech companies like Microsoft and Oracle were strongly opposed to open-source technologies but embraced them after realizing that they couldn’t survive without doing so. Open-source language models are demonstrating this once again.

A couple of months back, a Google employee claimed in a leaked document accessed by SemiAnalysis that “Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params (parameters) that we struggle with at $10M (million) and 540B (billion). And they are doing so in weeks, not months.” The employee believes that people will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. He opined that “giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought now that we know what is possible in the < 20B parameter regime”.

Google may or may not subscribe to this point of view, but the fact is that open-source LLMs have not only come of age but are providing developers with a lighter and much more flexible option.

As an example, Low-Rank Adaptation of Large Language Models (LoRA) reduces the number of trainable parameters, which lowers the storage requirement for LLMs adapted to specific tasks and enables efficient task-switching during deployment without adding inference latency. According to its authors, “LoRA also outperforms several other adaptation methods, including adapter, prefix-tuning, and fine-tuning”. In simple terms, developers can use LoRA to fine-tune LLaMA, as in the sketch below.
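
For illustration, here is a minimal sketch of attaching LoRA adapters to a Llama-style model with the Hugging Face peft library; the rank, scaling factor and target modules are illustrative assumptions, not values prescribed by LoRA’s authors or by Meta.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Gated checkpoint; any Hugging Face causal LM can stand in for experimentation.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices (assumed value)
    lora_alpha=16,                        # scaling applied to the update (assumed value)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
# Only the small adapter matrices are trainable; the 7B base weights stay frozen.
model.print_trainable_parameters()
```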

Pythia (from EleutherAI, which itself is likened to an open-source version of OpenAI) comprises 16 LLMs that have been trained on public data and range in size from 70M to 12B parameters.

Databricks Inc. released its LLM called Dolly in March, which it “trained for less than $30 to exhibit ChatGPT-like human interactivity”. A month later, it released Dolly 2.0, a 12B-parameter language model based on the EleutherAI Pythia model family “and fine-tuned exclusively on a new, high-quality human-generated instruction following dataset, crowdsourced among Databricks employees”. The company has open-sourced Dolly 2.0 in its entirety, including the training code, dataset and model weights for commercial use, enabling any organization to create, own, and customize powerful LLMs without paying for API access or sharing data with third parties.

Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) has 176 billion parameters and is able to generate text in 46 natural languages and 13 programming languages. Researchers can download, run and study BLOOM to investigate the performance and behavior of recently-developed LLMs.

Falcon, a family of LLMs developed by the Technology Innovation Institute in Abu Dhabi and released under the Apache 2.0 license, comprises two models: the Falcon-40B and the smaller Falcon-7B. According to Hugging Face, “The Falcon models still include some curated sources in their training (such as conversational data from Reddit), but significantly less so than has been common for state-of-the-art LLMs like GPT-3 or PaLM.”

The open-source LLM march has only begun.
