

Meta’s Llama 2: Why open-source LLMs are the joker in the generative AI pack
Leslie D’Monte
The generative AI race just got hotter. Meta has released the second version of its free, open-source large language model (LLM), Llama 2, for research and commercial use. The release offers an alternative to proprietary LLMs such as those behind OpenAI’s paid ChatGPT Plus and Google’s Bard, and gives open-source LLMs a boost.
Developers began flocking to LLaMA, Meta’s open-source LLM, after its release in February (https://ai.meta.com/blog/large-language-model-llama-meta-ai/). Researchers made more than 100,000 requests for Llama 1, according to Meta, which says LLaMA requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases”. Meta made LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters, where B stands for billion) and also shared a model card detailing how it built the model, in contrast to the lack of transparency at OpenAI.
GPT-3, from OpenAI’s Generative Pre-trained Transformer series, on the other hand, has 175 billion parameters, while GPT-4 was rumored to have launched with 100 trillion parameters, a claim that OpenAI CEO Sam Altman dismissed. Foundation models are trained on large sets of unlabelled data, which makes them well suited to fine-tuning for a variety of tasks. For instance, ChatGPT, based on GPT-3.5, was trained on 570GB of text data from the internet containing hundreds of billions of words harvested from books, articles, and websites, including social media.
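To put the parameter counts quoted above in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. It assumes 16-bit (2-byte) weights, which is a common choice but is not stated by Meta or OpenAI in this article, and it ignores everything else a running model needs, so real requirements would be higher. The function name is illustrative only.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: 16-bit (2-byte) weights. Activations, caches and
# optimizer state are ignored, so real requirements are higher.
BYTES_PER_PARAM_16BIT = 2


def weight_memory_gb(params_billions: float,
                     bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


# Parameter counts (in billions) as quoted in the article.
models = {
    "LLaMA 7B": 7,
    "LLaMA 13B": 13,
    "LLaMA 33B": 33,
    "LLaMA 65B": 65,
    "GPT-3": 175,
}

for name, size in models.items():
    print(f"{name}: roughly {weight_memory_gb(size):.0f} GB of weights")
```

Under that assumption the smallest LLaMA model needs roughly 14 GB for its weights and GPT-3 roughly 350 GB, which is one way to see why smaller models are easier for independent developers to experiment with.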
However, according to Meta, smaller models trained on more tokens (pieces of words) are easier to retrain and fine-tune for specific product use cases. Meta says it trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens, and its smallest model, LLaMA 7B, on one trillion tokens. Like other LLMs, LLaMA takes a sequence of words as input and predicts the next word, repeating the step to generate text. To train LLaMA, Meta says it chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets.
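The predict-then-append loop described above can be illustrated with a toy sketch. This is not how LLaMA is implemented: a hand-written lookup table stands in for the neural network, it looks only at the last word rather than the whole sequence, and it works on whole words instead of tokens. All names and data here are hypothetical.

```python
# Toy next-word predictor. A lookup table stands in for the model;
# everything here is hypothetical and for illustration only.
NEXT_WORD = {
    "open": "source",
    "source": "models",
    "models": "are",
    "are": "improving",
}


def generate(prompt: list[str], max_new_words: int = 4) -> list[str]:
    """Repeatedly predict the next word and append it to the sequence."""
    words = list(prompt)
    for _ in range(max_new_words):
        if not words:
            break                              # nothing to condition on
        next_word = NEXT_WORD.get(words[-1])   # "predict" from the last word
        if next_word is None:
            break                              # no known continuation: stop
        words.append(next_word)                # output feeds the next step
    return words


print(" ".join(generate(["open"])))  # open source models are improving
```

The point of the sketch is the loop: each predicted word is appended to the input before the next prediction is made, which is what lets a next-word predictor produce whole passages.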
The newly released Llama 2, according to Meta, is a collection of pretrained and fine-tuned LLMs ranging from 7 billion to 70 billion parameters. Meta has also released Llama 2-Chat, a fine-tuned version of Llama 2 that is optimized for dialogue and comes in the same range of sizes. Meta claims that these models “have demonstrated their competitiveness with existing open-source chat models, as well as competence that is equivalent to some proprietary models on evaluation sets we examined” but acknowledges that they still lag behind other models such as OpenAI’s GPT-4.
One may note, though, that the scraping of data has become a thorny issue and the subject of many class-action suits. In a 157-page class-action lawsuit filed on June 28 in the US District Court for the Northern District of California, the plaintiffs alleged that the defendants have used “unlawful and harmful conduct in developing, marketing, and operating their AI products including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”, which use “stolen private information” from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge, and that they continue to do so to develop and train the products (https://www.livemint.com/news/india/why-is-musk-angry-and-why-is-openai-being-sued-11688448802931.html).
Meta says Llama 2 has been trained on a mix of data from publicly-available sources, which does not include data from Meta’s products or services. The company adds that it has made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals. Llama 2 was trained on 2 trillion tokens of data “as this provides a good performance–cost trade-off, up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations”, according to Meta. It, however, adds that since the training corpus was mostly in English, the model may not be suitable for use in other languages.
Llama 2 will be distributed by Microsoft through its Azure cloud service and will run on the Windows operating system (https://www.livemint.com/ai/artificial-intelligence/meta-joins-hands-with-microsoft-for-its-latest-ai-model-llama-2-likely-to-beat-chatgpt-and-bard-11689698965564.html). It is also available through Amazon Web Services (AWS), Hugging Face and other providers, Yann LeCun, chief AI scientist at Meta, tweeted soon after the release.
According to Jim Fan, senior AI scientist at Nvidia, Llama 2 is likely to have cost a little over $20 million to train. He believes that Meta has done “an incredible service to the community” by releasing the model with a commercially friendly license. “AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower,” Fan tweeted after Llama 2’s release.
Fan also complimented Meta on the human study its team ran on 4,000 prompts to evaluate Llama-2’s helpfulness. “I trust these real human ratings more than academic benchmarks, because they typically capture the ‘in-the-wild vibe’ better,” said Fan. He added, though, that Llama-2 is not yet as good as GPT-3.5, mainly because of its weak coding abilities. But he also said that “Meta’s team goes above and beyond on AI safety issues. In fact, almost half of the paper is talking about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts!” According to Fan, Llama-2 will dramatically boost multimodal AI and robotics research.
In my earlier column titled ‘Five trends that may change the course of Generative AI models’ (https://www.livemint.com/mint-top-newsletter/techtalk12052023.html), I had spoken about the rise of smaller open-source large language models (LLMs). Big tech companies like Microsoft and Oracle were once strongly opposed to open-source technologies but embraced them after realizing that they couldn’t survive without doing so. Open-source language models are demonstrating this once again.
A couple of months back, a Google employee claimed in a leaked document accessed by SemiAnalysis that “Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params (parameters) that we struggle with at $10M (million) and 540B (billion). And they are doing so in weeks, not months.” The employee believes that people will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. The employee also opined that “giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought now that we know what is possible in the < 20B parameter regime”.
Google may or may not subscribe to this point of view, but the fact is that open-source LLMs have not only come of age but are providing developers with a lighter and much more flexible option.
As an example, Low-Rank Adaptation of Large Language Models (LoRA) is a technique that, according to its authors, reduces the number of trainable parameters, which lowers the storage requirement for LLMs adapted to specific tasks and enables efficient task-switching during deployment without adding inference latency. They also say that “LoRA also outperforms several other adaptation methods, including adapter, prefix-tuning, and fine-tuning”. In simple terms, developers can use LoRA to fine-tune LLaMA.
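A small arithmetic sketch shows where the savings come from. LoRA leaves a pretrained weight matrix W (of size d x k) frozen and trains only two much smaller matrices, B (d x r) and A (r x k), whose product is added to W. The layer size and rank below are hypothetical values chosen for illustration, not figures from Meta or the LoRA authors.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters when only B (d x r) and A (r x k) are trained."""
    return r * (d + k)


d = k = 4096   # hypothetical layer dimensions
r = 8          # hypothetical LoRA rank

full = full_finetune_params(d, k)
lora = lora_params(d, k, r)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={r}): {lora:,} trainable parameters ({full // lora}x fewer)")
```

With these assumed numbers, a single layer drops from 16,777,216 trainable parameters to 65,536. Because the product of B and A can be added back into W once training is done, the adapted model has the same shape as the original, which is why the technique adds no inference latency.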
Pythia (from EleutherAI, which itself is likened to an open-source version of OpenAI) comprises 16 LLMs that have been trained on public data and range in size from 70M to 12B parameters.
Databricks Inc. released its LLM called Dolly in March, which it “trained for less than $30 to exhibit ChatGPT-like human interactivity”. A month later, it released Dolly 2.0, a 12B-parameter language model based on the EleutherAI Pythia model family “and fine-tuned exclusively on a new, high-quality human-generated instruction following dataset, crowdsourced among Databricks employees”. The company has open-sourced Dolly 2.0 in its entirety, including the training code, dataset and model weights for commercial use, enabling any organization to create, own, and customize powerful LLMs without paying for API access or sharing data with third parties.
Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) has 176 billion parameters and is able to generate text in 46 natural languages and 13 programming languages. Researchers can download, run and study BLOOM to investigate the performance and behavior of recently-developed LLMs.
Falcon, a family of LLMs developed by the Technology Innovation Institute in Abu Dhabi and released under the Apache 2.0 license, comprises two models: Falcon-40B and the smaller Falcon-7B. According to Hugging Face, “The Falcon models still include some curated sources in their training (such as conversational data from Reddit), but significantly less so than has been common for state-of-the-art LLMs like GPT-3 or PaLM.”
The open-source LLM march has only begun.
Updated: 19 Jul 2023, 02:21 PM IST