

Meta’s Llama 2: Why open-source LLMs are the joker in the generative AI pack
Leslie D’Monte
The generative AI race just got hotter with Meta releasing Llama 2, the second version of its free, open-source large language model (LLM), for research and commercial use. The move provides an alternative to pricey proprietary offerings such as OpenAI’s ChatGPT Plus and Google’s Bard, and gives open-source LLMs a boost.
Developers began flocking to LLaMA, Meta’s open-source LLM released in February (https://ai.meta.com/blog/large-language-model-llama-meta-ai/). Researchers made more than 100,000 requests for access to Llama 1, according to Meta. LLaMA requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases”, according to the company. Meta made LLaMA available in several sizes (7B, 13B, 33B and 65B parameters, where B stands for billion) and also shared a model card detailing how it built the model, in contrast to the lack of transparency at OpenAI.
OpenAI’s GPT-3 (Generative Pre-trained Transformer), by contrast, has 175 billion parameters, while GPT-4 was rumoured to have launched with 100 trillion parameters, a claim that was dismissed by OpenAI CEO Sam Altman. Foundation models train on large sets of unlabelled data, which makes them well suited to fine-tuning for a variety of tasks. For instance, ChatGPT, based on GPT-3.5, was trained on 570GB of text data from the internet containing hundreds of billions of words, harvested from books, articles and websites, including social media.
However, according to Meta, smaller models trained on more tokens—pieces of words—are easier to re-train and fine-tune for specific potential product use cases. Meta says it trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Its smallest model, LLaMA 7B, was trained on one trillion tokens. Like other LLMs, LLaMA takes a sequence of words as input and predicts the next word, generating text recursively. Meta says it chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets, to train LLaMA.
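The recursive predict-and-append loop described above can be sketched in a few lines of Python. This toy greedy decoder uses a hard-coded bigram table in place of a real network, purely to illustrate the loop; the table and function names here are invented for this example.

```python
# Toy greedy decoder: a hard-coded bigram table stands in for the network.
# A real LLM outputs a probability distribution over tens of thousands of
# tokens at each step; the structure of the loop is the same.
bigram = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def generate(prompt, max_new_tokens=3):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        scores = bigram.get(tokens[-1])
        if not scores:                              # no known continuation: stop
            break
        tokens.append(max(scores, key=scores.get))  # greedy: pick the most likely token
    return " ".join(tokens)

print(generate("the"))  # -> the cat sat down
```

In production systems the greedy argmax is usually replaced by sampling strategies (temperature, top-p) that trade determinism for variety.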
The newly-released Llama 2, according to Meta, is a collection of pretrained and fine-tuned LLMs, ranging from 7 billion to 70 billion parameters. Meta has also released Llama 2-Chat, a fine-tuned version of Llama 2 that is optimized for dialogue with the same parameter ranges. Meta claims that these models “have demonstrated their competitiveness with existing open-source chat models, as well as competence that is equivalent to some proprietary models on evaluation sets we examined” but acknowledges that they still lag other models like OpenAI’s GPT-4.
One may note, though, that the scraping of data has become a thorny issue and the basis for several class-action suits. In a 157-page class-action lawsuit filed on June 28 in the US District Court for the Northern District of California, the plaintiffs alleged that the defendants engaged in “unlawful and harmful conduct in developing, marketing, and operating their AI products including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”, which use “stolen private information” from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge, and continue to do so to develop and train the products (https://www.livemint.com/news/india/why-is-musk-angry-and-why-is-openai-being-sued-11688448802931.html).
Meta says Llama 2 has been trained on a mix of data from publicly-available sources, which does not include data from Meta’s products or services. The company adds that it has made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals. Llama 2 was trained on 2 trillion tokens of data “as this provides a good performance–cost trade-off, up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations”, according to Meta. It, however, adds that since the training corpus was mostly in English, the model may not be suitable for use in other languages.
Llama 2 will be distributed by Microsoft through its Azure cloud service and will run on the Windows operating system (https://www.livemint.com/ai/artificial-intelligence/meta-joins-hands-with-microsoft-for-its-latest-ai-model-llama-2-likely-to-beat-chatgpt-and-bard-11689698965564.html). It is also available through Amazon Web Services (AWS), Hugging Face and other providers, Meta’s chief AI scientist Yann LeCun tweeted soon after the release.
According to Jim Fan, senior AI scientist at Nvidia, Llama 2 is likely to cost a little over $20 million to train. He believes that Meta has done “an incredible service to the community” by releasing the model with a commercially-friendly license. “AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower,” Fan tweeted after Llama 2’s release.
Fan also complimented Meta on the human study it conducted to evaluate the model. Meta’s team ran a human evaluation on 4,000 prompts to gauge Llama 2’s helpfulness. “I trust these real human ratings more than academic benchmarks, because they typically capture the ‘in-the-wild vibe’ better,” said Fan. He added, though, that Llama 2 is not yet as good as GPT-3.5, mainly because of its weak coding abilities. But, he noted, “Meta’s team goes above and beyond on AI safety issues. In fact, almost half of the paper is talking about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts!” According to Fan, Llama 2 will dramatically boost multimodal AI and robotics research.
In my earlier column titled ‘Five trends that may change the course of Generative AI models (https://www.livemint.com/mint-top-newsletter/techtalk12052023.html)’, I had spoken about the rise of smaller open-source large language models (LLMs). Big tech companies like Microsoft and Oracle were strongly opposed to open-source technologies but embraced them after realizing that they couldn’t survive without doing so. Open-source language models are demonstrating this once again.
A couple of months ago, a Google employee claimed in a leaked document accessed by SemiAnalysis that, “Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params (parameters) that we struggle with at $10M (million) and 540B (billion). And they are doing so in weeks, not months.” The employee believes that people will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. He opined that “giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought now that we know what is possible in the <20B parameter regime”.
Google may or may not subscribe to this point of view, but the fact is that open-source LLMs have not only come of age but are providing developers with a lighter and much more flexible option.
As an example, Low-Rank Adaptation of Large Language Models (LoRA) reduces the number of trainable parameters, which lowers the storage requirement for LLMs adapted to specific tasks and enables efficient task-switching during deployment without added inference latency. According to its authors, “LoRA also outperforms several other adaptation methods, including adapter, prefix-tuning, and fine-tuning”. In simple terms, developers can use LoRA to fine-tune LLaMA.
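The core idea can be sketched with NumPy: freeze the pretrained weight matrix W and learn only a low-rank update BA, so the trainable parameter count drops from d·k to r·(d+k). The dimensions and rank below are illustrative toy values, not Llama’s actual sizes.

```python
import numpy as np

d, k, r = 1024, 1024, 8                  # illustrative dims; rank r is much smaller than d, k

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weight (never updated)
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable; zero-init so output is unchanged at start

def lora_forward(x):
    # Base path plus low-rank update: equivalent to (W + B @ A) @ x
    return W @ x + B @ (A @ x)

full_params = d * k                      # what full fine-tuning would update
lora_params = r * (d + k)                # what LoRA updates
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full fine-tuning)")
```

Training backpropagates only into A and B; merging B @ A into W afterwards removes any extra matrix multiply at serving time, which is the no-inference-latency claim above.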
Pythia (from EleutherAI, which itself is likened to an open-source version of OpenAI) comprises 16 LLMs that have been trained on public data and range in size from 70M to 12B parameters.
Databricks Inc. released its LLM called Dolly in March, which it “trained for less than $30 to exhibit ChatGPT-like human interactivity”. A month later, it released Dolly 2.0, a 12B parameter language model based on the EleutherAI Pythia model family “and fine-tuned exclusively on a new, high-quality human-generated instruction following dataset, crowdsourced among Databricks employees”. The company has open-sourced Dolly 2.0 in its entirety, including the training code, dataset and model weights for commercial use, enabling any organization to create, own, and customize powerful LLMs without paying for API access or sharing data with third parties.
Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) has 176 billion parameters and is able to generate text in 46 natural languages and 13 programming languages. Researchers can download, run and study BLOOM to investigate the performance and behavior of recently-developed LLMs.
Falcon, a family of LLMs developed by the Technology Innovation Institute in Abu Dhabi and released under the Apache 2.0 license, comprises two models — the Falcon-40B and the smaller Falcon-7B. According to Hugging Face, “The Falcon models still include some curated sources in their training (such as conversational data from Reddit), but significantly less so than has been common for state-of-the-art LLMs like GPT-3 or PaLM.”
The open-source LLM march has only begun.
Updated: 19 Jul 2023, 02:21 PM IST