Meta’s Llama 2: Why open-source LLMs are the joker in the generative AI pack
Leslie D’Monte
The generative AI race just got hotter with Meta releasing Llama 2, the second version of its free, open-source large language model (LLM), for research and commercial use. The release gives developers an alternative to pricey proprietary offerings such as OpenAI’s ChatGPT Plus and Google’s Bard, and a boost to open-source LLMs.
Developers flocked to LLaMA, the open-source LLM that Meta released in February (https://ai.meta.com/blog/large-language-model-llama-meta-ai/); according to Meta, researchers made more than 100,000 requests for Llama 1. LLaMA requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases”, according to Meta. The company made LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters, where B stands for billion) and also shared a model card detailing how it built the model, in contrast to OpenAI’s relative lack of transparency.
GPT-3, part of OpenAI’s Generative Pre-trained Transformer series, has 175 billion parameters, while GPT-4 was rumored to have launched with 100 trillion parameters, a claim dismissed by OpenAI CEO Sam Altman. Foundation models train on a large set of unlabelled data, which makes them ideal for fine-tuning on a variety of tasks. For instance, ChatGPT, based on GPT-3.5, was trained on 570GB of text data containing hundreds of billions of words, harvested from books, articles, and websites, including social media.
However, according to Meta, smaller models trained on more tokens (pieces of words) are easier to retrain and fine-tune for specific product use cases. Meta says it trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens, while its smallest model, LLaMA 7B, was trained on one trillion tokens. Like other LLMs, LLaMA takes a sequence of words as input and predicts the next word, generating text recursively. Meta says it chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets, to train LLaMA.
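To see what that recursive next-word loop looks like in practice, here is a minimal greedy-decoding sketch in Python; `model` and `tokenizer` are hypothetical stand-ins for any Hugging Face-style causal language model, not Meta’s own code.

```python
import torch

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 20) -> str:
    """Greedy autoregressive decoding: score the sequence, append the
    most likely next token, and repeat until the budget is exhausted."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(ids).logits          # scores over the whole vocabulary
            next_id = logits[0, -1].argmax()    # greedy: pick the single most likely token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0])
```

Real deployments typically replace the greedy `argmax` with sampling strategies such as temperature or top-p, but the underlying loop is the same: one token at a time, each conditioned on everything generated so far.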
The newly-released Llama 2, according to Meta, is a collection of pretrained and fine-tuned LLMs ranging from 7 billion to 70 billion parameters. Meta has also released Llama 2-Chat, a fine-tuned version of Llama 2 optimized for dialogue, in the same parameter range. Meta claims these models “have demonstrated their competitiveness with existing open-source chat models, as well as competence that is equivalent to some proprietary models on evaluation sets we examined”, but acknowledges that they still lag behind models such as OpenAI’s GPT-4.
One may note, though, that scraping of data has become a thorny issue and the subject of several class-action suits. In a 157-page class action filed on June 28 in the US District Court for the Northern District of California, the plaintiffs alleged that the defendants engaged in “unlawful and harmful conduct in developing, marketing, and operating their AI products including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”, which use “stolen private information” from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge, and continue to do so to develop and train the products (https://www.livemint.com/news/india/why-is-musk-angry-and-why-is-openai-being-sued-11688448802931.html).
Meta says Llama 2 has been trained on a mix of data from publicly-available sources, which does not include data from Meta’s products or services. The company adds that it has made an effort to remove data from certain sites known to contain a high volume of personal information about private individuals. Llama 2 was trained on 2 trillion tokens of data “as this provides a good performance–cost trade-off, up-sampling the most factual sources in an effort to increase knowledge and dampen hallucinations”, according to Meta. It, however, adds that since the training corpus was mostly in English, the model may not be suitable for use in other languages.
The AI model and its new version of Llama 2 will be distributed by Microsoft through its Azure cloud service and will run on the Windows operating system (https://www.livemint.com/ai/artificial-intelligence/meta-joins-hands -with-microsoft-for-its-latest-ai-model-llama-2-likely-to-beat-chatgpt-and-bard-11689698965564.html). It’s also available on Amazon Web Services (AWS), Hugging Face and other providers too, Chief Scientist at Meta Yann LeCun tweeted soon after the release.
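For developers, getting started through Hugging Face looks roughly like the sketch below. It assumes you have accepted Meta’s license for the gated meta-llama/Llama-2-7b-chat-hf checkpoint and installed the transformers and accelerate libraries; the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated checkpoint: requires accepting Meta's license on the Hugging Face model page.
MODEL = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")

prompt = "Explain why open-source LLMs matter, in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```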
According to Jim Fan, senior AI scientist at Nvidia, Llama 2 is likely to cost a little over $20 million to train. He believes that Meta has done “an incredible service to the community” by releasing the model with a commercially-friendly license. “AI researchers from big companies were wary of Llama-1 due to licensing issues, but now I think many of them will jump on the ship and contribute their firepower,” Fan tweeted after Llama 2’s release.
Fan also complimented Meta on the human study it conducted to evaluate the model: Meta’s team used 4,000 prompts to assess Llama 2’s helpfulness. “I trust these real human ratings more than academic benchmarks, because they typically capture the ‘in-the-wild vibe’ better,” said Fan. He added, though, that Llama 2 is not yet as good as GPT-3.5, mainly because of its weak coding abilities. But, he noted, “Meta’s team goes above and beyond on AI safety issues. In fact, almost half of the paper is talking about safety guardrails, red-teaming, and evaluations. A round of applause for such responsible efforts!” According to Fan, Llama 2 will dramatically boost multimodal AI and robotics research.
In my earlier column, ‘Five trends that may change the course of Generative AI models’ (https://www.livemint.com/mint-top-newsletter/techtalk12052023.html), I wrote about the rise of smaller open-source large language models. Big tech companies like Microsoft and Oracle were once strongly opposed to open-source technologies but embraced them after realizing they could not survive without doing so. Open-source language models are demonstrating this once again.
A couple of months ago, a Google employee claimed in a leaked document accessed by SemiAnalysis that “Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params (parameters) that we struggle with at $10M (million) and 540B (billion). And they are doing so in weeks, not months.” The employee believes that people will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. He opined that “giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought now that we know what is possible in the <20B parameter regime”.
Google may or may not subscribe to this point of view, but the fact is that open-source LLMs have not only come of age but are providing developers with a lighter and much more flexible option.
As an example, Low-Rank Adaptation of Large Language Models (LoRA) reduces the number of trainable parameters, which lowers the storage requirement for LLMs adapted to specific tasks and enables efficient task-switching during deployment without added inference latency. According to its authors, “LoRA also outperforms several other adaptation methods, including adapter, prefix-tuning, and fine-tuning”. In simple terms, developers can use LoRA to fine-tune LLaMA cheaply.
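The core idea is that LoRA freezes the pretrained weight matrix and learns a small low-rank update on top of it, so only two thin matrices need to be trained and stored per task. The PyTorch sketch below is a simplified illustration of that idea, not a production implementation (in practice developers typically use a library such as Hugging Face’s peft); the class name and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A of shape (r, in) and B of shape (out, r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original pretrained weights stay fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: training starts from the pretrained model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

To see why this matters, consider a 4096-by-4096 layer: full fine-tuning updates roughly 16.8 million weights, while LoRA with r = 8 trains only about 65,000. Swapping the small A and B matrices per task is what makes cheap task-switching possible at deployment.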
Pythia, from EleutherAI (itself often likened to an open-source version of OpenAI), comprises 16 LLMs trained on public data, ranging in size from 70M to 12B parameters.
Databricks Inc. released its LLM, Dolly, in March, which it “trained for less than $30 to exhibit ChatGPT-like human interactivity”. A month later, it released Dolly 2.0, a 12B-parameter language model based on the EleutherAI Pythia model family “and fine-tuned exclusively on a new, high-quality human-generated instruction following dataset, crowdsourced among Databricks employees”. The company has open-sourced Dolly 2.0 in its entirety, including the training code, dataset, and model weights for commercial use, enabling any organization to create, own, and customize powerful LLMs without paying for API access or sharing data with third parties.
Hugging Face’s BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) has 176 billion parameters and is able to generate text in 46 natural languages and 13 programming languages. Researchers can download, run and study BLOOM to investigate the performance and behavior of recently-developed LLMs.
Falcon, a family of LLMs developed by the Technology Innovation Institute in Abu Dhabi and released under the Apache 2.0 license, comprises two models: the Falcon-40B and the smaller Falcon-7B. According to Hugging Face, “The Falcon models still include some curated sources in their training (such as conversational data from Reddit), but significantly less so than has been common for state-of-the-art LLMs like GPT-3 or PaLM.”
The open-source LLM march has only begun.
Updated: 19 Jul 2023, 02:21 PM IST