Google and Microsoft bet on Manu Chopra, a 27-year-old Stanford alum, to make AI work for a billion users
Preethi, who goes by a single name, as is common in the region, is among the 70 workers hired in Agara and neighboring villages by a startup called Karya to gather text, voice and image data in India’s vernacular languages. She is part of a vast, unseen global workforce — operating in countries like India, Kenya and the Philippines — who collect and label the data that AI chatbots and virtual assistants rely on to generate relevant responses. Unlike many other data contractors, however, Preethi gets paid well for her efforts, at least by local standards.
After three days of working with Karya, Preethi earned 4,500 rupees ($54), more than four times the amount the 22-year-old high school graduate usually makes as a tailor in an entire month. The money is enough, she said, to pay off that month’s installment on a loan taken to partly repair the crumbling mud walls of her home that have been carefully patched up with colorful saris. “All I need is a phone and the internet.”
Karya was founded in 2021, before the rise of ChatGPT, but this year’s frenzy around generative AI has only added to tech companies’ insatiable demand for data. India alone is expected to have nearly one million data annotation workers by 2030, according to Nasscom, the country’s tech industry trade body. Karya differentiates itself from other data vendors by offering its contractors – mostly women, and mostly in rural communities – as much as 20 times the prevailing minimum wage, with the promise of producing better quality Indian-language data that tech companies will pay more to obtain.
“Every year, big tech companies spend billions of dollars collecting training data for their AI” and machine learning models, Manu Chopra, the 27-year-old Stanford-educated computer engineer behind the startup, said in an interview with Bloomberg. “Poor pay for such work is an industry failure.”
If meager wages are an industry failure, it’s one that Silicon Valley bears some responsibility for creating. For years, tech companies have outsourced tasks like data labeling and content moderation to cheaper contractors overseas. But now, some of Silicon Valley’s most prominent names are turning to Karya to address one of the biggest challenges for their AI products: finding high-quality data to build tools that can better serve billions of potential non-English speaking users. These partnerships could represent a powerful shift in the economics of the data industry and Silicon Valley’s relationship with data providers.
Microsoft Corp. has used Karya to source local speech data for its AI products. The Bill & Melinda Gates Foundation is working with Karya to reduce gender biases in data that feeds into large language models, the technology underpinning AI chatbots. And Alphabet Inc.’s Google is leaning on Karya and other local partners to gather speech data in 85 Indian districts. Google plans to expand to every district to include the majority language or dialect spoken and build a generative AI model for 125 Indian languages.
Many AI services have been disproportionately developed with English-language internet data, such as articles, books and social media posts. As a result, these AI models poorly represent the diversity of languages for internet users in other countries who are accessing AI-powered smartphones and apps faster than they’re learning English. Nearly one billion such potential users live in India alone, as the government pushes for a rollout of AI tools in every sphere from healthcare to education to financial services.
“India is the first non-Western country we are doing this in, and we are testing Bard in nine Indian languages,” said Manish Gupta, head of Google Research in India, referring to the company’s AI chatbot. “Over 70 Indian languages spoken by over a million people each had zero digital corpus. The problem is so stark.”
Gupta ticked off a list of issues that AI firms need to address in order to serve India’s internet users: Non-English datasets are dismally low quality; hardly any conversational data exists in Hindi and other Indian languages; and digitized content from books and newspapers in Indian languages is very limited.
When used for South Asian languages, some large language models have been found to make up words and struggle with basic grammar. There are also concerns these AI services may reflect a more skewed view of other cultures. It’s critical to have broad representation of training data, including non-English data, so AI systems “don’t perpetuate harmful stereotypes, produce hate speech, nor yield misinformation,” said Mehran Sahami, a professor in the computer science department at Stanford University.
Karya, a social impact startup headquartered in Bangalore and supported by grants, is able to broaden the pool of languages represented in part by specifically targeting workers in rural areas who might not otherwise be contracted for such tasks. Karya’s app can work without internet access and it provides voice support for those with limited literacy. In India, over 32,000 crowdsourced workers have logged into the app, completing 40 million paid digital tasks such as image recognition, contour alignments, video annotation and speech annotation.
For Chopra, the goal isn’t just to improve the supply of data but to fight poverty. Karya’s founder grew up in an impoverished neighborhood called Shakur Basti in West Delhi. He won a scholarship to study in an elite school where he was bullied because his classmates said he “smelled poor.” Chopra landed at Stanford to study computer science but realized he hated the “how you make a billion dollars” mindset he encountered there.
After graduating in 2017, he began working on his long-held interest: using technology to tackle poverty. “It takes a mere $1,500 in savings to make an Indian eligible to enter the middle class,” Chopra said. “But the impoverished can take 200 years to reach that level of savings.”
Microsoft, he learned, had been paying a hefty amount for collecting speech data, albeit of poor quality, to feed its AI systems and research. In 2017, for instance, although 1 million hours of digitized spoken data was available in Marathi, a language spoken in Mumbai and its Western India region, only 165 hours was available for purchase. His startup has since put together 10,000 hours of Marathi speech data for Microsoft’s AI services, read by men and women from five different regions.
“Tech companies want the data, accent and all,” Chopra said. “You cough, they want that in the speech – it represents natural language.”
Saikat Guha, a researcher at Microsoft Research India who focuses on the ethics of data collection, said he has also used Karya’s content for a project to aid those with visual disabilities in finding jobs. “The quality of data is far better than any other source I’ve used,” said Guha. “If you pay workers fairly, they’re more invested in their work, and the end result is better data.”
Meanwhile, over 30,000 young, school-educated women are working with Karya to help collect “gender intentional” datasets – such as that the doctor or boss isn’t always a he – in six Indian languages for the Bill & Melinda Gates Foundation. It’s the biggest such effort in Indian languages and will serve as a corpus to build datasets to reduce gender-related biases in LLMs.
Karya isn’t stopping with India. The company said it’s in talks to sell its platform as a service to organizations in Africa and South America who will do similar work.
For now, women in Yelandur, another village southwest of Bangalore, eagerly await Karya’s next project: transcribing from a Kannada audio recording. Among them is Shambhavi S., 25, who earned a few thousand rupees from a previous assignment while working in the quiet of her home after feeding her in-laws dinner and putting her children to bed.
“I don’t know what artificial intelligence is, I haven’t heard of it,” said Shambhavi. “I want to earn and educate my children, so they can learn how to use it.”
Updated: 03 Nov 2023, 02:04 PM IST
