

Google and Microsoft bet on Manu Chopra, a 27-year-old Stanford alum, to make AI work for a billion users – Crypto News
Preethi, who goes by a single name, as is common in the region, is among the 70 workers hired in Agara and neighboring villages by a startup called Karya to gather text, voice and image data in India’s vernacular languages. She is part of a vast, unseen global workforce — operating in countries like India, Kenya and the Philippines — who collect and label the data that AI chatbots and virtual assistants rely on to generate relevant responses. Unlike many other data contractors, however, Preethi gets paid well for her efforts, at least by local standards.
After three days of working with Karya, Preethi earned 4,500 rupees ($54), more than four times the amount the 22-year-old high school graduate usually makes as a tailor in an entire month. The money is enough, she said, to pay off that month’s installment on a loan taken to partly repair the crumbling mud walls of her home that have been carefully patched up with colorful saris. “All I need is a phone and the internet.”
Karya was founded in 2021, before the rise of ChatGPT, but this year’s frenzy around generative AI has only added to tech companies’ insatiable demand for data. India alone is expected to have nearly one million data annotation workers by 2030, according to Nasscom, the country’s tech industry trade body. Karya differentiates itself from other data vendors by offering its contractors – mostly women, and mostly in rural communities – as much as 20 times the prevailing minimum wage, with the promise of producing better quality Indian-language data that tech companies will pay more to obtain.
“Every year, big tech companies spend billions of dollars collecting training data for their AI” and machine learning models, Manu Chopra, the 27-year-old Stanford-educated computer engineer behind the startup, told Bloomberg in an interview. “Poor pay for such work is an industry failure.”
If meager wages are an industry failure, it’s one that Silicon Valley bears some responsibility for creating. For years, tech companies have outsourced tasks like data labeling and content moderation to cheaper contractors overseas. But now, some of Silicon Valley’s most prominent names are turning to Karya to address one of the biggest challenges for their AI products: finding high-quality data to build tools that can better serve billions of potential non-English speaking users. These partnerships could represent a powerful shift in the economics of the data industry and Silicon Valley’s relationship with data providers.
Microsoft Corp. has used Karya to source local speech data for its AI products. The Bill & Melinda Gates Foundation is working with Karya to reduce gender biases in data that feeds into large language models, the technology underpinning AI chatbots. And Alphabet Inc.’s Google is leaning on Karya and other local partners to gather speech data in 85 Indian districts. Google plans to expand to every district to include the majority language or dialect spoken and build a generative AI model for 125 Indian languages.
Many AI services have been disproportionately developed with English-language internet data, such as articles, books and social media posts. As a result, these AI models poorly represent the diversity of languages for internet users in other countries who are accessing AI-powered smartphones and apps faster than they’re learning English. Nearly one billion such potential users live in India alone, as the government pushes for a rollout of AI tools in every sphere from healthcare to education to financial services.
“India is the first non-Western country we are doing this in, and we are testing Bard in nine Indian languages,” said Manish Gupta, head of Google Research in India, referring to the company’s AI chatbot. “Over 70 Indian languages spoken by over a million people each had zero digital corpus. The problem is so stark.”
Gupta ticked off a list of issues that AI firms need to address in order to serve India’s internet users: Non-English datasets are dismally low quality; hardly any conversational data exists in Hindi and other Indian languages; and digitized content from books and newspapers in Indian languages is very limited.
When used for South Asian languages, some large language models have been found to make up words and struggle with basic grammar. There are also concerns these AI services may reflect a more skewed view of other cultures. It’s critical to have broad representation of training data, including non-English data, so AI systems “don’t perpetuate harmful stereotypes, produce hate speech, nor yield misinformation,” said Mehran Sahami, a professor in the computer science department at Stanford University.
Karya, a social impact startup headquartered in Bangalore and supported by grants, is able to broaden the pool of languages represented in part by specifically targeting workers in rural areas who might not otherwise be contracted for such tasks. Karya’s app can work without internet access and it provides voice support for those with limited literacy. In India, over 32,000 crowdsourced workers have logged into the app, completing 40 million paid digital tasks such as image recognition, contour alignments, video annotation and speech annotation.
For Chopra, the goal isn’t just to improve the supply of data but to fight poverty. Karya’s founder grew up in an impoverished neighborhood called Shakur Basti in West Delhi. He won a scholarship to study in an elite school where he was bullied because his classmates said he “smelled poor.” Chopra landed at Stanford to study computer science but realized he hated the “how you make a billion dollars” mindset he encountered there.
After graduating in 2017, he began working on his long-held interest: using technology to tackle poverty. “It takes a mere $1,500 in savings to make an Indian eligible to enter the middle class,” Chopra said. “But the impoverished can take 200 years to reach that level of savings.”
Microsoft, he learned, had been paying a hefty amount to collect speech data, albeit of poor quality, to feed its AI systems and research. In 2017, for instance, although 1 million hours of digitized spoken data was available in Marathi, a language spoken in Mumbai and the surrounding region of western India, only 165 hours was available for purchase. His startup has since put together 10,000 hours of Marathi speech data for Microsoft’s AI services, read by men and women from five different regions.
“Tech companies want the data, accent and all,” Chopra said. “You cough, they want that in the speech – it represents natural language.”
Saikat Guha, a researcher at Microsoft Research India who focuses on the ethics of data collection, said he has also used Karya’s content for a project to aid those with visual disabilities in finding jobs. “The quality of data is far better than any other source I’ve used,” said Guha. “If you pay workers fairly, they’re more invested in their work, and the end result is better data.”
Meanwhile, over 30,000 young, school-educated women are working with Karya to help collect “gender intentional” datasets – such as that the doctor or boss isn’t always a he – in six Indian languages for the Bill & Melinda Gates Foundation. It’s the biggest such effort in Indian languages and will serve as a corpus to build datasets to reduce gender-related biases in LLMs.
Karya isn’t stopping with India. The company said it’s in talks to sell its platform as a service to organizations in Africa and South America who will do similar work.
For now, women in Yelandur, another village southwest of Bangalore, eagerly await Karya’s next project: transcribing from a Kannada audio recording. Among them is Shambhavi S., 25, who earned a few thousand rupees from a previous assignment while working in the quiet of her home after feeding her in-laws dinner and putting her children to bed.
“I don’t know what artificial intelligence is, I haven’t heard of it,” said Shambhavi. “I want to earn and educate my children, so they can learn how to use it.”
Updated: 03 Nov 2023, 02:04 PM IST