Google and Microsoft bet on Manu Chopra, a 27-year-old Stanford alum, to make AI work for a billion users
Preethi, who goes by a single name, as is common in the region, is among the 70 workers hired in Agara and neighboring villages by a startup called Karya to gather text, voice and image data in India’s vernacular languages. She is part of a vast, unseen global workforce — operating in countries like India, Kenya and the Philippines — who collect and label the data that AI chatbots and virtual assistants rely on to generate relevant responses. Unlike many other data contractors, however, Preethi gets paid well for her efforts, at least by local standards.
After three days of working with Karya, Preethi earned 4,500 rupees ($54), more than four times the amount the 22-year-old high school graduate usually makes as a tailor in an entire month. The money is enough, she said, to pay off that month’s installment on a loan taken to partly repair the crumbling mud walls of her home that have been carefully patched up with colorful saris. “All I need is a phone and the internet.”
Karya was founded in 2021, before the rise of ChatGPT, but this year’s frenzy around generative AI has only added to tech companies’ insatiable demand for data. India alone is expected to have nearly one million data annotation workers by 2030, according to Nasscom, the country’s tech industry trade body. Karya differentiates itself from other data vendors by offering its contractors – mostly women, and mostly in rural communities – as much as 20 times the prevailing minimum wage, with the promise of producing better quality Indian-language data that tech companies will pay more to obtain.
“Every year, big tech companies spend billions of dollars collecting training data for their AI” and machine learning models, Manu Chopra, the 27-year-old Stanford-educated computer engineer behind the startup, told Bloomberg in an interview. “Poor pay for such work is an industry failure.”
If meager wages are an industry failure, it’s one that Silicon Valley bears some responsibility for creating. For years, tech companies have outsourced tasks like data labeling and content moderation to cheaper contractors overseas. But now, some of Silicon Valley’s most prominent names are turning to Karya to address one of the biggest challenges for their AI products: finding high-quality data to build tools that can better serve billions of potential non-English speaking users. These partnerships could represent a powerful shift in the economics of the data industry and Silicon Valley’s relationship with data providers.
Microsoft Corp. has used Karya to source local speech data for its AI products. The Bill & Melinda Gates Foundation is working with Karya to reduce gender biases in data that feeds into large language models, the technology underpinning AI chatbots. And Alphabet Inc.’s Google is leaning on Karya and other local partners to gather speech data in 85 Indian districts. Google plans to expand to every district to include the majority language or dialect spoken and build a generative AI model for 125 Indian languages.
Many AI services have been disproportionately developed with English-language internet data, such as articles, books and social media posts. As a result, these AI models poorly represent the diversity of languages for internet users in other countries who are accessing AI-powered smartphones and apps faster than they’re learning English. Nearly one billion such potential users live in India alone, as the government pushes for a rollout of AI tools in every sphere from healthcare to education to financial services.
“India is the first non-Western country we are doing this in, and we are testing Bard in nine Indian languages,” said Manish Gupta, head of Google Research in India, referring to the company’s AI chatbot. “Over 70 Indian languages spoken by over a million people each had zero digital corpus. The problem is so stark.”
Gupta ticked off a list of issues that AI firms need to address in order to serve India’s internet users: Non-English datasets are dismally low quality; hardly any conversational data exists in Hindi and other Indian languages; and digitized content from books and newspapers in Indian languages is very limited.
When used for South Asian languages, some large language models have been found to make up words and struggle with basic grammar. There are also concerns these AI services may reflect a more skewed view of other cultures. It’s critical to have broad representation of training data, including non-English data, so AI systems “don’t perpetuate harmful stereotypes, produce hate speech, nor yield misinformation,” said Mehran Sahami, a professor in the computer science department at Stanford University.
Karya, a social impact startup headquartered in Bangalore and supported by grants, is able to broaden the pool of languages represented in part by specifically targeting workers in rural areas who might not otherwise be contracted for such tasks. Karya’s app can work without internet access and it provides voice support for those with limited literacy. In India, over 32,000 crowdsourced workers have logged into the app, completing 40 million paid digital tasks such as image recognition, contour alignments, video annotation and speech annotation.
For Chopra, the goal isn’t just to improve the supply of data but to fight poverty. Karya’s founder grew up in an impoverished neighborhood called Shakur Basti in West Delhi. He won a scholarship to study in an elite school where he was bullied because his classmates said he “smelled poor.” Chopra landed at Stanford to study computer science but realized he hated the “how you make a billion dollars” mindset he encountered there.
After graduating in 2017, he began working on his long-held interest: using technology to tackle poverty. “It takes a mere $1,500 in savings to make an Indian eligible to enter the middle class,” Chopra said. “But the impoverished can take 200 years to reach that level of savings.”
Microsoft, he learned, had been paying a hefty amount to collect speech data, albeit of poor quality, to feed its AI systems and research. In 2017, for instance, although 1 million hours of digitized spoken data existed in Marathi, a language spoken in Mumbai and the surrounding region of western India, only 165 hours were available for purchase. His startup has since put together 10,000 hours of Marathi speech data for Microsoft’s AI services, read by men and women from five different regions.
“Tech companies want the data, accent and all,” Chopra said. “You cough, they want that in the speech – it represents natural language.”
Saikat Guha, a researcher at Microsoft Research India who focuses on the ethics of data collection, said he has also used Karya’s content for a project to aid those with visual disabilities in finding jobs. “The quality of data is far better than any other source I’ve used,” said Guha. “If you pay workers fairly, they’re more invested in their work, and the end result is better data.”
Meanwhile, over 30,000 young, school-educated women are working with Karya to help collect “gender intentional” datasets – such as that the doctor or boss isn’t always a he – in six Indian languages for the Bill & Melinda Gates Foundation. It’s the biggest such effort in Indian languages and will serve as a corpus to build datasets to reduce gender-related biases in LLMs.
Karya isn’t stopping with India. The company said it’s in talks to sell its platform as a service to organizations in Africa and South America that will do similar work.
For now, women in Yelandur, another village southwest of Bangalore, eagerly await Karya’s next project: transcribing from a Kannada audio recording. Among them is Shambhavi S., 25, who earned a few thousand rupees from a previous assignment while working in the quiet of her home after feeding her in-laws dinner and putting her children to bed.
“I don’t know what artificial intelligence is, I haven’t heard of it,” said Shambhavi. “I want to earn and educate my children, so they can learn how to use it.”
Updated: 03 Nov 2023, 02:04 PM IST
