{"id":186559,"date":"2023-11-03T14:14:37","date_gmt":"2023-11-03T08:44:37","guid":{"rendered":"https:\/\/dripp.zone\/news\/?p=186559"},"modified":"2023-11-03T14:14:37","modified_gmt":"2023-11-03T08:44:37","slug":"google-and-microsoft-bet-on-manu-chopra-a-27-year-old-stanford-alum-to-make-ai-work-for-a-billion-users-crypto-news","status":"publish","type":"post","link":"https:\/\/dripp.zone\/news\/google-and-microsoft-bet-on-manu-chopra-a-27-year-old-stanford-alum-to-make-ai-work-for-a-billion-users-crypto-news\/","title":{"rendered":"Google and Microsoft bet on Manu Chopra, a 27-year-old Stanford alum, to make AI work for a billion users &#8211; Crypto News"},"content":{"rendered":"<p><\/p>\n<div id=\"paywall_11698999758505\">\n<p>   Preethi, who goes by a single name, as is common in the region, is among the 70 workers hired in Agara and neighboring villages by a startup called Karya to gather text, voice and image data in India\u2019s vernacular languages. She is part of a vast, unseen global workforce \u2014\u00a0operating in countries like India, Kenya and the Philippines \u2014\u00a0who collect and label the data that AI chatbots and virtual assistants rely on to generate relevant responses. Unlike many other data contractors, however, Preethi gets paid well for her efforts, at least by local standards.<\/p>\n<p>   After three days of working with Karya, Preethi earned 4,500 rupees ($54), more than four times the amount the 22-year-old high school graduate usually makes as a tailor in an entire month. The money is enough, she said, to pay off that month\u2019s installment on a loan taken to partly repair the crumbling mud walls of her home that have been carefully patched up with colorful saris. \u201cAll I need is a phone and the internet.&#8221;<\/p>\n<p>   Karya was founded in 2021, before the rise of ChatGPT, but this year\u2019s frenzy around generative AI has only added to tech companies\u2019 insatiable demand for data. 
India alone is expected to have nearly one million data annotation workers by 2030, according to Nasscom, the country\u2019s tech industry trade body. Karya differentiates itself from other data vendors by offering its contractors \u2013 mostly women, and mostly in rural communities \u2013 as much as 20 times the prevailing minimum wage, with the promise of producing better quality Indian-language data that tech companies will pay more to obtain.<\/p>\n<p>   \u201cEvery year, big tech companies spend billions of dollars collecting training data for their AI&#8221; and machine learning\u00a0models,\u00a0Manu Chopra, the 27-year-old Stanford-educated computer engineer behind the startup, told Bloomberg in an interview. \u201cPoor pay for such work is an industry failure.&#8221;\u00a0<\/p>\n<p>   If meager wages are\u00a0an industry failure, it\u2019s one that Silicon Valley bears some responsibility for creating. For years, tech companies have outsourced tasks like data labeling and content moderation to cheaper contractors overseas. But now, some of Silicon Valley\u2019s most prominent names are turning to Karya to address one of the biggest challenges for their AI products: finding high-quality data to build tools that can better serve billions of potential non-English speaking users. These partnerships could represent a powerful shift in the economics of the data industry and Silicon Valley\u2019s relationship with data providers.<\/p>\n<p>   Microsoft Corp. has used Karya to source local speech data for its AI products. The Bill &amp; Melinda Gates Foundation is working with Karya to reduce gender biases in data that feeds into large language models, the technology underpinning AI chatbots. And Alphabet Inc.\u2019s Google is leaning on\u00a0Karya and other local partners to gather speech data in 85 Indian districts. 
Google plans to expand to every district to include the majority language or dialect spoken and build a\u00a0 generative AI model for 125 Indian languages.<\/p>\n<p>   Many AI services have been disproportionately developed with English-language internet data, such as articles, books and social media posts. As a result, these AI models poorly represent the diversity of languages for internet users in other countries who are accessing AI-powered smartphones and apps faster than they\u2019re learning English. Nearly one billion such potential users live in India alone, as the government pushes for a rollout of AI tools in every sphere from healthcare to education to financial services.\u00a0<\/p>\n<p>   \u201cIndia is the first non-Western country we are doing this in, and we are testing Bard in nine Indian languages,&#8221;\u00a0said Manish Gupta, head of Google Research in India, referring to the company\u2019s AI chatbot. \u201cOver 70 Indian languages spoken by over a million people each had zero digital corpus. The problem is so stark.&#8221;<\/p>\n<p>   Gupta ticked off a list of\u00a0issues that AI firms need to address in order to serve India\u2019s internet users: Non-English datasets are dismally low quality; hardly any conversational data exists in Hindi and other Indian languages; and digitized content from books and newspapers in Indian languages is very limited.<\/p>\n<p>   When used for South Asian languages, some large language models have been found to make up words and struggle with basic grammar. There are also concerns these AI services may reflect a more skewed view of other cultures. 
It\u2019s critical to have broad representation of training data, including non-English data, so AI systems \u201cdon\u2019t perpetuate harmful stereotypes, produce hate speech, nor yield misinformation,&#8221; said Mehran Sahami, a professor in the computer science department at Stanford University.<\/p>\n<p>   Karya, a social impact startup headquartered in Bangalore and supported by grants, is able to broaden the pool of languages represented in part by specifically targeting workers in rural areas who might not otherwise\u00a0be contracted for such\u00a0tasks. Karya\u2019s app can work without internet access and it provides voice support for those with limited literacy.\u00a0 In India, over 32,000 crowdsourced workers have logged into the app, completing 40 million paid digital tasks such as image recognition, contour alignments, video annotation and speech annotation.\u00a0<\/p>\n<p>   For Chopra, the goal isn\u2019t just to improve the supply of data but to fight poverty. Karya\u2019s founder grew up in an impoverished neighborhood called Shakur Basti in West Delhi. He won a scholarship to study in an elite school where he was bullied because his classmates said he \u201csmelled poor.&#8221; Chopra landed at Stanford to study computer science but realized he hated the \u201chow you make a billion dollars&#8221; mindset he encountered there.\u00a0<\/p>\n<p>   After graduating in 2017, he began working on his long-held interest: using technology to tackle poverty. \u201cIt takes a mere $1,500 in savings to make an Indian eligible to enter the middle class,&#8221; Chopra said. \u201cBut the impoverished can take 200 years to reach that level of savings.&#8221;<\/p>\n<p>   Microsoft, he learned, had been paying a hefty amount for collecting speech data, albeit of poor quality, to feed its AI systems and research. 
In 2017, for instance, although 1 million hours of digitized spoken data was available in Marathi, a language spoken in Mumbai and its Western India region, only 165 hours was available for purchase. His startup has since put together 10,000 hours of Marathi speech data for Microsoft\u2019s AI services, read by men and women from five different\u00a0regions.\u00a0<\/p>\n<p>   \u201cTech companies want the data, accent and all,&#8221; Chopra said. \u201cYou cough, they want that in the speech \u2013 it represents natural language.&#8221;<\/p>\n<p>   Saikat Guha, a researcher at Microsoft Research India who focuses on the ethics of data collection, said he has also used Karya\u2019s content for a project to aid those with visual disabilities in finding jobs. \u201cThe quality of data is far better than any other source I\u2019ve used,&#8221; said Guha. \u201cIf you pay workers fairly, they\u2019re more invested in their work, and the end result is better data.&#8221;<\/p>\n<p>   Meanwhile, over 30,000 young, school-educated women are working with Karya to help collect \u201cgender intentional&#8221; datasets \u2013 such as that the doctor or boss isn\u2019t always a he \u2013 in six Indian languages for the Bill &amp; Melinda Gates Foundation. It\u2019s the biggest such effort in Indian languages and will serve as a corpus to build datasets to reduce gender-related biases in LLMs.<\/p>\n<p>   Karya isn\u2019t stopping with India. The company said it\u2019s in talks to sell its platform as a service to organizations in Africa and South America who will do similar work.<\/p>\n<p>   For now, women in Yelandur, another village southwest of Bangalore, eagerly await Karya\u2019s next project: transcribing from a Kannada audio recording. 
Among them is Shambhavi S., 25, who earned a few thousand rupees from a previous assignment while working in the quiet of her home after feeding her in-laws dinner and putting her children to bed.\u00a0<\/p>\n<p>   \u201cI don\u2019t know what artificial intelligence is, I haven\u2019t heard of it,&#8221; said Shambhavi. \u201cI want to earn and educate my children, so they can learn how to use it.&#8221;<\/p>\n<p>\n\t\tUpdated: 03 Nov 2023, 02:04 PM IST\n\t<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Preethi, who goes by a single name, as is common in the region, is among the 70 workers hired in Agara and neighboring villages by a startup called Karya to gather text, voice and image data in India\u2019s vernacular languages. She is part of a vast, unseen global workforce \u2014\u00a0operating in countries like India, Kenya [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":186560,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[440,284,263,262,29,260,259,258,73,289,13984,265,101,202,261,264],"class_list":["post-186559","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-metaverse","tag-ai","tag-artificial-intelligence","tag-axie-infinity","tag-axs","tag-business","tag-decentraland","tag-facebook","tag-game","tag-google","tag-india","tag-manu-chopra","tag-mark-zuckerberg","tag-microsoft","tag-nft","tag-sandbox","tag-vr"],"_links":{"self":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/186559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/comments?post=186559"}],"version-history":[{"count":2,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/186559\/revisions"}],"predecessor-version":[{"id":186562,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/posts\/186559\/revisions\/186562"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v
2\/media\/186560"}],"wp:attachment":[{"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/media?parent=186559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/categories?post=186559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dripp.zone\/news\/wp-json\/wp\/v2\/tags?post=186559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}