Why voice is emerging as India's next frontier for AI interaction

Published

July 16, 2025

Dripp

Unlike text, which is relatively uniform, spoken language is richly layered, carrying cultural nuances, colloquialisms and emotion. Startups building voice-first AI models are now doubling down on one thing above all else: the depth and diversity of datasets.

Why voice is emerging as the frontline interface

In India, where oral tradition plays a pivotal role in communication, voice isn’t just a convenience—it’s a necessity. “We’re not an English-first or even a text-first country. Even when we type in Hindi, we often use the English script instead of Devanagari. That’s exactly why we need to build voice-first models—because oral tradition plays such a vital role in our culture,” said Abhishek Upperwal, chief executive officer (CEO) of Soket AI Labs.

Voice is also proving critical for customer service and accessibility. “Voice plays a crucial role in bridging accessibility gaps, particularly for users with disabilities,” said Mahesh Makhija, leader, technology consulting, at EY.

“Many customers even prefer voicing complaints over typing, simply because talking feels more direct and human. Moreover, voice is far more frictionless than navigating mobile apps or interfaces, especially for users who are digitally illiterate, older, or not fluent in English,” said Makhija, adding that “communicating in vernacular languages opens access to the next half a billion consumers, which is a major focus for enterprises.”

Startups like Gnani.ai are already deploying voice systems across banking and financial services to streamline customer support, assist with loan applications, and eliminate virtual queues. “The best way to reach people—regardless of literacy levels or demographics—is through voice in the local language, so it’s very important to capture the tonality of the conversations,” said Ganesh Gopalan, CEO of Gnani.ai.

The hunt for rich, real-world data

As of mid-2025, India’s AI landscape shows a clear tilt toward text-based AI, with over 90 Indian companies active in the space, compared to 57 in voice-based AI. Text-based platforms tend to focus on document processing, chat interfaces, and analytics. In contrast, voice-based companies are more concentrated in customer service, telephony, and regional language access, according to data from Tracxn.

In terms of funding, voice-first AI startups have attracted larger funding rounds at later stages, while text AI startups show broader distribution, especially at earlier stages.

For example, Skit.ai, a voice-first AI firm, has raised a total of $47.6 million across five funding rounds. Similarly, Yellow.ai has cumulatively secured around $102 million, including a major $78.15 million Series C round in 2021, making it one of the top-funded startups in voice AI, according to Tracxn data.

However, data remains the foundational challenge for voice models. Voice AI systems need massive, diverse datasets that cover not only different languages, but also regional accents, slang and emotional tonality.

Chaitanya C., co-founder and chief technology officer of Ozonetel Communications, put it simply: “The datasets matter the most. Speaking as an AI engineer, I can say it’s not about anything else; it’s all about the data.”

The IndiaAI Mission has allocated ₹199.55 crore for datasets, just about 2% of the mission’s total ₹10,300 crore budget, while 44% has gone to compute. “Investments solely in compute are inherently transient; their value fades once consumed. On the other hand, investments in datasets build durable, reusable assets that continue to deliver value over time,” said Chaitanya.
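The budget split quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the article; the variable and function names are illustrative, and the implied compute amount is derived from the quoted 44% share rather than reported directly.

```python
# Figures quoted in the article, in crore rupees.
DATASET_ALLOCATION_CRORE = 199.55
TOTAL_BUDGET_CRORE = 10_300.0
COMPUTE_SHARE = 0.44  # "44% has gone to compute"


def share_of_budget(part: float, total: float) -> float:
    """Return part as a fraction of total."""
    return part / total


dataset_share = share_of_budget(DATASET_ALLOCATION_CRORE, TOTAL_BUDGET_CRORE)

# Derived figure (an assumption of this sketch): 44% of the total budget.
compute_allocation_crore = COMPUTE_SHARE * TOTAL_BUDGET_CRORE

print(f"Dataset share of budget: {dataset_share:.2%}")
print(f"Implied compute allocation: about {compute_allocation_crore:,.0f} crore")
print(f"Compute-to-dataset ratio: about {compute_allocation_crore / DATASET_ALLOCATION_CRORE:.0f}x")
```

On these numbers the dataset allocation comes to a little under 2% of the total, which matches the article's "just about 2%".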

He also emphasized the scarcity of rich, culturally relevant data in regional languages like Telugu and Kannada. “The amount of data easily available in English, when compared with Telugu and Kannada or Hindi, it’s not even comparable,” he said. “Somewhere it’s just not perfect, it wouldn’t be as good as an English story, which is why I wouldn’t want it to tell a Telugu story for my kid.”

“Some movie comes out, nobody’s going to write it in government documents, but people are going to talk about it, and that is lost,” he added, pointing out that government datasets often lack cultural nuance and everyday language.

Gopalan of Gnani.ai agreed. “The colloquial language is often very different from the written form. Language experts have a great career path ahead of them because they not only understand the language technically, but also know how to converse naturally and grasp colloquial nuances.”

Startups are now employing creative methods to fill these gaps. “First, we collect data directly from the field using multiple methods—and we’re careful with how we handle that data. Second, we use synthetic data in some cases. Third, we augment that synthetic data further. In addition, we also leverage a substantial amount of open-source data available from universities and other sources,” Gopalan said.

Synthetic data is artificially generated data that mimics real-world data for use in training, testing, or validating models.

Upperwal added that Soket AI uses a similar approach: “We start by training smaller AI models with the limited real voice data we have. Once these smaller models are reasonably accurate, we use them to generate synthetic voice data—essentially creating new, artificial examples of speech.”
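The bootstrapping idea Upperwal describes can be illustrated with a toy sketch. This is not Soket AI's pipeline: a single number stands in for an audio clip, a per-label mean and standard deviation stands in for a "small model", and the labels, the accuracy threshold, and all function names are hypothetical. It only shows the shape of the loop: train on limited real data, check accuracy on held-out real data, then generate synthetic examples.

```python
import random
import statistics


def train_small_model(real_samples):
    """Fit a mean and standard deviation per label (a stand-in for a small model)."""
    by_label = {}
    for label, value in real_samples:
        by_label.setdefault(label, []).append(value)
    return {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in by_label.items()
    }


def accuracy(model, samples):
    """Classify each sample by nearest label mean; return the fraction correct."""
    correct = 0
    for label, value in samples:
        predicted = min(model, key=lambda name: abs(value - model[name][0]))
        correct += predicted == label
    return correct / len(samples)


def generate_synthetic(model, per_label, rng):
    """Sample new artificial examples from the fitted per-label distributions."""
    return [
        (label, rng.gauss(mean, std))
        for label, (mean, std) in model.items()
        for _ in range(per_label)
    ]


rng = random.Random(0)

# Toy "real" data: 20 samples each for two hypothetical language labels.
real = [("hi", rng.gauss(1.0, 0.2)) for _ in range(20)]
real += [("te", rng.gauss(3.0, 0.2)) for _ in range(20)]
train, held_out = real[::2], real[1::2]

model = train_small_model(train)

ACCURACY_THRESHOLD = 0.9  # hypothetical bar for "reasonably accurate"
synthetic = []
if accuracy(model, held_out) >= ACCURACY_THRESHOLD:
    synthetic = generate_synthetic(model, per_label=50, rng=rng)

augmented = train + synthetic
print(f"real: {len(train)}, synthetic: {len(synthetic)}, total: {len(augmented)}")
```

The accuracy gate is the key design choice in this sketch: synthetic examples are only produced once the small model does well on real data it has not seen, so that errors in the model are less likely to be amplified in the generated set.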

However, some intend to consciously stay away from synthetic data.

Ankush Sabarwal, CEO and founder of CoRover AI, said the company relies exclusively on real data and deliberately avoids synthetic data. “If I am a consumer and I am interacting with an AI bot, the AI bot will become intelligent by the virtue of it interacting with a human like me,” he said.

The ethical labyrinth of voice AI

As companies begin to scale their data pipelines, the new Digital Personal Data Protection (DPDP) Act will shape how they collect and use voice data.

“The DPDP law emphasizes three key areas. First, it mandates clear, specific, and informed consent before collecting data. Second, it enforces purpose limitation: data can only be used for legitimate, stated purposes like KYC or employment, not unrelated model training. Third, it requires data localization, meaning critical personal data must reside on servers in India,” said Makhija.

He added, “Companies have begun including consent notices at the start of customer calls, often mentioning AI training. However, the exact process of how this data flows into model training pipelines is still evolving and will become clearer as DPDP rules are fully implemented.”

Outsourcing voice data collection raises red flags, too. “For a deep-tech company like ours, voice data is one of the most powerful forms of IP (intellectual property) we have, and outsourcing it could compromise its integrity and ownership. What if someone is using copyrighted material?” said Gopalan.
