3 Indian AI Startups in the Spotlight
Three founders, three categories, one operating insight: India’s AI advantage is not in the model. It is in the workflow that makes the model useful.
The Indian AI story is finally starting to move beyond hype.
For most of the last two years, much of the conversation about Indian AI orbited around the model layer - whether Indian labs could build frontier-grade systems, who was funding them, how the cost curves would shake out. Those questions are still live, and companies like Sarvam are answering them in their own ways. But for the much larger universe of Indian AI builders sitting outside the foundation-model race, the practical reality on the ground is simpler. The model layer is being rented. Our recent stack survey of 244 Indian builders made this concrete: 74% run on closed proprietary APIs, 9% on Western open-weights, 3% on Chinese open-weights. For most builders, the model is no longer the bottleneck. It is the input.
The more interesting question, then, is what kinds of AI companies India can uniquely produce on top of that input. The answer, increasingly, looks like a voice layer purpose-built for messy multilingual business calls that no Silicon Valley product manager has ever sat next to. It looks like a software creation platform that turns natural-language intent into a production workflow rather than a one-shot demo. It looks like a storytelling system where serialised content is written, validated, localised, and stress-tested at machine speed against millions of listener journeys.
Over the last few weeks, we sat down with three founders building exactly this kind of company: Pocket FM, Rocket, and Bolna. Different categories. Different stages. Different go-to-market motions. But across the conversations, one architectural pattern kept recurring - each of them is using AI not to produce a single impressive output, but to collapse a bottleneck inside real workflows that nobody had previously owned cleanly.
The common thread is worth stating plainly. India’s AI winners are unlikely to win by owning the model. They will win by owning the workflow where the model becomes useful - the messy, sticky, distinctly local layer where intent meets execution and where every additional unit of usage hardens the moat.
What follows is a structured profile of each company. Same shape, three times: the facts, what they do, why they are unique, the big opportunity, the risks, and our final take.
Bolna: A Voice Operating Layer for Indian Business
What does Bolna do?
Bolna is a voice AI orchestration platform purpose-built for India’s telephony complexity. Enterprises deploy voice agents from their own transcripts and FAQs, with support for 10+ Indian languages, 50+ accents, sub-500ms latency, and the kind of code-switched conversations (Hinglish, English + Tamil, Hindi → Gujarati mid-sentence) that break global voice stacks. Customers go live in days, not the weeks a legacy IVR replacement typically takes.
The product is delivered through two motions: a self-serve platform for SMBs and modern startups (plug in a number, point at a CRM, run agents within minutes), and a forward-deployed engineering team for enterprise rollouts where voice workflows need to be embedded into core operations.
Why is it unique?
Bolna’s defining choice is that it is not a single-model company. CEO Maitreya Wagh’s position is direct: no single foundation model can meet enterprise voice needs across India. Instead, Bolna runs an orchestration layer that picks the best-fit model for the language, accent, latency budget, and use case on every call - and re-routes as better models arrive. The model is rented. The orchestration is owned.
The founders describe four reinforcing moats underneath that orchestration.
Scale. Volume gives Bolna access to the underlying models at unit costs that new entrants cannot match.
Orchestration as a product surface. Different models route to different purposes within a single call, and the voice agent connects to CRM data, email, routing logic, and downstream workflow tools. This is not a chatbot. It is an agentic workflow embedded inside a real business process.
Tacit knowledge from running calls at India scale. How Indian users actually interrupt. Where latency breaks the experience. When prompts need to be rewritten. When to escalate to a human. None of this lives in a paper. It accumulates only by shipping.
Distribution via self-serve. A user plugs in their own number, completes verification, points at their CRM, and runs an agent against their own customers within minutes. In a category that otherwise feels procurement-heavy, that frictionless onboarding does as much work as the orchestration underneath.
The most India-specific insight, and the one no global competitor will reach without years in market: background noise is a semantic risk, not just an audio problem. Noise-cancellation models biased toward English can suppress real Indian speech because the system treats non-English audio as noise. That is a product truth Bolna had to learn by running calls.
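The per-call routing idea described above can be sketched in a few lines. This is a minimal illustration, not Bolna's implementation: the model names, quality scores, latencies, and prices are all hypothetical, and a real router would also weigh accent, use case, and live failure rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    name: str                 # hypothetical identifier, not a real vendor model
    languages: frozenset      # language codes the model handles
    p95_latency_ms: int       # illustrative latency figure
    quality: float            # illustrative 0-1 quality score
    cost_per_min_inr: float   # illustrative price


def pick_model(catalogue, language, latency_budget_ms):
    """Pick the highest-quality model that supports the language and fits
    the latency budget; cheaper models win ties."""
    eligible = [
        m for m in catalogue
        if language in m.languages and m.p95_latency_ms <= latency_budget_ms
    ]
    if not eligible:
        raise LookupError(f"no model for {language!r} within {latency_budget_ms} ms")
    return max(eligible, key=lambda m: (m.quality, -m.cost_per_min_inr))


# Hypothetical catalogue. Swapping in a better model is a data change,
# not a code change, which is the point of owning the routing layer.
CATALOGUE = [
    VoiceModel("model-a", frozenset({"hi", "en"}), 300, 0.80, 1.2),
    VoiceModel("model-b", frozenset({"hi", "ta", "gu"}), 450, 0.90, 1.8),
    VoiceModel("model-c", frozenset({"hi", "ta"}), 700, 0.95, 2.5),
]

print(pick_model(CATALOGUE, "hi", 500).name)
```

With a 500 ms budget the Hindi call goes to the mid-tier model; relax the budget and the router picks the slower, higher-quality one. Adding a new entry to the catalogue changes the routing without touching the call logic.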
The big opportunity
India runs over a billion voice calls a day across support, recruitment, logistics, onboarding, and collections. Most of it still flows through legacy IVR or human-led workflows that are expensive and slow to scale across languages.
The cost case is striking - a typical human call costs ₹6-7 in raw spend, ₹13-14 fully loaded; Bolna’s AI calls drop as low as ₹2. But Maitreya is careful that the better argument is structural rather than cost-based: AI can fetch data from multiple systems mid-call, respond at consistent speed, and hold quality across high-volume workflows where human teams’ performance degrades. The cost saving is a side-effect of the structural advantage.
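For readers who want the cost figures as percentages, the arithmetic is short. The sketch below uses only the per-call numbers quoted above and treats ₹2 as the best case, since the source says AI calls drop "as low as" that figure; the function names are ours.

```python
# Per-call figures quoted in the text, in rupees.
AI_CALL_INR = 2.0  # best-case figure ("as low as")
HUMAN_CALL_INR = {"raw": (6.0, 7.0), "fully_loaded": (13.0, 14.0)}


def saving_fraction(human_cost, ai_cost=AI_CALL_INR):
    """Share of the human per-call cost that the AI call avoids."""
    return (human_cost - ai_cost) / human_cost


def saving_range(basis):
    """Best-case saving at the low and high end of the quoted human cost."""
    low, high = HUMAN_CALL_INR[basis]
    return saving_fraction(low), saving_fraction(high)


for basis in HUMAN_CALL_INR:
    lo, hi = saving_range(basis)
    print(f"{basis}: {lo:.0%} to {hi:.0%} saved per call")
```

On these numbers the best-case saving is roughly 67-71% against raw spend and 85-86% against fully loaded cost.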
They are also clear about where humans still win. Expensive purchases. Sensitive support. Hiring conversations. Anywhere trust is the deliverable. Voice AI shines in conversational information collection, reminders, qualification, follow-ups, and the long tail of repetitive business calls.
If Bolna becomes the orchestration layer Indian enterprises standardise on, the platform sits at the intersection of telephony, CRM, and agentic workflows. International optionality is real - 10+ countries already - but the right way to read that is as proof the orchestration logic generalises, not as a near-term GTM bet.
The risks
Three are worth stating plainly.
The first is competitive. Vapi, LiveKit, VoiceRun and others are building voice orchestration globally, and US-funded competitors will eventually arrive in India with better-capitalised playbooks. Bolna’s head-start has to convert into either default-distribution or deep integration moats before that happens.
The second is concentration. Most current customers are SMBs paying small ticket sizes. Bolna has signed two large enterprise customers and has four more in pilot - real progress, but the path from current scale to a venture-scale outcome runs through enterprise. That motion is operationally heavier (forward-deployed engineers, custom implementations) and slower-compounding than self-serve.
The third is category-definition. Voice AI is still in flux as a category. Is it infrastructure (priced like Twilio), application software (priced like a SaaS seat), or services (priced per-call)? The choice shapes margins, GTM, and competitive set. The founders seem clear-eyed about this, but the answer will only emerge through the next eighteen months of contracts.
Our final take
The orchestration-over-single-model bet is the architectural choice we find most defensible in Indian voice AI today, and Bolna's commitment to that bet is what makes the company worth watching closely. Whether it converts into a durable position depends less on the orchestration logic itself, which is straightforwardly correct on its own terms, than on whether the four reinforcing layers underneath - scale economics, tacit operational knowledge, agentic workflow integration, and self-serve distribution - genuinely compound at the pace the early traction suggests. The next eighteen months will tell us whether the enterprise motion catches up to the self-serve, and whether the head-start translates into the kind of category-defining position that survives the eventual arrival of better-capitalised global competitors. The signals so far are encouraging. The verdict is not yet in.
Rocket: Product Creation as a Natural-Language Workflow
What does Rocket do?
Rocket.new turns a natural-language prompt into a production-ready application - landing pages, dashboards, internal tools, native mobile apps, full-stack SaaS. It is a vibe-coding platform in the same broad category as Lovable, Bolt, and Replit, but with a different architectural bet: it takes longer (~25 minutes) on the first generation in exchange for a much more complete output, and is built around the assumption that day one is the start of the product, not the end of it.
The platform combines frontier models from Anthropic, OpenAI, and Google with Rocket’s own deep learning systems, trained on a proprietary 10M Figma-to-code corpus inherited from DhiWise. A multi-agent orchestration layer breaks high-level prompts into structured sub-prompts, generates across Next.js, React, and Flutter, and merges outputs into a single coherent application. A continuous RLHF pipeline improves output quality over time.
Why is it unique?
Vishal’s framing is simple enough to repeat in one line: day one is easy; day two is where the real product begins.
Take a landing page, the canonical AI-generated artefact. A founder can spin one up in minutes today, and it will look perfectly serviceable. Then the actual work begins. Are leads coming through? Is the page ranking? Should the copy be personalised for visitors arriving from India, the Gulf, or the US? Should the team run an A/B test? App generation is the output. The actual customer wants the outcome. Almost everyone in the category is pricing the first thing. Rocket is built for the second.
This thesis was not invented post-ChatGPT. Rocket’s parent company DhiWise launched a code-generation platform in 2021, before the LLM wave, and the hard learning from that era is what shapes the current product. Code generation alone, Vishal says, does not create wide adoption. Users could ship in a day, but unless the application was reliable, maintainable, and capable of growing with them, they would forget about it within a month.
DhiWise also gave the team an engineering base most vibe-coding entrants do not possess - components built across the full software lifecycle (requirement analysis, design-to-code, end-to-end testing, code generation). That structural depth is what keeps pure AI generation from collapsing into architectural mess at scale. The first 90% of the work now happens in 10% of the effort. The remaining 10% - taste, polish, edge cases, last-mile quality - is where products are still differentiated, and it is what Vishal is building Rocket around.
The proprietary data advantage is the second half of the moat. Rocket is trained on 10 million Figma-to-code pairs accumulated through DhiWise. That dataset cannot be quickly replicated by a new entrant, and it produces UI code with higher fidelity and better architectural soundness than tools trained on generic public data.
The Surat point is also worth noting. Vishal does not frame Surat as a constraint - he frames it as focus: strong local engineering talent, no relocation merely because the market expects serious startups to be in Bangalore, and an anti-hype philosophy that runs against the grain of an extremely loud category.
The big opportunity
The vibe-coding market is in explosive top-of-funnel growth with unclear unit economics. Most platforms are racing to ship the first generation faster; few are building the iteration, deployment, and outcome layer that determines whether users stay past the first month. Rocket’s traction signal is strong because it is built around that second layer: 30-40% MoM growth, double-digit million ARR, ~$3K annual ARPU, 50-55% gross margins, 400,000+ users across 180 countries, 85% of recent cohort apps being serious projects.
The bigger opportunity sits beyond app generation. Vishal’s stated ambition is for Rocket to become a comprehensive agentic system that builds the app, but also runs competitive research, conversion analysis, and product iteration. That is a much larger company than a code-generation tool. It is closer to a software agency rebuilt as a self-serve product surface - what Salesforce Ventures, in announcing the round, called a “full-stack AI solutioning engine.”
Geography is instructive. The US is already the largest revenue contributor at 26%, with Europe at 15-20% and India at 10%. The Palo Alto office and Salesforce Ventures relationship suggest enterprise distribution in the US is the next vector to unlock - while the engineering and product centre of gravity stays in Surat.
The risks
The category is the biggest risk. Vibe coding has become the most crowded segment of consumer AI, with billions of dollars flowing into Lovable, Bolt, Replit, Cursor, and adjacent players. Differentiation has to be defended every quarter, and the moat needs to compound faster than the next model release commoditises individual features.
The 25-minute generation time is a deliberate trade-off, not a bug. But it does create real friction in a category where competitors generate output in three minutes. Rocket has to keep proving that the additional time is repaid in quality and in downstream usability. Vishal’s bet is that day-two outcomes are what users will eventually be paying for; the question is whether they realise it before they bounce on day one.
Last-mile production is a structural challenge. Production-ready software requires hosting, databases, auth, payments, deployment, monitoring, and a long tail of edge cases. Each new layer is a potential drop-off point, and the more Rocket owns the iteration loop, the more operational surface area it inherits.
Our final take
The day-one-vs-day-two diagnosis is the part of Vishal's thinking that most clearly distinguishes Rocket from the rest of the vibe-coding category. Most platforms in this space are competing on speed of first generation, which is a feature race that the underlying models will commoditise within a year or two; Rocket is competing on completeness and second-month outcomes, which is a much harder thing to copy and a much more defensible thing to own. The 10M Figma-to-code corpus inherited from DhiWise is the kind of asset that gets harder to replicate the longer Rocket operates, and the engineering discipline visible in the multi-agent orchestration layer suggests the team can actually execute on the architectural ambition. Whether all of this converts into a category-defining position depends on whether Rocket can extend beyond app generation into the broader agentic product loop fast enough to stay ahead of category compression. The strategy is sound. What remains is execution.
Pocket FM: A Closed-Loop Storytelling System
What does Pocket FM do?
Pocket FM is the global category leader in long-form serialised audio entertainment, with operations across 20+ countries. The product is structured around episodic audio dramas - fiction, romance, thrillers, horror, self-help - distributed through a freemium model where listeners make micro-payments to unlock new episodes.
What makes the company genuinely interesting in this edition, and what most coverage misses, is the architecture underneath. In the company’s own framing, Pocket has shifted from a content marketplace to an AI-native storytelling system. A proprietary AI Co-pilot - purpose-built for long-form storytelling and trained on billions of minutes of listener engagement and retention data - powers an end-to-end content supply chain across three coordinated stages: creation, blockbuster identification, and scale & expansion.
Crucially, even leading general-purpose models, such as those from OpenAI or Anthropic’s Claude, are not designed to do what this system does. The Co-pilot can produce hundreds of hours of coherent narrative while maintaining character consistency, emotional arcs, and core storytelling principles - the long-form coherence problem that horizontal models do not solve. It is not a generic LLM applied to fiction. It is a specialised system optimised against a different objective function entirely.
Why is it unique?
Pocket FM treats engagement as a closed-loop reward signal. Completion rate, drop-off curves, binge depth, re-listen frequency, session continuation - each is treated as an implicit reward function in reinforcement learning loops. Instead of optimising for next-token likelihood, the model is optimised toward sequences that maximise sustained engagement, tighter pacing, and stronger narrative hooks. Layered on top, the corpus of high-performing stories paired with fine-grained engagement annotations serves as a supervised fine-tuning signal that teaches the model not just how to write coherently, but how to write well in a specific genre, language, or audience context.
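To make "engagement as a reward signal" concrete, here is a toy sketch that collapses a few of the listed metrics into one scalar and ranks candidate episodes by it. Everything in it is an assumption for illustration: the field names, the weights, and the linear form are ours, and this is a scoring function only, not the reinforcement learning loop the company describes.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    completion_rate: float     # share of listeners who finish the episode, 0-1
    continued_session: float   # share who start the next episode, 0-1
    relisten_rate: float       # share who replay the episode, 0-1


# Hypothetical weights; a real system would learn or tune these.
WEIGHTS = {"completion_rate": 0.5, "continued_session": 0.4, "relisten_rate": 0.1}


def engagement_reward(stats: EpisodeStats) -> float:
    """Weighted sum of engagement metrics, used as a scalar reward."""
    return sum(weight * getattr(stats, field) for field, weight in WEIGHTS.items())


def rank_candidates(candidates: dict[str, EpisodeStats]) -> list[str]:
    """Order candidate episodes from highest to lowest reward."""
    return sorted(candidates, key=lambda name: engagement_reward(candidates[name]), reverse=True)


drafts = {
    "slow_open": EpisodeStats(0.60, 0.30, 0.05),
    "cliffhanger": EpisodeStats(0.80, 0.70, 0.10),
}
print(rank_candidates(drafts))
```

The design point is that the objective is defined by listener behaviour rather than by text likelihood: two drafts that read equally well can score very differently once retention is measured.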
At sufficient volume, the system also discovers structural patterns - tension buildup, cliffhanger placement, character introduction timing, emotional cadence - that consistently correlate with retention. These patterns are not visible from smaller datasets. They emerge clearly only when you observe millions of listener journeys, and once extracted, they become priors guiding new stories. The system is not just generating coherent prose. It is generating prose inside structural rules empirically validated against audience behaviour at scale.
The localisation engine is the second piece worth pausing on. Pocket treats localisation as a system-design problem, not a translation task. Translation gets the words right; Pocket’s pipeline preserves the experience. Before any rewriting begins, the system extracts a complete map of the source story - world, characters, relationships, cultural elements - and builds a cultural transposition strategy. Adaptation then proceeds in three phases: characters first, then dependent entities like places and institutions (with graph-based grouping ensuring a renamed family cascades correctly across estates, titles, and symbols everywhere they appear), and only then a chapter rewrite that preserves tone, rhythm, and emotional equivalence. Surface details may change - food, setting, customs - but the underlying feeling stays intact.
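The cascade described above, where renaming a family propagates to its estates and titles, can be sketched as a small dependency graph. This is an illustrative toy under our own assumptions: the entity names, the templates, and the plain string replacement are hypothetical stand-ins, and the final rewrite in a real pipeline would preserve tone and rhythm rather than swap strings.

```python
from collections import deque

# entity -> (parent entity or None, name template using {parent})
ENTITIES = {
    "family": (None, "Ashford"),
    "estate": ("family", "{parent} Manor"),
    "title": ("family", "Lord {parent}"),
    "crest": ("estate", "Crest of {parent}"),
}


def resolve_names(entities, overrides=None):
    """Resolve names parent-first so a renamed root cascades to dependents."""
    overrides = overrides or {}
    children = {}
    for key, (parent, _) in entities.items():
        children.setdefault(parent, []).append(key)
    names = {}
    queue = deque(children.get(None, []))  # start from root entities
    while queue:
        key = queue.popleft()
        parent, template = entities[key]
        if key in overrides:
            names[key] = overrides[key]
        elif parent is None:
            names[key] = template
        else:
            names[key] = template.format(parent=names[parent])
        queue.extend(children.get(key, []))
    return names


def rewrite(text, old, new):
    """Replace old names with new ones, longest first, so that
    'Ashford Manor' is handled before the bare 'Ashford'."""
    for key in sorted(old, key=lambda k: len(old[k]), reverse=True):
        text = text.replace(old[key], new[key])
    return text


source = resolve_names(ENTITIES)
adapted = resolve_names(ENTITIES, {"family": "Rathore", "estate": "Rathore Haveli"})
print(rewrite("Lord Ashford returned to Ashford Manor.", source, adapted))
```

Resolving characters first and dependents second mirrors the phase order in the text: the crest is never renamed directly, it simply inherits whatever the estate became.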
The position on what stays human is unambiguous. Creativity itself remains entirely with the writer. The ideas, imagination, cultural instincts, emotional truth - all of it is human. The process is writer-led, with creators retaining full control and agency over every aspect of the story. The Co-pilot acts as a creative partner, supporting structure, continuity, and optimisation, while remaining fully directed by the writer. AI does not replace creativity. It compounds it.
The big opportunity
Pocket FM is operating in the largest revenue pool of the three companies and arguably one of the largest categories in consumer AI globally - entertainment. The numbers tell the story: a UGC ecosystem at $50M ARR within 12 months; AI-enabled creator-led shows at ~25% of total platform playtime; 80+ blockbuster IPs each generating $1M+ in revenue, with 4+ new blockbusters added every month; ~$50M ARR in Europe within 12 months; and LATAM accelerating.
The Co-pilot does not just produce content. It changes the cost structure of producing hits. The company reports 4x more hits at 50x lower cost. Traditional content businesses underwrite a small number of expensive bets and absorb a high failure rate. Pocket is underwriting a much larger number of cheaper bets and using engagement signals to identify the winners early.
The terminal opportunity is to become the global default infrastructure for long-form serialised content - first audio, then potentially adjacent formats. The Co-pilot, the engagement-annotated corpus, the structural pattern discovery, and the localisation engine are all assets that travel across formats, languages, and geographies.
The risks
Three matter, in declining order of severity.
The first is regulatory. Pocket has faced lawsuits in the US over the classification of independent contractors, with creators alleging circumvention of US wage and worker-classification laws. The outcome of these cases, and the precedent they set, materially affects the cost structure of the creator economy underneath the platform. This is the single most important diligence item for any serious investor or partner.
The second is content homogenisation. AI-assisted serialised content carries a real risk that scaled-up production starts to feel formulaic - and a 25% playtime contribution from AI-enabled shows is large enough that any quality drift would be visible in retention metrics. The structural pattern discovery that powers the Co-pilot also creates the risk of producing structurally similar stories. The company’s edge will depend on whether the workflow continues to surface genuinely differentiated stories or merely scaled-up variants.
The third is competitive. Kuku FM, Audible’s expansion into long-form fiction, ByteDance’s audio plays, and Spotify’s serialised content investments are all operating in adjacent territory. Pocket has a meaningful technical and data lead today, but the moat is engagement compounding - and that compounds slowly.
Our final take
The architectural choice that distinguishes Pocket FM from the rest of the AI-native content category is the treatment of engagement as a reward signal rather than as a vanity metric, and watching how that closed-loop system performs over the next eighteen months will tell us a great deal about whether the underlying pattern generalises beyond audio entertainment to other long-form content categories. The engagement-annotated corpus, the structural pattern discovery layer, and the localisation engine are each genuinely difficult to replicate, and the early traction across India, Europe, and LATAM suggests the system is doing real work rather than benefiting from temporary tailwinds. The regulatory overhang in the US, around independent contractor classification and creator economy compliance, is the single live question that could materially change the cost structure of the platform underneath, and any honest assessment of Pocket's trajectory has to weight that question heavily. The architecture is genuinely promising.
India AI News Roundup
The most impactful AI developments & announcements shaping India in recent weeks.
Nandan Nilekani: India doesn’t need to lead the world in building the most advanced AI models
Pixxel, Sarvam join hands to build orbital data centre satellite
Yotta, Gorilla Technology expand India AI infra pact with $2.8 billion project
AI startup Nava raises $22 million in round led by Greenoaks Capital
Startup Signals
Spotlighting brand new emerging AI startups from India every month, early and undiscovered.
OpenRound: AI-native engineering hiring assessments
Every method engineering teams use to evaluate candidates - LeetCode puzzles, multi-day take-homes, week-long work trials - was designed before AI changed how engineers ship, and most of them are now one-shotted by the same tools the candidates would use on the job. OpenRound is rebuilding the assessment workflow against that new reality. Candidates work on real codebases with full access to AI coding agents through their own CLI, and are evaluated across six dimensions including analysis, planning, judgment, execution, and AI collaboration itself. A single ninety-minute session replaces a process that typically runs three weeks and twelve interviewer-hours. Built by Fabric and already deployed at companies including Aays Analytics, OpenRound has positioned itself early in a category that is being fundamentally rewritten rather than incrementally improved.
Navana: Foundational voice AI infrastructure for Indian enterprise
India’s voice AI landscape is splitting into two architectural bets - orchestration over rented models, and proprietary foundation models for Indian linguistic complexity - and Navana is the cleanest expression of the second. Its proprietary speech engine, Bodhi, is trained on hundreds of thousands of real-world conversations across 12 Indian languages and 40+ dialects, built specifically to handle code-switching, overlapping speech, and ambient noise that global voice stacks treat as edge cases. The company has aligned early with the heaviest end of Indian enterprise procurement: ISO 27001:2022, SOC 2 Type II, RBI guidelines, on-premises deployment, and full data localisation. Hello Ujjivan, its deployment with Ujjivan Small Finance Bank, now serves 1.5M+ users in 12 Indian languages. It is built for the 900M+ Indians, urban and rural, that the global voice stack was not designed around.
Atlas: AI as a junior accountant for independent firms
The accounting profession has shed 300,000+ professionals in the US since 2019, and smaller independent firms - squeezed between rising labour costs and the inability to absorb the kind of automation that the largest firms already have - are the structural beneficiaries of a workflow rewrite. Atlas, founded in 2025 by Arpit Maheshwari and Jagmal Singh (ex-CarDekho, PolicyBazaar), is building software that embeds AI across both client-facing and back-office accounting workflows, with a deliberately human-in-the-loop architecture that positions AI as a junior accountant rather than a replacement. The company reports productivity gains of more than 5x on certain workflows in early deployments. It raised a $6M seed round in April 2026, co-led by Accel and Stellaris Venture Partners, and is targeting a $150B global market with go-to-market focused on North America.



