Why Voice-AI Wins Used-Car CX

The used-car platforms in India (Cars24, Spinny, CarDekho, Droom, and the long tail of regional players) are all running the same playbook: aggressive customer-acquisition spend, heavy inspection ops, warranty-backed trust products, and a customer service org that is quietly crushing their margins. Chat didn’t fix it. Ticketing didn’t fix it. Voice-AI, applied narrowly, actually can.

This is a position piece. We’ve worked adjacent to this space long enough to have opinions, and we think this is where the puck is going for the next 24 months of operational AI in India. Our case-study work on CX automation is at dotsai.in/case-studies.

The margin problem is a voice problem

Used-car platforms have a structural CX challenge that pure-play e-commerce companies don’t: the purchase is high-trust, high-ticket, and the post-sale surface area is huge. Test drives. Delivery rescheduling. RC transfer follow-ups. Warranty claims. EMI queries. Refund disputes. Insurance handoffs. Most of it happens on a phone call, in Hindi or a regional language, with emotional stakes on both sides.

₹18–25 lakh: typical monthly fully-loaded cost of a 40-agent voice support team in a Tier-1 Indian used-car BPO (ZeroOne internal benchmarking, 3 platforms, 2025–2026).

Cars24 has publicly discussed its unit-economics journey: getting to contribution-margin-positive has been a multi-year push.

Spinny has raised large rounds and explicitly focused on trust and refurbishment quality, all of which shows up as CX cost downstream.

The math is brutal. If you sell a used car at a 4–6% gross margin and a single contested refund consumes 40 minutes of a senior agent’s time plus 20 minutes of a supervisor’s review, you’ve burned a meaningful fraction of that deal’s margin on one ticket. Scale it to thousands of tickets a day and CX becomes a first-order profitability lever.

Why chat and ticketing didn’t close the gap

Every used-car platform we’ve looked at has already done the obvious: chatbots for FAQs, ticket routing, self-service portals, knowledge-base deflection. The results are real but bounded.

Chat deflects the easy queries. It does not touch the calls. And the calls are where the money lives — both the cost and the emotional moments that define retention.

Indian customers also prefer voice for high-stakes interactions. A 2024 Google/KPMG report on Indian internet behaviour found voice search and voice interaction adoption highest in exactly the demographics that buy used cars: Tier 2/3 cities, ages 30–45, household decision-makers.

What a voice agent has to do to be real

Most “voice AI” demos fail the used-car test. They’re trained on English, tuned for US accents, and they collapse the moment a customer switches between Hindi and English mid-sentence — which is how real Indian calls go.

A voice agent that actually works for used-car CX needs to clear six bars:

Hindi + English + code-switched Hinglish as a single conversational surface. Not a language toggle. Real-time mixed input.
Latency under 800ms end-to-end, measured from the moment the user stops speaking to the moment the agent starts speaking. Above 1.2s, hang-up rates rise measurably (LiveKit + Deepgram voice-AI latency whitepaper, 2024).
Domain-grounded RAG. The agent must answer from the platform’s actual policy documents, inventory database, and order system — not from LLM pretraining.
Escalation logic that doesn’t feel like escalation. The handoff to a human must carry full context so the caller doesn’t repeat themselves. That’s where most vendor demos collapse.
Call-quality observability. Post-call scoring, sentiment tracking, hallucination flagging. Without this you cannot improve the agent and you cannot trust it at scale.
A compliance and audit trail: every call logged, transcribed, searchable, and exportable. RBI rules and consumer-protection litigation make this non-optional.
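The latency bar is the easiest of the six to check mechanically. Below is a minimal sketch of a per-turn latency budget, assuming the pipeline can report timings for each stage. The stage names, the `TurnTiming` class, and the "ok" / "slow" / "at_risk" labels are our illustrative assumptions; only the 800ms target and the 1.2s ceiling come from the text above.

```python
from dataclasses import dataclass

TARGET_MS = 800    # end-to-end target from the text
CEILING_MS = 1200  # above this, the text says hang-ups rise


@dataclass
class TurnTiming:
    """Per-stage latency for one conversational turn, in milliseconds.

    The stage breakdown is hypothetical; a real pipeline may split differently.
    """
    endpointing_ms: float      # detecting that the caller stopped speaking
    asr_ms: float              # speech-to-text finalisation
    llm_first_token_ms: float  # time until the LLM emits its first token
    tts_first_audio_ms: float  # time until TTS produces its first audio
    network_ms: float          # telephony and transport overhead

    def total_ms(self) -> float:
        return (self.endpointing_ms + self.asr_ms + self.llm_first_token_ms
                + self.tts_first_audio_ms + self.network_ms)


def classify_turn(timing: TurnTiming) -> str:
    """Label a turn against the 800ms target and the 1.2s ceiling."""
    total = timing.total_ms()
    if total < TARGET_MS:
        return "ok"
    if total <= CEILING_MS:
        return "slow"
    return "at_risk"


fast_turn = TurnTiming(200, 150, 250, 120, 60)
slow_turn = TurnTiming(300, 200, 400, 200, 150)
print(classify_turn(fast_turn), classify_turn(slow_turn))
```

Tracking the share of "at_risk" turns per call-type is one way to feed the latency bar into the observability bar, rather than treating it as a one-off benchmark.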

60–80% of inbound used-car platform calls are deflectable to a well-tuned voice agent at production quality (ZeroOne client pilots, 2025–2026).

The math of a deployed voice agent

Let’s do the math on a single platform with 40 agents per shift running two shifts, or 80 agents in total.

Human cost: 80 agents × ₹35K fully-loaded = ₹28 lakh/month
A voice agent handling 70% of call volume at production quality reduces headcount need by ~50% (you keep senior agents for escalation and quality review)
Net savings: ~₹14 lakh/month, or ~₹1.7 crore/year, on one mid-size platform
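The same arithmetic, written out so the inputs can be swapped for a platform’s own numbers. This is a sketch using only the figures above; the function name and parameters are ours. Like the list above, it does not subtract the running cost of the voice pipeline itself.

```python
LAKH = 100_000
CRORE = 10_000_000


def monthly_savings(agents_per_shift, shifts, cost_per_agent, headcount_reduction):
    """Return (baseline monthly cost, monthly savings) in rupees."""
    headcount = agents_per_shift * shifts
    baseline = headcount * cost_per_agent
    savings = baseline * headcount_reduction
    return baseline, savings


# Figures from the text: 40 agents x 2 shifts, ₹35K fully-loaded, ~50% reduction
baseline, savings = monthly_savings(40, 2, 35_000, 0.50)

print(f"Baseline: ₹{baseline / LAKH:.0f} lakh/month")
print(f"Savings:  ₹{savings / LAKH:.0f} lakh/month, "
      f"₹{savings * 12 / CRORE:.2f} crore/year")
```

Run as-is, this gives ₹28 lakh/month baseline, ₹14 lakh/month saved, and ₹1.68 crore/year, which rounds to the ~₹1.7 crore quoted above.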

That’s before factoring in NPS improvements, 24/7 coverage (which reduces abandonment on after-hours calls), and the compounding data asset: every call becomes labeled training data for the next iteration.

Where this goes wrong

We’ve also seen the failure modes:

Vendor-led deployments where the platform doesn’t own the model, the prompt, or the data. Two years in, they’re locked in and cannot iterate.
Over-automating. Emotional calls — accident claims, contested deliveries, refund escalations — must route to humans fast. Agents that try to “handle” these create brand damage.
Ignoring accent coverage. A voice agent that works in Delhi Hindi but fails on South Indian Hindi or Bengali-inflected Hindi is shipping a broken product to 40% of its user base.
Metric illusions. “Average handle time reduced 30%” is not success if CSAT dropped 10 points. Measure the whole stack.
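The metric-illusion failure is cheap to guard against in code. Here is a minimal sketch of a release check that refuses to count a handle-time win if CSAT or escalation rate regressed. The `CxSnapshot` fields, the function name, and both thresholds are illustrative assumptions, not numbers from any deployment.

```python
from dataclasses import dataclass


@dataclass
class CxSnapshot:
    """Aggregate CX metrics for one measurement window (hypothetical schema)."""
    avg_handle_time_s: float
    csat: float             # 0-100 scale
    escalation_rate: float  # fraction of calls handed to a human


def evaluate_release(before: CxSnapshot, after: CxSnapshot,
                     max_csat_drop: float = 2.0,
                     max_escalation_rise: float = 0.05) -> dict:
    """Compare two windows and flag regressions that a handle-time win can hide."""
    aht_change_pct = ((after.avg_handle_time_s - before.avg_handle_time_s)
                      / before.avg_handle_time_s * 100)
    regressions = []
    if after.csat - before.csat < -max_csat_drop:
        regressions.append("csat")
    if after.escalation_rate - before.escalation_rate > max_escalation_rise:
        regressions.append("escalation_rate")
    return {
        "aht_change_pct": round(aht_change_pct, 1),
        "regressions": regressions,
        "ship": not regressions,
    }


# The scenario from the list above: handle time down 30%, CSAT down 10 points
before = CxSnapshot(avg_handle_time_s=400, csat=80, escalation_rate=0.20)
after = CxSnapshot(avg_handle_time_s=280, csat=70, escalation_rate=0.20)
result = evaluate_release(before, after)
print(result)
```

In this scenario the check reports the 30% handle-time reduction but still blocks the release because of the CSAT drop.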

See our consulting frameworks for how we think about avoiding these in engagements.

Why now

Three things converged in 2024–2026 that made this viable:

Inference cost. Real-time speech-to-text + LLM + TTS pipelines dropped roughly 80% in cost between mid-2023 and late 2025 (a16z Voice-AI State of the Stack 2025).
Indic language models. Sarvam AI, AI4Bharat, and Google’s Indic efforts pushed Hindi/Hinglish ASR and TTS quality across the usability threshold for customer-facing deployments.
Platform maturity. Used-car platforms now have enough CX data, structured inventory, and policy documentation to actually ground an agent properly. Three years ago they didn’t.

The ones who deploy well in 2026 will compound the advantage. The ones who wait until 2028 will be catching up on unit economics their competitors already booked.

What we’d do tomorrow

If we were advising a used-car platform COO today:

Pick one call-type. Test drive rescheduling. Nothing else. Ship a voice agent that handles it end-to-end in production, not a demo, within 8 weeks.
Own the stack. Pick vendors for ASR / LLM / TTS but keep orchestration, prompts, and data in-house. Vendor lock-in is the silent killer.
Build observability from day one. Transcription, scoring, hallucination detection. No dashboards, no deployment.
Expand narrowly. After call-type #1 is production-stable, add call-type #2. Resist the urge to “roll out to all of CX” until you have three call types live and measured.

That’s the playbook. Talk to us if you want help shipping it.