Tech Stack
Architecture
Incoming call -> Twilio -> Vapi (voice agent) -> n8n webhook -> Zoho CRM update + Google Sheets log + Telegram alert. Fallback to human if confidence < 0.7.
Everyone demos voice AI with a "book me a restaurant" example. Nobody talks about what happens when a construction site manager calls at 6 AM asking about invoice discrepancies while a bulldozer runs in the background.
Real-World Challenges:
- Background noise from construction sites destroys transcription accuracy
- Indian English accents with Hindi code-switching need specialized handling
- Customers expect immediate CRM updates after calls, not "we'll get back to you"
- Downtime during business hours = lost deals
The Stack: Vapi handles the voice agent layer with OpenAI Realtime for sub-1s latency. Twilio manages telephony. Every call triggers an n8n webhook that updates Zoho CRM, logs to Google Sheets, and pings Telegram for the sales team.
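The fan-out step can be sketched in a few lines. This is only an illustration of the idea that one failing destination should not block the other two; it is not the actual n8n workflow. The `CallSummary` fields, the `fan_out` function, and the sink names are all hypothetical, and in the real system each sink would be an n8n node and not a Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CallSummary:
    """Hypothetical end-of-call payload; the real fields are not given here."""
    caller: str
    transcript: str
    outcome: str


Sink = Callable[[CallSummary], None]


def fan_out(summary: CallSummary, sinks: Dict[str, Sink]) -> Dict[str, str]:
    """Run every sink independently and report a per-sink status.

    A failure in one sink (say, the CRM) must not stop the others.
    """
    results: Dict[str, str] = {}
    for name, sink in sinks.items():
        try:
            sink(summary)
            results[name] = "ok"
        except Exception as exc:  # broad on purpose: isolate each sink
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    sheet_rows: List[CallSummary] = []

    def crm(_: CallSummary) -> None:
        raise TimeoutError("CRM timed out")  # simulated outage

    status = fan_out(
        CallSummary("site manager", "invoice query", "resolved"),
        {"crm": crm, "sheets": sheet_rows.append, "telegram": lambda s: None},
    )
    print(status)
```

The per-sink status map is what makes a partial failure visible: the sheet row and the alert still go out, and the failed CRM write can be retried separately.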
The Confidence Threshold: If the confidence of the voice agent, Angelina, drops below 0.7 on any response, she transfers the call to a human rep instead of falling into an awkward "I don't understand" loop. The handoff includes the full conversation context, so the rep doesn't have to ask the customer to repeat anything.
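As a rough sketch, the routing rule comes down to one comparison plus a copy of the conversation history. How the confidence score is produced is not described here, so the code takes it as a plain float. `Turn`, `Decision`, and `decide` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_THRESHOLD = 0.7  # the value stated above


@dataclass
class Turn:
    speaker: str  # "caller" or "agent"
    text: str


@dataclass
class Decision:
    action: str          # "respond" or "transfer"
    context: List[Turn]  # full history handed to the human on transfer


def decide(confidence: float, history: List[Turn],
           threshold: float = CONFIDENCE_THRESHOLD) -> Decision:
    """Transfer to a human as soon as confidence falls below the threshold."""
    if confidence < threshold:
        # Copy the history so the rep sees everything said so far.
        return Decision("transfer", list(history))
    return Decision("respond", [])
```

The comparison is strict, so a score of exactly 0.7 still counts as confident enough to respond. Whether the boundary should be inclusive is a design choice the original does not specify.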
What I Learned: Voice AI in production is 20% AI and 80% infrastructure: retry logic, fallback paths, monitoring, and making sure the CRM update happens even if the webhook fails. The AI part is the easy part.
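One common way to make sure an update eventually lands is retry with exponential backoff plus a fallback queue for payloads that still fail. The sketch below assumes that pattern; the source does not say which retry strategy is actually used. `deliver_with_retry`, the `dead_letter` list, and the default delays are hypothetical.

```python
import time
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


def deliver_with_retry(
    send: Callable[[Payload], None],
    payload: Payload,
    dead_letter: List[Payload],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `send` up to `attempts` times, doubling the wait between tries.

    If every attempt fails, park the payload in `dead_letter` so it can be
    replayed later instead of being lost. Returns True on success.
    """
    for attempt in range(attempts):
        try:
            send(payload)
            return True
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    dead_letter.append(payload)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In production, the dead-letter list would be something durable, such as a queue or a database table.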
Want to build something like this?
I architect and deploy end-to-end AI systems — from MVP to revenue.