OTTO voice shopping assistant with Gemini Enterprise and Gemini Live API
OTTO built a conversational voice shopping assistant to turn retail search into expert advisory dialogue. The assistant supports real-time, multi-turn voice interactions for shopping advice and product discovery.
Reported outcomes
+1,000% product categories covered
Other quantified impact
Strategic outcomes
Primary read
Use case focus
- Voice automation
- Customer service agent
- Built a text assistant first, then migrated the voice path to Gemini Live's native audio model for real-time turn-by-turn conversation.
- Implemented an internal orchestration layer based on Petri nets to keep critical guardrails intact mid-flow.
- Used hybrid lexical and vector retrieval followed by a Gemini 2.5 Flash validation pass for subjective requests.
- Cut voice response latency from about 8-9 seconds to under 2 seconds.
- First EMEA retailer with native voice shopping live in production.
- Scaled expert product advisory from 5 to 50 categories and expanding.
- Open beta results showed voice sessions capture more customer context than typed chat.
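The Petri-net orchestration mentioned above is internal to OTTO and not described in detail. As a rough illustration of why a Petri net suits guardrails, here is a minimal sketch in which a transition can fire only when all of its input places hold a token, so a recommendation step cannot run before a guardrail check has produced its token. The class, the place names, and the transition names are hypothetical and are not taken from OTTO's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Tiny place/transition net; the marking maps place names to token counts."""
    marking: dict[str, int]
    # transition name -> (input places, output places); every arc has weight 1
    # and each place is assumed to appear at most once per list.
    transitions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)

    def enabled(self, name: str) -> bool:
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name: str) -> None:
        """Consume one token per input place and produce one per output place."""
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def build_dialogue_net() -> PetriNet:
    """Hypothetical three-step turn: guardrail check, recommendation, spoken reply."""
    return PetriNet(
        marking={"user_turn": 1},
        transitions={
            "check_guardrails": (["user_turn"], ["guardrail_ok"]),
            "recommend": (["guardrail_ok"], ["reply_ready"]),
            "speak": (["reply_ready"], ["awaiting_user"]),
        },
    )
```

In this sketch, calling `fire("recommend")` on a fresh net raises an error because the `guardrail_ok` place is still empty. That ordering constraint is enforced by the net's structure rather than by scattered conditionals, which is one plausible reason to choose this formalism for mid-flow guardrails.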
Architecture
OTTO built the assistant on Gemini Enterprise Agent Platform. The voice path uses Gemini Live API with native audio while a separate text assistant continues to drive guardrails, recommendations, and search. Chirp 3 handles German speech recognition. A custom internal orchestration layer based on Petri nets coordinates the conversation flow and preserves guardrails. Hybrid lexical and vector retrieval is followed by a Gemini 2.5 Flash validation pass for subjective requests.
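The retrieval step described above can be sketched as two independent rankings that are fused and then filtered by a validation callback. The source does not say how OTTO combines lexical and vector results; reciprocal rank fusion is used here purely as an assumption. The `validate` callable is a hypothetical stand-in for the Gemini 2.5 Flash validation pass, and no real API call is made. The token-overlap scoring and the toy embedding vectors are likewise illustrative only.

```python
import math
from typing import Callable


def lexical_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank document ids by the number of lowercase tokens shared with the query."""
    query_tokens = set(query.lower().split())
    scores = {d: len(query_tokens & set(text.lower().split())) for d, text in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Rank document ids by cosine similarity to the query embedding."""
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion (assumed here): score = sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(
    query: str,
    query_vec: list[float],
    docs: dict[str, str],
    doc_vecs: dict[str, list[float]],
    validate: Callable[[str, str], bool],  # hypothetical stand-in for the LLM check
    top_n: int = 3,
) -> list[str]:
    """Fuse both rankings, keep the top_n candidates, then drop any that fail validation."""
    fused = rrf([lexical_rank(query, docs), vector_rank(query_vec, doc_vecs)])
    return [d for d in fused[:top_n] if validate(query, docs[d])]
```

A validation pass of this kind matters most for subjective requests such as "quiet" or "good for a small flat", where neither keyword overlap nor embedding proximity guarantees that a candidate actually satisfies the request.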
AI-generated summary. Verify important details with the linked sources before relying on this case.