OTTO voice shopping assistant with Gemini Enterprise and Gemini Live API

OTTO built a conversational voice shopping assistant to turn retail search into expert advisory dialogue. The assistant supports real-time, multi-turn voice interactions for shopping advice and product discovery.

Organization: OTTO
Industry: Retail
Location: Germany
Published: July 2026

Reported outcomes

+1,000% product categories covered

Other quantified impact

Voice response latency: 8 seconds to under 2 seconds

Strategic outcomes

  • Customer experience & trust: Enabled native voice shopping advisory
  • Scale & capacity: Expanded advisory coverage across many more product categories
  • Innovation & culture: Introduced a new conversational retail interface

Use case focus

  • Voice automation
  • Customer service agent

Turn list-based search into a true conversational, expert-advisory voice shopping experience with low latency and reliable guardrails during multi-turn interactions.
  • Built a text assistant first, then migrated the voice path to Gemini Live's native audio model for real-time turn-by-turn conversation.
  • Implemented an internal orchestration layer based on Petri nets to keep critical guardrails intact mid-flow.
  • Used hybrid lexical and vector retrieval followed by a Gemini 2.5 Flash validation pass for subjective requests.
  • Cut voice response latency from about 8-9 seconds to under 2 seconds.
  • First EMEA retailer with native voice shopping live in production.
  • Scaled expert product advisory from 5 to 50 categories and expanding.
  • Open beta results showed voice sessions capture more customer context than typed chat.
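
The Petri-net orchestration mentioned in the list above can be illustrated with a minimal sketch. OTTO's internal layer is not public, so every place and transition name here (`user_turn`, `guardrail_ok`, `recommend`, and so on) is hypothetical. The sketch only shows the general property that makes Petri nets suit guardrails: a transition cannot fire until every one of its input places holds a token, so a recommendation step can be made structurally dependent on a completed guardrail check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: tuple[str, ...]   # places that must each hold a token (listed once each)
    outputs: tuple[str, ...]  # places that receive one token when the transition fires


@dataclass
class PetriNet:
    transitions: dict[str, Transition]
    marking: dict[str, int] = field(default_factory=dict)  # tokens per place

    def enabled(self, name: str) -> bool:
        """A transition is enabled only if all of its input places hold a token."""
        t = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in t.inputs)

    def fire(self, name: str) -> None:
        """Consume one token from each input place, produce one in each output place."""
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        t = self.transitions[name]
        for p in t.inputs:
            self.marking[p] -= 1
        for p in t.outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


def build_dialogue_net() -> PetriNet:
    # Hypothetical flow: a user turn yields a known need plus a pending check;
    # "recommend" needs both the need and a passed guardrail before it can fire.
    ts = [
        Transition("capture_need", ("user_turn",), ("need_known", "check_pending")),
        Transition("pass_guardrail", ("check_pending",), ("guardrail_ok",)),
        Transition("recommend", ("need_known", "guardrail_ok"), ("answer_ready",)),
    ]
    return PetriNet({t.name: t for t in ts}, {"user_turn": 1})
```

In this toy net, calling `fire("recommend")` straight after `fire("capture_need")` raises an error, because the `guardrail_ok` place is still empty. The guardrail is enforced by the structure of the net rather than by a check that a later code change could skip.
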
Architecture

OTTO built the assistant on Gemini Enterprise Agent Platform. The voice path uses Gemini Live API with native audio while a separate text assistant continues to drive guardrails, recommendations, and search. Chirp 3 handles German speech recognition. A custom internal orchestration layer based on Petri nets coordinates the conversation flow and preserves guardrails. Hybrid lexical and vector retrieval is followed by a Gemini 2.5 Flash validation pass for subjective requests.
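
As a rough illustration of the retrieval step, the sketch below ranks a toy catalog lexically and by vector similarity, fuses the two rankings, and then filters the result through a validation callback. Several parts are assumptions rather than details from the case study: the source does not say how the two rankings are combined, so reciprocal rank fusion is used here as one common choice; the `validate` callable is a stand-in for the Gemini 2.5 Flash pass and makes no model call; and the catalog layout and function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def lexical_score(query: str, text: str) -> float:
    """Fraction of query tokens that also appear in the product text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    catalog: dict[str, dict],
    validate: Callable[[str, str], bool],
    top_n: int = 3,
) -> list[str]:
    """Rank lexically and by vector, fuse, then keep only validated items."""
    ids = list(catalog)
    by_lex = sorted(ids, key=lambda i: -lexical_score(query, catalog[i]["text"]))
    by_vec = sorted(ids, key=lambda i: -cosine(query_vec, catalog[i]["vec"]))
    fused = rrf_fuse([by_lex, by_vec])
    # Validation pass: in production this would be a model call (assumption);
    # here it is whatever callable the caller supplies.
    return [i for i in fused if validate(query, catalog[i]["text"])][:top_n]
```

The validation step runs last and only on the fused candidates, which keeps the comparatively expensive check off the full catalog. That ordering matches the "retrieval followed by a validation pass" described above, though the actual candidate counts and prompts are not stated in the source.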

Sources & evidence

  • Groundedness: 5/5
  • Type: Customer Story
  • Published: Jul 1, 2026
  • Publisher: Google Cloud
  • Evidence: Primary
  • Confidence: High

AI-generated summary. Verify important details with the linked sources before relying on this case.
