Alibaba Cloud Tair KVCache: 3FS-based enterprise KVCache storage pipeline for agent-style inference

Alibaba Cloud's Tair KVCache team and storage hardware-software integration team upgraded the open-source 3FS file system to support enterprise KVCache storage for AI inference. The work optimized RDMA load balancing and small I/O, added a user-space persistence engine, introduced GPU Direct RDMA and multi-tenant isolation, and built a Kubernetes Operator for one-click deployment, self-healing, elastic scaling, and monitoring. The solution was integrated with SGLang, vLLM, and Tair KVCache Manager to improve long-context and agent-style inference performance.

Organization: Alibaba Cloud
Industry: Tech & Comms
Location: China
Published: July 2026

Reported outcomes

  • +830% inference throughput (Productivity & throughput)
  • +150% 4K random read IOPS
  • −27% CPU utilization
  • −84% TTFT
  • 20 GB/s near-theoretical peak bandwidth
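
Percent changes of this size are easier to compare once they are converted into before/after ratios. The sketch below does only that arithmetic on the four percentages reported above; the function and variable names are illustrative and nothing here is measured independently.

```python
def change_to_ratio(percent_change: float) -> float:
    """Convert a signed percent change into an after/before ratio."""
    return 1.0 + percent_change / 100.0


# Percent changes as reported in this case summary.
reported = {
    "inference throughput": 830.0,
    "4K random read IOPS": 150.0,
    "CPU utilization": -27.0,
    "TTFT": -84.0,
}

ratios = {name: change_to_ratio(pct) for name, pct in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x of baseline")

# For a latency metric, the speedup is the inverse of the after/before ratio.
ttft_speedup = 1.0 / ratios["TTFT"]
print(f"TTFT speedup: {ttft_speedup:.2f}x")
```

Read this way, +830% throughput is about 9.3x the baseline, +150% IOPS is 2.5x, and an 84% TTFT reduction leaves 16% of the original latency, or roughly a 6.25x speedup.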

Strategic outcomes

  • Speed & agility: Enterprise-grade KVCache deployment and operations
  • Scale & capacity: Reusable technical paradigm for large-scale KVCache deployment
  • Risk & compliance: Multi-tenant isolation and access control

Catalog median for productivity & throughput deployments: +43% across 212 reported metrics.

Use case focus

  • Training infrastructure modernization
  • AI model training

Challenges

  • Meet large-model inference requirements for high throughput, low latency, and strong stability.
  • Scale KVCache storage while improving small I/O performance and simplifying operations.

Solution

  • Enhanced 3FS with RDMA traffic load balancing and small I/O tuning.
  • Integrated a full user-space persistence engine and enabled GPU Direct RDMA (GDR) support.
  • Built cloud-native management via a Kubernetes Operator and a monitoring dashboard.
  • Integrated with inference engines and Tair KVCache Manager for global KVCache reuse.

Results

  • 4K random read IOPS increased by 150%.
  • CPU utilization dropped by approximately 27%.
  • SGLang L3 cold-start time to first token (TTFT) was reduced by 84%, and throughput increased by 830%.
  • SGLang achieved near-theoretical peak bandwidth of about 20 GB/s.

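
A rough way to see why storage bandwidth bears on TTFT is to estimate how long it takes to fetch a cached prefix. The sketch below uses the standard KV cache size formula for a transformer (keys plus values, per layer, per KV head) and the roughly 20 GB/s figure reported above. The model shape and the 32,768-token prefix are hypothetical values chosen for illustration, not details from the source, and the estimate ignores request latency, metadata lookups, and any overlap with compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions (assumed, not taken from the source)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_element: int


def kv_bytes_per_token(m: ModelShape) -> int:
    # Factor 2 accounts for storing both the key and the value tensors.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_element


def load_seconds(num_tokens: int, m: ModelShape, bandwidth_bytes_per_s: float) -> float:
    """Idealized transfer time for a cached prefix at a sustained bandwidth."""
    return num_tokens * kv_bytes_per_token(m) / bandwidth_bytes_per_s


# Assumed shape: 32 layers, 8 KV heads, head dimension 128, 2-byte elements.
shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_element=2)
prefix_tokens = 32_768            # hypothetical long-context prefix
bandwidth = 20e9                  # about 20 GB/s, as reported above

per_token = kv_bytes_per_token(shape)
seconds = load_seconds(prefix_tokens, shape, bandwidth)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Prefix size: {prefix_tokens * per_token / 2**30:.1f} GiB")
print(f"Idealized load time at 20 GB/s: {seconds:.3f} s")
```

Under these assumptions the prefix occupies 4 GiB and transfers in roughly 0.21 s, which is the kind of cost that has to stay below the time needed to recompute the prefix for storage-backed reuse to pay off.
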
Architecture

3FS-based KVCache storage pipeline with RDMA networking, user-space persistence engine, GDR zero-copy support, Kubernetes Operator management, and integration with SGLang/vLLM and Tair KVCache Manager.
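
To illustrate what reusing KVCache across requests can look like at the storage layer, the sketch below shows a common prefix-caching approach: split the token sequence into fixed-size blocks, derive each block's key by hashing it together with the previous block's key, and look keys up in a shared store. This is a generic illustration, not the scheme used by Tair KVCache Manager, SGLang, or vLLM as described in the source. `SharedStore` is a hypothetical in-memory stand-in for a storage tier such as 3FS, and the block size of 4 tokens is chosen only to keep the demo short.

```python
import hashlib
from typing import Dict, List, Optional, Sequence

BLOCK_TOKENS = 4  # hypothetical block size, kept tiny for the demo


def block_keys(token_ids: Sequence[int], block_tokens: int = BLOCK_TOKENS) -> List[str]:
    """Chain-hash full blocks so each key identifies the entire prefix up to that block."""
    keys: List[str] = []
    parent = b""
    full = len(token_ids) - len(token_ids) % block_tokens  # drop the partial tail block
    for start in range(0, full, block_tokens):
        block = token_ids[start:start + block_tokens]
        h = hashlib.sha256()
        h.update(parent)                                   # binds this block to its prefix
        h.update(",".join(map(str, block)).encode())
        parent = h.digest()
        keys.append(parent.hex())
    return keys


class SharedStore:
    """Hypothetical in-memory stand-in for a shared storage tier (a dict, not 3FS)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


def longest_cached_prefix(store: SharedStore, token_ids: Sequence[int],
                          block_tokens: int = BLOCK_TOKENS) -> int:
    """Return how many leading tokens already have KV blocks in the store."""
    hit = 0
    for key in block_keys(token_ids, block_tokens):
        if store.get(key) is None:
            break
        hit += block_tokens
    return hit


store = SharedStore()
first_request = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for key in block_keys(first_request):
    store.put(key, b"kv-placeholder")  # real systems would store KV tensors here

second_request = [1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]
reused = longest_cached_prefix(store, second_request)
print(f"Tokens reusable from the shared store: {reused}")  # 8
```

Because each key depends on all preceding blocks, a request that differs in its first block matches nothing, while one that shares a long prefix can skip recomputing that prefix and fetch its blocks instead.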

Sources & evidence (1)

  • Groundedness: 4/5
  • Type: Blog Post
  • Published: Jul 2, 2026
  • Publisher: Alibaba Cloud
  • Evidence: Vendor
  • Confidence: High

AI-generated summary. Verify important details with the linked sources before relying on this case.
