Alibaba Cloud Tair KVCache: 3FS-based enterprise KVCache storage pipeline for agent-style inference
Alibaba Cloud's Tair KVCache team and its storage hardware-software integration team upgraded the open-source 3FS distributed file system to serve as enterprise KVCache storage for AI inference. The work optimized RDMA traffic load balancing and small-I/O performance, added a fully user-space persistence engine, introduced GPUDirect RDMA (GDR) and multi-tenant isolation, and built a Kubernetes Operator that provides one-click deployment, self-healing, elastic scaling, and monitoring. The solution is integrated with SGLang, vLLM, and Tair KVCache Manager to improve performance for long-context and agent-style inference.
- Organization: Alibaba Cloud
- Industry: Tech & Comms
- Location: China
- Published: July 2026
Reported outcomes
- +830% inference throughput (Productivity & throughput)
Strategic outcomes
Catalog median for productivity & throughput deployments: +43% across 212 reported metrics.
Use case focus
- Training infrastructure modernization
- AI model training
Challenges
- Meet large-model inference requirements for high throughput, low latency, and strong stability.
- Scale KVCache storage while improving small I/O performance and simplifying operations.
Solution
- Enhanced 3FS with RDMA traffic load balancing and small I/O tuning.
- Integrated a fully user-space persistence engine and enabled GPUDirect RDMA (GDR) support.
- Built cloud-native management with a Kubernetes Operator and a monitoring dashboard.
- Integrated with inference engines and Tair KVCache Manager for global KVCache reuse.
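The case study does not say how 3FS balances RDMA traffic across NICs. The sketch below illustrates one common policy, least-outstanding-bytes routing, under that caveat: `RdmaPort`, `LeastLoadedBalancer`, and the port labels are hypothetical, and a real client would learn about completions from the RDMA verbs layer rather than from explicit `complete()` calls.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RdmaPort:
    """Hypothetical stand-in for one RDMA NIC port on a storage client."""
    name: str
    inflight_bytes: int = 0


class LeastLoadedBalancer:
    """Route each I/O to the port with the fewest bytes currently in flight.

    An illustrative policy only; not the 3FS implementation.
    """

    def __init__(self, ports: Iterable[RdmaPort]):
        self.ports: List[RdmaPort] = list(ports)

    def submit(self, io_bytes: int) -> RdmaPort:
        # Ties go to the earliest port in the list (min() keeps the first minimum).
        port = min(self.ports, key=lambda p: p.inflight_bytes)
        port.inflight_bytes += io_bytes
        return port

    def complete(self, port: RdmaPort, io_bytes: int) -> None:
        # Called when the completion for this I/O arrives.
        port.inflight_bytes -= io_bytes


if __name__ == "__main__":
    balancer = LeastLoadedBalancer([RdmaPort("mlx5_0"), RdmaPort("mlx5_1")])
    chosen = [balancer.submit(4096).name for _ in range(4)]
    print(chosen)  # ['mlx5_0', 'mlx5_1', 'mlx5_0', 'mlx5_1']
```

Routing on in-flight bytes rather than strict round-robin keeps small reads from queueing behind a port that is busy with a few large transfers; which policy 3FS actually adopted is not stated in the source.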
Results
- 4K random read IOPS increased by 150%.
- CPU utilization dropped by approximately 27%.
- SGLang L3 (storage-tier) cold-start TTFT was reduced by 84%, and throughput increased by 830%.
- SGLang reached about 20 GB/s, close to the theoretical peak bandwidth.
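The reported figures mix percentage increases and percentage reductions, which are easy to misread. Assuming each is a relative change against its own baseline configuration (the source does not state the baselines), they convert to multipliers as follows; the helper names are illustrative.

```python
def increase_to_multiplier(pct_increase: float) -> float:
    """A +X% increase means the new value is (1 + X/100) times the baseline."""
    return 1.0 + pct_increase / 100.0


def remaining_fraction(pct_reduction: float) -> float:
    """A -X% reduction leaves (1 - X/100) of the baseline value."""
    return 1.0 - pct_reduction / 100.0


def reduction_to_speedup(pct_reduction: float) -> float:
    """For a latency, an X% reduction means 1 / (1 - X/100) times faster."""
    return 1.0 / remaining_fraction(pct_reduction)


if __name__ == "__main__":
    print(f"throughput: {increase_to_multiplier(830):.1f}x baseline")  # 9.3x
    print(f"4K random read IOPS: {increase_to_multiplier(150):.1f}x baseline")  # 2.5x
    print(f"cold-start TTFT: {reduction_to_speedup(84):.2f}x faster")  # 6.25x
    print(f"CPU utilization: {remaining_fraction(27):.2f} of baseline")  # 0.73
```

So "+830% throughput" is roughly a 9.3x gain, not 8.3x, and an 84% TTFT cut corresponds to about 6.25x faster time to first token.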
Architecture
3FS-based KVCache storage pipeline with RDMA networking, user-space persistence engine, GDR zero-copy support, Kubernetes Operator management, and integration with SGLang/vLLM and Tair KVCache Manager.
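Global KVCache reuse depends on different requests locating the same stored KV pages for a shared prompt prefix. The source does not describe the key scheme used by SGLang, vLLM, or Tair KVCache Manager; the sketch below assumes a common approach, chaining a hash over fixed-size token pages so that each key identifies the whole prefix up to that page, with a Python dict standing in for the 3FS-backed store. `PAGE_TOKENS`, `page_keys`, and `longest_cached_prefix` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Mapping, Sequence

# Hypothetical page size; real engines typically use much larger pages.
PAGE_TOKENS = 4


def page_keys(token_ids: Sequence[int], page_tokens: int = PAGE_TOKENS) -> List[str]:
    """Return one chained SHA-256 key per full page of tokens.

    Each key hashes the previous page's key together with this page's tokens,
    so it identifies the entire prefix up to and including this page.
    A trailing partial page gets no key.
    """
    keys: List[str] = []
    prev = ""
    full = len(token_ids) - len(token_ids) % page_tokens
    for start in range(0, full, page_tokens):
        page = token_ids[start:start + page_tokens]
        h = hashlib.sha256()
        h.update(prev.encode())
        h.update(b"|")  # separator between parent key and page tokens
        h.update(",".join(str(t) for t in page).encode())
        prev = h.hexdigest()
        keys.append(prev)
    return keys


def longest_cached_prefix(
    token_ids: Sequence[int],
    store: Mapping[str, bytes],
    page_tokens: int = PAGE_TOKENS,
) -> int:
    """Count leading tokens whose KV pages are already present in `store`."""
    hit_pages = 0
    for key in page_keys(token_ids, page_tokens):
        if key not in store:
            break  # stop at the first miss: only a contiguous prefix is reusable
        hit_pages += 1
    return hit_pages * page_tokens


if __name__ == "__main__":
    # A dict stands in for the shared 3FS-backed KVCache store (assumption).
    store: Dict[str, bytes] = {}
    first = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    for key in page_keys(first):
        store[key] = b"<kv page>"  # placeholder for serialized KV tensors

    second = [1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45]
    print(longest_cached_prefix(second, store))  # 8
```

Because each key folds in its parent, a request that diverges on its first page gets no hits even if later pages happen to match, which keeps reused KV states consistent with the exact prefix that produced them.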
Sources & evidence (1)
AI-generated summary. Verify important details with the linked sources before relying on this case.