What challenges did Alibaba Cloud face before AI implementation?

The team needed to meet large-model inference requirements for high throughput, low latency, and strong stability, and to scale KVCache storage while improving small-I/O performance and simplifying operations.

What was the impact of the AI solution?

4K random read IOPS increased by 150%, and CPU utilization dropped by approximately 27%. In SGLang L3 cold-start tests, time to first token (TTFT) fell by 84% and throughput rose by 830%. SGLang reached about 20 GB/s, close to the theoretical peak bandwidth.

Source: Alibaba

Alibaba Cloud Tair KVCache: 3FS-based enterprise KVCache storage pipeline for agent-style inference

Use case type: Training infrastructure modernization

Alibaba Cloud's Tair KVCache team and storage hardware-software integration team upgraded the open-source 3FS file system to support enterprise KVCache storage for AI inference. The work optimized RDMA load balancing and small-I/O performance, added a user-space persistence engine, introduced GPU Direct RDMA (GDR) and multi-tenant isolation, and built a Kubernetes Operator for one-click deployment, self-healing, elastic scaling, and monitoring. The solution was integrated with SGLang, vLLM, and Tair KVCache Manager to improve performance for long-context and agent-style inference.

Organization: Alibaba Cloud
Industry: Tech & Comms
Location: China
Published: July 2026

Reported outcomes

+830% inference throughput (Productivity & throughput)
+150% 4K random read IOPS
−27% CPU utilization
−84% TTFT
20 GB/s near-theoretical peak bandwidth
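
The outcomes above are relative changes against baselines the source does not publish. As a reading aid (not part of the source), this small Python sketch converts each reported percent change into a multiple of the baseline. The helper name as_multiplier and the reported mapping are illustrative only.

```python
def as_multiplier(percent_change: float) -> float:
    """Convert a reported percent change into a multiple of the baseline.

    Illustrative helper, not part of the source. For example, +830% means
    1 + 8.30 = 9.3x the baseline, and -84% means 1 - 0.84 = 0.16x the baseline.
    """
    return 1.0 + percent_change / 100.0


# Percent changes as listed under "Reported outcomes"; baseline values are not published.
reported = {
    "inference throughput": 830.0,
    "4K random read IOPS": 150.0,
    "CPU utilization": -27.0,
    "TTFT": -84.0,
}

for metric, pct in reported.items():
    print(f"{metric}: {as_multiplier(pct):.2f}x baseline")
```

Read this way, throughput is 9.3 times the baseline, IOPS is 2.5 times, CPU utilization is 0.73 of the baseline, and TTFT is 0.16 of the baseline, so the first token arrives roughly 6.25 times sooner.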

Strategic outcomes

Speed & agility: Enterprise-grade KVCache deployment and operations
Scale & capacity: Reusable technical paradigm for large-scale KVCache deployment
Risk & compliance: Multi-tenant isolation and access control

Catalog median for productivity & throughput deployments: +43% across 212 reported metrics.

Use case focus

1. Training infrastructure modernization
2. AI model training

Enhanced 3FS with RDMA traffic load balancing and small I/O tuning.
Integrated a full user-space persistence engine and enabled GPU Direct RDMA (GDR) support.
Built cloud-native management via Kubernetes Operator and monitoring dashboard.
Integrated with inference engines and Tair KVCache Manager for global KVCache reuse.

Technologies

Alibaba Cloud Database, Tair KVCache, 3FS, Kubernetes, GPU Direct RDMA, ClickHouse, Grafana, SGLang, vLLM, Tair KVCache Manager

Architecture

3FS-based KVCache storage pipeline with RDMA networking, user-space persistence engine, GDR zero-copy support, Kubernetes Operator management, and integration with SGLang/vLLM and Tair KVCache Manager.

Sources & evidence (1)

Groundedness: 4/5
Type: Blog Post
Published: Jul 2, 2026
Publisher: Alibaba Cloud
Evidence: Vendor
Confidence: High

Primary source

AI-generated summary. Verify important details with the linked sources before relying on this case.
