FlexKV supports distributed KVCache reuse. In a multi-node serving environment, you can enable KVCache sharing across nodes by setting a few configuration options.
FlexKV implements KVCache index management by building a local snapshot of the global index, enabling fast global KVCache lookup and reuse. It uses Mooncake for actual data transfer and Redis for metadata management.
This guide starts two serving instances on a single node to demonstrate how to run FlexKV with distributed support.
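The container command in this guide passes `/dev/infiniband/uverbs0` through `uverbs7` plus `rdma_cm`, and hosts differ in how many of these devices they expose. As a rough sketch, the helper below (`rdma_device_flags` is a hypothetical name, not part of FlexKV or Docker) prints one `--device` flag per device node that actually exists, so the list can be compared against the command before running it.

```shell
#!/usr/bin/env bash
# rdma_device_flags is a hypothetical helper for illustration only.
# It prints one --device flag per RDMA device node found under a directory.
rdma_device_flags() {
    dir="${1:-/dev/infiniband}"
    for dev in "$dir"/uverbs* "$dir"/rdma_cm; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$dev" ] || continue
        printf '%s\n' "--device=$dev"
    done
}

# List the flags for this host; prints nothing if no RDMA devices are present.
rdma_device_flags /dev/infiniband
```

If the output is shorter than the list in the container command, trimming the extra `--device` lines should avoid a startup failure caused by a missing device node.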
Pull the vLLM image and create a container:
docker run -it --name flexkv_dist_env \
-v /home/FlexKV:/workspace \
--gpus all \
--ipc=host \
--net=host \
--device=/dev/infiniband/uverbs0 \
--device=/dev/infiniband/uverbs1 \
--device=/dev/infiniband/uverbs2 \
--device=/dev/infiniband/uverbs3 \
--device=/dev/infiniband/uverbs4 \
--device=/dev/infiniband/uverbs5 \
--device=/dev/infiniband/uverbs6 \
--device=/dev/infiniband/uverbs7 \
--device=/dev/infiniband/rdma_cm \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--privileged \
--entrypoint /bin/bash \
vllm/vllm-openai:v0.10.1.1
Install the necessary packages inside the container (for distributed communication and Redis support):
# RDMA and base libraries
apt update && apt install -y libibverbs-dev ibverbs-utils rdma-core
# io_uring and xxhash
apt install -y liburing-dev libxxhash0 libxxhash-dev
# Redis client library
apt install -y libhiredis-dev
# JSON and other dependencies
apt install -y libjsoncpp-dev libgflags-dev libgflags2.2
# Redis tools and Python packages
apt install -y redis-tools
pip install redis pandas datasets
We recommend building Mooncake from source to enable Redis support:
git clone https://github.com/kvcache-ai/Mooncake.git
cd Mooncake
bash dependencies.sh
mkdir build && cd build
cmake .. -DUSE_TENT=ON -DUSE_REDIS=ON
make -j
sudo make install
Clone vLLM, check out version v0.10.1.1, apply the FlexKV patch, then build vLLM and FlexKV from source.
For detailed instructions, please refer to: FlexKV vLLM Adapter Documentation
Note: Set the following environment variable before building/installing FlexKV:
export FLEXKV_ENABLE_P2P=1
Start Redis services on one node. You need to start two instances: one for Mooncake Engine and one for FlexKV.
# Redis for FlexKV (port 6379)
redis-server --port 6379 --bind 10.6.131.12 --requirepass redis-serving-passwd
# Redis for Mooncake (port 6380)
redis-server --port 6380 --bind 10.6.131.12 --requirepass redis-serving-passwd
Use the script to start multiple vLLM instances.
Note: You need to configure the IP, ports, number of nodes, and FlexKV capacity settings, and prepare the FlexKV and Mooncake configuration files as well as the vLLM serving configuration, according to your actual environment. All of these settings are in the start_multi_node_serving.sh script, which you can refer to and modify.
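Because the launch depends on both Redis instances being reachable at the configured IP and ports, it may be worth checking them first. The sketch below assumes `redis-cli` (from the `redis-tools` package installed earlier) is on the PATH; `check_redis` and the `REDIS_CLI` override are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# check_redis is a hypothetical helper; REDIS_CLI can point at another client binary.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Return 0 if the Redis instance at host:port answers PING with PONG.
check_redis() {
    host="$1"; port="$2"; password="$3"
    # -a passes the password; stderr is dropped to hide the -a warning.
    reply="$("$REDIS_CLI" -h "$host" -p "$port" -a "$password" ping 2>/dev/null)"
    [ "$reply" = "PONG" ]
}

# Example with the addresses used in this guide (adjust to your environment):
#   check_redis 10.6.131.12 6379 redis-serving-passwd && echo "FlexKV Redis OK"
#   check_redis 10.6.131.12 6380 redis-serving-passwd && echo "Mooncake Redis OK"
```

A non-zero return covers an unreachable server, a wrong password, and a missing client alike, so a failure here only says that something needs a closer look. Once both checks pass, the instances can be launched from the script directory: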
cd FlexKV/scripts/multi-nodes
# Start 2 instances, with service ports starting from 30001
bash start_multi_node_serving.sh 2 30001
Send benchmark requests to the vLLM instances on consecutive ports starting from 30001.
TODO: To be added
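In the meantime, a minimal smoke test (not a benchmark) might look like the sketch below. It assumes each instance exposes vLLM's OpenAI-compatible `/v1/completions` endpoint on its port; `instance_ports` and `send_prompt` are hypothetical helper names, and the model name is a placeholder you must replace with the model your instances serve.

```shell
#!/usr/bin/env bash
# instance_ports and send_prompt are hypothetical helpers for illustration.

# Print COUNT consecutive ports starting at START, one per line.
instance_ports() {
    start="$1"; count="$2"; i=0
    while [ "$i" -lt "$count" ]; do
        echo $((start + i))
        i=$((i + 1))
    done
}

# POST one completion request to a vLLM OpenAI-compatible server.
# The prompt is pasted into JSON as-is, so it must not contain double quotes.
send_prompt() {
    host="$1"; port="$2"; model="$3"; prompt="$4"
    curl -sS "http://${host}:${port}/v1/completions" \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"${model}\", \"prompt\": \"${prompt}\", \"max_tokens\": 16}"
}

# Example: send the same prompt to each of the 2 instances started above.
#   for p in $(instance_ports 30001 2); do
#       send_prompt 127.0.0.1 "$p" "<your-model-name>" "Hello, FlexKV"
#   done
```

Sending an identical prompt to each instance in turn gives the later instances a chance to reuse cache entries written by the first; whether a hit actually occurs depends on your FlexKV configuration and prompt length.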