DLSlime is dedicated to supporting efficient transmission over a variety of different links, including but not limited to IBVerbs, CUDA IPC, TCP Socket, PCIE, NVShmem, Ascend (Direct), NVME-oF ...
DLSlime provides a flexible and efficient P2P Transfer Engine, enabling AI-workload-aware customized functions such as Prefill-Decode separation and checkpoint transmission.
Referring to DeepEP, DLSlime provides a buffer-based collective communication library that achieves ultra-low latency and SM-free collective communications.
To meet the heterogeneous requirements of SPMD programs such as heterogeneous pipeline parallel training, a Torch communication backend is provided.
- IBVerbs Transfer Engine
- ✅ SendRecv Endpoint
- ✅ RDMA Read/Write Endpoint
- NVShmem
- ✅ NVShmem Context and Send/Recv Kernel
- ⚡ support NVShmem put and get wrapper
- TCP Socket
- ✅ zmq bootstrap
- ⏳ TCP Socket transfer engine
- CUDA IPC
- ✅ support CUDAIPC Read/Write Endpoint
- PCIE
- ⏳ High performance Shared Memory transfer engine
- ⏳ High performance data offloading
- Ascend
- ✅ Ascned direct transfer engine
- NVME-oF
- 💭 Planning
- UB Mesh
- 💭 Planning
- IBVerbs
- ✅ Send/Recv
- ⚡ M2N for attention-FFN disaggregation
- ⏳ AllGather
- ⏳ AllReduce
- ⏳ All2All
- NVShmem
- ⏳ Send/Recv
- ✅ AllGather
- ⏳ AllReduce
- ⏳ All2All
- CUDA IPC
- ✅ AllGather
- ⚡ High performance AllGather using CUDA Multi-Mem
- IBVerbs
- ✅ Send/Recv
- ⏳ AllGather
- ⏳ AllReduce
- ⏳ All2All