Skip to content

Latest commit

 

History

History
executable file
·
65 lines (51 loc) · 1.72 KB

File metadata and controls

executable file
·
65 lines (51 loc) · 1.72 KB

Overview

Links

DLSlime is dedicated to supporting efficient transmission over a variety of different links, including but not limited to IBVerbs, CUDA IPC, TCP Socket, PCIE, NVShmem, Ascend (Direct), NVME-oF ...

Transfer Engine

DLSlime provides a flexible and efficient P2P Transfer Engine, enabling AI-workload-aware customized functions such as Prefill-Decode separation and checkpoint transmission.

Collective Ops

Referring to DeepEP, DLSlime provides a buffer-based collective communication library that achieves ultra-low latency and SM-free collective communications.

Torch Wrapper

To meet the heterogeneous requirements of SPMD programs such as heterogeneous pipeline parallel training, a Torch communication backend is provided.

Transfer Engine Roadmap

  • IBVerbs Transfer Engine
    • ✅ SendRecv Endpoint
    • ✅ RDMA Read/Write Endpoint
  • NVShmem
    • ✅ NVShmem Context and Send/Recv Kernel
    • ⚡ support NVShmem put and get wrapper
  • TCP Socket
    • ✅ zmq bootstrap
    • ⏳ TCP Socket transfer engine
  • CUDA IPC
    • ✅ support CUDAIPC Read/Write Endpoint
  • PCIE
    • ⏳ High performance Shared Memory transfer engine
    • ⏳ High performance data offloading
  • Ascend
    • ✅ Ascned direct transfer engine
  • NVME-oF
    • 💭 Planning
  • UB Mesh
    • 💭 Planning

Collective Ops

  • IBVerbs
    • ✅ Send/Recv
    • ⚡ M2N for attention-FFN disaggregation
    • ⏳ AllGather
    • ⏳ AllReduce
    • ⏳ All2All
  • NVShmem
    • ⏳ Send/Recv
    • ✅ AllGather
    • ⏳ AllReduce
    • ⏳ All2All
  • CUDA IPC
    • ✅ AllGather
    • ⚡ High performance AllGather using CUDA Multi-Mem

Torch Wrapper

  • IBVerbs
    • ✅ Send/Recv
    • ⏳ AllGather
    • ⏳ AllReduce
    • ⏳ All2All