Tags: ayinedjimi/KVortex

v1.0

Initial release: KVortex v1.0 - VRAM-to-RAM Offloader

KVortex is a production-grade C++23 VRAM-to-RAM offloading system
for AI inference workloads, optimized for vLLM 0.15.

Features:
- Multi-stream GPU transfers (20+ GB/s bandwidth)
- NUMA-aware memory management
- SHA256 content-addressable caching
- LRU eviction policy with O(1) operations
- Thread-safe concurrent operations
- Modern C++23 with std::expected error handling
- 100% test pass rate (10/10 tests passing)
- Zero memory leaks detected
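
The "LRU eviction policy with O(1) operations" item can be illustrated with the
common pairing of a doubly linked list and a hash map. The sketch below is a
generic version of that technique, not KVortex's actual code: the class name
LruCache and its get/put/size members are hypothetical. It is written against
C++17 and returns std::optional rather than std::expected so that it builds on
older toolchains, and it leaves out the locking that thread-safe use would need.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration of an LRU cache; not KVortex's real class.
template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached value, if any, and marks the entry most recently used.
    std::optional<Value> get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // splice relinks the node in O(1) and keeps iterators valid.
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one
    // when the cache is full.
    void put(const Key& key, Value value) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (index_.size() == capacity_) {
            // The back of the list is the least recently used entry.
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

Lookups, updates, and evictions each take one hash-map operation (O(1) on
average) plus a constant-time list relink. With a capacity of 2, inserting
"a" and "b", reading "a", and then inserting "c" evicts "b".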

Performance:
- 6x faster TTFT on cache hits
- <5% overhead on cache misses
- Support for 8+ concurrent threads

Author: Ayi NEDJIMI
License: Apache 2.0 (based on LMCache)

Co-Authored-By: Ayi NEDJIMI <contact@ayinedjimi-consultants.fr>