Red Hat AI

Article

How speculative decoding delivers faster LLM inference

Sawyer Bowerman

Learn how speculative decoding can improve the performance of large language models (LLMs) in production by using a small, fast model to generate tokens speculatively and a large model to verify them.

Article

Intelligent inference scheduling with llm-d on Red Hat AI

Madhu Goutham Reddy Ambati +1

Learn how llm-d routes each inference request to the GPU that already has the relevant data cached, cutting time-to-first-token and doubling throughput without changing hardware. Discover how Red Hat's stack packages this into a single Kubernetes resource.

Red Hat AI
Article

Bring your own evaluation framework to EvalHub

William Caban Babilonia +2

Learn how to onboard a custom evaluation framework into EvalHub using one class, one method, and a container image. This guide covers the contract, data structures, and a complete minimal adapter.

Red Hat AI
Article

Understanding evaluation collections in EvalHub

William Caban Babilonia +2

Learn how to read an existing system collection, understand its threshold logic, and build your own collection that encodes your actual measurement strategy with meaningful thresholds.

Article

Speculators v0.5.0: DFlash support and online training

Helen Zhao +2

Speculators v0.5.0 introduces DFlash support, enabling single-pass draft token generation with block diffusion for more efficient speculative decoding workflows. The release also adds unified online and offline training through vLLM’s native hidden states extraction system, improving training flexibility, version stability, and production readiness.

ai-ml
Article

Evaluation-driven development with EvalHub

William Caban Babilonia +1

Learn how evaluation-driven development (EDD) turns AI optimization from an art into an engineering discipline with EvalHub.

Article

Build an enterprise RAG system with OGX

Abdelhamid Soliman

Learn how to transform a simple chatbot into an enterprise RAG application by applying metadata filtering, hybrid search, and neural reranking using the OGX framework in Red Hat OpenShift AI.

Article

How Kagenti ADK simplifies production AI agent management

Legare Kerrison

Learn how Kagenti ADK, an open source toolkit, handles the complexities of managing production AI agents. It aligns with the Linux Foundation's Agent2Agent (A2A) protocol and provides a set of runtime services for easier deployment and operation.