Learn AI Engineering with Real Hardware¶
Build a self-hosted LLM chat client and learn the full stack — from GPU inference to terminal UI.
Zorac is an educational open-source project that teaches AI engineering concepts by building something real: a ChatGPT-style chat client that runs entirely on your own hardware. No cloud APIs, no monthly costs, complete privacy.
This documentation site goes beyond "how to install" and explains the why behind every design decision — so you can apply these patterns to your own projects.
What You'll Learn¶
- **Concepts** — Understand the fundamentals: how LLMs generate text, why quantization lets you run 24B-parameter models on a gaming GPU, how inference servers work, and how to manage context windows.
- **Guides** — Step-by-step instructions for building each component: setting up a vLLM inference server, building a terminal UI with Textual, and configuring multi-GPU training.
- **Walkthroughs** — Trace through the actual source code to see how everything connects. Follow a message from keypress to rendered response, or understand how streaming markdown works.
- **Decisions** — Architecture Decision Records explaining why we chose Textual over other TUI frameworks, AWQ over other quantization formats, and other key trade-offs.
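The quantization claim above comes down to simple arithmetic: weight memory is roughly parameters × bits-per-weight. Here is a hedged back-of-envelope sketch (the function name and the 20% runtime overhead factor are illustrative assumptions, not figures from vLLM or Zorac):

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate for model weights.

    params * (bits / 8) gives weight bytes; the 1.2 multiplier is an
    assumed allowance for activations and KV cache, not an exact figure.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A 24B-parameter model in FP16 (16 bits/weight) far exceeds a gaming GPU:
print(f"FP16 24B:      {approx_vram_gb(24, 16):.1f} GB")  # ~57.6 GB
# 4-bit quantization (e.g. AWQ) shrinks the weights about 4x:
print(f"4-bit AWQ 24B: {approx_vram_gb(24, 4):.1f} GB")   # ~14.4 GB
```

This is why a model that would otherwise need a datacenter card can fit on a high-VRAM consumer GPU once quantized; the exact numbers depend on sequence length, batch size, and the serving framework.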
Who This Is For¶
- Developers who want to understand how local LLM applications work end-to-end
- AI engineers looking to run inference on consumer hardware without cloud dependencies
- Students learning about quantization, tokenization, and context management
- Homelab enthusiasts who want to self-host their own ChatGPT alternative
- Anyone with a gaming GPU (RTX 3080 or better) curious about running AI locally
Quick Links¶
| Getting Started | Reference |
|---|---|
| Install Zorac | Configuration Reference |
| Set up a vLLM Server | Usage & Commands |
| Understand Quantization | Development Guide |
| What Happens When You Press Enter | GitHub Repository |