Learn AI Engineering with Real Hardware¶
Build a self-hosted LLM chat client and learn the full stack — from GPU inference to terminal UI.
Zorac is an educational open-source project that teaches AI engineering concepts by building something real: a ChatGPT-style chat client that runs entirely on your own hardware. No cloud APIs, no monthly costs, complete privacy.
This documentation site goes beyond "how to install" and explains the why behind every design decision — so you can apply these patterns to your own projects.
What You'll Learn¶
- **Concepts** — Understand the fundamentals: how LLMs generate text, why quantization lets you run 24B-parameter models on a gaming GPU, how inference servers work, and how to manage context windows.
- **Guides** — Step-by-step instructions for building each component: setting up a vLLM inference server, building a terminal UI with Textual, and configuring multi-GPU training.
- **Walkthroughs** — Trace through the actual source code to see how everything connects. Follow a message from keypress to rendered response, or understand how streaming markdown works.
- **Decisions** — Architecture Decision Records explaining why we chose Textual over other TUI frameworks, AWQ over other quantization formats, and other key trade-offs.
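The quantization claim above comes down to simple arithmetic: weight memory is roughly parameters × bits-per-weight. Here is a hedged back-of-envelope sketch (the function name and the 20% runtime overhead factor are illustrative assumptions, not figures from vLLM or Zorac):

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate for model weights.

    params * (bits / 8) gives weight bytes; the 1.2 multiplier is an
    assumed allowance for activations and KV cache, not an exact figure.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A 24B-parameter model in FP16 (16 bits/weight) far exceeds a gaming GPU:
print(f"FP16 24B:      {approx_vram_gb(24, 16):.1f} GB")  # ~57.6 GB
# 4-bit quantization (e.g. AWQ) shrinks the weights about 4x:
print(f"4-bit AWQ 24B: {approx_vram_gb(24, 4):.1f} GB")   # ~14.4 GB
```

This is why a model that would otherwise need a datacenter card can fit on a high-VRAM consumer GPU once quantized; the exact numbers depend on sequence length, batch size, and the serving framework.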
Who This Is For¶
- Developers who want to understand how local LLM applications work end-to-end
- AI engineers looking to run inference on consumer hardware without cloud dependencies
- Students learning about quantization, tokenization, and context management
- Homelab enthusiasts who want to self-host their own ChatGPT alternative
- Anyone with a gaming GPU (RTX 3080 or better) curious about running AI locally
Quick Links¶
| Getting Started | Reference |
|---|---|
| Install Zorac | Configuration Reference |
| Set up a vLLM Server | Usage & Commands |
| Understand Quantization | Development Guide |
| What Happens When You Press Enter | GitHub Repository |