Learn AI Engineering with Real Hardware

Build a self-hosted LLM chat client and learn the full stack — from GPU inference to terminal UI.

Zorac is an educational open-source project that teaches AI engineering concepts by building something real: a ChatGPT-style chat client that runs entirely on your own hardware. No cloud APIs, no monthly costs, complete privacy.

This documentation site goes beyond "how to install" and explains the why behind every design decision — so you can apply these patterns to your own projects.


What You'll Learn

  • Concepts


    Understand the fundamentals: how LLMs generate text, why quantization lets you run 24B-parameter models on a gaming GPU, how inference servers work, and how to manage context windows.

    Start with Concepts
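To see why quantization matters, here is a back-of-the-envelope sketch of weight memory for a 24B-parameter model. The numbers are illustrative estimates for weights alone (KV cache and activations add more), not measurements:

```python
# Rough VRAM needed just for model weights, by precision.
PARAMS = 24e9  # a 24B-parameter model

fp16_gib = PARAMS * 2 / 1024**3   # 16-bit: 2 bytes per weight
awq_gib = PARAMS * 0.5 / 1024**3  # 4-bit (e.g. AWQ): ~0.5 bytes per weight

print(f"FP16 weights: ~{fp16_gib:.0f} GiB")   # ~45 GiB: beyond any gaming GPU
print(f"4-bit weights: ~{awq_gib:.0f} GiB")   # ~11 GiB: fits in 24 GB of VRAM
```

The roughly 4x reduction is what brings a 24B model within reach of a single consumer card.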

  • Guides


    Step-by-step guides for building each component: setting up a vLLM inference server, building a terminal UI with Textual, and configuring multi-GPU training.

    Browse Guides
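As a preview of the server guide: vLLM exposes an OpenAI-compatible HTTP API, so a chat client only needs to build a standard chat-completions request. This sketch constructs the request body without sending it; the model name and port are placeholders, not values from this project:

```python
import json

# vLLM's OpenAI-compatible endpoint (8000 is vLLM's default port).
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-awq-model",  # hypothetical name of the served model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,     # request tokens as server-sent events
    "max_tokens": 512,  # cap the length of the response
}

print(json.dumps(payload, indent=2))
```

Because the API shape matches OpenAI's, the same client code can target a local vLLM server or a hosted endpoint by changing only the URL.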

  • Walkthroughs


    Trace through the actual source code to see how everything connects. Follow a message from keypress to rendered response, or understand how streaming markdown works.

    Read Walkthroughs
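One pattern the streaming-markdown walkthrough illustrates can be sketched in a few lines: a chunk arriving off the wire may split a word or a formatting marker, so a streaming UI typically appends each chunk to a buffer and re-renders the whole accumulated text, rather than rendering chunks in isolation. The chunks below are hypothetical:

```python
# Simulated stream: chunk boundaries fall mid-word and mid-marker.
chunks = ["Hel", "lo **wor", "ld**", "!"]

buffer = ""
frames = []  # what the UI would display after each chunk
for chunk in chunks:
    buffer += chunk       # accumulate the partial response
    frames.append(buffer) # re-render the full buffer every time

print(frames[-1])  # -> "Hello **world**!"
```

Only the final frame is guaranteed to be well-formed markdown, which is why rendering always works from the full buffer.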

  • Decisions


    Architecture Decision Records explaining why we chose Textual over other TUI frameworks, AWQ over other quantization formats, and other key trade-offs.

    See Decisions


Who This Is For

  • Developers who want to understand how local LLM applications work end-to-end
  • AI engineers looking to run inference on consumer hardware without cloud dependencies
  • Students learning about quantization, tokenization, and context management
  • Homelab enthusiasts who want to self-host their own ChatGPT alternative
  • Anyone with a gaming GPU (RTX 3080 or better) curious about running AI locally

Getting Started

  • Install Zorac
  • Set up a vLLM Server
  • Understand Quantization
  • What Happens When You Press Enter

Reference

  • Configuration Reference
  • Usage & Commands
  • Development Guide
  • GitHub Repository