Usage Guide¶

This guide covers how to use Zorac effectively for interactive chat sessions.

Getting Started¶

Start Zorac using your preferred installation method:

# If installed as binary
zorac

# If installed from source
uv run zorac

# If using uv tool install
zorac

Basic Interaction¶

Simply type your message at the You: prompt and press Enter. Commands auto-complete with Tab:

You: Explain quantum entanglement
Assistant: [Response streams in real-time with markdown formatting...]
Stats: 245 tokens in 3.8s (64.5 tok/s) | Total: 4 msgs | Tokens: 312/12000

The assistant will respond, and the conversation is automatically saved.

Multi-line Input¶

Zorac supports multi-line input via Shift+Enter and clipboard pasting:

Press Enter - Submit your prompt
Press Shift+Enter - Insert a newline (works in iTerm2, kitty, WezTerm, and most modern terminals)
Paste multi-line text - Works seamlessly from clipboard, newlines are preserved

The input bar auto-resizes from 1 to 5 lines as you type.

Example:

You: Please explain the following code:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
[Press Enter to submit]

Simply type or paste multi-line text. The newlines will be preserved, and you can submit with Enter.

Getting Help¶

To see all available commands at any time, use the /help command:

You: /help

Available Commands:
  /help              - Show all available commands
  /quit or /exit     - Save and exit the application
  /clear             - Reset conversation to initial system message
  /save              - Manually save session to disk
  /load              - Reload session from disk
  /tokens            - Display current token usage statistics
  /summarize         - Force conversation summarization
  /summary           - Display the current conversation summary
  /config            - Manage configuration settings
    /config list     - Show current configuration
    /config set      - Set a configuration value
    /config get      - Get a specific configuration value

You can also ask the LLM natural language questions about commands:

You: How do I save my session?
Assistant: You can save your session using the /save command. Sessions are also automatically saved after each assistant response...

You: What commands are available?
Assistant: Here are the available commands:
- /help - Shows all available commands
- /clear - Resets the conversation...
[continues listing commands]

Available Commands¶

All commands start with /:

Command	Description
`/help`	Show all available commands with descriptions
`/clear`	Clear conversation history and start fresh
`/save`	Manually save the current session
`/load`	Reload session from disk
`/tokens`	Display current token usage and limits
`/summarize`	Force summarization of conversation history
`/summary`	Display the current conversation summary (if exists)
`/config`	Manage configuration settings (list, set, get)
`/quit` or `/exit`	Save session and exit
`Ctrl+C`	Interrupt streaming response
`Ctrl+D`	Save and exit

Session Management¶

Auto-Save¶

After each assistant response, your conversation is automatically saved to ~/.zorac/session.json.

Manual Save/Load¶

# Manually save current session
You: /save
✓ Session saved to ~/.zorac/session.json

# Reload from disk (discards unsaved changes)
You: /load
✓ Session reloaded (12 messages, ~3421 tokens)

Clear History¶

You: /clear
✓ Conversation history cleared and saved!

Token Management¶

Check Token Usage¶

You: /tokens

📊 Token usage:
   Current: ~3421 tokens
   Limit: 12000 tokens
   Remaining: ~8579 tokens
   Messages: 12

Automatic Summarization¶

When your conversation exceeds MAX_INPUT_TOKENS (default: 12000):

Zorac automatically triggers summarization
Older messages are condensed into a summary
The most recent 6 messages (configurable) are preserved intact
The summary is injected as a system message
This maintains context while staying within limits

Example output:

⏳ Token limit approaching. Summarizing conversation history...
✓ Summarized 15 messages. Kept 6 recent messages.

Manual Summarization¶

You can force summarization before reaching the limit:

You: /summarize
⏳ Summarizing conversation...
✓ Session saved with summary

View Current Summary¶

You: /summary

📝 Current Conversation Summary:
┌────────────────────────────────────────┐
│ [Summary content displayed here...]    │
└────────────────────────────────────────┘

Configuration Commands¶

See the Configuration Guide for full details.

View Configuration¶

You: /config list

Configuration:
  VLLM_BASE_URL:      http://localhost:8000/v1
  VLLM_MODEL:         stelterlab/Mistral-Small-24B-Instruct-2501-AWQ
  MAX_INPUT_TOKENS:   12000
  MAX_OUTPUT_TOKENS:  4000
  KEEP_RECENT_MESSAGES: 6
  TEMPERATURE:        0.1
  STREAM:             true
  TIKTOKEN_ENCODING:  cl100k_base
  CODE_THEME:         monokai
  Config File:        ~/.zorac/config.json

Update Settings¶

# Change server URL
You: /config set VLLM_BASE_URL http://192.168.1.100:8000/v1

# Adjust temperature
You: /config set TEMPERATURE 0.7

# Disable streaming
You: /config set STREAM false

# Change code syntax highlighting theme
You: /config set CODE_THEME dracula

Get Specific Setting¶

You: /config get TEMPERATURE
TEMPERATURE = 0.1

Performance Metrics¶

After each response, Zorac displays performance statistics:

Stats: 147 tokens in 2.34s (62.82 tok/s) | Total msgs: 12 | Tokens: ~3421/12000

Metrics explained: - 147 tokens - Number of tokens in the assistant's response - 2.34s - Time taken to generate the response - 62.82 tok/s - Tokens per second (generation speed) - Total msgs: 12 - Total messages in conversation - Tokens: ~3421/12000 - Current token usage vs. limit

Tips & Best Practices¶

Monitor Token Usage¶

Use /tokens regularly to monitor your conversation size:

You: /tokens

The auto-summarization kicks in at 12k tokens, but you can manually /clear anytime.

Interrupt Long Responses¶

Press Ctrl+C to interrupt a streaming response without losing your session. The partial response is discarded and you can continue chatting immediately:

You: Write a very long essay...
Assistant: [Starts streaming...]
^C
Response interrupted.

Adjust Response Style¶

Change temperature for different response styles:

0.1 = Focused, deterministic (default)
0.7 = Balanced creativity
1.0 = Very creative

You: /config set TEMPERATURE 0.7

Persistent Sessions¶

Sessions persist across restarts. When you restart Zorac, it automatically loads your previous conversation:

$ zorac
✓ Loaded previous session (12 messages, ~3421 tokens)

Fresh Start¶

Use /clear when you want to start a new conversation:

You: /clear
✓ Conversation history cleared and saved!

Example Workflow¶

Here's a typical Zorac session:

# Start Zorac
$ zorac
✓ Loaded previous session (4 messages, ~892 tokens)

# Check what was discussed
You: /summary

# Continue conversation
You: Can you explain that concept in more detail?
Assistant: [Detailed explanation...]
Stats: 342 tokens in 5.2s (65.8 tok/s) | Total: 6 msgs | Tokens: ~1234/12000

# Check token usage
You: /tokens
Current: ~1234 tokens | Limit: 12000 | Remaining: ~10766

# Adjust settings for more creative responses
You: /config set TEMPERATURE 0.8

# Continue chatting...
You: Write a creative story about that concept
Assistant: [Creative story...]

# Save and exit
You: /quit
✓ Session saved. Goodbye!

Troubleshooting¶

Connection Issues¶

Error: Connection refused

Solution: Ensure your vLLM server is running:

# Test server connectivity
curl http://localhost:8000/v1/models

See Configuration Guide for more help.

Token Errors¶

Error: context_length_exceeded

Solution: 1. Use /clear to start fresh 2. Reduce MAX_INPUT_TOKENS: /config set MAX_INPUT_TOKENS 8000 3. Increase messages kept: /config set KEEP_RECENT_MESSAGES 10

Session File Issues¶

Error loading session: [Errno 2] No such file or directory

Solution: This is normal on first run. A new session file will be created automatically.

To reset your session:

rm ~/.zorac/session.json

Next Steps¶

Configuration Guide - Customize Zorac settings
Development Guide - Contribute to Zorac
Server Setup - Set up your vLLM inference server