The open-source Claude.
A fully open-source AI assistant built from scratch in Rust. Self-host on your own infrastructure, own your data, extend with plugins. No API keys, no rate limits, no vendor lock-in.
$ cargo install openclaude
Compiling openclaude v0.1.0 (registry+https://github.com/rust-lang/crates.io-index)
Finished release [optimized] target(s) in 42.3s
$ openclaude serve --model llama-3.3-70b
Server running on http://localhost:3000
Model loaded in 1.2s · VRAM: 38.4 GB · Ready
$ openclaude chat "Explain quantum computing simply"
Quantum computing uses qubits that can be 0, 1, or both at once
(superposition). This lets quantum computers explore many solutions
simultaneously, making them powerful for specific problems...
Features
Everything Claude does. But yours.
No closed APIs, no monthly bills, no data leaving your servers. Just powerful AI you fully control.
Self-Hosted
Run on your own hardware — a laptop, a GPU server, or a Kubernetes cluster. Your infrastructure, your rules.
100% Private
No data ever leaves your machine. No telemetry, no analytics, no phone-home. Fully air-gapped capable.
Plugin System
Extend with WASM plugins. Add tools, custom models, RAG pipelines, and middleware — all hot-reloadable.
OpenAI-Compatible API
Drop-in replacement for OpenAI and Claude APIs. Your existing code works without changes.
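Since the server speaks the OpenAI wire format, existing clients only need to point at the local base URL. A minimal sketch of the standard chat-completions request body (Python, stdlib only; `/v1/chat/completions` is the OpenAI endpoint path, assumed here to be what OpenClaude exposes at `http://localhost:3000`):

```python
import json

def chat_request(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for POST /v1/chat/completions
    (the standard OpenAI chat format)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body)

# Same request as the transcript above, as an API call body:
payload = chat_request("llama-3.3-70b", "Explain quantum computing simply")
```

Because the shape is identical, swapping a hosted provider for OpenClaude should amount to changing the base URL in your existing client configuration.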
Streaming First
Server-Sent Events streaming out of the box. Real-time token generation with sub-millisecond overhead.
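On the client side, an OpenAI-style SSE stream arrives as `data: {json}` lines terminated by `data: [DONE]`. A toy parser that reassembles the token deltas (Python, stdlib only; the sample frames are fabricated for illustration):

```python
import json

def parse_sse(lines):
    """Collect token deltas from an OpenAI-style SSE stream.

    Each event line looks like 'data: {json}'; the stream ends
    with the sentinel 'data: [DONE]'.
    """
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        tokens.append(delta)
    return "".join(tokens)

# Fabricated example frames in the standard streaming format:
frames = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: [DONE]',
]
```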
MIT Licensed
Fully open source under MIT. Read every line, fork it, ship it in your product. No CLA, no strings.
Architecture
Built on Rust. Built to last.
Every component is written in Rust for maximum performance, memory safety, and reliability. No garbage collector pauses, no runtime overhead.
System Overview
// Inference Engine
GGML/GGUF model loading
Quantization: Q4_K_M, Q5_K_M, Q8_0
Flash Attention v2
Speculative decoding
// Serving Layer
Axum HTTP server
OpenAI-compatible API
Server-Sent Events streaming
Concurrent request batching
// Plugin System
WASM plugin runtime
Tool calling & function execution
RAG pipeline with vector search
Custom middleware hooks
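Of the components above, the RAG pipeline's vector search reduces to nearest-neighbor lookup over embeddings. A toy sketch of cosine-similarity retrieval (Python; the documents and embeddings are made up for illustration, and OpenClaude's actual plugin API is not shown here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(
        docs,
        key=lambda d: cosine(query_vec, d["embedding"]),
        reverse=True,
    )
    return [d["text"] for d in ranked[:k]]

# Fabricated 3-dimensional embeddings (real pipelines use hundreds of dims):
docs = [
    {"text": "Rust ownership rules", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Quantum superposition", "embedding": [0.0, 0.2, 0.9]},
]
```

In a real deployment the embeddings come from a model and the scan is replaced by an index, but the ranking logic is the same.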
Native Performance
No interpreter, no VM, no garbage collector. Compiled to native machine code for maximum throughput on inference workloads.
Memory Safety
Rust's ownership model prevents buffer overflows, data races, and use-after-free bugs at compile time. No segfaults in production.
Fearless Concurrency
Handle thousands of concurrent connections with async/await and zero-cost abstractions. No GIL, no thread-safety surprises.
Tiny Footprint
Single static binary, ~15MB. No Python environment, no node_modules, no Docker required. Just copy and run.
Performance
Fast is an understatement.
Native Rust performance means lower latency, less memory, and more tokens per second than Python-based alternatives.
| Metric | Python (llama.cpp bindings) | OpenClaude (Rust) |
|---|---|---|
| Tokens/sec (generation) | ~42 tok/s | ~67 tok/s |
| Time to first token | ~180ms | ~45ms |
| Memory overhead | ~2.1 GB | ~340 MB |
| Concurrent requests (p99) | ~320ms @ 50 rps | ~85ms @ 50 rps |
| Binary size | ~1.2 GB (env) | ~15 MB |
| Cold start | ~4.2s | ~0.3s |
Measured on Llama 3.3 70B Q4_K_M, single NVIDIA A100 80GB. Your results may vary.
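For a quick sense of scale, the table's figures work out to roughly a 1.6× generation speedup and a 4× faster first token (simple division of the numbers above, no new measurements):

```python
# Ratios derived from the benchmark table above.
gen_speedup = 67 / 42          # tokens/sec: ~1.6x faster generation
ttft_speedup = 180 / 45        # time to first token: 4x faster
mem_ratio = 2.1 * 1024 / 340   # memory overhead: ~6.3x smaller
```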
Claude vs. OpenClaude
We love Claude. We just think AI should also be open.
| | Claude (Anthropic) | OpenClaude |
|---|---|---|
| Source code | Closed source | Fully open (MIT) |
| Data privacy | Sent to Anthropic servers | Never leaves your machine |
| Pricing | $20/mo or per-token API | Free forever |
| Model choice | Claude models only | Any GGUF model |
| Self-hosting | Not possible | Single binary, any hardware |
| Customization | System prompts only | Full source + WASM plugins |
| Rate limits | Per-plan limits | Unlimited (your hardware) |
| Offline use | Requires internet | Fully air-gapped |
Models
Run any open model.
Load any GGUF-compatible model from Hugging Face, Ollama, or a local file — including your own fine-tunes. Switch models in seconds, no redeployment needed.
Up and running in 60 seconds.
Install from crates.io, download a model, start chatting. That's it.
$ openclaude serve
$ openclaude chat
Welcome to OpenClaude. How can I help?