Open-source AI assistant in Rust

The open-source Claude.

A fully open-source AI assistant built from scratch in Rust. Self-host on your own infrastructure, own your data, extend with plugins. No API keys, no rate limits, no vendor lock-in.

100% Open Source · Built in Rust · <2ms Token Latency · 0 API Keys Needed

$ cargo install openclaude

Compiling openclaude v0.1.0 (registry+https://github.com/rust-lang/crates.io-index)

Finished release [optimized] target(s) in 42.3s

$ openclaude serve --model llama-3.3-70b

Server running on http://localhost:3000

Model loaded in 1.2s · VRAM: 38.4 GB · Ready

$ openclaude chat "Explain quantum computing simply"

Quantum computing uses qubits that can be 0, 1, or both at once

(superposition). This lets quantum computers explore many solutions

simultaneously, making them powerful for specific problems...

Features

Everything Claude does. But yours.

No closed APIs, no monthly bills, no data leaving your servers. Just powerful AI you fully control.

Self-Hosted

Run on your own hardware — a laptop, a GPU server, or a Kubernetes cluster. Your infrastructure, your rules.

100% Private

No data ever leaves your machine. No telemetry, no analytics, no phone-home. Fully air-gapped capable.

Plugin System

Extend with WASM plugins. Add tools, custom models, RAG pipelines, and middleware — all hot-reloadable.
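As a sketch of the idea only (the trait and method names below are illustrative, not OpenClaude's actual plugin API), a tool plugin boils down to a small interface the host can call into after loading the compiled WASM module:

```rust
/// Illustrative tool-plugin interface; the real OpenClaude plugin API
/// may differ. A plugin implementing this would be compiled to WASM and
/// loaded (and hot-reloaded) by the host at runtime.
trait Tool {
    /// Name the model uses to invoke this tool.
    fn name(&self) -> &str;
    /// Run the tool against a JSON-encoded argument string.
    fn call(&self, args: &str) -> Result<String, String>;
}

/// Toy tool: echoes its input back, prefixed with its name.
struct Echo;

impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn call(&self, args: &str) -> Result<String, String> {
        Ok(format!("echo: {args}"))
    }
}

fn main() {
    let tool = Echo;
    let out = tool.call("{\"text\":\"hi\"}").unwrap();
    println!("{out}");
}
```

Because the boundary is a plain request/response interface, the host can swap a plugin's WASM module without restarting the server.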

OpenAI-Compatible API

Drop-in replacement for the OpenAI and Claude APIs. Point your existing client at OpenClaude by changing its base URL; no other code changes needed.
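For example, assuming the server mounts the conventional /v1 prefix, an existing OpenAI SDK client only needs its base URL set to http://localhost:3000/v1; the request body keeps its familiar shape:

```json
{
  "model": "llama-3.3-70b",
  "messages": [
    {"role": "user", "content": "Explain quantum computing simply"}
  ],
  "stream": true
}
```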

Streaming First

Server-Sent Events streaming out of the box. Real-time token generation with sub-millisecond overhead.
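On the wire, SSE is plain text: each event is a `data:` line terminated by a blank line, and OpenAI-style streams conventionally end with `data: [DONE]`. A minimal parser sketch (illustrative only, not OpenClaude's internals):

```rust
/// Split a raw SSE stream into the payloads of its `data:` lines.
/// Events are separated by a blank line; "[DONE]" conventionally ends
/// an OpenAI-style stream.
fn parse_sse(raw: &str) -> Vec<String> {
    raw.split("\n\n")
        .filter_map(|event| event.strip_prefix("data: "))
        .take_while(|payload| *payload != "[DONE]")
        .map(str::to_owned)
        .collect()
}

fn main() {
    let raw = "data: {\"token\":\"Hello\"}\n\ndata: {\"token\":\" world\"}\n\ndata: [DONE]\n\n";
    let tokens = parse_sse(raw);
    assert_eq!(tokens.len(), 2);
    println!("{tokens:?}");
}
```

Because each token arrives as its own event, a client can render output incrementally instead of waiting for the full completion.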

MIT Licensed

Fully open source under MIT. Read every line, fork it, ship it in your product. No CLA, no strings.

Architecture

Built on Rust. Built to last.

Every component is written in Rust for maximum performance, memory safety, and reliability. No garbage collector pauses, no runtime overhead.

System Overview

// Inference Engine

GGML/GGUF model loading

Quantization: Q4_K_M, Q5_K_M, Q8_0

Flash Attention v2

Speculative decoding

// Serving Layer

Axum HTTP server

OpenAI-compatible API

Server-Sent Events streaming

Concurrent request batching

// Plugin System

WASM plugin runtime

Tool calling & function execution

RAG pipeline with vector search

Custom middleware hooks
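The quantization step above can be pictured with a toy 4-bit scheme. GGML's real Q4_K_M format is also block-wise, but adds per-block minimums and a second level of scales; this sketch keeps only the core idea: store one scale plus small integers instead of full floats.

```rust
/// Toy 4-bit absmax quantization of one block of weights
/// (a simplification, not the actual Q4_K_M layout).
fn quantize(block: &[f32]) -> (f32, Vec<i8>) {
    let absmax = block.iter().fold(0f32, |m, x| m.max(x.abs()));
    // Signed 4-bit range used here: -7..=7.
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let q = block.iter().map(|x| (x / scale).round() as i8).collect();
    (scale, q)
}

fn dequantize(scale: f32, q: &[i8]) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.12, -0.53, 0.97, 0.04, -0.88, 0.31];
    let (scale, q) = quantize(&weights);
    let restored = dequantize(scale, &q);
    // Rounding guarantees each weight is within half a step of the original.
    for (w, r) in weights.iter().zip(&restored) {
        assert!((w - r).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale:.4}, quantized = {q:?}");
}
```

The payoff is memory: 4 bits plus a shared scale per block in place of 32 bits per weight is roughly an 8x reduction, which is what makes a 70B model fit on a single GPU.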

Native Performance

No interpreter, no VM, no garbage collector. Compiled ahead of time to native machine code for maximum throughput on inference workloads.

Memory Safety

Rust's ownership model prevents buffer overflows, data races, and use-after-free bugs at compile time in safe code. No segfaults in production.

Fearless Concurrency

Handle thousands of concurrent connections with async/await and zero-cost abstractions. No GIL, no thread-safety surprises.
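A small illustration of what "no thread-safety surprises" means in practice (using std::thread for brevity, though the server itself is async): shared state must be Send + Sync, and the compiler enforces it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from several threads. Removing the Mutex
/// here would be a compile error, not a silent production data race.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Deterministic despite 8 threads racing to increment.
    println!("total = {}", parallel_count(8, 1_000));
}
```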

Tiny Footprint

Single static binary, ~15MB. No Python environment, no node_modules, no Docker required. Just copy and run.

Performance

Fast is an understatement.

Native Rust performance means lower latency, less memory, and more tokens per second than Python-based alternatives.

| Metric | Python (llama.cpp bindings) | OpenClaude (Rust) |
|---|---|---|
| Tokens/sec (generation) | ~42 tok/s | ~67 tok/s |
| Time to first token | ~180ms | ~45ms |
| Memory overhead | ~2.1 GB | ~340 MB |
| Concurrent requests (p99 latency) | ~320ms @ 50 rps | ~85ms @ 50 rps |
| Binary size | ~1.2 GB (full env) | ~15 MB |
| Cold start | ~4.2s | ~0.3s |

Measured on Llama 3.3 70B Q4_K_M, single NVIDIA A100 80GB. Your results may vary.

Claude vs. OpenClaude

We love Claude. We just think AI should also be open.

| | Claude (Anthropic) | OpenClaude |
|---|---|---|
| Source code | Closed source | Fully open (MIT) |
| Data privacy | Sent to Anthropic servers | Never leaves your machine |
| Pricing | $20/mo or per-token API | Free forever |
| Model choice | Claude models only | Any GGUF model |
| Self-hosting | Not possible | Single binary, any hardware |
| Customization | System prompts only | Full source + WASM plugins |
| Rate limits | Per-plan limits | Unlimited (your hardware) |
| Offline use | Requires internet | Fully air-gapped |

Models

Run any open model.

Load GGUF models from Hugging Face or any local file. Switch models in seconds, no redeployment needed.

Llama 3.3 · 70B, 8B variants by Meta
Mistral · 7B, Mixtral 8x7B, Large
Qwen 2.5 · 72B, 32B, 7B by Alibaba
DeepSeek V3 · MoE architecture, 685B
Gemma 2 · 27B, 9B, 2B by Google
Phi-4 · 14B by Microsoft
Command R+ · 104B by Cohere
Your fine-tune · Any GGUF model

And any GGUF-compatible model from Hugging Face, Ollama, or your own fine-tunes.

Quickstart

Up and running in 60 seconds.

Install from crates.io, download a model, start chatting. That's it.

1. Install OpenClaude

$ cargo install openclaude

2. Download a model

$ openclaude pull llama-3.3-70b-q4

3. Start chatting

$ openclaude chat

Welcome to OpenClaude. How can I help?

Or start the OpenAI-compatible API server: openclaude serve