Open-source AI assistant in Rust

The open-source Claude.

A fully open-source AI assistant built from scratch in Rust. Self-host on your own infrastructure, own your data, extend with plugins. No API keys, no rate limits, no vendor lock-in.

100% Open Source · Built in Rust · <2ms Token Latency · 0 API Keys Needed

$ cargo install openclaude

Compiling openclaude v0.1.0 (registry+https://github.com/rust-lang/crates.io-index)

Finished release [optimized] target(s) in 42.3s

$ openclaude serve --model llama-3.3-70b

Server running on http://localhost:3000

Model loaded in 1.2s · VRAM: 38.4 GB · Ready

$ openclaude chat "Explain quantum computing simply"

Quantum computing uses qubits that can be 0, 1, or both at once

(superposition). This lets quantum computers explore many solutions

simultaneously, making them powerful for specific problems...

Features

Everything Claude does. But yours.

No closed APIs, no monthly bills, no data leaving your servers. Just powerful AI you fully control.

Self-Hosted

Run on your own hardware — a laptop, a GPU server, or a Kubernetes cluster. Your infrastructure, your rules.

100% Private

No data ever leaves your machine. No telemetry, no analytics, no phone-home. Fully air-gapped capable.

Plugin System

Extend with WASM plugins. Add tools, custom models, RAG pipelines, and middleware — all hot-reloadable.
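As a sketch of the idea only (the trait and method names below are illustrative, not OpenClaude's actual plugin API), a tool plugin boils down to a small interface the host can call into after loading the compiled WASM module:

```rust
/// Illustrative tool-plugin interface; the real OpenClaude plugin API
/// may differ. A plugin implementing this would be compiled to WASM and
/// loaded (and hot-reloaded) by the host at runtime.
trait Tool {
    /// Name the model uses to invoke this tool.
    fn name(&self) -> &str;
    /// Run the tool against a JSON-encoded argument string.
    fn call(&self, args: &str) -> Result<String, String>;
}

/// Toy tool: echoes its input back, prefixed with its name.
struct Echo;

impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn call(&self, args: &str) -> Result<String, String> {
        Ok(format!("echo: {args}"))
    }
}

fn main() {
    let tool = Echo;
    let out = tool.call("{\"text\":\"hi\"}").unwrap();
    println!("{out}");
}
```

Because the boundary is a plain request/response interface, the host can swap a plugin's WASM module without restarting the server.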

OpenAI-Compatible API

Drop-in replacement for the OpenAI and Claude APIs. Point your existing client at OpenClaude by changing its base URL; no other code changes needed.
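For example, assuming the server mounts the conventional /v1 prefix, an existing OpenAI SDK client only needs its base URL set to http://localhost:3000/v1; the request body keeps its familiar shape:

```json
{
  "model": "llama-3.3-70b",
  "messages": [
    {"role": "user", "content": "Explain quantum computing simply"}
  ],
  "stream": true
}
```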

Streaming First

Server-Sent Events streaming out of the box. Real-time token generation with sub-millisecond overhead.
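On the wire, SSE is plain text: each event is a `data:` line terminated by a blank line, and OpenAI-style streams conventionally end with `data: [DONE]`. A minimal parser sketch (illustrative only, not OpenClaude's internals):

```rust
/// Split a raw SSE stream into the payloads of its `data:` lines.
/// Events are separated by a blank line; "[DONE]" conventionally ends
/// an OpenAI-style stream.
fn parse_sse(raw: &str) -> Vec<String> {
    raw.split("\n\n")
        .filter_map(|event| event.strip_prefix("data: "))
        .take_while(|payload| *payload != "[DONE]")
        .map(str::to_owned)
        .collect()
}

fn main() {
    let raw = "data: {\"token\":\"Hello\"}\n\ndata: {\"token\":\" world\"}\n\ndata: [DONE]\n\n";
    let tokens = parse_sse(raw);
    assert_eq!(tokens.len(), 2);
    println!("{tokens:?}");
}
```

Because each token arrives as its own event, a client can render output incrementally instead of waiting for the full completion.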

MIT Licensed

Fully open source under MIT. Read every line, fork it, ship it in your product. No CLA, no strings.

Architecture

Built on Rust. Built to last.

Every component is written in Rust for maximum performance, memory safety, and reliability. No garbage collector pauses, no runtime overhead.

System Overview

// Inference Engine

GGML/GGUF model loading

Quantization: Q4_K_M, Q5_K_M, Q8_0

Flash Attention v2

Speculative decoding

// Serving Layer

Axum HTTP server

OpenAI-compatible API

Server-Sent Events streaming

Concurrent request batching

// Plugin System

WASM plugin runtime

Tool calling & function execution

RAG pipeline with vector search

Custom middleware hooks
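The quantization step above can be pictured with a toy 4-bit scheme. GGML's real Q4_K_M format is also block-wise, but adds per-block minimums and a second level of scales; this sketch keeps only the core idea: store one scale plus small integers instead of full floats.

```rust
/// Toy 4-bit absmax quantization of one block of weights
/// (a simplification, not the actual Q4_K_M layout).
fn quantize(block: &[f32]) -> (f32, Vec<i8>) {
    let absmax = block.iter().fold(0f32, |m, x| m.max(x.abs()));
    // Signed 4-bit range used here: -7..=7.
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let q = block.iter().map(|x| (x / scale).round() as i8).collect();
    (scale, q)
}

fn dequantize(scale: f32, q: &[i8]) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.12, -0.53, 0.97, 0.04, -0.88, 0.31];
    let (scale, q) = quantize(&weights);
    let restored = dequantize(scale, &q);
    // Rounding guarantees each weight is within half a step of the original.
    for (w, r) in weights.iter().zip(&restored) {
        assert!((w - r).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale:.4}, quantized = {q:?}");
}
```

The payoff is memory: 4 bits plus a shared scale per block in place of 32 bits per weight is roughly an 8x reduction, which is what makes a 70B model fit on a single GPU.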

Native Performance

No interpreter, no VM, no garbage collector. Compiled ahead of time to native machine code for maximum throughput on inference workloads.

Memory Safety

Rust's ownership model prevents buffer overflows, data races, and use-after-free bugs at compile time in safe code. No segfaults in production.

Fearless Concurrency

Handle thousands of concurrent connections with async/await and zero-cost abstractions. No GIL, no thread-safety surprises.
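A small illustration of what "no thread-safety surprises" means in practice (using std::thread for brevity, though the server itself is async): shared state must be Send + Sync, and the compiler enforces it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from several threads. Removing the Mutex
/// here would be a compile error, not a silent production data race.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Deterministic despite 8 threads racing to increment.
    println!("total = {}", parallel_count(8, 1_000));
}
```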

Tiny Footprint

Single static binary, ~15MB. No Python environment, no node_modules, no Docker required. Just copy and run.

Performance

Fast is an understatement.

Native Rust performance means lower latency, less memory, and more tokens per second than Python-based alternatives.

| Metric | Python (llama.cpp bindings) | OpenClaude (Rust) |
|---|---|---|
| Tokens/sec (generation) | ~42 tok/s | ~67 tok/s |
| Time to first token | ~180ms | ~45ms |
| Memory overhead | ~2.1 GB | ~340 MB |
| Concurrent requests (p99 latency) | ~320ms @ 50 rps | ~85ms @ 50 rps |
| Binary size | ~1.2 GB (full env) | ~15 MB |
| Cold start | ~4.2s | ~0.3s |

Measured on Llama 3.3 70B Q4_K_M, single NVIDIA A100 80GB. Your results may vary.

Claude vs. OpenClaude

We love Claude. We just think AI should also be open.

| | Claude (Anthropic) | OpenClaude |
|---|---|---|
| Source code | Closed source | Fully open (MIT) |
| Data privacy | Sent to Anthropic servers | Never leaves your machine |
| Pricing | $20/mo or per-token API | Free forever |
| Model choice | Claude models only | Any GGUF model |
| Self-hosting | Not possible | Single binary, any hardware |
| Customization | System prompts only | Full source + WASM plugins |
| Rate limits | Per-plan limits | Unlimited (your hardware) |
| Offline use | Requires internet | Fully air-gapped |

Models

Run any open model.

Load GGUF models from Hugging Face or any local file. Switch models in seconds, no redeployment needed.

Llama 3.3 · 70B, 8B variants by Meta
Mistral · 7B, Mixtral 8x7B, Large
Qwen 2.5 · 72B, 32B, 7B by Alibaba
DeepSeek V3 · MoE architecture, 685B
Gemma 2 · 27B, 9B, 2B by Google
Phi-4 · 14B by Microsoft
Command R+ · 104B by Cohere
Your fine-tune · Any GGUF model

And any GGUF-compatible model from Hugging Face, Ollama, or your own fine-tunes.

Quickstart

Up and running in 60 seconds.

Install from crates.io, download a model, start chatting. That's it.

1. Install OpenClaude

$ cargo install openclaude

2. Download a model

$ openclaude pull llama-3.3-70b-q4

3. Start chatting

$ openclaude chat

Welcome to OpenClaude. How can I help?

Or start the OpenAI-compatible API server: openclaude serve