# OpenClaw Optimization Guide
### Make Your OpenClaw AI Agent Faster, Smarter, and Actually Useful
#### Speed optimization, memory architecture, context management, model selection, and one-shot development for OpenClaw

*By Terp - [Terp AI Labs](https://x.com/OnlyTerp)*

---

## Table of Contents

1. [Speed](#part-1-speed-stop-being-slow) - Trim context files, add fallbacks, manage reasoning mode
2. [Context Bloat](#part-2-context-bloat-the-silent-performance-killer) - Quadratic scaling, built-in defenses
3. [Cron Session Bloat](#part-3-cron-session-bloat-the-hidden-killer) - Session file accumulation, cleanup
4. [Memory](#part-4-memory-stop-forgetting-everything) - 3-tier memory system, Ollama vector search
5. [Orchestration](#part-5-orchestration-stop-doing-everything-yourself) - Sub-agent delegation, CEO/COO/Worker model
6. [Models](#part-6-models-what-to-actually-use) - Provider comparison, pricing, local models
7. [Web Search](#part-7-web-search-give-your-agent-eyes-on-the-internet) - Tavily, Brave, Serper, Gemini grounding
8. [One-Shotting Big Tasks](#part-8-one-shotting-big-tasks-stop-iterating-start-researching) - Research-first methodology
9. [Vault Memory System](#part-9-vault-memory-system-stop-losing-knowledge-between-sessions) - Structured knowledge graph, MOCs, cross-session continuity
10. [State-of-the-Art Embeddings](./part10-state-of-the-art-embeddings.md) - Upgrade from nomic to Qwen3-VL, Stark Edition server, Windows gotchas
11. [Auto-Capture Hook](./part11-auto-capture-hook.md) - Automatic knowledge extraction after every session, no manual memory writes
12. [Self-Improving System](./part12-self-improving-system.md) - Micro-learning loop that compounds forever, $0/day
13. [Memory Bridge](./part13-memory-bridge.md) - Give coding agents (Codex/Claude Code) access to your vault knowledge
14. [Quick Checklist](#part-14-quick-checklist) - 30-minute setup checklist
15. [Infrastructure Hardening](./part15-infrastructure-hardening.md) - Compaction crash loops, GPU contention, Gemini Flash purge, Tavily migration, gateway crash-loop fix
16. [autoDream Memory Consolidation](./part16-autodream-memory-consolidation.md) - Automatic memory cleanup inspired by Claude Code's leaked source. 3-gate trigger, 4-phase execution, works with any OpenClaw setup
17. [The One-Shot Prompt](#part-17-the-one-shot-prompt) - Copy-paste automation prompt that does the entire setup

**📊 [Benchmarks](./benchmarks/)** — Real numbers from a production system (context savings, search latency, reindex results, SWE-bench rankings)

**📁 [Example Vault](./examples/)** — Populated mini-vault showing MOCs, wiki-links, Agent Notes, and .learnings/ after 2 weeks of use

---

## The Problem

If you're running a stock OpenClaw setup, you're probably dealing with:

- **Freezing and hitting context limits.** Bloated workspace files exhaust the context window mid-response.
- **Slow responses.** 15-20 KB+ of context injected with every message = hundreds of milliseconds of added latency per reply.
- **Forgetting everything.** New session = blank slate. No memory of yesterday's work or decisions.
- **Inconsistent behavior.** Without clear rules, personality drifts between sessions.
- **Doing everything the expensive way.** Main model writes code, does research, AND orchestrates - all at top-tier pricing.
- **Flying blind.** No web search means guessing at anything after training cutoff.
- **Wrong model choice.** Using whatever was default without considering the tradeoffs.

## What This Fixes

After this setup:

| Metric | Before | After |
|--------|--------|-------|
| Context per msg | 15-20 KB | 4-5 KB |
| Time to respond | 4-8 sec | 1-2 sec |
| Memory recall | Forgets daily | Remembers weeks |
| Token cost/msg | ~5,000 tokens | ~1,500 tokens |
| Long sessions | Degrades | Stable |
| Concurrent tasks | One at a time | Multiple parallel |

### How It Works

```
You ask a question
    ↓
Orchestrator (main model, lean context ~5KB)
    ↓
┌──────────────────────────────────────────────┐
│  memory_search() - 45ms, local, $0           │
│  ┌─────────┐   ┌───────────┐   ┌─────────┐   │
│  │MEMORY.md│ → │memory/*.md│ → │ vault/* │   │
│  │ (index) │   │  (quick)  │   │ (deep)  │   │
│  └─────────┘   └───────────┘   └─────────┘   │
└──────────────────────────────────────────────┘
    ↓
Only relevant context loaded (~200 tokens)
    ↓
Fast, accurate response + sub-agents for heavy work
```

**The key insight:** Workspace files become **lightweight routers, not storage.** All knowledge lives in a local vector database. The bot loads only what it needs - not everything it's ever learned.

### What The Optimized Files Look Like

Full versions in [`/templates`](./templates):

**SOUL.md** (772 bytes - injected every message):
```markdown
## Who You Are
- Direct, concise, no fluff. Say the useful thing, then stop.
- Have opinions. Disagree when warranted. No sycophancy.

## Memory Rule
Before answering about past work, projects, people, or decisions:
run memory_search FIRST. It costs 45ms. Not searching = wrong answers.

## Orchestrator Rule
You coordinate; sub-agents execute. Never write 50+ lines of code yourself.
```

**MEMORY.md** (581 bytes - slim pointer index):
```markdown
## Active Projects
- Project A → vault/projects/project-a.md
- Project B → vault/projects/project-b.md

## Key People
- Person A - role, relationship → vault/people/person-a.md
```

Details live in vault/. The bot finds them via vector search in 45ms.
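Conceptually, that lookup is a similarity search over pre-computed embeddings. A minimal sketch, assuming vectors are already cached (e.g., by a local Ollama embedding model): the vault paths, the toy 3-dimensional vectors, and the `memory_search()` signature below are all illustrative, not OpenClaw's actual API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical cache: vault file path -> embedding vector.
# Real embeddings have hundreds of dimensions; 3 keeps the sketch readable.
vault_index = {
    "vault/projects/project-a.md": [0.9, 0.1, 0.0],
    "vault/projects/project-b.md": [0.1, 0.8, 0.2],
    "vault/people/person-a.md":    [0.0, 0.2, 0.9],
}

def memory_search(query_vec, top_k=1):
    # Rank vault notes by similarity to the query; load only the best matches.
    ranked = sorted(vault_index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [path for path, _ in ranked[:top_k]]

print(memory_search([0.85, 0.15, 0.05]))  # → ['vault/projects/project-a.md']
```

The point of the sketch: the per-message prompt only ever carries the few hundred tokens of the top-ranked notes, not the whole vault.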

This isn't a settings tweak - it's a **complete architecture change**: memory routing, context engineering, and orchestration working together. The one-shot prompt at the bottom does the entire setup automatically.

> **Note:** Tested on Claude Opus 4.6. Other frontier models should work if they can follow multi-step instructions.

> **Templates included:** Check [`/templates`](./templates) for ready-to-use versions of SOUL.md, AGENTS.md, MEMORY.md, TOOLS.md, and a sample vault/ structure.

---

## Part 1: Speed (Stop Being Slow)

Every message you send, OpenClaw injects ALL your workspace files into the prompt. Bloated files = slower, more expensive replies. This is the #1 speed issue people don't realize they have.

### Why Trimming Works

**You don't need big files once you have vector search.**

Old approach: Stuff everything into MEMORY.md so the bot "sees" it every message → 15KB+ context, slow responses, wasted tokens on irrelevant info.

New approach: MEMORY.md is a slim index of pointers. Full details live in vault/. `memory_search()` finds them instantly via local Ollama embeddings ($0). Your workspace files stay tiny without losing any knowledge.

### Trim Your Context Files

| File | Target Size | What Goes In It | Why This Size |
|------|------------|-----------------|---------------|
| SOUL.md | < 1 KB | Personality, tone, core rules | Injected EVERY message - every byte costs latency |
| AGENTS.md | 2-10 KB | Decision tree, tool routing, operational protocols (autoDream, coordinator) | Operational protocols are worth the context cost — they replace manual prompting |
| MEMORY.md | < 3 KB | **Pointers only** - NOT full docs | Vector search replaces big files |
| TOOLS.md | < 1 KB | Tool names + one-liner usage | Just reminders, not documentation |
| **Total** | **8-15 KB** | Everything injected per message | Acceptable total when AGENTS.md carries operational protocols; a bare setup should land near the low end |

**Rule:** If it's longer than a tweet thread, it's too long for a workspace file. Move the details to vault/.
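One way to hold the line on those targets is a quick size audit. A sketch, not a supported OpenClaw command: it builds a throwaway workspace with `mktemp` so it runs anywhere; in practice you'd point `check` at your real SOUL.md, MEMORY.md, and TOOLS.md.

```shell
# Throwaway workspace to demonstrate the audit (illustrative sizes).
dir=$(mktemp -d)
printf 'x%.0s' $(seq 1 500)  > "$dir/SOUL.md"    # 500 B  - under the 1 KB target
printf 'x%.0s' $(seq 1 4000) > "$dir/MEMORY.md"  # 4000 B - over the 3 KB target

check() {  # usage: check <file> <max-bytes>
  size=$(wc -c < "$1")
  if [ "$size" -gt "$2" ]; then
    echo "OVER: $1 ($size bytes, target $2)"
  else
    echo "OK:   $1 ($size bytes)"
  fi
}

check "$dir/SOUL.md"   1024
check "$dir/MEMORY.md" 3072
```

Anything flagged `OVER` is a candidate for moving into vault/ and replacing with a pointer.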

### Add a Fallback Model

```json
"fallbackModels": ["your-provider/faster-cheaper-model"]
```

OpenClaw automatically switches when your main model is rate-limited or slow.
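In context, the `fallbackModels` line sits alongside your primary model entry in the JSON config. The surrounding structure and the `model` key below are assumptions for illustration; check your actual OpenClaw config for the exact shape:

```json
{
  "model": "your-provider/top-tier-model",
  "fallbackModels": [
    "your-provider/faster-cheaper-model",
    "another-provider/backup-model"
  ]
}
```

List fallbacks in the order you want them tried; the first healthy model wins.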

### Reasoning Mode - Know the Tradeoff

Run `/status` to see your current reasoning mode.

- **Off** - fastest, no thinking phase
- **Low** - slight thinking, faster responses
- **High** - deep reasoning, adds 2-5 seconds but catches things low/off misses

I run **high** and keep it there. 

[truncated…]
