AGENTS / GITHUB / ratchet
githubinferredactive

ratchet

provenance:github:mahsumaktas/ratchet
WHAT THIS AGENT DOES

Ratchet is a tool that continuously improves your software code, making small, tested changes over time. It addresses the challenge of relying on AI to make large, potentially risky changes to codebases, which can be difficult to manage and undo. Software development teams, especially those working on complex projects, would find this helpful. What makes Ratchet unique is its methodical approach – it only keeps changes that demonstrably improve the code, ensuring stability and preventing unexpected problems. It’s like having a dedicated, tireless assistant constantly refining your software, bit by bit, to make it better and more secure.

View Source ↗First seen 24d agoNot yet hireable
README
# Ratchet

**Autonomous code improvement engine for Claude Code.**

Ratchet makes small, atomic changes to your codebase, measures the impact, and keeps only what improves. Like a ratchet wrench — it only turns forward, never back.

Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): 126 experiments, 11% improvement, zero human intervention.

---

## Why Ratchet?

Most AI coding tools make large, risky changes. Ratchet takes the opposite approach:

| Traditional AI Coding | Ratchet |
|----------------------|---------|
| Big refactors that might break things | One change at a time |
| "Trust me, it's better" | Frozen metrics prove it |
| Can't undo easily | Git commit or revert, nothing in between |
| Runs once, done | Runs for hours, compounds improvement |
| LLM decides quality | **Scripts decide quality** — deterministic, not probabilistic |

**The key insight:** Claude Code follows markdown instructions, but can forget or skip steps during long sessions. Ratchet adds **enforcement hooks** — bash scripts that mechanically prevent invalid actions. Claude can't commit without passing validation. Can't edit protected files. Can't skip metrics.

## Quick Start

```bash
# Clone and install
git clone https://github.com/mahsumaktas/ratchet.git
cd ratchet
bash install.sh

# Go to your project
cd ~/your-project

# Start Claude Code and run
claude
> /autoresearch            # general improvement
> /autoresearch fix 20     # fix 20 lint/type errors
> /autoresearch security   # security audit
```

Or just tell Claude naturally:
```
> projeyi iyileştir
> uyurken çalış
> analyze and improve this codebase
```

## Features

### 6 Modes

| Mode | Command | What it does |
|------|---------|-------------|
| **run** | `/autoresearch` | General improvement: bugs, lint, types, dead code |
| **fix** | `/autoresearch fix [N]` | Drive lint/type/test errors to zero |
| **debug** | `/autoresearch debug` | Scientific bug hunting: reproduce → 5 Whys → TDD fix |
| **security** | `/autoresearch security` | OWASP Top 10 + STRIDE threat scan |
| **predict** | `/autoresearch predict` | 5-persona analysis (no changes, just prioritized findings) |
| **plan** | `/autoresearch plan` | Interactive wizard → frozen metric config |

### Enforcement Hooks

Unlike markdown-only instructions, Ratchet uses real bash scripts that **mechanically enforce** the rules:

| Hook | Type | What it enforces |
|------|------|-----------------|
| `ar-state-enforcer.sh` | PreToolUse | Blocks edits outside MAKE_CHANGE state, blocks commits outside COMMIT state |
| `ar-boundary-guard.sh` | PreToolUse | Blocks edits to `never_touch` files (*.lock, node_modules, .env, etc.) |
| `ar-metrics-collector.sh` | PostToolUse | Auto-runs frozen metrics + guard + decision engine after every edit |
| `ar-session-restore.sh` | SessionStart | Restores state after Claude Code restart |
| `ar-compact-inject.sh` | PostCompact | Injects state after context compaction |
| `ar-stop-summary.sh` | Stop | Shows progress summary + triggers self-review |

**Performance:** When Ratchet is not active, all hooks exit in <1ms (single file existence check).

### Comprehensive Logging (v2)

Every hook call, every experiment, every decision is logged to project-local JSONL files:

```
.autoresearch/logs/
├── events.jsonl        — hook-level micro events (state transitions, boundary checks, guard runs)
├── experiments.jsonl    — full experiment records (metrics before/after, decision, strategy, duration)
└── insights.jsonl       — self-review learnings (accumulated across sessions)
```

Log rotation: `events.jsonl` auto-rotates at 5MB. Experiments and insights are preserved for analysis.

### Self-Review Engine (v2)

Ratchet analyzes its own performance and auto-adjusts configuration:

| Trigger | When | What it does |
|---------|------|-------------|
| **Session end** | Every stop | Analyzes all experiments, detects patterns |
| **Threshold** | 5 consecutive discards or every 20 experiments | Mid-session course correction |
| **Manual** | `/autoresearch review` | On-demand analysis |

**Auto-detected patterns:**

| Pattern | Action |
|---------|--------|
| Strategy < 10% success (5+ experiments) | Remove from rotation |
| File with 3+ consecutive failures | Add to `never_touch` |
| Last 10 experiments all discarded (local minimum) | Reset strategies + add `discovery` |
| Strategy > 70% success | Prioritize (move to front) |

**Safety:** Self-review can modify `strategy_rotation` and `never_touch`, but **never** touches `guard_command`, `frozen_commands`, `mode`, or `max_experiments`. Config backup taken before every change.

### State Machine

Every experiment follows a strict state machine. Invalid transitions are impossible:

```
BOOTSTRAP → SELECT_TARGET → READ_FILE → MAKE_CHANGE → VALIDATE → DECIDE
                 ↑                                                   |
                 |                                          COMMIT ←─┤ (KEEP)
                 |                                          REVERT ←─┘ (DISCARD)
                 |                                            |
                 └──────────── LOG ←──────────────────────────┘
```

### Guard Commands

Main metrics measure improvement. Guard commands prevent regression:

```
Optimize lint → lint errors decrease ✓
But guard (npm test) fails → DISCARD ✗
```

The guard ensures you never break tests while fixing lint, never break the build while adding types.

### Cross-run Learning (v2.1)

Ratchet remembers what worked and what didn't across sessions:

```
.autoresearch/lessons.jsonl
```

- **50-entry cap** with FIFO eviction
- **30-day time-decay** — old lessons auto-pruned at bootstrap
- **Auto-populated** — KEEP decisions record strategy + file + reason
- **Loaded into CHECKPOINT.md** — Claude sees previous learnings on restart

### Token & Cost Tracking (v2.1)

Track spending per ratchet session:

```bash
# During session — auto-tracked per experiment
# At session end — summary in stop hook:
# [RATCHET] 12 experiments (8 kept / 4 discarded) | Cost: 45K tokens, ~$0.32

# Budget enforcement:
# Set in config: "max_budget_usd": 5.00
# Ratchet stops when budget exceeded
```

### Environment Probing (v2.1)

At bootstrap, `ar-probe.sh` auto-detects:

| Category | Detected |
|----------|----------|
| Languages | Node.js, TypeScript, Python, Rust, Go, Ruby, Java, Shell |
| Test runners | jest, vitest, mocha, pytest, cargo test, go test |
| Linters | eslint, biome, ruff, clippy, go vet, rubocop |
| Type checkers | tsc, mypy, pyright |
| Frameworks | Next.js, Nuxt, Vite, Angular, Svelte, Django, Flask |
| CI | GitHub Actions, GitLab CI, Jenkins, CircleCI |
| Monorepo | lerna, pnpm workspaces, Cargo workspaces, nx |

Results saved to `state.json` as `environment` field — used for smarter strategy selection.

### Mechanical Decision Engine

The keep/discard decision is made by `ar-decide.sh` — a deterministic script, not an LLM judgment:

| Metrics | Guard | Decision |
|---------|-------|----------|
| Improved | PASS | **KEEP** |
| Same, code shorter | PASS | **KEEP** |
| Same, code same size | PASS | **KEEP** |
| Same, code longer | PASS | **DISCARD** |
| All errored | Any | **DISCARD** |
| Improved | FAIL | **DISCARD** |
| Worsened | Any | **DISCARD** |

Code size is measured via `git diff --stat` (insertions minus deletions) — only checked when all metrics are unchanged.

### Context-Reset Proof

Ratchet survives Claude Code restarts and context compaction:
- `state.json` — machine state, persisted to disk after every transition
- `CHECKPOINT.md` — self-contained document for zero-context resume
- `SessionStart` hook — auto-restores state on restart
- `PostCompact` hook — auto-injects state after compaction

### Strategy Rotation

When stuck (5 consecutive discards), Ratchet automatically switches strategy:

```
default → low-hanging-fruit → deep-refactor → security-sweep → dead-code-cleanup → discovery-driven
```

After 10 consecutive discard

[truncated…]

PUBLIC HISTORY

First discoveredMar 25, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub
first seenMar 24, 2026
last updatedMar 24, 2026
last crawledtoday
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:mahsumaktas/ratchet)