tri-memory

provenance:github:amabito/tri-memory

WHAT THIS AGENT DOES

What it does: Tri-memory helps AI agents remember and learn from much longer conversations and interactions than currently possible. It combines different memory techniques to keep track of recent events, archived information, and recurring patterns, all within a small, consistent memory footprint. What problem it solves: Many AI tools struggle to maintain context over extended conversations, leading to repetitive or irrelevant responses. Tri-memory addresses this by allowing agents to effectively recall information from a much larger history, improving the quality and consistency of their interactions. Who would use it: Businesses using AI chatbots, virtual assistants, or other conversational AI applications would benefit from tri-memory

View Source ↗First seen 4mo agoNot yet hireable

USE CASES

Customer Support Automation Summarization

README

# TriMemory -- Memory Architecture for LLM Agents

[![tests](https://img.shields.io/badge/tests-277%20passing-brightgreen)]()
[![python](https://img.shields.io/badge/python-3.10%2B-blue)]()
[![pytorch](https://img.shields.io/badge/pytorch-2.1%2B-orange)]()
[![license](https://img.shields.io/badge/license-Apache%202.0-lightgrey)](LICENSE)

Three-path memory layer for LLM agents. KV window for recent tokens,
retrieval index for archived chunks, TRN recurrent state for compressed
long-range patterns. 8 KB of state per agent. Flat throughput at 10,000+
token history where a KV cache is 156 MB and 15x slower.

- **KV window** -- recent tokens, exact attention
- **Retrieval index** -- archived chunks, cosine search over hidden states
- **TRN state** -- compressed patterns and periodicity, constant size (8--96 KB)

A learned 3-way softmax gate mixes all three paths per token.

### TRN vs Transformer+KV (CPU, d=256, L=8, K=128)

| History | TRN (tps) | TF+KV (tps) | TRN State | KV Cache (fp32) |
|---------|-----------|-------------|-----------|-----------------|
| 1,000 | 240 | 73.8 | 8 KB | 15.6 MB |
| 5,000 | 244 | 35.9 | 8 KB | 78.1 MB |
| 10,000 | 231 | 15.5 | 8 KB | 156.3 MB |

TRN state is O(K), not O(T). Throughput stays flat while Transformer+KV degrades as history grows.

---

## Quick Demo

```bash
git clone https://github.com/amabito/tri-memory.git
cd tri-memory
pip install -e ".[dev]"
pytest  # 277 tests
```

### TriMemoryEngine -- training loop

```python
import torch
from trimemory import TRNConfig
from trimemory.tri_memory import TriMemoryEngine

cfg = TRNConfig(
    vocab_size=8192, d_model=128, n_oscillators=64,
    n_layers=4, d_ff=512, max_seq_len=1024,
)
model = TriMemoryEngine(
    cfg,
    window_size=64,           # KV window: last 64 tokens
    chunk_size=32,            # eviction granularity
    max_retrieval_chunks=256, # retrieval index capacity
    enable_trn=True,
    enable_retrieval=True,
)

ids = torch.randint(0, cfg.vocab_size, (1, 512))
out = model(ids, labels=ids)
print(f"loss: {out['loss']:.4f}")

mem = model.memory_summary()
print(f"TRN state: {mem['trn_state_bytes']} bytes (constant)")
print(f"KV window: {mem['kv_window_bytes']} bytes (bounded)")
```

### AgentMemory -- stateful per-token streaming

```python
from trimemory import TRNConfig
from trimemory.agent_memory import AgentMemory

mem = AgentMemory(TRNConfig.toy(), device="cpu")

# Feed tokens. State is O(K) -- no KV cache.
mem.add_tokens([1, 2, 3, 4, 5])

state = mem.get_state()
print(f"TRN state: {mem.state_size_bytes()} bytes, "
      f"position: {state['position']}")

# Save and restore across agent turns.
mem.save("turn1.pt")
mem.load("turn1.pt")
```

### Standalone TRN

```python
from trimemory import TRNConfig, TRNModel
import torch

cfg = TRNConfig.trn_100m()
model = TRNModel(cfg)

prompt = torch.randint(0, cfg.vocab_size, (1, 16))
tokens = model.generate(prompt, max_new_tokens=128)
# O(1) memory per step. No KV cache.
```

---

## How it differs from RAG

| | Standard RAG | TriMemory |
|--|--|--|
| Retrieval basis | Semantic similarity | Authority chain + semantic similarity |
| Handles amendment override | No (both chunks retrieved equally) | Yes (structured authority resolution) |
| Memory per agent at 10k context | O(context) -- 156 MB KV cache (fp32) | O(1) -- 8 KB TRN state |
| Throughput at 10k context | 15.5 tps (TF+KV, CPU) | 231 tps (TRN, CPU) |
| Content-addressed retrieval | Yes | No -- TRN recall is 0.0 (honest limitation) |
| Status | Production | Alpha (toy-scale models, N=10 benchmark) |

Authority resolution requires `use_compact_memory_packet=True` and document metadata. Off by default.

---

## Architecture

```
Input
 |-- KV window (last W tokens, exact attention)
 |-- Retrieval index (archived chunks, cosine search)
 |-- TRN state (compressed patterns, constant size)
 |
 v
3-way gate: [g_kv, g_trn, g_ret] = softmax(W_gate * x)
  out = g_kv * kv_out + g_trn * trn_out + g_ret * ret_out
 |
 v
FFN -> logits
```

| Path | What it stores | Size | Access |
|------|---------------|------|--------|
| KV window | Recent W tokens | O(W) per layer | Exact attention |
| Retrieval | Archived chunks with hidden states | Fixed capacity (default 256 chunks) | Cosine similarity search |
| TRN state | Compressed history (amplitude, phase, frequency) | O(K) per layer, constant | Linear recurrence |

Tokens enter KV window. Every C tokens, the oldest chunk gets evicted and scored
for saliency. High-saliency chunks go to the retrieval index. TRN state always updates.
The gate routes each path's output based on what the current token needs.

---

## Benchmark Results

### V5 A/B/C/D (Seeds 1--10, 3000 steps)

| Config | Composite | Strength | Caveat |
|--------|-----------|----------|--------|
| A (KV only) | 0.263 | Baseline | No long-range memory |
| B (KV+TRN) | 0.457 | Pattern detection 0.678 | 2/10 seeds stuck on pattern (D recovers) |
| C (KV+Ret) | 0.369 | Old fact recall 0.433 | No pattern capability |
| **D (Full)** | **0.676** | **Pattern 0.805, Old fact 0.719** | Toy scale only (d=128, L=4) |

D >= max(A,B,C) in 10/10 seeds (mean delta +0.165). H1--H4 all PASS.
See [docs/FINAL_VERDICT.md](docs/FINAL_VERDICT.md).

### Multi-Agent Scaling (trn_100m, T=1000)

| Agents | TRN Total | KV Total (fp32) | Ratio |
|--------|-----------|------------------|-------|
| 10 | 0.16 MB | 312 MB | 2,000x |
| 100 | 1.56 MB | 3,125 MB | 2,000x |
| 1,000 | 15.6 MB | 31,250 MB | 2,000x |

Config-specific. bf16 KV halves the ratio. See [docs/PUBLIC_CLAIMS.md](docs/PUBLIC_CLAIMS.md).

---

## PolicyBench (N=10)

Document authority QA -- 10 samples, Japanese corporate IT security policy, 9 evaluation types:
authority resolution, amendment override, scope-dependent values, transition states,
and more. English language corpus evaluation is planned.

```
data/policybench/policy_v1.jsonl
```

Each sample has multi-document context (base policy + amendments + circulars),
a query, gold answer, authority chain, and failure class annotation.

---

## VERONICA Integration

TriMemory handles memory architecture. Runtime containment (budget enforcement,
circuit breaking, policy governance) is handled by
[VERONICA-core](https://github.com/amabito/veronica-core).
Together: governed knowledge execution for LLM agents.

---

## Known Limitations

- TRN cannot perform content-addressed retrieval. Selective copy accuracy is 8.8% vs Transformer 96.2%. Needle-in-Haystack recall is 0.0. Structural property of linear recurrence.
- All experiments use toy-scale models (1--100M parameters). Scaling behavior at 1B+ is unknown.
- B config (KV+TRN) shows seed-dependent pattern failure (2/10 stuck), though D recovers.
- PolicyBench is N=10. Validation on larger corpora is future work.
- CompactMemoryPacket (authority resolution, conflict detection) is off by default. Requires `use_compact_memory_packet=True` and document metadata with authority chains.
- Retrieval index is not persistent across process restarts.
- Gate telemetry is diagnostic only -- gate ratios vary by input distribution.
- Alpha status. No production deployment.

See [docs/TRN_LIMITATIONS.md](docs/TRN_LIMITATIONS.md) and [docs/PUBLIC_CLAIMS.md](docs/PUBLIC_CLAIMS.md).

---

## Repository Structure

```
src/trimemory/
    tri_memory.py    TriMemoryEngine (KV + TRN + Retrieval)
    retrieval.py     RetrievalIndex (bag/hidden/hybrid search)
    model.py         TRNModel (standalone)
    resonance.py     TemporalResonanceLayer (oscillator recurrence)
    baseline.py      TransformerModel (A/B comparison)
    saliency.py      SaliencyArchiver (chunk scoring)
    router.py        Retrieval router / gate
    config.py        TRNConfig (toy, trn_100m, trn_400m, trn_1b)
    agent_memory.py  Streaming agent inference wrapper

scripts/
    eval_go_no_go.py                   Go/No-Go gate evaluation
    run_trimemory_streaming_eval.py    Streaming evaluation with telemetry
    bench_phase7_gpu.py                GPU benchmark (TRN vs TF+KV)


[truncated…]

PUBLIC HISTORY

First discoveredMar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.

Is this yours? Claim it →

METADATA

platformgithub

first seenMar 6, 2026

last updatedMar 17, 2026

last crawled3 months ago

version—

RELATED AGENTS

askimo

Askimo is a platform that lets you interact with artificial intelligence in a simple way, whether through chatting, sear

blog-writer-multi-agents

Here's a plain English summary of the blog-writer-multi-agents AI agent: This agent automatically creates professional-

boss-skill

This agent, boss-skill, is designed to help employees navigate challenging workplace dynamics, particularly those involv

strands-multi-engineer-agent

This agent helps businesses understand how different AI models perform when tackling the same engineering task. It runs

J.E.L.L.Y._AI

J.E.L.L.Y._AI is an article writing AI developed by its creator. This repository is publicly available for job seeking p

More Customer Support agents →

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:amabito/tri-memory)