
summarization-pydantic-ai

provenance:github:vstorm-co/summarization-pydantic-ai
WHAT THIS AGENT DOES

This agent helps AI assistants manage long conversations. It automatically condenses or trims past messages so the assistant doesn't exceed the model's context limit when responding. This is useful for customer service bots or coding assistants that need to retain a lot of information over time while continuing to respond helpfully.

README
<h1 align="center">Context Management for Pydantic AI</h1>

<p align="center">
  <em>Automatic Conversation Summarization and History Management</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/summarization-pydantic-ai/"><img src="https://img.shields.io/pypi/v/summarization-pydantic-ai.svg" alt="PyPI version"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg" alt="Python 3.10+"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
  <a href="https://github.com/vstorm-co/summarization-pydantic-ai/actions/workflows/ci.yml"><img src="https://github.com/vstorm-co/summarization-pydantic-ai/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://github.com/pydantic/pydantic-ai"><img src="https://img.shields.io/badge/Powered%20by-Pydantic%20AI-E92063?logo=pydantic&logoColor=white" alt="Pydantic AI"></a>
</p>

<p align="center">
  <b>Intelligent Summarization</b> — LLM-powered context compression
  &nbsp;&bull;&nbsp;
  <b>Sliding Window</b> — zero-cost message trimming
  &nbsp;&bull;&nbsp;
  <b>Limit Warnings</b> — finish-soon guidance before hard caps
  &nbsp;&bull;&nbsp;
  <b>Context Manager</b> — real-time token tracking + tool truncation
  &nbsp;&bull;&nbsp;
  <b>Safe Cutoff</b> — preserves tool call pairs
</p>

---

**Context Management for Pydantic AI** helps your [Pydantic AI](https://ai.pydantic.dev/) agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization or fast sliding window trimming.

> **Full framework?** Check out [Pydantic Deep Agents](https://github.com/vstorm-co/pydantic-deepagents) — complete agent framework with planning, filesystem, subagents, and skills.

## Use Cases

| What You Want to Build | How This Library Helps |
|------------------------|------------------------|
| **Long-Running Agent** | Automatically compress history when context fills up |
| **Customer Support Bot** | Preserve key details while discarding routine exchanges |
| **Code Assistant** | Keep recent code context, summarize older discussions |
| **High-Throughput App** | Zero-cost sliding window for maximum speed |
| **Cost-Sensitive App** | Choose between quality (summarization) or free (sliding window) |

## Installation

```bash
pip install summarization-pydantic-ai
```

Or with uv:

```bash
uv add summarization-pydantic-ai
```

For accurate token counting:

```bash
pip install summarization-pydantic-ai[tiktoken]
```

## Quick Start — Capabilities (Recommended)

The recommended way to add context management is via pydantic-ai's native [Capabilities API](https://ai.pydantic.dev/capabilities/):

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import ContextManagerCapability

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    capabilities=[ContextManagerCapability(max_tokens=100_000)],
)

result = await agent.run("Hello!")
```

**That's it.** Your agent now:

- Tracks token usage on every turn
- Auto-compresses when approaching the limit (90% by default)
- Truncates large tool outputs
- Auto-detects context window size from the model
- Preserves tool call/response pairs (never breaks them)

### Agent-Triggered Compression

Let the agent decide when to compress by enabling the `compact_conversation` tool:

```python
agent = Agent(
    "anthropic:claude-sonnet-4-6",
    capabilities=[ContextManagerCapability(
        include_compact_tool=True,  # Adds compact_conversation(focus?) tool
    )],
)
```

The agent can call `compact_conversation(focus="preserve API design decisions")` to trigger compression with a focus topic. Compression is deferred to the next model request.

### Combine with Limit Warnings

```python
from pydantic_ai_summarization import ContextManagerCapability, LimitWarnerCapability

agent = Agent(
    "openai:gpt-4.1",
    capabilities=[
        LimitWarnerCapability(max_iterations=40, max_context_tokens=100_000),
        ContextManagerCapability(max_tokens=100_000),
    ],
)
```

### Alternative: Processor API

For standalone use without capabilities:

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent("openai:gpt-4.1", history_processors=[processor])
```

## Available Processors

| Processor | LLM Cost | Latency | Context Preservation |
|-----------|----------|---------|---------------------|
| `ContextManagerCapability` | Per compression | Low (tracking only) | Intelligent summary + tool truncation |
| `SummarizationProcessor` | High | High | Intelligent summary |
| `SlidingWindowProcessor` | Zero | ~0ms | Discards old messages |
| `LimitWarnerProcessor` | Zero | ~0ms | Full history + warning injection |

### Intelligent Summarization

Uses an LLM to create summaries of older messages:

```python
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # When to summarize
    keep=("messages", 20),       # What to keep
)
```

### Zero-Cost Sliding Window

Simply discards old messages — no LLM calls:

```python
from pydantic_ai_summarization import create_sliding_window_processor

processor = create_sliding_window_processor(
    trigger=("messages", 100),  # When to trim
    keep=("messages", 50),      # What to keep
)
```

### Limit Warnings

Warn the agent before requests, context usage, or total tokens hit a cap:

```python
from pydantic_ai_summarization import create_limit_warner_processor

processor = create_limit_warner_processor(
    max_iterations=40,
    max_context_tokens=100000,
    max_total_tokens=200000,
)
```
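
The finish-soon behavior can be illustrated with a few lines of plain Python. This is a hypothetical sketch of the idea, not the library's internals; the function name, wording, and 80% soft threshold are assumptions:

```python
def limit_warning(iteration, context_tokens, *,
                  max_iterations, max_context_tokens, threshold=0.8):
    """Return a warning string once usage passes `threshold` of a cap, else None.

    Illustrative only -- the library's thresholds and wording may differ.
    """
    if iteration >= threshold * max_iterations:
        return f"Approaching iteration limit ({iteration}/{max_iterations}); finish soon."
    if context_tokens >= threshold * max_context_tokens:
        return f"Approaching context limit ({context_tokens}/{max_context_tokens} tokens)."
    return None
```

The point is that warnings fire before the hard cap, giving the agent a turn or two to wrap up.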

### Context Manager Capability

Full context management with token tracking, auto-compression, and tool output truncation:

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import ContextManagerCapability

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    capabilities=[ContextManagerCapability(
        max_tokens=100_000,
        compress_threshold=0.9,
        max_tool_output_tokens=5000,
        include_compact_tool=True,  # Agent gets a compact_conversation tool
    )],
)
```

## Trigger Types

| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 50)` | Trigger when message count exceeds threshold |
| `tokens` | `("tokens", 100000)` | Trigger when token count exceeds threshold |
| `fraction` | `("fraction", 0.8)` | Trigger at percentage of max_input_tokens |
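
The semantics of the table above can be sketched with a hypothetical helper (for illustration only; the library's actual evaluation logic may differ):

```python
def should_trigger(trigger, *, message_count, token_count, max_input_tokens=None):
    """Evaluate a (type, threshold) trigger tuple against current usage."""
    kind, threshold = trigger
    if kind == "messages":
        return message_count > threshold
    if kind == "tokens":
        return token_count > threshold
    if kind == "fraction":
        # Fraction triggers need the model's context window size.
        return token_count > threshold * max_input_tokens
    raise ValueError(f"unknown trigger type: {kind!r}")
```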

## Keep Types

| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 20)` | Keep last N messages |
| `tokens` | `("tokens", 10000)` | Keep last N tokens worth |
| `fraction` | `("fraction", 0.2)` | Keep last N% of context |
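
Similarly, the keep rules above can be approximated by walking backward from the newest message. This is a hypothetical sketch, not the library's implementation; in particular, the real processors also take care to preserve tool call/response pairs:

```python
def keep_recent(keep, messages, *, count_tokens=None, max_input_tokens=None):
    """Return the most recent messages allowed by a (type, amount) keep rule."""
    kind, amount = keep
    if kind == "messages":
        return messages[-amount:]
    if kind == "fraction":
        # Convert a fraction of the context window into a token budget.
        kind, amount = "tokens", int(amount * max_input_tokens)
    if kind == "tokens":
        kept, budget = [], amount
        for msg in reversed(messages):
            cost = count_tokens([msg])
            if cost > budget:
                break
            kept.append(msg)
            budget -= cost
        return list(reversed(kept))
    raise ValueError(f"unknown keep type: {keep[0]!r}")
```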

## Advanced Configuration

### Multiple Triggers

```python
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=[
        ("messages", 50),    # OR 50+ messages
        ("tokens", 100000),  # OR 100k+ tokens
    ],
    keep=("messages", 10),
)
```

### Fraction-Based

```python
processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=("fraction", 0.8),  # 80% of context window
    keep=("fraction", 0.2),     # Keep last 20%
    max_input_tokens=128000,    # gpt-4o's context window
)
```

### Custom Token Counter

```python
def my_token_counter(messages):
    # Rough heuristic: ~4 characters per token
    return sum(len(str(msg)) for msg in messages) // 4

processor = create_summarization_processor(
    token_counter=my_token_counter,
)
```
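
If the `tiktoken` extra is installed (see Installation above), a more accurate counter can wrap it, falling back to the character heuristic otherwise. A sketch only; the model name and 4-chars-per-token fallback ratio are assumptions:

```python
def tiktoken_counter(messages, model="gpt-4o"):
    """Count tokens with tiktoken when available, else ~4 chars/token."""
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model(model)
        return sum(len(encoding.encode(str(msg))) for msg in messages)
    except Exception:
        # Fallback heuristic when tiktoken is unavailable.
        return sum(len(str(msg)) for msg in messages) // 4
```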

### Custom Model (e.g., Azure OpenAI)

```python
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor

azure_m
```

[truncated…]

PUBLIC HISTORY

First discovered: Mar 22, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

Platform: github
First seen: Jan 20, 2026
Last updated: Mar 21, 2026
Last crawled: 4 days ago
version

README BADGE

Add to your README:

![Provenance](https://getprovenance.dev/api/badge?id=provenance:github:vstorm-co/summarization-pydantic-ai)