AEGIS

provenance:github:MihneaTeodorStoica/AEGIS

WHAT THIS AGENT DOES

AEGIS is an assistant that understands your requests and acts on your computer: opening programs, moving windows, taking screenshots. It automates repetitive tasks and helps with complex desktop operations without requiring technical expertise, which makes it useful for business professionals, researchers, and anyone who wants to streamline a computer-heavy workflow. Its tools, safety checks, and task history run locally on your computer, and a built-in policy gate requires approval for risky actions so that unintended ones are caught before they run. It combines conversational interaction with guarded task execution, giving you a controlled way to drive your desktop.


USE CASES

Automation, Scheduling

README

# AEGIS

AEGIS is a local CLI-first AI desktop agent for Python 3.12.3.

It is intentionally not a web app. The core design is:

- Assistant: voice-facing/text-facing conversational agent
- Agent: heavy reasoning and guarded desktop execution agent
- MCP server: local typed tool surface for the computer
- Policy gate: local risk classification and approval enforcement
- Orchestrator: durable task IDs, status, logs, approvals, and pause-resume

## Architecture

```text
User
  -> aegis CLI
    -> Assistant layer (voice/text)
      -> explicit handoff request
        -> Orchestrator
          -> Agent runtime
            -> local MCP server
              -> policy gate
                -> local computer tools
```

Key implementation choices:

- Assistant uses the OpenAI Agents SDK as the conversational manager.
- Voice mode uses the Realtime agent path (`RealtimeRunner` + `RealtimeAgent`).
- Agent uses the OpenAI Agents SDK with a local `MCPServerStdio` subprocess.
- MCP tool calls are policy-gated locally before execution.
- Approval pauses are durable through the Agents SDK `RunState` snapshot.
- Task, approval, and event history is persisted in SQLite under `AEGIS_HOME`.

## Features

- `aegis chat` for Assistant text sessions
- `aegis` as the default terminal chat entrypoint
- readline-backed chat history, arrow-key recall, `Ctrl+R` history search, and tab-completed slash commands
- `aegis voice` for realtime Assistant voice sessions
- `aegis run "..."` for direct Agent tasks
- `aegis approvals`, `aegis approve`, `aegis deny`
- `aegis status`, `aegis logs`, `aegis cancel`
- `aegis mcp` to run the local MCP server
- `aegis config` to inspect or validate runtime configuration

## Tool Surface

AEGIS MCP v1 tools:

- `get_screenshot`
- `get_active_window_screenshot`
- `get_active_window`
- `get_window_screenshot`
- `get_window_region_screenshot`
- `list_open_windows`
- `find_windows`
- `get_display_layout`
- `get_desktop_snapshot`
- `run_safe_action_chain`
- `run_interactive_action_chain`
- `move_mouse`
- `move_mouse_relative`
- `move_mouse_to_window`
- `click_mouse`
- `click_in_window`
- `focus_text_input_in_window`
- `double_click`
- `double_click_in_window`
- `right_click`
- `right_click_in_window`
- `mouse_down`
- `mouse_up`
- `drag_mouse_to`
- `drag_mouse_relative`
- `drag_in_window`
- `drag_scrollbar_in_window`
- `scroll`
- `horizontal_scroll`
- `scroll_in_window`
- `page_scroll_in_window`
- `type_text`
- `type_text_in_window`
- `clear_and_type_in_window`
- `press_key`
- `press_key_in_window`
- `press_key_sequence`
- `press_key_sequence_in_window`
- `key_down`
- `key_up`
- `hotkey`
- `hotkey_in_window`
- `launch_app`
- `open_url`
- `search_browser`
- `wait`
- `move_window`
- `resize_window`
- `snap_window`
- `maximize_window`
- `minimize_window`
- `restore_window`
- `run_shell_command_guarded`
- `finish_task`
- `get_mouse_position`
- `focus_window`
- `window_bounds`
- `read_clipboard_metadata`
- `read_file_metadata`

## Policy Model

Every MCP action passes through a local policy gate.

- Low risk: auto-allowed
- Medium risk: approval-gated
- High risk: approval-gated or blocked
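As a rough illustration of the three tiers, the sketch below maps a risk level to a decision. The tool-to-risk assignments, the `blocked` flag, and the rule that unknown tools count as high risk are assumptions made for the example, not behavior confirmed by AEGIS.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Hypothetical risk assignments for a few tool names from the tool surface.
TOOL_RISK = {
    "get_desktop_snapshot": Risk.LOW,
    "type_text": Risk.MEDIUM,
    "run_shell_command_guarded": Risk.HIGH,
}


def decide(tool: str, *, blocked: bool = False) -> Decision:
    """Map a tool call to a decision.

    `blocked` stands in for whatever check marks a high-risk call as
    forbidden outright (for example, a destructive shell pattern).
    """
    # Assumption: tools without a known classification are treated as high risk.
    risk = TOOL_RISK.get(tool, Risk.HIGH)
    if risk is Risk.LOW:
        return Decision.ALLOW
    if risk is Risk.MEDIUM:
        return Decision.REQUIRE_APPROVAL
    return Decision.BLOCK if blocked else Decision.REQUIRE_APPROVAL
```

Defaulting unknown tools to the strictest tier keeps the gate fail-closed: a newly added tool cannot run unreviewed just because nobody classified it.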

AEGIS also supports an explicit full-access mode:

- `AEGIS_ACCESS_MODE=full`
- `aegis --full-access`

The guarded shell tool is intentionally narrow:

- read-only diagnostics like `neofetch` are allowed with approval
- destructive, publishing, install, or shell-composition patterns are blocked

Examples of blocked shell patterns:

- `rm`, `mv`, `cp`, `dd`
- `sudo`
- package installs/removals
- `git push`
- `curl`/`wget`
- shell chaining or redirection

In `full` access mode, approvals are bypassed for the exposed local tools, including the guarded shell tool. Use that mode only when you want unrestricted local execution.
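A guarded shell check along these lines could look like the following sketch. It is a simplification, not the AEGIS implementation: the function name `check_shell_command`, the `full_access` parameter, the list of package managers, and the choice to block a package manager entirely (rather than only its install and remove subcommands) are assumptions.

```python
import shlex

# Commands taken from the README's blocked examples.
BLOCKED_COMMANDS = {"rm", "mv", "cp", "dd", "sudo", "curl", "wget"}
BLOCKED_SUBCOMMANDS = {("git", "push")}
# Assumption: which package managers to recognize, blocked wholesale here.
PACKAGE_MANAGERS = {"apt", "apt-get", "dnf", "pacman", "pip", "npm"}
# Characters that indicate chaining, redirection, or substitution.
SHELL_META_CHARS = set(";|&<>`$\n")


def check_shell_command(command: str, *, full_access: bool = False) -> str:
    """Return "allow", "approval", or "blocked" for a shell command string."""
    if full_access:
        # Full-access mode bypasses approvals, as described above.
        return "allow"
    if any(ch in command for ch in SHELL_META_CHARS):
        return "blocked"
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return "blocked"
    if not tokens:
        return "blocked"
    program = tokens[0]
    if program in BLOCKED_COMMANDS or program in PACKAGE_MANAGERS:
        return "blocked"
    if tuple(tokens[:2]) in BLOCKED_SUBCOMMANDS:
        return "blocked"
    # Everything that survives the filters still needs a human approval.
    return "approval"
```

Checking for metacharacters before tokenizing means a command such as `neofetch; rm -rf ~` is rejected as a whole instead of being judged by its first word.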

## Observation Strategy

The Agent uses a cheap, observation-first loop:

- fresh tasks start with an initial structured desktop snapshot
- most action tools automatically return a lightweight post-action `desktop_snapshot`
- screenshots remain opt-in for cases where layout or text meaning is ambiguous

This is meant to approximate an `observe -> act -> observe` loop without paying screenshot-token costs every turn.
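The loop can be sketched with a stand-in desktop, as below. `FakeDesktop`, `Snapshot`, and `run_task` are hypothetical names used only to show the shape of the loop: one snapshot up front, one lightweight snapshot returned by each action, and no screenshot unless something asks for it.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Cheap structured observation: window titles only, no pixels."""
    active_window: str
    open_windows: list[str] = field(default_factory=list)


@dataclass
class FakeDesktop:
    """Stand-in for the real desktop, used only to demonstrate the loop."""
    windows: list[str] = field(default_factory=list)
    screenshots_taken: int = 0

    def snapshot(self) -> Snapshot:
        active = self.windows[-1] if self.windows else ""
        return Snapshot(active_window=active, open_windows=list(self.windows))

    def launch_app(self, name: str) -> Snapshot:
        # The action returns a post-action snapshot, so no separate
        # observation call is needed afterwards.
        self.windows.append(name)
        return self.snapshot()

    def screenshot(self) -> bytes:
        # Opt-in and comparatively expensive; the loop below never calls it.
        self.screenshots_taken += 1
        return b""


def run_task(desktop: FakeDesktop, apps: list[str]) -> list[Snapshot]:
    """Observe once, then act; each action yields the next observation."""
    observations = [desktop.snapshot()]  # initial structured snapshot
    for app in apps:
        observations.append(desktop.launch_app(app))
    return observations
```

For two actions this produces three observations and zero screenshots, which is the cost profile the strategy above is aiming for.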

Cost controls:

- `AEGIS_AUTO_OBSERVE_AFTER_ACTIONS=true|false`
- `AEGIS_AUTO_OBSERVE_WINDOW_LIMIT=<n>`
- `AEGIS_ACTION_CHAIN_MAX_STEPS=<n>`
- `AEGIS_MCP_CLIENT_TIMEOUT_SECONDS=<seconds>`
- `AEGIS_WINDOW_WAIT_TIMEOUT_SECONDS=<seconds>`

Recommended defaults:

- keep auto-observe enabled
- keep the window limit low, usually `3` to `5`
- keep action chains short, usually `3` to `6` steps
- rely on `get_desktop_snapshot` for cheap refreshes
- request window or region screenshots before falling back to a full-screen screenshot
- increase the MCP/client timeout if desktop apps are slow to launch on your machine

Action-chain guidance:

- default to `run_safe_action_chain` or `run_interactive_action_chain` when the next `2` to `5` steps are already known
- use `run_safe_action_chain` for deterministic low-risk batches like `launch_app -> wait_for_window -> focus_window -> snap_window`
- use `run_interactive_action_chain` for deterministic UI batches like `focus_window -> focus_text_input_in_window -> clear_and_type_in_window -> press_key_in_window`
- do not chain workflows that require reading or interpreting the screen after each step
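To make the chaining guidance concrete, here is a sketch of how a chain might be validated before it is submitted. The `Step` shape, the argument names, the allow-list, and `validate_chain` are hypothetical; the step names mirror the `run_safe_action_chain` example above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical allow-list for a "safe" chain, using the step names from
# the guidance above.
SAFE_CHAIN_TOOLS = {"launch_app", "wait_for_window", "focus_window", "snap_window"}


def validate_chain(steps: list[Step], allowed_tools: set[str], max_steps: int) -> list[dict]:
    """Reject empty, oversized, or off-list chains; return a plain payload."""
    if not steps:
        raise ValueError("an action chain needs at least one step")
    if len(steps) > max_steps:
        raise ValueError(f"chain has {len(steps)} steps, limit is {max_steps}")
    for index, step in enumerate(steps):
        if step.tool not in allowed_tools:
            raise ValueError(f"step {index} uses disallowed tool {step.tool!r}")
    return [{"tool": step.tool, "args": step.args} for step in steps]


# Example chain; the argument names are illustrative, not the real tool schema.
OPEN_AND_SNAP = [
    Step("launch_app", {"name": "firefox"}),
    Step("wait_for_window", {"title": "Firefox"}),
    Step("focus_window", {"title": "Firefox"}),
    Step("snap_window", {"side": "left"}),
]
```

Validating the whole chain up front means a bad step is caught before the first action runs, instead of leaving the desktop in a half-finished state.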

Browser/chat guidance:

- use `get_active_window_screenshot`, `get_window_screenshot`, or `get_window_region_screenshot` to inspect browser content without paying for a full-screen image
- use semantic anchors such as `chat-input`, `content-center`, `content-top`, and `scrollbar-right` instead of brittle raw coordinates when possible
- use `page_scroll_in_window` or `drag_scrollbar_in_window` for long chat/document panes when wheel scrolling is unreliable
- use `focus_text_input_in_window` before typing into chat-style web apps when the input box is easy to miss
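The idea behind semantic anchors can be sketched as a mapping from anchor names to positions relative to the window bounds. The anchor names come from the list above, but the fractional positions, the `Bounds` type, and `resolve_anchor` are assumptions for illustration; AEGIS may resolve anchors quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    x: int
    y: int
    width: int
    height: int


# Hypothetical fractional positions (fx, fy) inside the window for each anchor.
ANCHORS = {
    "content-center": (0.5, 0.5),
    "content-top": (0.5, 0.2),
    "chat-input": (0.5, 0.92),
    "scrollbar-right": (0.99, 0.5),
}


def resolve_anchor(bounds: Bounds, anchor: str) -> tuple[int, int]:
    """Turn an anchor name into absolute screen coordinates.

    Raises KeyError for an unknown anchor name.
    """
    fx, fy = ANCHORS[anchor]
    # Clamp to the last pixel so the point always lies inside the window.
    dx = min(round(bounds.width * fx), bounds.width - 1)
    dy = min(round(bounds.height * fy), bounds.height - 1)
    return bounds.x + dx, bounds.y + dy
```

The same anchor keeps working after the window is moved or resized, which is what makes it less brittle than a stored coordinate pair.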

## Setup

1. Ensure Python `3.12.3`.
2. Create or reuse the local `.venv/`.
3. Install the project in editable mode:

```bash
./.venv/bin/python -m pip install -e '.[dev,voice]'
```

4. Create an environment file:

```bash
cp .env.example .env
```

5. Set `OPENAI_API_KEY` in `.env`.
6. Validate configuration:

```bash
./.venv/bin/aegis config --validate
```

If you rely on the repo-local `.env`, run `aegis` from this project directory or one of its subdirectories. If you want to launch it from anywhere, export `OPENAI_API_KEY` in your shell instead.

## Usage

Start a text Assistant session:

```bash
aegis chat
```

The default entrypoint also starts chat:

```bash
aegis
```

Launch chat in full-access mode:

```bash
aegis --full-access
```

Run a direct Agent task:

```bash
aegis run "Take a screenshot and tell me what's on screen."
```

Open Firefox on GitHub:

```bash
aegis run "Open Firefox and go to GitHub."
```

Voice mode:

```bash
aegis voice
```

Inspect pending approvals:

```bash
aegis approvals
```

Approve or deny:

```bash
aegis approve <approval_id>
aegis deny <approval_id> --reason "Don't run shell commands right now."
```

If there is only one pending request, `aegis approve` can approve the latest one without an explicit ID.

Inside chat, the same flow works with:

```text
/approve
/deny
/clear
```

The chat prompt also supports terminal navigation features through readline:

- Up/Down for command history
- `Ctrl+R` for reverse history search
- `Tab` for slash-command completion
- standard cursor shortcuts like `Ctrl+A` and `Ctrl+E`

Status and logs:

```bash
aegis status
aegis logs
```

Run the MCP server directly:

```bash
aegis mcp --transport stdio
```

## Golden Paths

### 1. Screenshot summary

```bash
aegis
```

Then ask:

```text
Take a screenshot and tell me what's on screen.
```

### 2. Open Firefox and go to GitHub

```bash
aegis run "Open Firefox and go to GitHub"
```

### 3. Voice request with approval

```bash
aegis voice
```

Then say:

```text
Open terminal and run neofetch.
```

The Agent pauses for the gua

[truncated…]

PUBLIC HISTORY

First discovered: Mar 22, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Mar 21, 2026

last updated: Mar 21, 2026

last crawled: 2 months ago

version: not specified
