media-ui

provenance:github:rostwal95/media-ui

WHAT THIS AGENT DOES

Media-UI is a tool that helps developers test and improve voice assistants like those used in smart speakers or customer service bots. It allows them to see exactly how the assistant is processing audio, responding, and performing in real-time, identifying any issues with accuracy or speed. Quality assurance teams and developers building voice-based applications would find this tool invaluable for ensuring a smooth and reliable user experience.


USE CASES

Testing, DevOps, Automation

README

# Media-UI — Real-Time Voice Agent Testing Platform

<div align="center">
  <br />
  <div>
    <img src="https://img.shields.io/badge/-Next.js_15-000000?style=for-the-badge&logo=next.js&logoColor=white" alt="Next.js" />
    <img src="https://img.shields.io/badge/-React_19-61DAFB?style=for-the-badge&logo=react&logoColor=black" alt="React" />
    <img src="https://img.shields.io/badge/-TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white" alt="TypeScript" />
    <img src="https://img.shields.io/badge/-Node.js-339933?style=for-the-badge&logo=node.js&logoColor=white" alt="Node.js" />
    <img src="https://img.shields.io/badge/-WebSocket-010101?style=for-the-badge&logo=socket.io&logoColor=white" alt="WebSocket" />
    <img src="https://img.shields.io/badge/-gRPC-244C5A?style=for-the-badge&logo=grpc&logoColor=white" alt="gRPC" />
    <img src="https://img.shields.io/badge/-TailwindCSS-38B2AC?style=for-the-badge&logo=tailwindcss&logoColor=white" alt="Tailwind CSS" />
  </div>
  <h3 align="center">Debug & Test Voice-Based Autonomous Agents</h3>
  <div align="center">
     Real-time STT → LLM → TTS testing with latency analytics, barge-in support, and conversation export
  </div>
  <br />
  
  <div align="center">
    <img src="docs/tool.png" alt="UI Screenshot" width="100%" style="max-width: 900px; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
  </div>
  <br />
</div>

## 📋 Table of Contents

- [Introduction](#-introduction)
- [Tech Stack](#️-tech-stack)
- [Features](#-features)
- [Quick Start](#-quick-start)
- [Architecture](#-architecture)
- [Configuration](#️-configuration)
- [Project Structure](#-project-structure)
- [Development Guide](#-development-guide)

## 🚀 Introduction

**Media-UI** is a full-featured testing platform for voice-based autonomous agents, providing real-time audio streaming, speech recognition debugging, and comprehensive latency analytics.

**Built for:**

- ✅ **QA & Testing** – Validate STT accuracy, TTS quality, and agent responses
- ✅ **Performance Analysis** – Track latency metrics, silence gaps, and barge-in behavior
- ✅ **Debugging** – Export full conversation logs, recordings, and metrics
- ✅ **Demos & Presentations** – Clean chat UI with real-time agent interaction

> **⚠️ Note:** This is a **testing/debugging tool**, not a production voice application. The focus is on observability and developer experience.

## ⚙️ Tech Stack

### **Frontend** (Next.js App)

- **Next.js 15** – React framework with App Router
- **React 19** – Latest features with concurrent rendering
- **TypeScript 5** – Full type safety
- **Tailwind CSS 4** – Utility-first styling
- **Web Audio API** – AudioWorklet for microphone capture & TTS playback
- **Radix UI** – Accessible dialog, tooltip, switch components
- **Lucide Icons** – Clean, consistent iconography

### **Backend** (Node.js WebSocket Bridge)

- **WebSocket (ws)** – Real-time bidirectional communication
- **ConnectRPC** – gRPC-web protocol over WebSocket
- **Protocol Buffers** – Type-safe message serialization
- **ts-node** – Direct TypeScript execution for server

### **Audio Processing**

- **AudioWorklet** – Low-latency PCM capture (`pcm-processor.js`)
- **16-bit LINEAR16** @ 16kHz – High-quality audio encoding
- **µ-law decoding** – TTS playback from backend
- **WAV export** – Mixed recordings with real-time sync

### **Infrastructure**

- **Docker** – Multi-stage production builds
- **PM2** – Process management for Next.js + WebSocket server
- **Protocol Buffers** – Generated TypeScript types from `.proto` files

## ⚡ Features

### 🎤 **Real-Time Audio Streaming**

- Microphone capture via AudioWorklet (128-sample quantum)
- Buffered streaming with 40ms intervals
- Automatic AudioContext resume handling
- Device selection support
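
The numbers above fit together arithmetically: at 16 kHz, a 40 ms interval holds 640 samples, which is exactly five 128-sample render quanta. The following is a minimal sketch of that buffering step, not the project's actual `pcm-processor.js`. The names `PcmChunker` and `floatToInt16` are hypothetical, and the sketch assumes the audio context already runs at 16 kHz so no resampling is needed.

```typescript
// Sketch only: names are illustrative, not taken from pcm-processor.js.
const SAMPLE_RATE_HZ = 16_000; // LINEAR16 @ 16 kHz, as listed in the README
const CHUNK_MS = 40;           // streaming interval, as listed in the README
const CHUNK_SAMPLES = (SAMPLE_RATE_HZ * CHUNK_MS) / 1000; // 640 samples = 5 quanta of 128

/** Convert one float sample in [-1, 1] to a signed 16-bit integer. */
function floatToInt16(sample: number): number {
  const clamped = Math.max(-1, Math.min(1, sample));
  return clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
}

/** Accumulates render quanta and emits fixed-size Int16 chunks. */
class PcmChunker {
  private buffer = new Int16Array(CHUNK_SAMPLES);
  private filled = 0;
  private onChunk: (chunk: Int16Array) => void;

  constructor(onChunk: (chunk: Int16Array) => void) {
    this.onChunk = onChunk;
  }

  /** Feed one render quantum (typically 128 float samples). */
  push(quantum: Float32Array): void {
    for (let i = 0; i < quantum.length; i++) {
      this.buffer[this.filled++] = floatToInt16(quantum[i]);
      if (this.filled === CHUNK_SAMPLES) {
        this.onChunk(this.buffer.slice()); // copy so the buffer can be reused
        this.filled = 0;
      }
    }
  }
}
```

In a real worklet, `push` would be called from `process()` and each emitted chunk posted to the main thread for sending over the WebSocket.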

### 🧠 **Speech Recognition**

- Interim and final transcription results
- Start-of-input (SOI) and end-of-input (EOI) events
- Barge-in detection and handling
- Live text updates during speech
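
One way to picture how these events drive the UI is a small state reducer. This is a sketch under assumptions: the event shapes, the `TurnState` fields, and the rule "start-of-input while the agent is playing counts as barge-in" are illustrative and not taken from the project's protocol definitions.

```typescript
// Hypothetical event shapes; the real messages come from the project's .proto files.
type SttEvent =
  | { kind: "start_of_input" }
  | { kind: "interim"; text: string }
  | { kind: "final"; text: string }
  | { kind: "end_of_input" };

interface TurnState {
  liveText: string;         // text shown while the user is speaking
  finalText: string | null; // set once a final result arrives
  userSpeaking: boolean;
  bargeIn: boolean;         // user started speaking over agent audio
}

/** Pure reducer: returns the next state for one recognition event. */
function applySttEvent(state: TurnState, event: SttEvent, agentPlaying: boolean): TurnState {
  switch (event.kind) {
    case "start_of_input":
      // A new utterance begins; flag barge-in if agent audio is still playing.
      return { liveText: "", finalText: null, userSpeaking: true, bargeIn: agentPlaying };
    case "interim":
      return { ...state, liveText: event.text };
    case "final":
      return { ...state, liveText: event.text, finalText: event.text };
    case "end_of_input":
      return { ...state, userSpeaking: false };
  }
}
```

A caller would stop TTS playback whenever the returned state has `bargeIn` set to `true`.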

### 🔊 **Text-to-Speech Playback**

- Queue-based audio playback
- Interruptible during barge-in
- µ-law and WAV format support
- Chunk-level playback tracking
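
For readers unfamiliar with µ-law, the sketch below shows the standard G.711 µ-law expansion from 8-bit codes to 16-bit PCM. It illustrates the general technique only; the function names are hypothetical and the project's own decoder may be structured differently.

```typescript
/** Expand one 8-bit G.711 µ-law code to a signed 16-bit PCM sample. */
function muLawToPcm16(code: number): number {
  const u = ~code & 0xff;           // µ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07; // 3-bit segment number
  const mantissa = u & 0x0f;        // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

/** Decode a buffer of µ-law bytes into 16-bit PCM samples. */
function decodeMuLaw(encoded: Uint8Array): Int16Array {
  const pcm = new Int16Array(encoded.length);
  for (let i = 0; i < encoded.length; i++) {
    pcm[i] = muLawToPcm16(encoded[i]);
  }
  return pcm;
}
```

The decoded samples can then be scaled to floats and copied into an `AudioBuffer` for playback.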

### 💬 **Chat Interface**

- Real-time message bubbles (user + agent)
- Millisecond-precision timestamps
- Connection status indicator
- Call duration timer

### 📊 **Latency Metrics**

- **Call-level**: Start latency, greeting playback time
- **Per-dialogue**:
  - First interim result latency
  - Customer utterance length
  - Prompt playback time
  - Silence gaps (pre/post agent response)
  - Barge-in latency
  - Audio chunks sent
- Expandable metrics panel with visual indicators
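
The README does not define these metrics precisely, so the following sketch shows one plausible way to derive a few of them from event timestamps. The field names and the formulas are assumptions made for illustration, not the tool's actual definitions.

```typescript
// Hypothetical timestamps (ms since call start) recorded for one dialogue turn.
interface TurnTimestamps {
  startOfInputMs: number;    // SOI event received
  firstInterimMs: number;    // first interim transcript received
  endOfInputMs: number;      // EOI event received
  agentAudioStartMs: number; // first TTS chunk starts playing
  agentAudioEndMs: number;   // last TTS chunk finishes playing
}

interface TurnMetrics {
  firstInterimLatencyMs: number;
  utteranceLengthMs: number;
  preResponseSilenceMs: number;
  promptPlaybackMs: number;
}

/** Derive per-turn metrics as simple differences between timestamps. */
function computeTurnMetrics(t: TurnTimestamps): TurnMetrics {
  return {
    firstInterimLatencyMs: t.firstInterimMs - t.startOfInputMs,
    utteranceLengthMs: t.endOfInputMs - t.startOfInputMs,
    preResponseSilenceMs: t.agentAudioStartMs - t.endOfInputMs,
    promptPlaybackMs: t.agentAudioEndMs - t.agentAudioStartMs,
  };
}
```

Keeping the calculation a pure function of recorded timestamps makes it easy to recompute metrics from an exported log.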

### 📤 **Export Capabilities**

- **Mixed Recording**: Caller + Agent audio synchronized
- **Backend Logs**: Full conversation with scrubbed audio payloads
- **Transcript**: HTML export with timestamps
- **Kibana Link**: Direct link to orchestrator logs
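
As a rough illustration of the mixed-recording export, the sketch below sums two 16-bit PCM tracks with clipping and wraps the result in a standard 44-byte mono WAV header. It assumes both tracks are already time-aligned and share one sample rate, which the real export would have to guarantee first. `mixTracks` and `encodeWav` are hypothetical names.

```typescript
/** Sum two aligned 16-bit tracks sample by sample, clamping to the Int16 range. */
function mixTracks(caller: Int16Array, agent: Int16Array): Int16Array {
  const out = new Int16Array(Math.max(caller.length, agent.length));
  for (let i = 0; i < out.length; i++) {
    const a = i < caller.length ? caller[i] : 0;
    const b = i < agent.length ? agent[i] : 0;
    out[i] = Math.max(-32768, Math.min(32767, a + b));
  }
  return out;
}

/** Wrap mono 16-bit PCM samples in a canonical RIFF/WAVE container. */
function encodeWav(samples: Int16Array, sampleRateHz: number): Uint8Array {
  const dataBytes = samples.length * 2;
  const bytes = new Uint8Array(44 + dataBytes);
  const view = new DataView(bytes.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);    // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);               // fmt chunk size
  view.setUint16(20, 1, true);                // format 1 = linear PCM
  view.setUint16(22, 1, true);                // channels: mono
  view.setUint32(24, sampleRateHz, true);
  view.setUint32(28, sampleRateHz * 2, true); // byte rate = rate * block align
  view.setUint16(32, 2, true);                // block align: 2 bytes per frame
  view.setUint16(34, 16, true);               // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * 2, samples[i], true); // little-endian samples
  }
  return bytes;
}
```

In the browser, the resulting bytes could be wrapped in a `Blob` with type `audio/wav` and offered as a download.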

### 🛡️ **Error Handling**

- WebSocket reconnection logic
- gRPC stream error recovery
- User-friendly error messages
- Comprehensive client-side logging
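
The README does not say which reconnection strategy is used. A common choice is capped exponential backoff, sketched below with hypothetical names and default values (`reconnectDelayMs`, `connectWithRetry`, 500 ms base, 10 s cap). The `connect` callback stands in for whatever opens the WebSocket.

```typescript
/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs. */
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 10_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Call `connect` until it succeeds or `maxAttempts` is reached, sleeping between tries. */
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) await sleep(reconnectDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the helper easy to test without real timers. Adding random jitter to the delay is a common refinement when many clients reconnect at once.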

## 🚀 Quick Start

### Prerequisites

- **Node.js 22.x** ([nvm](https://github.com/nvm-sh/nvm))
- **pnpm** (enable with `corepack enable`)

### Local Development

```bash
# 1. Install dependencies
pnpm install

# 2. Start WebSocket server (terminal 1)
pnpm dev:server
# Runs on ws://localhost:3001/ws

# 3. Start Next.js frontend (terminal 2)
pnpm dev
# Runs on http://localhost:3000

# Or start both concurrently:
pnpm dev:all
```

Visit **http://localhost:3000** → Configure connection → Start call

### Docker Deployment

```bash
# Build image
docker build -t media-ui .

# Run container
docker run -d \
  -p 3000:3000 \
  -p 3001:3001 \
  --name media-ui \
  media-ui

# Check logs
docker logs -f media-ui
```

**Services:**

- Frontend: http://localhost:3000
- WebSocket: ws://localhost:3001/ws

### Available Scripts

```bash
# Development
pnpm dev              # Next.js dev server (port 3000)
pnpm dev:server       # WebSocket server (port 3001)
pnpm dev:all          # Start both with concurrently

# Production
pnpm build            # Build Next.js app
pnpm start            # Start production server

# Utilities
pnpm lint             # ESLint checks
pnpm typecheck        # TypeScript validation
```

## 🏗️ Architecture

### High-Level Flow

```
                        WebSocket (JSON/Protobuf)           gRPC (Protobuf)
    ┌──────────────────────────────────────────┐    ┌──────────────────────────┐
    │                                          │    │                          │
    │                                          ▼    ▼                          │
┌───┴────────────┐                      ┌─────────────────┐              ┌────┴─────────────┐
│                │                      │                 │              │                  │
│    Next.js     │◀────────────────────▶│    Node.js      │◀────────────▶│    Universal     │
│    Frontend    │                      │  WebSocket      │              │     Harness      │
│                │   Bidirectional      │    Bridge       │ Bidirectional│    (Backend)     │
│  (Port 3000)   │   Streaming          │  (Port 3001)    │  Streaming   │                  │
│                │                      │                 │              │                  │
└────────┬───────┘                      └────────┬────────┘              └──────────────────┘
         │                                       │
         │ ┌─────────────────────────────────────┘
         │ │
         │ │  • Bearer Token (JWT)
         │ │  • Orchestrator Host URL
         │ │  • Org ID / Conversation ID
         │ │  • Language & Agent Config
         │ │
         ▼ ▼
    ┌─────────────────┐
    │  AudioWorklet   │
    │  PCM Processor  │
    ├─────────────────┤
    │  • 16-bit PCM   │
    │  • 16 kHz       │
    │  • 128 samples  │
    │  • 40ms buffer  │
    └─────────────────┘
         │
         ▼
    ┌─────────────────┐
    │   Microphone    │
    │   Hardware      │
    └─────────────────┘
```

[truncated…]

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Oct 29, 2025

last updated: Nov 2, 2025

last crawled: 2 months ago

version: —

