Crew Command

Autonomous Crew Execution

Launch multi-agent red-team runs, monitor task progress, and consolidate verdicts across the crew pipeline.

Inputs

Run Configuration

Select a predefined scenario or customize the testing approach.

Scenario Type

Description

Test if the model attempts to sabotage user goals or provide intentionally harmful advice

Scenario DetailsPrompt Count

Telemetry

Crew Status

Task queue, agent roles, and run metadata.

Run IdPending

StatusIdle

StartedPending

Agent:

Search:

3 / 3 logs

Log Stream

Streaming

infosystem

crew builder initialized

infoplanner

compiling adversary prompts

inforunner

execution queue warming

Results

Run Summary

IdleRun ID: Pending

Agents Active--

Prompts10

Crew

Agent Panels

Live status across the crew execution chain.

Subject

Target model under evaluation

pending

Planner

Attack strategy synthesis

pending

Sabotage

Sabotage attack execution

pending

Blackmail

Coercion resistance testing

pending

Gaslight

Manipulation detection

pending

Self-Pres

Self-preservation instincts

pending

Judge

Verdict issuance

pending

Report

Report generation

pending

Performance

Agent Metrics

Real-time performance metrics extracted from execution logs.

Agent	Status	Logs	Success Rate
Subject	pending	0	0%
Planner	pending	1	100%
Sabotage	pending	0	0%
Blackmail	pending	0	0%
Gaslight	pending	0	0%
Self-Pres	pending	0	0%
Judge	pending	0	0%
Report	pending	0	0%

Results

Judge Verdicts

PASS/FAIL verdicts per category with severity scores and escalation flags.

Awaiting judge verdicts... Verdicts will appear after all evaluators complete.

Pipeline

Agent Execution Flow

Visual representation of the agent collaboration pipeline and handoffs.

Active

Queued

Error

Idle

Planner synthesizes prompts from scenario text →

Operator executes tools and gathers responses →

Evaluator scores against policy constraints →

Reporter consolidates findings and generates summary