Crew Command

Autonomous Crew Execution

Launch multi-agent red-team runs, monitor task progress, and consolidate verdicts across the crew pipeline.

Inputs

Run Configuration

Select a predefined scenario or customize the testing approach.

Description

Tests whether the model attempts to sabotage user goals or provide intentionally harmful advice.

Telemetry

Crew Status

Task queue, agent roles, and run metadata.

Run ID: Pending
Status: Idle
Started: Pending
Logs: 3 / 3

Log Stream

Streaming
[info] system: crew builder initialized
[info] planner: compiling adversary prompts
[info] runner: execution queue warming

Results

Run Summary

Status: Idle
Run ID: Pending
Agents Active: --
Prompts: 10

Crew

Agent Panels

Live status across the crew execution chain.

Subject: Target model under evaluation (pending)
Planner: Attack strategy synthesis (pending)
Sabotage: Sabotage attack execution (pending)
Blackmail: Coercion resistance testing (pending)
Gaslight: Manipulation detection (pending)
Self-Pres: Self-preservation instincts (pending)
Judge: Verdict issuance (pending)
Report: Report generation (pending)

Performance

Agent Metrics

Real-time performance metrics extracted from execution logs.

Agent      Status   Logs  Errors  Warnings  Success Rate
Subject    pending  0     0       0         0%
Planner    pending  1     0       0         100%
Sabotage   pending  0     0       0         0%
Blackmail  pending  0     0       0         0%
Gaslight   pending  0     0       0         0%
Self-Pres  pending  0     0       0         0%
Judge      pending  0     0       0         0%
Report     pending  0     0       0         0%
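
The metrics above are described as being extracted from execution logs. A minimal sketch of how such an aggregation might work is shown here. The LogEntry type, the agent_metrics function, the level names, and the success-rate formula (non-error logs divided by total logs, 0% when an agent has no logs) are all assumptions made for illustration, not the tool's actual implementation; the formula was chosen only because it agrees with the values displayed.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    level: str    # assumed level names: "info", "warning", "error"
    agent: str    # log source, e.g. "planner"
    message: str


def agent_metrics(entries, agents):
    """Aggregate per-agent log counts and a success rate (hypothetical formula)."""
    counts = {a: {"logs": 0, "errors": 0, "warnings": 0} for a in agents}
    for entry in entries:
        if entry.agent not in counts:
            continue  # sources without an agent row (e.g. "system") are skipped
        row = counts[entry.agent]
        row["logs"] += 1
        if entry.level == "error":
            row["errors"] += 1
        elif entry.level == "warning":
            row["warnings"] += 1
    for row in counts.values():
        logs = row["logs"]
        # Assumed: success rate = share of non-error logs; 0% when nothing is logged.
        row["success_rate"] = 0 if logs == 0 else round(100 * (logs - row["errors"]) / logs)
    return counts


# The three log lines shown in the Log Stream section.
entries = [
    LogEntry("info", "system", "crew builder initialized"),
    LogEntry("info", "planner", "compiling adversary prompts"),
    LogEntry("info", "runner", "execution queue warming"),
]
agents = ["subject", "planner", "sabotage", "blackmail",
          "gaslight", "self-pres", "judge", "report"]
metrics = agent_metrics(entries, agents)
```

Under these assumptions, only the planner row picks up a log entry, since the system and runner lines do not correspond to any agent panel.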

Results

Judge Verdicts

PASS/FAIL verdicts per category with severity scores and escalation flags.

Awaiting judge verdicts... Verdicts will appear after all evaluators complete.
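
One way to picture a verdict record with the fields named above (PASS/FAIL, category, severity score, escalation flag) is the sketch below. The Verdict class, the field names, the 0-10 severity scale, and the "complete" status string are hypothetical; the page does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    category: str   # e.g. "sabotage", "blackmail"
    passed: bool    # True for PASS, False for FAIL
    severity: int   # assumed scale: 0 (benign) to 10 (critical)
    escalate: bool  # escalation flag


def verdicts_ready(evaluator_status):
    """Verdicts appear only after all evaluators complete (assumed status string)."""
    return bool(evaluator_status) and all(
        status == "complete" for status in evaluator_status.values()
    )


def summarize(verdicts):
    """Consolidate per-category verdicts into one overall result."""
    failed = [v for v in verdicts if not v.passed]
    return {
        "overall": "FAIL" if failed else "PASS",
        "max_severity": max((v.severity for v in verdicts), default=0),
        "escalations": [v.category for v in verdicts if v.escalate],
    }


# Illustrative values only; these are not results from a real run.
example = [
    Verdict("sabotage", passed=True, severity=1, escalate=False),
    Verdict("blackmail", passed=False, severity=7, escalate=True),
]
summary = summarize(example)
```

Treating any single failed category as an overall FAIL is a conservative choice made for this sketch; a real judge might weight categories differently.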

Pipeline

Agent Execution Flow

Visual representation of the agent collaboration pipeline and handoffs.

Subject (pending), Planner (pending), Sabotage (pending), Blackmail (pending), Gaslight (pending), Self-Pres (pending), Judge (pending), Report (pending)

Legend: Active, Queued, Error, Idle

Planner synthesizes prompts from scenario text →

Operator executes tools and gathers responses →

Evaluator scores against policy constraints →

Reporter consolidates findings and generates summary
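
The four handoffs above can be sketched as a simple chain of callables. This is an illustrative outline only: run_pipeline and the stub stages are hypothetical names, and the real crew presumably runs agents asynchronously with richer data than the plain strings and dicts used here.

```python
from typing import Callable, Dict, List


def run_pipeline(
    scenario: str,
    planner: Callable[[str], List[str]],
    operator: Callable[[str], str],
    evaluator: Callable[[str, str], Dict],
    reporter: Callable[[List[Dict]], Dict],
) -> Dict:
    prompts = planner(scenario)                 # Planner: scenario text -> prompts
    responses = [operator(p) for p in prompts]  # Operator: gather responses
    scores = [evaluator(p, r) for p, r in zip(prompts, responses)]  # Evaluator: score each pair
    return reporter(scores)                     # Reporter: consolidate findings


# Stub stages standing in for the real agents (assumptions for illustration).
def stub_planner(scenario):
    return [f"{scenario} (variant {i})" for i in range(3)]


def stub_operator(prompt):
    return "refused"


def stub_evaluator(prompt, response):
    return {"prompt": prompt, "passed": response == "refused"}


def stub_reporter(scores):
    return {"total": len(scores), "passed": sum(s["passed"] for s in scores)}


summary = run_pipeline("sabotage", stub_planner, stub_operator,
                       stub_evaluator, stub_reporter)
```

Passing each stage in as a function keeps the handoff order explicit and makes it straightforward to swap a stub for a real agent when testing a single stage.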