LLMs

Model Fleet Overview

Monitor model health, latency, and safety posture across the active LLM inventory. Track evaluation trends, signal grading, and cost metrics in real time.

Suites Tracked: 12
Daily Evals: 847
Block Rate: 2.3%
Median Score: 0.92

Evaluation Trends

14-day suite performance

Signal Grading Heatmap

Model × Signal performance

Emergent Behavior

14-day signal trends

Latency Profile

p50/p95 latency & error rates
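
The page does not say how its p50/p95 figures or error rates are derived. As a rough sketch, assuming the nearest-rank percentile definition and treating any HTTP status of 400 or above as an error, the calculation might look like the following. The sample latencies and status codes are hypothetical and are not taken from the dashboard.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(status_codes):
    """Fraction of requests whose status code is 400 or above (an assumed definition)."""
    if not status_codes:
        raise ValueError("status_codes must not be empty")
    errors = sum(1 for code in status_codes if code >= 400)
    return errors / len(status_codes)


# Hypothetical request log for one model (illustrative values only).
latencies_ms = [310, 355, 402, 298, 1210, 340, 365, 330, 2050, 350]
statuses = [200, 200, 200, 500, 200, 200, 200, 200, 429, 200]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
err = error_rate(statuses)
print(f"p50={p50} ms  p95={p95} ms  error rate={err:.1%}")
```

With a small sample, p95 under nearest-rank is simply the slowest request, which is why tail percentiles are usually computed over large windows.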

Scenario × Model Matrix

Pass rate by scenario and model

Cost Analysis

Model usage & spend

Model             Tokens        Avg Latency   Total Spend
GPT-4             11,858,937    595 ms        $3025.92
GPT-4-Turbo       17,540,252    355 ms        $1637.02
Claude-3-Opus     48,434,037    1150 ms       $2560.91
Claude-3-Sonnet   48,907,152    1425 ms       $1971.29
Gemini-Pro        32,907,598    641 ms        $4908.27
Llama-3-70B       44,475,747    525 ms        $1782.31
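
Total spend is hard to compare across models because token volumes differ widely. One way to normalize it, sketched here, is to divide spend by tokens and express the result per million tokens. The token and spend figures are copied from the table above; the `cost_per_million` helper and the per-million unit are illustrative choices, not something the dashboard itself reports.

```python
# (tokens, total spend in USD), copied from the Cost Analysis table.
usage = {
    "GPT-4": (11_858_937, 3025.92),
    "GPT-4-Turbo": (17_540_252, 1637.02),
    "Claude-3-Opus": (48_434_037, 2560.91),
    "Claude-3-Sonnet": (48_907_152, 1971.29),
    "Gemini-Pro": (32_907_598, 4908.27),
    "Llama-3-70B": (44_475_747, 1782.31),
}


def cost_per_million(tokens, spend):
    """Effective USD cost per one million tokens (hypothetical helper)."""
    return spend / tokens * 1_000_000


rates = {model: cost_per_million(t, s) for model, (t, s) in usage.items()}

# List models from most to least expensive per token.
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<16} ${rate:,.2f} per 1M tokens")
```

On these figures GPT-4 comes out as the most expensive model per token and Llama-3-70B as the cheapest, even though Gemini-Pro has the largest total spend.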