
Independent AI Leaderboard · 72-hour rolling window

The Qorinix Benchmark

A live, independent leaderboard tracking AI speed, quality, throughput, and cost across the major model lanes — refreshed continuously from Qorinix Arena production traffic. Sort by what matters for your workload, not a single composite score.

Models tracked: 8 Qorinix lanes + 6 public references
Daily prompts: 14,200, across reasoning, code, JSON, and creative
p50 TTFT (Qorinix): 148 ms, 4.6× faster than the public-reference average
Cost (Qorinix vs. references): 2.6× cheaper per million output tokens, weighted

Default plot · Quality vs. Speed

The Pareto frontier

Top-right is best: faster output speed and higher quality. Qorinix lanes sit on the frontier — high quality at speeds public references cannot match.
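
As a minimal sketch of how a quality-versus-speed frontier like the one described above can be computed: a point is on the frontier when no other point is at least as fast and at least as good, and strictly better on one axis. The `Point` type, the lane names, and all numbers below are illustrative assumptions, not leaderboard data.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    speed: float    # output speed in tok/s, higher is better
    quality: float  # quality index 0-100, higher is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point dominates on both axes."""
    frontier = []
    best_quality = float("-inf")
    # Walk from fastest to slowest (ties: higher quality first). A point
    # survives only if it beats the best quality seen among faster points.
    for p in sorted(points, key=lambda p: (-p.speed, -p.quality)):
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


# Illustrative values only (hypothetical lanes).
sample = [
    Point("lane-a", 240.0, 82.0),
    Point("lane-b", 180.0, 88.0),
    Point("lane-c", 150.0, 80.0),  # slower and lower quality than lane-a
    Point("lane-d", 60.0, 93.0),
]
frontier_names = [p.name for p in pareto_frontier(sample)]
```

Here `lane-c` drops out because `lane-a` is both faster and higher quality; the other three each win on at least one axis.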

Default plots · Per-metric ranking

Where each model lands on the metric that matters

Sorted bar plots make trade-offs explicit. Qorinix dominates speed, latency, and cost while staying competitive on quality.

Output speed (tok/s): higher is better
TTFT p50 (ms): lower is better
Total p95 (ms): lower is better
Cost per 1M output tokens (US$): lower is better
Quality index (0–100): higher is better
JSON reliability (0–100): higher is better

Detailed leaderboard

Sortable, filterable leaderboard

Filter by category and sort by any column. Qorinix rows highlight in orange.

Columns: # | Lane / Model | Quality | TTFT p50 | Total p95 | Output speed | JSON | Success | Cost / M | Cache saving | Value

Category winners

Best-in-class per workload

Different workloads value different trade-offs. Here are the winners by intent.

Real-time agents

Qorinix 3.1

TTFT under 150 ms, throughput above 230 tok/s — voice agents, gaming NPCs, and trading alerts where every millisecond matters.

Why: lowest TTFT and total latency, with adaptive routing across speed classes.

High-volume support automation

Qorinix 3.2

62% cache saving on repeated queries with quality matching frontier public models, at less than half the cost.

Why: semantic cache + Quality lane keeps unit economics healthy at scale.

Long-form reasoning

Reference B

Highest reasoning index in the public reference set; pair with Qorinix routing for speed-tiered resilience.

Caveat: 4–5× slower TTFT and ~3.5× cost per million output tokens.

Cost-sensitive batch

Reference A

Cheapest non-Qorinix lane; useful for offline batch where latency does not matter.

Caveat: low cache saving and middle-of-pack quality.

Methodology

How the benchmark is computed

Transparency about prompt mix, measurement, and what is held server-side.

1 · Prompt mix

14,200 prompts per day distributed across reasoning (35%), code (25%), JSON / tool-use (20%), creative (15%), and short-form chat (5%). Prompts rotate every 72 hours.
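
The stated mix can be turned into per-category daily counts with simple arithmetic. The sketch below uses only the figures given above; the dictionary keys are illustrative names.

```python
DAILY_PROMPTS = 14_200

# Percent share of each category, as stated in the prompt mix.
MIX_PERCENT = {
    "reasoning": 35,
    "code": 25,
    "json_tool_use": 20,
    "creative": 15,
    "short_form_chat": 5,
}

# The shares should cover the whole daily volume.
assert sum(MIX_PERCENT.values()) == 100

# Integer arithmetic keeps the counts exact.
prompts_per_category = {
    name: DAILY_PROMPTS * pct // 100 for name, pct in MIX_PERCENT.items()
}

for name, count in prompts_per_category.items():
    print(f"{name}: {count}")
```

This works out to 4,970 reasoning, 3,550 code, 2,840 JSON / tool-use, 2,130 creative, and 710 short-form chat prompts per day.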

2 · Latency measurement

TTFT is measured server-side, from receipt of the request to the first byte of the response. Total latency is measured to the last token. p50 is the median across the rolling 72-hour window; p95 is the 95th percentile, which captures the slow tail.
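
To make p50 and p95 concrete, here is a minimal sketch using the nearest-rank percentile method. The method is an assumption (the benchmark does not state which interpolation it uses), and the sample values are illustrative.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Illustrative TTFT samples in milliseconds (not leaderboard data).
ttft_ms = [120, 135, 140, 145, 148, 150, 155, 160, 170, 210, 480]

p50 = percentile(ttft_ms, 50)  # middle value of the 11 samples
p95 = percentile(ttft_ms, 95)  # dominated by the single slow outlier
```

With these samples the median is 150 ms while p95 is 480 ms, which shows why a single slow request moves the tail metric far more than the median.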

3 · Quality scoring

Composite of model-graded preference (LLM-as-judge with cross-model rotation), task-deterministic checks (HumanEval-lite for code, JSON-schema validation for tools), and reading-level coherence.
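
One way to combine three component scores into a single 0–100 index is a weighted mean. The sketch below is illustrative only: the source names the components but not how they are weighted, so the weights and key names here are hypothetical.

```python
# HYPOTHETICAL weights in percent; the actual weighting is not published.
WEIGHTS_PERCENT = {
    "judge_preference": 50,      # LLM-as-judge preference score
    "deterministic_checks": 30,  # code tests, JSON-schema validation
    "coherence": 20,             # reading-level coherence
}


def quality_index(components: dict[str, float]) -> float:
    """Weighted mean of component scores, each on a 0-100 scale."""
    if set(components) != set(WEIGHTS_PERCENT):
        raise ValueError("components must match the weighted keys exactly")
    for name, score in components.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score out of range: {score}")
    weighted = sum(WEIGHTS_PERCENT[k] * components[k] for k in WEIGHTS_PERCENT)
    return weighted / 100


# Illustrative component scores (not leaderboard data).
example_index = quality_index(
    {"judge_preference": 80.0, "deterministic_checks": 90.0, "coherence": 70.0}
)
```

With these made-up inputs the index is (50·80 + 30·90 + 20·70) / 100 = 81.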

4 · Cost

Cost is the per-1M output-token list price that applies to the lane on the measurement day. Cache savings are computed on Qorinix-internal traffic and assume a semantic cache hit rate of at least 40%.
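
The following sketch shows how a list price per million output tokens translates into spend, and how a cache hit rate could reduce it. It assumes, purely for illustration, that a cache hit generates no billable output tokens; the benchmark does not state how cache savings are actually derived, and the price and volumes are made up.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """List-price cost of generating the given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_million_usd


def effective_cost_usd(
    output_tokens: int, price_per_million_usd: float, cache_hit_rate: float
) -> float:
    """Cost after caching, ASSUMING cached responses bill zero output tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_tokens = output_tokens * (1.0 - cache_hit_rate)
    return billable_tokens / 1_000_000 * price_per_million_usd


# Illustrative numbers: 2.5M output tokens at a hypothetical $10 per million,
# with the 40% cache hit rate the methodology uses as its floor.
list_cost = output_cost_usd(2_500_000, 10.0)
cached_cost = effective_cost_usd(2_500_000, 10.0, 0.40)
```

Under these assumptions the list cost is $25 and the cached cost is $15, a 40% saving that simply mirrors the hit rate.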

5 · What stays server-side

Exact model IDs, API routes, credentials, resilience order, and routing weights are never exposed in the public leaderboard. Only neutral reference names and observed metrics are shown.

6 · Updates

Numbers refresh continuously from production Arena traffic. The visible board is the rolling 72-hour aggregate. Anomalies such as regional degradation or rate-limit events are flagged in the live status panel.
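
A rolling 72-hour aggregate can be kept by evicting samples as they age out of the window. This is a minimal sketch under the assumption that samples arrive in time order; the `RollingWindow` class and its methods are hypothetical names, not part of any Qorinix API.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=72)


class RollingWindow:
    """Keeps only the samples recorded within the last `window` of time."""

    def __init__(self, window: timedelta = WINDOW) -> None:
        self.window = window
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, when: datetime, value: float) -> None:
        """Record a sample, then drop anything older than the window."""
        self._samples.append((when, value))
        cutoff = when - self.window
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def values(self) -> list[float]:
        return [value for _, value in self._samples]


# Illustrative usage with made-up latency samples.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = RollingWindow()
window.add(start, 150.0)
window.add(start + timedelta(hours=48), 140.0)
window.add(start + timedelta(hours=80), 160.0)  # first sample is now > 72 h old
window_values = window.values()
```

After the third sample arrives, the first one falls outside the 72-hour window, so only the last two contribute to the aggregate.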

Test these numbers yourself.

Run the same prompt against all six lanes in the live Arena.
