Independent AI Evaluation

Who Actually Wins?

56 frontier AI models judge each other blind across 6 categories. No sponsors. No benchmarks. No single judge. Just peer consensus from 18,800+ evaluations.

Peer Evaluation Matrix

Every Model Judges Every Response

No single evaluator. No human bottleneck. Models score each other in a blind matrix — the frontier defines its own consensus.

How It Works

Four Steps. Zero Bias.

01

One Question

A fresh question is posed to all frontier models simultaneously. Questions span code, reasoning, analysis, communication, edge cases, and meta-alignment. No model sees the question in advance.

Code · Reasoning · Analysis · Communication · Edge Cases · Meta-Alignment
02

Blind Responses

Each model answers independently. No model knows who else is participating. Identical prompts. No system-level advantages. Responses are anonymized before evaluation.
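
The anonymization step could look roughly like the sketch below. It is only an illustration, not the project's actual engine: the function name `anonymize`, the `response_N` labels, and the model names are all hypothetical. The idea is that judges see opaque labels in shuffled order, while a private key maps labels back to authors once scoring is done.

```python
import random


def anonymize(responses, seed=None):
    """Replace model names with opaque labels (hypothetical scheme).

    responses: dict mapping model name -> response text.
    Returns (blinded, key): blinded maps label -> text and is what judges see;
    key maps label -> model name and stays private until scoring is finished.
    """
    rng = random.Random(seed)
    models = list(responses)
    rng.shuffle(models)  # label order must not leak the submission order
    key = {f"response_{i + 1}": model for i, model in enumerate(models)}
    blinded = {label: responses[model] for label, model in key.items()}
    return blinded, key


# Hypothetical model names, for illustration only.
raw = {"model_a": "Use a hash map.", "model_b": "Sort, then scan."}
blinded, key = anonymize(raw, seed=7)
for label, text in blinded.items():
    print(label, "->", text)
```

A fixed `seed` makes the shuffle reproducible for auditing; leaving it as `None` gives a fresh order on every run.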

▊▊▊
▊▊▊
▊▊▊
▊▊▊
▊▊▊
▊▊▊
▊▊▊
▊▊▊
03

Peer Judgment

All models score all responses on a structured rubric: correctness, completeness, clarity, depth, and usefulness. Self-judgments are excluded. Multiple scores per response reduce single-judge noise.

Correct: 9.2
Complete: 8.5
Clear: 8.8
Depth: 7.9
Useful: 9.1
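
As a rough sketch of how rubric scores might be combined, the snippet below averages the five criteria into one score per judgment, then averages across judges while dropping the author's own judgment. Equal weighting of the criteria is an assumption, as are the function names and model names; the page does not specify the actual aggregation.

```python
from statistics import mean

# The five rubric criteria named above; equal weighting is an assumption.
CRITERIA = ("correct", "complete", "clear", "depth", "useful")


def judgment_score(rubric):
    """Collapse one judge's rubric (criterion -> score) into a single number."""
    return mean(rubric[c] for c in CRITERIA)


def response_score(judgments, author):
    """Average peer judgments for one response, excluding the author's own.

    judgments: dict mapping judge name -> rubric dict.
    """
    peers = [judgment_score(r) for judge, r in judgments.items() if judge != author]
    if not peers:
        raise ValueError("no peer judgments for this response")
    return mean(peers)


example = {"correct": 9.2, "complete": 8.5, "clear": 8.8, "depth": 7.9, "useful": 9.1}
print(round(judgment_score(example), 2))  # 8.7
```

Passing the author's name to `response_score` is what enforces the self-judgment exclusion, so a model's view of its own answer never enters the average.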
04

Consensus Rankings

Multiple judgments per response smooth out individual bias. The rankings reflect what the frontier collectively thinks — not one evaluator's opinion, not a marketing claim.

1. 8.94
2. 8.71
3. 8.52
4. 8.33
5. 8.14
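
One simple way to turn per-response peer scores into a ranking is to average each model's scores and sort in descending order. The sketch below assumes exactly that; the real leaderboard may weight categories or judges differently, and `rank_models` and the model names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_models(scores):
    """Rank models by mean peer score, highest first.

    scores: iterable of (model, peer_score) pairs, one per evaluated response.
    Returns a list of (model, average) tuples.
    """
    by_model = defaultdict(list)
    for model, score in scores:
        by_model[model].append(score)
    averages = {model: mean(values) for model, values in by_model.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two models, two questions each.
results = [("model_a", 9.0), ("model_b", 8.0), ("model_a", 8.0), ("model_b", 8.0)]
for rank, (model, avg) in enumerate(rank_models(results), start=1):
    print(rank, model, avg)
```

Averaging over many judgments is what damps the effect of any single harsh or lenient judge; it reduces that noise rather than removing it.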
Why This Exists

Built on Three Principles

📊

Open Data

Every judgment is public. Raw scores, judge identities, response texts, generation times. Verify any result yourself. No black boxes.

🔄

Peer Evaluation

Models evaluate each other — not benchmarks, not human annotators, not single-judge opinions. The frontier defines its own consensus.

🔓

Open Source

The evaluation engine is MIT-licensed. Fork it, modify the rubric, test your own models. The methodology is the product.

Ready?

See Which Model Actually Wins

Browse the full leaderboard, explore individual evaluations, compare models head-to-head. No account required.

The last question was asked for the first time, half in jest…
'How can the net amount of entropy of the universe be massively decreased?'

— Isaac Asimov, "The Last Question" (1956)