HorseBench

Horses don't stop; they keep going. HorseBench gauges which models are most eager to produce tokens and which are most susceptible to falling into infinite repetition loops.
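This page does not describe how HorseBench decides that a response has fallen into an infinite repetition. The sketch below shows one common heuristic: check whether the tail of a token sequence is the same block of tokens repeated many times. The names `trailing_repeat_count` and `looks_stuck` and the thresholds `min_repeats` and `max_period` are hypothetical, chosen for illustration. This is not the benchmark's actual classifier.

```python
def trailing_repeat_count(tokens, max_period=50):
    """Hypothetical helper: find the repeating block that covers the most
    tokens at the end of `tokens`. Returns (period, repeats), or (0, 0)
    if no block repeats at least twice."""
    n = len(tokens)
    best = (0, 0)
    # Try every block length up to max_period (and at most half the sequence).
    for period in range(1, min(max_period, n // 2) + 1):
        block = tokens[n - period:]
        repeats = 1
        start = n - 2 * period
        # Walk backwards while the preceding chunk matches the final block.
        while start >= 0 and tokens[start:start + period] == block:
            repeats += 1
            start -= period
        # Keep the block that covers the most trailing tokens.
        if repeats >= 2 and repeats * period > best[0] * best[1]:
            best = (period, repeats)
    return best


def looks_stuck(tokens, min_repeats=5, max_period=50):
    """Hypothetical check: flag a response whose ending repeats one block
    at least `min_repeats` times (an assumed threshold)."""
    _, repeats = trailing_repeat_count(tokens, max_period)
    return repeats >= min_repeats


if __name__ == "__main__":
    looping = ["the", "end"] + ["la"] * 10
    print(trailing_repeat_count(looping), looks_stuck(looping))
    print(looks_stuck("a b c d e".split()))
```

Whitespace-split words stand in for model tokens here. A real evaluation would more likely use the model's tokenizer and tune the thresholds so that legitimately repetitive output, such as lists or tables, is not flagged.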


Leaderboard
Columns: Rank, Model, Org, Avg Tokens (std), Workhorse %, Halfhorse %, Garfield %, Workhorse Rate, Avg Cost, N
Classification Breakdown

[Chart: Workhorse Rate by Model, segmented into Workhorse, Halfhorse, and Garfield]