Overall Leaderboard

Weighted ranking across core quality signals.

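| Rank | Model            | Final Score | Reasoning | Reliability | Cost | Speed | Coding | UX  |
|------|------------------|-------------|-----------|-------------|------|-------|--------|-----|
| 1    | Claude 4 Sonnet  | 8.9         | 9.3       | 9.2         | 7.8  | 8.6   | 9.1    | 9.0 |
| 2    | GPT-4.2          | 8.7         | 9.0       | 8.8         | 7.3  | 8.9   | 9.4    | 8.6 |
| 3    | Gemini 2.5 Pro   | 8.6         | 8.8       | 8.4         | 8.6  | 8.5   | 8.9    | 8.5 |
| 4    | Llama 4 Instruct | 8.2         | 8.1       | 7.8         | 9.2  | 8.2   | 8.0    | 7.6 |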
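
The leaderboard describes the final score as a weighted ranking but does not publish the weights. As a rough illustration of how such a score could be computed, here is a minimal sketch. The sub-scores are copied from the table above. The `WEIGHTS` values and the helper names `weighted_score` and `rank` are hypothetical and chosen only for the example, so the output is not expected to reproduce the published Final Score column.

```python
# Sub-scores copied from the table above.
SCORES = {
    "Claude 4 Sonnet":  {"reasoning": 9.3, "reliability": 9.2, "cost": 7.8,
                         "speed": 8.6, "coding": 9.1, "ux": 9.0},
    "GPT-4.2":          {"reasoning": 9.0, "reliability": 8.8, "cost": 7.3,
                         "speed": 8.9, "coding": 9.4, "ux": 8.6},
    "Gemini 2.5 Pro":   {"reasoning": 8.8, "reliability": 8.4, "cost": 8.6,
                         "speed": 8.5, "coding": 8.9, "ux": 8.5},
    "Llama 4 Instruct": {"reasoning": 8.1, "reliability": 7.8, "cost": 9.2,
                         "speed": 8.2, "coding": 8.0, "ux": 7.6},
}

# HYPOTHETICAL weights: the source does not state its weighting scheme.
WEIGHTS = {"reasoning": 0.25, "reliability": 0.20, "cost": 0.10,
           "speed": 0.10, "coding": 0.20, "ux": 0.15}


def weighted_score(subscores, weights):
    """Weighted mean of the sub-scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(subscores[key] * w for key, w in weights.items()) / total_weight


def rank(scores, weights):
    """Return (model, score) pairs sorted from highest to lowest score."""
    scored = [(name, weighted_score(sub, weights)) for name, sub in scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (name, score) in enumerate(rank(SCORES, WEIGHTS), start=1):
        print(f"{position}. {name}: {score:.2f}")
```

With these assumed weights the ordering matches the table, but that is a property of the example and not evidence of the actual scheme. An unweighted mean of the top row's six sub-scores comes to about 8.83 against a published 8.9, so the real weights are presumably not uniform.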