Overall Leaderboard
Weighted ranking across core quality signals.
| Rank | Model | Final Score | Reasoning | Reliability | Cost | Speed | Coding | UX |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude 4 Sonnet | 8.9 | 9.3 | 9.2 | 7.8 | 8.6 | 9.1 | 9.0 |
| 2 | GPT-4.2 | 8.7 | 9.0 | 8.8 | 7.3 | 8.9 | 9.4 | 8.6 |
| 3 | Gemini 2.5 Pro | 8.6 | 8.8 | 8.4 | 8.6 | 8.5 | 8.9 | 8.5 |
| 4 | Llama 4 Instruct | 8.2 | 8.1 | 7.8 | 9.2 | 8.2 | 8.0 | 7.6 |
Category: overall
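
The leaderboard does not state how the six signals are weighted, so the following is only a minimal sketch of how a weighted ranking of this kind could be computed. The `WEIGHTS` values (equal weights), the lowercase signal keys, and the helper names `weighted_score` and `rank_models` are assumptions made for illustration, not the leaderboard's actual method. The per-signal scores are copied from the table above.

```python
from typing import Dict, List

# ASSUMPTION: the real weights are not published; equal weights are used
# here purely for illustration.
WEIGHTS: Dict[str, float] = {
    "reasoning": 1.0,
    "reliability": 1.0,
    "cost": 1.0,
    "speed": 1.0,
    "coding": 1.0,
    "ux": 1.0,
}

# Per-signal scores taken from the leaderboard table.
SCORES: Dict[str, Dict[str, float]] = {
    "Claude 4 Sonnet": {"reasoning": 9.3, "reliability": 9.2, "cost": 7.8,
                        "speed": 8.6, "coding": 9.1, "ux": 9.0},
    "GPT-4.2": {"reasoning": 9.0, "reliability": 8.8, "cost": 7.3,
                "speed": 8.9, "coding": 9.4, "ux": 8.6},
    "Gemini 2.5 Pro": {"reasoning": 8.8, "reliability": 8.4, "cost": 8.6,
                       "speed": 8.5, "coding": 8.9, "ux": 8.5},
    "Llama 4 Instruct": {"reasoning": 8.1, "reliability": 7.8, "cost": 9.2,
                         "speed": 8.2, "coding": 8.0, "ux": 7.6},
}


def weighted_score(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted mean of the signals, normalized by total weight."""
    total_weight = sum(weights.values())
    return sum(signals[name] * w for name, w in weights.items()) / total_weight


def rank_models(scores: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> List[str]:
    """Return model names sorted from highest to lowest weighted score."""
    return sorted(scores,
                  key=lambda model: weighted_score(scores[model], weights),
                  reverse=True)


if __name__ == "__main__":
    for position, model in enumerate(rank_models(SCORES, WEIGHTS), start=1):
        print(position, model, round(weighted_score(SCORES[model], WEIGHTS), 2))
```

With equal weights this sketch yields the same ordering as the table, but it does not reproduce every published Final Score exactly (Claude 4 Sonnet averages about 8.83 rather than 8.9), which suggests the leaderboard weights the signals unevenly. Changing the values in `WEIGHTS` shifts the ranking toward whichever signals matter most for a given use case.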