Last updated about 1 month ago
| Rank | Overall | Name | VIBE Score | Confidence Interval | Votes | Downvote % | Abort % | Speed | Latency | Context | Cost (Input) | Cost (Output) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 167 | | 971 | ±5 | 21K | 5.0% | 1.2% | 88 tps | 2.4s | 1M | $0.23 | $0.83 |
| 2 | 167 | | 965 | ±5 | 17.5K | 5.3% | 0.6% | 88 tps | 5.1s | 131K | $0.18 | $0.46 |
| 3 | 167 | | 958 | ±14 | 2.2K | 2.0% | 2.1% | 650 tps | 0.5s | 128K | $0.13 | $0.14 |
| 4 | 179 | | 941 | ±19 | 1.4K | 6.9% | 2.0% | 50 tps | 0.6s | 131K | $0.09 | $0.33 |
| 5 | 189 | | 933 | ±10 | 3K | 6.6% | 0.3% | 500 tps | 0.5s | 8K | $0.48 | $0.66 |
| 6 | 189 | Llama 3.3 Swallow 70B Instruct | 919 | ±8 | 3.5K | 5.5% | 1.4% | 153 tps | 1.3s | 131K | $0.13 | $0.39 |
| 7 | 201 | | 904 | ±10 | 3.2K | 3.5% | 6.0% | 85 tps | 0.7s | 8K | $0.12 | $0.16 |
| 8 | 201 | | 885 | ±15 | 2.1K | 4.1% | 1.5% | 152 tps | 0.5s | 8K | $0.16 | $0.16 |
| 9 | 210 | | 864 | ±21 | 1.8K | 2.5% | <0.1% | 76 tps | 1.0s | 131K | $0.08 | $0.09 |
| 10 | 234 | | 851 | ±19 | 1.2K | 6.0% | 2.0% | 78 tps | 1.0s | 131K | $0.88 | $0.88 |
| 11 | 240 | | 835 | ±9 | 3.4K | 5.2% | 3.6% | 27 tps | 1.6s | 32K | $0.73 | $0.95 |