Last updated about 1 month ago
| Rank | Name | VIBE Score | Confidence Interval | Votes | Downvote % | Abort % | Speed | Latency | Context | Cost (Input) | Cost (Output) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 447 | ±15 | 3.4K | 12.0% | 9.7% | 30 tps | 0.9s | 128K | $0.07 | $0.30 |
| 2 | CodeLlama 7B Instruct Solidity | 463 | ±54 | 485 | 8.5% | 3.6% | 33 tps | 0.7s | 16K | $0.80 | $1.20 |
| 3 | | 523 | ±25 | 4.1K | 6.1% | 3.0% | 44 tps | 2.5s | 128K | $0.21 | $0.63 |
| 4 | | 573 | ±17 | 2.1K | 5.5% | 21.0% | 29 tps | 1.0s | 33K | $0.06 | $0.25 |
| 5 | | 588 | ±22 | 1.6K | 9.2% | 2.3% | 67 tps | 2.0s | 33K | $0.01 | $0.01 |
| 6 | | 599 | ±21 | 1K | 7.1% | 7.4% | 40 tps | 1.1s | 128K | $0.07 | $0.30 |
| 7 | | 600 | ±21 | 2.3K | 5.8% | 1.2% | 22 tps | 1.1s | 4K | $0.18 | $0.18 |
| 8 | UI-TARS 1.5 7B | 610 | ±40 | 530 | 11.7% | 4.0% | 75 tps | 0.9s | 128K | $0.10 | $0.20 |
| 9 | | 686 | ±13 | 3.8K | 5.3% | <0.1% | 31 tps | 2.8s | 1M | $0.55 | $2.20 |
| 10 | | 696 | ±20 | 2K | 5.5% | 6.2% | 22 tps | 1.8s | 131K | $0.37 | $0.39 |
| 11 | | 702 | ±20 | 1.4K | 4.1% | 2.3% | 20 tps | 1.1s | 131K | $0.80 | $0.80 |
| 12 | | 706 | ±30 | 715 | 5.9% | 2.5% | 50 tps | 1.0s | 33K | $0.06 | $0.25 |
| 13 | | 719 | ±18 | 1.5K | 4.1% | 1.1% | 33 tps | 3.4s | 8K | $2.50 | $10.00 |
| 14 | | 722 | ±21 | 3K | 6.3% | 2.2% | 101 tps | 1.2s | 131K | $0.08 | $0.08 |
| 15 | | 737 | ±24 | 1.5K | 5.0% | 0.6% | 50 tps | 3.2s | 8K | $2.50 | $10.00 |
| 16 | | 738 | ±15 | 1.6K | 5.6% | 2.8% | 36 tps | 0.7s | 128K | $2.08 | $9.45 |
| 17 | | 738 | ±17 | 1.4K | 5.6% | 1.8% | 142 tps | 0.7s | 66K | $0.45 | $0.45 |
| 18 | | 742 | ±10 | 3.3K | 4.7% | 1.3% | 138 tps | 0.7s | 131K | $0.02 | $0.04 |
| 19 | | 746 | ±20 | 2.1K | 6.0% | 5.3% | 25 tps | 3.7s | 128K | $1.01 | $2.79 |
| 20 | | 754 | ±24 | 745 | 5.7% | 2.7% | 21 tps | 2.2s | 6K | $6.56 | $9.38 |
| 21 | | 759 | ±11 | 2.7K | 12.8% | 3.6% | 32 tps | 0.8s | 131K | $1.00 | $3.00 |
| 22 | | 762 | ±18 | 1.3K | 4.7% | 0.7% | 176 tps | 0.4s | 33K | $0.25 | $0.25 |
| 23 | | 770 | ±12 | 1.2K | 4.5% | 1.7% | 142 tps | 0.6s | 32K | $0.43 | $1.30 |
| 24 | Baichuan-M2-32B | 770 | ±30 | 740 | 10.8% | <0.1% | 32 tps | 3.3s | 131K | $0.07 | $0.07 |
| 25 | | 778 | ±18 | 2.2K | 4.9% | 5.8% | 54 tps | 0.6s | 128K | $0.30 | $0.99 |
| 26 | | 781 | ±29 | 460 | 8.9% | 1.1% | 67 tps | 0.6s | 131K | $0.12 | $0.39 |
| 27 | | 785 | ±16 | 1.1K | 5.8% | 1.5% | 54 tps | 0.7s | 33K | $2.00 | $6.00 |
| 28 | | 787 | ±9 | 2K | 2.7% | <0.1% | 46 tps | 1.2s | 4K | $1.50 | $2.00 |
| 29 | | 793 | ±27 | 510 | 3.8% | <0.1% | 247 tps | 2.2s | 32K | $0.25 | $1.00 |
| 30 | | 797 | ±21 | 815 | 8.4% | 3.5% | 31 tps | 0.9s | 131K | $0.52 | $1.73 |
| 31 | | 798 | ±16 | 1.7K | 3.4% | 5.1% | 28 tps | 1.3s | 128K | $0.10 | $0.32 |
| 32 | | 801 | ±12 | 1.9K | 3.1% | 11.6% | 11 tps | 2.5s | 66K | $0.77 | $0.77 |
| 33 | | 802 | ±11 | 2K | 6.1% | 0.6% | 176 tps | 1.0s | 33K | $0.06 | $0.10 |
| 34 | | 802 | ±18 | 1.8K | 7.5% | 2.7% | 116 tps | 0.6s | 131K | $0.50 | $1.50 |
| 35 | | 806 | ±16 | 2.3K | 5.1% | 0.8% | 248 tps | 0.4s | 131K | $0.08 | $0.08 |
| 36 | | 815 | ±17 | 1.5K | 4.1% | 1.4% | 44 tps | 1.4s | 8K | $0.80 | $0.80 |
| 37 | | 818 | ±18 | 825 | 11.3% | <0.1% | 142 tps | 0.3s | 33K | $0.01 | $0.02 |
| 38 | | 820 | ±17 | 950 | 3.1% | 1.4% | 53 tps | 1.4s | 33K | $1.00 | $3.00 |
| 39 | | 821 | ±7 | 3.8K | 4.0% | 1.5% | 43 tps | 0.5s | 128K | $0.50 | $1.50 |
| 40 | | 825 | ±17 | 2.2K | 5.5% | 1.4% | 177 tps | 0.4s | 128K | $0.14 | $0.14 |