
Last updated about 1 month ago

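| Rank | Overall | Name | VIBE Score | Confidence Interval | Votes | Downvote % | Abort % | Speed | Latency | Context | Cost (Input) | Cost (Output) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Claude Opus 4.6 (Thinking) | 1388 | ±10 | 1.8K | 0.8% | 2.5% | 56 tps | 1.6s | 200K | $5.00 | $25.00 |
| 2 | 4 | GPT-5.4 (High) | 1367 | ±15 | 825 | 1.2% | 4.6% | 68 tps | 7.9s | 1M | $2.50 | $15.00 |
| 3 | 2 | GPT-5.4 | 1339 | ±15 | 530 | 0.9% | 2.6% | 55 tps | 0.8s | 1M | $2.50 | $15.00 |
| 4 | 2 | Claude Opus 4.6 | 1328 | ±8 | 2.3K | 0.9% | 2.1% | 48 tps | 1.7s | 200K | $5.00 | $25.00 |
| 5 | 37 | Nova Experimental Chat 10-20 | 1326 | ±6 | 2.1K | 3.3% | <0.1% | 30 tps | 0.5s | 98K | $0 | $0 |
| 6 | 5 | Claude Sonnet 4.6 (Thinking) | 1325 | ±7 | 1.6K | 1.2% | 4.7% | 57 tps | 1.1s | 200K | $3.00 | $15.00 |
| 7 | 8 | GPT-5.1 (Medium) | 1307 | ±7 | 1.2K | 2.5% | <0.1% | 86 tps | 3.8s | 400K | $0.83 | $6.67 |
| 8 | 6 | Gemini 3.1 Pro | 1268 | ±8 | 4.3K | 0.7% | 3.5% | 35 tps | 4.1s | 1M | $2.00 | $12.00 |
| 9 | 10 | GPT-5.2 Instant | 1266 | ±4 | 6.2K | 0.7% | 1.7% | 52 tps | 2.0s | 400K | $1.75 | $14.00 |
| 10 | 8 | GPT-5.1 (High) | 1258 | ±6 | 5.3K | 1.3% | 3.2% | 76 tps | 6.9s | 400K | $1.25 | $10.00 |
| 11 | 48 | gpt-oss-120b | 1249 | ±4 | 7.3K | 1.3% | 0.7% | 213 tps | 0.5s | 131K | $0.11 | $0.50 |
| 12 | 26 | GPT-5 (High) | 1243 | ±7 | 3K | 2.3% | 4.5% | 81 tps | 35.9s | 400K | $1.25 | $10.00 |
| 13 | 19 | Mistral Medium 3.1 | 1236 | ±5 | 5.1K | 2.3% | <0.1% | 77 tps | 0.7s | 128K | $0.40 | $2.00 |
| 14 | 8 | GPT-5.1 | 1233 | ±8 | 3.3K | 1.4% | 2.3% | 71 tps | 1.4s | 400K | $1.42 | $11.33 |
| 15 | 4 | Claude Sonnet 4.6 | 1232 | ±11 | 1.6K | 0.9% | 1.6% | 47 tps | 1.2s | 200K | $3.00 | $15.00 |
| 16 | 33 | Qwen3 30B A3B Instruct 2507 | 1226 | ±8 | 5.6K | 2.2% | 1.2% | 55 tps | 1.3s | 131K | $0.13 | $0.72 |
| 17 | 10 | Gemini 3 Pro | 1217 | ±5 | 11.7K | 0.9% | 2.1% | 50 tps | 3.6s | 1M | $2.00 | $12.00 |
| 18 | 104 | Grok 3 Beta | 1209 | ±16 | 535 | <0.1% | <0.1% | 58 tps | 0.8s | 131K | $3.00 | $15.00 |
| 19 | 22 | GPT-5 Chat | 1208 | ±5 | 10.4K | 2.2% | 1.3% | 95 tps | 0.9s | 400K | $1.25 | $10.00 |
| 20 | 29 | Qwen3 VL 235B A22B Instruct | 1202 | ±10 | 1.8K | 4.5% | 3.1% | 75 tps | 1.9s | 129K | $0.37 | $1.81 |
| 21 | 33 | Qwen Plus 0728 | 1199 | ±11 | 850 | 2.3% | <0.1% | 55 tps | 0.9s | 1M | $0.40 | $1.20 |
| 22 | 33 | Qwen3 Next 80B A3B Instruct | 1198 | ±7 | 3.4K | 3.0% | 0.6% | 84 tps | 1.1s | 256K | $0.20 | $1.42 |
| 23 | 48 | OpenAI o1-mini | 1195 | ±4 | 10.8K | 1.1% | <0.1% | 118 tps | N/A | 128K | $1.13 | $4.51 |
| 24 | 13 | GPT-5.3 Instant | 1191 | ±11 | 1.6K | 1.2% | 0.9% | 63 tps | 0.8s | 400K | $1.75 | $14.00 |
| 25 | 17 | Grok 4.20 Beta Reasoning | 1190 | ±17 | 540 | 0.9% | 1.1% | 77 tps | 4.5s | 2M | $2.00 | $5.50 |
| 26 | 14 | Gemini 3 Pro (Low) | 1189 | ±6 | 4.8K | 0.9% | 2.4% | 51 tps | 3.5s | 1M | $2.00 | $12.00 |
| 27 | 32 | Gemini 2.5 Pro High | 1182 | ±3 | 6.7K | 2.2% | 1.5% | 48 tps | 2.3s | 1M | $1.25 | $10.00 |
| 28 | 40 | Qwen3 235B A22B Instruct 2507 | 1178 | ±6 | 5.1K | 1.9% | 6.8% | 13 tps | 1.9s | 262K | $0.13 | $0.52 |
| 29 | 106 | Claude Sonnet 3.5 v2 | 1177 | ±8 | 1.6K | 1.2% | <0.1% | 46 tps | 1.4s | 200K | $3.00 | $15.00 |
| 30 | 17 | GPT-5.2 (High) | 1177 | ±7 | 7.4K | 0.8% | 6.7% | 18 tps | 16.3s | 400K | $1.75 | $14.00 |
| 31 | 111 | Claude Sonnet 3.7 | 1173 | ±6 | 2.8K | 1.9% | <0.1% | 39 tps | 1.6s | 200K | $3.00 | $15.00 |
| 32 | 14 | Gemini 3 Flash Preview Thinking | 1173 | ±5 | 4.4K | 0.6% | 1.6% | 3 tps | 6.2s | 1M | $0.50 | $3.00 |
| 33 | 16 | Nova Experimental Chat 11-10 | 1171 | ±6 | 2.7K | 1.3% | 0.4% | 84 tps | 8.9s | 98K | $0 | $0 |
| 34 | 81 | GPT-4o | 1170 | ±9 | 2.3K | 2.8% | 1.0% | 49 tps | 2.4s | 128K | $3.71 | $12.57 |
| 35 | 16 | GPT-5.2 | 1168 | ±6 | 3K | 1.2% | 4.1% | 18 tps | 2.7s | 400K | $1.75 | $14.00 |
| 36 | 17 | Gemini 3 Flash Preview | 1166 | ±7 | 2.4K | 0.6% | 1.3% | 138 tps | 1.4s | 1M | $0.50 | $3.00 |
| 37 | 60 | Gemini 2.5 Flash Preview 0925 | 1163 | ±7 | 2.7K | 2.9% | 1.2% | 5 tps | 0.9s | 1M | $0.13 | $0.97 |
| 38 | 43 | Gemini 2.5 Flash Thinking Preview 0925 | 1159 | ±5 | 3.1K | 2.7% | <0.1% | 111 tps | 4.7s | 1M | $0.30 | $2.50 |
| 39 | 56 | Gemini 2.5 Pro Low | 1159 | ±6 | 3.3K | 3.5% | <0.1% | 89 tps | 2.4s | 1M | $1.25 | $10.00 |
| 40 | 100 | Gemini 2.5 Flash Preview | 1159 | ±11 | 1K | 1.4% | <0.1% | 138 tps | 6.9s | 1M | $0.15 | $0.60 |
| 41 | 7 | Claude Opus 4.5 (Thinking) | 1155 | ±5 | 5.3K | 1.6% | 1.8% | 49 tps | 1.4s | 200K | $5.00 | $25.00 |
| 42 | 26 | Grok 4.1 Fast Non-Reasoning | 1151 | ±6 | 3.2K | 1.8% | 0.9% | 101 tps | 0.5s | 2M | $0.20 | $0.50 |
| 43 | 68 | Qwen Plus (Aug'24) | 1150 | ±5 | 7.5K | 1.4% | 1.4% | 53 tps | 1.3s | 30K | $0.40 | $1.20 |
| 44 | 101 | gpt-oss-20b | 1150 | ±7 | 4K | 1.7% | 0.5% | 216 tps | 0.5s | 131K | $0.06 | $0.26 |
| 45 | 147 | Arcee AI Maestro Reasoning | 1149 | ±7 | 2K | 1.4% | <0.1% | 85 tps | 0.3s | 131K | $0.90 | $3.30 |
| 46 | 52 | Qwen3.5 122B A17B | 1147 | ±13 | 765 | 1.9% | 1.5% | 82 tps | 1.4s | 256K | $0.40 | $3.20 |
| 47 | 17 | Claude Opus 4.5 | 1144 | ±8 | 2.4K | 2.1% | 1.5% | 45 tps | 1.5s | 200K | $5.00 | $25.00 |
| 48 | 100 | Qwen Plus 0728 (Thinking) | 1141 | ±14 | 500 | 2.0% | <0.1% | 56 tps | 1.1s | 1M | $0.40 | $4.00 |
| 49 | 37 | Kimi K2.5 Instant | 1140 | ±10 | 1.1K | 1.4% | 2.9% | 32 tps | 3.0s | 262K | $0.50 | $3.00 |
| 50 | 48 | Step 3.5 Flash | 1140 | ±15 | 965 | 0.5% | 2.2% | 109 tps | 0.6s | 256K | $0.05 | $0.15 |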
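| 51 | 37 | Qwen3 Omni 30B A3B Thinking | 1139 | ±7 | 1.6K | 1.2% | 3.7% | 67 tps | 1.2s | 66K | $0.97 | $1.79 |
| 52 | 29 | Nova Experimental Chat 12-10 | 1138 | ±9 | 1.9K | 0.5% | 2.4% | 84 tps | 12.9s | 98K | $0 | $0 |
| 53 | 111 | Solar Pro 3 (Reasoning) | 1136 | ±13 | 830 | 1.2% | 3.2% | 118 tps | 1.2s | 131K | $0.15 | $0.60 |
| 54 | 10 | Claude Sonnet 4.5 (Thinking) | 1136 | ±5 | 6.8K | 2.7% | 1.9% | 44 tps | 1.1s | 200K | $3.00 | $15.00 |
| 55 | 42 | GPT-5.2 (Extra High) | 1131 | ±5 | 3.7K | 0.9% | 13.2% | 17 tps | 20.5s | 400K | $1.75 | $14.00 |
| 56 | 26 | Claude Haiku 4.5 (Extended Thinking) | 1129 | ±5 | 3.6K | 1.6% | 1.4% | 115 tps | 0.7s | 200K | $1.00 | $5.00 |
| 57 | 213 | DeepSeek R1T Chimera | 1128 | ±8 | 1.9K | 2.5% | <0.1% | 46 tps | 1.1s | 164K | $0.09 | $0.36 |
| 58 | 42 | Qwen3 Max Instruct Preview | 1126 | ±4 | 4.3K | 2.8% | 1.1% | 31 tps | 1.7s | 256K | $1.43 | $6.61 |
| 59 | 44 | Gemini 2.5 Pro | 1126 | ±4 | 16.2K | 1.5% | 2.3% | 45 tps | 2.6s | 1M | $1.25 | $10.00 |
| 60 | 77 | Claude Opus 4.1 | 1122 | ±9 | 1.3K | 2.3% | 3.0% | 17 tps | 3.7s | 200K | $15.00 | $75.00 |
| 61 | 44 | Grok 4.1 Fast Reasoning | 1119 | ±6 | 5.4K | 1.5% | 1.5% | 58 tps | 7.3s | 2M | $0.20 | $0.50 |
| 62 | 37 | Claude Sonnet 4.5 | 1116 | ±6 | 5K | 3.1% | 1.4% | 41 tps | 1.3s | 200K | $1.80 | $9.00 |
| 63 | 56 | DeepSeek V3.1 Turbo | 1114 | ±6 | 4K | 2.1% | 0.9% | 173 tps | 1.3s | 164K | $2.00 | $3.75 |
| 64 | 40 | DeepSeek V3.2 | 1113 | ±5 | 3.6K | 0.8% | 1.4% | 83 tps | 5.1s | 131K | $0.43 | $1.09 |
| 65 | 101 | Gemini 2.5 Flash Lite | 1112 | ±6 | 7.6K | 1.7% | 1.3% | 210 tps | 0.7s | 1M | $0.10 | $0.40 |
| 66 | 93 | Qwen Max | 1111 | ±6 | 7.6K | 1.4% | 1.5% | 49 tps | 1.5s | 33K | $1.60 | $6.40 |
| 67 | 95 | Qwen3 32B | 1111 | ±17 | 515 | 1.9% | 3.9% | 30 tps | 3.1s | 41K | $0.12 | $0.42 |
| 68 | 22 | GLM 5 | 1110 | ±7 | 1.8K | 0.8% | 3.4% | 36 tps | 2.7s | 200K | $0.72 | $2.55 |
| 69 | 84 | GPT-5 Mini Minimal | 1107 | ±10 | 970 | 3.5% | 1.2% | 63 tps | 1.4s | 400K | $0.25 | $2.00 |
| 70 | 93 | DeepSeek V3 0324 Turbo | 1103 | ±5 | 4.4K | 1.9% | 6.3% | 12 tps | 2.4s | 164K | $0.73 | $1.79 |
| 71 | 121 | Qwen3 32B Fast | 1098 | ±8 | 9K | 1.0% | 11.6% | 30 tps | 3.1s | 41K | $0.10 | $0.25 |
| 72 | 86 | DeepSeek V3.1 Chat | 1097 | ±7 | 1.9K | 2.3% | 2.8% | 21 tps | 1.6s | 131K | $0.38 | $1.00 |
| 73 | 56 | DeepSeek V3.2 Thinking | 1096 | ±6 | 3.8K | 0.9% | 9.0% | 30 tps | 2.6s | 131K | $0.28 | $0.42 |
| 74 | 111 | LongCat Flash Chat | 1095 | ±7 | 1.7K | 2.8% | 0.8% | 85 tps | 0.9s | 131K | $0.14 | $0.68 |
| 75 | 84 | Nova Experimental Chat 10-09 | 1093 | ±10 | 1.3K | 7.4% | <0.1% | 59 tps | 6.1s | 98K | $0 | $0 |
| 76 | 121 | QwQ 32B | 1091 | ±5 | 9.9K | 0.9% | 5.4% | 41 tps | 2.1s | 16K | $0.43 | $0.56 |
| 77 | 33 | Kimi K2.5 | 1090 | ±6 | 4.5K | 0.7% | 6.5% | 33 tps | 1.7s | 262K | $0.34 | $2.57 |
| 78 | 86 | Nemotron 3 Nano (Thinking) | 1089 | ±9 | 1.5K | 0.7% | 2.0% | 200 tps | 0.5s | 256K | $0 | $0 |
| 79 | 62 | GPT-5.1 Instant | 1085 | ±6 | 3.7K | 1.1% | 1.3% | 50 tps | 1.9s | 400K | $1.25 | $10.00 |
| 80 | 106 | DeepSeek V3 0324 | 1084 | ±4 | 5.7K | 1.4% | 5.8% | 12 tps | 2.7s | 164K | $0.38 | $0.93 |
| 81 | 52 | GPT-5 | 1083 | ±5 | 7.6K | 2.2% | 3.1% | 78 tps | 23.1s | 400K | $1.25 | $9.67 |
| 82 | 86 | Claude Sonnet 4 | 1083 | ±5 | 12K | 1.6% | 1.8% | 49 tps | 1.3s | 200K | $3.00 | $15.00 |
| 83 | 60 | MiniMax M2.1 | 1080 | ±6 | 5.2K | 0.6% | 2.1% | 66 tps | 2.6s | 205K | $0.30 | $1.20 |
| 84 | 159 | Qwen Turbo | 1079 | ±8 | 3.9K | 1.4% | <0.1% | 53 tps | 1.1s | 1M | $0.05 | $0.20 |
| 85 | 80 | GPT-5 (Minimal) | 1077 | ±5 | 3K | 3.6% | <0.1% | 67 tps | 1.4s | 400K | $1.25 | $10.00 |
| 86 | 48 | Grok 4 Fast Reasoning | 1077 | ±6 | 3.3K | 2.8% | 2.1% | 102 tps | 3.1s | 2M | $0.30 | $0.75 |
| 87 | 21 | Claude Opus 4 | 1077 | ±13 | 920 | 2.6% | <0.1% | 25 tps | 1.5s | 200K | $15.00 | $75.00 |
| 88 | 121 | NVIDIA Llama 3.3 Nemotron Super 49B v1.5 | 1076 | ±12 | 755 | 1.9% | 2.0% | 50 tps | 0.6s | 131K | $0.09 | $0.33 |
| 89 | 52 | Claude Haiku 4.5 | 1076 | ±8 | 4.2K | 2.2% | 1.1% | 100 tps | 0.9s | 200K | $1.00 | $5.00 |
| 90 | 52 | Grok 4 Fast Non-Reasoning | 1075 | ±6 | 2.9K | 3.3% | 1.5% | 93 tps | 0.6s | 2M | $0.27 | $0.67 |
| 91 | 95 | DeepSeek-R1 Turbo | 1075 | ±6 | 1.9K | 2.4% | 2.6% | 29 tps | 1.8s | 64K | $2.85 | $4.75 |
| 92 | 56 | Gemini 3.1 Flash Lite Preview Thinking | 1071 | ±13 | 560 | 1.8% | 1.7% | 75 tps | 4.7s | 1M | $0.25 | $1.50 |
| 93 | 68 | Grok 4 | 1070 | ±4 | 13.8K | 1.6% | 3.9% | 29 tps | 11.1s | 256K | $3.00 | $15.00 |
| 94 | 95 | Gemini 2.5 Flash | 1068 | ±4 | 11.2K | 1.2% | 1.3% | 2 tps | 3.7s | 1M | $0.30 | $2.50 |
| 95 | 56 | MiniMax M2.1 Lightning | 1067 | ±12 | 855 | 0.6% | 1.7% | 52 tps | 2.1s | 205K | $0.30 | $2.40 |
| 96 | 292 | AFM 4.5B | 1067 | ±6 | 2.1K | 1.6% | <0.1% | 81 tps | 0.3s | 66K | $0.05 | $0.20 |
| 97 | 79 | Qwen3 Max Thinking Preview | 1067 | ±6 | 3.1K | 1.4% | 3.1% | 40 tps | 2.1s | 256K | $1.20 | $6.00 |
| 98 | 108 | GPT-5 Mini Low | 1067 | ±11 | 755 | 4.4% | <0.1% | 69 tps | 3.2s | 400K | $0.25 | $2.00 |
| 99 | 71 | Gemini 2.5 Flash Lite Preview 0925 | 1066 | ±6 | 3.3K | 2.8% | 1.2% | 209 tps | 0.7s | 1M | $0.25 | $0.35 |
| 100 | 124 | Qwen3 235B A22B Thinking 2507 | 1065 | ±7 | 1.8K | 1.9% | 2.5% | 53 tps | 1.6s | 131K | $0.59 | $5.70 |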
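| 101 | 44 | Kimi K2 Thinking Turbo | 1065 | ±6 | 3K | 1.9% | 2.0% | 75 tps | 1.4s | 262K | $1.15 | $8.00 |
| 102 | 65 | Mistral Large 3 | 1064 | ±7 | 1.8K | 2.2% | 2.1% | 51 tps | 1.0s | 256K | $0.50 | $1.50 |
| 103 | 118 | GPT-4.1 mini | 1062 | ±5 | 5.5K | 1.8% | 1.1% | 67 tps | 0.9s | 1M | $0.34 | $1.60 |
| 104 | 81 | OpenAI o3-pro | 1061 | ±14 | 1.3K | 2.7% | 5.2% | 22 tps | 70.8s | 200K | $20.00 | $80.00 |
| 105 | 133 | Solar Pro 2 250710 | 1060 | ±5 | 4.8K | 1.5% | <0.1% | 9 tps | N/A | 66K | $0.50 | $0.50 |
| 106 | 106 | Grok 3 | 1054 | ±6 | 7.1K | 1.7% | 1.5% | 53 tps | 0.6s | 1M | $3.67 | $18.33 |
| 107 | 95 | Kimi K2 Thinking | 1054 | ±9 | 1.9K | 3.8% | 4.2% | 61 tps | 5.9s | 262K | $0.24 | $1.03 |
| 108 | 71 | DeepSeek V3.1 | 1053 | ±13 | 1.8K | 1.6% | 0.8% | 197 tps | 0.4s | 164K | $0.55 | $1.60 |
| 109 | 44 | DeepSeek V3.1 Terminus Chat | 1053 | ±6 | 2.6K | 2.6% | 3.4% | 27 tps | 1.5s | 131K | $0.86 | $1.80 |
| 110 | 133 | Gemini 2.5 Pro Preview 0605 | 1051 | ±12 | 655 | <0.1% | <0.1% | 0 tps | 3.7s | 1M | $1.25 | $10.00 |
| 111 | 126 | Qwen3 30B A3B | 1051 | ±7 | 3.9K | 1.3% | 5.1% | 163 tps | 1.0s | 41K | $0.06 | $0.21 |
| 112 | 65 | DeepSeek V3.2 Exp Chat | 1047 | ±9 | 2.2K | 3.1% | 2.6% | 29 tps | 1.5s | 131K | $0.27 | $0.39 |
| 113 | 62 | MiniMax M2 | 1046 | ±6 | 3.8K | 1.9% | 2.2% | 39 tps | 2.3s | 205K | $0.21 | $0.85 |
| 114 | 119 | ERNIE 4.5 300B A47B | 1046 | ±6 | 5.3K | 1.3% | 4.7% | 23 tps | 2.3s | 123K | $0.28 | $1.10 |
| 115 | 133 | GPT-4.1 nano | 1046 | ±8 | 5.1K | 2.0% | 0.6% | 175 tps | 0.5s | 1M | $0.10 | $0.40 |
| 116 | 241 | OLMo 3 7B Think | 1045 | ±12 | 710 | 1.4% | 4.2% | 77 tps | 0.4s | 66K | $0.12 | $0.20 |
| 117 | 48 | Claude Sonnet 4 (Thinking) | 1044 | ±5 | 8.4K | 2.3% | 1.5% | 52 tps | 1.5s | 200K | $3.00 | $13.67 |
| 118 | 71 | Gemini 2.5 Flash Thinking | 1042 | ±5 | 6.5K | 1.5% | 2.2% | 88 tps | 6.4s | 1M | $0.30 | $2.50 |
| 119 | 182 | Gemini 2.5 Flash Preview Thinking | 1041 | ±16 | 620 | 1.6% | <0.1% | 26 tps | 1.8s | 1M | $0.15 | $1.76 |
| 120 | 71 | Qwen3.5 397B A17B | 1040 | ±10 | 1.4K | 1.4% | 4.3% | 57 tps | 1.4s | 256K | $0.52 | $3.00 |
| 121 | 119 | GLM 4.7 FP8 | 1039 | ±9 | 515 | 1.0% | 6.9% | 40 tps | 1.3s | 200K | $0.30 | $1.20 |
| 122 | 106 | DeepSeek V3.1 Terminus Thinking | 1035 | ±11 | 1.4K | 2.8% | 5.9% | 27 tps | 1.8s | 131K | $0.56 | $1.68 |
| 123 | 113 | Mistral Medium | 1035 | ±5 | 3.6K | 1.8% | 1.8% | 48 tps | 0.6s | 33K | $1.48 | $4.55 |
| 124 | 147 | GLM 4.5 Air | 1035 | ±7 | 3.2K | 2.3% | <0.1% | 22 tps | 1.4s | 131K | $0.10 | $0.38 |
| 125 | 65 | GLM 4.6 | 1030 | ±8 | 2.6K | 2.8% | 5.4% | 39 tps | 1.5s | 200K | $0.42 | $1.66 |
| 126 | 86 | Qwen3 235B A22B | 1030 | ±9 | 3.1K | 1.6% | 5.3% | 71 tps | 0.9s | 41K | $0.23 | $0.63 |
| 127 | 95 | DeepSeek V3.2 Exp Thinking | 1029 | ±11 | 1.4K | 0.7% | 7.2% | 26 tps | 3.0s | 131K | $0.28 | $0.42 |
| 128 | 68 | GLM 4.7 | 1026 | ±6 | 4.5K | 0.8% | 5.8% | 40 tps | 1.5s | 200K | $0.77 | $1.73 |
| 129 | 71 | GPT-5 Mini | 1025 | ±6 | 3.2K | 2.0% | 2.6% | 66 tps | 14.2s | 400K | $0.25 | $2.00 |
| 130 | 314 | Weather | 1025 | ±19 | 580 | 4.1% | <0.1% | 36 tps | 1.1s | 32K | $0 | $0 |
| 131 | 126 | Qwen3 VL 235B A22B Thinking | 1024 | ±11 | 1.6K | 4.2% | 4.3% | 47 tps | 3.0s | 127K | $0.47 | $3.31 |
| 132 | 159 | Gemini 2.5 Pro Preview 0325 | 1023 | ±18 | 735 | 2.6% | <0.1% | 3 tps | 16.6s | 1M | $1.25 | $10.00 |
| 133 | 143 | Gemini 2.0 Flash | 1022 | ±7 | 2.5K | 2.5% | <0.1% | 76 tps | 0.5s | 1M | $0.14 | $0.56 |
| 134 | 153 | Qwen 2.5 32B Instruct | 1019 | ±8 | 1.4K | 1.8% | 2.5% | 48 tps | 1.0s | 131K | $0.21 | $0.25 |
| 135 | 113 | GLM 4.5 | 1019 | ±6 | 2.5K | 1.6% | 3.7% | 46 tps | 1.4s | 131K | $0.43 | $1.63 |
| 136 | 71 | Seed 1.8 251228 | 1018 | ±6 | 4.4K | 1.0% | 3.7% | 41 tps | 2.1s | 256K | $0.25 | $2.00 |
| 137 | 139 | GLM 4.6V | 1018 | ±12 | 1.6K | 1.2% | 6.4% | 21 tps | 1.8s | 128K | $0.38 | $0.90 |
| 138 | 148 | Qwen3 30B A3B Thinking 2507 | 1017 | ±9 | 2.2K | 1.8% | 0.5% | 124 tps | 1.2s | 131K | $0.16 | $1.70 |
| 139 | 56 | Claude Opus 4.1 (Thinking) | 1017 | ±8 | 1.5K | 1.3% | <0.1% | 20 tps | 3.9s | 200K | $15.00 | $75.00 |
| 140 | 133 | Kimi K2 0905 | 1013 | ±11 | 2.1K | 3.7% | 4.0% | 30 tps | 1.4s | 262K | $0.63 | $2.39 |
| 141 | 126 | DeepSeek V3 | 1013 | ±6 | 8.8K | 1.3% | 0.9% | 69 tps | 1.1s | 64K | $0.59 | $1.49 |
| 142 | 101 | DeepSeek V3 (Turbo) | 1013 | ±12 | 705 | 1.4% | 1.5% | 32 tps | 1.5s | 64K | $0.40 | $1.30 |
| 143 | 129 | Qwen3 Max Thinking | 1012 | ±6 | 2.1K | 0.2% | 13.5% | 32 tps | 2.3s | 256K | $1.20 | $6.00 |
| 144 | 129 | Command A | 1005 | ±5 | 8.6K | 1.7% | 2.2% | 42 tps | 0.8s | 256K | $2.00 | $7.33 |
| 145 | 143 | Seed 1.6 250615 | 1005 | ±20 | 880 | 2.2% | 3.1% | 46 tps | 2.2s | 256K | $0.25 | $2.00 |
| 146 | 213 | Claude Haiku 3.5 | 1005 | ±12 | 1.5K | 3.0% | 0.8% | 40 tps | 2.8s | 200K | $0.80 | $4.00 |
| 147 | 133 | DeepSeek V3.2 Speciale | 1003 | ±10 | 1.3K | 2.2% | 6.0% | 43 tps | 1.4s | 131K | $0.84 | $1.52 |
| 148 | 113 | Kimi K2 Fast | 1003 | ±4 | 10K | 1.8% | 0.8% | 365 tps | 0.5s | 131K | $1.00 | $3.00 |
| 149 | 113 | Gemini 2.5 Flash Lite Thinking | 1003 | ±8 | 3.7K | 2.4% | 1.0% | 118 tps | 4.4s | 1M | $0.03 | $0.13 |
| 150 | 133 | Qwen3 14B | 1002 | ±6 | 3.6K | 1.6% | 1.7% | 109 tps | 0.8s | 41K | $0.04 | $0.15 |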
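| 151 | 148 | DeepSeek-R1 | 1001 | ±6 | 5K | 1.7% | 0.8% | 133 tps | 0.6s | 64K | $0.91 | $3.07 |
| 152 | 157 | Qwen3 Next 80B A3B Thinking | 1000 | ±7 | 3.2K | 3.0% | 0.6% | 175 tps | 1.3s | 256K | $0.21 | $2.26 |
| 153 | 133 | DeepSeek-R1 0528 | 998 | ±4 | 4.9K | 1.5% | 1.3% | 93 tps | 0.5s | 64K | $1.60 | $3.67 |
| 154 | 292 | GPT-5 Nano Minimal | 992 | ±16 | 515 | 4.6% | <0.1% | 88 tps | 0.8s | 400K | $0.05 | $0.40 |
| 155 | 161 | Qwen3 8B | 992 | ±8 | 3.1K | 1.6% | 2.4% | 61 tps | 1.4s | 41K | $0.02 | $0.07 |
| 156 | 71 | MiniMax M2.5 FP8 | 988 | ±19 | 525 | 1.9% | 3.6% | 33 tps | 1.7s | 205K | $0.45 | $1.75 |
| 157 | 143 | Gemini 2.0 Flash Lite | 988 | ±6 | 4.1K | 2.6% | <0.1% | 42 tps | 0.5s | 1M | $0.08 | $0.30 |
| 158 | 84 | Claude Sonnet 3.7 (Thinking) | 983 | ±6 | 4.8K | 2.1% | <0.1% | 41 tps | 2.6s | 200K | $3.00 | $15.00 |
| 159 | 200 | K2 Think | 982 | ±11 | 895 | 0.6% | <0.1% | 418 tps | 2.8s | N/A | $0 | $0 |
| 160 | 153 | OpenAI o1 | 981 | ±5 | 9.1K | 1.7% | 4.2% | 92 tps | 5.5s | 200K | $15.00 | $60.00 |
| 161 | 161 | Llama 4 Maverick | 980 | ±6 | 7.3K | 1.8% | 1.2% | 88 tps | 2.4s | 1M | $0.23 | $0.83 |
| 162 | 95 | Gemini 2.5 Flash Lite Thinking Preview 0925 | 978 | ±7 | 2.2K | 2.5% | 1.5% | 152 tps | 3.0s | 1M | $0.10 | $0.40 |
| 163 | 182 | Fauna Fox | 977 | ±6 | 2K | 1.7% | <0.1% | 194 tps | 0.3s | 128K | $0.04 | $0.15 |
| 164 | 124 | Kimi K2 0905 Turbo | 972 | ±7 | 3.2K | 3.9% | 0.7% | 373 tps | 0.5s | 262K | $1.70 | $6.50 |
| 165 | 79 | MiniMax M2.5 Lightning | 972 | ±17 | 935 | 1.1% | 1.5% | 51 tps | 2.0s | 205K | $0.60 | $2.40 |
| 166 | 165 | Qwen3 4B | 970 | ±8 | 3.1K | 2.7% | 1.9% | 94 tps | 1.5s | 128K | $0.01 | $0.01 |
| 167 | 148 | OpenAI o4-mini-high | 966 | ±4 | 9.3K | 1.8% | 1.9% | 117 tps | 15.9s | 200K | $1.10 | $4.40 |
| 168 | 86 | Amazon Nova 2 Lite | 966 | ±9 | 1.6K | 3.0% | 1.0% | 137 tps | 0.6s | 300K | $0.35 | $2.95 |
| 169 | 175 | OpenAI o3-mini-low | 963 | ±8 | 8.8K | 1.9% | 0.7% | 139 tps | 1.5s | 200K | $1.10 | $4.40 |
| 170 | 129 | DeepSeek V3.1 Thinking | 956 | ±10 | 2.2K | 2.4% | 7.1% | 18 tps | 1.8s | 131K | $0.23 | $0.75 |
| 171 | 270 | Solar Pro 2 250710 (Reasoning) | 955 | ±7 | 2.3K | 2.3% | <0.1% | 9 tps | N/A | 66K | $0.50 | $0.50 |
| 172 | 170 | Llama 3.1 8B Turbo | 954 | ±20 | 685 | 2.1% | 2.1% | 650 tps | 0.5s | 128K | $0.13 | $0.14 |
| 173 | 170 | Mistral Small 3.2 24B | 953 | ±9 | 1.4K | 1.8% | 2.8% | 141 tps | 0.7s | 33K | $0.02 | $0.08 |
| 174 | 177 | Mistral Small 3.1 24B Instruct | 949 | ±16 | 745 | 2.0% | 7.5% | 15 tps | 2.4s | 131K | $0.06 | $0.18 |
| 175 | 277 | Grok 2 | 949 | ±16 | 650 | 0.8% | <0.1% | 55 tps | 1.1s | 131K | $2.00 | $10.00 |
| 176 | 165 | DeepSeek R1T2 Chimera | 947 | ±20 | 620 | 2.4% | 3.0% | 28 tps | 1.8s | 164K | $0.13 | $0.45 |
| 177 | 160 | Llama 4 Scout | 941 | ±7 | 6.9K | 1.5% | 0.6% | 88 tps | 5.1s | 131K | $0.18 | $0.46 |
| 178 | 179 | GLM 4.7 Flash | 939 | ±11 | 1.1K | 1.3% | 5.8% | 61 tps | 2.8s | 128K | $0.07 | $0.39 |
| 179 | 186 | Grok 3 Mini Fast | 939 | ±11 | 3.9K | 2.3% | 1.6% | 44 tps | 0.5s | 131K | $0.60 | $4.00 |
| 180 | 253 | R1 1776 | 938 | ±10 | 3.1K | 0.9% | <0.1% | 61 tps | 1.0s | 128K | $2.00 | $8.00 |
| 181 | 157 | Cogito v2.1 671B | 937 | ±15 | 885 | 1.7% | 0.8% | 85 tps | 0.5s | 128K | $1.25 | $1.25 |
| 182 | 177 | Llama 3 70B Turbo | 935 | ±14 | 1K | 1.4% | <0.1% | 31 tps | 0.0s | 8K | $0.73 | $0.83 |
| 183 | 148 | OpenAI o3 | 935 | ±5 | 4.3K | 1.7% | 0.9% | 85 tps | 6.8s | 128K | $7.33 | $29.33 |
| 184 | 161 | DeepSeek Prover v2 | 933 | ±17 | 825 | 2.4% | 5.2% | 14 tps | 1.3s | 164K | $0.40 | $1.56 |
| 185 | 214 | Llama 3.3 70B Instruct Turbo | 931 | ±27 | 505 | 3.8% | 2.0% | 78 tps | 1.0s | 131K | $0.88 | $0.88 |
| 186 | 201 | Gemma 3 27B IT | 930 | ±15 | 655 | 2.2% | 2.0% | 60 tps | 0.8s | 128K | $0.17 | $0.29 |
| 187 | 157 | GPT-5 Nano | 928 | ±10 | 1.8K | 3.0% | 3.2% | 113 tps | 20.9s | 400K | $0.05 | $0.40 |
| 188 | 170 | Kimi K2 0711 | 928 | ±9 | 3.2K | 2.2% | 1.6% | 29 tps | 1.3s | 131K | $0.72 | $2.60 |
| 189 | 253 | Magistral Medium | 927 | ±16 | 530 | 5.4% | <0.1% | 95 tps | 0.5s | 41K | $2.00 | $5.00 |
| 190 | 139 | OpenAI o4-mini | 925 | ±8 | 4K | 2.3% | 1.4% | 97 tps | 7.0s | 128K | $1.10 | $4.40 |
| 191 | 209 | Seed 1.6 Flash 250715 | 922 | ±16 | 580 | 2.5% | 2.5% | 108 tps | 1.6s | 256K | $0.07 | $0.30 |
| 192 | 214 | OpenAI o3-mini-high | 922 | ±6 | 7.5K | 2.0% | 2.4% | 231 tps | 10.5s | 200K | $1.10 | $4.40 |
| 193 | 177 | OpenAI o3-mini | 921 | ±6 | 9K | 1.9% | 0.8% | 143 tps | 3.3s | 200K | $1.10 | $4.40 |
| 194 | 214 | Gemma 3 12B | 921 | ±18 | 635 | 3.1% | 4.2% | 73 tps | 0.8s | 131K | $0.05 | $0.12 |
| 195 | 219 | NVIDIA Llama 3.3 Nemotron Super 49B v1 | 918 | ±15 | 920 | 1.1% | <0.1% | 13 tps | N/A | 131K | $0.07 | $0.20 |
| 196 | 186 | Jamba 1.6 Large | 915 | ±18 | 660 | 1.5% | 2.0% | 59 tps | 1.2s | 256K | $1.33 | $5.33 |
| 197 | 200 | Claude Sonnet 3.5 | 914 | ±14 | 830 | 2.4% | 1.0% | 40 tps | 2.7s | 200K | $3.00 | $15.00 |
| 198 | 113 | GLM 4.5 AirX | 913 | ±23 | 540 | 1.8% | 3.3% | 75 tps | 1.2s | 131K | $1.10 | $4.50 |
| 199 | 314 | DeepSeek-R1 0528 Qwen3 8B | 912 | ±5 | 3.4K | 2.3% | <0.1% | 45 tps | 2.4s | 128K | $0.05 | $0.09 |
| 200 | 211 | Gemini 1.5 Pro | 912 | ±15 | 580 | 2.5% | <0.1% | 15 tps | 0.0s | 2M | $0.78 | $3.13 |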
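| 201 | 186 | Grok 3 Mini | 908 | ±9 | 3.8K | 2.3% | 1.2% | 43 tps | 0.5s | 131K | $0.30 | $0.50 |
| 202 | 214 | Qwen 2.5 7B | 901 | ±19 | 490 | 4.9% | 3.7% | 40 tps | 1.9s | 131K | $0.08 | $0.27 |
| 203 | 194 | Llama 3.3 70B | 900 | ±12 | 1.1K | 3.0% | 0.3% | 500 tps | 0.5s | 8K | $0.48 | $0.66 |
| 204 | 170 | Devstral Medium | 899 | ±10 | 995 | 1.5% | 1.5% | 77 tps | 0.6s | 131K | $0.40 | $2.00 |
| 205 | 186 | GLM 4.6V Flash | 899 | ±15 | 1.3K | 2.3% | 3.7% | 64 tps | 2.1s | 128K | $0.04 | $0.40 |
| 206 | 186 | Gemma 3n E4B | 892 | ±10 | 2K | 2.6% | 2.0% | 30 tps | 0.5s | 8K | $0.01 | $0.02 |
| 207 | 241 | GPT-5 Mini High | 891 | ±10 | 1.4K | 2.8% | <0.1% | 33 tps | 3.9s | 400K | $0.25 | $2.00 |
| 208 | 165 | Pixtral Large | 890 | ±12 | 640 | 4.5% | 2.5% | 57 tps | 1.3s | 128K | $1.50 | $4.50 |
| 209 | 292 | NVIDIA Llama 3.1 Nemotron Ultra 253B v1 | 888 | ±22 | 575 | 0.9% | <0.1% | 40 tps | 0.8s | 128K | $0.30 | $0.90 |
| 210 | 265 | Llama 3.1 405B Instruct Turbo | 885 | ±22 | 560 | 3.4% | <0.1% | 26 tps | 0.8s | 131K | $3.50 | $3.50 |
| 211 | 277 | GLM Z1 32B | 885 | ±14 | 1.1K | 2.2% | <0.1% | 18 tps | 9.3s | 33K | $0.09 | $0.11 |
| 212 | 277 | Wikipedia | 882 | ±10 | 2.8K | 3.7% | <0.1% | 47 tps | 2.1s | 32K | $0 | $0 |
| 213 | 179 | Switchpoint Router | 876 | ±16 | 675 | 1.5% | 1.7% | 71 tps | 4.9s | 131K | $0.85 | $3.40 |
| 214 | 233 | Llama 3.1 70B Instruct Turbo | 869 | ±19 | 995 | 2.0% | <0.1% | 110 tps | 0.8s | 128K | $0.88 | $0.88 |
| 215 | 229 | ERNIE 4.5 21B A3B Thinking | 866 | ±16 | 685 | 2.8% | 1.8% | 87 tps | 1.5s | 120K | $0.07 | $0.28 |
| 216 | 324 | Solar Pro 3 | 856 | ±20 | 520 | 1.0% | 2.0% | 99 tps | 1.3s | 131K | $0.15 | $0.60 |
| 217 | 209 | Llama 3.3 Swallow 70B Instruct | 854 | ±18 | 890 | 1.1% | 1.4% | 153 tps | 1.3s | 131K | $0.13 | $0.39 |
| 218 | 200 | NVIDIA Llama 3.1 Nemotron 70B | 851 | ±19 | 1.3K | 1.1% | <0.1% | 9 tps | 0.1s | 128K | $0.33 | $0.39 |
| 219 | 229 | Magistral Medium 2509 | 849 | ±16 | 990 | 3.9% | 4.0% | 58 tps | 0.9s | 131K | $2.00 | $5.00 |
| 220 | 194 | Magistral Small 2506 | 847 | ±14 | 1.3K | 1.9% | 1.6% | 156 tps | 0.5s | 40K | $0.37 | $1.10 |
| 221 | 374 | Cogito V2 671B | 839 | ±13 | 1.3K | 3.0% | <0.1% | 41 tps | 0.6s | 164K | $1.25 | $1.25 |
| 222 | 161 | Mistral Small 3.1 | 835 | ±17 | 675 | 1.5% | 7.4% | 13 tps | 2.6s | 32K | $0.17 | $0.28 |
| 223 | 209 | Qwen 2.5 14B Instruct | 830 | ±24 | 570 | 1.7% | 2.4% | 40 tps | 1.6s | 1M | $0.40 | $1.61 |
| 224 | 270 | AFM 4.5B Preview | 830 | ±22 | 875 | 2.2% | <0.1% | 32 tps | 0.0s | 66K | $0 | $0 |
| 225 | 265 | Magistral Small 2509 | 830 | ±23 | 825 | 5.7% | 2.7% | 116 tps | 0.6s | 131K | $0.50 | $1.50 |
| 226 | 179 | Inception Mercury | 829 | ±10 | 2K | 1.5% | 0.4% | 257 tps | 1.1s | 32K | $0.25 | $1.00 |
| 227 | 260 | Hermes 4 405B Reasoning FP8 | 828 | ±11 | 1.3K | 3.7% | 3.6% | 32 tps | 0.8s | 131K | $1.00 | $3.00 |
| 228 | 201 | Llama 3 8B | 826 | ±17 | 720 | 1.4% | 6.0% | 85 tps | 0.7s | 8K | $0.12 | $0.16 |
| 229 | 235 | Gemma 3 4B | 825 | ±14 | 755 | 3.2% | 1.3% | 138 tps | 0.7s | 131K | $0.02 | $0.04 |
| 230 | 222 | Jamba 1.5 Large | 819 | ±15 | 690 | 1.4% | 1.7% | 48 tps | 0.9s | 256K | $1.50 | $6.00 |
| 231 | 194 | Llama 3.2 11B Instruct | 816 | ±22 | 525 | 2.8% | 1.5% | 152 tps | 0.5s | 8K | $0.16 | $0.16 |
| 232 | 241 | Claude Haiku 3 | 813 | ±18 | 645 | 2.3% | 0.4% | 62 tps | 0.5s | 200K | $0.25 | $1.25 |
| 233 | 270 | Arcee AI Virtuoso-Medium | 809 | ±21 | 540 | 0.9% | <0.1% | 3 tps | N/A | 131K | $0.50 | $0.80 |
| 234 | 179 | Amazon Nova Pro 1.0 | 807 | ±19 | 1.4K | 1.7% | 0.9% | 96 tps | 0.7s | 300K | $0.80 | $1.70 |
| 235 | 201 | GPT-4o mini | 803 | ±24 | 545 | 4.4% | 2.1% | 71 tps | 1.7s | 128K | $0.15 | $0.60 |
| 236 | 222 | Sky T1 32B Preview | 797 | ±17 | 625 | 1.6% | 7.8% | 73 tps | 0.6s | 16K | $0.12 | $0.18 |
| 237 | 292 | Arcee AI Spotlight | 796 | ±15 | 1.4K | 1.8% | <0.1% | 121 tps | 0.4s | 131K | $0.18 | $0.18 |
| 238 | 219 | Arcee AI Virtuoso-Large | 791 | ±11 | 840 | 1.8% | <0.1% | 64 tps | 0.5s | 131K | $0.75 | $1.20 |
| 239 | 314 | MAI-DS-R1 | 778 | ±12 | 1.7K | 3.4% | <0.1% | 73 tps | 3.2s | 64K | $0.10 | $0.40 |
| 240 | 225 | Command R 7B | 775 | ±18 | 870 | 1.7% | 1.1% | 76 tps | 0.4s | 128K | $0.04 | $0.15 |
| 241 | 399 | Magistral Medium (Thinking) | 774 | ±7 | 2.3K | 2.7% | <0.1% | 67 tps | 0.8s | 41K | $2.00 | $5.00 |
| 242 | 339 | Refuel LLM 2 Small | 768 | ±20 | 800 | 2.4% | <0.1% | 116 tps | 0.5s | 8K | $0.20 | $0.20 |
| 243 | 246 | Ministral 3B | 766 | ±21 | 575 | 1.7% | 0.8% | 248 tps | 0.4s | 131K | $0.08 | $0.08 |
| 244 | 229 | Ministral 8B | 763 | ±23 | 525 | 3.7% | 1.4% | 177 tps | 0.4s | 128K | $0.14 | $0.14 |
| 245 | 225 | Command R | 754 | ±22 | 540 | 2.7% | 5.8% | 54 tps | 0.6s | 128K | $0.30 | $0.99 |
| 246 | 235 | GLM 4 32B | 751 | ±19 | 740 | 2.0% | 2.6% | 40 tps | 1.6s | 33K | $0.14 | $0.14 |
| 247 | 241 | Arcee AI Blitz | 748 | ±15 | 710 | 1.4% | <0.1% | 6 tps | N/A | 33K | $0.45 | $0.75 |
| 248 | 246 | DeepSeek-R1 Distill Llama 70B | 733 | ±10 | 2.6K | 2.3% | 3.6% | 27 tps | 1.6s | 32K | $0.73 | $0.95 |
| 249 | 256 | Gemma 3 1B | 733 | ±21 | 670 | 2.9% | 0.6% | 176 tps | 1.0s | 33K | $0.06 | $0.10 |
| 250 | 214 | C4AI Aya Expanse 32B | 702 | ±22 | 825 | 1.8% | 1.5% | 43 tps | 0.5s | 128K | $0.50 | $1.50 |
| 251 | 225 | GPT-3.5 Turbo 16k | 699 | ±17 | 690 | 0.7% | <0.1% | 22 tps | 0.6s | 16K | $3.00 | $4.00 |
| 252 | 406 | DeepSeek-R1 Distill Qwen 14B | 664 | ±14 | 1.7K | 2.5% | <0.1% | 44 tps | 1.7s | 64K | $0.63 | $0.63 |
| 253 | 424 | DeepSeek-R1 Distill Qwen 7B | 664 | ±21 | 640 | 1.5% | <0.1% | 0 tps | N/A | 131K | $0.05 | $0.10 |
| 254 | 430 | DeepSeek-R1 Distill Qwen 1.5B | 648 | ±17 | 915 | 2.1% | <0.1% | 20 tps | 0.0s | 131K | $0.18 | $0.18 |
| 255 | 274 | DeepSeek-R1 Distill Qwen 32B | 638 | ±9 | 1.8K | 2.7% | 6.2% | 22 tps | 1.8s | 131K | $0.37 | $0.39 |
| 256 | 287 | Phi 4 Reasoning | 632 | ±13 | 2K | 2.2% | 21.0% | 29 tps | 1.0s | 33K | $0.06 | $0.25 |
| 257 | 428 | DeepSeek-R1 Distill Llama 8B | 624 | ±20 | 1.1K | 3.1% | <0.1% | 17 tps | N/A | 32K | $0.04 | $0.04 |
| 258 | 284 | MiniMax M1 | 570 | ±10 | 3K | 1.3% | <0.1% | 31 tps | 2.8s | 1M | $0.55 | $2.20 |
| 259 | 291 | Phi 4 Mini Reasoning | 545 | ±12 | 2.3K | 3.1% | 9.7% | 30 tps | 0.9s | 128K | $0.07 | $0.30 |
| 260 | 281 | MythoMax L2 13B | 466 | ±30 | 520 | 4.6% | 1.2% | 22 tps | 1.1s | 4K | $0.18 | $0.18 |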