Leaderboard | Coding

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
321	75	Gemini 2.5 Pro Low	1170	±4	9.6K	8.1%	<0.1%	89 tps	2.4s	1M	$1.25	$10.00
322	60	GPT-5.1 Instant	1171	±8	8.3K	4.1%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
323	60	GPT-5.1 Codex (Medium)	1171	±14	3K	3.2%	4.6%	71 tps	3.7s	400K	$1.25	$10.00
324	60	Claude Sonnet 3.5 v2	1171	±6	5.5K	3.4%	<0.1%	46 tps	1.4s	200K	$3.00	$15.00
325	60	Qwen3 235B A22B Instruct 2507	1172	±4	12.6K	6.4%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
326	75	Gemini 2.5 Flash Thinking Preview 0925	1173	±7	9.2K	6.8%	<0.1%	111 tps	4.7s	1M	$0.30	$2.50
327	60	Gemini 2.5 Pro	1176	±3	37.9K	4.8%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
328	60	Grok 4 Fast Reasoning	1177	±3	14.5K	5.0%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
329	60	DeepSeek V3.2 Thinking	1178	±9	23.3K	4.0%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
330	60	Grok 4.1 Fast Reasoning	1178	±7	39.5K	4.4%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
331	49	GPT-5.3 Codex (Low)	1178	±28	510	1.0%	1.8%	61 tps	4.3s	400K	$1.75	$14.00
332	49	GLM 4.6	1182	±7	17.2K	4.4%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
333	49	Nova Experimental Chat 12-10	1182	±9	2.9K	3.8%	2.4%	84 tps	12.9s	98K	$0	$0
334	49	MiniMax M2	1183	±5	19.7K	4.2%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
335	49	Grok 4 Fast Non-Reasoning	1185	±5	8.1K	7.1%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
336	49	GPT-5	1185	±4	21.3K	5.3%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
337	49	MiniMax M2.5 FP8	1185	±17	610	3.2%	3.6%	33 tps	1.7s	205K	$0.45	$1.75
338	62	Qwen Plus 0728	1189	±8	2.1K	7.5%	<0.1%	55 tps	0.9s	1M	$0.40	$1.20
339	49	DeepSeek V3.2	1189	±8	5.1K	4.7%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
340	49	MiniMax M2.1	1192	±8	19.4K	3.6%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
341	62	OpenAI o1-mini	1192	±4	15K	4.6%	<0.1%	118 tps	N/A	128K	$1.13	$4.51
342	49	Kimi K2 Thinking Turbo	1192	±6	20.3K	3.4%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
343	49	Qwen3 30B A3B Instruct 2507	1194	±5	12.7K	5.7%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
344	43	MiniMax M2.1 Lightning	1197	±23	875	3.3%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
345	43	GPT-5.1 Codex Max	1200	±12	6.4K	3.9%	3.0%	118 tps	4.1s	400K	$1.25	$10.00
346	58	Claude Sonnet 3.7	1201	±4	12.1K	3.2%	<0.1%	39 tps	1.6s	200K	$3.00	$15.00
347	43	Qwen3 Max Instruct Preview	1203	±6	16.1K	4.6%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
348	43	Gemini 2.5 Pro High	1204	±3	21.1K	5.7%	1.5%	48 tps	2.3s	1M	$1.25	$10.00
349	43	Gemini 3 Flash Preview	1205	±11	7.2K	3.7%	1.3%	138 tps	1.4s	1M	$0.50	$3.00
350	43	Claude Sonnet 4	1205	±3	43.2K	3.7%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
351	53	Mistral Medium 3.1	1206	±5	16.4K	5.1%	<0.1%	77 tps	0.7s	128K	$0.40	$2.00
352	53	Claude Sonnet 3.7 (Thinking)	1210	±3	13.6K	3.1%	<0.1%	41 tps	2.6s	200K	$3.00	$15.00
353	36	Kimi K2.5 Instant	1210	±8	1.8K	3.2%	2.9%	32 tps	3.0s	262K	$0.50	$3.00
354	36	Qwen3.5 27B	1211	±16	910	4.7%	3.7%	55 tps	2.6s	256K	$0.30	$2.40
355	36	GPT-5.2 Codex (Medium)	1211	±12	2.4K	3.0%	5.7%	37 tps	6.3s	400K	$1.75	$14.00
356	36	GPT-5 Codex (Medium)	1214	±6	8.8K	3.9%	4.1%	122 tps	5.2s	400K	$1.25	$10.00
357	36	Qwen3.5 122B A17B	1216	±15	1.9K	3.1%	1.5%	82 tps	1.4s	256K	$0.40	$3.20
358	36	Qwen3 VL 235B A22B Instruct	1220	±7	5.6K	6.7%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
359	36	GPT-5.2 (Extra High)	1221	±9	8K	3.5%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
360	44	Nova Experimental Chat 10-20	1221	±5	4.4K	8.1%	<0.1%	30 tps	0.5s	98K	$0	$0
