Yupp Leaderboard | Coding

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	43	GPT-5.1 Codex Max	1200	±12	6.4K	3.9%	3.0%	118 tps	4.1s	400K	$1.25	$10.00
42	43	MiniMax M2.1 Lightning	1197	±23	875	3.3%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
43	49	Qwen3 30B A3B Instruct 2507	1194	±5	12.7K	5.7%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
44	49	MiniMax M2.1	1192	±8	19.4K	3.6%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
45	49	DeepSeek V3.2	1189	±8	5.1K	4.7%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
46	49	MiniMax M2.5 FP8	1185	±17	610	3.2%	3.6%	33 tps	1.7s	205K	$0.45	$1.75
47	49	GPT-5	1185	±4	21.3K	5.3%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
48	49	Grok 4 Fast Non-Reasoning	1185	±5	8.1K	7.1%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
49	49	MiniMax M2	1183	±5	19.7K	4.2%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
50	49	Nova Experimental Chat 12-10	1182	±9	2.9K	3.8%	2.4%	84 tps	12.9s	98K	$0	$0
51	49	GLM 4.6	1182	±7	17.2K	4.4%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
52	49	GPT-5.3 Codex (Low)	1178	±28	510	1.0%	1.8%	61 tps	4.3s	400K	$1.75	$14.00
53	60	Grok 4.1 Fast Reasoning	1178	±7	39.5K	4.4%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
54	60	Grok 4 Fast Reasoning	1177	±3	14.5K	5.0%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
55	60	Gemini 2.5 Pro	1176	±3	37.9K	4.8%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
56	60	Qwen3 235B A22B Instruct 2507	1172	±4	12.6K	6.4%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
57	60	Claude Sonnet 3.5 v2	1171	±6	5.5K	3.4%	<0.1%	46 tps	1.4s	200K	$3.00	$15.00
58	60	GPT-5.1 Codex (Medium)	1171	±14	3K	3.2%	4.6%	71 tps	3.7s	400K	$1.25	$10.00
59	60	GPT-5.1 Instant	1171	±8	8.3K	4.1%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
60	60	Grok 4.20 Beta Reasoning	1167	±22	1.2K	4.1%	1.1%	77 tps	4.5s	2M	$2.00	$5.50
61	69	Qwen3.5 35B A3B	1164	±25	865	3.9%	2.1%	116 tps	2.1s	256K	$0.63	$1.13
62	69	GPT-5 Codex (Low)	1163	±10	5K	4.1%	2.7%	112 tps	3.5s	400K	$1.25	$10.00
63	69	GLM 4.7	1161	±7	16.8K	3.7%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
64	69	DeepSeek V3.1 Terminus Chat	1158	±5	6.5K	6.9%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
65	74	Qwen Plus (Aug'24)	1146	±5	17.2K	4.7%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
66	74	Qwen3.5 397B A17B	1142	±14	2.5K	2.9%	4.3%	57 tps	1.4s	256K	$0.52	$3.00
67	74	Gemini 2.5 Flash Preview 0925	1140	±6	7.6K	6.0%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
68	77	GPT-5 Mini	1131	±5	8.6K	5.4%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
69	77	DeepSeek V3.1 Turbo	1130	±7	4.8K	5.3%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
70	77	Grok 4.20 Multi Agent Beta	1129	±19	945	3.6%	1.2%	56 tps	8.8s	2M	$2.00	$6.00
71	77	Qwen3 Max Thinking Preview	1127	±10	6.3K	5.7%	3.1%	40 tps	2.1s	256K	$1.20	$6.00
72	77	Grok 4	1125	±3	39.6K	4.4%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
73	77	GPT-4.1	1123	±5	32.8K	5.2%	3.7%	112 tps	1.3s	1M	$2.00	$8.00
74	77	Gemini 2.5 Flash Lite Preview 0925	1122	±7	8.5K	6.6%	1.2%	209 tps	0.7s	1M	$0.25	$0.35
75	85	Gemini 2.5 Flash Thinking	1118	±4	13.7K	3.6%	2.2%	88 tps	6.4s	1M	$0.30	$2.50
76	85	GPT-5 Mini Minimal	1114	±12	3.2K	8.5%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
77	85	GPT-5.2 Codex (Low)	1113	±19	1.2K	3.2%	4.5%	41 tps	5.0s	400K	$1.75	$14.00
78	85	DeepSeek V3.1 Chat	1110	±7	4.9K	6.6%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
79	85	Qwen3 Omni 30B A3B Thinking	1110	±10	2.3K	6.0%	3.7%	67 tps	1.2s	66K	$0.97	$1.79
80	90	Qwen Max	1107	±4	18.3K	4.2%	1.5%	49 tps	1.5s	33K	$1.60	$6.40

Page 2 of 6 (ranks 41 to 80 of 210 models)