Leaderboard | Coding


Last updated about 1 month ago

Rank	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
161	GPT-5.1 Codex Mini (High)	1054	±15	2.2K	3.9%	5.9%	70 tps	4.6s	400K	$0.25	$2.00
162	GPT-5.1 Codex Mini (Medium)	1057	±15	1.9K	4.9%	4.6%	69 tps	4.1s	400K	$0.25	$2.00
163	LongCat Flash Chat	1058	±12	2.7K	5.9%	0.8%	85 tps	0.9s	131K	$0.14	$0.68
164	Seed 2.0 Lite (Medium)	1058	±20	525	3.7%	6.6%	33 tps	1.6s	256K	$0.25	$2.00
165	Gemini 2.5 Flash Lite Thinking	1061	±4	9.8K	6.2%	1.0%	118 tps	4.4s	1M	$0.03	$0.13
166	OpenAI o1-pro	1061	±20	680	7.5%	5.2%	33 tps	72.8s	200K	$150.00	$600.00
167	DeepSeek V3.1 Terminus Thinking	1061	±9	2.9K	9.4%	5.9%	27 tps	1.8s	131K	$0.56	$1.68
168	OpenAI o1	1062	±6	9.9K	3.3%	4.2%	92 tps	5.5s	200K	$15.00	$60.00
169	Grok 4.20 Beta Non-reasoning	1063	±36	500	4.8%	1.1%	151 tps	0.6s	2M	$2.00	$6.00
170	gpt-oss-20b	1066	±6	7.7K	7.1%	0.5%	216 tps	0.5s	131K	$0.06	$0.26
171	Kimi K2 0905 Turbo	1070	±6	7.5K	9.1%	0.7%	373 tps	0.5s	262K	$1.70	$6.50
172	GPT-5 (Low)	1070	±14	690	3.5%	1.8%	75 tps	8.2s	400K	$1.25	$10.00
173	Kimi K2 Fast	1073	±5	35K	6.4%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
174	Kimi K2 0905	1074	±7	8.7K	4.3%	4.0%	30 tps	1.4s	262K	$0.63	$2.39
175	GLM 4.5	1075	±5	6K	7.0%	3.7%	46 tps	1.4s	131K	$0.43	$1.63
176	Qwen3 Max Thinking	1080	±18	1.5K	2.0%	13.5%	32 tps	2.3s	256K	$1.20	$6.00
177	Mistral Medium	1080	±4	9.6K	5.6%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
178	Seed 1.8 251228	1081	±10	3.2K	3.1%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
179	DeepSeek V3 (Turbo)	1082	±20	1.5K	5.1%	1.5%	32 tps	1.5s	64K	$0.40	$1.30
180	Qwen3 Omni 30B A3B Instruct	1085	±13	775	4.3%	3.9%	65 tps	1.2s	66K	$0.35	$0.97
181	GPT-4.1 nano	1085	±5	17K	5.0%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
182	GPT-4.1 mini	1087	±5	19.7K	4.2%	1.1%	67 tps	0.9s	1M	$0.34	$1.60
183	DeepSeek V3.2 Exp Thinking	1089	±7	5.9K	3.5%	7.2%	26 tps	3.0s	131K	$0.28	$0.42
184	DeepSeek V3.1	1089	±12	2.3K	4.7%	0.8%	197 tps	0.4s	164K	$0.55	$1.60
185	OpenAI o3-pro	1090	±8	5.4K	4.3%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
186	Qwen3 235B A22B	1093	±6	4.5K	8.0%	5.3%	71 tps	0.9s	41K	$0.23	$0.63
187	DeepSeek V3 0324 Turbo	1093	±5	15.5K	5.7%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
188	Grok 3	1098	±4	19.1K	5.5%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
189	Gemini 2.5 Flash	1098	±4	35.9K	3.2%	1.3%	2 tps	3.7s	1M	$0.30	$2.50
190	Qwen3 Coder 480B A35B Instruct	1099	±8	3.1K	4.5%	3.3%	61 tps	2.0s	262K	$0.71	$1.34
191	DeepSeek V3 0324	1100	±4	15.1K	4.3%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
192	Step 3.5 Flash	1102	±24	810	3.6%	2.2%	109 tps	0.6s	256K	$0.05	$0.15
193	GPT-4o	1102	±5	8.5K	3.7%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
194	Grok 3 Fast	1102	±14	2.5K	4.7%	1.7%	52 tps	2.4s	131K	$5.00	$25.00
195	Gemini 2.5 Flash Lite	1103	±5	21.3K	6.2%	1.3%	210 tps	0.7s	1M	$0.10	$0.40
196	Qwen Max	1107	±4	18.3K	4.2%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
197	DeepSeek V3.2 Exp Chat	1107	±4	5.5K	6.1%	2.6%	29 tps	1.5s	131K	$0.27	$0.39
198	Qwen3 Omni 30B A3B Thinking	1110	±10	2.3K	6.0%	3.7%	67 tps	1.2s	66K	$0.97	$1.79
199	DeepSeek V3.1 Chat	1110	±7	4.9K	6.6%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
200	GPT-5.2 Codex (Low)	1113	±19	1.2K	3.2%	4.5%	41 tps	5.0s	400K	$1.75	$14.00

Page 5 of 8 (286 models in total)
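Many adjacent rows in the table have scores that differ by less than their confidence intervals, so rank alone can be misleading. The following is a minimal sketch of how one might check whether two models' score ranges overlap. It assumes the "Confidence Interval" column is a symmetric half-width around the VIBE Score, which the page does not state; the helper names `load_rows` and `intervals_overlap` are hypothetical, and interval overlap is only a rough heuristic, not a significance test.

```python
import csv
import io

# Three rows copied from the table above (tab-separated, first four columns only).
SAMPLE = (
    "Rank\tName\tVIBE Score\tConfidence Interval\n"
    "165\tGemini 2.5 Flash Lite Thinking\t1061\t±4\n"
    "169\tGrok 4.20 Beta Non-reasoning\t1063\t±36\n"
    "188\tGrok 3\t1098\t±4\n"
)


def load_rows(text):
    """Parse tab-separated leaderboard rows into score ranges.

    Assumption: the interval is a symmetric half-width, so the range
    is [score - half, score + half].
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for r in reader:
        score = int(r["VIBE Score"])
        half = int(r["Confidence Interval"].lstrip("±"))
        rows.append({"name": r["Name"], "low": score - half, "high": score + half})
    return rows


def intervals_overlap(a, b):
    """Return True when the two closed ranges share at least one point."""
    return a["low"] <= b["high"] and b["low"] <= a["high"]


rows = load_rows(SAMPLE)
by_name = {r["name"]: r for r in rows}

gemini = by_name["Gemini 2.5 Flash Lite Thinking"]
grok_beta = by_name["Grok 4.20 Beta Non-reasoning"]
grok3 = by_name["Grok 3"]

print(intervals_overlap(gemini, grok_beta))  # ranges 1057-1065 and 1027-1099
print(intervals_overlap(gemini, grok3))      # ranges 1057-1065 and 1094-1102
print(intervals_overlap(grok_beta, grok3))   # ranges 1027-1099 and 1094-1102
```

Under this assumption, the wide ±36 interval of Grok 4.20 Beta Non-reasoning (500 votes) overlaps the range of Grok 3, listed 19 ranks away, while the two ±4 models are clearly separated.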