Leaderboard | Coding

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
241	164	Arcee AI Maestro Reasoning	1046	±7	3.8K	4.6%	<0.1%	85 tps	0.3s	131K	$0.90	$3.30
242	128	ERNIE 4.5 300B A47B	1049	±4	13.5K	3.9%	4.7%	23 tps	2.3s	123K	$0.28	$1.10
243	151	GLM 4.5 X	1051	±16	645	5.8%	<0.1%	48 tps	2.8s	131K	$2.20	$8.90
244	119	Qwen3 32B Fast	1052	±6	11.4K	5.2%	11.6%	30 tps	3.1s	41K	$0.10	$0.25
245	119	GPT-5.1 Codex Mini (High)	1054	±15	2.2K	3.9%	5.9%	70 tps	4.6s	400K	$0.25	$2.00
246	119	GPT-5.1 Codex Mini (Medium)	1057	±15	1.9K	4.9%	4.6%	69 tps	4.1s	400K	$0.25	$2.00
247	151	OpenAI Codex Mini	1057	±5	9.8K	3.3%	<0.1%	46 tps	2.1s	200K	$1.50	$6.00
248	119	LongCat Flash Chat	1058	±12	2.7K	5.9%	0.8%	85 tps	0.9s	131K	$0.14	$0.68
249	119	Seed 2.0 Lite (Medium)	1058	±20	525	3.7%	6.6%	33 tps	1.6s	256K	$0.25	$2.00
250	151	Llama 3 8B Turbo	1059	±24	600	1.6%	<0.1%	97 tps	0.1s	8K	$0.12	$0.13
251	151	GLM 4.5 FP8	1060	±18	610	8.3%	<0.1%	59 tps	1.2s	131K	$0.41	$1.65
252	119	Gemini 2.5 Flash Lite Thinking	1061	±4	9.8K	6.2%	1.0%	118 tps	4.4s	1M	$0.03	$0.13
253	119	OpenAI o1-pro	1061	±20	680	7.5%	5.2%	33 tps	72.8s	200K	$150.00	$600.00
254	119	DeepSeek V3.1 Terminus Thinking	1061	±9	2.9K	9.4%	5.9%	27 tps	1.8s	131K	$0.56	$1.68
255	119	OpenAI o1	1062	±6	9.9K	3.3%	4.2%	92 tps	5.5s	200K	$15.00	$60.00
256	112	Grok 4.20 Beta Non-reasoning	1063	±36	500	4.8%	1.1%	151 tps	0.6s	2M	$2.00	$6.00
257	144	Qwen Turbo	1064	±5	10K	6.0%	<0.1%	53 tps	1.1s	1M	$0.05	$0.20
258	112	gpt-oss-20b	1066	±6	7.7K	7.1%	0.5%	216 tps	0.5s	131K	$0.06	$0.26
259	112	Kimi K2 0905 Turbo	1070	±6	7.5K	9.1%	0.7%	373 tps	0.5s	262K	$1.70	$6.50
260	112	GPT-5 (Low)	1070	±14	690	3.5%	1.8%	75 tps	8.2s	400K	$1.25	$10.00
261	112	Kimi K2 Fast	1073	±5	35K	6.4%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
262	112	Kimi K2 0905	1074	±7	8.7K	4.3%	4.0%	30 tps	1.4s	262K	$0.63	$2.39
263	112	GLM 4.5	1075	±5	6K	7.0%	3.7%	46 tps	1.4s	131K	$0.43	$1.63
264	105	Qwen3 Max Thinking	1080	±18	1.5K	2.0%	13.5%	32 tps	2.3s	256K	$1.20	$6.00
265	105	Mistral Medium	1080	±4	9.6K	5.6%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
266	105	Seed 1.8 251228	1081	±10	3.2K	3.1%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
267	132	Solar Pro 2 250710	1081	±5	10.6K	6.9%	<0.1%	9 tps	N/A	66K	$0.50	$0.50
268	105	DeepSeek V3 (Turbo)	1082	±20	1.5K	5.1%	1.5%	32 tps	1.5s	64K	$0.40	$1.30
269	105	Qwen3 Omni 30B A3B Instruct	1085	±13	775	4.3%	3.9%	65 tps	1.2s	66K	$0.35	$0.97
270	105	GPT-4.1 nano	1085	±5	17K	5.0%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
271	105	GPT-4.1 mini	1087	±5	19.7K	4.2%	1.1%	67 tps	0.9s	1M	$0.34	$1.60
272	132	Qwen Plus 0728 (Thinking)	1087	±9	1.2K	8.9%	<0.1%	56 tps	1.1s	1M	$0.40	$4.00
273	132	Claude Sonnet 3.5	1088	±10	2.9K	4.9%	1.0%	40 tps	2.7s	200K	$3.00	$15.00
274	98	DeepSeek V3.2 Exp Thinking	1089	±7	5.9K	3.5%	7.2%	26 tps	3.0s	131K	$0.28	$0.42
275	98	DeepSeek V3.1	1089	±12	2.3K	4.7%	0.8%	197 tps	0.4s	164K	$0.55	$1.60
276	98	OpenAI o3-pro	1090	±8	5.4K	4.3%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
277	123	Sherlock Dash Alpha	1090	±19	835	6.7%	<0.1%	68 tps	0.7s	2M	$0	$0
278	123	Nova Experimental Chat 10-09	1091	±7	3.2K	10.7%	<0.1%	59 tps	6.1s	98K	$0	$0
279	98	Qwen3 235B A22B	1093	±6	4.5K	8.0%	5.3%	71 tps	0.9s	41K	$0.23	$0.63
280	98	DeepSeek V3 0324 Turbo	1093	±5	15.5K	5.7%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
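
When two neighbouring rows have close VIBE Scores, the Confidence Interval column indicates whether the gap is meaningful. The sketch below parses three rows copied from the table and checks whether their score ranges overlap. It assumes the "±" value is a symmetric half-width around the score, which the page does not state. The names `Row`, `parse_row`, and `intervals_overlap` are hypothetical helpers, not part of any Yupp API, and the overlap check is a rough heuristic rather than a significance test.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    score: int  # VIBE Score column
    ci: int     # Confidence Interval column, assumed to be a symmetric half-width


def parse_row(line: str) -> Row:
    # Column order follows the table header:
    # Rank, Overall, Name, VIBE Score, Confidence Interval, ...
    cols = line.split("\t")
    return Row(name=cols[2], score=int(cols[3]), ci=int(cols[4].lstrip("±")))


def intervals_overlap(a: Row, b: Row) -> bool:
    # [score - ci, score + ci] ranges overlap when the score gap
    # is no larger than the sum of the two half-widths.
    return abs(a.score - b.score) <= a.ci + b.ci


# Three rows copied from the table above (tab-separated).
SAMPLE = [
    "241\t164\tArcee AI Maestro Reasoning\t1046\t±7\t3.8K\t4.6%\t<0.1%\t85 tps\t0.3s\t131K\t$0.90\t$3.30",
    "242\t128\tERNIE 4.5 300B A47B\t1049\t±4\t13.5K\t3.9%\t4.7%\t23 tps\t2.3s\t123K\t$0.28\t$1.10",
    "280\t98\tDeepSeek V3 0324 Turbo\t1093\t±5\t15.5K\t5.7%\t6.3%\t12 tps\t2.4s\t164K\t$0.73\t$1.79",
]

rows = [parse_row(line) for line in SAMPLE]

print(intervals_overlap(rows[0], rows[1]))  # 1046 ±7 vs 1049 ±4
print(intervals_overlap(rows[0], rows[2]))  # 1046 ±7 vs 1093 ±5
```

Under that assumption, the first pair overlaps (a 3-point gap against a combined half-width of 11), so their ordering is not well separated, while the second pair does not (a 47-point gap against 12).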

View All (404 models)