Leaderboard | Coding

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
1	286	Phi 4 Mini Reasoning	447	±15	3.4K	12.0%	9.7%	30 tps	0.9s	128K	$0.07	$0.30
2	284	CodeLlama 7B Instruct Solidity	463	±54	485	8.5%	3.6%	33 tps	0.7s	16K	$0.80	$1.20
3	284	Qwen 2.5 VL 3B Instruct	523	±25	4.1K	6.1%	3.0%	44 tps	2.5s	128K	$0.21	$0.63
4	279	Phi 4 Reasoning	573	±17	2.1K	5.5%	21.0%	29 tps	1.0s	33K	$0.06	$0.25
5	279	Hunyuan A13B Instruct	588	±22	1.6K	9.2%	2.3%	67 tps	2.0s	33K	$0.01	$0.01
6	279	Phi 4 Mini Instruct	599	±21	1K	7.1%	7.4%	40 tps	1.1s	128K	$0.07	$0.30
7	279	MythoMax L2 13B	600	±21	2.3K	5.8%	1.2%	22 tps	1.1s	4K	$0.18	$0.18
8	279	UI-TARS 1.5 7B	610	±40	530	11.7%	4.0%	75 tps	0.9s	128K	$0.10	$0.20
9	276	MiniMax M1	686	±13	3.8K	5.3%	<0.1%	31 tps	2.8s	1M	$0.55	$2.20
10	276	DeepSeek-R1 Distill Qwen 32B	696	±20	2K	5.5%	6.2%	22 tps	1.8s	131K	$0.37	$0.39
11	276	Hermes 3 405B Instruct	702	±20	1.4K	4.1%	2.3%	20 tps	1.1s	131K	$0.80	$0.80
12	269	DeepHermes 3 Mistral 24B Preview	706	±30	715	5.9%	2.5%	50 tps	1.0s	33K	$0.06	$0.25
13	269	Inflection 3 Pi	719	±18	1.5K	4.1%	1.1%	33 tps	3.4s	8K	$2.50	$10.00
14	269	Pixtral 12B	722	±21	3K	6.3%	2.2%	101 tps	1.2s	131K	$0.08	$0.08
15	269	Inflection 3 Productivity	737	±24	1.5K	5.0%	0.6%	50 tps	3.2s	8K	$2.50	$10.00
16	269	Command R+	738	±15	1.6K	5.6%	2.8%	36 tps	0.7s	128K	$2.08	$9.45
17	269	Mixtral 8x22B Instruct	738	±17	1.4K	5.6%	1.8%	142 tps	0.7s	66K	$0.45	$0.45
18	269	Gemma 3 4B	742	±10	3.3K	4.7%	1.3%	138 tps	0.7s	131K	$0.02	$0.04
19	262	Qwen 2.5 VL 72B Instruct	746	±20	2.1K	6.0%	5.3%	25 tps	3.7s	128K	$1.01	$2.79
20	262	Goliath 120B	754	±24	745	5.7%	2.7%	21 tps	2.2s	6K	$6.56	$9.38
21	262	Hermes 4 405B Reasoning FP8	759	±11	2.7K	12.8%	3.6%	32 tps	0.8s	131K	$1.00	$3.00
22	262	Open Mistral 7B	762	±18	1.3K	4.7%	0.7%	176 tps	0.4s	33K	$0.25	$0.25
23	262	Mistral Small	770	±12	1.2K	4.5%	1.7%	142 tps	0.6s	32K	$0.43	$1.30
24	262	Baichuan-M2-32B	770	±30	740	10.8%	<0.1%	32 tps	3.3s	131K	$0.07	$0.07
25	262	Command R	778	±18	2.2K	4.9%	5.8%	54 tps	0.6s	128K	$0.30	$0.99
26	252	Hermes 4 70B	781	±29	460	8.9%	1.1%	67 tps	0.6s	131K	$0.12	$0.39
27	252	Mistral Large	785	±16	1.1K	5.8%	1.5%	54 tps	0.7s	33K	$2.00	$6.00
28	252	GPT-3.5 Turbo Instruct	787	±9	2K	2.7%	<0.1%	46 tps	1.2s	4K	$1.50	$2.00
29	252	Mercury Coder	793	±27	510	3.8%	<0.1%	247 tps	2.2s	32K	$0.25	$1.00
30	252	Hermes 4 405B FP8	797	±21	815	8.4%	3.5%	31 tps	0.9s	131K	$0.52	$1.73
31	252	Phi 4	798	±16	1.7K	3.4%	5.1%	28 tps	1.3s	128K	$0.10	$0.32
32	252	WizardLM-2 8x22B	801	±12	1.9K	3.1%	11.6%	11 tps	2.5s	66K	$0.77	$0.77
33	252	Gemma 3 1B	802	±11	2K	6.1%	0.6%	176 tps	1.0s	33K	$0.06	$0.10
34	252	Magistral Small 2509	802	±18	1.8K	7.5%	2.7%	116 tps	0.6s	131K	$0.50	$1.50
35	252	Ministral 3B	806	±16	2.3K	5.1%	0.8%	248 tps	0.4s	131K	$0.08	$0.08
36	240	Gemma 2 27B	815	±17	1.5K	4.1%	1.4%	44 tps	1.4s	8K	$0.80	$0.80
37	240	LFM2 8B A1B	818	±18	825	11.3%	<0.1%	142 tps	0.3s	33K	$0.01	$0.02
38	240	Moonshot V1 32k	820	±17	950	3.1%	1.4%	53 tps	1.4s	33K	$1.00	$3.00
39	240	C4AI Aya Expanse 32B	821	±7	3.8K	4.0%	1.5%	43 tps	0.5s	128K	$0.50	$1.50
40	240	Ministral 8B	825	±17	2.2K	5.5%	1.4%	177 tps	0.4s	128K	$0.14	$0.14

Page 1 of 8 (286 models in total)