Leaderboard | Coding


Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
81	262	Hermes 4 405B Reasoning FP8	759	±11	2.7K	12.8%	3.6%	32 tps	0.8s	131K	$1.00	$3.00
82	262	Goliath 120B	754	±24	745	5.7%	2.7%	21 tps	2.2s	6K	$6.56	$9.38
83	269	Gemma 3 4B	742	±10	3.3K	4.7%	1.3%	138 tps	0.7s	131K	$0.02	$0.04
84	269	Mixtral 8x22B Instruct	738	±17	1.4K	5.6%	1.8%	142 tps	0.7s	66K	$0.45	$0.45
85	269	Command R+	738	±15	1.6K	5.6%	2.8%	36 tps	0.7s	128K	$2.08	$9.45
86	361	Command	734	±18	765	4.4%	<0.1%	25 tps	N/A	4K	$0.83	$1.33
87	269	Pixtral 12B	722	±21	3K	6.3%	2.2%	101 tps	1.2s	131K	$0.08	$0.08
88	374	Mythalion 13B	709	±10	1.1K	1.3%	<0.1%	63 tps	0.5s	4K	$0.56	$1.13
89	276	Hermes 3 405B Instruct	702	±20	1.4K	4.1%	2.3%	20 tps	1.1s	131K	$0.80	$0.80
90	374	Phi 4 Multimodal Instruct	697	±16	2.1K	6.8%	<0.1%	17 tps	1.4s	128K	$0.03	$0.05
91	386	DeepSeek-R1 Distill Qwen 7B	633	±19	565	5.0%	<0.1%	0 tps	N/A	131K	$0.05	$0.10
92	390	Llema 7B	601	±21	850	4.5%	<0.1%	1 tps	15.0s	4K	$0.80	$1.20
93	279	MythoMax L2 13B	600	±21	2.3K	5.8%	1.2%	22 tps	1.1s	4K	$0.18	$0.18
94	279	Phi 4 Mini Instruct	599	±21	1K	7.1%	7.4%	40 tps	1.1s	128K	$0.07	$0.30
95	279	Phi 4 Reasoning	573	±17	2.1K	5.5%	21.0%	29 tps	1.0s	33K	$0.06	$0.25
96	284	Qwen 2.5 VL 3B Instruct	523	±25	4.1K	6.1%	3.0%	44 tps	2.5s	128K	$0.21	$0.63
97	399	DeepSeek-R1 Distill Qwen 1.5B	481	±19	730	5.2%	<0.1%	20 tps	0.0s	131K	$0.18	$0.18
98	284	CodeLlama 7B Instruct Solidity	463	±54	485	8.5%	3.6%	33 tps	0.7s	16K	$0.80	$1.20
99	286	Phi 4 Mini Reasoning	447	±15	3.4K	12.0%	9.7%	30 tps	0.9s	128K	$0.07	$0.30
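Adjacent rows in the table often differ by less than their confidence intervals, so rank order alone can overstate the gap between two models. The following is a minimal sketch of how one might check this, assuming the ± value is a symmetric half-width around the VIBE Score (the page does not state how the interval is defined). The names `Row`, `parse_row`, and `intervals_overlap` are hypothetical helpers, and the sample lines are truncated copies of rows from the table above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    score: int
    half_width: int  # assumed: symmetric half-width of the confidence interval


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row (Rank, Overall, Name, VIBE Score, CI, ...)."""
    cells = line.split("\t")
    return Row(
        name=cells[2],
        score=int(cells[3]),
        half_width=int(cells[4].lstrip("±")),
    )


def intervals_overlap(a: Row, b: Row) -> bool:
    """True when [score - hw, score + hw] of the two rows share at least one point."""
    return abs(a.score - b.score) <= a.half_width + b.half_width


# First five columns of three rows copied from the table.
SAMPLE = [
    "81\t262\tHermes 4 405B Reasoning FP8\t759\t±11",
    "82\t262\tGoliath 120B\t754\t±24",
    "87\t269\tPixtral 12B\t722\t±21",
]

hermes, goliath, pixtral = (parse_row(line) for line in SAMPLE)

print(intervals_overlap(hermes, goliath))  # 759±11 vs 754±24
print(intervals_overlap(hermes, pixtral))  # 759±11 vs 722±21
```

Under that assumption, the first pair overlaps (a 5-point gap against a combined half-width of 35) while the second does not (a 37-point gap against 32). Overlap is only a rough screen, not a significance test.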

Page 3 of 3 (99 models)