Leaderboard | Text



Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
401	274	DeepSeek-R1 Distill Qwen 32B	807	±6	4K	3.3%	6.2%	22 tps	1.8s	131K	$0.37	$0.39
402	406	DeepSeek-R1 Distill Qwen 14B	802	±5	3.6K	3.7%	<0.1%	44 tps	1.7s	64K	$0.63	$0.63
403	412	Shisa V2 Llama 3.3 70B	798	±9	1.4K	6.5%	<0.1%	8 tps	2.0s	33K	$0.03	$0.09
404	274	C4AI Aya Expanse 8B	798	±12	1.2K	6.5%	0.9%	61 tps	0.4s	8K	$0.50	$1.50
405	281	Goliath 120B	794	±5	3.1K	2.5%	2.7%	21 tps	2.2s	6K	$6.56	$9.38
406	274	MiniMax M2-her	791	±11	1.1K	2.2%	<0.1%	108 tps	0.7s	205K	$0.30	$1.20
407	392	Phi 4 Reasoning Plus	784	±13	650	6.5%	<0.1%	32 tps	1.2s	33K	$0.04	$0.17
408	399	Gemini 1.5 Flash 8B	783	±7	1.4K	4.5%	<0.1%	11 tps	0.0s	1M	$0.02	$0.10
409	399	Magistral Medium (Thinking)	780	±6	3.7K	3.7%	<0.1%	67 tps	0.8s	41K	$2.00	$5.00
410	274	Moonshot V1 128k Vision	779	±12	955	5.0%	3.1%	44 tps	3.8s	131K	$2.00	$5.00
411	285	Phi 4 Mini Instruct	778	±5	4.1K	3.2%	7.4%	40 tps	1.1s	128K	$0.07	$0.30
412	421	Llema 7B	766	±3	4.7K	1.4%	<0.1%	1 tps	15.0s	4K	$0.80	$1.20
413	412	ArliAI QwQ 32B Arliai RpR V1	764	±11	1.1K	6.6%	<0.1%	34 tps	1.8s	33K	$0.02	$0.07
414	274	Pixtral 12B	755	±12	4.6K	5.7%	2.2%	101 tps	1.2s	131K	$0.08	$0.08
415	424	ERNIE 4.5 0.3B	741	±13	1.5K	8.5%	<0.1%	85 tps	2.2s	120K	$0	$0
416	285	Hunyuan A13B Instruct	739	±5	4.6K	5.0%	2.3%	67 tps	2.0s	33K	$0.01	$0.01
417	419	Kimi Dev 72B	735	±10	1.1K	3.9%	<0.1%	17 tps	13.5s	131K	$0.12	$0.47
418	424	DeepSeek-R1 Distill Qwen 7B	721	±9	1.1K	3.4%	<0.1%	0 tps	N/A	131K	$0.05	$0.10
419	287	Phi 4 Reasoning	697	±8	4K	3.5%	21.0%	29 tps	1.0s	33K	$0.06	$0.25
420	284	MiniMax M1	688	±4	7.8K	4.0%	<0.1%	31 tps	2.8s	1M	$0.55	$2.20
421	428	DeepSeek-R1 Distill Llama 8B	678	±9	2.1K	3.4%	<0.1%	17 tps	N/A	32K	$0.04	$0.04
422	289	UI-TARS 1.5 7B	667	±18	1.4K	8.7%	4.0%	75 tps	0.9s	128K	$0.10	$0.20
423	430	Phi 3.5 Mini 128k Instruct	646	±13	835	2.9%	<0.1%	14 tps	0.7s	128K	$0.10	$0.10
424	430	OpenHands LM 32B V0.1	639	±10	1.9K	1.0%	<0.1%	11 tps	N/A	16K	$2.60	$3.40
425	291	Phi 4 Mini Reasoning	635	±4	7.9K	7.9%	9.7%	30 tps	0.9s	128K	$0.07	$0.30
426	291	LFM2.5 1.2B Thinking	630	±22	705	4.7%	2.6%	258 tps	0.4s	33K	$0	$0
427	288	Qwen 2.5 VL 3B Instruct	629	±7	5.1K	6.8%	3.0%	44 tps	2.5s	128K	$0.21	$0.63
428	430	DeepSeek-R1 Distill Qwen 1.5B	625	±11	1.5K	3.9%	<0.1%	20 tps	0.0s	131K	$0.18	$0.18
429	434	QwQ 32B RpR v1	612	±10	2.4K	7.0%	<0.1%	34 tps	3.3s	33K	$0.02	$0.07
430	438	ArliAI: QwQ 32B RpR v1	472	±20	485	7.6%	<0.1%	20 tps	2.5s	33K	$0	$0
431	439	Mistral Nemo 12B Inferor v0.0	413	±9	2.5K	1.2%	<0.1%	83 tps	0.8s	16K	$0.80	$1.20
432	439	MiniMax M1 (Extended)	389	±23	495	1.0%	<0.1%	3 tps	N/A	128K	$0	$0

