Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	33	Kimi K2.5	1147	±3	16.1K	1.2%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
42	7	Claude Opus 4.5 (Thinking)	1147	±4	21.9K	1.6%	1.8%	49 tps	1.4s	200K	$5.00	$25.00
43	40	Qwen3 235B A22B Instruct 2507	1147	±2	24.5K	1.4%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
44	42	GPT-5.2 (Extra High)	1147	±2	15.6K	1.4%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
45	44	Kimi K2 Thinking Turbo	1145	±2	10.9K	1.6%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
46	56	DeepSeek V3.2 Thinking	1144	±4	16.9K	1.3%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
47	29	MiniMax M2.7	1142	±13	700	1.4%	3.0%	34 tps	2.5s	205K	$0.30	$1.20
48	48	Grok 4 Fast Reasoning	1142	±3	14.5K	2.0%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
49	17	GPT-5.4 mini	1141	±14	545	1.8%	0.8%	148 tps	0.5s	400K	$0.75	$4.50
50	56	DeepSeek V3.1 Turbo	1134	±3	9.5K	1.2%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
51	65	Mistral Large 3	1133	±4	10.8K	2.6%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
52	56	MiniMax M2.1 Lightning	1129	±5	3.6K	1.4%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
53	86	Qwen3 235B A22B	1129	±3	7.8K	2.1%	5.3%	71 tps	0.9s	41K	$0.23	$0.63
54	71	DeepSeek V3.1	1125	±4	4.4K	1.1%	0.8%	197 tps	0.4s	164K	$0.55	$1.60
55	65	DeepSeek V3.2 Exp Chat	1125	±3	11.5K	1.9%	2.6%	29 tps	1.5s	131K	$0.27	$0.39
56	60	MiniMax M2.1	1124	±3	24.4K	1.0%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
57	86	Nemotron 3 Nano (Thinking)	1123	±3	5.9K	1.5%	2.0%	200 tps	0.5s	256K	$0	$0
58	52	Qwen3.5 122B A17B	1123	±5	2.6K	1.3%	1.5%	82 tps	1.4s	256K	$0.40	$3.20
59	26	Claude Haiku 4.5 (Extended Thinking)	1121	±3	14.1K	1.8%	1.4%	115 tps	0.7s	200K	$1.00	$5.00
60	60	Gemini 2.5 Flash Preview 0925	1118	±3	14.4K	2.2%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
61	52	GPT-5	1117	±2	31.1K	1.7%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
62	81	OpenAI o3-pro	1116	±5	3.2K	2.8%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
63	68	Grok 4	1110	±1	98.8K	0.9%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
64	17	Claude Opus 4.5	1110	±4	12.9K	2.2%	1.5%	45 tps	1.5s	200K	$5.00	$25.00
65	62	MiniMax M2	1110	±3	17.2K	2.5%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
66	71	Qwen3.5 397B A17B	1107	±6	5.1K	1.6%	4.3%	57 tps	1.4s	256K	$0.52	$3.00
67	86	DeepSeek V3.1 Nex N1	1107	±8	1.5K	1.3%	3.4%	24 tps	7.2s	131K	$0.14	$0.50
68	79	Qwen3 Max Thinking Preview	1106	±4	13.3K	2.0%	3.1%	40 tps	2.1s	256K	$1.20	$6.00
69	101	DeepSeek V3 (Turbo)	1105	±5	3.7K	1.5%	1.5%	32 tps	1.5s	64K	$0.40	$1.30
70	56	Gemini 3.1 Flash Lite Preview Thinking	1105	±8	2K	1.7%	1.7%	75 tps	4.7s	1M	$0.25	$1.50
71	68	GLM 4.7	1105	±3	21K	1.0%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
72	95	DeepSeek-R1 Turbo	1104	±5	4.8K	1.5%	2.6%	29 tps	1.8s	64K	$2.85	$4.75
73	86	Amazon Nova 2 Lite	1099	±4	10.5K	2.7%	1.0%	137 tps	0.6s	300K	$0.35	$2.95
74	68	Qwen Plus (Aug'24)	1098	±2	50.5K	1.1%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
75	101	GPT-5 (Low)	1097	±7	1.5K	1.0%	1.8%	75 tps	8.2s	400K	$1.25	$10.00
76	62	GPT-5.1 Instant	1096	±3	14.9K	1.5%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
77	84	GPT-5 Mini Minimal	1094	±3	4.9K	3.0%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
78	95	Kimi K2 Thinking	1092	±4	5.4K	2.0%	4.2%	61 tps	5.9s	262K	$0.24	$1.03
79	37	Claude Sonnet 4.5	1092	±2	25.2K	2.2%	1.4%	41 tps	1.3s	200K	$1.80	$9.00
80	71	MiniMax M2.5 FP8	1092	±10	2.1K	1.6%	3.6%	33 tps	1.7s	205K	$0.45	$1.75

Page 2 of 8 (283 models in total)
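
Many neighbouring rows in the table have close scores, so the Confidence Interval column matters when comparing two models. The sketch below is one way to read the tab-separated rows and check whether two models' score ranges overlap. It assumes the interval is a symmetric margin around the VIBE Score; the page does not say how the interval is computed, and the helper names `load_rows` and `intervals_overlap` are hypothetical. The sample rows are copied from the table, trimmed to the first five columns.

```python
import csv
import io

# Three rows copied from the table above, trimmed to the first five columns.
SAMPLE = (
    "Rank\tOverall\tName\tVIBE Score\tConfidence Interval\n"
    "41\t33\tKimi K2.5\t1147\t±3\n"
    "47\t29\tMiniMax M2.7\t1142\t±13\n"
    "50\t56\tDeepSeek V3.1 Turbo\t1134\t±3\n"
)


def load_rows(text):
    """Parse tab-separated leaderboard rows into dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "name": rec["Name"],
            "score": int(rec["VIBE Score"]),
            # Assumption: "±3" means a symmetric margin of 3 points.
            "margin": int(rec["Confidence Interval"].lstrip("±")),
        })
    return rows


def intervals_overlap(a, b):
    """Return True if [score - margin, score + margin] ranges intersect."""
    return abs(a["score"] - b["score"]) <= a["margin"] + b["margin"]


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            verdict = "overlap" if intervals_overlap(a, b) else "separate"
            print(f"{a['name']} vs {b['name']}: {verdict}")
```

Under this reading, a model with few votes and a wide margin (MiniMax M2.7, ±13) overlaps with rows both above and below it, while two models with narrow margins can be separated by a smaller score gap.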