Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	33	Qwen3 30B A3B Instruct 2507	1158	±2	31.6K	4.1%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
42	52	Qwen3.5 122B A17B	1151	±4	4.7K	1.6%	1.5%	82 tps	1.4s	256K	$0.40	$3.20
43	33	Kimi K2.5	1151	±3	32.5K	1.8%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
44	44	DeepSeek V3.1 Terminus Chat	1150	±3	17.8K	4.2%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
45	42	Qwen3 Max Instruct Preview	1148	±2	36.6K	3.5%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
46	40	Qwen3 235B A22B Instruct 2507	1147	±2	32.2K	4.7%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
47	40	DeepSeek V3.2	1144	±3	20.7K	1.9%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
48	48	gpt-oss-120b	1144	±2	40.7K	3.7%	0.7%	213 tps	0.5s	131K	$0.11	$0.50
49	71	MiniMax M2.5 FP8	1141	±4	2.9K	1.7%	3.6%	33 tps	1.7s	205K	$0.45	$1.75
50	56	DeepSeek V3.1 Turbo	1140	±2	14.5K	2.3%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
51	37	Claude Sonnet 4.5	1139	±2	37.7K	4.3%	1.4%	41 tps	1.3s	200K	$1.80	$9.00
52	48	Claude Sonnet 4 (Thinking)	1138	±2	30.7K	2.6%	1.5%	52 tps	1.5s	200K	$3.00	$13.67
53	44	Kimi K2 Thinking Turbo	1137	±3	29.8K	2.5%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
54	44	Gemini 2.5 Pro	1136	±1	68.8K	3.9%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
55	42	GPT-5.2 (Extra High)	1133	±3	20.9K	1.9%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
56	60	MiniMax M2.1	1129	±2	41.8K	2.0%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
57	48	Grok 4 Fast Reasoning	1128	±2	25.9K	3.9%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
58	52	Claude Haiku 4.5	1128	±2	31.4K	3.7%	1.1%	100 tps	0.9s	200K	$1.00	$5.00
59	44	Grok 4.1 Fast Reasoning	1128	±2	57K	3.1%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
60	52	Grok 4 Fast Non-Reasoning	1128	±3	21.3K	4.7%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
61	56	DeepSeek V3.2 Thinking	1127	±3	37.6K	2.6%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
62	86	Nemotron 3 Nano (Thinking)	1127	±3	7.5K	2.4%	2.0%	200 tps	0.5s	256K	$0	$0
63	62	MiniMax M2	1125	±2	33.6K	3.5%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
64	65	DeepSeek V3.2 Exp Chat	1124	±3	14.3K	4.0%	2.6%	29 tps	1.5s	131K	$0.27	$0.39
65	71	DeepSeek V3.1	1124	±3	6.8K	2.0%	0.8%	197 tps	0.4s	164K	$0.55	$1.60
66	65	Mistral Large 3	1122	±3	14.3K	3.3%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
67	79	MiniMax M2.5 Lightning	1121	±4	5.6K	1.3%	1.5%	51 tps	2.0s	205K	$0.60	$2.40
68	52	GPT-5	1119	±2	44.3K	3.9%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
69	95	Qwen3 32B	1117	±6	3.3K	2.8%	3.9%	30 tps	3.1s	41K	$0.12	$0.42
70	86	DeepSeek V3.1 Nex N1	1112	±6	2.1K	1.7%	3.4%	24 tps	7.2s	131K	$0.14	$0.50
71	65	GLM 4.6	1108	±3	25.8K	4.3%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
72	84	MiniMax M2.5	1105	±8	2.1K	1.6%	1.4%	70 tps	1.9s	205K	$0.28	$1.20
73	71	Seed 1.8 251228	1104	±3	19K	1.5%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
74	86	DeepSeek V3.1 Chat	1102	±3	13.4K	4.1%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
75	60	Gemini 2.5 Flash Preview 0925	1102	±2	19.5K	4.3%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
76	68	GLM 4.7	1101	±3	35.7K	2.1%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
77	68	Grok 4	1100	±1	120.3K	2.1%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
78	101	DeepSeek V3 (Turbo)	1100	±3	4.8K	2.5%	1.5%	32 tps	1.5s	64K	$0.40	$1.30
79	86	Amazon Nova 2 Lite	1099	±3	12.6K	3.1%	1.0%	137 tps	0.6s	300K	$0.35	$2.95
80	68	Qwen Plus (Aug'24)	1098	±2	60.9K	2.4%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
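Adjacent rows often differ by only a few VIBE points, so the Confidence Interval column matters when comparing them. The following is a minimal sketch, not part of the leaderboard itself, that parses a few rows copied from the table and checks whether the intervals of neighbouring models overlap. The page does not state how the interval is computed or at what confidence level, so treating overlap as "not clearly separated" is an assumption and only a rough heuristic; the helper names (`parse_votes`, `load_rows`, `intervals_overlap`) are hypothetical.

```python
import csv
import io

# Header (first six columns) and three rows copied from the table above.
TABLE = (
    "Rank\tOverall\tName\tVIBE Score\tConfidence Interval\tVotes\n"
    "41\t33\tQwen3 30B A3B Instruct 2507\t1158\t±2\t31.6K\n"
    "42\t52\tQwen3.5 122B A17B\t1151\t±4\t4.7K\n"
    "43\t33\tKimi K2.5\t1151\t±3\t32.5K\n"
)


def parse_votes(text):
    """Turn a count such as '31.6K' into 31600; plain integers pass through."""
    if text.endswith("K"):
        return round(float(text[:-1]) * 1000)
    return int(text)


def load_rows(tsv_text):
    """Parse tab-separated leaderboard text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "name": rec["Name"],
            "score": int(rec["VIBE Score"]),
            # Assumption: the '±n' value is a symmetric half-width in score points.
            "ci": int(rec["Confidence Interval"].lstrip("±")),
            "votes": parse_votes(rec["Votes"]),
        })
    return rows


def intervals_overlap(a, b):
    """True when [score - ci, score + ci] of the two rows intersect."""
    return abs(a["score"] - b["score"]) <= a["ci"] + b["ci"]


rows = load_rows(TABLE)
for upper, lower in zip(rows, rows[1:]):
    verdict = "overlap" if intervals_overlap(upper, lower) else "separated"
    print(f"{upper['name']} vs {lower['name']}: {verdict}")
```

Under this heuristic, two models with equal scores (such as the rows ranked 42 and 43) always overlap, while a 7-point gap with intervals of ±2 and ±4 does not.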

Page 2 of 8 (288 models in total)