Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	36	Qwen3.5 27B	1211	±16	910	4.7%	3.7%	55 tps	2.6s	256K	$0.30	$2.40
42	36	Kimi K2.5 Instant	1210	±8	1.8K	3.2%	2.9%	32 tps	3.0s	262K	$0.50	$3.00
43	43	Claude Sonnet 4	1205	±3	43.2K	3.7%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
44	43	Gemini 3 Flash Preview	1205	±11	7.2K	3.7%	1.3%	138 tps	1.4s	1M	$0.50	$3.00
45	43	Gemini 2.5 Pro High	1204	±3	21.1K	5.7%	1.5%	48 tps	2.3s	1M	$1.25	$10.00
46	43	Qwen3 Max Instruct Preview	1203	±6	16.1K	4.6%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
47	43	GPT-5.1 Codex Max	1200	±12	6.4K	3.9%	3.0%	118 tps	4.1s	400K	$1.25	$10.00
48	43	MiniMax M2.1 Lightning	1197	±23	875	3.3%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
49	49	Qwen3 30B A3B Instruct 2507	1194	±5	12.7K	5.7%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
50	49	Kimi K2 Thinking Turbo	1192	±6	20.3K	3.4%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
51	49	MiniMax M2.1	1192	±8	19.4K	3.6%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
52	49	DeepSeek V3.2	1189	±8	5.1K	4.7%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
53	49	MiniMax M2.5 FP8	1185	±17	610	3.2%	3.6%	33 tps	1.7s	205K	$0.45	$1.75
54	49	GPT-5	1185	±4	21.3K	5.3%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
55	49	Grok 4 Fast Non-Reasoning	1185	±5	8.1K	7.1%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
56	49	MiniMax M2	1183	±5	19.7K	4.2%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
57	49	Nova Experimental Chat 12-10	1182	±9	2.9K	3.8%	2.4%	84 tps	12.9s	98K	$0	$0
58	49	GLM 4.6	1182	±7	17.2K	4.4%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
59	49	GPT-5.3 Codex (Low)	1178	±28	510	1.0%	1.8%	61 tps	4.3s	400K	$1.75	$14.00
60	60	Grok 4.1 Fast Reasoning	1178	±7	39.5K	4.4%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
61	60	DeepSeek V3.2 Thinking	1178	±9	23.3K	4.0%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
62	60	Grok 4 Fast Reasoning	1177	±3	14.5K	5.0%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
63	60	Gemini 2.5 Pro	1176	±3	37.9K	4.8%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
64	60	Qwen3 235B A22B Instruct 2507	1172	±4	12.6K	6.4%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
65	60	Claude Sonnet 3.5 v2	1171	±6	5.5K	3.4%	<0.1%	46 tps	1.4s	200K	$3.00	$15.00
66	60	GPT-5.1 Codex (Medium)	1171	±14	3K	3.2%	4.6%	71 tps	3.7s	400K	$1.25	$10.00
67	60	GPT-5.1 Instant	1171	±8	8.3K	4.1%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
68	60	Grok 4.20 Beta Reasoning	1167	±22	1.2K	4.1%	1.1%	77 tps	4.5s	2M	$2.00	$5.50
69	69	gpt-oss-120b	1165	±5	19.2K	5.0%	0.7%	213 tps	0.5s	131K	$0.11	$0.50
70	69	Qwen3.5 35B A3B	1164	±25	865	3.9%	2.1%	116 tps	2.1s	256K	$0.63	$1.13
71	69	GPT-5 Codex (Low)	1163	±10	5K	4.1%	2.7%	112 tps	3.5s	400K	$1.25	$10.00
72	69	GLM 4.7	1161	±7	16.8K	3.7%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
73	69	DeepSeek V3.1 Terminus Chat	1158	±5	6.5K	6.9%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
74	74	Qwen Plus (Aug'24)	1146	±5	17.2K	4.7%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
75	74	Qwen3.5 397B A17B	1142	±14	2.5K	2.9%	4.3%	57 tps	1.4s	256K	$0.52	$3.00
76	74	Gemini 2.5 Flash Preview 0925	1140	±6	7.6K	6.0%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
77	77	Mistral Large 3	1131	±8	5.4K	5.8%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
78	77	GPT-5 Mini	1131	±5	8.6K	5.4%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
79	77	DeepSeek V3.1 Turbo	1130	±7	4.8K	5.3%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
80	77	Grok 4.20 Multi Agent Beta	1129	±19	945	3.6%	1.2%	56 tps	8.8s	2M	$2.00	$6.00
