Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
161	113	Mistral Medium	1087	±4	5.3K	9.0%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
162	106	DeepSeek V3 0324	1090	±5	9.7K	8.2%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
163	71	Seed 1.8 251228	1096	±6	4.1K	3.4%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
164	179	Switchpoint Router	1097	±11	1.1K	9.5%	1.7%	71 tps	4.9s	131K	$0.85	$3.40
165	95	Gemini 2.5 Flash	1098	±5	21.4K	5.2%	1.3%	2 tps	3.7s	1M	$0.30	$2.50
166	44	DeepSeek V3.1 Terminus Chat	1100	±4	5.1K	9.6%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
167	86	Claude Sonnet 4	1102	±4	18.3K	7.0%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
168	106	Grok 3	1102	±5	9.3K	9.3%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
169	95	Gemini 2.5 Flash Lite Thinking Preview 0925	1104	±5	4.9K	7.8%	1.5%	152 tps	3.0s	1M	$0.10	$0.40
170	56	DeepSeek V3.1 Turbo	1106	±9	2.6K	5.1%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
171	33	Qwen3 30B A3B Instruct 2507	1108	±5	8.5K	9.7%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
172	65	Mistral Large 3	1108	±7	4K	6.3%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
173	106	Claude Sonnet 3.5 v2	1109	±7	2.9K	8.2%	<0.1%	46 tps	1.4s	200K	$3.00	$15.00
174	52	Grok 4 Fast Non-Reasoning	1110	±5	7.1K	8.3%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
175	60	Gemini 2.5 Flash Preview 0925	1110	±4	6.7K	7.5%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
176	68	GLM 4.7	1112	±8	8.8K	4.7%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
177	71	Gemini 3.1 Flash Lite Preview	1114	±27	630	2.3%	1.0%	8 tps	1.2s	1M	$0.25	$1.50
178	29	Nova Experimental Chat 12-10	1115	±8	2.2K	4.8%	2.4%	84 tps	12.9s	98K	$0	$0
179	68	Qwen Plus (Aug'24)	1116	±5	8.9K	9.4%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
180	56	DeepSeek V3.2 Thinking	1117	±6	10K	3.8%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
181	81	GPT-4o	1124	±5	6.5K	6.1%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
182	37	Kimi K2.5 Instant	1124	±13	1.4K	2.4%	2.9%	32 tps	3.0s	262K	$0.50	$3.00
183	52	Qwen3.5 122B A17B	1124	±17	1.1K	3.2%	1.5%	82 tps	1.4s	256K	$0.40	$3.20
184	48	Grok 4 Fast Reasoning	1125	±5	11.8K	5.5%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
185	79	MiniMax M2.5 Lightning	1128	±14	995	2.5%	1.5%	51 tps	2.0s	205K	$0.60	$2.40
186	40	DeepSeek V3.2	1130	±5	4.4K	5.1%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
187	71	GPT-5 Mini	1130	±4	6.1K	7.9%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
188	68	Grok 4	1130	±2	23.2K	6.4%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
189	62	GPT-5.1 Instant	1134	±5	5.5K	5.7%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
190	52	Claude Haiku 4.5	1134	±5	9.9K	6.9%	1.1%	100 tps	0.9s	200K	$1.00	$5.00
191	65	GLM 4.6	1136	±5	14.1K	4.7%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
192	26	Grok 4.1 Fast Non-Reasoning	1137	±5	7.4K	6.6%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
193	52	GPT-5	1138	±4	14K	7.9%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
194	84	GPT-5 Mini Minimal	1139	±8	2.8K	9.7%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
195	33	Qwen3 Next 80B A3B Instruct	1141	±4	7.6K	7.7%	0.6%	84 tps	1.1s	256K	$0.20	$1.42
196	33	Grok 4.20 Multi Agent Beta	1143	±16	765	1.9%	1.2%	56 tps	8.8s	2M	$2.00	$6.00
197	40	Qwen3 235B A22B Instruct 2507	1146	±3	8.8K	12.2%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
198	44	Grok 4.1 Fast Reasoning	1149	±6	21.2K	4.2%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
199	60	MiniMax M2.1	1149	±6	10.4K	4.3%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
200	42	Qwen3 Max Instruct Preview	1150	±4	13.5K	5.8%	1.1%	31 tps	1.7s	256K	$1.43	$6.61

Page 5 of 6 (237 models)