Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
1	1	Claude Opus 4.6 (Thinking)	1388	±10	1.8K	0.8%	2.5%	56 tps	1.6s	200K	$5.00	$25.00
2	4	GPT-5.4 (High)	1367	±15	825	1.2%	4.6%	68 tps	7.9s	1M	$2.50	$15.00
3	2	GPT-5.4	1339	±15	530	0.9%	2.6%	55 tps	0.8s	1M	$2.50	$15.00
4	2	Claude Opus 4.6	1328	±8	2.3K	0.9%	2.1%	48 tps	1.7s	200K	$5.00	$25.00
5	37	Nova Experimental Chat 10-20	1326	±6	2.1K	3.3%	<0.1%	30 tps	0.5s	98K	$0	$0
6	5	Claude Sonnet 4.6 (Thinking)	1325	±7	1.6K	1.2%	4.7%	57 tps	1.1s	200K	$3.00	$15.00
7	8	GPT-5.1 (Medium)	1307	±7	1.2K	2.5%	<0.1%	86 tps	3.8s	400K	$0.83	$6.67
8	6	Gemini 3.1 Pro	1268	±8	4.3K	0.7%	3.5%	35 tps	4.1s	1M	$2.00	$12.00
9	10	GPT-5.2 Instant	1266	±4	6.2K	0.7%	1.7%	52 tps	2.0s	400K	$1.75	$14.00
10	8	GPT-5.1 (High)	1258	±6	5.3K	1.3%	3.2%	76 tps	6.9s	400K	$1.25	$10.00
11	26	GPT-5 (High)	1243	±7	3K	2.3%	4.5%	81 tps	35.9s	400K	$1.25	$10.00
12	19	Mistral Medium 3.1	1236	±5	5.1K	2.3%	<0.1%	77 tps	0.7s	128K	$0.40	$2.00
13	8	GPT-5.1	1233	±8	3.3K	1.4%	2.3%	71 tps	1.4s	400K	$1.42	$11.33
14	4	Claude Sonnet 4.6	1232	±11	1.6K	0.9%	1.6%	47 tps	1.2s	200K	$3.00	$15.00
15	33	Qwen3 30B A3B Instruct 2507	1226	±8	5.6K	2.2%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
16	10	Gemini 3 Pro	1217	±5	11.7K	0.9%	2.1%	50 tps	3.6s	1M	$2.00	$12.00
17	22	GPT-5 Chat	1208	±5	10.4K	2.2%	1.3%	95 tps	0.9s	400K	$1.25	$10.00
18	29	Qwen3 VL 235B A22B Instruct	1202	±10	1.8K	4.5%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
19	33	Qwen Plus 0728	1199	±11	850	2.3%	<0.1%	55 tps	0.9s	1M	$0.40	$1.20
20	48	OpenAI o1-mini	1195	±4	10.8K	1.1%	<0.1%	118 tps	N/A	128K	$1.13	$4.51
21	13	GPT-5.3 Instant	1191	±11	1.6K	1.2%	0.9%	63 tps	0.8s	400K	$1.75	$14.00
22	17	Grok 4.20 Beta Reasoning	1190	±17	540	0.9%	1.1%	77 tps	4.5s	2M	$2.00	$5.50
23	14	Gemini 3 Pro (Low)	1189	±6	4.8K	0.9%	2.4%	51 tps	3.5s	1M	$2.00	$12.00
24	32	Gemini 2.5 Pro High	1182	±3	6.7K	2.2%	1.5%	48 tps	2.3s	1M	$1.25	$10.00
25	40	Qwen3 235B A22B Instruct 2507	1178	±6	5.1K	1.9%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
26	106	Claude Sonnet 3.5 v2	1177	±8	1.6K	1.2%	<0.1%	46 tps	1.4s	200K	$3.00	$15.00
27	17	GPT-5.2 (High)	1177	±7	7.4K	0.8%	6.7%	18 tps	16.3s	400K	$1.75	$14.00
28	111	Claude Sonnet 3.7	1173	±6	2.8K	1.9%	<0.1%	39 tps	1.6s	200K	$3.00	$15.00
29	14	Gemini 3 Flash Preview Thinking	1173	±5	4.4K	0.6%	1.6%	3 tps	6.2s	1M	$0.50	$3.00
30	16	Nova Experimental Chat 11-10	1171	±6	2.7K	1.3%	0.4%	84 tps	8.9s	98K	$0	$0
31	81	GPT-4o	1170	±9	2.3K	2.8%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
32	16	GPT-5.2	1168	±6	3K	1.2%	4.1%	18 tps	2.7s	400K	$1.75	$14.00
33	17	Gemini 3 Flash Preview	1166	±7	2.4K	0.6%	1.3%	138 tps	1.4s	1M	$0.50	$3.00
34	60	Gemini 2.5 Flash Preview 0925	1163	±7	2.7K	2.9%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
35	43	Gemini 2.5 Flash Thinking Preview 0925	1159	±5	3.1K	2.7%	<0.1%	111 tps	4.7s	1M	$0.30	$2.50
36	56	Gemini 2.5 Pro Low	1159	±6	3.3K	3.5%	<0.1%	89 tps	2.4s	1M	$1.25	$10.00
37	100	Gemini 2.5 Flash Preview	1159	±11	1K	1.4%	<0.1%	138 tps	6.9s	1M	$0.15	$0.60
38	7	Claude Opus 4.5 (Thinking)	1155	±5	5.3K	1.6%	1.8%	49 tps	1.4s	200K	$5.00	$25.00
39	26	Grok 4.1 Fast Non-Reasoning	1151	±6	3.2K	1.8%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
40	68	Qwen Plus (Aug'24)	1150	±5	7.5K	1.4%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
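
When reading the table, it can help to check whether adjacent models are actually separated by more than their confidence intervals. The sketch below is a minimal illustration, assuming the "Confidence Interval" column is a symmetric half-width around the VIBE Score; that reading, and the helper names `intervals_overlap` and `overlapping_neighbours`, are assumptions made for this example rather than anything the leaderboard documents. The four rows are copied from the table above.

```python
# Rows copied from the table above: (name, VIBE score, confidence half-width).
# Assumption: "±N" means the interval [score - N, score + N].
rows = [
    ("Claude Opus 4.6", 1328, 8),
    ("Nova Experimental Chat 10-20", 1326, 6),
    ("Claude Sonnet 4.6 (Thinking)", 1325, 7),
    ("GPT-5.1 (Medium)", 1307, 7),
]


def intervals_overlap(a, b):
    """Return True if the score ± half-width ranges of a and b intersect."""
    _, score_a, half_a = a
    _, score_b, half_b = b
    return abs(score_a - score_b) <= half_a + half_b


def overlapping_neighbours(rows):
    """List pairs of adjacent rows whose intervals overlap."""
    return [
        (a[0], b[0])
        for a, b in zip(rows, rows[1:])
        if intervals_overlap(a, b)
    ]


for first, second in overlapping_neighbours(rows):
    print(f"{first} and {second} have overlapping intervals")
```

Under that assumption, ranks 4 to 6 overlap with their neighbours, while the gap between Claude Sonnet 4.6 (Thinking) and GPT-5.1 (Medium) is wider than their combined intervals.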

Page 1 of 5 (198 models in total)