Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
1	1	Claude Opus 4.6 (Thinking)	1569	±10	1.5K	1.7%	2.5%	56 tps	1.6s	200K	$5.00	$25.00
2	2	GPT-5.4	1493	±14	695	1.4%	2.6%	55 tps	0.8s	1M	$2.50	$15.00
3	2	Claude Opus 4.6	1469	±16	1.6K	2.4%	2.1%	48 tps	1.7s	200K	$5.00	$25.00
4	6	Gemini 3.1 Pro	1418	±11	4.9K	1.0%	3.5%	35 tps	4.1s	1M	$2.00	$12.00
5	8	GPT-5.1 (High)	1368	±10	4.9K	2.2%	3.2%	76 tps	6.9s	400K	$1.25	$10.00
6	4	Claude Sonnet 4.6	1364	±19	1.9K	1.3%	1.6%	47 tps	1.2s	200K	$3.00	$15.00
7	10	GPT-5.2 Instant	1361	±11	4K	1.1%	1.7%	52 tps	2.0s	400K	$1.75	$14.00
8	8	GPT-5.1	1360	±9	2.6K	2.4%	2.3%	71 tps	1.4s	400K	$1.42	$11.33
9	33	Qwen3 30B A3B Instruct 2507	1345	±6	3.4K	1.4%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
10	10	Gemini 3 Pro	1343	±7	16.2K	1.4%	2.1%	50 tps	3.6s	1M	$2.00	$12.00
11	16	GPT-5.2	1329	±9	3.2K	1.2%	4.1%	18 tps	2.7s	400K	$1.75	$14.00
12	7	Claude Opus 4.5 (Thinking)	1313	±9	6K	2.5%	1.8%	49 tps	1.4s	200K	$5.00	$25.00
13	14	Gemini 3 Pro (Low)	1300	±8	4.1K	1.8%	2.4%	51 tps	3.5s	1M	$2.00	$12.00
14	17	Gemini 3 Flash Preview	1281	±11	2K	1.2%	1.3%	138 tps	1.4s	1M	$0.50	$3.00
15	5	Claude Sonnet 4.6 (Thinking)	1280	±14	1.4K	2.2%	4.7%	57 tps	1.1s	200K	$3.00	$15.00
16	17	GPT-5.2 (High)	1275	±12	6.5K	1.4%	6.7%	18 tps	16.3s	400K	$1.75	$14.00
17	29	Qwen3 VL 235B A22B Instruct	1273	±14	1.1K	2.3%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
18	33	Qwen3 Next 80B A3B Instruct	1270	±10	2.3K	2.8%	0.6%	84 tps	1.1s	256K	$0.20	$1.42
19	22	GPT-5 Chat	1269	±5	7.9K	1.6%	1.3%	95 tps	0.9s	400K	$1.25	$10.00
20	48	gpt-oss-120b	1269	±7	4.6K	1.4%	0.7%	213 tps	0.5s	131K	$0.11	$0.50
21	40	Qwen3 235B A22B Instruct 2507	1261	±8	3.1K	1.4%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
22	10	Claude Sonnet 4.5 (Thinking)	1261	±7	5.5K	1.9%	1.9%	44 tps	1.1s	200K	$3.00	$15.00
23	26	Grok 4.1 Fast Non-Reasoning	1260	±16	2.5K	3.7%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
24	14	Gemini 3 Flash Preview Thinking	1248	±10	3.7K	1.3%	1.6%	3 tps	6.2s	1M	$0.50	$3.00
25	32	Gemini 2.5 Pro High	1234	±6	4.6K	2.4%	1.5%	48 tps	2.3s	1M	$1.25	$10.00
26	62	GPT-5.1 Instant	1233	±12	2.6K	2.7%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
27	81	GPT-4o	1228	±11	2.9K	1.7%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
28	13	GPT-5.3 Instant	1225	±13	2.3K	1.3%	0.9%	63 tps	0.8s	400K	$1.75	$14.00
29	42	Qwen3 Max Instruct Preview	1192	±9	2.8K	3.0%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
30	29	Nova Experimental Chat 12-10	1192	±11	1.4K	0.7%	2.4%	84 tps	12.9s	98K	$0	$0
31	17	Claude Opus 4.5	1177	±15	1.9K	4.7%	1.5%	45 tps	1.5s	200K	$5.00	$25.00
32	33	Kimi K2.5	1175	±12	3.1K	1.6%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
33	42	GPT-5.2 (Extra High)	1172	±13	3K	1.6%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
34	44	Kimi K2 Thinking Turbo	1171	±11	1.8K	4.2%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
35	48	Step 3.5 Flash	1163	±23	640	0.8%	2.2%	109 tps	0.6s	256K	$0.05	$0.15
36	56	Gemini 3.1 Flash Lite Preview Thinking	1162	±19	730	2.0%	1.7%	75 tps	4.7s	1M	$0.25	$1.50
37	44	Gemini 2.5 Pro	1161	±6	12.1K	1.4%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
38	44	Grok 4.1 Fast Reasoning	1159	±11	4.3K	2.9%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
39	56	DeepSeek V3.2 Thinking	1158	±19	2.4K	2.3%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
40	26	Claude Haiku 4.5 (Extended Thinking)	1154	±8	2.4K	2.8%	1.4%	115 tps	0.7s	200K	$1.00	$5.00

Page 1 of 5 (173 models in total)