Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	7	Claude Opus 4.5 (Thinking)	1155	±5	5.3K	1.6%	1.8%	49 tps	1.4s	200K	$5.00	$25.00
42	26	Grok 4.1 Fast Non-Reasoning	1151	±6	3.2K	1.8%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
43	68	Qwen Plus (Aug'24)	1150	±5	7.5K	1.4%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
44	101	gpt-oss-20b	1150	±7	4K	1.7%	0.5%	216 tps	0.5s	131K	$0.06	$0.26
45	147	Arcee AI Maestro Reasoning	1149	±7	2K	1.4%	<0.1%	85 tps	0.3s	131K	$0.90	$3.30
46	52	Qwen3.5 122B A17B	1147	±13	765	1.9%	1.5%	82 tps	1.4s	256K	$0.40	$3.20
47	17	Claude Opus 4.5	1144	±8	2.4K	2.1%	1.5%	45 tps	1.5s	200K	$5.00	$25.00
48	100	Qwen Plus 0728 (Thinking)	1141	±14	500	2.0%	<0.1%	56 tps	1.1s	1M	$0.40	$4.00
49	37	Kimi K2.5 Instant	1140	±10	1.1K	1.4%	2.9%	32 tps	3.0s	262K	$0.50	$3.00
50	48	Step 3.5 Flash	1140	±15	965	0.5%	2.2%	109 tps	0.6s	256K	$0.05	$0.15
51	37	Qwen3 Omni 30B A3B Thinking	1139	±7	1.6K	1.2%	3.7%	67 tps	1.2s	66K	$0.97	$1.79
52	29	Nova Experimental Chat 12-10	1138	±9	1.9K	0.5%	2.4%	84 tps	12.9s	98K	$0	$0
53	111	Solar Pro 3 (Reasoning)	1136	±13	830	1.2%	3.2%	118 tps	1.2s	131K	$0.15	$0.60
54	10	Claude Sonnet 4.5 (Thinking)	1136	±5	6.8K	2.7%	1.9%	44 tps	1.1s	200K	$3.00	$15.00
55	42	GPT-5.2 (Extra High)	1131	±5	3.7K	0.9%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
56	26	Claude Haiku 4.5 (Extended Thinking)	1129	±5	3.6K	1.6%	1.4%	115 tps	0.7s	200K	$1.00	$5.00
57	213	DeepSeek R1T Chimera	1128	±8	1.9K	2.5%	<0.1%	46 tps	1.1s	164K	$0.09	$0.36
58	42	Qwen3 Max Instruct Preview	1126	±4	4.3K	2.8%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
59	44	Gemini 2.5 Pro	1126	±4	16.2K	1.5%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
60	77	Claude Opus 4.1	1122	±9	1.3K	2.3%	3.0%	17 tps	3.7s	200K	$15.00	$75.00
61	44	Grok 4.1 Fast Reasoning	1119	±6	5.4K	1.5%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
62	37	Claude Sonnet 4.5	1116	±6	5K	3.1%	1.4%	41 tps	1.3s	200K	$1.80	$9.00
63	56	DeepSeek V3.1 Turbo	1114	±6	4K	2.1%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
64	40	DeepSeek V3.2	1113	±5	3.6K	0.8%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
65	101	Gemini 2.5 Flash Lite	1112	±6	7.6K	1.7%	1.3%	210 tps	0.7s	1M	$0.10	$0.40
66	93	Qwen Max	1111	±6	7.6K	1.4%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
67	95	Qwen3 32B	1111	±17	515	1.9%	3.9%	30 tps	3.1s	41K	$0.12	$0.42
68	22	GLM 5	1110	±7	1.8K	0.8%	3.4%	36 tps	2.7s	200K	$0.72	$2.55
69	84	GPT-5 Mini Minimal	1107	±10	970	3.5%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
70	93	DeepSeek V3 0324 Turbo	1103	±5	4.4K	1.9%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
71	121	Qwen3 32B Fast	1098	±8	9K	1.0%	11.6%	30 tps	3.1s	41K	$0.10	$0.25
72	86	DeepSeek V3.1 Chat	1097	±7	1.9K	2.3%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
73	56	DeepSeek V3.2 Thinking	1096	±6	3.8K	0.9%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
74	111	LongCat Flash Chat	1095	±7	1.7K	2.8%	0.8%	85 tps	0.9s	131K	$0.14	$0.68
75	84	Nova Experimental Chat 10-09	1093	±10	1.3K	7.4%	<0.1%	59 tps	6.1s	98K	$0	$0
76	121	QwQ 32B	1091	±5	9.9K	0.9%	5.4%	41 tps	2.1s	16K	$0.43	$0.56
77	33	Kimi K2.5	1090	±6	4.5K	0.7%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
78	86	Nemotron 3 Nano (Thinking)	1089	±9	1.5K	0.7%	2.0%	200 tps	0.5s	256K	$0	$0
79	62	GPT-5.1 Instant	1085	±6	3.7K	1.1%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
80	106	DeepSeek V3 0324	1084	±4	5.7K	1.4%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
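
Many adjacent rows in the table have scores that differ by less than their confidence intervals. The sketch below shows one way to check this. It assumes that "±N" denotes a symmetric interval around the VIBE Score; the page does not say how the interval is computed, so this reading is an assumption. The helper names `interval` and `intervals_overlap` are hypothetical and used only for illustration.

```python
# Rows copied from the table above: (name, VIBE score, +/- half-width).
rows = [
    ("Claude Opus 4.5 (Thinking)", 1155, 5),
    ("Grok 4.1 Fast Non-Reasoning", 1151, 6),
    ("GPT-5.2 (Extra High)", 1131, 5),
]


def interval(score, half_width):
    """Return (low, high), assuming the interval is symmetric around the score."""
    return (score - half_width, score + half_width)


def intervals_overlap(a, b):
    """True if the two rows' intervals share at least one point."""
    lo_a, hi_a = interval(a[1], a[2])
    lo_b, hi_b = interval(b[1], b[2])
    return lo_a <= hi_b and lo_b <= hi_a


first, second, third = rows
print(intervals_overlap(first, second))  # compares 1150..1160 with 1145..1157
print(intervals_overlap(first, third))   # compares 1150..1160 with 1126..1136
```

Under this reading, overlapping intervals suggest that the ordering of two neighboring models may not be meaningful, while non-overlapping intervals indicate a clearer gap. This is a reading aid only, not a statement of how the leaderboard itself ranks models.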

Page 2 of 7 (260 models in total)