Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
1	1	Claude Opus 4.6 (Thinking)	1524	±16	980	1.0%	2.5%	56 tps	1.6s	200K	$5.00	$25.00
2	2	Claude Opus 4.6	1424	±16	950	1.0%	2.1%	48 tps	1.7s	200K	$5.00	$25.00
3	7	Claude Opus 4.5 (Thinking)	1280	±14	2.8K	1.4%	1.8%	49 tps	1.4s	200K	$5.00	$25.00
4	4	Claude Sonnet 4.6	1266	±27	650	1.5%	1.6%	47 tps	1.2s	200K	$3.00	$15.00
5	10	GPT-5.2 Instant	1256	±15	1.3K	1.8%	1.7%	52 tps	2.0s	400K	$1.75	$14.00
6	10	Gemini 3 Pro	1248	±16	3.5K	1.5%	2.1%	50 tps	3.6s	1M	$2.00	$12.00
7	6	Gemini 3.1 Pro	1244	±23	1.4K	1.7%	3.5%	35 tps	4.1s	1M	$2.00	$12.00
8	14	Gemini 3 Pro (Low)	1240	±19	1.2K	0.8%	2.4%	51 tps	3.5s	1M	$2.00	$12.00
9	8	GPT-5.1 (High)	1231	±15	1.8K	1.7%	3.2%	76 tps	6.9s	400K	$1.25	$10.00
10	8	GPT-5.1	1230	±13	1.3K	1.9%	2.3%	71 tps	1.4s	400K	$1.42	$11.33
11	5	Claude Sonnet 4.6 (Thinking)	1222	±23	630	1.6%	4.7%	57 tps	1.1s	200K	$3.00	$15.00
12	19	Mistral Medium 3.1	1178	±10	1.5K	1.6%	<0.1%	77 tps	0.7s	128K	$0.40	$2.00
13	14	Gemini 3 Flash Preview Thinking	1167	±17	1.4K	1.7%	1.6%	3 tps	6.2s	1M	$0.50	$3.00
14	17	Gemini 3 Flash Preview	1165	±21	675	1.5%	1.3%	138 tps	1.4s	1M	$0.50	$3.00
15	22	GPT-5 Chat	1164	±12	3.5K	1.6%	1.3%	95 tps	0.9s	400K	$1.25	$10.00
16	16	GPT-5.2	1162	±18	785	1.9%	4.1%	18 tps	2.7s	400K	$1.75	$14.00
17	17	GPT-5.2 (High)	1145	±15	2.2K	1.6%	6.7%	18 tps	16.3s	400K	$1.75	$14.00
18	17	Claude Opus 4.5	1135	±21	1.1K	1.4%	1.5%	45 tps	1.5s	200K	$5.00	$25.00
19	44	Gemini 2.5 Pro	1125	±6	3.1K	3.7%	2.3%	45 tps	2.6s	1M	$1.25	$10.00
20	26	Claude Haiku 4.5 (Extended Thinking)	1123	±19	1.1K	1.9%	1.4%	115 tps	0.7s	200K	$1.00	$5.00
21	16	Nova Experimental Chat 11-10	1120	±20	500	2.0%	0.4%	84 tps	8.9s	98K	$0	$0
22	32	Gemini 2.5 Pro High	1119	±10	2.5K	2.4%	1.5%	48 tps	2.3s	1M	$1.25	$10.00
23	13	GPT-5.3 Instant	1110	±33	515	1.0%	0.9%	63 tps	0.8s	400K	$1.75	$14.00
24	33	Kimi K2.5	1110	±26	720	2.0%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
25	42	GPT-5.2 (Extra High)	1107	±24	890	2.7%	13.2%	17 tps	20.5s	400K	$1.75	$14.00
26	10	Claude Sonnet 4.5 (Thinking)	1102	±13	3.2K	3.6%	1.9%	44 tps	1.1s	200K	$3.00	$15.00
27	43	Gemini 2.5 Flash Thinking Preview 0925	1097	±10	1.3K	1.6%	<0.1%	111 tps	4.7s	1M	$0.30	$2.50
28	29	Qwen3 VL 235B A22B Instruct	1094	±15	675	2.2%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
29	48	Claude Sonnet 4 (Thinking)	1093	±14	1.6K	2.4%	1.5%	52 tps	1.5s	200K	$3.00	$13.67
30	42	Qwen3 Max Instruct Preview	1083	±17	1.1K	1.7%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
31	44	DeepSeek V3.1 Terminus Chat	1078	±12	580	1.7%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
32	26	GPT-5 (High)	1061	±9	2.5K	2.7%	4.5%	81 tps	35.9s	400K	$1.25	$10.00
33	52	Claude Haiku 4.5	1060	±13	1.6K	3.1%	1.1%	100 tps	0.9s	200K	$1.00	$5.00
34	65	GLM 4.6	1059	±25	640	2.3%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
35	33	Qwen3 30B A3B Instruct 2507	1056	±18	810	2.4%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
36	40	Qwen3 235B A22B Instruct 2507	1053	±19	680	0.7%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
37	95	Gemini 2.5 Flash	1049	±18	2.1K	1.9%	1.3%	2 tps	3.7s	1M	$0.30	$2.50
38	68	Qwen Plus (Aug'24)	1048	±22	730	2.0%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
39	56	Gemini 2.5 Pro Low	1044	±16	1.3K	2.3%	<0.1%	89 tps	2.4s	1M	$1.25	$10.00
40	84	Claude Sonnet 3.7 (Thinking)	1041	±22	575	2.5%	<0.1%	41 tps	2.6s	200K	$3.00	$15.00
41	62	GPT-5.1 Instant	1040	±13	915	2.7%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
42	37	Claude Sonnet 4.5	1040	±8	2K	3.2%	1.4%	41 tps	1.3s	200K	$1.80	$9.00
43	111	Claude Sonnet 3.7	1039	±19	800	2.4%	<0.1%	39 tps	1.6s	200K	$3.00	$15.00
44	33	Qwen3 Next 80B A3B Instruct	1038	±15	920	2.6%	0.6%	84 tps	1.1s	256K	$0.20	$1.42
45	60	MiniMax M2.1	1036	±22	695	1.4%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
46	60	Gemini 2.5 Flash Preview 0925	1025	±13	1.2K	2.0%	1.2%	5 tps	0.9s	1M	$0.13	$0.97
47	26	Grok 4.1 Fast Non-Reasoning	1023	±21	820	1.8%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
48	52	Grok 4 Fast Non-Reasoning	1023	±16	870	1.7%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
49	68	Grok 4	1022	±10	2.1K	2.5%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
50	48	Grok 4 Fast Reasoning	1022	±14	1.2K	2.0%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
51	44	Grok 4.1 Fast Reasoning	1016	±18	1.4K	2.0%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
52	86	Claude Sonnet 4	1011	±19	1.8K	1.4%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
53	71	Gemini 2.5 Flash Thinking	1000	±18	1K	1.9%	2.2%	88 tps	6.4s	1M	$0.30	$2.50
54	48	gpt-oss-120b	1000	±15	1.1K	1.3%	0.7%	213 tps	0.5s	131K	$0.11	$0.50
55	56	Claude Opus 4.1 (Thinking)	997	±14	740	3.9%	<0.1%	20 tps	3.9s	200K	$15.00	$75.00
56	68	GLM 4.7	992	±33	635	2.3%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
57	93	Qwen Max	979	±19	695	2.1%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
58	56	DeepSeek V3.1 Turbo	969	±37	665	1.5%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
59	108	GPT-5 Mini Low	968	±14	590	3.3%	<0.1%	69 tps	3.2s	400K	$0.25	$2.00
60	77	Claude Opus 4.1	964	±22	610	4.7%	3.0%	17 tps	3.7s	200K	$15.00	$75.00
61	52	GPT-5	957	±20	1.6K	2.9%	3.1%	78 tps	23.1s	400K	$1.25	$9.67
62	84	GPT-5 Mini Minimal	953	±16	595	3.3%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
63	101	Gemini 2.5 Flash Lite	948	±16	1.6K	2.7%	1.3%	210 tps	0.7s	1M	$0.10	$0.40
64	71	Gemini 2.5 Flash Lite Preview 0925	948	±16	1.1K	2.2%	1.2%	209 tps	0.7s	1M	$0.25	$0.35
65	81	GPT-4o	945	±31	505	3.8%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
66	56	DeepSeek V3.2 Thinking	942	±26	705	2.8%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
67	48	OpenAI o1-mini	937	±27	580	2.5%	<0.1%	118 tps	N/A	128K	$1.13	$4.51
68	79	Qwen3 Max Thinking Preview	925	±26	530	1.9%	3.1%	40 tps	2.1s	256K	$1.20	$6.00
69	126	Qwen3 VL 235B A22B Thinking	922	±19	645	3.0%	4.3%	47 tps	3.0s	127K	$0.47	$3.31
70	80	GPT-5 (Minimal)	922	±13	955	4.0%	<0.1%	67 tps	1.4s	400K	$1.25	$10.00
71	106	DeepSeek V3 0324	920	±25	570	0.9%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
72	44	Kimi K2 Thinking Turbo	917	±27	530	1.9%	2.0%	75 tps	1.4s	262K	$1.15	$8.00
73	126	DeepSeek V3	910	±38	565	1.7%	0.9%	69 tps	1.1s	64K	$0.59	$1.49
74	118	GPT-4.1 mini	900	±16	950	1.0%	1.1%	67 tps	0.9s	1M	$0.34	$1.60
75	62	MiniMax M2	900	±24	720	2.7%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
76	113	Kimi K2 Fast	887	±14	1.6K	1.0%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
77	129	DeepSeek V3.1 Thinking	886	±16	510	2.9%	7.1%	18 tps	1.8s	131K	$0.23	$0.75
78	65	Mistral Large 3	881	±27	495	3.9%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
79	124	Kimi K2 0905 Turbo	881	±17	710	2.1%	0.7%	373 tps	0.5s	262K	$1.70	$6.50
80	106	Grok 3	872	±26	745	1.3%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
81	71	GPT-5 Mini	870	±15	940	4.1%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
82	95	Gemini 2.5 Flash Lite Thinking Preview 0925	859	±16	1.2K	2.5%	1.5%	152 tps	3.0s	1M	$0.10	$0.40
83	143	Gemini 2.0 Flash Lite	856	±24	585	3.3%	<0.1%	42 tps	0.5s	1M	$0.08	$0.30
84	148	OpenAI o4-mini-high	854	±22	560	1.8%	1.9%	117 tps	15.9s	200K	$1.10	$4.40
85	93	DeepSeek V3 0324 Turbo	843	±21	635	0.8%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
86	157	GPT-5 Nano	836	±34	685	4.2%	3.2%	113 tps	20.9s	400K	$0.05	$0.40
87	129	Command A	836	±15	855	1.2%	2.2%	42 tps	0.8s	256K	$2.00	$7.33
88	133	GPT-4.1 nano	824	±22	735	2.0%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
89	101	gpt-oss-20b	816	±20	555	1.8%	0.5%	216 tps	0.5s	131K	$0.06	$0.26
90	113	Gemini 2.5 Flash Lite Thinking	804	±18	775	3.1%	1.0%	118 tps	4.4s	1M	$0.03	$0.13
91	213	Claude Haiku 3.5	803	±24	545	6.0%	0.8%	40 tps	2.8s	200K	$0.80	$4.00
92	302	YouTube	802	±22	485	4.0%	<0.1%	34 tps	2.7s	32K	$0.99	$0.99
93	139	OpenAI o4-mini	768	±32	545	0.9%	1.4%	97 tps	7.0s	128K	$1.10	$4.40
94	157	Qwen3 Next 80B A3B Thinking	767	±23	810	2.4%	0.6%	175 tps	1.3s	256K	$0.21	$2.26
95	160	Llama 4 Scout	722	±33	700	2.1%	0.6%	88 tps	5.1s	131K	$0.18	$0.46
96	161	Llama 4 Maverick	719	±27	1K	2.9%	1.2%	88 tps	2.4s	1M	$0.23	$0.83
97	177	OpenAI o3-mini	676	±27	690	1.4%	0.8%	143 tps	3.3s	200K	$1.10	$4.40
98	175	OpenAI o3-mini-low	659	±21	505	2.9%	0.7%	139 tps	1.5s	200K	$1.10	$4.40
