Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
81	81	OpenAI o3-pro	1061	±14	1.3K	2.7%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
82	106	Grok 3	1054	±6	7.1K	1.7%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
83	95	Kimi K2 Thinking	1054	±9	1.9K	3.8%	4.2%	61 tps	5.9s	262K	$0.24	$1.03
84	71	DeepSeek V3.1	1053	±13	1.8K	1.6%	0.8%	197 tps	0.4s	164K	$0.55	$1.60
85	44	DeepSeek V3.1 Terminus Chat	1053	±6	2.6K	2.6%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
86	126	Qwen3 30B A3B	1051	±7	3.9K	1.3%	5.1%	163 tps	1.0s	41K	$0.06	$0.21
87	65	DeepSeek V3.2 Exp Chat	1047	±9	2.2K	3.1%	2.6%	29 tps	1.5s	131K	$0.27	$0.39
88	62	MiniMax M2	1046	±6	3.8K	1.9%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
89	119	ERNIE 4.5 300B A47B	1046	±6	5.3K	1.3%	4.7%	23 tps	2.3s	123K	$0.28	$1.10
90	133	GPT-4.1 nano	1046	±8	5.1K	2.0%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
91	48	Claude Sonnet 4 (Thinking)	1044	±5	8.4K	2.3%	1.5%	52 tps	1.5s	200K	$3.00	$13.67
92	71	Gemini 2.5 Flash Thinking	1042	±5	6.5K	1.5%	2.2%	88 tps	6.4s	1M	$0.30	$2.50
93	71	Qwen3.5 397B A17B	1040	±10	1.4K	1.4%	4.3%	57 tps	1.4s	256K	$0.52	$3.00
94	119	GLM 4.7 FP8	1039	±9	515	1.0%	6.9%	40 tps	1.3s	200K	$0.30	$1.20
95	106	DeepSeek V3.1 Terminus Thinking	1035	±11	1.4K	2.8%	5.9%	27 tps	1.8s	131K	$0.56	$1.68
96	113	Mistral Medium	1035	±5	3.6K	1.8%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
97	65	GLM 4.6	1030	±8	2.6K	2.8%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
98	86	Qwen3 235B A22B	1030	±9	3.1K	1.6%	5.3%	71 tps	0.9s	41K	$0.23	$0.63
99	95	DeepSeek V3.2 Exp Thinking	1029	±11	1.4K	0.7%	7.2%	26 tps	3.0s	131K	$0.28	$0.42
100	68	GLM 4.7	1026	±6	4.5K	0.8%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
101	71	GPT-5 Mini	1025	±6	3.2K	2.0%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
102	126	Qwen3 VL 235B A22B Thinking	1024	±11	1.6K	4.2%	4.3%	47 tps	3.0s	127K	$0.47	$3.31
103	143	Gemini 2.0 Flash	1022	±7	2.5K	2.5%	<0.1%	76 tps	0.5s	1M	$0.14	$0.56
104	153	Qwen 2.5 32B Instruct	1019	±8	1.4K	1.8%	2.5%	48 tps	1.0s	131K	$0.21	$0.25
105	113	GLM 4.5	1019	±6	2.5K	1.6%	3.7%	46 tps	1.4s	131K	$0.43	$1.63
106	71	Seed 1.8 251228	1018	±6	4.4K	1.0%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
107	139	GLM 4.6V	1018	±12	1.6K	1.2%	6.4%	21 tps	1.8s	128K	$0.38	$0.90
108	148	Qwen3 30B A3B Thinking 2507	1017	±9	2.2K	1.8%	0.5%	124 tps	1.2s	131K	$0.16	$1.70
109	133	Kimi K2 0905	1013	±11	2.1K	3.7%	4.0%	30 tps	1.4s	262K	$0.63	$2.39
110	126	DeepSeek V3	1013	±6	8.8K	1.3%	0.9%	69 tps	1.1s	64K	$0.59	$1.49
111	101	DeepSeek V3 (Turbo)	1013	±12	705	1.4%	1.5%	32 tps	1.5s	64K	$0.40	$1.30
112	129	Qwen3 Max Thinking	1012	±6	2.1K	0.2%	13.5%	32 tps	2.3s	256K	$1.20	$6.00
113	129	Command A	1005	±5	8.6K	1.7%	2.2%	42 tps	0.8s	256K	$2.00	$7.33
114	143	Seed 1.6 250615	1005	±20	880	2.2%	3.1%	46 tps	2.2s	256K	$0.25	$2.00
115	133	DeepSeek V3.2 Speciale	1003	±10	1.3K	2.2%	6.0%	43 tps	1.4s	131K	$0.84	$1.52
116	113	Kimi K2 Fast	1003	±4	10K	1.8%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
117	113	Gemini 2.5 Flash Lite Thinking	1003	±8	3.7K	2.4%	1.0%	118 tps	4.4s	1M	$0.03	$0.13
118	133	Qwen3 14B	1002	±6	3.6K	1.6%	1.7%	109 tps	0.8s	41K	$0.04	$0.15
119	148	DeepSeek-R1	1001	±6	5K	1.7%	0.8%	133 tps	0.6s	64K	$0.91	$3.07
120	157	Qwen3 Next 80B A3B Thinking	1000	±7	3.2K	3.0%	0.6%	175 tps	1.3s	256K	$0.21	$2.26
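
Many neighboring rows in the table differ by fewer points than their stated confidence intervals, so adjacent ranks may not reflect a meaningful gap. The sketch below shows one rough way to check this. It parses a few rows copied from the table and tests whether two models' score ranges intersect. It assumes the ± value is a symmetric margin around the VIBE Score. The page does not state the confidence level or how the interval is computed, so the overlap check is only a heuristic, not a significance test. The names `parse_rows` and `intervals_overlap` are illustrative helpers, not part of any leaderboard API.

```python
import csv
import io

# Three rows copied from the table above (tab-separated, leading columns only).
SAMPLE = (
    "Rank\tOverall\tName\tVIBE Score\tConfidence Interval\n"
    "81\t81\tOpenAI o3-pro\t1061\t±14\n"
    "82\t106\tGrok 3\t1054\t±6\n"
    "120\t157\tQwen3 Next 80B A3B Thinking\t1000\t±7\n"
)


def parse_rows(text):
    """Parse tab-separated rows into dicts with a numeric score and margin."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "name": rec["Name"],
            "score": int(rec["VIBE Score"]),
            # Assumption: "±14" means a symmetric margin of 14 points.
            "margin": int(rec["Confidence Interval"].lstrip("±")),
        })
    return rows


def intervals_overlap(a, b):
    """True when [score - margin, score + margin] of a and b intersect."""
    return abs(a["score"] - b["score"]) <= a["margin"] + b["margin"]


rows = parse_rows(SAMPLE)
for other in rows[1:]:
    verdict = "overlaps" if intervals_overlap(rows[0], other) else "is separate from"
    print(f"{rows[0]['name']} {verdict} {other['name']}")
```

Under this assumption, OpenAI o3-pro (1061 ±14) and Grok 3 (1054 ±6) have intersecting ranges, while o3-pro and Qwen3 Next 80B A3B Thinking (1000 ±7) do not.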

View All (193 models)