Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
81	48	Grok 4 Fast Reasoning	1049	±10	2.3K	3.6%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
82	118	GPT-4.1 mini	1045	±8	3.4K	2.5%	1.1%	67 tps	0.9s	1M	$0.34	$1.60
83	113	Mistral Medium	1043	±11	1.1K	3.1%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
84	86	Amazon Nova 2 Lite	1042	±23	690	2.1%	1.0%	137 tps	0.6s	300K	$0.35	$2.95
85	101	Gemini 2.5 Flash Lite	1042	±6	7.8K	4.3%	1.3%	210 tps	0.7s	1M	$0.10	$0.40
86	37	Qwen3 Omni 30B A3B Thinking	1040	±20	750	2.0%	3.7%	67 tps	1.2s	66K	$0.97	$1.79
87	113	GLM 4.5	1038	±12	915	3.2%	3.7%	46 tps	1.4s	131K	$0.43	$1.63
88	93	DeepSeek V3 0324 Turbo	1038	±9	2.2K	1.8%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
89	95	DeepSeek V3.2 Exp Thinking	1038	±17	655	3.7%	7.2%	26 tps	3.0s	131K	$0.28	$0.42
90	86	DeepSeek V3.1 Chat	1038	±13	975	2.5%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
91	29	Qwen3 VL 235B A22B Instruct	1036	±16	1.3K	4.2%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
92	106	Grok 3	1034	±8	2.8K	2.8%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
93	79	MiniMax M2.5 Lightning	1031	±20	820	1.8%	1.5%	51 tps	2.0s	205K	$0.60	$2.40
94	52	Grok 4 Fast Non-Reasoning	1030	±17	1.5K	4.1%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
95	111	Claude Sonnet 3.7	1027	±9	4K	4.9%	<0.1%	39 tps	1.6s	200K	$3.00	$15.00
96	68	Qwen Plus (Aug'24)	1023	±9	2.4K	2.9%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
97	56	DeepSeek V3.2 Thinking	1021	±13	1.9K	1.8%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
98	44	Grok 4.1 Fast Reasoning	1020	±7	3.7K	3.0%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
99	56	MiniMax M2.1 Lightning	1019	±24	830	1.8%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
100	71	GPT-5 Mini	1017	±10	3.1K	5.2%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
101	106	DeepSeek V3 0324	1013	±11	2.1K	3.1%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
102	124	Qwen3 235B A22B Thinking 2507	1010	±16	745	3.2%	2.5%	53 tps	1.6s	131K	$0.59	$5.70
103	95	DeepSeek-R1 Turbo	1009	±20	660	3.6%	2.6%	29 tps	1.8s	64K	$2.85	$4.75
104	93	Qwen Max	1009	±11	2.7K	2.7%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
105	80	GPT-5 (Minimal)	1003	±10	1.9K	5.4%	<0.1%	67 tps	1.4s	400K	$1.25	$10.00
106	133	DeepSeek-R1 0528	1001	±15	1.1K	4.1%	1.3%	93 tps	0.5s	64K	$1.60	$3.67
107	106	DeepSeek V3.1 Terminus Thinking	1000	±14	745	2.6%	5.9%	27 tps	1.8s	131K	$0.56	$1.68
108	65	GLM 4.6	991	±15	945	3.6%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
109	147	GLM 4.5 Air	991	±16	1.1K	2.7%	<0.1%	22 tps	1.4s	131K	$0.10	$0.38
110	37	Nova Experimental Chat 10-20	984	±20	555	5.1%	<0.1%	30 tps	0.5s	98K	$0	$0
111	71	Seed 1.8 251228	983	±10	3K	2.6%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
112	113	Kimi K2 Fast	975	±10	4.8K	2.3%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
113	143	Gemini 2.0 Flash	974	±19	1.9K	4.7%	<0.1%	76 tps	0.5s	1M	$0.14	$0.56
114	133	GPT-4.1 nano	974	±11	2.3K	3.4%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
115	148	OpenAI o3	970	±10	1.2K	3.1%	0.9%	85 tps	6.8s	128K	$7.33	$29.33
116	129	Command A	965	±8	3K	2.9%	2.2%	42 tps	0.8s	256K	$2.00	$7.33
117	111	LongCat Flash Chat	963	±25	560	4.3%	0.8%	85 tps	0.9s	131K	$0.14	$0.68
118	111	Solar Pro 3 (Reasoning)	960	±23	505	1.0%	3.2%	118 tps	1.2s	131K	$0.15	$0.60
119	153	OpenAI o1	960	±11	2.3K	2.4%	4.2%	92 tps	5.5s	200K	$15.00	$60.00
120	126	DeepSeek V3	960	±7	3.4K	2.3%	0.9%	69 tps	1.1s	64K	$0.59	$1.49
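
When comparing two rows, it can help to check whether their score ranges overlap before reading much into a small gap in VIBE Score. The sketch below assumes the "±" value in the Confidence Interval column is a symmetric half-width around the VIBE Score; the page does not define it, so treat that reading as an assumption. The `Entry` type and `intervals_overlap` function are hypothetical names introduced only for illustration, and an overlap check is a rough heuristic, not a formal significance test.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row, reduced to the fields needed here (hypothetical type)."""
    name: str
    score: int       # value from the "VIBE Score" column
    half_width: int  # the "±" value from the "Confidence Interval" column


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """Return True when [score - half_width, score + half_width] of a and b intersect.

    Assumes the interval is symmetric around the score (not stated on the page).
    """
    return abs(a.score - b.score) <= a.half_width + b.half_width


# Values copied from the table above.
grok_4_fast = Entry("Grok 4 Fast Reasoning", 1049, 10)
gemini_flash_lite = Entry("Gemini 2.5 Flash Lite", 1042, 6)
deepseek_v3 = Entry("DeepSeek V3", 960, 7)

# 1039..1059 vs 1036..1048: the ranges share 1039..1048.
print(intervals_overlap(grok_4_fast, gemini_flash_lite))
# 1039..1059 vs 953..967: the ranges are far apart.
print(intervals_overlap(grok_4_fast, deepseek_v3))
```

Under that assumption, rows whose ranges overlap (such as ranks 81 and 85) are not clearly separated by score alone, while rows at opposite ends of this page are.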

Page 3 of 5

View All (188 models)