Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	126	Qwen3 VL 235B A22B Thinking	922	±18	745	4.5%	4.3%	47 tps	3.0s	127K	$0.47	$3.31
42	133	Kimi K2 0905	922	±21	805	4.2%	4.0%	30 tps	1.4s	262K	$0.63	$2.39
43	124	Kimi K2 0905 Turbo	925	±13	1.5K	4.7%	0.7%	373 tps	0.5s	262K	$1.70	$6.50
44	95	Kimi K2 Thinking	926	±17	740	2.0%	4.2%	61 tps	5.9s	262K	$0.24	$1.03
45	143	Seed 1.6 250615	928	±21	635	5.2%	3.1%	46 tps	2.2s	256K	$0.25	$2.00
46	65	Mistral Large 3	947	±20	1.3K	4.4%	2.1%	51 tps	1.0s	256K	$0.50	$1.50
47	101	Qwen3.5 35B A3B	949	±27	530	2.8%	2.1%	116 tps	2.1s	256K	$0.63	$1.13
48	101	gpt-oss-20b	950	±18	1.4K	4.7%	0.5%	216 tps	0.5s	131K	$0.06	$0.26
49	81	OpenAI o3-pro	951	±19	1.6K	3.4%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
50	79	Qwen3 Max Thinking Preview	952	±20	1.1K	2.2%	3.1%	40 tps	2.1s	256K	$1.20	$6.00
51	129	DeepSeek V3.1 Thinking	955	±14	1.1K	2.2%	7.1%	18 tps	1.8s	131K	$0.23	$0.75
52	139	OpenAI o4-mini	956	±16	1.4K	2.8%	1.4%	97 tps	7.0s	128K	$1.10	$4.40
53	148	OpenAI o4-mini-high	958	±11	2.2K	3.1%	1.9%	117 tps	15.9s	200K	$1.10	$4.40
54	126	DeepSeek V3	960	±7	3.4K	2.3%	0.9%	69 tps	1.1s	64K	$0.59	$1.49
55	153	OpenAI o1	960	±11	2.3K	2.4%	4.2%	92 tps	5.5s	200K	$15.00	$60.00
56	111	LongCat Flash Chat	963	±25	560	4.3%	0.8%	85 tps	0.9s	131K	$0.14	$0.68
57	129	Command A	965	±8	3K	2.9%	2.2%	42 tps	0.8s	256K	$2.00	$7.33
58	148	OpenAI o3	970	±10	1.2K	3.1%	0.9%	85 tps	6.8s	128K	$7.33	$29.33
59	133	GPT-4.1 nano	974	±11	2.3K	3.4%	0.6%	175 tps	0.5s	1M	$0.10	$0.40
60	143	Gemini 2.0 Flash	974	±19	1.9K	4.7%	<0.1%	76 tps	0.5s	1M	$0.14	$0.56
61	113	Kimi K2 Fast	975	±10	4.8K	2.3%	0.8%	365 tps	0.5s	131K	$1.00	$3.00
62	71	Seed 1.8 251228	983	±10	3K	2.6%	3.7%	41 tps	2.1s	256K	$0.25	$2.00
63	65	GLM 4.6	991	±15	945	3.6%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
64	106	DeepSeek V3.1 Terminus Thinking	1000	±14	745	2.6%	5.9%	27 tps	1.8s	131K	$0.56	$1.68
65	133	DeepSeek-R1 0528	1001	±15	1.1K	4.1%	1.3%	93 tps	0.5s	64K	$1.60	$3.67
66	93	Qwen Max	1009	±11	2.7K	2.7%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
67	95	DeepSeek-R1 Turbo	1009	±20	660	3.6%	2.6%	29 tps	1.8s	64K	$2.85	$4.75
68	124	Qwen3 235B A22B Thinking 2507	1010	±16	745	3.2%	2.5%	53 tps	1.6s	131K	$0.59	$5.70
69	106	DeepSeek V3 0324	1013	±11	2.1K	3.1%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
70	71	GPT-5 Mini	1017	±10	3.1K	5.2%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
71	56	MiniMax M2.1 Lightning	1019	±24	830	1.8%	1.7%	52 tps	2.1s	205K	$0.30	$2.40
72	44	Grok 4.1 Fast Reasoning	1020	±7	3.7K	3.0%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
73	56	DeepSeek V3.2 Thinking	1021	±13	1.9K	1.8%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
74	68	Qwen Plus (Aug'24)	1023	±9	2.4K	2.9%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
75	52	Grok 4 Fast Non-Reasoning	1030	±17	1.5K	4.1%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
76	79	MiniMax M2.5 Lightning	1031	±20	820	1.8%	1.5%	51 tps	2.0s	205K	$0.60	$2.40
77	106	Grok 3	1034	±8	2.8K	2.8%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
78	29	Qwen3 VL 235B A22B Instruct	1036	±16	1.3K	4.2%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
79	86	DeepSeek V3.1 Chat	1038	±13	975	2.5%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
80	95	DeepSeek V3.2 Exp Thinking	1038	±17	655	3.7%	7.2%	26 tps	3.0s	131K	$0.28	$0.42
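
As a rough aid to reading the table, the sketch below parses a few rows copied from it and checks whether two models' score ranges overlap. It assumes the "Confidence Interval" column is a symmetric ± margin around the VIBE Score, which the page does not state explicitly. The helper names `parse_rows` and `intervals_overlap` are illustrative only and are not part of any leaderboard API.

```python
import csv
import io

# A few rows copied from the table above (tab-separated, first five columns only).
SAMPLE = (
    "Rank\tOverall\tName\tVIBE Score\tConfidence Interval\n"
    "54\t126\tDeepSeek V3\t960\t±7\n"
    "55\t153\tOpenAI o1\t960\t±11\n"
    "72\t44\tGrok 4.1 Fast Reasoning\t1020\t±7\n"
)


def parse_rows(text):
    """Map each model name to (low, high), assuming a symmetric ± margin."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    bounds = {}
    for row in reader:
        score = int(row["VIBE Score"])
        margin = int(row["Confidence Interval"].lstrip("±"))
        bounds[row["Name"]] = (score - margin, score + margin)
    return bounds


def intervals_overlap(a, b):
    """True when the closed ranges a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


bounds = parse_rows(SAMPLE)
print(intervals_overlap(bounds["DeepSeek V3"], bounds["OpenAI o1"]))
print(intervals_overlap(bounds["OpenAI o1"], bounds["Grok 4.1 Fast Reasoning"]))
```

Overlapping ranges suggest that the ordering of two neighbouring rows may not be meaningful, although overlap alone is not a formal significance test.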

Page 2 of 4 (154 models in total)