Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	26	Grok 4.1 Fast Non-Reasoning	1130	±15	2K	4.3%	0.9%	101 tps	0.5s	2M	$0.20	$0.50
42	33	Qwen3 30B A3B Instruct 2507	1127	±9	2.5K	2.9%	1.2%	55 tps	1.3s	131K	$0.13	$0.72
43	81	OpenAI o3-pro	1126	±10	2.3K	3.1%	5.2%	22 tps	70.8s	200K	$20.00	$80.00
44	40	Qwen3 235B A22B Instruct 2507	1118	±11	2.5K	2.1%	6.8%	13 tps	1.9s	262K	$0.13	$0.52
45	68	Qwen Plus (Aug'24)	1115	±8	1.9K	2.6%	1.4%	53 tps	1.3s	30K	$0.40	$1.20
46	26	Claude Haiku 4.5 (Extended Thinking)	1115	±12	2.2K	2.7%	1.4%	115 tps	0.7s	200K	$1.00	$5.00
47	29	Qwen3 VL 235B A22B Instruct	1114	±8	1.3K	2.5%	3.1%	75 tps	1.9s	129K	$0.37	$1.81
48	21	Claude Opus 4 (Thinking)	1114	±8	770	3.1%	<0.1%	28 tps	1.3s	200K	$15.00	$75.00
49	71	Qwen3.5 397B A17B	1112	±24	580	2.5%	4.3%	57 tps	1.4s	256K	$0.52	$3.00
50	26	GPT-5 (High)	1110	±7	4.3K	3.1%	4.5%	81 tps	35.9s	400K	$1.25	$10.00
51	37	Kimi K2.5 Instant	1101	±28	495	1.0%	2.9%	32 tps	3.0s	262K	$0.50	$3.00
52	95	Gemini 2.5 Flash	1098	±9	4.8K	2.7%	1.3%	2 tps	3.7s	1M	$0.30	$2.50
53	68	Grok 4	1093	±5	5.7K	4.0%	3.9%	29 tps	11.1s	256K	$3.00	$15.00
54	48	Grok 4 Fast Reasoning	1090	±11	2.1K	3.1%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
55	21	Claude Opus 4	1087	±9	3.2K	2.9%	<0.1%	25 tps	1.5s	200K	$15.00	$75.00
56	56	Gemini 3.1 Flash Lite Preview Thinking	1083	±32	485	3.0%	1.7%	75 tps	4.7s	1M	$0.25	$1.50
57	33	Kimi K2.5	1083	±16	1.7K	3.2%	6.5%	33 tps	1.7s	262K	$0.34	$2.57
58	48	gpt-oss-120b	1083	±7	3.5K	3.0%	0.7%	213 tps	0.5s	131K	$0.11	$0.50
59	56	Claude Opus 4.1 (Thinking)	1083	±6	2K	4.1%	<0.1%	20 tps	3.9s	200K	$15.00	$75.00
60	44	Grok 4.1 Fast Reasoning	1076	±10	2.6K	4.2%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
61	71	GPT-5 Mini	1075	±9	2.1K	4.3%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
62	62	GPT-5.1 Instant	1075	±9	2.2K	2.6%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
63	93	DeepSeek V3 0324 Turbo	1074	±14	2.1K	1.9%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
64	71	Gemini 2.5 Flash Lite Preview 0925	1066	±11	2.2K	2.8%	1.2%	209 tps	0.7s	1M	$0.25	$0.35
65	86	Claude Sonnet 4	1066	±8	5.3K	2.5%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
66	42	Qwen3 Max Instruct Preview	1063	±7	2.7K	1.5%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
67	40	DeepSeek V3.2	1063	±16	1.1K	2.5%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
68	52	Claude Haiku 4.5	1057	±6	3.4K	3.4%	1.1%	100 tps	0.9s	200K	$1.00	$5.00
69	52	Grok 4 Fast Non-Reasoning	1054	±8	1.6K	2.5%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
70	81	GPT-4o	1046	±15	1.4K	2.5%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
71	48	OpenAI o1-mini	1045	±8	1.8K	3.5%	<0.1%	118 tps	N/A	128K	$1.13	$4.51
72	60	MiniMax M2.1	1044	±12	1.7K	2.8%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
73	95	Gemini 2.5 Flash Lite Thinking Preview 0925	1044	±9	1.7K	3.5%	1.5%	152 tps	3.0s	1M	$0.10	$0.40
74	62	MiniMax M2	1043	±9	1.8K	4.2%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
75	65	GLM 4.6	1041	±11	1.6K	2.9%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
76	44	DeepSeek V3.1 Terminus Chat	1037	±9	1.3K	2.2%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
77	56	DeepSeek V3.2 Thinking	1033	±15	1.7K	2.0%	9.0%	30 tps	2.6s	131K	$0.28	$0.42
78	77	Claude Opus 4.1	1032	±11	2K	2.7%	3.0%	17 tps	3.7s	200K	$15.00	$75.00
79	129	Qwen3 Max Thinking	1029	±31	600	2.4%	13.5%	32 tps	2.3s	256K	$1.20	$6.00
80	65	Mistral Large 3	1026	±22	1.1K	4.1%	2.1%	51 tps	1.0s	256K	$0.50	$1.50

View All (159 models)