Leaderboard | Text

Last updated about 1 month ago

Rank	Overall	Name	VIBE Score	Confidence Interval	Votes	Downvote %	Abort %	Speed	Latency	Context	Cost (Input)	Cost (Output)
41	48	Grok 4 Fast Reasoning	1090	±11	2.1K	3.1%	2.1%	102 tps	3.1s	2M	$0.30	$0.75
42	56	Gemini 3.1 Flash Lite Preview Thinking	1083	±32	485	3.0%	1.7%	75 tps	4.7s	1M	$0.25	$1.50
43	44	Grok 4.1 Fast Reasoning	1076	±10	2.6K	4.2%	1.5%	58 tps	7.3s	2M	$0.20	$0.50
44	71	GPT-5 Mini	1075	±9	2.1K	4.3%	2.6%	66 tps	14.2s	400K	$0.25	$2.00
45	62	GPT-5.1 Instant	1075	±9	2.2K	2.6%	1.3%	50 tps	1.9s	400K	$1.25	$10.00
46	93	DeepSeek V3 0324 Turbo	1074	±14	2.1K	1.9%	6.3%	12 tps	2.4s	164K	$0.73	$1.79
47	71	Gemini 2.5 Flash Lite Preview 0925	1066	±11	2.2K	2.8%	1.2%	209 tps	0.7s	1M	$0.25	$0.35
48	86	Claude Sonnet 4	1066	±8	5.3K	2.5%	1.8%	49 tps	1.3s	200K	$3.00	$15.00
49	42	Qwen3 Max Instruct Preview	1063	±7	2.7K	1.5%	1.1%	31 tps	1.7s	256K	$1.43	$6.61
50	40	DeepSeek V3.2	1063	±16	1.1K	2.5%	1.4%	83 tps	5.1s	131K	$0.43	$1.09
51	52	Claude Haiku 4.5	1057	±6	3.4K	3.4%	1.1%	100 tps	0.9s	200K	$1.00	$5.00
52	52	Grok 4 Fast Non-Reasoning	1054	±8	1.6K	2.5%	1.5%	93 tps	0.6s	2M	$0.27	$0.67
53	81	GPT-4o	1046	±15	1.4K	2.5%	1.0%	49 tps	2.4s	128K	$3.71	$12.57
54	60	MiniMax M2.1	1044	±12	1.7K	2.8%	2.1%	66 tps	2.6s	205K	$0.30	$1.20
55	95	Gemini 2.5 Flash Lite Thinking Preview 0925	1044	±9	1.7K	3.5%	1.5%	152 tps	3.0s	1M	$0.10	$0.40
56	62	MiniMax M2	1043	±9	1.8K	4.2%	2.2%	39 tps	2.3s	205K	$0.21	$0.85
57	65	GLM 4.6	1041	±11	1.6K	2.9%	5.4%	39 tps	1.5s	200K	$0.42	$1.66
58	44	DeepSeek V3.1 Terminus Chat	1037	±9	1.3K	2.2%	3.4%	27 tps	1.5s	131K	$0.86	$1.80
59	129	Qwen3 Max Thinking	1029	±31	600	2.4%	13.5%	32 tps	2.3s	256K	$1.20	$6.00
60	93	Qwen Max	1021	±14	1.8K	2.7%	1.5%	49 tps	1.5s	33K	$1.60	$6.40
61	79	Qwen3 Max Thinking Preview	1020	±10	1.2K	2.4%	3.1%	40 tps	2.1s	256K	$1.20	$6.00
62	86	DeepSeek V3.1 Chat	1018	±12	1.1K	3.1%	2.8%	21 tps	1.6s	131K	$0.38	$1.00
63	133	Kimi K2 0905	1014	±13	810	2.4%	4.0%	30 tps	1.4s	262K	$0.63	$2.39
64	101	Gemini 2.5 Flash Lite	1014	±9	5.3K	3.9%	1.3%	210 tps	0.7s	1M	$0.10	$0.40
65	86	Amazon Nova 2 Lite	1013	±18	815	4.7%	1.0%	137 tps	0.6s	300K	$0.35	$2.95
66	106	Grok 3	1003	±9	2K	2.6%	1.5%	53 tps	0.6s	1M	$3.67	$18.33
67	170	Kimi K2 0711	1002	±15	720	3.4%	1.6%	29 tps	1.3s	131K	$0.72	$2.60
68	113	Gemini 2.5 Flash Lite Thinking	996	±11	2.5K	3.7%	1.0%	118 tps	4.4s	1M	$0.03	$0.13
69	124	Kimi K2 0905 Turbo	991	±12	1.6K	1.8%	0.7%	373 tps	0.5s	262K	$1.70	$6.50
70	106	DeepSeek V3 0324	990	±11	2.1K	3.0%	5.8%	12 tps	2.7s	164K	$0.38	$0.93
71	113	Mistral Medium	989	±14	1.1K	2.7%	1.8%	48 tps	0.6s	33K	$1.48	$4.55
72	95	Kimi K2 Thinking	985	±21	620	3.1%	4.2%	61 tps	5.9s	262K	$0.24	$1.03
73	118	GPT-4.1 mini	976	±13	2.7K	1.8%	1.1%	67 tps	0.9s	1M	$0.34	$1.60
74	113	GLM 4.5	969	±19	1.3K	3.5%	3.7%	46 tps	1.4s	131K	$0.43	$1.63
75	84	GPT-5 Mini Minimal	968	±17	795	3.6%	1.2%	63 tps	1.4s	400K	$0.25	$2.00
76	148	OpenAI o3	960	±16	600	2.4%	0.9%	85 tps	6.8s	128K	$7.33	$29.33
77	129	DeepSeek V3.1 Thinking	958	±14	1K	2.4%	7.1%	18 tps	1.8s	131K	$0.23	$0.75
78	56	DeepSeek V3.1 Turbo	957	±14	820	4.1%	0.9%	173 tps	1.3s	164K	$2.00	$3.75
79	148	OpenAI o4-mini-high	950	±12	1.5K	3.8%	1.9%	117 tps	15.9s	200K	$1.10	$4.40
80	68	GLM 4.7	949	±13	1.6K	1.8%	5.8%	40 tps	1.5s	200K	$0.77	$1.73
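
The Confidence Interval column can help in reading adjacent ranks: when two models' score intervals overlap, the ordering between them may not be meaningful. Below is a minimal sketch of that check, assuming the rows are tab-separated as shown above and that the "±" value is a symmetric half-width around the VIBE Score. The page does not say how the interval is computed, so the overlap test is only a rough heuristic. `Entry`, `parse_row`, and `intervals_overlap` are hypothetical helper names, not part of any leaderboard API.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float
    half_width: float  # the "±" value from the Confidence Interval column


def parse_row(line: str) -> Entry:
    """Parse one tab-separated leaderboard row into an Entry."""
    fields = line.rstrip("\n").split("\t")
    # Column order: Rank, Overall, Name, VIBE Score, Confidence Interval, ...
    name, score, interval = fields[2], fields[3], fields[4]
    return Entry(name, float(score), float(interval.lstrip("±")))


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True when the two score intervals share at least one point.

    Assumption: each interval is [score - half_width, score + half_width].
    """
    return abs(a.score - b.score) <= a.half_width + b.half_width


# Three rows copied from the table above (ranks 41, 42, and 80).
rows = [
    "41\t48\tGrok 4 Fast Reasoning\t1090\t±11\t2.1K\t3.1%\t2.1%\t102 tps\t3.1s\t2M\t$0.30\t$0.75",
    "42\t56\tGemini 3.1 Flash Lite Preview Thinking\t1083\t±32\t485\t3.0%\t1.7%\t75 tps\t4.7s\t1M\t$0.25\t$1.50",
    "80\t68\tGLM 4.7\t949\t±13\t1.6K\t1.8%\t5.8%\t40 tps\t1.5s\t200K\t$0.77\t$1.73",
]
entries = [parse_row(r) for r in rows]

# Ranks 41 and 42 differ by 7 points, well inside their combined ±43.
print(intervals_overlap(entries[0], entries[1]))
# Ranks 41 and 80 differ by 141 points, far outside their combined ±24.
print(intervals_overlap(entries[0], entries[2]))
```

Under this heuristic, a wide interval such as the ±32 on a model with only 485 votes overlaps many of its neighbors, so its exact rank should be read with more caution than that of a model with a narrow interval.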
