| Rank | Model | Organization | Output price (CNY / M tokens) | Total score |
|------|-------|--------------|------------------------------|-------------|
| 1 | DeepSeek-R1 | DeepSeek | 16.0 | 87.34 |
| 2 | qwq-32b-preview | Alibaba | 7.0 | 77.85 |
| 3 | DeepSeek-R1-Distill-Qwen-32B | DeepSeek | 1.3 | 77.49 |
| 4 | qwen2.5-72b-instruct | Alibaba | 12.0 | 76.89 |
| 5 | qwen2.5-32b-instruct | Alibaba | 7.0 | 75.85 |
| 6 | deepseek-chat-v3 | DeepSeek | 8.0 | 75.03 |
| 7 | qwen2.5-14b-instruct | Alibaba | 6.0 | 72.77 |
| 8 | DeepSeek-R1-Distill-Qwen-14B | DeepSeek | 0.7 | 72.77 |
| 9 | DeepSeek-R1-Distill-Llama-70B | DeepSeek | 4.1 | 71.37 |
| 10 | internlm2_5-20b-chat | Shanghai AI Laboratory | 1.0 | 70.20 |
| 11 | Meta-Llama-3.1-405B-Instruct | Meta | 21.0 | 69.55 |
| 12 | qwen2.5-7b-instruct | Alibaba | 2.0 | 69.11 |
| 13 | internlm2_5-7b-chat | Shanghai AI Laboratory | 0.4 | 68.05 |
| 14 | Llama-3.3-70B-Instruct | Meta | 4.1 | 67.86 |
| 15 | glm-4-9b-chat | Zhipu AI | 0.6 | 67.12 |
| 16 | qwen2.5-math-72b-instruct | Alibaba | 12.0 | 67.03 |
| 17 | Llama-3.3-70B-Instruct-fp8 | Meta | 2.2 | 66.86 |
| 18 | Llama-3.1-Nemotron-70B-Instruct-fp8 | NVIDIA | 2.2 | 66.67 |
| 19 | Yi-1.5-34B-Chat | 01.AI | 1.3 | 66.64 |
| 20 | Hermes-3-Llama-3.1-405B | NousResearch | 5.8 | 65.65 |
| 21 | phi-4 | Microsoft | 1.0 | 62.92 |
| 22 | qwen2.5-3b-instruct | Alibaba | 0.0 | 58.64 |
| 23 | Yi-1.5-9B-Chat | 01.AI | 0.4 | 58.56 |
| 24 | gemma-2-27b-it | Google | 1.3 | 57.89 |
| 25 | gemma-2-9b-it | Google | 0.6 | 55.41 |
| 26 | Llama-3.1-8B-Instruct | Meta | 0.4 | 53.03 |
| 27 | DeepSeek-R1-Distill-Qwen-7B | DeepSeek | 0.4 | 52.42 |
| 28 | DeepSeek-R1-Distill-Llama-8B | DeepSeek | 0.4 | 52.35 |
| 29 | Mistral-Nemo-Instruct-2407 | Mistral | 0.6 | 52.24 |
| 30 | Meta-Llama-3.1-8B-Instruct-fp8 | Meta | 0.4 | 51.39 |
| 31 | qwen2.5-1.5b-instruct | Alibaba | 0.0 | 49.03 |
| 32 | Llama-3.2-3B-Instruct | Meta | 0.2 | 46.76 |
| 33 | Mistral-7B-Instruct-v0.3 | Mistral | 0.4 | 42.19 |
| 34 | DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek | 0.1 | 40.43 |
| 35 | qwen2.5-0.5b-instruct | Alibaba | 0.0 | 37.89 |
| 36 | Llama-3.2-1B-Instruct | Meta | 0.2 | 36.59 |
Full evaluation results for each sub-domain are available at: https://github.com/jeinlee1991/chinese-llm-benchmark