GPT model | Parameters | Layers | Heads | Hidden size | LR | Batch size (tokens) |
1.3B Dense | 1.3B | 24 | 32 | 2048 | 2e-4 | 1M |
2.7B Dense | 2.7B | 32 | 32 | 2560 | 1.6e-4 | 1M |
3.6B Dense | 3.6B | 30 | 32 | 3072 | 1.6e-4 | 1M |
0.35B+MoE-64 | 6.7B | 24 | 16 | 1024 | 3e-4 | 0.5M |
1.3B+MoE-32 | 13B | 24 | 32 | 2048 | 2e-4 | 1M |
1.3B+MoE-64 | 27B | 24 | 32 | 2048 | 1.6e-4 | 1M |
2.7B+MoE-64 | 56B | 32 | 32 | 2560 | 1.6e-4 | 1M |
3.6B+MoE-64 | 75B | 30 | 32 | 3072 | 1.6e-4 | 1M |
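The MoE parameter counts in the table above can be roughly reproduced from the dense settings. Here is a minimal sketch, under assumptions the table itself does not state: every other transformer layer is an MoE layer, each expert is a standard feed-forward block with two `hidden x 4*hidden` matrices, and biases and gating weights are ignored. The function name `estimate_moe_params` is hypothetical and only used for illustration.

```python
def estimate_moe_params(base_params, layers, hidden, experts):
    """Rough total parameter count for a dense GPT extended with MoE layers.

    Assumptions (not taken from the table):
      - every other transformer layer is an MoE layer,
      - each expert is an FFN with two hidden x 4*hidden weight matrices,
      - biases and gating weights are ignored.
    """
    ffn_params = 8 * hidden * hidden   # hidden*4*hidden + 4*hidden*hidden
    moe_layers = layers // 2           # assumed: MoE on every other layer
    # One expert per MoE layer is already counted in the dense base model,
    # so only the (experts - 1) additional copies add parameters.
    extra = (experts - 1) * moe_layers * ffn_params
    return base_params + extra


# (name, dense base params, layers, hidden size, experts, listed total)
rows = [
    ("0.35B+MoE-64", 0.35e9, 24, 1024, 64, 6.7e9),
    ("1.3B+MoE-32", 1.3e9, 24, 2048, 32, 13e9),
    ("1.3B+MoE-64", 1.3e9, 24, 2048, 64, 27e9),
    ("2.7B+MoE-64", 2.7e9, 32, 2560, 64, 56e9),
    ("3.6B+MoE-64", 3.6e9, 30, 3072, 64, 75e9),
]

for name, base, layers, hidden, experts, listed in rows:
    est = estimate_moe_params(base, layers, hidden, experts)
    print(f"{name}: estimated {est / 1e9:.1f}B, listed {listed / 1e9:.1f}B")
```

Under these assumptions the estimates land within a few percent of the listed totals, which suggests (but does not prove) that the models follow this layout.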
Model | Latency (ms) | Memory (MB) | Number of GPUs |
1.3B Dense | 399.66 | 9476 | 1 |
2.7B Dense | 753.37 | 17340 | 1 |
3.6B Dense | 777.54 | 22558 | 1 |
0.35B+MoE-64 | 356.22 | 15772 | 1 |
1.3B+MoE-32 | 581.34 | 33294 | 1 |
1.3B+MoE-64 | 586.18 | 57880 |