| Section | Parameter | finetune_llama3_1_8b.yaml | finetune_llama3_1_70b.yaml |
|---|---|---|---|
| Trainer | model_name | llama3_1_8b | llama3_1_70b |
| Optimizer | betas | [0.9, 0.95] | [0.9, 0.999] |
| Parallel config | data_parallel | 8 | 1 |
| | model_parallel | 1 | 8 |
| | pipeline_stage | 1 | 8 |
| | use_seq_parallel | False | True |
| | micro_batch_num | 1 | 256 |
| | vocab_emb_dp | True | False |
| Recompute config | recompute | True | False |
| | select_recompute | False | [10, 8, 6, 4, 2, 0, 0, 0] |
| | select_comm_recompute | Not set | [10, 8, 6, 4, 2, 0, 0, 0] |
| Context | max_device_memory | 58GB | 52.5GB |
| | mempool_block_size | Not set | 52.5GB |
| Model config | hidden_size | 4096 | 8192 |
| | num_layers | 32 | 80 |
| | num_heads | 32 | 64 |
| | ffn_dim_multiplier | Not set | 1.3 |
| | multiple_of | Not set | 256 |
| | param_init_type | float16 | float32 |
| | fine_grain_interleave | 1 | 2 |
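For orientation, here is a minimal sketch of how the 70B fine-tune values above could map onto the YAML file. Only the values come from the table; the section names and nesting (parallel_config, recompute_config, context, model.model_config) are assumptions based on the typical MindFormers config layout, so check them against the shipped finetune_llama3_1_70b.yaml.

```yaml
# Illustrative excerpt only: values are taken from the comparison table,
# the section nesting is an assumed MindFormers-style layout.
parallel_config:
  data_parallel: 1
  model_parallel: 8
  pipeline_stage: 8
  use_seq_parallel: True
  micro_batch_num: 256
  vocab_emb_dp: False

recompute_config:
  recompute: False
  select_recompute: [10, 8, 6, 4, 2, 0, 0, 0]
  select_comm_recompute: [10, 8, 6, 4, 2, 0, 0, 0]

context:
  max_device_memory: "52.5GB"
  mempool_block_size: "52.5GB"

model:
  model_config:
    hidden_size: 8192
    num_layers: 80
    num_heads: 64
    ffn_dim_multiplier: 1.3
    multiple_of: 256
    param_init_type: "float32"
    fine_grain_interleave: 2
```

By contrast, the 8B file in the table keeps data_parallel at 8 with model_parallel and pipeline_stage at 1, so it relies on pure data parallelism and does not need sequence parallelism or selective recomputation.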
| Section | Parameter | predict_llama3_1_8b.yaml | predict_llama3_1_70b.yaml |
|---|---|---|---|
| trainer | model_name | llama3_1_8b | llama3_1_70b |
| | use_parallel | False | True |
| parallel_config | model_parallel | 1 | 4 |
| model_config | seq_length | 512 | 8192 |
| | hidden_size | 4096 | 8192 |
| | num_layers | 32 | 80 |
| | num_heads | 32 | 64 |
| | ffn_dim_multiplier | Not set | 1.3 |
| | multiple_of | Not set | 256 |
| | is_dynamic | True | Not set (defaults to False) |
| | fine_grain_interleave | 1 | Not set |
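A corresponding sketch for the inference side, again with only the values taken from the table and the nesting assumed (placing use_parallel at the top level is also an assumption): the 8B predict config runs on a single device with dynamic shapes, while the 70B config shards the model across 4 devices with a fixed seq_length of 8192.

```yaml
# predict_llama3_1_8b.yaml (illustrative excerpt; nesting assumed)
use_parallel: False
model:
  model_config:
    seq_length: 512
    hidden_size: 4096
    num_layers: 32
    num_heads: 32
    is_dynamic: True
    fine_grain_interleave: 1
---
# predict_llama3_1_70b.yaml (illustrative excerpt; nesting assumed)
use_parallel: True
parallel_config:
  model_parallel: 4
model:
  model_config:
    seq_length: 8192
    hidden_size: 8192
    num_layers: 80
    num_heads: 64
    ffn_dim_multiplier: 1.3
    multiple_of: 256
```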