On the c10::Half vs. float dtype mismatch

科技颠覆者 · 2025-3-4 05:41:39


Related errors

# error-1 ; (all-no-half) self-attn RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
# error-2 : (embed-half) self-attn RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# error-3 : model(half) embed(no-half) conv RuntimeError: Input type (float) and bias type (c10::Half) should be the same
# error-4 : model(half) embed(half) conv RuntimeError: Input type (float) and bias type (c10::Half) should be the same
# cuda + half all : RuntimeError: Input type (float) and bias type (c10::Half) should be the same
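
All of these messages come from a single operation receiving tensors of different dtypes (or a dtype the backend does not implement). The sketch below is a minimal CPU reproduction of the first kind, assuming only a toy `nn.Linear` layer rather than the real model; `try_forward` is a hypothetical helper, and the exact wording of the message depends on the PyTorch version.

```python
import torch
import torch.nn as nn


def try_forward(module: nn.Module, x: torch.Tensor) -> str:
    """Run module(x) and return "ok", or the RuntimeError text if it fails."""
    try:
        module(x)
        return "ok"
    except RuntimeError as exc:
        return str(exc)


# A layer whose weights were converted to float16 (c10::Half) ...
half_layer = nn.Linear(4, 2).half()
# ... fed with a default float32 input, as in error-1.
float_input = torch.randn(1, 4)

message = try_forward(half_layer, float_input)
print(message)
```

Printing `half_layer.weight.dtype` and `float_input.dtype` next to the message makes the mismatch visible before touching the full model.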


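I ran into the errors above while running inference with a large model.
There are two things to consider first:

I want the model to run inference in half precision, so I load it as float16 in from_pretrained: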

self.llama_model = LlamaForCausalLM.from_pretrained(
    args.llama_model, torch_dtype=torch.float16,
)


I want the model to run inference on the GPU, but I had simply assumed that it would be loaded onto the GPU automatically...
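
`torch_dtype` only sets the dtype of the loaded weights; it does not pick a device. Unless a `device_map` is passed or `.cuda()` / `.to(...)` is called, the weights stay on the CPU. The following is a minimal sketch for checking this, using a toy `nn.Sequential` as a stand-in for the real model; `describe_placement` is a hypothetical helper, not part of PyTorch or transformers.

```python
import torch
import torch.nn as nn


def describe_placement(module: nn.Module) -> dict:
    """Collect the distinct devices and dtypes of a module's parameters."""
    devices = sorted({str(p.device) for p in module.parameters()})
    dtypes = sorted({str(p.dtype) for p in module.parameters()})
    return {"devices": devices, "dtypes": dtypes}


# Stand-in for a model loaded in half precision (assumption: toy layers only).
toy_model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2)).half()
before = describe_placement(toy_model)
print(before)  # the weights are half precision but still on the CPU

# Move explicitly; fall back to the CPU when no GPU is visible.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = toy_model.to(target)
after = describe_placement(toy_model)
print(after)
```

If more than one device or dtype shows up in the result, some submodule (for example a vision encoder or a projection layer) was probably left behind when the rest of the model was moved or converted.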

Solution

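Check whether the llama model was actually loaded onto the GPU. When the c10::Half type shows up in an error like these, the model is very likely running inference on the CPU, so moving it to the GPU makes the error go away.
When running inference, remember to wrap the call in autocast: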

with torch.cuda.amp.autocast():
    ...  # run inference here
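
Recent PyTorch releases also expose the same mechanism as `torch.autocast(device_type=...)`, which makes it possible to write one code path for both GPU and CPU. The sketch below assumes a toy float32 `nn.Linear` layer; `inference_autocast` is a hypothetical helper, and the choice of bfloat16 on CPU is an assumption made so the example runs without a GPU.

```python
import torch
import torch.nn as nn


def inference_autocast(device: torch.device):
    """Return an autocast context suited to the given device (hypothetical helper)."""
    if device.type == "cuda":
        # On CUDA, eligible ops such as linear and conv run in float16.
        return torch.autocast(device_type="cuda", dtype=torch.float16)
    # On CPU, autocast is used here with bfloat16 instead.
    return torch.autocast(device_type="cpu", dtype=torch.bfloat16)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = nn.Linear(4, 2).to(device)    # float32 weights
x = torch.randn(1, 4, device=device)  # float32 input

with torch.inference_mode(), inference_autocast(device):
    # Inside the context, the inputs of eligible ops are cast automatically,
    # so weights and activations no longer need to match by hand.
    y = layer(x)

print(y.dtype)
```

Autocast only casts the inputs of the operations it covers; it does not move tensors between devices, so the device check from the first step is still needed.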


Final code

Since this is a modification of R2GenGPT, the code is as follows:

import numpy as np
import torch
from PIL import Image

# parser, FieldParser and R2GenGPT are assumed to come from the R2GenGPT repository.


class Generator:
    def __init__(self):
        pass

    def generate(self, input_conv, img_list):
        raise NotImplementedError


class R2genGPT_shallow(Generator):
    def __init__(self):
        super().__init__()
        args = parser.parse_args()
        # args.precision = "fp16"
        args.delta_file = "../checkpoints/R2genGPT/shallow_checkpoint_step14102.pth"
        args.vision_model = "microsoft/swin-base-patch4-window7-224"
        args.llama_model = "../checkpoints/Llama-2-7b-chat-hf"
        self.filed_parser = FieldParser(args)
        self.model = R2GenGPT(args)
        self.model.eval()
        # Move the whole model to the GPU explicitly; it is not done automatically.
        self.model.cuda()
        print("device : ", self.model.device)

    def adapt(self, query):
        query = query.replace("<image>", " ")
        return query

    def get_image_tensor(self, img_file):
        with Image.open(img_file) as pil:
            array = np.array(pil, dtype=np.uint8)
            # Convert grayscale / RGBA images to 3-channel RGB.
            if array.shape[-1] != 3 or len(array.shape) != 3:
                array = np.array(pil.convert("RGB"), dtype=np.uint8)
            image = self.filed_parser._parse_image(array)
            image = image.to(self.model.device)
        return image

    def generate(self, query, img_list):
        self.model.llama_tokenizer.padding_side = "right"
        images = []
        for img_file in img_list:
            image = self.get_image_tensor(img_file)
            images.append(image.unsqueeze(0))
        self.model.prompt = self.adapt(query)
        img_embeds, atts_img = self.model.encode_img(images)
        img_embeds = self.model.layer_norm(img_embeds)
        img_embeds, atts_img = self.model.prompt_wrap(img_embeds, atts_img)
        batch_size = img_embeds.shape[0]
        # Prepend the BOS token embedding to the image embeddings.
        bos = torch.ones([batch_size, 1],
                         dtype=atts_img.dtype,
                         device=atts_img.device) * self.model.llama_tokenizer.bos_token_id
        bos_embeds = self.model.embed_tokens(bos)
        atts_bos = atts_img[:, :1]
        inputs_embeds = torch.cat([bos_embeds, img_embeds], dim=1)
        attention_mask = torch.cat([atts_bos, atts_img], dim=1)
        with torch.inference_mode():
            # autocast keeps float and half tensors compatible during generation.
            with torch.cuda.amp.autocast():
                outputs = self.model.llama_model.generate(
                    inputs_embeds=inputs_embeds,
                    num_beams=self.model.hparams.beam_size,
                    do_sample=self.model.hparams.do_sample,
                    min_new_tokens=self.model.hparams.min_new_tokens,
                    max_new_tokens=self.model.hparams.max_new_tokens,
                    repetition_penalty=self.model.hparams.repetition_penalty,
                    length_penalty=self.model.hparams.length_penalty,
                    temperature=self.model.hparams.temperature,
                )
        answer = self.model.decode(outputs[0])
        return answer
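
In the code above, `image.to(self.model.device)` changes only the device, so the image tensor keeps its original dtype; that is the situation described by the "Input type (float) and bias type (c10::Half)" message when autocast is not active. An alternative worth trying is to cast inputs to the dtype of the layer that consumes them. The sketch below is one hedged way to do that: `to_module_like` is a hypothetical helper, and float64 stands in for half precision so the example runs on a CPU-only machine.

```python
import torch
import torch.nn as nn


def to_module_like(tensor: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """Move a tensor to the module's device; also match dtype for float tensors."""
    ref = next(module.parameters())
    if tensor.is_floating_point():
        return tensor.to(device=ref.device, dtype=ref.dtype)
    # Integer tensors (e.g. token ids) keep their dtype and only change device.
    return tensor.to(device=ref.device)


# Stand-in for a vision encoder layer; float64 plays the role of half precision here.
conv = nn.Conv2d(3, 4, kernel_size=3).double()
image = torch.rand(1, 3, 8, 8)  # float32, like a freshly parsed image

out = conv(to_module_like(image, conv))
print(out.dtype, tuple(out.shape))
```

Whether explicit casting or autocast is the better fit depends on the model; explicit casting makes the dtype of every input visible, while autocast requires fewer code changes.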
