Related errors
- error-1 (all-no-half), self-attn: RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
- error-2 (embed-half), self-attn: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
- error-3, model(half) embed(no-half), conv: RuntimeError: Input type (float) and bias type (c10::Half) should be the same
- error-4, model(half) embed(half), conv: RuntimeError: Input type (float) and bias type (c10::Half) should be the same
- cuda + half all: RuntimeError: Input type (float) and bias type (c10::Half) should be the same
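I ran into the errors above while running inference with a large model.
First, there are two points to consider:
- I wanted the model to run inference in half precision, so I loaded it as float16 in from_pretrained:

    self.llama_model = LlamaForCausalLM.from_pretrained(
        args.llama_model,
        torch_dtype=torch.float16,
    )

- I wanted the model to run inference on the GPU, but I had simply assumed it would be loaded onto the GPU automatically...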
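The second assumption is the one that causes trouble: `torch_dtype` only selects the parameter dtype, and unless a `device_map` is passed or the model is moved explicitly, the weights stay on the CPU. A minimal sketch of how to check where a model actually lives, using a small `nn.Linear` as a stand-in for the LLaMA model (the helper name `describe` is hypothetical, not part of any library):

```python
import torch
from torch import nn


def describe(module: nn.Module):
    """Return (device type, dtype) of the module's first parameter."""
    param = next(module.parameters())
    return param.device.type, param.dtype


# Stand-in for a model loaded with torch_dtype=torch.float16:
# the weights are half precision, but nothing has moved them to the GPU.
toy = nn.Linear(4, 4).to(torch.float16)
print(describe(toy))
```

Calling the same helper on the real model right after `from_pretrained` should show quickly whether a `.cuda()` call is still missing.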
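Solution
- Check that the LLaMA model is actually loaded on the GPU. When the c10::Half type shows up in these errors, the model is very likely running inference on the CPU, so moving it to the GPU makes the error go away.
- Remember to wrap model inference in autocast:

    with torch.cuda.amp.autocast():
        ...  # run inference here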
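The two steps above can be sketched together on a toy layer. This is only an illustration under assumptions: `nn.Linear` stands in for the half-precision LLaMA weights, the float32 tensor stands in for inputs produced in full precision, and the CPU/bfloat16 fallback exists only so the snippet runs on a machine without CUDA.

```python
import torch
from torch import nn

# Prefer the GPU; fall back to CPU so the sketch still runs without CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 is the usual autocast dtype on CUDA; CPU autocast is typically used with bfloat16.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Stand-in for the low-precision model weights, moved explicitly to the device.
layer = nn.Linear(8, 8).to(device=device, dtype=amp_dtype)
layer.eval()

# float32 activations on the same device as the weights.
x = torch.randn(2, 8, device=device)

with torch.inference_mode(), torch.autocast(device_type=device, dtype=amp_dtype):
    y = layer(x)  # autocast casts x to amp_dtype before the matmul

print(y.device.type, y.dtype)
```

Recent PyTorch releases emit a deprecation warning for `torch.cuda.amp.autocast()` and point to `torch.amp.autocast("cuda")` (equivalently `torch.autocast("cuda")`) instead, which is why the sketch uses the device-agnostic form.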
Final code
This is a modification of R2GenGPT, so the code looks like this:
    import numpy as np
    import torch
    from PIL import Image

    # Project imports: these module paths are assumed from the R2GenGPT
    # repository layout; adjust them to match your checkout.
    from configs.config import parser
    from dataset.data_helper import FieldParser
    from models.R2GenGPT import R2GenGPT


    class Generator:
        def __init__(self):
            pass

        def generate(self, input_conv, img_list):
            raise NotImplementedError


    class R2genGPT_shallow(Generator):
        def __init__(self):
            super().__init__()
            args = parser.parse_args()
            # args.precision = "fp16"
            args.delta_file = "../checkpoints/R2genGPT/shallow_checkpoint_step14102.pth"
            args.vision_model = "microsoft/swin-base-patch4-window7-224"
            args.llama_model = "../checkpoints/Llama-2-7b-chat-hf"
            self.filed_parser = FieldParser(args)
            self.model = R2GenGPT(args)
            self.model.eval()
            # Move the whole model to the GPU explicitly; it is not placed there automatically.
            self.model.cuda()
            print("device : ", self.model.device)

        def adapt(self, query):
            # Strip the "<image>" placeholder from the prompt.
            query = query.replace("<image>", " ")
            return query

        def get_image_tensor(self, img_file):
            with Image.open(img_file) as pil:
                array = np.array(pil, dtype=np.uint8)
                # Convert grayscale / RGBA images to 3-channel RGB.
                if array.shape[-1] != 3 or len(array.shape) != 3:
                    array = np.array(pil.convert("RGB"), dtype=np.uint8)
                image = self.filed_parser._parse_image(array)
                # Keep the input on the same device as the model.
                image = image.to(self.model.device)
            return image

        def generate(self, query, img_list):
            self.model.llama_tokenizer.padding_side = "right"
            images = []
            for img_file in img_list:
                image = self.get_image_tensor(img_file)
                images.append(image.unsqueeze(0))

            self.model.prompt = self.adapt(query)
            img_embeds, atts_img = self.model.encode_img(images)
            img_embeds = self.model.layer_norm(img_embeds)
            img_embeds, atts_img = self.model.prompt_wrap(img_embeds, atts_img)
            batch_size = img_embeds.shape[0]
            bos = torch.ones([batch_size, 1],
                             dtype=atts_img.dtype,
                             device=atts_img.device) * self.model.llama_tokenizer.bos_token_id
            bos_embeds = self.model.embed_tokens(bos)
            atts_bos = atts_img[:, :1]
            inputs_embeds = torch.cat([bos_embeds, img_embeds], dim=1)
            # Note: attention_mask is built here but is not passed to generate() below.
            attention_mask = torch.cat([atts_bos, atts_img], dim=1)
            with torch.inference_mode():
                # autocast lets float32 embeddings flow through the float16 LLaMA weights.
                with torch.cuda.amp.autocast():
                    outputs = self.model.llama_model.generate(
                        inputs_embeds=inputs_embeds,
                        num_beams=self.model.hparams.beam_size,
                        do_sample=self.model.hparams.do_sample,
                        min_new_tokens=self.model.hparams.min_new_tokens,
                        max_new_tokens=self.model.hparams.max_new_tokens,
                        repetition_penalty=self.model.hparams.repetition_penalty,
                        length_penalty=self.model.hparams.length_penalty,
                        temperature=self.model.hparams.temperature,
                    )

            answer = self.model.decode(outputs[0])
            return answer
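The conv errors listed at the top ("Input type (float) and bias type (c10::Half) should be the same") come from a float32 input meeting half-precision weights. The code above handles the dtype mismatch with autocast; a different, more explicit option, which is not what this post uses, is to cast each input to the device and dtype of the module that consumes it. A minimal sketch, where `match_module` is a hypothetical helper name:

```python
import torch
from torch import nn


def match_module(tensor: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """Move `tensor` to the device and dtype of the module's first parameter."""
    param = next(module.parameters())
    return tensor.to(device=param.device, dtype=param.dtype)


conv = nn.Conv2d(3, 8, kernel_size=3).to(torch.float16)  # half-precision weights
image = torch.rand(1, 3, 16, 16)                          # float32 input
image = match_module(image, conv)
print(image.dtype == conv.weight.dtype, image.device == conv.weight.device)
```

The sketch only aligns the tensors and does not run the convolution, since half-precision kernels on the CPU depend on the PyTorch version.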