ToB企服应用市场: ToB reviews and business networking platform
Title:
Mismatch between the c10::Half and float dtypes
Author:
科技颠覆者
Posted:
11 hours ago
Related errors
# error-1 ; (all-no-half) self-attn RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
# error-2 : (embed-half) self-attn RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# error-3 : model(half) embed(no-half) conv RuntimeError: Input type (float) and bias type (c10::Half) should be the same
# error-4 : model(half) embed(half) conv RuntimeError: Input type (float) and bias type (c10::Half) should be the same
# cuda + half all : RuntimeError: Input type (float) and bias type (c10::Half) should be the same
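The first and third errors are plain dtype mismatches: a float32 tensor meets float16 weights. Below is a minimal sketch of the same kind of failure. It assumes a small stand-in `torch.nn.Linear` layer rather than the actual LLaMA model, and the exact error text varies between PyTorch versions.

```python
import torch

# A half-precision layer fed a float32 input: the dtypes disagree.
# The layer is a stand-in for illustration, not the model from this post.
layer = torch.nn.Linear(4, 4).half()
x = torch.randn(2, 4)  # float32 by default

try:
    layer(x)
    mismatch_error = None
except RuntimeError as exc:
    # The message wording depends on the PyTorch version.
    mismatch_error = str(exc)

print(mismatch_error)
```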
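I ran into the errors above while running inference with a large model.
First, there is some background to consider:
I wanted the model to run inference in half precision, so I loaded it as float16 in from_pretrained: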
self.llama_model = LlamaForCausalLM.from_pretrained(
    args.llama_model,
    torch_dtype=torch.float16,
)
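I also wanted the model to run inference on the GPU, but I simply assumed it would be loaded onto the GPU automatically. It is not: unless told otherwise, from_pretrained leaves the weights on the CPU.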
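Solution
Check that the llama model is actually loaded on the GPU. When the c10::Half type shows up in errors like these, the model is very likely running inference on the CPU, so moving it to the GPU makes the errors go away.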
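One way to run this check is to look at the device and dtype of the model's parameters. This is a sketch only: `summarize_placement` is a hypothetical helper, and the small `torch.nn.Linear` layer stands in for the real model.

```python
import torch


def summarize_placement(module: torch.nn.Module) -> dict:
    """Collect the device types and dtypes used by a module's parameters."""
    devices = {p.device.type for p in module.parameters()}
    dtypes = {p.dtype for p in module.parameters()}
    return {"devices": devices, "dtypes": dtypes}


# Stand-in for the real model: a half-precision layer that was never moved.
toy_model = torch.nn.Linear(8, 8).half()
placement = summarize_placement(toy_model)
print(placement)

# Half-precision weights that still sit on the CPU are the situation
# described above; move them explicitly when a GPU is present.
if placement["devices"] == {"cpu"} and torch.cuda.is_available():
    toy_model = toy_model.cuda()
```

If the reported devices contain only `cpu`, the model has not been moved and an explicit `.cuda()` call is still needed.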
When running inference, remember to wrap the call in autocast:
with torch.cuda.amp.autocast():
    ...
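The two steps together (explicit move to the GPU, then autocast around the forward pass) could look roughly like the sketch below. It makes several assumptions: `run_inference` is a hypothetical helper, a small `torch.nn.Linear` stands in for the model, and it uses `torch.autocast(device_type=...)`, the device-generic form of `torch.cuda.amp.autocast()`. Without a GPU it falls back to float32 on the CPU so that it still runs.

```python
import torch


def run_inference(module: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Run a forward pass in fp16 on the GPU, or in fp32 on the CPU as a fallback."""
    if torch.cuda.is_available():
        device = torch.device("cuda")
        module = module.half().to(device)  # explicit move; nothing happens automatically
    else:
        device = torch.device("cpu")
        module = module.float()  # keep full precision on the CPU

    inputs = inputs.to(device)  # inputs may still be float32 here
    use_autocast = device.type == "cuda"
    with torch.inference_mode():
        # Under CUDA autocast, eligible ops such as linear and conv cast their
        # inputs to float16, so float32 inputs and half weights no longer clash.
        with torch.autocast(device_type=device.type, enabled=use_autocast):
            return module(inputs)


out = run_inference(torch.nn.Linear(4, 2), torch.randn(3, 4))
print(out.dtype, out.device)
```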
Final code
This is adapted from R2genGPT, so the code looks like this:
import numpy as np
import torch
from PIL import Image

# parser, FieldParser and R2GenGPT come from the R2GenGPT code base;
# their import lines are not shown in the original post.


class Generator:
    def __init__(self):
        pass

    def generate(self, input_conv, img_list):
        raise NotImplementedError


class R2genGPT_shallow(Generator):
    def __init__(self):
        super().__init__()
        args = parser.parse_args()
        # args.precision = "fp16"
        args.delta_file = "../checkpoints/R2genGPT/shallow_checkpoint_step14102.pth"
        args.vision_model = "microsoft/swin-base-patch4-window7-224"
        args.llama_model = "../checkpoints/Llama-2-7b-chat-hf"
        self.filed_parser = FieldParser(args)
        self.model = R2GenGPT(args)
        self.model.eval()
        # Move the model to the GPU explicitly; this does not happen automatically.
        self.model.cuda()
        print("device : ", self.model.device)

    def adapt(self, query):
        query = query.replace("<image>", " ")
        return query

    def get_image_tensor(self, img_file):
        with Image.open(img_file) as pil:
            array = np.array(pil, dtype=np.uint8)
            # Convert grayscale or RGBA images to 3-channel RGB.
            if array.shape[-1] != 3 or len(array.shape) != 3:
                array = np.array(pil.convert("RGB"), dtype=np.uint8)
            image = self.filed_parser._parse_image(array)
            # Keep the image on the same device as the model.
            image = image.to(self.model.device)
        return image

    def generate(self, query, img_list):
        self.model.llama_tokenizer.padding_side = "right"
        images = []
        for img_file in img_list:
            image = self.get_image_tensor(img_file)
            images.append(image.unsqueeze(0))
        self.model.prompt = self.adapt(query)
        img_embeds, atts_img = self.model.encode_img(images)
        img_embeds = self.model.layer_norm(img_embeds)
        img_embeds, atts_img = self.model.prompt_wrap(img_embeds, atts_img)
        batch_size = img_embeds.shape[0]
        bos = torch.ones([batch_size, 1],
                         dtype=atts_img.dtype,
                         device=atts_img.device) * self.model.llama_tokenizer.bos_token_id
        bos_embeds = self.model.embed_tokens(bos)
        atts_bos = atts_img[:, :1]
        inputs_embeds = torch.cat([bos_embeds, img_embeds], dim=1)
        attention_mask = torch.cat([atts_bos, atts_img], dim=1)
        with torch.inference_mode():
            # autocast lets float32 inputs run through the float16 weights.
            with torch.cuda.amp.autocast():
                outputs = self.model.llama_model.generate(
                    inputs_embeds=inputs_embeds,
                    num_beams=self.model.hparams.beam_size,
                    do_sample=self.model.hparams.do_sample,
                    min_new_tokens=self.model.hparams.min_new_tokens,
                    max_new_tokens=self.model.hparams.max_new_tokens,
                    repetition_penalty=self.model.hparams.repetition_penalty,
                    length_penalty=self.model.hparams.length_penalty,
                    temperature=self.model.hparams.temperature,
                )
        answer = self.model.decode(outputs[0])
        return answer
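The code above relies on autocast to reconcile float32 inputs with the float16 weights. An alternative, which the original code does not use, is to cast inputs explicitly to the dtype and device of the module that consumes them. The sketch below assumes a hypothetical helper `to_module_dtype` and a small `torch.nn.Conv2d` standing in for the vision encoder.

```python
import torch


def to_module_dtype(tensor: torch.Tensor, module: torch.nn.Module) -> torch.Tensor:
    """Move a tensor to the module's device; cast it too if it is floating point."""
    reference = next(module.parameters())
    if tensor.is_floating_point():
        return tensor.to(device=reference.device, dtype=reference.dtype)
    # Integer tensors (token ids, attention masks) only need to change device.
    return tensor.to(device=reference.device)


# Stand-ins for illustration: a half-precision conv and a float32 image batch.
conv = torch.nn.Conv2d(3, 4, kernel_size=3).half()
image = torch.randn(1, 3, 8, 8)
aligned = to_module_dtype(image, conv)
token_ids = to_module_dtype(torch.ones(1, 1, dtype=torch.long), conv)
print(aligned.dtype, token_ids.dtype)
```

Casting explicitly makes the dtype visible at the call site, while autocast handles it per operation; which one fits better depends on how the surrounding code is structured.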