Running it immediately throws an error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB. GPU 0 has a total capacity of 23.53 GiB of which 158.56 MiB is free. Including non-PyTorch memory, this process has 23.36 GiB memory in use. Of the allocated memory 22.99 GiB is allocated by PyTorch, and 1.24 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The machine has two NVIDIA 4090 GPUs with 24 GB of VRAM each, but loading the model exceeds a single GPU's capacity (the 23.53 GiB on GPU 0 is almost fully used). Qwen/QwQ-32B is a very large model (its unquantized weights alone need roughly 60 GB+ of VRAM), so one 24 GB card cannot hold it; the model must be sharded across multiple GPUs (and, given the total budget, quantized or partially offloaded as well).
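The "60 GB+" figure follows from simple arithmetic; a quick sanity check (the ~32.5B parameter count is an assumption based on the model name):

```python
# Rough VRAM estimate for the weights of a 32B-parameter model.
params = 32.5e9            # approximate parameter count (assumption)
bytes_per_param = 2        # bf16 / fp16 storage
weights_gib = params * bytes_per_param / 1024**3
print(f"bf16 weights alone: {weights_gib:.1f} GiB")

# Even both 4090s combined (2 x 24 GiB) cannot hold unquantized bf16 weights,
# which is why quantization (e.g. 4-bit -> ~1/4 the size) is also needed.
total_gpu_gib = 2 * 24
print(f"fits in {total_gpu_gib} GiB unquantized? {weights_gib < total_gpu_gib}")
print(f"approx. 4-bit size: {weights_gib / 4:.1f} GiB")
```

Note the bf16 weights (~60 GiB) exceed even the combined 48 GiB, before counting activations and KV cache, so multi-GPU sharding alone is not sufficient without reducing precision.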
Below are solutions for multi-GPU execution, matched to your hardware (2x 4090, 48 GB total VRAM):
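As one concrete sketch (not necessarily the exact setup used below): Hugging Face `transformers` with `device_map="auto"` shards the layers across both GPUs, and 4-bit quantization via `bitsandbytes` brings the weights down to a size that fits in 48 GB. The `max_memory` values are illustrative headroom limits, and this assumes `accelerate` and `bitsandbytes` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/QwQ-32B"

# bf16 weights (~60 GiB) exceed 2x24 GB, so quantize to 4-bit (~15 GiB).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # shard layers across GPU 0 and GPU 1
    max_memory={0: "22GiB", 1: "22GiB"},   # leave per-card headroom for activations
)
```

Setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the environment, as the error message suggests, can additionally reduce fragmentation, but it will not fix a genuine capacity shortfall like this one.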