Artificial Intelligence - Datawhale X ModelScope (魔搭) AI Summer Camp: AIGC Text-to-Image Track, Task 3 Notes

干翻全岛蛙蛙, posted on 2025-1-9 20:38:50

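#Contents#

I. ComfyUI Installation and Practice

                (1) What Is ComfyUI

                (2) ComfyUI Core Modules

                (3) ComfyUI Image Generation Flow

                (4) Advantages of ComfyUI

                (5) Installing ComfyUI in 20 Minutes

                (6) A First Taste of ComfyUI Workflows

II. LoRA Installation and Practice

                (1) What Is LoRA Fine-Tuning

                (2) How LoRA Fine-Tuning Works

                (3) Advantages of LoRA Fine-Tuning

                (4) LoRA Fine-Tuning Code Walkthrough

                (5) How the UNet, VAE, and Text Encoder Work Together

III. Self-Study Resources Summary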

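I. ComfyUI Installation and Practice

(1) What Is ComfyUI

GUI stands for "Graphical User Interface". Put simply, a GUI is the kind of interaction you see on a computer screen, with icons, buttons, and menus.
ComfyUI is one such GUI. It is a node-based user interface used mainly to drive image generation. What makes ComfyUI special is its modular design: it breaks the image generation process into many small steps, and each step is a node. The nodes can be connected into a workflow, so users can customize their own image generation process as needed.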
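(2) ComfyUI Core Modules

The core modules are the model loader, the prompt manager, the sampler, and the decoder.
https://i-blog.csdnimg.cn/direct/2f6d4fd0b11a4efc8bcc8e75590a1773.png
Model loader: Load Checkpoint loads the base model file, which contains three parts: Model, CLIP, and VAE.
https://i-blog.csdnimg.cn/direct/801be929ed6746b8ad0bfaa8d72b3f93.png
Prompt manager: the CLIP module converts text input into a latent space embedding that the model can understand and take as input.
https://i-blog.csdnimg.cn/direct/e0d60c032dcb4649ae2c5e10e8e3cbbc.png
Decoder: the VAE module decodes the embedding in latent space into a pixel-level image.
https://i-blog.csdnimg.cn/direct/9edfda9549264fc4a10315a85e2fb275.png
Sampler: controls how the model generates the image. Different sampling settings affect the quality and diversity of the final output, and the sampler lets you balance generation speed against quality.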
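Basic principle of Stable Diffusion
Stable Diffusion works by denoising: it starts from a noisy signal (for example, an image of pure noise) and turns it into a noise-free signal (for example, an image a person can understand). The denoising process involves many sampling steps. The sampling parameters are set in KSampler:
1) seed: the random seed that controls how the noise is generated
2) control_after_generate: controls how the seed changes after each generation
3) steps: the number of denoising iterations; more steps generally give a more refined result, at the cost of a longer generation time
4) cfg: classifier free guidance, which determines how strongly the prompt influences the final image. A higher value makes the image follow the description in the prompt more closely.
5) denoise: how much of the content is covered by noise
6) sampler_name, scheduler: parameters that select the denoising algorithm and its noise schedule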
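To make the roles of seed, steps, and cfg more concrete, here is a toy sketch in plain Python. It is not ComfyUI's KSampler: the function names, the list-based "signal", and the linear update rule in toy_denoise are assumptions made purely for illustration. Only the guidance formula, unconditional + cfg * (conditional - unconditional), follows the usual definition of classifier-free guidance.

```python
import random


def initial_noise(seed, size):
    """Same seed -> same starting noise, which is why a fixed seed is reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def cfg_combine(uncond, cond, cfg):
    """Classifier-free guidance: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


def toy_denoise(noise, target, steps):
    """Hypothetical stand-in for a sampler: walk from noise to target in `steps` updates."""
    x = list(noise)
    for k in range(steps):
        remaining = steps - k
        # Each update closes 1/remaining of the gap, so the last one lands on the target.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
    return x


if __name__ == "__main__":
    noise = initial_noise(seed=42, size=4)
    print(noise == initial_noise(seed=42, size=4))   # fixed seed, identical noise
    print(cfg_combine([0.0, 0.0], [1.0, 2.0], cfg=5.0))
    print(toy_denoise(noise, target=[0.5] * 4, steps=25))
```

With cfg set to 1 the combined prediction equals the prompt-conditioned one, and with cfg set to 0 the prompt is ignored entirely; values above 1 exaggerate the difference, which is the sense in which a higher cfg follows the prompt more closely.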
(3) ComfyUI Image Generation Flow

https://i-blog.csdnimg.cn/direct/2db434d84c3d4bb885a4e90f3875e80f.png
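(4) Advantages of ComfyUI

1) Modularity and flexibility: ComfyUI provides a modular system in which users build complex workflows by dragging and dropping modules. This flexibility lets users freely combine and adjust models, inputs, outputs, and other processing steps to suit their needs.
2) Visual interface: ComfyUI provides an intuitive graphical interface that makes complex AI models and data flows easier to understand and operate. This is especially helpful for users without a programming background, who can build and manage workflows with little effort.
3) Multi-model support: ComfyUI supports several different generative models. Users can integrate and switch between models on one platform, which opens up a wider range of use cases.
4) Debugging and optimization: the visual interface makes it easier to debug the generation process. Users can trace the data flow, spot and fix problems, and improve the generated results.
5) Open and extensible: ComfyUI is an open-source project and is highly extensible. Developers can write new modules or plugins, extend the system, and customize it to project needs.
6) User-friendliness: despite its power, ComfyUI stays user-friendly. Even complex tasks can be completed in a relatively simple way, which makes it a capable tool for managing generative AI workflows.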
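(5) Installing ComfyUI in 20 Minutes
As before, we use the Notebook and the free GPU compute trial provided by the ModelScope community to try out ComfyUI.
Step 1: Choose and start the environment (about 2-3 min)
https://i-blog.csdnimg.cn/direct/2f5573161b2d47c4bceb1945f7660068.png
Step 2: Enter the following commands to install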
git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
https://i-blog.csdnimg.cn/direct/abeec7f73e4a4666ab7251bd014bbbdf.png
https://i-blog.csdnimg.cn/direct/250e7844cbe64a0499c5572cafd03374.png
Step 3: Open the installation file and run the installation (about 15-20 min)
https://i-blog.csdnimg.cn/direct/eaa9720b04ef46e78100a6bedf9967bb.png
https://i-blog.csdnimg.cn/direct/ad20d173bbb44c49ae03d9cc10895a0d.png
https://i-blog.csdnimg.cn/direct/5f9f5e63525b4c5298070c10b9e007ab.png
Step 4: Copy the link and open it
When the last cell finishes running and prints an access link, copy that link into your browser.
https://i-blog.csdnimg.cn/direct/72c7bf31b68349adb3b7d53341dc76d2.png
https://i-blog.csdnimg.cn/direct/9bafd8fadb254ccbb995290fd8e04096.png
(6) A First Taste of ComfyUI Workflows

{
"last_node_id": 15,
"last_link_id": 18,
"nodes": [
{
   "id": 11,
   "type": "VAELoader",
   "pos": [
   1323,
   240
   ],
   "size": {
   "0": 315,
   "1": 58
   },
   "flags": {},
   "order": 0,
   "mode": 0,
   "outputs": [
   {
      "name": "VAE",
      "type": "VAE",
      "links": [
         12
      ],
      "shape": 3
   }
   ],
   "properties": {
   "Node name for S&R": "VAELoader"
   },
   "widgets_values": [
   "sdxl.vae.safetensors"
   ]
},
{
   "id": 10,
   "type": "VAEDecode",
   "pos": [
   1368,
   369
   ],
   "size": {
   "0": 210,
   "1": 46
   },
   "flags": {},
   "order": 6,
   "mode": 0,
   "inputs": [
   {
      "name": "samples",
      "type": "LATENT",
      "link": 18
   },
   {
      "name": "vae",
      "type": "VAE",
      "link": 12,
      "slot_index": 1
   }
   ],
   "outputs": [
   {
      "name": "IMAGE",
      "type": "IMAGE",
      "links": [
         13
      ],
      "shape": 3,
      "slot_index": 0
   }
   ],
   "properties": {
   "Node name for S&R": "VAEDecode"
   }
},
{
   "id": 14,
   "type": "KolorsSampler",
   "pos": [
   1011,
   371
   ],
   "size": {
   "0": 315,
   "1": 222
   },
   "flags": {},
   "order": 5,
   "mode": 0,
   "inputs": [
   {
      "name": "kolors_model",
      "type": "KOLORSMODEL",
      "link": 16
   },
   {
      "name": "kolors_embeds",
      "type": "KOLORS_EMBEDS",
      "link": 17
   }
   ],
   "outputs": [
   {
      "name": "latent",
      "type": "LATENT",
      "links": [
         18
      ],
      "shape": 3,
      "slot_index": 0
   }
   ],
   "properties": {
   "Node name for S&R": "KolorsSampler"
   },
   "widgets_values": [
   1024,
   1024,
   1000102404233412,
   "fixed",
   25,
   5,
   "EulerDiscreteScheduler"
   ]
},
{
   "id": 6,
   "type": "DownloadAndLoadKolorsModel",
   "pos": [
   201,
   368
   ],
   "size": {
   "0": 315,
   "1": 82
   },
   "flags": {},
   "order": 1,
   "mode": 0,
   "outputs": [
   {
      "name": "kolors_model",
      "type": "KOLORSMODEL",
      "links": [
         16
      ],
      "shape": 3,
      "slot_index": 0
   }
   ],
   "properties": {
   "Node name for S&R": "DownloadAndLoadKolorsModel"
   },
   "widgets_values": [
   "Kwai-Kolors/Kolors",
   "fp16"
   ]
},
{
   "id": 3,
   "type": "PreviewImage",
   "pos": [
   1366,
   468
   ],
   "size": [
   535.4001724243165,
   562.2001106262207
   ],
   "flags": {},
   "order": 7,
   "mode": 0,
   "inputs": [
   {
      "name": "images",
      "type": "IMAGE",
      "link": 13
   }
   ],
   "properties": {
   "Node name for S&R": "PreviewImage"
   }
},
{
   "id": 12,
   "type": "KolorsTextEncode",
   "pos": [
   519,
   529
   ],
   "size": [
   457.2893696934723,
   225.28656056301645
   ],
   "flags": {},
   "order": 4,
   "mode": 0,
   "inputs": [
   {
      "name": "chatglm3_model",
      "type": "CHATGLM3MODEL",
      "link": 14,
      "slot_index": 0
   }
   ],
   "outputs": [
   {
      "name": "kolors_embeds",
      "type": "KOLORS_EMBEDS",
      "links": [
         17
      ],
      "shape": 3,
      "slot_index": 0
   }
   ],
   "properties": {
   "Node name for S&R": "KolorsTextEncode"
   },
   "widgets_values": [
   "cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf|\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
   "",
   1
   ]
},
{
   "id": 15,
   "type": "Note",
   "pos": [
   200,
   636
   ],
   "size": [
   273.5273818969726,
   149.55464588512064
   ],
   "flags": {},
   "order": 2,
   "mode": 0,
   "properties": {
   "text": ""
   },
   "widgets_values": [
   "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
   ],
   "color": "#432",
   "bgcolor": "#653"
},
{
   "id": 13,
   "type": "DownloadAndLoadChatGLM3",
   "pos": [
   206,
   522
   ],
   "size": [
   274.5334274291992,
   58
   ],
   "flags": {},
   "order": 3,
   "mode": 0,
   "outputs": [
   {
      "name": "chatglm3_model",
      "type": "CHATGLM3MODEL",
      "links": [
         14
      ],
      "shape": 3
   }
   ],
   "properties": {
   "Node name for S&R": "DownloadAndLoadChatGLM3"
   },
   "widgets_values": [
   "fp16"
   ]
}
],
"links": [
[
   12,
   11,
   0,
   10,
   1,
   "VAE"
],
[
   13,
   10,
   0,
   3,
   0,
   "IMAGE"
],
[
   14,
   13,
   0,
   12,
   0,
   "CHATGLM3MODEL"
],
[
   16,
   6,
   0,
   14,
   0,
   "KOLORSMODEL"
],
[
   17,
   12,
   0,
   14,
   1,
   "KOLORS_EMBEDS"
],
[
   18,
   14,
   0,
   10,
   0,
   "LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
   "scale": 1.1,
   "offset": {
   "0": -114.73954010009766,
   "1": -139.79705810546875
   }
}
},
"version": 0.4
}
https://i-blog.csdnimg.cn/direct/07931321aa7c47cca059f410905c40cb.png
https://i-blog.csdnimg.cn/direct/c3c08209ff41494f9535089fd6fb0481.png
Below are some AI-generated images I produced after adjusting the prompt keywords and parameters myself.
https://i-blog.csdnimg.cn/direct/54ad4d8a20e345eda207bde4f4ed904d.png
https://i-blog.csdnimg.cn/direct/9a3d49c8098c43caa729137767a3da20.png
https://i-blog.csdnimg.cn/direct/2fc4f39ff7fe4bc9aeba9e45ad69f5c4.png
https://i-blog.csdnimg.cn/direct/055d58772c2a4187984ae50ed2402c88.png
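One way to read the workflow JSON is to follow its "links" list. Matching the link ids against the nodes' inputs and outputs suggests that each entry has the layout [link id, source node id, source output slot, target node id, target input slot, data type]. That layout, and the reading of "order" as execution order, are inferred from this file rather than taken from ComfyUI documentation. The sketch below embeds a trimmed copy of the workflow (node ids, types, order values, and links only; the Note node has no links and is left out) and prints the connections. The helper names are hypothetical.

```python
# Trimmed copy of the workflow JSON: node id -> (type, order), plus the links.
NODES = {
    11: ("VAELoader", 0),
    6: ("DownloadAndLoadKolorsModel", 1),
    13: ("DownloadAndLoadChatGLM3", 3),
    12: ("KolorsTextEncode", 4),
    14: ("KolorsSampler", 5),
    10: ("VAEDecode", 6),
    3: ("PreviewImage", 7),
}
LINKS = [
    [12, 11, 0, 10, 1, "VAE"],
    [13, 10, 0, 3, 0, "IMAGE"],
    [14, 13, 0, 12, 0, "CHATGLM3MODEL"],
    [16, 6, 0, 14, 0, "KOLORSMODEL"],
    [17, 12, 0, 14, 1, "KOLORS_EMBEDS"],
    [18, 14, 0, 10, 0, "LATENT"],
]


def describe_links(nodes, links):
    """Return one readable 'source -> target (type)' string per link."""
    lines = []
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in links:
        lines.append(f"{nodes[src][0]} -> {nodes[dst][0]} ({data_type})")
    return lines


def order_is_consistent(nodes, links):
    """Check that every link runs from a lower 'order' value to a higher one."""
    return all(nodes[src][1] < nodes[dst][1] for _, src, _, dst, _, _ in links)


if __name__ == "__main__":
    for line in describe_links(NODES, LINKS):
        print(line)
    print(order_is_consistent(NODES, LINKS))
```

Read this way, the data path is: DownloadAndLoadChatGLM3 feeds KolorsTextEncode, which together with DownloadAndLoadKolorsModel feeds KolorsSampler; the sampler's latent goes to VAEDecode along with the VAE from VAELoader, and the decoded image ends up in PreviewImage.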
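II. LoRA Installation and Practice

(1) What Is LoRA Fine-Tuning

LoRA (Low-Rank Adaptation) is a technique for fine-tuning a pretrained model efficiently. It personalizes a model in an efficient and flexible way so that the model adapts to a specific task or domain while keeping good generalization and low resource consumption. This matters a great deal for putting large pretrained models to practical use.
(2) How LoRA Fine-Tuning Works

LoRA works by adding low-rank matrices to key layers of the pretrained model. These matrices have far fewer parameters than the layers they attach to, so the model can be fine-tuned without changing its overall structure. During training only the newly added low-rank matrices are updated, while the original pretrained weights stay frozen.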
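The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the code DiffSynth-Studio runs: the helper names and the tiny matrix sizes are hypothetical, and the scaling factor alpha / rank follows the original LoRA formulation (whether the training script used in this post applies exactly that scaling is an assumption). The frozen weight W is left untouched, and the effective weight is W + (alpha / rank) * B @ A, where A has shape rank x d_in and B has shape d_out x rank.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, seed=0):
    """A starts random and B starts at zero, so the initial update B @ A is zero."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return a, b


def effective_weight(w, a, b, alpha, rank):
    """Frozen weight W plus the scaled low-rank update (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # frozen 2 x 3 weight
    a, b = init_lora(d_out=2, d_in=3, rank=1)
    print(effective_weight(w, a, b, alpha=4.0, rank=1) == w)  # no change before training
    b[0][0] = 0.5                                    # pretend training updated B
    print(effective_weight(w, a, b, alpha=4.0, rank=1))
```

Only A and B would receive gradients during training; W is read but never written, which is what keeps the pretrained model intact.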
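(3) Advantages of LoRA Fine-Tuning

1) Fast adaptation to new tasks: even with only a small amount of labeled data in a specific domain, the model can be personalized effectively and adapt quickly to a new domain or task.
2) Preserved generalization: because LoRA fine-tunes only part of the model, it helps the model keep its ability to generalize to unseen data while still learning task-specific knowledge.
3) Resource efficiency: LoRA fine-tunes only a small set of weights rather than the whole model, which reduces the compute and storage required.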
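The resource saving is easy to quantify with a little arithmetic. A full weight matrix of shape d_out x d_in has d_out * d_in parameters, while its LoRA update has rank * (d_out + d_in). The sketch below uses a hypothetical 2048 x 2048 layer (the size is an assumption chosen for illustration, not taken from Kolors) together with rank 16, the value passed to the training script in this post.

```python
def full_params(d_out, d_in):
    """Parameters in a full d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 2048, 2048, 16   # hypothetical layer size, rank from the post
    full = full_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(full, lora, f"{lora / full:.2%}")
```

For this layer the trainable LoRA parameters amount to 65,536 against 4,194,304 for the full matrix, about 1.56% of the original.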
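(4) LoRA Fine-Tuning Code Walkthrough

import os

# Kolors LoRA training command, run through the DiffSynth-Studio example script.
# The per-flag notes live here rather than inside the string, because a "#" after
# a trailing backslash would break the shell line continuation.
#   --pretrained_unet_path          UNet weights
#   --pretrained_text_encoder_path  text encoder directory
#   --pretrained_fp16_vae_path      fp16 VAE weights
#   --lora_rank 16                  rank of the low-rank matrices; trades expressiveness for compute and memory
#   --lora_alpha 4.0                LoRA alpha, which controls the strength of the update
#   --dataset_path                  processed training dataset
#   --output_path                   where the trained model is saved
#   --max_epochs 1                  train for at most one epoch
#   --center_crop                   center-crop images during preprocessing
#   --use_gradient_checkpointing    save memory with gradient checkpointing
#   --precision "16-mixed"          16-bit mixed precision, to speed up training and reduce VRAM use
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()
os.system(cmd)  # run the Kolors LoRA training

(5) How the UNet, VAE, and Text Encoder Work Together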

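UNet: generates the image from noise under the text condition. In Stable Diffusion, the UNet takes a noisy latent and the text vectors from the text encoder as input and predicts the noise contained in that latent; removing the predicted noise step by step yields a latent that matches the text description. During training the noisy latent is a VAE-encoded image with noise added, while in text-to-image generation it starts out as random noise.
VAE: a generative model that maps input data into a latent space and back. In Stable Diffusion, the VAE encoder compresses training images into latent representations, which are noised and passed to the UNet together with the text condition; at generation time the VAE decoder turns the denoised latent into the final image.
Text encoder: converts text input into vectors the model can understand. In Stable Diffusion, the text encoder is a CLIP model that converts the prompt into vectors; these vectors are fed into the UNet together with the noisy latent and guide the generation. In the Kolors workflow shown earlier, ChatGLM3 plays this role.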
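The order in which the three components are called at generation time can be sketched with toy stand-ins. Nothing below is a real model: text_encoder, unet, and vae_decode are hypothetical placeholder functions operating on short lists of numbers, and the update rule is invented so that the example runs. What the sketch does show faithfully is the call order: encode the prompt once, start from random noise, call the UNet repeatedly to remove noise, then decode the final latent with the VAE.

```python
import random


def text_encoder(prompt):
    """Stand-in: turn the prompt into a small vector (real encoders are CLIP or ChatGLM3)."""
    return [(len(prompt) % 7) / 7.0, (prompt.count(" ") % 5) / 5.0]


def unet(latent, embeds, step):
    """Stand-in: 'predict' the noise as the gap between the latent and the text vector.
    A real UNet is also conditioned on the timestep; `step` is unused in this toy."""
    return [l - e for l, e in zip(latent, embeds)]


def vae_decode(latent):
    """Stand-in: map latent values to 0-255 'pixel' integers."""
    return [max(0, min(255, round(v * 255))) for v in latent]


def generate(prompt, steps=25, seed=0):
    """Run the toy pipeline and record which component ran at each stage."""
    trace = ["text_encoder"]
    embeds = text_encoder(prompt)
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in embeds]   # generation starts from pure noise
    trace.append("noise")
    for step in range(steps):
        noise_pred = unet(latent, embeds, step)
        # Remove a share of the predicted noise; the last step removes what is left.
        latent = [l - n / (steps - step) for l, n in zip(latent, noise_pred)]
    trace.append("unet")
    image = vae_decode(latent)
    trace.append("vae_decode")
    return image, trace


if __name__ == "__main__":
    image, trace = generate("an astronaut riding a horse", steps=25, seed=42)
    print(trace)
    print(image)
```

Note that the VAE encoder never appears in this text-to-image path; it is needed when images have to be turned into latents, for example when preparing training data.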
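III. Self-Study Resources Summary

(1) Public Datasets

1) ImageNet: contains millions of images; widely used for classification tasks and also usable for generation tasks.
2) Open Images: maintained by Google; contains millions of labeled images.
3) Flickr: in particular the Flickr30K and Flickr8K datasets, commonly used for image captioning tasks.
4) CelebA: a dataset focused on face images.
5) LSUN (Large-scale Scene Understanding): a large-scale dataset covering many scene categories.
(2) Learning Resources

1) Using ComfyUI on ModelScope to play with AIGC!
2) The official ComfyUI site
3) Official ComfyUI examples
4) Basic workflow examples from other users
5) Workflow-sharing websites
6) A recommended GitHub repository for ComfyUI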
