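Stable Diffusion 3.5 released: more realistic images, better performance, and a focus on diverse outputs and ease of use.
Stability AI released its new Stable Diffusion 3.5 family of AI image models yesterday. Compared with the previous 3.0 release, this update noticeably improves image realism, prompt adherence, and text rendering.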
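Like SD 3.0, Stable Diffusion 3.5 comes in three versions: Large (8B), Large Turbo (8B), and Medium (2.6B). All of them can be customized to user needs, run on consumer hardware, and are available under the Stability AI Community License.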
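Put simply, this update makes it easier for anyone to generate realistic AI images. In a press release, Stability AI acknowledged that the Medium model released in June this year "didn't fully meet our standards or our communities' expectations."
The company went on: "After listening to valuable community feedback, instead of a quick fix, we took the time to develop a version that advances our mission to transform visual media."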
Our AI editor Ryan Morrison has already tested version 3.5. He found it a clear improvement, possibly even surpassing the recently released Flux 1.1 Pro.
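What's new in Stable Diffusion 3.5?
Stability AI says the new models focus on customizability, efficient performance, and diverse outputs. "Stable Diffusion 3.5 is our most powerful model yet, reflecting our commitment to giving creators widely accessible, cutting-edge tools," a company spokesperson explained.
In practice, this means the models can be fine-tuned, run "out of the box" on consumer hardware, and produce more distinctive images.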
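Ryan Morrison ran a quick test of Stable Diffusion 3.5 Large. He found that it generates quickly, follows prompts accurately, and offers strong style control. The improvement over 3.0 is clear, especially over the 3.0 Medium model.
The new release also adds more style options, including photography and painting. You can even request a particular style, such as bohemian or fashion, with tag-style prompts. In addition, emphasizing keywords in the prompt can steer the model in a particular direction.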
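According to the company: "Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality."
"Stable Diffusion 3.5 Large Turbo offers some of the fastest inference times for its size, while remaining highly competitive in image quality and prompt adherence, even when compared with non-distilled models of similar size."
"Stable Diffusion 3.5 Medium outperforms other medium-sized models, balancing prompt adherence and image quality, which makes it a strong choice for efficient, high-quality performance."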
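The models are free for non-commercial use, including research projects. Small and medium businesses with annual revenue of up to US$1 million can also use them for free. Businesses above that threshold need an Enterprise license.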
GitHub: https://github.com/Stability-AI/sd3.5
stable-diffusion-3.5-large
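Hugging Face: stabilityai/stable-diffusion-3.5-large
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model. It brings improvements in image quality, typography, complex prompt understanding, and resource efficiency.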
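The SD3 paper describes the core idea of MMDiT as follows. Text and image tokens keep separate sets of weights, but their sequences are concatenated for a single joint attention operation. The sketch below illustrates only that step, in NumPy. It assumes a single head and omits the timestep modulation, MLPs, and normalization of the real blocks. The function and weight names are hypothetical, and this is not Stability AI's code.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(img, txt, w_img, w_txt):
    """Single-head joint attention over image and text tokens (simplified sketch).

    img: (n_img, d) image tokens; txt: (n_txt, d) text tokens.
    w_img / w_txt: dicts of (d, d) matrices under "q", "k", "v";
    each modality is projected with its own weights.
    """
    q = np.concatenate([img @ w_img["q"], txt @ w_txt["q"]], axis=0)
    k = np.concatenate([img @ w_img["k"], txt @ w_txt["k"]], axis=0)
    v = np.concatenate([img @ w_img["v"], txt @ w_txt["v"]], axis=0)
    # One attention over the joint sequence: every token attends to both modalities
    scores = q @ k.T / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v
    # Split the result back into per-modality streams
    n_img = img.shape[0]
    return out[:n_img], out[n_img:]

rng = np.random.default_rng(0)
d = 8
w_img = {name: rng.standard_normal((d, d)) for name in "qkv"}
w_txt = {name: rng.standard_normal((d, d)) for name in "qkv"}
img_tokens = rng.standard_normal((16, d))
txt_tokens = rng.standard_normal((4, d))
img_out, txt_out = joint_attention(img_tokens, txt_tokens, w_img, w_txt)
print(img_out.shape, txt_out.shape)  # (16, 8) (4, 8)
```

Because the two modalities share one softmax, text tokens can be updated by image tokens and vice versa. Separate weights, meanwhile, let each modality keep its own representation.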
```
├── text_encoders/
│   ├── README.md
│   ├── clip_g.safetensors
│   ├── clip_l.safetensors
│   ├── t5xxl_fp16.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
│
├── README.md
├── LICENSE
├── sd3_large.safetensors
├── SD3.5L_example_workflow.json
└── sd3_large_demo.png
** File structure below is for diffusers integration**
├── scheduler/
├── text_encoder/
├── text_encoder_2/
├── text_encoder_3/
├── tokenizer/
├── tokenizer_2/
├── tokenizer_3/
├── transformer/
├── vae/
└── model_index.json
```
Quick start
```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the full bf16 pipeline and move every component to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# 28 denoising steps with classifier-free guidance scale 3.5
image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")
```
Even my 24 GB GPU ran out of memory with this /(ㄒoㄒ)/~~
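Simple arithmetic makes the 24 GB out-of-memory error plausible. Weights need roughly parameter count × bytes per parameter. The sketch below applies this to the 8B and 2.6B parameter counts quoted earlier. It counts transformer weights only. Activations, the VAE, and the three text encoders (the T5-XXL encoder in particular is large) come on top, so real usage is higher. The byte-per-format values are standard, but NF4 quantization scales are ignored. Treat the results as rough estimates, not measurements. The helper name is hypothetical.

```python
# Bytes per parameter for common weight formats.
# NF4 packs two 4-bit weights per byte; per-block quantization scales are ignored.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nf4": 0.5}

def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in decimal GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9

models = [("SD3.5 Large transformer", 8e9), ("SD3.5 Medium transformer", 2.6e9)]
for name, params in models:
    for fmt in ("bf16", "fp8", "nf4"):
        print(f"{name:26s} {fmt:5s} ~{weight_memory_gb(params, fmt):.1f} GB")
```

For the Large transformer, this gives about 16 GB in bf16 versus about 4 GB in NF4. That is why quantizing the transformer is the main lever for low-VRAM GPUs.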
Quantizing the model with diffusers reduces VRAM usage so that it fits on low-VRAM GPUs. This requires the bitsandbytes package.
```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
import torch

model_id = "stabilityai/stable-diffusion-3.5-large"

# 4-bit NF4 quantization for the transformer; computation still runs in bf16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

# Build the pipeline around the quantized transformer
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
# Keep components in CPU RAM and move each to the GPU only while it runs
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,  # allow a long prompt for the T5 text encoder
).images[0]
image.save("whimsical.png")
```
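Note: you need at least 12 GB of VRAM. Because of enable_model_cpu_offload, plenty of system RAM also helps. My T4 environment never got going: it ran out of RAM. The P100 environment could run it, but I recommend a dedicated platform. (RAM 21.2 GB / 29 GB, VRAM 11 GB / 16 GB)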
stable-diffusion-3.5-large-turbo
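Hugging Face: stabilityai/stable-diffusion-3.5-large-turbo
Stable Diffusion 3.5 Large Turbo is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with Adversarial Diffusion Distillation (ADD). It improves image quality, typography, complex prompt understanding, and resource efficiency, with a focus on fewer inference steps.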
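The benefit of fewer inference steps can be counted in transformer evaluations. The rough sketch below assumes the behavior of the diffusers SD3 pipeline: classifier-free guidance (CFG) is applied only when guidance_scale > 1. In that case the conditional and unconditional predictions are batched together, so each step costs two evaluations' worth of compute. This article's examples use 28 steps at guidance 3.5 for Large and 4 steps at guidance 0.0 for Turbo, which works out to 56 versus 4 evaluations. The helper names are hypothetical, and the count ignores text encoding and VAE decoding.

```python
def transformer_evaluations(num_inference_steps: int, guidance_scale: float) -> int:
    """Count transformer evaluations per image (assumes CFG is on iff guidance_scale > 1)."""
    # With CFG each step needs a conditional and an unconditional prediction
    per_step = 2 if guidance_scale > 1 else 1
    return num_inference_steps * per_step

def cfg_combine(uncond: list[float], cond: list[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

print(transformer_evaluations(28, 3.5))          # Large example: 56
print(transformer_evaluations(4, 0.0))           # Turbo example: 4
print(cfg_combine([0.0, 1.0], [1.0, 1.0], 3.5))  # [3.5, 1.0]
```

Setting guidance_scale=0.0 therefore does more than weaken guidance: it skips the unconditional branch entirely. That roughly halves per-step cost on top of the smaller step count.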
```
├── text_encoders/ (text_encoder/text_encoder_1/text_encoder_2 are for diffusers)
│   ├── README.md
│   ├── clip_g.safetensors
│   ├── clip_l.safetensors
│   ├── t5xxl_fp16.safetensors
│   └── t5xxl_fp8_e4m3fn.safetensors
│
├── README.md
├── LICENSE
├── sd3_large_turbo.safetensors
├── SD3.5L_Turbo_example_workflow.json
└── sd3_large_turbo_demo.png
** File structure below is for diffusers integration**
├── scheduler/
├── text_encoder/
├── text_encoder_2/
├── text_encoder_3/
├── tokenizer/
├── tokenizer_2/
├── tokenizer_3/
├── transformer/
├── vae/
└── model_index.json
```
```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the full bf16 Turbo pipeline and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Turbo is distilled for very few steps; guidance_scale=0.0 disables CFG
image = pipe(
    "A capybara holding a sign that reads Hello Fast World",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("capybara.png")
```
It didn't run on a 90 either.
Reduce VRAM usage so that the model fits on low-VRAM GPUs:
```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
from transformers import T5EncoderModel
import torch

model_id = "stabilityai/stable-diffusion-3.5-large-turbo"

# 4-bit NF4 quantization for the transformer; computation still runs in bf16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

# Pre-quantized NF4 T5 text encoder, used in place of the pipeline's text_encoder_3
t5_nf4 = T5EncoderModel.from_pretrained("diffusers/t5-nf4", torch_dtype=torch.bfloat16)

pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    text_encoder_3=t5_nf4,
    torch_dtype=torch.bfloat16
)
# Keep components in CPU RAM and move each to the GPU only while it runs
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    max_sequence_length=512,  # allow a long prompt for the T5 text encoder
).images[0]
image.save("whimsical.png")
```
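Note: you need at least 12 GB of VRAM. Because of enable_model_cpu_offload, plenty of system RAM also helps. My T4 environment never got going: it ran out of RAM. The P100 environment could run it, but I recommend a dedicated platform. (RAM 27.7 GB / 29 GB, VRAM 8.3 GB / 16 GB)
Finally
The SD family, the ultimate benchmark tool: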
- T0: SD3.5
- T1: SD3
- T2: SDXL
- T3: SD 1.5/2.1
Which tier is your computer in? Time for an upgrade?