政安晨 [Open-Source AI Projects from Zero] Wan 2.1 Local Deployment on ComfyUI: Top-Tier Text-to-Video and Image-to-Video, High-Quality Videos in One Click
Author's homepage: 政安晨
I hope this blog is useful to you. If you find any shortcomings, corrections are welcome in the comments!
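Contents
Download the Project
Create a Virtual Environment
Install the Project Dependencies
Trial Run
Download the Models One by One
Done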
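The Wan 2.1 models we use today cover both text-to-video and image-to-video, and the results are very good. I deployed ComfyUI in an earlier article:
政安晨 [Open-Source AI Projects from Zero] Deploying ComfyUI on Ubuntu: the most powerful and modular Stable Diffusion GUI and backend (CSDN blog): https://blog.csdn.net/snowdenkeke/article/details/140156889
This time I deploy it again from scratch and walk through the Wan 2.1 models end to end, to see how they perform on a consumer-grade GPU.
By the end of this walkthrough, you should have a video generation tool that you can also use commercially.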
Download the Project
git clone git@github.com:comfyanonymous/ComfyUI.git
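Here is the structure of the ComfyUI project:
I won't go over ComfyUI's features again here; see my earlier article.
Today the goal is to put this setup to full use and generate some cool videos!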
Create a Virtual Environment
Because I run many AI tools, I install each of them in its own virtual environment:
conda create -n comfyui python=3.10.16
Once it is created, activate it:
conda activate comfyui
An NVIDIA GPU is strongly recommended. Install PyTorch with CUDA 12.6 support:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126
The installation should complete without errors. If an error does occur, fix it as follows:
If you get the "Torch not compiled with CUDA enabled" error, uninstall torch with:
pip uninstall torch
And install it again with the command above.
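To confirm that the reinstalled PyTorch actually sees your GPU, you can run a quick check inside the comfyui environment. This is a minimal sketch that only uses standard PyTorch calls (torch.cuda.is_available(), torch.version.cuda, torch.cuda.get_device_name); it does not touch ComfyUI itself.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also works without torch

    if not torch.cuda.is_available():
        # Typical cause: a CPU-only wheel was installed instead of the cu126 one
        return f"torch {torch.__version__} installed, but CUDA is not available"
    return (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(cuda_status())
```

If it reports that CUDA is not available, repeat the uninstall and reinstall steps above.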
Install the Project Dependencies
Change into the project directory you just cloned and run:
pip install -r requirements.txt
Trial Run
Once the base framework is installed, start it once to get familiar with the interface:
python main.py
Open this address in a local browser: http://127.0.0.1:8188
If the server log shows no errors while it runs, you can continue.
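Besides watching the log, you can check from a script whether the web server answers. This is a minimal sketch using only the Python standard library: it requests the page at the address above and treats any connection error, timeout or HTTP error as "not running".

```python
import urllib.error
import urllib.request


def comfyui_is_up(url: str = "http://127.0.0.1:8188/", timeout: float = 3.0) -> bool:
    """Return True if the server at `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or HTTP error: treat as "not running"
        return False


if __name__ == "__main__":
    print("ComfyUI is up" if comfyui_is_up() else "ComfyUI is not reachable")
```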
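At this point you will notice that ComfyUI can only be reached from the local machine. By default, ComfyUI binds to 127.0.0.1 (the loopback address), so other machines on the LAN cannot connect. It needs to bind to 0.0.0.0 (listen on all network interfaces) instead.
There are two ways to configure this:
Method 1: startup arguments
Add the --listen argument when starting ComfyUI: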
python main.py --listen 0.0.0.0 --port 8188
Method 2: startup script
Put the following lines in a startup script (this example is a Windows .bat file):
@echo off
set PYTHONPATH=.
python main.py --listen 0.0.0.0 --port 8188
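The .bat script above only works on Windows, while this article deploys on Ubuntu. As a cross-platform alternative, here is a minimal sketch of a Python launcher (a hypothetical start_comfyui.py, run from the ComfyUI directory with the comfyui conda environment active). It passes only the --listen and --port arguments shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(host: str = "0.0.0.0", port: int = 8188) -> list[str]:
    """Same arguments as the .bat example, run with the current interpreter."""
    return [sys.executable, "main.py", "--listen", host, "--port", str(port)]


if __name__ == "__main__":
    cmd = build_command()
    # Only launch from a ComfyUI checkout (main.py plus the comfy/ package).
    if Path("main.py").is_file() and Path("comfy").is_dir():
        # Running from the ComfyUI root lets main.py import its local packages,
        # similar in effect to "set PYTHONPATH=." in the .bat file.
        subprocess.run(cmd, check=True)  # blocks while the server is running
    else:
        print("Run this from the ComfyUI directory. Command:", " ".join(cmd))
```

Using sys.executable guarantees the server starts with the interpreter of the active conda environment rather than whatever python happens to be first on PATH.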
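Restart ComfyUI, and you can now open it from other devices on your LAN using this machine's IP address and port 8188: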
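To find the address to type on the other device, a common trick is to ask the operating system which local interface it would use for outbound traffic. This sketch assumes IPv4 and a machine with a default route; the 8.8.8.8 address is only used to pick a route, since connect() on a UDP socket sends no packets.

```python
import socket


def lan_url(port: int = 8188) -> str:
    """Best-effort guess of the URL other machines on the LAN should open."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP connect() sends nothing; it only selects the outgoing interface,
        # whose address we then read back with getsockname().
        s.connect(("8.8.8.8", 80))
        ip = s.getsockname()[0]
    except OSError:
        ip = "127.0.0.1"  # no route available: fall back to loopback
    finally:
        s.close()
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    print(lan_url())
```

On machines with several network interfaces the guess may not be the one your other device can reach; ip addr on Ubuntu lists all addresses.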
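If ComfyUI reports missing models, you can try clicking the download button as a test; just make sure your network connection is working, hehe.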
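Download the Models One by One
1. Download the text encoder umt5_xxl_fp8_e4m3fn_scaled.safetensors and put it in ComfyUI/models/text_encoders/:
2. Download the VAE file wan_2.1_vae.safetensors and put it in ComfyUI/models/vae/:
3. Download the video generation model:
Note: the fp16 versions are recommended over the bf16 versions, because they give better results.
Quality ranking (highest to lowest): fp16 > bf16 > fp8_scaled > fp8_e4m3fn
These files go in: ComfyUI/models/diffusion_models/
These examples use the 16-bit files, but you can use the fp8 files instead if you are short on memory.
Choose according to how much VRAM you have: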
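To make the VRAM choice a bit more concrete: the weights alone take roughly parameter count × bytes per parameter, where fp16/bf16 use 2 bytes and the fp8 variants about 1 byte. The 1.3B and 14B sizes are taken from the file names used in this article. Treat this as a ballpark only: it ignores the text encoder, the VAE and activations, and ComfyUI can offload some weights to system RAM.

```python
# Bytes per weight for each precision named above (fp8 variants store ~1 byte).
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8_scaled": 1, "fp8_e4m3fn": 1}


def weight_gib(params_billion: float, dtype: str) -> float:
    """Approximate size of the diffusion-model weights alone, in GiB."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


for model, params in [("t2v 1.3B", 1.3), ("i2v 14B", 14.0)]:
    for dtype in ("fp16", "fp8_e4m3fn"):
        print(f"{model:9s} {dtype:11s} ~{weight_gib(params, dtype):5.1f} GiB")
```

With these inputs the 14B image-to-video model comes out at roughly 26 GiB in fp16 and 13 GiB in fp8, and the 1.3B text-to-video model at about 2.4 GiB in fp16.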
4. Text-to-video workflow: download the workflow in JSON format.
The workflow file contents are as follows:
{
"last_node_id": 48,
"last_link_id": 95,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1210,
190
],
"size": [
210,
46
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
56,
93
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 39,
"type": "VAELoader",
"pos": [
866.3932495117188,
499.18597412109375
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
76
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
]
},
{
"id": 28,
"type": "SaveAnimatedWEBP",
"pos": [
1460,
190
],
"size": [
870.8511352539062,
643.7430419921875
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 56
}
],
"outputs": [],
"properties": {},
"widgets_values": [
"ComfyUI",
16,
false,
90,
"default",
""
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413,
389
],
"size": [
425.27801513671875,
180.6060791015625
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
52
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
12.94982624053955,
184.6981658935547
],
"size": [
390,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
74,
75
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 40,
"type": "EmptyHunyuanLatentVideo",
"pos": [
520,
620
],
"size": [
315,
130
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
91
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "EmptyHunyuanLatentVideo"
},
"widgets_values": [
832,
480,
33,
1
]
},
{
"id": 47,
"type": "SaveWEBM",
"pos": [
2367.213134765625,
193.6114959716797
],
"size": [
315,
130
],
"flags": {},
"order": 10,
"mode": 4,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 93
}
],
"outputs": [],
"properties": {
"Node name for S&R": "SaveWEBM"
},
"widgets_values": [
"ComfyUI",
"vp9",
24,
32
]
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
187
],
"size": [
315,
262
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 95
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 46
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 52
},
{
"name": "latent_image",
"type": "LATENT",
"link": 91
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
35
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
82628696717253,
"randomize",
30,
6,
"uni_pc",
"simple",
1
]
},
{
"id": 48,
"type": "ModelSamplingSD3",
"pos": [
440,
50
],
"size": [
210,
58
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 94
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
95
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
8
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
20,
40
],
"size": [
346.7470703125,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
94
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"wan2.1_t2v_1.3B_fp16.safetensors",
"default"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
422.84503173828125,
164.31304931640625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
46
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"a fox moving quickly in a beautiful winter scenery nature trees mountains daytime tracking camera"
],
"color": "#232",
"bgcolor": "#353"
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
46,
6,
0,
3,
1,
"CONDITIONING"
],
[
52,
7,
0,
3,
2,
"CONDITIONING"
],
[
56,
8,
0,
28,
0,
"IMAGE"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
91,
40,
0,
3,
3,
"LATENT"
],
[
93,
8,
0,
47,
0,
"IMAGE"
],
[
94,
37,
0,
48,
0,
"MODEL"
],
[
95,
48,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1167815779425205,
"offset": [
-5.675057867608515,
8.013751263058214
]
}
},
"version": 0.4
}
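Judging from the JSON above, each entry in "links" has the form [link_id, origin_node_id, origin_slot, target_node_id, target_slot, type]. For example, [35, 3, 0, 8, 0, "LATENT"] connects output 0 of the KSampler (node 3) to input 0 of the VAEDecode (node 8), and link id 35 is also recorded in node 3's outputs[0].links and node 8's inputs[0].link. If you hand-edit a workflow file, a small consistency check like this sketch (my own helper, not part of ComfyUI) can catch broken connections before you load it.

```python
import json
from pathlib import Path


def check_links(workflow: dict) -> list[str]:
    """Cross-check the "links" table against each node's inputs and outputs."""
    nodes = {node["id"]: node for node in workflow["nodes"]}
    problems = []
    for link_id, src, src_slot, dst, dst_slot, link_type in workflow["links"]:
        if src not in nodes or dst not in nodes:
            problems.append(f"link {link_id}: unknown node {src} or {dst}")
            continue
        outs, ins = nodes[src]["outputs"], nodes[dst]["inputs"]
        if src_slot >= len(outs) or dst_slot >= len(ins):
            problems.append(f"link {link_id}: slot index out of range")
            continue
        out, inp = outs[src_slot], ins[dst_slot]
        # The source output must list this link; the target input must point back.
        if link_id not in (out.get("links") or []):
            problems.append(f"link {link_id}: not listed on node {src} output {src_slot}")
        if inp.get("link") != link_id:
            problems.append(f"link {link_id}: node {dst} input {dst_slot} points elsewhere")
        if out.get("type") != link_type or inp.get("type") != link_type:
            problems.append(f"link {link_id}: type mismatch, expected {link_type}")
    return problems


def check_file(path: str) -> list[str]:
    """Load a saved UI-format workflow JSON and check its links."""
    return check_links(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, after saving the JSON as text_to_video_wan.json (a hypothetical file name), print(check_file("text_to_video_wan.json") or "all links consistent") reports any mismatches.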
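Drag this workflow file into ComfyUI:
You can change the prompt:
The negative prompt can be left unchanged. It is the same Chinese negative prompt used in the workflow above (it steers the model away from garish colors, overexposure, static or motionless frames, blurry details, subtitles, low quality and JPEG artifacts, deformed or extra fingers, hands, faces and limbs, cluttered or crowded backgrounds, three legs and walking backwards), so keep it verbatim: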
色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走
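If you prefer to change the prompt outside the UI, for example to prepare several variants in one go, the positive prompt is the first widgets_values entry of the CLIPTextEncode node titled "CLIP Text Encode (Positive Prompt)" in the JSON above. This sketch relies on that title; the file names in the usage note are hypothetical.

```python
import json
from pathlib import Path

POSITIVE_TITLE = "CLIP Text Encode (Positive Prompt)"


def set_positive_prompt(workflow: dict, text: str) -> dict:
    """Replace the text widget of the node titled as the positive prompt."""
    for node in workflow["nodes"]:
        if node.get("type") == "CLIPTextEncode" and node.get("title") == POSITIVE_TITLE:
            node["widgets_values"][0] = text
            return workflow
    raise KeyError(f"no CLIPTextEncode node titled {POSITIVE_TITLE!r}")


def rewrite_file(src: str, dst: str, text: str) -> None:
    """Copy a workflow file with a new positive prompt."""
    wf = json.loads(Path(src).read_text(encoding="utf-8"))
    set_positive_prompt(wf, text)
    # ensure_ascii=False keeps the Chinese negative prompt human-readable
    Path(dst).write_text(json.dumps(wf, ensure_ascii=False, indent=2), encoding="utf-8")
```

For instance, rewrite_file("text_to_video_wan.json", "fox_variant.json", "a red fox running through snow") writes a new file you can drag into ComfyUI; the negative prompt is left untouched.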
The generated result looks like this:
It can also generate animated stickers:
5. Download the image-to-video model:
Download wan2.1_i2v_480p_14B_fp16.safetensors and put it in ComfyUI/models/diffusion_models/
Also download clip_vision_h.safetensors and put it in ComfyUI/models/clip_vision/
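With all files downloaded, a quick way to catch a misplaced file is to check the expected paths. The file names below are the ones used in this article's steps and workflows (the text-to-video workflow loads wan2.1_t2v_1.3B_fp16.safetensors); if you picked a different precision variant, edit the list. The comfy_root default assumes you run it from the directory that contains ComfyUI/.

```python
from pathlib import Path

# File names as used in this article; adjust if you chose fp8 or other variants.
EXPECTED = {
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": ["wan_2.1_vae.safetensors"],
    "diffusion_models": [
        "wan2.1_t2v_1.3B_fp16.safetensors",
        "wan2.1_i2v_480p_14B_fp16.safetensors",
    ],
    "clip_vision": ["clip_vision_h.safetensors"],
}


def missing_models(comfy_root: str = "ComfyUI") -> list[str]:
    """Return the expected model paths that do not exist as files."""
    models = Path(comfy_root) / "models"
    return [
        str(models / folder / name)
        for folder, names in EXPECTED.items()
        for name in names
        if not (models / folder / name).is_file()
    ]


if __name__ == "__main__":
    for path in missing_models():
        print("missing:", path)
```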
6. Download the workflow
Image-to-video workflow in JSON format:
{
"last_node_id": 54,
"last_link_id": 111,
"nodes": [
{
"id": 8,
"type": "VAEDecode",
"pos": [
1210,
190
],
"size": [
210,
46
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 35
},
{
"name": "vae",
"type": "VAE",
"link": 76
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
56,
93
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 39,
"type": "VAELoader",
"pos": [
866.3932495117188,
499.18597412109375
],
"size": [
306.36004638671875,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
76,
99
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"wan_2.1_vae.safetensors"
]
},
{
"id": 28,
"type": "SaveAnimatedWEBP",
"pos": [
1460,
190
],
"size": [
870.8511352539062,
643.7430419921875
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 56
}
],
"outputs": [],
"properties": {},
"widgets_values": [
"ComfyUI",
16,
false,
90,
"default"
]
},
{
"id": 47,
"type": "SaveWEBM",
"pos": [
2367.213134765625,
193.6114959716797
],
"size": [
315,
130
],
"flags": {},
"order": 13,
"mode": 4,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 93
}
],
"outputs": [],
"properties": {
"Node name for S&R": "SaveWEBM"
},
"widgets_values": [
"ComfyUI",
"vp9",
24,
32
]
},
{
"id": 7,
"type": "CLIPTextEncode",
"pos": [
413,
389
],
"size": [
425.27801513671875,
180.6060791015625
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 75
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
98
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 50,
"type": "WanImageToVideo",
"pos": [
673.0507202148438,
627.272705078125
],
"size": [
342.5999755859375,
210
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "positive",
"type": "CONDITIONING",
"link": 97
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 98
},
{
"name": "vae",
"type": "VAE",
"link": 99
},
{
"name": "clip_vision_output",
"type": "CLIP_VISION_OUTPUT",
"shape": 7,
"link": 107
},
{
"name": "start_image",
"type": "IMAGE",
"shape": 7,
"link": 106
}
],
"outputs": [
{
"name": "positive",
"type": "CONDITIONING",
"links": [
101
],
"slot_index": 0
},
{
"name": "negative",
"type": "CONDITIONING",
"links": [
102
],
"slot_index": 1
},
{
"name": "latent",
"type": "LATENT",
"links": [
103
],
"slot_index": 2
}
],
"properties": {
"Node name for S&R": "WanImageToVideo"
},
"widgets_values": [
512,
512,
33,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
415,
186
],
"size": [
422.84503173828125,
164.31304931640625
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 74
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
97
],
"slot_index": 0
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"a cute anime girl with massive fennec ears and a big fluffy tail wearing a maid outfit turning around"
],
"color": "#232",
"bgcolor": "#353"
},
{
"id": 3,
"type": "KSampler",
"pos": [
863,
187
],
"size": [
315,
262
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 111
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 101
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 102
},
{
"name": "latent_image",
"type": "LATENT",
"link": 103
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
35
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KSampler"
},
"widgets_values": [
987948718394761,
"randomize",
20,
6,
"uni_pc",
"simple",
1
]
},
{
"id": 49,
"type": "CLIPVisionLoader",
"pos": [
20,
640
],
"size": [
315,
58
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP_VISION",
"type": "CLIP_VISION",
"links": [
94
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPVisionLoader"
},
"widgets_values": [
"clip_vision_h.safetensors"
]
},
{
"id": 51,
"type": "CLIPVisionEncode",
"pos": [
360,
640
],
"size": [
253.60000610351562,
78
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip_vision",
"type": "CLIP_VISION",
"link": 94
},
{
"name": "image",
"type": "IMAGE",
"link": 109
}
],
"outputs": [
{
"name": "CLIP_VISION_OUTPUT",
"type": "CLIP_VISION_OUTPUT",
"links": [
107
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPVisionEncode"
},
"widgets_values": [
"none"
]
},
{
"id": 52,
"type": "LoadImage",
"pos": [
20,
760
],
"size": [
315,
314
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
106,
109
],
"slot_index": 0
},
{
"name": "MASK",
"type": "MASK",
"links": null,
"slot_index": 1
}
],
"properties": {
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"flux_dev_example.png",
"image"
]
},
{
"id": 38,
"type": "CLIPLoader",
"pos": [
20,
190
],
"size": [
390,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
74,
75
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"umt5_xxl_fp8_e4m3fn_scaled.safetensors",
"wan",
"default"
]
},
{
"id": 37,
"type": "UNETLoader",
"pos": [
20,
70
],
"size": [
346.7470703125,
82
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
110
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"wan2.1_i2v_480p_14B_fp16.safetensors",
"default"
]
},
{
"id": 54,
"type": "ModelSamplingSD3",
"pos": [
510,
70
],
"size": [
315,
58
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 110
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
111
],
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "ModelSamplingSD3"
},
"widgets_values": [
8
]
}
],
"links": [
[
35,
3,
0,
8,
0,
"LATENT"
],
[
56,
8,
0,
28,
0,
"IMAGE"
],
[
74,
38,
0,
6,
0,
"CLIP"
],
[
75,
38,
0,
7,
0,
"CLIP"
],
[
76,
39,
0,
8,
1,
"VAE"
],
[
93,
8,
0,
47,
0,
"IMAGE"
],
[
94,
49,
0,
51,
0,
"CLIP_VISION"
],
[
97,
6,
0,
50,
0,
"CONDITIONING"
],
[
98,
7,
0,
50,
1,
"CONDITIONING"
],
[
99,
39,
0,
50,
2,
"VAE"
],
[
101,
50,
0,
3,
1,
"CONDITIONING"
],
[
102,
50,
1,
3,
2,
"CONDITIONING"
],
[
103,
50,
2,
3,
3,
"LATENT"
],
[
106,
52,
0,
50,
4,
"IMAGE"
],
[
107,
51,
0,
50,
3,
"CLIP_VISION_OUTPUT"
],
[
109,
52,
0,
51,
1,
"IMAGE"
],
[
110,
37,
0,
54,
0,
"MODEL"
],
[
111,
54,
0,
3,
0,
"MODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.015255979947749,
"offset": [
4.576817595742521,
-17.69629597715313
]
}
},
"version": 0.4
}
I am going to animate this image:
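The image-to-video workflow above renders at 512×512 (the first two WanImageToVideo widget values), while the text-to-video workflow uses 832×480. If your source image is not square, you may want a width and height that match its aspect ratio. This sketch keeps roughly the same pixel count as 832×480 and snaps both sides to multiples of 16; the multiple-of-16 rule is an assumption about what the node expects, so check the widget in your ComfyUI version.

```python
import math

TARGET_AREA = 832 * 480  # pixel budget of the 480p text-to-video workflow


def wan_size(img_w: int, img_h: int, area: int = TARGET_AREA, step: int = 16) -> tuple[int, int]:
    """Width/height with the image's aspect ratio, about `area` pixels, multiples of `step`."""
    aspect = img_w / img_h
    height = math.sqrt(area / aspect)
    width = height * aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple of `step`, never below one step
        return max(step, round(value / step) * step)

    return snap(width), snap(height)
```

For a 1920×1080 image this returns (848, 480), which you would then enter as the width and height of the WanImageToVideo node. Pillow's Image.open(path).size gives you an image's dimensions.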
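Done
Using the image-to-video workflow above, the generated animation looks like this:
Once you have the style under control, the results should be quite good.
With that, you have a remarkable tool in hand, hehe.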