当前位置：首页 > article >正文

政安晨【零基础玩转各类开源AI项目】Wan 2.1 本地部署，基于ComfyUI运行，最强文生视频图生视频，一键生成高质量影片

article 2025/3/12 20:53:29

政安晨的个人主页：政安晨

欢迎 👍点赞✍评论⭐收藏

希望政安晨的博客能够对您有所裨益，如有不足之处，欢迎在评论区提出指正！

下载项目

创建虚拟环境

安装项目依赖

尝试运行

依次下载模型

完成

我们今天要使用的Wan2.1模型，文生视频与图生视频，效果很不错，我以前的文章部署过comfyUI：

政安晨【零基础玩转各类开源AI项目】基于Ubuntu系统部署ComfyUI：功能最强大、模块化程度最高的Stable Diffusion图形用户界面和后台_comfyui ubuntu-CSDN博客文章浏览阅读1.4k次，点赞10次，收藏25次。ComfyUI这套框架可让您使用基于图形/节点/流程图的界面设计和执行高级稳定扩散管道。_comfyui ubuntuhttps://blog.csdn.net/snowdenkeke/article/details/140156889这次重新部署，完整演绎使用Wan 2.1的模型，看一下在消费级显卡上的使用效果。

当我们完成这次使用演绎后，相信您已经可以掌握一套这是能够商用的视频生成工具了。

下载项目

git clone git@github.com:comfyanonymous/ComfyUI.git

我们可以看到ComfyUI的结构：

至于ComfyUI的特性我这里就不过多赘述了，看我以前的文章。

今天我们就是要把这套彻底用好，用它生成酷炫的视频。。。！！

创建虚拟环境

因为我的AI工具比较多，我都是采用虚拟环境进行安装：

conda create -n comfyui python=3.10.16

创建之后，我们启动它：

conda activate comfyui

真心推荐使用N卡：

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126

安装完毕，没有报错，如果出现错误，请照此解决：

If you get the "Torch not compiled with CUDA enabled" error, uninstall torch with:

pip uninstall torch

And install it again with the command above.

安装项目依赖

进入刚才下载的项目目录，执行：

pip install -r requirements.txt

尝试运行

基本框架安装完毕后，可以尝试运行一下，熟悉一下界面：

python main.py

本地浏览器打开：http://127.0.0.1:8188

在运行过程中查看服务器打印没有错误，就可以继续了。

此时，您会发现只能本地访问，默认情况下，ComfyUI 可能仅绑定到 127.0.0.1（本地回环地址），导致局域网无法访问。需要将其绑定到 0.0.0.0（监听所有网络接口）。

可以这样配置：

方法一：通过启动参数配置

在启动 ComfyUI 时添加 --listen 参数：

python main.py --listen 0.0.0.0 --port 8188

方法二：修改启动脚本

编辑启动脚本，增加如下语句：

@echo off
set PYTHONPATH=.
python main.py --listen 0.0.0.0 --port 8188

重新启动，现在您可以局域网中访问啦：

如果提示缺少模型，可以尝试点击下载，测试一下，注意保证网络通畅，嘻嘻。

依次下载模型

1. 下载文本编码器 ：umt5_xxl_fp8_e4m3fn_scaled.safetensors 放入：ComfyUI/models/text_encoders/

放入这个位置：

2. 下载VAE文件 wan_2.1_vae.safetensors 放入：ComfyUI/models/vae/：

3、下载视频生成模型【点击下载】

注意：建议使用 fp16 版本而不是 bf16 版本，因为它们会产生更好的结果。

质量等级（从高到低）：fp16 > bf16 > fp8_scaled > fp8_e4m3fn

这些文件位于：ComfyUI/models/diffusion_models/

这些示例使用 16 位文件，但如果内存不足，则可以使用 fp8 文件。

根据您的显存情况进行选择：

4、文字转视频工作流：下载 Json 格式的工作流

工作流文件内容如下:

{
  "last_node_id": 48,
  "last_link_id": 95,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1210,
        190
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            56,
            93
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        866.3932495117188,
        499.18597412109375
      ],
      "size": [
        306.36004638671875,
        58
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            76
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "wan_2.1_vae.safetensors"
      ]
    },
    {
      "id": 28,
      "type": "SaveAnimatedWEBP",
      "pos": [
        1460,
        190
      ],
      "size": [
        870.8511352539062,
        643.7430419921875
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 56
        }
      ],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "ComfyUI",
        16,
        false,
        90,
        "default",
        ""
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        389
      ],
      "size": [
        425.27801513671875,
        180.6060791015625
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            52
          ],
          "slot_index": 0
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        12.94982624053955,
        184.6981658935547
      ],
      "size": [
        390,
        82
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            74,
            75
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader"
      },
      "widgets_values": [
        "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
        "wan",
        "default"
      ]
    },
    {
      "id": 40,
      "type": "EmptyHunyuanLatentVideo",
      "pos": [
        520,
        620
      ],
      "size": [
        315,
        130
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            91
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyHunyuanLatentVideo"
      },
      "widgets_values": [
        832,
        480,
        33,
        1
      ]
    },
    {
      "id": 47,
      "type": "SaveWEBM",
      "pos": [
        2367.213134765625,
        193.6114959716797
      ],
      "size": [
        315,
        130
      ],
      "flags": {},
      "order": 10,
      "mode": 4,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 93
        }
      ],
      "outputs": [],
      "properties": {
        "Node name for S&R": "SaveWEBM"
      },
      "widgets_values": [
        "ComfyUI",
        "vp9",
        24,
        32
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        187
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 95
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 46
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 52
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 91
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        82628696717253,
        "randomize",
        30,
        6,
        "uni_pc",
        "simple",
        1
      ]
    },
    {
      "id": 48,
      "type": "ModelSamplingSD3",
      "pos": [
        440,
        50
      ],
      "size": [
        210,
        58
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 94
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            95
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        8
      ]
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        20,
        40
      ],
      "size": [
        346.7470703125,
        82
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            94
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader"
      },
      "widgets_values": [
        "wan2.1_t2v_1.3B_fp16.safetensors",
        "default"
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        422.84503173828125,
        164.31304931640625
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            46
          ],
          "slot_index": 0
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "a fox moving quickly in a beautiful winter scenery nature trees mountains daytime tracking camera"
      ],
      "color": "#232",
      "bgcolor": "#353"
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      46,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      52,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      56,
      8,
      0,
      28,
      0,
      "IMAGE"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      91,
      40,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      93,
      8,
      0,
      47,
      0,
      "IMAGE"
    ],
    [
      94,
      37,
      0,
      48,
      0,
      "MODEL"
    ],
    [
      95,
      48,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.1167815779425205,
      "offset": [
        -5.675057867608515,
        8.013751263058214
      ]
    }
  },
  "version": 0.4
}

把这个工作流文件拖进comfyUI中：

可以更改提示词：

反向提示词可以不用动，基本一样：

色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走

生成之后的效果如下：

也可以生成动画表情：

5. 下载图像转视频模型：

下载wan2.1_i2v_480p_14B_fp16.safetensors文件，将其放入：ComfyUI/models/diffusion_models/

下载：

clip_vision_h.safetensors 放入：ComfyUI/models/clip_vision/

6.下载工作流

Json 格式的工作流

{
  "last_node_id": 54,
  "last_link_id": 111,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1210,
        190
      ],
      "size": [
        210,
        46
      ],
      "flags": {},
      "order": 11,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 35
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 76
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            56,
            93
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      },
      "widgets_values": []
    },
    {
      "id": 39,
      "type": "VAELoader",
      "pos": [
        866.3932495117188,
        499.18597412109375
      ],
      "size": [
        306.36004638671875,
        58
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            76,
            99
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "wan_2.1_vae.safetensors"
      ]
    },
    {
      "id": 28,
      "type": "SaveAnimatedWEBP",
      "pos": [
        1460,
        190
      ],
      "size": [
        870.8511352539062,
        643.7430419921875
      ],
      "flags": {},
      "order": 12,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 56
        }
      ],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "ComfyUI",
        16,
        false,
        90,
        "default"
      ]
    },
    {
      "id": 47,
      "type": "SaveWEBM",
      "pos": [
        2367.213134765625,
        193.6114959716797
      ],
      "size": [
        315,
        130
      ],
      "flags": {},
      "order": 13,
      "mode": 4,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 93
        }
      ],
      "outputs": [],
      "properties": {
        "Node name for S&R": "SaveWEBM"
      },
      "widgets_values": [
        "ComfyUI",
        "vp9",
        24,
        32
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        413,
        389
      ],
      "size": [
        425.27801513671875,
        180.6060791015625
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 75
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            98
          ],
          "slot_index": 0
        }
      ],
      "title": "CLIP Text Encode (Negative Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"
      ],
      "color": "#322",
      "bgcolor": "#533"
    },
    {
      "id": 50,
      "type": "WanImageToVideo",
      "pos": [
        673.0507202148438,
        627.272705078125
      ],
      "size": [
        342.5999755859375,
        210
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 97
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 98
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 99
        },
        {
          "name": "clip_vision_output",
          "type": "CLIP_VISION_OUTPUT",
          "shape": 7,
          "link": 107
        },
        {
          "name": "start_image",
          "type": "IMAGE",
          "shape": 7,
          "link": 106
        }
      ],
      "outputs": [
        {
          "name": "positive",
          "type": "CONDITIONING",
          "links": [
            101
          ],
          "slot_index": 0
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "links": [
            102
          ],
          "slot_index": 1
        },
        {
          "name": "latent",
          "type": "LATENT",
          "links": [
            103
          ],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "WanImageToVideo"
      },
      "widgets_values": [
        512,
        512,
        33,
        1
      ]
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        415,
        186
      ],
      "size": [
        422.84503173828125,
        164.31304931640625
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 74
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            97
          ],
          "slot_index": 0
        }
      ],
      "title": "CLIP Text Encode (Positive Prompt)",
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "a cute anime girl with massive fennec ears and a big fluffy tail wearing a maid outfit turning around"
      ],
      "color": "#232",
      "bgcolor": "#353"
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        187
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 10,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 111
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 101
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 102
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 103
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        987948718394761,
        "randomize",
        20,
        6,
        "uni_pc",
        "simple",
        1
      ]
    },
    {
      "id": 49,
      "type": "CLIPVisionLoader",
      "pos": [
        20,
        640
      ],
      "size": [
        315,
        58
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP_VISION",
          "type": "CLIP_VISION",
          "links": [
            94
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPVisionLoader"
      },
      "widgets_values": [
        "clip_vision_h.safetensors"
      ]
    },
    {
      "id": 51,
      "type": "CLIPVisionEncode",
      "pos": [
        360,
        640
      ],
      "size": [
        253.60000610351562,
        78
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "clip_vision",
          "type": "CLIP_VISION",
          "link": 94
        },
        {
          "name": "image",
          "type": "IMAGE",
          "link": 109
        }
      ],
      "outputs": [
        {
          "name": "CLIP_VISION_OUTPUT",
          "type": "CLIP_VISION_OUTPUT",
          "links": [
            107
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPVisionEncode"
      },
      "widgets_values": [
        "none"
      ]
    },
    {
      "id": 52,
      "type": "LoadImage",
      "pos": [
        20,
        760
      ],
      "size": [
        315,
        314
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            106,
            109
          ],
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "flux_dev_example.png",
        "image"
      ]
    },
    {
      "id": 38,
      "type": "CLIPLoader",
      "pos": [
        20,
        190
      ],
      "size": [
        390,
        82
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            74,
            75
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPLoader"
      },
      "widgets_values": [
        "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
        "wan",
        "default"
      ]
    },
    {
      "id": 37,
      "type": "UNETLoader",
      "pos": [
        20,
        70
      ],
      "size": [
        346.7470703125,
        82
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            110
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "UNETLoader"
      },
      "widgets_values": [
        "wan2.1_i2v_480p_14B_fp16.safetensors",
        "default"
      ]
    },
    {
      "id": 54,
      "type": "ModelSamplingSD3",
      "pos": [
        510,
        70
      ],
      "size": [
        315,
        58
      ],
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 110
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            111
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        8
      ]
    }
  ],
  "links": [
    [
      35,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      56,
      8,
      0,
      28,
      0,
      "IMAGE"
    ],
    [
      74,
      38,
      0,
      6,
      0,
      "CLIP"
    ],
    [
      75,
      38,
      0,
      7,
      0,
      "CLIP"
    ],
    [
      76,
      39,
      0,
      8,
      1,
      "VAE"
    ],
    [
      93,
      8,
      0,
      47,
      0,
      "IMAGE"
    ],
    [
      94,
      49,
      0,
      51,
      0,
      "CLIP_VISION"
    ],
    [
      97,
      6,
      0,
      50,
      0,
      "CONDITIONING"
    ],
    [
      98,
      7,
      0,
      50,
      1,
      "CONDITIONING"
    ],
    [
      99,
      39,
      0,
      50,
      2,
      "VAE"
    ],
    [
      101,
      50,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      102,
      50,
      1,
      3,
      2,
      "CONDITIONING"
    ],
    [
      103,
      50,
      2,
      3,
      3,
      "LATENT"
    ],
    [
      106,
      52,
      0,
      50,
      4,
      "IMAGE"
    ],
    [
      107,
      51,
      0,
      50,
      3,
      "CLIP_VISION_OUTPUT"
    ],
    [
      109,
      52,
      0,
      51,
      1,
      "IMAGE"
    ],
    [
      110,
      37,
      0,
      54,
      0,
      "MODEL"
    ],
    [
      111,
      54,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.015255979947749,
      "offset": [
        4.576817595742521,
        -17.69629597715313
      ]
    }
  },
  "version": 0.4
}

我准备使用这张图片生成一段动画：