ComfyUI - ComfyUI 工作流中集成 SAM2 + GroundingDINO 处理图像与视频 教程
欢迎关注我的CSDN:https://spike.blog.csdn.net/
本文地址:https://spike.blog.csdn.net/article/details/143359538
免责声明:本文来源于个人知识与公开资料,仅用于学术交流,欢迎讨论,不支持转载。
SAM2 与 GroundingDINO 结合,在图像分割和目标检测领域带来显著的进展,SAM2 实现精确的图像分割,而 GroundingDINO 则强化模型的目标检测能力,提供更加准确和细致的物体识别。在实际应用中,能够有效提升各类复杂图像处理任务的性能,协同工作提高处理速度,还确保高精度和稳定性。
ComfyUI 部署节点的 3 个步骤:
- 准备 节点(Node) 工程,
git clone
,位于ComfyUI/custom_nodes
- 安装依赖包,进入工程,运行
pip install -r requirements.txt
- (可选) 模型提前下载,放入相应的文件夹中
- 重启服务,刷新页面,即可运行。
下载工程:ComfyUI-segment-anything-2、ComfyUI-Florence2、ComfyUI-KJNodes、ComfyUI-SAM2、ComfyUI-VideoHelperSuite
cd ComfyUI/custom_nodes
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
git clone https://github.com/kijai/ComfyUI-segment-anything-2.git
git clone https://github.com/kijai/ComfyUI-Florence2.git
git clone https://github.com/kijai/ComfyUI-KJNodes.git
# v1.0 版本,被 ComfyUI-SAM2替代
# git clone https://github.com/storyicon/comfyui_segment_anything
git clone https://github.com/neverbiasu/ComfyUI-SAM2.git
pip install -r requirements.txt
1.ComfyUI-segment-anything-2
节点:ComfyUI-segment-anything-2
准备模型:
- SAM2 模型 -
ComfyUI/models/sam2
- Florence-2 模型
ComfyUI/models/LLM
,用于代替检测模型,例如 GroundingDINO,参考 ComfyUI-Florence2
支持处理视频流程,但是整体分割效果非常一般,而且 Points 效果也比较一般。
依赖节点:ComfyUI-Florence2、ComfyUI-KJNodes、ComfyUI-VideoHelperSuite
测试示例位于:https://github.com/kijai/ComfyUI-segment-anything-2/tree/main/examples
例如:points_segment_video_example.json
Load Video (Upload)
,加载视频节点Points Editor
,Point 编辑节点,使用shift + 左右键
,选择正负点。(Down)Load SAM2Model
,下载或加载模型,sam2.1_hiera_large-fp16.safetensors
,选择 fp16Sam2Segmentation
分割节点,注意,需要重新添加,默认流程有问题,接受正负点。Preview Animation
显示动画效果
即:
2.ComfyUI-SAM2
节点:ComfyUI-SAM2
准备模型:models/bert-base-uncased
、models/grounding-dino
、models/sams
GroundingDino + SAM2,只有 3 个节点,功能比较单一,检测效果较好。
GroundingDinoModelLoader (segment anything2)
,加载 DINO 模型SAM2ModelLoader (segment anything2)
,加载 SAM2 模型GroundingDinoSAM2Segment (segment anything2)
,合并,只有2个参数,Prompt 和 阈值
测试模型效果,支持多个词汇,例如 person 和 book,注意逗号分割,即:
效果如下:
Workflow1:
{"last_node_id":117,"last_link_id":62,"nodes":[{"id":113,"type":"Note","pos":{"0":56,"1":-415},"size":{"0":309.1065368652344,"1":177.01339721679688},"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[],"properties":{"text":""},"widgets_values":["To get the image for the points editor, first create a canvas, then either input image/video (first frame is taken), or copy/paste an image while the node is selected, or drag&drop an image.\n\nWARNING: the image WILL BE SAVED to the node in compressed format, including when saving the workflow!\n\nClick the ? on the node for more information"],"color":"#432","bgcolor":"#653"},{"id":116,"type":"Reroute","pos":{"0":1066,"1":115},"size":[75,26],"flags":{},"order":5,"mode":0,"inputs":[{"name":"","type":"*","link":60,"label":"","widget":{"name":"value"}}],"outputs":[{"name":"","type":"STRING","links":[61],"slot_index":0,"label":""}],"properties":{"showOutputText":false,"horizontal":false}},{"id":112,"type":"ShowText|pysssss","pos":{"0":1166,"1":-429},"size":{"0":315,"1":100},"flags":{},"order":4,"mode":0,"inputs":[{"name":"text","type":"STRING","link":53,"widget":{"name":"text"},"label":"text"}],"outputs":[{"name":"STRING","type":"STRING","links":null,"shape":6,"label":"STRING"}],"properties":{"Node name for S&R":"ShowText|pysssss"},"widgets_values":["","[{\"x\": 256, \"y\": 256}, {\"x\": 237, \"y\": 463}, {\"x\": 321, \"y\": 138}]"]},{"id":117,"type":"ShowText|pysssss","pos":{"0":1163,"1":-277},"size":{"0":315,"1":76},"flags":{},"order":6,"mode":0,"inputs":[{"name":"text","type":"STRING","link":62,"widget":{"name":"text"},"label":"text"}],"outputs":[{"name":"STRING","type":"STRING","links":null,"shape":6,"label":"STRING"}],"properties":{"Node name for S&R":"ShowText|pysssss"},"widgets_values":["","[{\"x\": 0, \"y\": 0}, {\"x\": 426, \"y\": 242}]"]},{"id":102,"type":"VHS_LoadVideo","pos":{"0":14,"1":-59},"size":[363.24957275390625,619.2495727539062],"flags":{},"order":1,"mode":0,"inputs":[{"name":"meta_batch","type":"VHS_BatchManager","link":null,"shape":7,"label":"meta_batch"},{"name":"vae","type":"VAE","link":null,"shape":7,"label":"vae"}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[43,52,57],"slot_index":0,"shape":3,"label":"IMAGE"},{"name":"frame_count","type":"INT","links":null,"shape":3,"label":"frame_count"},{"name":"audio","type":"AUDIO","links":null,"shape":3,"label":"audio"},{"name":"video_info","type":"VHS_VIDEOINFO","links":null,"shape":3,"label":"video_info"}],"properties":{"Node name for S&R":"VHS_LoadVideo"},"widgets_values":{"video":"2851_1708350515(原视频).mp4","force_rate":0,"force_size":"512x?","custom_width":512,"custom_height":512,"frame_load_cap":16,"skip_first_frames":0,"select_every_nth":3,"choose video to upload":"image","videopreview":{"hidden":false,"paused":false,"params":{"frame_load_cap":16,"skip_first_frames":0,"force_rate":0,"filename":"2851_1708350515(原视频).mp4","type":"input","format":"video/mp4","select_every_nth":3,"force_size":"512x?"}}}},{"id":114,"type":"PointsEditor","pos":{"0":439,"1":-477},"size":[557,812],"flags":{"collapsed":false},"order":3,"mode":0,"inputs":[{"name":"bg_image","type":"IMAGE","link":52,"shape":7,"label":"bg_image"}],"outputs":[{"name":"positive_coords","type":"STRING","links":[53,55],"slot_index":0,"shape":3,"label":"positive_coords"},{"name":"negative_coords","type":"STRING","links":[60,62],"slot_index":1,"shape":3,"label":"negative_coords"},{"name":"bbox","type":"BBOX","links":null,"slot_index":2,"shape":3,"label":"bbox"},{"name":"bbox_mask","type":"MASK","links":null,"shape":3,"label":"bbox_mask"},{"name":"cropped_image","type":"IMAGE","links":null,"shape":3,"label":"cropped_image"}],"properties":{"Node name for S&R":"PointsEditor","imgData":{"name":"bg_image","base64":[""]},"points":"PointsEditor","neg_points":"PointsEditor"},"widgets_values":["{\"positive\":[{\"x\":256,\"y\":256},{\"x\":237,\"y\":463},{\"x\":321,\"y\":138}],\"negative\":[{\"x\":0,\"y\":0},{\"x\":426,\"y\":242}]}","[{\"x\":256,\"y\":256},{\"x\":237,\"y\":463},{\"x\":321,\"y\":138}]","[{\"x\":0,\"y\":0},{\"x\":426,\"y\":242}]","[{}]","[{}]","xyxy",512,512,false,null,null,null]},{"id":106,"type":"DownloadAndLoadSAM2Model","pos":{"0":459,"1":393},"size":{"0":315,"1":130},"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"sam2_model","type":"SAM2MODEL","links":[56],"shape":3,"label":"sam2_model"}],"properties":{"Node name for S&R":"DownloadAndLoadSAM2Model"},"widgets_values":["sam2.1_hiera_large.safetensors","video","cuda","fp16"]},{"id":115,"type":"Sam2Segmentation","pos":{"0":898,"1":393},"size":{"0":315,"1":190},"flags":{},"order":7,"mode":0,"inputs":[{"name":"sam2_model","type":"SAM2MODEL","link":56,"label":"sam2_model"},{"name":"image","type":"IMAGE","link":57,"label":"image"},{"name":"bboxes","type":"BBOX","link":null,"shape":7,"label":"bboxes"},{"name":"mask","type":"MASK","link":null,"shape":7,"label":"mask"},{"name":"coordinates_positive","type":"STRING","link":55,"widget":{"name":"coordinates_positive"},"shape":7,"label":"coordinates_positive"},{"name":"coordinates_negative","type":"STRING","link":61,"widget":{"name":"coordinates_negative"},"shape":7,"label":"coordinates_negative"}],"outputs":[{"name":"mask","type":"MASK","links":[59],"slot_index":0,"label":"mask"}],"properties":{"Node name for S&R":"Sam2Segmentation"},"widgets_values":[true,"","",false]},{"id":107,"type":"PreviewAnimation","pos":{"0":1340,"1":-59},"size":{"0":514.92431640625,"1":577.3973999023438},"flags":{},"order":8,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":43,"shape":7,"label":"images"},{"name":"masks","type":"MASK","link":59,"slot_index":1,"shape":7,"label":"masks"}],"outputs":[],"title":"Preview Animation 16x512x512","properties":{"Node name for S&R":"PreviewAnimation"},"widgets_values":[16,null]}],"links":[[43,102,0,107,0,"IMAGE"],[52,102,0,114,0,"IMAGE"],[53,114,0,112,0,"STRING"],[55,114,0,115,4,"STRING"],[56,106,0,115,0,"SAM2MODEL"],[57,102,0,115,1,"IMAGE"],[59,115,0,107,1,"MASK"],[60,114,1,116,0,"*"],[61,116,0,115,5,"STRING"],[62,114,1,117,0,"STRING"]],"groups":[],"config":{},"extra":{"ds":{"scale":0.5131581182307067,"offset":[396.07947776523474,760.0658700441401]}},"version":0.4}
Workflow2:
{"last_node_id":8,"last_link_id":7,"nodes":[{"id":2,"type":"SAM2ModelLoader (segment anything2)","pos":{"0":109,"1":303},"size":{"0":441,"1":58},"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[{"name":"SAM2_MODEL","type":"SAM2_MODEL","links":[1],"slot_index":0,"label":"SAM2_MODEL"}],"properties":{"Node name for S&R":"SAM2ModelLoader (segment anything2)"},"widgets_values":["sam2_1_hiera_large.pt"]},{"id":3,"type":"LoadImage","pos":{"0":110,"1":427},"size":{"0":315,"1":314},"flags":{},"order":1,"mode":0,"inputs":[],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[3],"slot_index":0,"label":"IMAGE"},{"name":"MASK","type":"MASK","links":null,"label":"MASK"}],"properties":{"Node name for S&R":"LoadImage"},"widgets_values":["IMG_5539.JPG","image"]},{"id":7,"type":"MaskPreview+","pos":{"0":921,"1":433},"size":[210,246],"flags":{},"order":5,"mode":0,"inputs":[{"name":"mask","type":"MASK","link":7,"label":"mask"}],"outputs":[],"properties":{"Node name for S&R":"MaskPreview+"}},{"id":1,"type":"GroundingDinoModelLoader (segment anything2)","pos":{"0":104,"1":186},"size":{"0":554.4000244140625,"1":58},"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"GROUNDING_DINO_MODEL","type":"GROUNDING_DINO_MODEL","links":[2],"slot_index":0,"label":"GROUNDING_DINO_MODEL"}],"properties":{"Node name for S&R":"GroundingDinoModelLoader (segment anything2)"},"widgets_values":["GroundingDINO_SwinB (938MB)"]},{"id":6,"type":"PreviewImage","pos":{"0":575,"1":433},"size":[308.81640625,299.23828125],"flags":{},"order":4,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":6,"label":"images"}],"outputs":[],"properties":{"Node name for S&R":"PreviewImage"}},{"id":4,"type":"GroundingDinoSAM2Segment (segment anything2)","pos":{"0":683,"1":183},"size":{"0":554.4000244140625,"1":122},"flags":{},"order":3,"mode":0,"inputs":[{"name":"sam_model","type":"SAM2_MODEL","link":1,"label":"sam_model"},{"name":"grounding_dino_model","type":"GROUNDING_DINO_MODEL","link":2,"label":"grounding_dino_model"},{"name":"image","type":"IMAGE","link":3,"label":"image"}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[6],"slot_index":0,"label":"IMAGE"},{"name":"MASK","type":"MASK","links":[7],"slot_index":1,"label":"MASK"}],"properties":{"Node name for S&R":"GroundingDinoSAM2Segment (segment anything2)"},"widgets_values":["person,book",0.3]}],"links":[[1,2,0,4,0,"SAM2_MODEL"],[2,1,0,4,1,"GROUNDING_DINO_MODEL"],[3,3,0,4,2,"IMAGE"],[6,4,0,6,0,"IMAGE"],[7,4,1,7,0,"MASK"]],"groups":[],"config":{},"extra":{"ds":{"scale":0.8264462809917354,"offset":[-12.505597656249961,-82.9064101562497]}},"version":0.4}