Vision - Grounded SAM2, an Open-Source Visual Segmentation Framework: Setup and Inference Tutorial (1) ...

惊落一身雪 · 2024-11-4 18:23:03

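Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/143388189
Disclaimer: this article is based on personal knowledge and public materials and is intended for academic exchange only. Discussion is welcome; reposting is not permitted.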

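Grounded SAM2 is a vision AI framework that integrates several state-of-the-art models, including GroundingDINO, Florence-2, and SAM2, to perform open-domain object detection, segmentation, and tracking. It locates objects in an image from a natural-language description, generates fine-grained segmentation masks, and tracks objects through a video sequence while keeping their IDs consistent.
Paper: Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. In this framework the SAM version is upgraded from 1.0 to 2.0.
1. Environment Setup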

GitHub: Grounded-SAM-2

git clone https://github.com/IDEA-Research/Grounded-SAM-2
cd Grounded-SAM-2


Download the SAM 2.1 checkpoint (.pt format) and the GroundingDINO checkpoint (.pth format). The first URL is quoted so that the shell does not interpret the "?" character:

wget "https://huggingface.co/facebook/sam2.1-hiera-large/resolve/main/sam2.1_hiera_large.pt?download=true" -O sam2.1_hiera_large.pt
wget https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth


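Place the models where the repository expects them, here by symlinking from an existing model directory. The demo below loads ./checkpoints/sam2.1_hiera_large.pt, so the SAM 2.1 checkpoint downloaded above must also be available in checkpoints/:

cd checkpoints
ln -s [your path]/llm/workspace_comfyui/ComfyUI/models/sam2/sam2_hiera_large.pt sam2_hiera_large.pt
# gdino_checkpoints sits next to checkpoints in the repository root
cd ../gdino_checkpoints
ln -s [your path]/llm/workspace_comfyui/ComfyUI/models/grounding-dino/groundingdino_swinb_cogcoor.pth groundingdino_swinb_cogcoor.pth
ln -s [your path]/llm/workspace_comfyui/ComfyUI/models/grounding-dino/groundingdino_swint_ogc.pth groundingdino_swint_ogc.pth
# return to the repository root for the following steps
cd ..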
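
Before continuing, it can help to confirm that the checkpoint files (or symlinks) resolve to real files. The sketch below is a hypothetical helper, not part of the repository: missing_checkpoints is an illustrative name, and the two paths are assumed from the demo configuration used later in this tutorial.

```python
from pathlib import Path


def missing_checkpoints(paths):
    """Return the paths that do not resolve to an existing file.

    Path.is_file() follows symlinks, so a dangling symlink counts as missing.
    """
    return [str(p) for p in map(Path, paths) if not p.is_file()]


if __name__ == "__main__":
    # Assumed paths, relative to the repository root.
    expected = [
        "checkpoints/sam2.1_hiera_large.pt",
        "gdino_checkpoints/groundingdino_swint_ogc.pth",
    ]
    for path in missing_checkpoints(expected):
        print(f"missing checkpoint: {path}")
```

Run it from the repository root; no output means both files were found.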


Activate the environment:

conda activate sam2


Verify PyTorch in a Python interpreter, then check CUDA_HOME in the shell:

python
import torch
print(torch.__version__) # 2.5.0+cu124
print(torch.cuda.is_available()) # True
exit()
echo $CUDA_HOME
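
The same checks can be collected into one script. This is a minimal sketch, assuming only the standard library plus an optional PyTorch install; environment_report is a hypothetical name, and torch is imported lazily so the script also runs where PyTorch is absent.

```python
import importlib.util
import os


def environment_report():
    """Collect the values checked manually above into one dictionary."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
        "cuda_home": os.environ.get("CUDA_HOME"),  # None if the variable is unset
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```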


Install Grounding DINO:

pip install --no-build-isolation -e grounding_dino
pip show groundingdino


Install SAM2:

pip install --no-build-isolation -e .
pip install --no-build-isolation -e ".[notebooks]" # for Jupyter notebook support
pip show SAM-2


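For configuration details, see the article "SAM2 (Segment Anything 2), an Open-Source Visual Segmentation Algorithm: Setup and Inference".
Install the dependencies from the requirements file: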

cd grounding_dino/
pip install -r requirements.txt --verbose


2. Testing on an Image

Test script: grounded_sam2_local_demo.py
Import the required packages:

import os
import cv2
import json
import torch
import numpy as np
import supervision as sv
import pycocotools.mask as mask_util
from pathlib import Path
from torchvision.ops import box_convert
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from grounding_dino.groundingdino.util.inference import load_model, load_image, predict
from PIL import Image
import matplotlib.pyplot as plt


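Configure the data and dependencies, including:

Input text prompt, for example socks and guitar
Input image
SAM2 model v2.1 and its config
GroundingDINO model and its config (DINO: DETR with Improved deNoising anchOr boxes)
Box threshold and text threshold
Output folder and JSON dump flag

These are set as follows: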

TEXT_PROMPT = "socks. guitar."
#IMG_PATH = "notebooks/images/truck.jpg"
IMG_PATH = "[your path]/llm/vision_test_data/image2.png"
image = Image.open(IMG_PATH)
plt.figure(figsize=(9, 6))
plt.title(f"annotated_frame")
plt.imshow(image)
SAM2_CHECKPOINT = "./checkpoints/sam2.1_hiera_large.pt"
SAM2_MODEL_CONFIG = "configs/sam2.1/sam2.1_hiera_l.yaml"
GROUNDING_DINO_CONFIG = "grounding_dino/groundingdino/config/GroundingDINO_SwinT_OGC.py"
GROUNDING_DINO_CHECKPOINT = "gdino_checkpoints/groundingdino_swint_ogc.pth"
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
OUTPUT_DIR = Path("outputs/grounded_sam2_local_demo")
DUMP_JSON_RESULTS = True
# create output directory
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
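
The prompt "socks. guitar." uses lowercase class names, each terminated by a period, which is the form the demo script expects as far as I know. The sketch below builds such a prompt from a list of class names; build_text_prompt is a hypothetical helper introduced here for illustration, not a function from the repository.

```python
def build_text_prompt(class_names):
    """Join class names into a 'name. name.' prompt, lowercased."""
    # Normalize each name: trim whitespace, lowercase, drop a trailing period.
    cleaned = [name.strip().lower().rstrip(".") for name in class_names]
    # Skip empty entries and terminate every remaining name with a period.
    return " ".join(f"{name}." for name in cleaned if name)


if __name__ == "__main__":
    print(build_text_prompt(["Socks", " guitar."]))
```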


Load the SAM2 model to obtain sam2_predictor:

# build SAM2 image predictor
sam2_checkpoint = SAM2_CHECKPOINT
model_cfg = SAM2_MODEL_CONFIG
sam2_model = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE)
sam2_predictor = SAM2ImagePredictor(sam2_model)


Load the GroundingDINO model to obtain grounding_model:

# build grounding dino model
grounding_model = load_model(
    model_config_path=GROUNDING_DINO_CONFIG,
    model_checkpoint_path=GROUNDING_DINO_CHECKPOINT,
    device=DEVICE
)


Load the image and pass it to SAM2:

text = TEXT_PROMPT
img_path = IMG_PATH
# image_source: original image; image: normalized (transformed) image
image_source, image = load_image(img_path)
sam2_predictor.set_image(image_source)


GroundingDINO predicts the bounding boxes. The inputs are the model, the image, the text prompt, and the box and text thresholds:

load_image() and predict() both come from GroundingDINO, so the preprocessed data matches the model.

boxes, confidences, labels = predict(
    model=grounding_model,
    image=image,
    caption=text,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
)
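
The two thresholds act at different levels: to my understanding, the box threshold decides which candidate boxes survive, while the text threshold decides which prompt tokens end up in each surviving box's label. The following is a simplified illustration in plain Python, not the library's implementation; filter_detections and its list-based inputs are hypothetical.

```python
def filter_detections(token_scores, tokens, box_threshold, text_threshold):
    """Keep boxes whose best token score exceeds box_threshold.

    token_scores: one list of per-token scores (0..1) for each candidate box.
    tokens: the prompt tokens, aligned with each score list.
    Returns a list of (box_index, confidence, label) tuples.
    """
    kept = []
    for index, scores in enumerate(token_scores):
        confidence = max(scores)
        if confidence <= box_threshold:
            continue  # box-level filter: drop low-confidence boxes
        # token-level filter: the label uses only tokens above text_threshold
        label = " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)
        kept.append((index, confidence, label))
    return kept


if __name__ == "__main__":
    scores = [[0.8, 0.1], [0.2, 0.3], [0.05, 0.6]]
    print(filter_detections(scores, ["socks", "guitar"], 0.35, 0.25))
```

With the thresholds used in this tutorial (0.35 and 0.25), the second candidate is dropped because its best score is only 0.3.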


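Convert the box format. The predicted boxes are normalized (cx, cy, w, h); they are scaled to pixels and converted to (x1, y1, x2, y2) before being passed to SAM2: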

h, w, _ = image_source.shape
boxes = boxes * torch.Tensor([w, h, w, h])
input_boxes = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()
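
The conversion above can be checked by hand on a single box. The sketch below does the same arithmetic in plain Python instead of torch; cxcywh_norm_to_xyxy_abs is an illustrative name, and the image size and box values are made-up inputs.

```python
def cxcywh_norm_to_xyxy_abs(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2)."""
    cx, cy, bw, bh = box
    # Scale normalized coordinates to pixels.
    cx, bw = cx * img_w, bw * img_w
    cy, bh = cy * img_h, bh * img_h
    # Move from center/size to corner coordinates.
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


if __name__ == "__main__":
    # A box centered in a 640x480 image, a quarter wide and half as tall.
    print(cxcywh_norm_to_xyxy_abs((0.5, 0.5, 0.25, 0.5), 640, 480))
    # -> (240.0, 120.0, 400.0, 360.0)
```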


PyTorch settings that SAM2 relies on (bfloat16 autocast, plus TF32 on GPUs with compute capability 8 or higher):

# FIXME: figure how does this influence the G-DINO model
torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
if torch.cuda.get_device_properties(0).major >= 8:
    # turn on tfloat32 for Ampere GPUs (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices)
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True


Run SAM2 prediction on the image, using the boxes as prompts:

masks, scores, logits = sam2_predictor.predict(
    point_coords=None,
    point_labels=None,
    box=input_boxes,
    multimask_output=False,
)


Post-process the prediction results:

"""
Post-process the output of the model to get the masks, scores, and logits for visualization
"""
# convert the shape to (n, H, W)
if masks.ndim == 4:
    masks = masks.squeeze(1)
confidences = confidences.numpy().tolist()
class_names = labels
class_ids = np.array(list(range(len(class_names))))
labels = [
    f"{class_name} {confidence:.2f}"
    for class_name, confidence
    in zip(class_names, confidences)
]
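
In the code above, class_ids is simply 0..n-1, so every detection gets its own id even when two detections share a class name. If one id per class name is preferred, a possible variant is sketched below. It is an alternative to the tutorial's approach, not what the demo script does, and class_ids_from_names is a hypothetical helper.

```python
def class_ids_from_names(class_names):
    """Assign one integer id per distinct class name, in order of first appearance."""
    first_seen = {}
    ids = []
    for name in class_names:
        # setdefault stores the next free id the first time a name is seen
        ids.append(first_seen.setdefault(name, len(first_seen)))
    return ids


if __name__ == "__main__":
    print(class_ids_from_names(["socks", "guitar", "socks"]))  # -> [0, 1, 0]
```

The resulting list could be wrapped with np.array(...) in place of the range-based ids.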


Visualize the results:

"""
Visualize image with supervision useful API
"""
img = cv2.imread(img_path)
detections = sv.Detections(
    xyxy=input_boxes,  # (n, 4)
    mask=masks.astype(bool),  # (n, h, w)
    class_id=class_ids
)
box_annotator = sv.BoxAnnotator()
annotated_frame = box_annotator.annotate(scene=img.copy(), detections=detections)
label_annotator = sv.LabelAnnotator()
annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
cv2.imwrite(os.path.join(OUTPUT_DIR, "groundingdino_annotated_image.jpg"), annotated_frame)
plt.figure(figsize=(9, 6))
plt.title(f"annotated_frame")
plt.imshow(annotated_frame[:,:,::-1])
mask_annotator = sv.MaskAnnotator()
annotated_frame = mask_annotator.annotate(scene=annotated_frame, detections=detections)
cv2.imwrite(os.path.join(OUTPUT_DIR, "grounded_sam2_annotated_image_with_mask.jpg"), annotated_frame)
plt.figure(figsize=(9, 6))
plt.title(f"annotated_frame")
plt.imshow(annotated_frame[:,:,::-1])


GroundingDINO box result: both entity classes, socks and guitar, are detected accurately.

[Figure: SAM2 segmentation result]

Convert the masks to COCO RLE format and save the results as JSON:

def single_mask_to_rle(mask):
    # encode one (H, W) binary mask as COCO RLE; counts is decoded to str for JSON
    rle = mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0]
    rle["counts"] = rle["counts"].decode("utf-8")
    return rle

if DUMP_JSON_RESULTS:
    # convert mask into rle format
    mask_rles = [single_mask_to_rle(mask) for mask in masks]
    input_boxes = input_boxes.tolist()
    scores = scores.tolist()
    # save the results in standard format
    results = {
        "image_path": img_path,
        "annotations" : [
            {
                "class_name": class_name,
                "bbox": box,
                "segmentation": mask_rle,
                "score": score,
            }
            for class_name, box, mask_rle, score in zip(class_names, input_boxes, mask_rles, scores)
        ],
        "box_format": "xyxy",
        "img_width": w,
        "img_height": h,
    }
    with open(os.path.join(OUTPUT_DIR, "grounded_sam2_local_image_demo_results.json"), "w") as f:
        json.dump(results, f, indent=4)
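
To make the RLE idea concrete, the sketch below encodes a tiny mask as COCO-style uncompressed RLE in plain Python. It is an illustration only: pycocotools produces a compressed string for counts, whereas this hypothetical mask_to_uncompressed_rle returns the raw run lengths as a list.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows) as COCO-style uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    # COCO RLE scans the mask column by column (column-major order).
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts = []
    current, run = 0, 0  # counts always start with the number of leading zeros
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


if __name__ == "__main__":
    # Column-major order gives 0, 1, 1, 1: one zero followed by three ones.
    print(mask_to_uncompressed_rle([[0, 1], [1, 1]]))
    # -> {'size': [2, 2], 'counts': [1, 3]}
```

A mask whose first pixel is foreground starts with a zero-length run, for example [[1, 0]] gives counts [0, 1, 1].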
