
Open-Vocabulary Object Detection with OWLv2 via transformers

Contents

  • Installation
  • Demo

Installation

The demo below also imports torch, numpy, and Pillow, so install them alongside transformers:

pip install transformers torch numpy pillow
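The demo in the next section moves the model and inputs to the GPU with `.cuda()`. A minimal environment check (a sketch, not part of the demo itself) avoids a hard crash on CPU-only machines:

```python
import torch

# The demo assumes a CUDA GPU is available; falling back to "cpu"
# keeps the script runnable anywhere, just with slower inference.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}")
```

With a `device` variable like this, `.cuda()` calls in the demo could be replaced by `.to(device)`.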

Demo

from PIL import Image, ImageDraw, ImageFont
import numpy as np
import torch
from transformers import AutoProcessor, Owlv2ForObjectDetection
from transformers.utils.constants import OPENAI_CLIP_MEAN, OPENAI_CLIP_STD

# Local copy of the google/owlv2-large-patch14-ensemble checkpoint; the Hub id works as well
processor = AutoProcessor.from_pretrained("/home/share3/mayunchuan/google/owlv2-large-patch14-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("/home/share3/mayunchuan/google/owlv2-large-patch14-ensemble").cuda()

image = Image.open('/home/mayunchuan/lavad/dataset/Thumos14_25fps/frames/video_test_0000293/004902.jpg')
# image = Image.open('/home/mayunchuan/lavad/dataset/Thumos14_25fps/frames/video_validation_0000990/001388.jpg')
# texts = [["a photo of a volleyball", "a photo of a man"]]
texts = [["javelin"]]
inputs = processor(text=texts, images=image, return_tensors="pt")
# Move all input tensors to the GPU
inputs['input_ids'] = inputs['input_ids'].cuda()
inputs['attention_mask'] = inputs['attention_mask'].cuda()
inputs['pixel_values'] = inputs['pixel_values'].cuda()
# forward pass
with torch.no_grad():
    outputs = model(**inputs)

# Note: boxes need to be visualized on the padded, unnormalized image
# hence we'll set the target image sizes (height, width) based on that

def get_preprocessed_image(pixel_values):
    pixel_values = pixel_values.squeeze().cpu().numpy()
    unnormalized_image = (pixel_values * np.array(OPENAI_CLIP_STD)[:, None, None]) + np.array(OPENAI_CLIP_MEAN)[:, None, None]
    unnormalized_image = (unnormalized_image * 255).astype(np.uint8)
    unnormalized_image = np.moveaxis(unnormalized_image, 0, -1)
    unnormalized_image = Image.fromarray(unnormalized_image)
    return unnormalized_image

unnormalized_image = get_preprocessed_image(inputs.pixel_values)

target_sizes = torch.Tensor([unnormalized_image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to final bounding boxes and scores
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.2, target_sizes=target_sizes
)

i = 0  # Retrieve predictions for the first image for the corresponding text queries
text = texts[i]
boxes, scores, labels = results[i]["boxes"], results[i]["scores"], results[i]["labels"]

for box, score, label in zip(boxes, scores, labels):
    box = [round(coord, 2) for coord in box.tolist()]
    print(f"Detected {text[label]} with confidence {round(score.item(), 3)} at location {box}")

# Draw the bounding boxes on the padded image
draw = ImageDraw.Draw(unnormalized_image)

for score, label, box in zip(scores, labels, boxes):
    box = [round(coord, 2) for coord in box.tolist()]
    x, y, x2, y2 = tuple(box)
    draw.rectangle((x, y, x2, y2), outline="red", width=1)
    draw.text((x, y), text[label.item()], font_size=20, fill="black")

# Save the annotated image
unnormalized_image.save("marked_image.jpg")
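As the comment in the demo notes, the boxes live in the coordinate frame of the padded square image, not the original frame. To draw on the original image instead, the boxes can be rescaled, assuming (as in OWLv2's preprocessing) that the image is padded on the bottom/right to a square before resizing. `to_original_coords` below is a hypothetical helper, not part of transformers — a minimal sketch:

```python
def to_original_coords(box, orig_size, padded_side):
    """Map an (x1, y1, x2, y2) box from the padded square image back to
    the original image.

    box: box coordinates on the padded image, e.g. from
         post_process_object_detection with target_sizes set to the
         padded image size.
    orig_size: (width, height) of the original image, e.g. image.size.
    padded_side: side length of the square padded image.
    """
    orig_w, orig_h = orig_size
    # The original image occupies the top-left of the padded square,
    # scaled so that its longer side fills the square.
    scale = max(orig_w, orig_h) / padded_side
    x1, y1, x2, y2 = (c * scale for c in box)
    # Clamp to the original image bounds
    return (
        min(max(x1, 0), orig_w),
        min(max(y1, 0), orig_h),
        min(max(x2, 0), orig_w),
        min(max(y2, 0), orig_h),
    )
```

In the demo above, `to_original_coords(box.tolist(), image.size, unnormalized_image.size[0])` would give coordinates suitable for drawing directly on the original `image`.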

