Handling input/output tensor conversion for quantized RKNN models
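This is a technical exploration. After training a YOLO11 model, I found that it could not detect the target objects in the RKNN environment, whereas on the host machine, or when calling the .onnx model converted from the .pt file directly, the results were correct. This article covers almost a full day of work. The final conclusion: this is a model quantization problem and has little to do with the YOLO version.
I can now run the unquantized float32 .rknn model on the RK3588 and get correct results. After quantization, detection quality drops first, and then the bounding box positions are also wrong. The conclusions of this article are: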
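1. When calling an RKNN model converted from a YOLO model, mean_values in rknn.config should generally be (0,0,0) and std_values should be (1,1,1). This assumes the image has already been divided by 255 during preprocessing, as in Appendix A.
2. With the standard onnx2rknn script provided by RKNN, the input and output interfaces remain float32 even after standard quantization. However, when exporting an RKNN model with rknn-toolkit, quantization is recommended for real deployments.
3. Compared with YOLOv5, YOLO11 ends up with 6 outputs after the .pt -> .onnx -> .rknn pipeline I found. In the sample code I looked at, the subsequent box-drawing step is the same as YOLOv5's and only uses three (1,x,n,n) matrices, so the inputs and outputs can probably be simplified further.
4. Another YOLO11 improvement: the output dimension changes from the previous 254 classes to the number of classes in the user's model, which shrinks the output matrices.
5. The problem I am facing now is that the model's raw parameter tables are too widely dispersed to be quantized effectively. In the custom quantization flow RKNN provides, I cannot find the quantization settings for the constant parameter tables in the .cfg file.
6. Note that an opset may need to be specified for the .pt -> .onnx step. These detailed parameters may not be optional; with limited time I could not test them specifically.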
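My next step is to solve item 5. It should have a clear solution. I will compare against, for example, the RKNN port of YOLOv10 to see how others handled the adaptation, because this problem did not first appear with YOLO11.
Based on what I know so far, the possible strategies are: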
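- Modify the model and add extra parameters.
- Change the quantization strategy, for example from linear discretization to some other scheme.
- Possibly retrain the quantized model after the .rknn conversion.
- Change the quantized data type, for example from uint8 to int16.
- First convert the .onnx into an already-quantized model, for example with ONNX Runtime.
- Modify the YOLO training setup so that a quantized model is used during training.
- Check the rknn-toolkit user manual. This is a general problem; given strategies such as the two-stage conversion I came across yesterday, RKNN should have a fairly robust solution.
I may dig deeper into this part today.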
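To make the uint8 vs. int16 strategy above concrete, here is a minimal NumPy sketch. It does not use the RKNN toolkit; it assumes plain symmetric linear quantization and a made-up tensor (many small values plus a few outliers) standing in for a "widely dispersed" parameter table. The helper name symmetric_roundtrip is hypothetical.

```python
import numpy as np

def symmetric_roundtrip(x, num_bits):
    """Quantize x to signed num_bits integers (symmetric, linear), then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes
    return (q * scale).astype(np.float32)

rng = np.random.default_rng(0)
# Hypothetical tensor: mostly small values, plus two large outliers that stretch the range
x = np.concatenate([rng.normal(0.0, 0.05, 10_000), [40.0, -35.0]]).astype(np.float32)

err8 = float(np.abs(symmetric_roundtrip(x, 8) - x).mean())
err16 = float(np.abs(symmetric_roundtrip(x, 16) - x).mean())
print(f"mean abs error: 8-bit={err8:.5f}, 16-bit={err16:.7f}")
```

With 8 bits the outliers force a coarse step, so most of the small values collapse onto the same code; 16 bits keeps the same range with a much finer step. Whether int16 is available for a given layer on the RK3588 still has to be checked against the toolkit documentation.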
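Parts of this article that can be skipped:
1. Section 1.1 can be ignored: it describes the internal processing of a quantized model and is only useful for understanding some quantization configuration parameters.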
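1. Problem
Before the ONNX model is quantized, the model trained on our own dataset detects the target objects reliably. An RKNN model, however, is usually quantized, i.e. its input and output tensors have gone from float32 to u8. So how should the tensors passed to and from the RKNN model be handled, and how are the outputs dequantized?
The usual approach requires the zero point and the scale.
1.1 Generic dequantization procedure - example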
import numpy as np

# Assume these scales and zero points were obtained from the quantization process
scales = [0.1, 0.2, 0.15, 0.25, 0.12, 0.18]
zero_points = [128, 127, 129, 126, 128, 127]

# Assume this is the uint8 data produced by the model
outputs_u8 = {
    'reg1': np.random.randint(0, 256, size=(1, 10), dtype=np.uint8),
    'cls1': np.random.randint(0, 256, size=(1, 5), dtype=np.uint8),
    'reg2': np.random.randint(0, 256, size=(1, 10), dtype=np.uint8),
    'cls2': np.random.randint(0, 256, size=(1, 5), dtype=np.uint8),
    'reg3': np.random.randint(0, 256, size=(1, 10), dtype=np.uint8),
    'cls3': np.random.randint(0, 256, size=(1, 5), dtype=np.uint8)
}

# Dequantization: real_value = (quantized_value - zero_point) * scale
def dequantize(u8_data, scale, zero_point):
    return (u8_data.astype(np.float32) - zero_point) * scale

# Dequantize every output
outputs_f32 = {}
output_names = ['reg1', 'cls1', 'reg2', 'cls2', 'reg3', 'cls3']
for i, name in enumerate(output_names):
    outputs_f32[name] = dequantize(outputs_u8[name], scales[i], zero_points[i])

# Print the dequantized results
for name, data in outputs_f32.items():
    print(f"{name}: {data.dtype}, shape={data.shape}")
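The example in section 1.1 takes scale and zero_point as given. As a rough illustration of where such values come from, the sketch below assumes standard asymmetric (affine) uint8 quantization derived from a tensor's min/max. This is the textbook scheme, not necessarily what rknn-toolkit does internally, and affine_params and quantize are hypothetical helper names.

```python
import numpy as np

def affine_params(x, qmin=0, qmax=255):
    """Derive scale and zero point from the observed float range of x."""
    lo = min(float(x.min()), 0.0)  # keep 0.0 inside the range so it maps to an exact code
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 5.0, size=(1, 10)).astype(np.float32)  # stand-in for a model output

scale, zp = affine_params(x)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
max_err = float(np.abs(x_hat - x).max())
print(f"scale={scale:.5f}, zero_point={zp}, max round-trip error={max_err:.5f}")
```

The round-trip error is on the order of one quantization step, so the wider the float range of a tensor, the larger the step and the coarser the recovered values.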
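1.1.1 To confirm the model's input and output tensors, we log them around the model call:
You may wonder how the type information for the input and output tensors was obtained. The complete code is in Appendix A; the snippet that calls the model and prints the input/output information is: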
pred_results = ubuntu_detect(ONNX_MODEL, image, CLASSES)
print(f'onnx model{ONNX_MODEL}')
print(f'<<inputs array, shape{image.shape}, datatype={image.dtype}')
for i in range(len(pred_results)):
    print(f'>>output array[i], shape{pred_results[i].shape}, datatype={pred_results[i].dtype}')
The input and output tensors around the model call are:
<<inputs array, shape(1, 3, 640, 640), datatype=float32
>>output array[i], shape(1, 64, 80, 80), datatype=float32
>>output array[i], shape(1, 80, 80, 80), datatype=float32
>>output array[i], shape(1, 64, 40, 40), datatype=float32
>>output array[i], shape(1, 80, 40, 40), datatype=float32
>>output array[i], shape(1, 64, 20, 20), datatype=float32
>>output array[i], shape(1, 80, 20, 20), datatype=float32
Note: this model and call do detect the target objects:
detectResult: 16
obj num is : 2
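Reading the shapes above against the postprocess code in Appendix A: each 64-channel output holds the box regression as 4 sides x 16 distribution bins, and each 80-channel output holds the class scores. The sketch below is a vectorized NumPy version of the per-cell softmax-and-expectation loop from Appendix A. The function name decode_dfl is hypothetical and the all-zero input is only a placeholder tensor.

```python
import numpy as np

def decode_dfl(reg, bins=16):
    """reg: (1, 4*bins, H, W) raw logits -> (4, H, W) expected distances in grid units."""
    _, _, h, w = reg.shape
    # Channel index is side * bins + bin, matching the indexing used in Appendix A
    logits = reg.reshape(4, bins, h, w)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    prob = np.exp(logits)
    prob /= prob.sum(axis=1, keepdims=True)
    weights = np.arange(bins, dtype=np.float32).reshape(1, bins, 1, 1)
    return (prob * weights).sum(axis=1)

# Placeholder tensor with the same shape as the first regression output
reg = np.zeros((1, 64, 80, 80), dtype=np.float32)
dist = decode_dfl(reg)
print(dist.shape)
```

The four distances are then subtracted from or added to the cell centre and multiplied by the stride (8, 16 or 32) to get pixel coordinates, as Appendix A does for x1, y1, x2, y2.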
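2. Searching the rknn-toolkit2 2.3 documentation - finding the quantization parameters
2.1 First try to get the unquantized version running
The complete code is in Appendix B.
We first try changing the default dtype of the inputs and outputs to float32 to see whether that can work. It can: skip quantization when converting the ONNX model, by changing the build call in onnx2rknn.py to: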
rknn.build(do_quantization=False, dataset=DATASET)
(The onnx2rknn output for this step is not reproduced here.)
Then, in detect_by_onnx.py, replace the call to the ONNX model with a call to the RKNN model:
from rknn.api import RKNN

RKNN_MODEL = r'/home/firefly/app/git/yolo11_rknn_test/models/yolov11s-640-640_rk3588-float-20250219.rknn'
IMG_PATH = '/home/firefly/app/git/yolo11_rknn_test/images/cake26.jpg'
QUANTIZE_ON = False

def rk3588_detect(model, pic, classes):
    rknn = RKNN(verbose=True)
    rknn.config(mean_values=[[0,0,0]],
                std_values=[[255,255,255]],
                quant_img_RGB2BGR=False,
                target_platform='rk3588')
    # Other settings seen elsewhere: [0,0,0] / [255,255,255], or ImageNet's
    # mean_values = [0.485, 0.456, 0.406] and std_values = [0.229, 0.224, 0.225]
    rknn.load_rknn(path=model)
    rknn.init_runtime(target="rk3588", core_mask=RKNN.NPU_CORE_AUTO)
    outputs = rknn.inference(inputs=[pic], data_format=['nchw'])
    return outputs
Note: the pic passed in is actually in NCHW format (1-3-640-640), i.e. number - channel - height - width. However, the result is wrong:
firefly@firefly:~/app/git/yolo11_rknn_test/detect$ python3 ./detect_rk3588.py
This is main ....
<class 'numpy.ndarray'>
I rknn-toolkit2 version: 2.3.0
I target set by user is: rk3588
onnx model/home/firefly/app/git/yolo11_rknn_test/models/yolov11s-640-640_rk3588-float-20250219.rknn
<<inputs array, shape(1, 3, 640, 640), datatype=float32
>>output array[i], shape(1, 64, 80, 80), datatype=float32
>>output array[i], shape(1, 80, 80, 80), datatype=float32
>>output array[i], shape(1, 64, 40, 40), datatype=float32
>>output array[i], shape(1, 80, 40, 40), datatype=float32
>>output array[i], shape(1, 64, 20, 20), datatype=float32
>>output array[i], shape(1, 80, 20, 20), datatype=float32
detectResult: 0
obj num is : 0
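2.1.1 Bugfix
The problem is in mean_values and std_values. For float32 input that has already been scaled to [0, 1], these two values should be set to:
rknn.config(mean_values=[[0,0,0]],
            std_values=[[1,1,1]],
            quant_img_RGB2BGR=False,
            target_platform='rk3588')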
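Why std_values=[[255,255,255]] breaks this particular pipeline can be shown with a few lines of NumPy. The sketch assumes the toolkit normalizes the input as (x - mean) / std per channel, and that the image has already been divided by 255 as in precess_image in Appendix A. rknn_style_normalize is a hypothetical stand-in, not a toolkit function.

```python
import numpy as np

def rknn_style_normalize(x, mean, std):
    """Per-channel (x - mean) / std on an NCHW tensor (assumed toolkit behaviour)."""
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(1, -1, 1, 1)
    return (x - mean) / std

raw = np.full((1, 3, 4, 4), 255, dtype=np.float32)  # hypothetical all-white image
pre = raw / 255.0                                   # what the detection script already does

ok = rknn_style_normalize(pre, [0, 0, 0], [1, 1, 1])         # values stay in [0, 1]
bad = rknn_style_normalize(pre, [0, 0, 0], [255, 255, 255])  # divided by 255 a second time
print(float(ok.max()), float(bad.max()))
```

With std 255 the network sees an almost black image, which is consistent with the zero detections logged above. If the script instead passed raw 0-255 pixels, std 255 would be the matching setting.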
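2.2 Quantization
During quantization, pay attention to the hint the toolkit prints. According to 《03_Rockchip_RKNPU_API_Reference_RKNN_Toolkit2_V2.3.0_EN.pdf》 2:12:1, it should be handled like this: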
def rk3588_onnx2rknn(model, pic, classes):
    rknn = RKNN(verbose=True)
    rknn.config(mean_values=[[0,0,0]],
                std_values=[[1,1,1]],
                quant_img_RGB2BGR=False,
                #force_float32_nodes=['onnx::conv_1016', 'onnx::conv_1217'],  # force specific layers to float32
                target_platform='rk3588')
    # Other settings seen elsewhere: [0,0,0] / [255,255,255], or ImageNet's
    # mean_values = [0.485, 0.456, 0.406] and std_values = [0.229, 0.224, 0.225]
    rknn.load_onnx(model=model)
    if QUANTIZE_ON:
        # Step 1 writes the .model, .data and .quantization.cfg files
        rknn.hybrid_quantization_step1(dataset='./dataset.txt')
        # Step 2 builds the model from those files (after the .cfg has been edited)
        ret = rknn.hybrid_quantization_step2(
            model_input='./moonpie_80_0_11s_Feb19.model',
            data_input='./moonpie_80_0_11s_Feb19.data',
            model_quantization_cfg='./moonpie_80_0_11s_Feb19.quantization.cfg')
    else:
        rknn.build(do_quantization=False, dataset='dataset.txt', rknn_batch_size=1)
    rknn.export_rknn("./onnx2rknn.rknn")
    #rknn.load_rknn("./onnx2rknn.rknn")
    rknn.init_runtime(target="rk3588", core_mask=RKNN.NPU_CORE_AUTO)
    outputs = rknn.inference(inputs=[pic], data_format=['nchw'])
    return outputs
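In theory the code above has to be run twice. Between the two runs, the user edits the .cfg file and manually changes the type of the layers that go out of range to float32. For example, the change to the input/output parameters: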
import torch
try:
    dummy_input = torch.randn(1,3,640,640)
    input_names=['data']
    output_names=['reg1', 'cls1','reg2', 'cls2','reg3', 'cls3']
    torch.onnx.export(self.model, dummy_input, '/app/rk3588_build/rknn_models/moonpie_80_0_11s_Feb19.onnx',
                      verbose=False, input_names=input_names, output_names=output_names, opset_version=11)
except RuntimeError:
    print('error occur when onnx export.')
    #do nothing
    #pass
finally:
    print("==================onnx self gened==========")
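For example, following the hint printed by onnx2rknn, you should at least quantize the input and output tensors. The hint points to the corresponding entries (the screenshots of the hint and of those entries are not preserved here).
We tried changing them to int16 to see the effect.
Hmm... it turns out the data types of the input and output tensors are not changed here.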
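The two-pass flow described in section 2.2 (run step 1, edit the .cfg by hand, then run step 2) can be sketched as a small driver that decides which step to run based on whether the .cfg file already exists. This is only a sketch under assumptions: hybrid_quantize and prefix are hypothetical names, rknn is an RKNN object on which config and load_onnx have already been called before the first pass, and step 1 is assumed to write <prefix>.model, <prefix>.data and <prefix>.quantization.cfg, matching the file names used above.

```python
import os

def hybrid_quantize(rknn, prefix, dataset, out_path):
    """Run one pass of hybrid quantization; call once per pass.

    Pass 1 (no .cfg yet): generate the intermediate files, then stop so the
    .cfg can be edited by hand.
    Pass 2 (.cfg exists): build from the edited .cfg and export the .rknn.
    Returns (step_name, return_code).
    """
    cfg = f"{prefix}.quantization.cfg"
    if not os.path.exists(cfg):
        ret = rknn.hybrid_quantization_step1(dataset=dataset)
        return "step1", ret
    ret = rknn.hybrid_quantization_step2(
        model_input=f"{prefix}.model",
        data_input=f"{prefix}.data",
        model_quantization_cfg=cfg)
    if ret == 0:
        ret = rknn.export_rknn(out_path)
    return "step2", ret

# Usage sketch (assumes rknn-toolkit2 is installed; the paths are the ones used above):
#   step, ret = hybrid_quantize(rknn, './moonpie_80_0_11s_Feb19',
#                               './dataset.txt', './onnx2rknn.rknn')
```

Splitting the passes this way avoids calling step 2 immediately after step 1, which would build from the untouched .cfg and defeat the purpose of editing it.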
Appendix A detect_by_onnx.py
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import os
import sys
from math import exp
import cv2
import numpy as np
import onnxruntime as ort
ROOT = os.getcwd()
if str(ROOT) not in sys.path:
sys.path.append(str(ROOT))
ONNX_MODEL = r'/home/firefly/app/git/yolo11_rknn_test/models/moonpie_80_0_11s_Feb19.onnx'
IMG_PATH = '/home/firefly/app/git/yolo11_rknn_test/images/cake26.jpg'
QUANTIZE_ON = False
CLASSES = ['moonpie', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush']
meshgrid = []
class_num = len(CLASSES)
headNum = 3
strides = [8, 16, 32]
mapSize = [[80, 80], [40, 40], [20, 20]]
nmsThresh = 0.45
objectThresh = 0.5
input_imgH = 640
input_imgW = 640
def ubuntu_detect(model, pic, classes):
    # Explicitly specify the execution providers
    providers = ['AzureExecutionProvider', 'CPUExecutionProvider']
    ort_session = ort.InferenceSession(model, providers=providers)
    pred_results = (ort_session.run(None, {'data': pic}))
    return pred_results


class DetectBox:
    def __init__(self, classId, score, xmin, ymin, xmax, ymax):
        self.classId = classId
        self.score = score
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
class YOLOV11DetectObj:
    def __init__(self):
        pass

    def GenerateMeshgrid(self):
        # Cell centres (x, y) for every head, stored flat in the global meshgrid list
        for index in range(headNum):
            for i in range(mapSize[index][0]):
                for j in range(mapSize[index][1]):
                    meshgrid.append(j + 0.5)
                    meshgrid.append(i + 0.5)

    def IOU(self, xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2):
        xmin = max(xmin1, xmin2)
        ymin = max(ymin1, ymin2)
        xmax = min(xmax1, xmax2)
        ymax = min(ymax1, ymax2)
        innerWidth = xmax - xmin
        innerHeight = ymax - ymin
        innerWidth = innerWidth if innerWidth > 0 else 0
        innerHeight = innerHeight if innerHeight > 0 else 0
        innerArea = innerWidth * innerHeight
        area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
        area2 = (xmax2 - xmin2) * (ymax2 - ymin2)
        total = area1 + area2 - innerArea
        return innerArea / total

    def NMS(self, detectResult):
        predBoxs = []
        sort_detectboxs = sorted(detectResult, key=lambda x: x.score, reverse=True)
        for i in range(len(sort_detectboxs)):
            xmin1 = sort_detectboxs[i].xmin
            ymin1 = sort_detectboxs[i].ymin
            xmax1 = sort_detectboxs[i].xmax
            ymax1 = sort_detectboxs[i].ymax
            classId = sort_detectboxs[i].classId
            if sort_detectboxs[i].classId != -1:
                predBoxs.append(sort_detectboxs[i])
                for j in range(i + 1, len(sort_detectboxs), 1):
                    if classId == sort_detectboxs[j].classId:
                        xmin2 = sort_detectboxs[j].xmin
                        ymin2 = sort_detectboxs[j].ymin
                        xmax2 = sort_detectboxs[j].xmax
                        ymax2 = sort_detectboxs[j].ymax
                        iou = self.IOU(xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2)
                        if iou > nmsThresh:
                            # Mark the lower-scoring overlapping box as suppressed
                            sort_detectboxs[j].classId = -1
        return predBoxs

    def sigmoid(self, x):
        return 1 / (1 + exp(-x))
    def postprocess(self, out, img_h, img_w):
        print('postprocess ... ')
        detectResult = []
        output = []
        for i in range(len(out)):
            print(out[i].shape)
            output.append(out[i].reshape((-1)))
        scale_h = img_h / input_imgH
        scale_w = img_w / input_imgW
        gridIndex = -2
        cls_index = 0
        cls_max = 0
        for index in range(headNum):
            reg = output[index * 2 + 0]
            cls = output[index * 2 + 1]
            for h in range(mapSize[index][0]):
                for w in range(mapSize[index][1]):
                    gridIndex += 2
                    if 1 == class_num:
                        # sigmoid is a method of this class, so it must be called via self
                        cls_max = self.sigmoid(cls[0 * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                        cls_index = 0
                    else:
                        # Find the best-scoring class for this cell
                        for cl in range(class_num):
                            cls_val = cls[cl * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w]
                            if 0 == cl:
                                cls_max = cls_val
                                cls_index = cl
                            else:
                                if cls_val > cls_max:
                                    cls_max = cls_val
                                    cls_index = cl
                        cls_max = self.sigmoid(cls_max)
                    if cls_max > objectThresh:
                        # Decode the 4 box sides: softmax over 16 bins, then expected value
                        regdfl = []
                        for lc in range(4):
                            sfsum = 0
                            locval = 0
                            for df in range(16):
                                temp = exp(reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                                reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] = temp
                                sfsum += temp
                            for df in range(16):
                                sfval = reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] / sfsum
                                locval += sfval * df
                            regdfl.append(locval)
                        x1 = (meshgrid[gridIndex + 0] - regdfl[0]) * strides[index]
                        y1 = (meshgrid[gridIndex + 1] - regdfl[1]) * strides[index]
                        x2 = (meshgrid[gridIndex + 0] + regdfl[2]) * strides[index]
                        y2 = (meshgrid[gridIndex + 1] + regdfl[3]) * strides[index]
                        xmin = x1 * scale_w
                        ymin = y1 * scale_h
                        xmax = x2 * scale_w
                        ymax = y2 * scale_h
                        xmin = xmin if xmin > 0 else 0
                        ymin = ymin if ymin > 0 else 0
                        xmax = xmax if xmax < img_w else img_w
                        ymax = ymax if ymax < img_h else img_h
                        box = DetectBox(cls_index, cls_max, xmin, ymin, xmax, ymax)
                        detectResult.append(box)
        # NMS
        print('detectResult:', len(detectResult))
        predBox = self.NMS(detectResult)
        return predBox
    def precess_image(self, img_src, resize_w, resize_h):
        print(f'{type(img_src)}')
        image = cv2.resize(img_src, (resize_w, resize_h), interpolation=cv2.INTER_LINEAR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = image.astype(np.float32)
        image /= 255.0  # scale pixels to [0, 1]
        return image

    def detect(self, img_path):
        self.GenerateMeshgrid()
        orig = cv2.imread(img_path)
        if orig is None:
            print(f"Failed to read image: {img_path}")
            return
        img_h, img_w = orig.shape[:2]
        image = self.precess_image(orig, input_imgW, input_imgH)
        image = image.transpose((2, 0, 1))  # HWC -> CHW
        image = np.expand_dims(image, axis=0)  # CHW -> NCHW
        #image = np.ones((1, 3, 640, 640), dtype=np.uint8)
        # print(image.shape)
        #ort_session = ort.InferenceSession(ONNX_MODEL)
        #pred_results = (ort_session.run(None, {'data': image}))
        pred_results = ubuntu_detect(ONNX_MODEL, image, CLASSES)
        print(f'onnx model{ONNX_MODEL}')
        print(f'<<inputs array, shape{image.shape}, datatype={image.dtype}')
        for i in range(len(pred_results)):
            print(f'>>output array[i], shape{pred_results[i].shape}, datatype={pred_results[i].dtype}')
        out = []
        for i in range(len(pred_results)):
            out.append(pred_results[i])
        predbox = self.postprocess(out, img_h, img_w)
        np.save('onnx_out1.npy', out[1])
        np.save('onnx_out3.npy', out[3])
        np.save('onnx_out5.npy', out[5])
        print('obj num is :', len(predbox))
        for i in range(len(predbox)):
            xmin = int(predbox[i].xmin)
            ymin = int(predbox[i].ymin)
            xmax = int(predbox[i].xmax)
            ymax = int(predbox[i].ymax)
            classId = predbox[i].classId
            score = predbox[i].score
            cv2.rectangle(orig, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
            ptext = (xmin, ymin)
            title = CLASSES[classId] + "%.2f" % score
            cv2.putText(orig, title, ptext, cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2, cv2.LINE_AA)
        cv2.imwrite('./test_onnx_result.jpg', orig)


if __name__ == '__main__':
    print('This is main ....')
    img_path = IMG_PATH
    obj = YOLOV11DetectObj()
    obj.detect(img_path)
Appendix B onnx2rknn.py
import cv2
import numpy as np
from rknn.api import RKNN
import os
onnx_model = './rknn_models/moonpie_80_0_11s_Feb19.onnx'  # path to the ONNX model
save_rknn_dir = './rknn_models/moonpie_80_0_11s_Feb19.rknn'  # path where the RKNN model is saved
IMG_PATH = '/app/rk3588_build/cake26.jpg'
DATASET = './dataset_cake.txt'
if __name__ == '__main__':
    platform = 'rk3588'
    exp = 'yolov11'
    Width = 640
    Height = 640
    # Model from https://github.com/airockchip/rknn_model_zoo
    MODEL_PATH = onnx_model
    NEED_BUILD_MODEL = True
    # NEED_BUILD_MODEL = False
    im_file = IMG_PATH

    # Create RKNN object
    rknn = RKNN()
    OUT_DIR = "rknn_models"
    RKNN_MODEL_PATH = './{}/{}_{}.rknn'.format(
        OUT_DIR, exp+'-'+str(Width)+'-'+str(Height), platform)
    if NEED_BUILD_MODEL:
        rknn.config(mean_values=[[0, 0, 0]], std_values=[
                    [255, 255, 255]], target_platform=platform)
        # Load model
        print('--> Loading model')
        ret = rknn.load_onnx(MODEL_PATH)
        if ret != 0:
            print('load model failed!')
            exit(ret)
        print('done')

        # Build model
        print('--> Building model')
        ret = rknn.build(do_quantization=True, dataset=DATASET)
        if ret != 0:
            print('build model failed.')
            exit(ret)
        print('done')

        # Export rknn model
        if not os.path.exists(OUT_DIR):
            os.mkdir(OUT_DIR)
        print('--> Export RKNN model: {}'.format(RKNN_MODEL_PATH))
        ret = rknn.export_rknn(RKNN_MODEL_PATH)
        if ret != 0:
            print('Export rknn model failed.')
            exit(ret)
        print('done')
    else:
        ret = rknn.load_rknn(RKNN_MODEL_PATH)
    rknn.release()