
Sophon Edge Box: Data Verification and Quantization

article 2025/1/24 9:47:42

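Sophon Edge Box

1 Problem Diagnosis

1.1 Data Verification

When running inference on a Sophon box, the following problem came up: for models such as recognition models, the bmodel and ONNX inference results differ significantly. To determine whether the bmodel inference itself is at fault, the data needs to be verified. 🎯 This requires the tpu-mlir toolkit.

1. Save the image data read by the program to a txt file: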

bm_device_mem_t input_dev_mem;
// Get the contiguous device memory info from multiple memory-contiguous images
bm_image_get_contiguous_device_mem(image_n, m_converto_imgs.data(), &input_dev_mem);
std::vector<cv::Mat> convert_img_vec;
for(auto convertto_img : m_converto_imgs)
{
    float *buffer = new float[convertto_img.width * convertto_img.height *3];
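    // Copy the bm_image convertto_img from the NPU to the CPU, then iterate over it and save it
    // Why not read convertto_img directly: its values live on the NPU, where they are stored in planar form, i.e. bbbbbbbbbgggggggggggrrrrrrrrrrrrrr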
    bm_image_copy_device_to_host(convertto_img, (void**)&buffer);
    int64_t start = (int64_t)(std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch()).count());
    std::ofstream outfile(std::to_string(start)+ "_ng.txt");

    outfile << convertto_img.height << " " << convertto_img.width << " " << 3 << std::endl;
    // The data is written with shape (height, width*3); because it was copied from the NPU, its layout is planar (all b, then all g, then all r)
    for (int i = 0; i < convertto_img.height; ++i)
    {
        for (int j = 0; j < convertto_img.width; ++j)
        {
            for (int c = 0; c < 3; ++c)
            {
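                // Each text row holds width * channel values (e.g. 121 * channel), so the row stride must account for the channel count
                // Three values are written per step, but the underlying storage is all b, then all g, then all r; it is not interleaved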
                outfile << (float)(*(buffer + i * convertto_img.width * 3 + j * 3 + c))<< " ";
            }
        }
        outfile << std::endl;
    }
    outfile.close();
    delete[] buffer;

    // bm_image_get_data(convertto_img, buffer);
    // cv::Mat resized_img(convertto_img.height, convertto_img.width, CV_32FC3, buffer);
    // convert_img_vec.push_back(resized_img);
}

2. Convert the txt file to npz:

import numpy as np

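# Read the data from the txt file; skip the first line, which is the "height width 3" header
data = np.loadtxt('./txt_file/1713921805509691_ng.txt', skiprows=1)
# Save the data in npz format
np.savez('./txt_file/1713921805509691_ng.npz', data=data)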
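
The dump written by the C++ snippet starts with a `height width channels` header line, followed by `height` rows of `width * 3` values. Below is a minimal sketch of a loader that uses the header to sanity-check the dump before saving it as npz. The function name `txt_dump_to_npz` and the default key `data` are illustrative choices, not part of tpu-mlir.

```python
import numpy as np


def txt_dump_to_npz(txt_path, npz_path, key="data"):
    """Convert a text dump (header line + value rows) to an npz file."""
    with open(txt_path) as f:
        # First line: "height width channels"
        height, width, channels = (int(v) for v in f.readline().split())
        # Remaining lines: the float values, in dump order
        values = np.loadtxt(f, dtype=np.float32, ndmin=2)

    expected = height * width * channels
    if values.size != expected:
        raise ValueError(f"expected {expected} values, found {values.size}")

    # Store the array under the chosen key (hypothetical default: "data")
    np.savez(npz_path, **{key: values})
    return values
```

A size mismatch here usually means the dump was truncated or the header does not describe the buffer, which is worth ruling out before comparing model outputs.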

3. Run the ONNX model and the bmodel on the same image data and check whether the bmodel is at fault:

# Example
python model_runner.py --input input.npz --model xxx.bmodel --output bmodelout.npz
python model_runner.py --input input.npz --model xxx.onnx --output onnxout.npz
python npz_tool.py compare onnxout.npz bmodelout.npz

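4. Because the image data was saved as a 2-D array while the model expects a 4-D input, inference may fail with:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: images Got: 2 Expected: 4 Please fix either the inputs or the model.

In that case, reshape the input data to 4 dimensions inside model_runner.py. (Because the image data is stored as bbbbbbbbgggggggggrrrrrrrr, it can be reshaped directly into a 3-channel h*w image.)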

data[name] = data[name].flatten()
data[name] = data[name].reshape(1, 3, 32, 128)  # reshape according to the actual input shape of the onnx model
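
Why a plain flatten-and-reshape is enough can be checked with a small NumPy sketch. It assumes, as the comments in the C++ snippet state, that the device buffer is planar (all B, then all G, then all R). The tiny sizes and the variable names are made up for illustration.

```python
import numpy as np

H, W, C = 2, 4, 3  # tiny stand-in for the real 32 x 128 input

# Planar buffer as copied from the device: channel 0 plane, then 1, then 2.
planar = np.arange(C * H * W, dtype=np.float32)

# The dump writes the same buffer as H rows of W * 3 values each.
dumped = planar.reshape(H, W * C)

# Flattening restores the original order, so reshaping to NCHW is valid.
restored = dumped.flatten().reshape(1, C, H, W)

# Each channel plane is one contiguous block of H * W values.
for c in range(C):
    plane = planar[c * H * W:(c + 1) * H * W]
    assert np.array_equal(restored[0, c].ravel(), plane)
```

If the buffer were interleaved (BGRBGR...) instead, the same reshape would silently mix channels; a reshape to (1, H, W, C) followed by a transpose to NCHW would be needed in that case.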

5. Run inference with each model:

# Actual commands (model_runner.py is located under ./tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools)
python model_runner.py --input 1713923372730797_ng.npz --model asset_tag_rec_int_20240415.onnx --output onnx_tag_int_20240415.npz
python model_runner.py --input 1713923372730797_ng.npz --model asset_tag_rec_int_20240415_BM1684.bmodel --output bmodel_tag_int_20240415.npz

# Then open the two generated npz inference results and compare them directly.
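
For a quick manual comparison outside `npz_tool.py`, a sketch like the following reports the cosine similarity and the largest absolute difference for every array name the two files share. The function name `compare_npz` is hypothetical, and output names may differ between the ONNX and bmodel results, in which case nothing is matched.

```python
import numpy as np


def compare_npz(path_a, path_b):
    """Return {name: (cosine_similarity, max_abs_diff)} for shared arrays."""
    a, b = np.load(path_a), np.load(path_b)
    report = {}
    for name in sorted(set(a.files) & set(b.files)):
        x = a[name].astype(np.float64).ravel()
        y = b[name].astype(np.float64).ravel()
        if x.size != y.size or x.size == 0:
            continue  # sizes disagree or array is empty; skip rather than guess
        denom = np.linalg.norm(x) * np.linalg.norm(y)
        cos = float(np.dot(x, y) / denom) if denom > 0 else float("nan")
        report[name] = (cos, float(np.max(np.abs(x - y))))
    return report


# Usage with the files produced above:
# for name, (cos, diff) in compare_npz("onnx_tag_int_20240415.npz",
#                                      "bmodel_tag_int_20240415.npz").items():
#     print(f"{name}: cosine={cos:.6f} max_abs_diff={diff:.6f}")
```

A cosine similarity close to 1 with a small maximum difference suggests the bmodel itself is fine and the discrepancy comes from preprocessing or postprocessing in the application.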

2 Model Quantization

Reference: Tpu-Mlir reference documentation

2.1 Tpu-Mlir Quantization Workflow

1. Generate the mlir file:

python3 ./tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools/model_transform.py \
    --model_name plate_rec_20230406 \
    --model_def ./plate_rec_20230406.onnx \
    --input_shapes [[1,3,96,288]] \
    --mean 123.675,116.28,103.53 \
    --scale 0.017125,0.017507,0.017429 \
    --keep_aspect_ratio \
    --pixel_format rgb \
    --mlir ./plate_rec_20230406.mlir
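
The `--mean` and `--scale` values above are applied per channel as `(pixel - mean) * scale`. They coincide with the widely used ImageNet normalization (mean 0.485/0.456/0.406 and std 0.229/0.224/0.225 on inputs scaled to [0, 1]); whether this particular model was trained with those statistics is an assumption. The sketch below shows how such arguments can be derived and checks that both forms give the same result.

```python
import numpy as np

# Assumed training-time normalization: x / 255, then (x - mean) / std.
mean_01 = np.array([0.485, 0.456, 0.406])
std_01 = np.array([0.229, 0.224, 0.225])

# Equivalent parameters on raw 0..255 pixels: (x - mean) * scale.
mean_255 = mean_01 * 255.0          # about 123.675, 116.28, 103.53
scale_255 = 1.0 / (std_01 * 255.0)  # about 0.017125, 0.017507, 0.017429

# A few sample pixel values, repeated across the three channels.
pixels = np.array([[0.0, 128.0, 255.0]] * 3).T
train_style = (pixels / 255.0 - mean_01) / std_01
deploy_style = (pixels - mean_255) * scale_255
assert np.allclose(train_style, deploy_style)

print("--mean", ",".join(f"{m:g}" for m in mean_255))
print("--scale", ",".join(f"{s:.6f}" for s in scale_255))
```

If the training pipeline used different statistics, the same two lines of arithmetic give the matching `--mean` and `--scale` values.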

2. Build the calibration table (cali_table) from a subset of the training set (100-200 images are enough):

python ./tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools/run_calibration.py plate_rec_20230406.mlir \
    --dataset ./workspace/sophon_model_quality/calibration_npz/plate_rec_npz/ \
    --input_num 200 \
    --tune_num 5 \
    -o ./plate_rec_20230406_cali_table
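
Picking the calibration subset can be scripted. The sketch below randomly copies up to 200 files from a training directory into a calibration directory. The function name, the directory arguments and the file suffixes are placeholders, and it assumes the files are already in a format that `run_calibration.py` accepts for this model (the command above points at a directory of npz files).

```python
import random
import shutil
from pathlib import Path


def sample_calibration_set(src_dir, dst_dir, count=200,
                           suffixes=(".npz", ".jpg", ".png"), seed=0):
    """Copy a random subset of files from src_dir into dst_dir."""
    candidates = sorted(p for p in Path(src_dir).iterdir()
                        if p.is_file() and p.suffix.lower() in suffixes)
    # A fixed seed keeps the selection reproducible between runs.
    random.Random(seed).shuffle(candidates)
    chosen = candidates[:count]

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for p in chosen:
        shutil.copy2(p, dst / p.name)
    return [p.name for p in chosen]
```

The value passed as `count` would normally match `--input_num` in the calibration command.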

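3. Select the sensitive layers (the layers whose F32->INT8 conversion hurts accuracy the most; these are left unquantized to avoid excessive accuracy loss) and generate the qtable:

Automatic selection (run run_sensitive_layer.py to generate the qtable):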

python ./workspace/tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools/run_sensitive_layer.py face_fea_20230406.mlir --dataset train_img/ --max_float_layers 10 --calibration_table face_fea_cali --chip bm1684x --fp_type F32 -o face_fea_qtable

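Manual selection:
- First search for Concat in the ONNX model and locate the last Concat node.
- Find the Conv layers that feed this node, then search the mlir file for each Conv layer's full "name" (not its "outputs"). Record the layers that match and pass them as arguments to the fp_forward.py script to generate the qtable.
- If a layer is not matched in the mlir file, inspect the mlir file generated in step 1 directly and look for Conv layers from the end backwards. ⚠️ Layers after the specified layers are not quantized and stay at FP32 precision. So for higher accuracy, specify Conv layers slightly earlier in the network; for higher speed, specify the last Conv layers.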

# Separate the specified layers with ","
python ./workspace/tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools/fp_forward.py ./fire_extinguisher_dete_20240222.mlir --fpfwd_outputs onnx::Concat_350_Conv,onnx::Concat_357_Conv,onnx::Concat_335_Conv --fp_type F32 -o ./fire_ext_v8_qtable --chip bm1684x
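
Before passing layer names to `fp_forward.py`, it can help to confirm that each candidate actually occurs in the mlir file. The sketch below does a plain text search for the quoted name and joins the hits with commas. It assumes layer names appear in the mlir text as double-quoted strings; `build_fpfwd_outputs` is a hypothetical helper, not a tpu-mlir tool, and the sample text is made up.

```python
def build_fpfwd_outputs(mlir_text, candidate_names):
    """Split candidates into (comma-joined names found, names missing)."""
    found, missing = [], []
    for name in candidate_names:
        # Assumption: a layer name shows up as a quoted string in the file.
        if f'"{name}"' in mlir_text:
            found.append(name)
        else:
            missing.append(name)
    return ",".join(found), missing


# Made-up mlir fragment, only to exercise the helper.
sample = ('#loc1 = loc("onnx::Concat_350_Conv")\n'
          '#loc2 = loc("onnx::Concat_357_Conv")\n')
arg, missing = build_fpfwd_outputs(
    sample, ["onnx::Concat_350_Conv", "onnx::Concat_335_Conv"])
print("--fpfwd_outputs", arg)
print("not found in mlir:", missing)
```

Names reported as missing are the ones to look up by hand in the mlir file, as described in the last bullet above.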

4. Generate the model using the processed quantization table:

python ./workspace/tpu-mlir_v1.7.beta.103-g4d1b1430b-20240507/python/tools/model_deploy.py \
    --mlir ./wl_light_flash_dete_v24.10.22.mlir \
    --quantize INT8 \
    --calibration_table ./wl_light_flash_dete_v24.10.22_cali_table \
    --chip bm1684x \
    --quantize_table wl_light_flash_dete_v24.10.22_qtable \
    --tolerance 0.99,0.99 \
    --model ./wl_light_flash_dete_v24.10.22_BM1684X.bmodel