
Converting YOLO-format annotation files (.txt) to VOC-format XML annotation files

article 2025/3/19 20:21:29

1. Function definition and docstring

def makexml(picPath, txtPath, xmlPath):
    """Convert YOLO-format txt annotation files into VOC-format xml annotation files.
    Create three subfolders under your annotated-image folder, named picture, txt, and xml.
    """

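The function takes three parameters:

picPath: path to the folder containing the images.

txtPath: path to the folder containing the YOLO-format .txt files.

xmlPath: path to the folder where the generated .xml files are saved.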

2. Class dictionary

dic = {'0': "light debris",  # dictionary that maps class IDs to class names
       '1': "nest",  # must match the classes in your own classes.txt, in the same order
       }

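This defines a dictionary dic that maps the YOLO class IDs (0, 1) to class names ("light debris", "nest"). The keys are strings because the IDs are read from the label file as text.

Note: the class order here must match the order in YOLO's classes.txt file.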
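Rather than typing the dictionary by hand, the same mapping could be derived from classes.txt itself, which keeps the order consistent automatically. The following is a minimal sketch, assuming classes.txt holds one class name per line; load_class_map is a hypothetical helper name, not part of the original script.

```python
import os
import tempfile


def load_class_map(classes_path):
    """Build {"0": first_name, "1": second_name, ...} from a classes.txt file."""
    with open(classes_path, encoding="utf-8") as f:
        # Ignore blank lines so a trailing newline does not create an empty class.
        names = [line.strip() for line in f if line.strip()]
    # Keys are strings so they can be looked up with the ID text from a label line.
    return {str(index): name for index, name in enumerate(names)}


if __name__ == "__main__":
    # Demo with a temporary classes.txt (assumed layout: one name per line).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "classes.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("light debris\nnest\n")
        print(load_class_map(path))
```

The result can be used in place of dic, since it has the same string keys.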

3. Iterating over the txt files

files = os.listdir(txtPath)
for i, name in enumerate(files):

os.listdir(txtPath) returns the names of all entries in the txtPath folder.

enumerate iterates over that list; i is the index and name is the file name (for example image1.txt).
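os.listdir returns every entry in the folder in arbitrary order, so a stray classes.txt or a subfolder would also be treated as a label file. A possible way to narrow the list is sketched below; list_label_files is a hypothetical helper, and skipping a file named classes.txt is an assumption about how the dataset is laid out.

```python
import os
import tempfile


def list_label_files(txt_dir):
    """Return sorted .txt label file names, skipping classes.txt and non-files."""
    names = []
    for entry in os.listdir(txt_dir):
        is_file = os.path.isfile(os.path.join(txt_dir, entry))
        if is_file and entry.lower().endswith(".txt") and entry != "classes.txt":
            names.append(entry)
    # Sorting makes the processing order reproducible across runs and platforms.
    return sorted(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for fname in ("image2.txt", "image1.txt", "classes.txt", "notes.md"):
            open(os.path.join(tmp, fname), "w").close()
        print(list_label_files(tmp))
```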

4. Creating the XML structure

xmlBuilder = Document()
annotation = xmlBuilder.createElement("annotation")  # create the annotation tag
xmlBuilder.appendChild(annotation)

xml.dom.minidom.Document creates an XML document object.

The root tag <annotation> is created and appended to the document.
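To see what this step produces on its own, the document and its root can be built and printed in isolation. This is only an illustrative sketch; new_annotation_document is a hypothetical wrapper around the same two minidom calls.

```python
from xml.dom.minidom import Document


def new_annotation_document():
    """Create an empty XML document whose root element is <annotation>."""
    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)
    return doc, annotation


doc, annotation = new_annotation_document()
# At this point the document contains only the XML declaration and an empty root.
print(doc.toxml())
```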

5. Reading the image size

txtFile = open(txtPath + '\\' + name)
txtList = txtFile.readlines()
for root, dirs, filename in os.walk(picPath):
    img = cv2.imread(root + '\\' + filename[i])
    Pheight, Pwidth, Pdepth = img.shape

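The .txt file currently being processed is opened and all of its lines are read into txtList.

os.walk traverses the image files under picPath.

OpenCV's cv2.imread reads the image, and img.shape returns its height (Pheight), width (Pwidth), and channel count (Pdepth, usually 3; note that OpenCV stores color images in BGR order, not RGB).

Note: this assumes that filename[i] is the image that belongs to name (the .txt file). The pairing is purely by position, so it holds only when the image folder and the label folder contain exactly corresponding files that are listed in the same order. Any extra file (such as classes.txt), missing image, or subfolder under picPath breaks the pairing.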
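Because the positional pairing above is fragile, one alternative is to look the image up by the label file's base name. The sketch below is not the original author's approach; find_image_for_label and the extension list are assumptions made for illustration, and the returned path would then be passed to cv2.imread.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def find_image_for_label(pic_dir, label_name, exts=IMAGE_EXTS):
    """Return the path of the image sharing the label's base name, or None."""
    stem = os.path.splitext(label_name)[0]
    for ext in exts:
        candidate = os.path.join(pic_dir, stem + ext)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "image1.png"), "w").close()
        print(find_image_for_label(tmp, "image1.txt"))
        print(find_image_for_label(tmp, "missing.txt"))
```

Matching by name also makes it possible to skip a label whose image is missing instead of silently reading the size of an unrelated picture.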

6. Adding basic information to the XML

folder = xmlBuilder.createElement("folder")
foldercontent = xmlBuilder.createTextNode("driving_annotation_dataset")
folder.appendChild(foldercontent)
annotation.appendChild(folder)

filename = xmlBuilder.createElement("filename")
filenamecontent = xmlBuilder.createTextNode(name[0:-4] + ".jpg")
filename.appendChild(filenamecontent)
annotation.appendChild(filename)

size = xmlBuilder.createElement("size")
width = xmlBuilder.createElement("width")
widthcontent = xmlBuilder.createTextNode(str(Pwidth))
width.appendChild(widthcontent)
size.appendChild(width)
# height and depth are added in the same way
annotation.appendChild(size)

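The <folder> tag is created with the fixed content "driving_annotation_dataset".

The <filename> tag is created from the label file name with the .txt suffix removed and .jpg appended (this assumes the images are .jpg files).

The <size> tag is created with <width>, <height>, and <depth> child tags that record the image width, height, and channel count.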
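Each of these tags follows the same create, fill, append pattern, so the repetition could be folded into a small helper. This is a sketch only; add_text_child and build_size are hypothetical names that do not appear in the original script.

```python
from xml.dom.minidom import Document


def add_text_child(doc, parent, tag, text):
    """Append <tag>text</tag> to parent and return the new element."""
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(str(text)))
    parent.appendChild(child)
    return child


def build_size(doc, width, height, depth):
    """Build a <size> element with width, height, and depth children."""
    size = doc.createElement("size")
    add_text_child(doc, size, "width", width)
    add_text_child(doc, size, "height", height)
    add_text_child(doc, size, "depth", depth)
    return size


doc = Document()
annotation = doc.createElement("annotation")
doc.appendChild(annotation)
add_text_child(doc, annotation, "folder", "driving_annotation_dataset")
add_text_child(doc, annotation, "filename", "image1.jpg")
annotation.appendChild(build_size(doc, 640, 480, 3))
print(doc.toprettyxml(indent="\t"))
```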

7. Parsing the YOLO annotations and converting them to VOC format

for j in txtList:
    oneline = j.strip().split(" ")
    object = xmlBuilder.createElement("object")
    picname = xmlBuilder.createElement("name")
    namecontent = xmlBuilder.createTextNode(dic[oneline[0]])
    picname.appendChild(namecontent)
    object.appendChild(picname)

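The loop visits every line in txtList; each line is the annotation of one object.

A line in a YOLO-format .txt file normally reads: class ID, center x, center y, width, height, separated by spaces, with all four coordinates normalized.

An <object> tag is created to represent one object.

oneline[0] is the class ID; it is converted to the class name through dic and written into the <name> tag.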
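The parsing step can be isolated and made a little more tolerant. Note that split(" ") on an empty line yields [''], which would raise a KeyError in the dic lookup, so the sketch below skips lines that do not have exactly five fields. parse_yolo_line is a hypothetical helper, and the choice to skip rather than raise is an assumption.

```python
def parse_yolo_line(line):
    """Parse 'class_id cx cy w h' into (str, float, float, float, float).

    Returns None for blank or malformed lines so the caller can skip them.
    """
    parts = line.split()  # splits on any whitespace and drops the newline
    if len(parts) != 5:
        return None
    class_id = parts[0]
    cx, cy, w, h = (float(value) for value in parts[1:])
    return class_id, cx, cy, w, h


print(parse_yolo_line("1 0.5 0.5 0.25 0.5\n"))  # ('1', 0.5, 0.5, 0.25, 0.5)
print(parse_yolo_line("\n"))                    # None
```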

8. Adding the other object attributes

pose = xmlBuilder.createElement("pose")
posecontent = xmlBuilder.createTextNode("Unspecified")
pose.appendChild(posecontent)
object.appendChild(pose)

truncated = xmlBuilder.createElement("truncated")
truncatedContent = xmlBuilder.createTextNode("0")
truncated.appendChild(truncatedContent)
object.appendChild(truncated)

difficult = xmlBuilder.createElement("difficult")
difficultcontent = xmlBuilder.createTextNode("0")
difficult.appendChild(difficultcontent)
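object.appendChild(difficult)

<pose>: the object's pose, fixed to "Unspecified" here.

<truncated>: whether the object is truncated; 0 means not truncated.

<difficult>: whether the object is hard to recognize; 0 means not difficult.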

9. Computing the bounding-box coordinates

bndbox = xmlBuilder.createElement("bndbox")
xmin = xmlBuilder.createElement("xmin")
mathData = int(((float(oneline[1])) * Pwidth + 1) - (float(oneline[3])) * 0.5 * Pwidth)
xminContent = xmlBuilder.createTextNode(str(mathData))
xmin.appendChild(xminContent)
bndbox.appendChild(xmin)

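YOLO coordinates are normalized (0 to 1) and must be converted to the absolute pixel coordinates that VOC uses. The snippet shows xmin only; ymin, xmax, and ymax are built the same way.
YOLO format:
oneline[1]: center x (normalized).
oneline[2]: center y (normalized).
oneline[3]: width (normalized).
oneline[4]: height (normalized).
Formulas:
xmin = (center_x * image_width + 1) - (width * image_width / 2).
ymin = (center_y * image_height + 1) - (height * image_height / 2).
xmax = (center_x * image_width + 1) + (width * image_width / 2).
ymax = (center_y * image_height + 1) + (height * image_height / 2).
The +1 shifts every coordinate by one pixel, in line with the 1-based pixel coordinates used by VOC annotations. int() then truncates each result to a whole pixel.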
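The four formulas can be collected into one function and checked with a worked example. This sketch reproduces the article's arithmetic as-is, including the +1 offset and the int() truncation; yolo_to_voc_box is a hypothetical name.

```python
def yolo_to_voc_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box to VOC pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = int((cx * img_w + 1) - w * 0.5 * img_w)
    ymin = int((cy * img_h + 1) - h * 0.5 * img_h)
    xmax = int((cx * img_w + 1) + w * 0.5 * img_w)
    ymax = int((cy * img_h + 1) + h * 0.5 * img_h)
    return xmin, ymin, xmax, ymax


# A box centered in a 640x480 image, a quarter of its width and half its height:
# center = (320, 240) px, half-size = (80, 120) px, then the +1 offset is applied.
print(yolo_to_voc_box(0.5, 0.5, 0.25, 0.5, 640, 480))  # (241, 121, 401, 361)
```

Because of the offset, a box that touches the right or bottom edge yields xmax or ymax one pixel beyond the image size, so clamping to the image bounds may be worth adding if a downstream tool validates the coordinates.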

10. Saving the XML file

f = open(xmlPath + '\\' + name[0:-4] + ".xml", 'w', encoding='utf-8')
xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')
f.close()

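The generated XML is saved to xmlPath under the same name as the .txt file, with the .txt suffix replaced by .xml.

writexml produces formatted output with indentation and line breaks. Its encoding argument only writes encoding="utf-8" into the XML declaration, so the file itself should also be opened with UTF-8 encoding to keep the declaration and the actual bytes in agreement.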
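The save step can also be written with a context manager, which closes the file even if writing fails. The following is a minimal sketch under that assumption; save_xml is a hypothetical helper, and it uses an empty top-level indent so the root tag starts at column zero.

```python
import os
import tempfile
from xml.dom.minidom import Document


def save_xml(doc, xml_path):
    """Write a minidom document to xml_path as UTF-8 with tab indentation."""
    # The file encoding and the declared encoding are kept identical on purpose.
    with open(xml_path, "w", encoding="utf-8") as f:
        doc.writexml(f, indent="", addindent="\t", newl="\n", encoding="utf-8")


if __name__ == "__main__":
    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)
    folder = doc.createElement("folder")
    folder.appendChild(doc.createTextNode("demo"))
    annotation.appendChild(folder)
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "image1.xml")
        save_xml(doc, out_path)
        with open(out_path, encoding="utf-8") as f:
            print(f.read())
```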

11. Main program

if __name__ == "__main__":
    picPath = "D:\\w-dataset\\net\\yolonet\\images\\val\\"
    txtPath = "D:\\w-dataset\\net\\yolonet\\labels\\val\\"
    xmlPath = "D:\\w-dataset\\tielu_voc\\Annotations\\"
    makexml(picPath, txtPath, xmlPath)

Complete code

from xml.dom.minidom import Document
import os
import cv2


def makexml(picPath, txtPath, xmlPath):  # image folder, YOLO txt folder, folder where the xml files are saved
    """Convert YOLO-format txt annotation files into VOC-format xml annotation files.
    Create three subfolders under your annotated-image folder, named picture, txt, and xml.
    """
    dic = {'0': "light debris",  # dictionary that maps class IDs to class names
           '1': "nest",  # must match the classes in your own classes.txt, in the same order
           }
    files = os.listdir(txtPath)
    for i, name in enumerate(files):
        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # create the annotation tag
        xmlBuilder.appendChild(annotation)
        txtFile = open(txtPath + '\\' + name)
        txtList = txtFile.readlines()
        # Pairs the i-th image with the i-th txt file by position, so both folders
        # must hold exactly corresponding files listed in the same order.
        for root, dirs, filename in os.walk(picPath):
            img = cv2.imread(root + '\\' + filename[i])
            Pheight, Pwidth, Pdepth = img.shape

        folder = xmlBuilder.createElement("folder")  # folder tag
        foldercontent = xmlBuilder.createTextNode("driving_annotation_dataset")
        folder.appendChild(foldercontent)
        annotation.appendChild(folder)  # end of folder tag

        filename = xmlBuilder.createElement("filename")  # filename tag
        filenamecontent = xmlBuilder.createTextNode(name[0:-4] + ".jpg")
        filename.appendChild(filenamecontent)
        annotation.appendChild(filename)  # end of filename tag

        size = xmlBuilder.createElement("size")  # size tag
        width = xmlBuilder.createElement("width")  # width child of size
        widthcontent = xmlBuilder.createTextNode(str(Pwidth))
        width.appendChild(widthcontent)
        size.appendChild(width)  # end of width

        height = xmlBuilder.createElement("height")  # height child of size
        heightcontent = xmlBuilder.createTextNode(str(Pheight))
        height.appendChild(heightcontent)
        size.appendChild(height)  # end of height

        depth = xmlBuilder.createElement("depth")  # depth child of size
        depthcontent = xmlBuilder.createTextNode(str(Pdepth))
        depth.appendChild(depthcontent)
        size.appendChild(depth)  # end of depth

        annotation.appendChild(size)  # end of size tag

        for j in txtList:
            oneline = j.strip().split(" ")
            object = xmlBuilder.createElement("object")  # object tag
            picname = xmlBuilder.createElement("name")  # name tag
            namecontent = xmlBuilder.createTextNode(dic[oneline[0]])
            picname.appendChild(namecontent)
            object.appendChild(picname)  # end of name tag

            pose = xmlBuilder.createElement("pose")  # pose tag
            posecontent = xmlBuilder.createTextNode("Unspecified")
            pose.appendChild(posecontent)
            object.appendChild(pose)  # end of pose tag

            truncated = xmlBuilder.createElement("truncated")  # truncated tag
            truncatedContent = xmlBuilder.createTextNode("0")
            truncated.appendChild(truncatedContent)
            object.appendChild(truncated)  # end of truncated tag

            difficult = xmlBuilder.createElement("difficult")  # difficult tag
            difficultcontent = xmlBuilder.createTextNode("0")
            difficult.appendChild(difficultcontent)
            object.appendChild(difficult)  # end of difficult tag

            bndbox = xmlBuilder.createElement("bndbox")  # bndbox tag
            xmin = xmlBuilder.createElement("xmin")  # xmin tag
            mathData = int(((float(oneline[1])) * Pwidth + 1) - (float(oneline[3])) * 0.5 * Pwidth)
            xminContent = xmlBuilder.createTextNode(str(mathData))
            xmin.appendChild(xminContent)
            bndbox.appendChild(xmin)  # end of xmin tag

            ymin = xmlBuilder.createElement("ymin")  # ymin tag
            mathData = int(((float(oneline[2])) * Pheight + 1) - (float(oneline[4])) * 0.5 * Pheight)
            yminContent = xmlBuilder.createTextNode(str(mathData))
            ymin.appendChild(yminContent)
            bndbox.appendChild(ymin)  # end of ymin tag

            xmax = xmlBuilder.createElement("xmax")  # xmax tag
            mathData = int(((float(oneline[1])) * Pwidth + 1) + (float(oneline[3])) * 0.5 * Pwidth)
            xmaxContent = xmlBuilder.createTextNode(str(mathData))
            xmax.appendChild(xmaxContent)
            bndbox.appendChild(xmax)  # end of xmax tag

            ymax = xmlBuilder.createElement("ymax")  # ymax tag
            mathData = int(((float(oneline[2])) * Pheight + 1) + (float(oneline[4])) * 0.5 * Pheight)
            ymaxContent = xmlBuilder.createTextNode(str(mathData))
            ymax.appendChild(ymaxContent)
            bndbox.appendChild(ymax)  # end of ymax tag

            object.appendChild(bndbox)  # end of bndbox tag

            annotation.appendChild(object)  # end of object tag

        # Open the file as UTF-8 so its bytes match the encoding declared by writexml.
        f = open(xmlPath + '\\' + name[0:-4] + ".xml", 'w', encoding='utf-8')
        xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')
        f.close()


if __name__ == "__main__":
    picPath = "D:\\w-dataset\\net\\yolonet\\images\\val\\"  # image folder; keep the trailing separator
    txtPath = "D:\\w-dataset\\net\\yolonet\\labels\\val\\"  # YOLO txt folder; keep the trailing separator
    xmlPath = "D:\\w-dataset\\tielu_voc\\Annotations\\"  # folder where the xml files are saved; keep the trailing separator
    makexml(picPath, txtPath, xmlPath)
