Deep learning: counting the number of samples per class in a training set

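I have recently been preparing to train on a new dataset, and I think it is important to know how many samples each class has before starting, so that you know what you are working with. A single XML annotation file can contain objects of several classes, though, so counting by hand is impractical. A small script does the job.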
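To make the "several classes in one XML file" point concrete, here is a minimal sketch that tallies the objects in a single annotation. The sample annotation is made up for illustration; it only assumes the same tags the script reads (`object`, `name`, `difficult`). The names `SAMPLE_XML` and `count_objects` are hypothetical helpers, not part of the original script.

```python
import collections
import xml.etree.ElementTree as ET

# Hypothetical annotation: the tag layout mirrors what the script reads,
# but the contents are invented for this example.
SAMPLE_XML = """
<annotation>
    <filename>example.jpg</filename>
    <object><name>laji</name><difficult>0</difficult></object>
    <object><name>Illegal_parking</name><difficult>0</difficult></object>
    <object><name>laji</name><difficult>1</difficult></object>
    <object><name>laji</name><difficult>0</difficult></object>
</annotation>
"""


def count_objects(xml_text):
    """Tally the class names of all non-difficult objects in one annotation."""
    root = ET.fromstring(xml_text.strip())
    names = [
        obj.findtext("name")
        for obj in root.iter("object")
        # A missing or empty <difficult> tag is treated as 0 (an assumption).
        if int(obj.findtext("difficult") or 0) != 1
    ]
    return collections.Counter(names)


sample_counts = count_objects(SAMPLE_XML)
print(sample_counts)
```

One image contributes three counted objects across two classes here, which is why counting files instead of objects would give the wrong picture.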

import os
import xml.etree.ElementTree as ET
import collections  # provides Counter for tallying

# ----- Read the XML files and count the objects of each class in the training set -----
# Put the classes you want to train on into the `classes` list
classes = ["laji","Heapmaterial","Illegal_parking","FlowManagement"]

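def Statistic(xmls_path):
    classNum = []
    xmls_list = os.listdir(xmls_path)  # list every entry in the annotation folder
    for xml_name in xmls_list:
        # Skip anything that is not an XML file. splitext also copes with
        # names that have no extension or contain several dots.
        if os.path.splitext(xml_name)[1].lower() != '.xml':
            continue
        xml_path = os.path.join(xmls_path, xml_name)
        root = ET.parse(xml_path).getroot()
        for obj in root.iter('object'):
            # Treat a missing or empty <difficult> tag as 0 (not difficult)
            difficult = obj.findtext('difficult') or '0'
            cls = obj.findtext('name')
            # Ignore classes we are not training on and objects marked difficult
            if cls not in classes or int(difficult) == 1:
                continue
            classNum.append(cls)

    List = collections.Counter(classNum)  # class name -> number of objects
    print(List)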

if __name__ == '__main__':
    # Path to the folder containing the XML files
    xmls_path = r"C:\Users\WJY\Desktop\L"
    Statistic(xmls_path)
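If you want to reuse the counts instead of only printing them, one possible variant is sketched below. It returns the `Counter`, keeps classes with zero objects visible, and prints each class's share of the total, which makes class imbalance easier to spot. The names `count_classes` and `format_report` and the temporary demo folder are assumptions for illustration, not part of the script above.

```python
import collections
import os
import tempfile
import xml.etree.ElementTree as ET


def count_classes(xmls_path, wanted):
    """Return a Counter of non-difficult objects for each class in `wanted`."""
    # Start every wanted class at 0 so that empty classes still show up.
    counts = collections.Counter({name: 0 for name in wanted})
    for xml_name in sorted(os.listdir(xmls_path)):
        if os.path.splitext(xml_name)[1].lower() != ".xml":
            continue
        root = ET.parse(os.path.join(xmls_path, xml_name)).getroot()
        for obj in root.iter("object"):
            cls = obj.findtext("name")
            # A missing or empty <difficult> tag is treated as 0 (an assumption).
            difficult = int(obj.findtext("difficult") or 0)
            if cls in counts and difficult != 1:
                counts[cls] += 1
    return counts


def format_report(counts):
    """Build one line per class: name, count, and share of the total."""
    total = sum(counts.values())
    lines = []
    for name, n in counts.most_common():
        share = 100.0 * n / total if total else 0.0
        lines.append(f"{name:<20}{n:>6}{share:>8.1f}%")
    return "\n".join(lines)


# Demo on a throwaway folder with two made-up annotation files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i, names in enumerate([["laji", "laji"], ["Heapmaterial", "unknown"]]):
        objects = "".join(
            f"<object><name>{n}</name><difficult>0</difficult></object>"
            for n in names
        )
        with open(os.path.join(demo_dir, f"{i}.xml"), "w", encoding="utf-8") as fh:
            fh.write(f"<annotation>{objects}</annotation>")
    demo_counts = count_classes(
        demo_dir, ["laji", "Heapmaterial", "Illegal_parking"]
    )
    print(format_report(demo_counts))
```

Returning the counter rather than printing it inside the function means the same result can feed a plot, a class-weight calculation, or a sanity check before training.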

If you found this useful, a like would be appreciated.
