Automatic model training with label-studio

Contents

    • 1. Some additions
      • 1.1 Path changes
      • 1.2 Script
    • 2. Automatic training
      • 2.1 Directory structure
      • 2.2 `train.py`
      • 2.3 `model.py`
      • 2.4 Training

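1. Some additions

The earlier article, Automatic Annotation of Local Datasets, explained in detail how to deploy label-studio locally, annotate a dataset stored in a local directory, and export the result. It also showed how to write a simple startup bat script and the backend model.py file. A few points still need refinement and additions.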

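1.1 Path changes

In the model.py from Automatic Annotation of Local Datasets, the best.pt model had to sit in the same directory as the Python script. For annotation alone that needs no change. This tutorial adds automated model training, so to keep the directory structure clear and make switching models easy, the path is changed: create a model folder next to the Python script, dedicated to storing the trained model used for annotation.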

./ml_backend_test1/
|-- Dockerfile
|-- README.md
|-- __pycache__
|   |-- model.cpython-312.pyc
|-- _wsgi.py
|-- docker-compose.yml
|-- model
|   |-- best.pt
|-- model.py
|-- requirements-base.txt
|-- requirements-test.txt
|-- requirements.txt
|-- test_api.py
|-- yolo11

The code will need a matching change later on.
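As a minimal sketch of that change, the model path can be derived from the backend directory rather than assumed to sit next to the script. The helper name resolve_model_path is hypothetical and not part of label-studio; it only assumes the model/best.pt layout shown above.

```python
from pathlib import Path


def resolve_model_path(backend_dir, filename: str = "best.pt") -> Path:
    """Return <backend_dir>/model/<filename>, failing early if it is missing."""
    model_path = Path(backend_dir) / "model" / filename
    if not model_path.is_file():
        raise FileNotFoundError(f"Annotation model not found: {model_path}")
    return model_path
```

Inside model.py the backend directory would typically be Path(__file__).resolve().parent, so switching models only means replacing the file in the model folder.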

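1.2 Script

Automatic Annotation of Local Datasets gave a script for starting label-studio. Here is a script for starting a backend service, using ml_backend_test1 as the example. The script must be placed in the backend root directory, for example c:\dl\label_studio_backend

with the following directory structure: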

./label_studio_backend/
|-- cache.db
|-- load_test1_backend.bat
|-- ml_backend_test1

The contents of load_test1_backend.bat are:

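@echo off

:: Print a banner
echo Starting label-studio-ml

:: Configuration
set CONDA_ENV=label_studio
set BACKEND_NAME=ml_backend_test1
set BACKEND_PORT=9090

:: Activate the conda environment
echo.
echo [1/2] Activating Conda environment %CONDA_ENV%
call conda activate %CONDA_ENV%

:: Check whether activation succeeded
if %errorlevel% neq 0 (
    echo.
    echo Failed to activate Conda environment %CONDA_ENV%!
    echo Please check whether this environment has been created
    pause
    exit /b 1
)

:: Start label-studio-ml
echo.
echo [2/2] Starting label-studio-ml
label-studio-ml start %BACKEND_NAME% -p %BACKEND_PORT%

:: Check the exit status; this is reached only once the server process returns
if %errorlevel% neq 0 (
    echo.
    echo Failed to start label-studio-ml!
    echo Please check the installation and whether the command is correct!
) else (
    echo.
    echo label-studio-ml ran successfully
    echo It can be reached directly at: http://localhost:%BACKEND_PORT%
)

:: Keep the command window open
pause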

As before, a Linux version of the script is not available yet and will be added later.
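Until a Linux script is available, a small Python launcher is one cross-platform option. This is a sketch under assumptions: the function names are made up for illustration, the conda environment is assumed to be activated already, and only the label-studio-ml start <name> -p <port> invocation from the bat script is reused.

```python
import shutil
import subprocess
from typing import List


def build_backend_command(backend_name: str = "ml_backend_test1",
                          port: int = 9090) -> List[str]:
    """Build the same command line the bat script runs."""
    return ["label-studio-ml", "start", backend_name, "-p", str(port)]


def start_backend(backend_root: str,
                  backend_name: str = "ml_backend_test1",
                  port: int = 9090) -> int:
    """Run the backend from backend_root and return its exit code (blocks)."""
    if shutil.which("label-studio-ml") is None:
        raise RuntimeError("label-studio-ml not found; is the environment active?")
    return subprocess.call(build_backend_command(backend_name, port),
                           cwd=backend_root)
```

Calling start_backend with the backend root directory mirrors what the bat script does after activation; the call returns only when the server stops.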

2. Automatic training

Once both scripts are running, automated training can be triggered from the label-studio front-end page, again on the settings page.

[Image 1: automatic model training with label-studio]

Tracing what happens from there, starting training ends up calling the def fit(self, event, data, **kwargs) function in model.py.
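Because fit is invoked while handling a request, long-running training is usually moved off that thread. The following is a minimal, standalone sketch of that pattern and assumes nothing about label-studio itself: the class name TrainingGuard and its train_fn callback are hypothetical, and a lock is added so that two near-simultaneous calls cannot both start a job.

```python
import threading


class TrainingGuard:
    """Run train_fn in a background thread, allowing one job at a time."""

    def __init__(self, train_fn):
        self._train_fn = train_fn
        self._lock = threading.Lock()
        self._is_training = False
        self._thread = None

    def fit(self, data):
        # Check and set the flag atomically so only one caller wins.
        with self._lock:
            if self._is_training:
                return {"status": "already_training"}
            self._is_training = True
        self._thread = threading.Thread(target=self._run, args=(data,),
                                        daemon=True)
        self._thread.start()
        return {"status": "training_started"}

    def _run(self, data):
        try:
            self._train_fn(data)
        finally:
            # Always clear the flag, even if training raised.
            with self._lock:
                self._is_training = False

    def join(self, timeout=None):
        """Wait for the current job, if any, to finish."""
        if self._thread is not None:
            self._thread.join(timeout)
```

The return values match the status dictionaries used in the model.py listing, so the same idea can be folded into fit directly.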

2.1 Directory structure

Under the ml_backend_test1 directory, create a yolo11 folder. It holds the training script and the final, organized datasets.

./yolo11/
|-- datasets
|   |-- training_data_1749458061
|   |-- training_data_1749458247
|   |-- training_data_1749458684
|   |-- training_data_1749458911
|-- train.py
|-- yolo11s.pt

Here train.py is the training script and yolo11s.pt is a pretrained model. It can serve as the initial weights for training and is easy to swap for other weights.

2.2 train.py

The contents of train.py are:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
from ultralytics import YOLO


def parse_opt():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", type=str, default="", help="path to initial weights")
    parser.add_argument("--cfg", type=str, default="", help="path to model .yaml")
    parser.add_argument("--data", type=str, default="", help="path to dataset.yaml")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch_size", type=int, default=16, help="total batch size")
    parser.add_argument("--imgsz", "--img", "--img-size",
                        type=int, default=640, help="train/val image size")
    parser.add_argument("--rect", action="store_true", help="rectangular training")
    parser.add_argument("--resume", action="store_true", help="resume most recent training")
    parser.add_argument("--nosave", action="store_true", help="only save final checkpoint")
    parser.add_argument("--noval", action="store_true", help="only validate final epoch")
    parser.add_argument(
        "--noautoanchor", action="store_true", help="disable AutoAnchor")
    parser.add_argument("--evolve", type=int, nargs="?",
                        const=300, help="evolve hyperparameters for x generations")
    parser.add_argument("--cache", type=str, nargs="?",
                        const="ram", help="image cache: 'ram' or 'disk'")
    parser.add_argument("--image-weights",
                        action="store_true", help="use weighted image selection for training")
    parser.add_argument("--device", default="",
                        help="cuda device, e.g. 0 or 0,1,2,3 or cpu")
    # A literal percent sign in argparse help text must be written as %%
    parser.add_argument(
        "--multi-scale", action="store_true", help="vary image size +/-50%%")
    parser.add_argument("--single-cls", action="store_true", help="train as a single-class dataset")
    parser.add_argument("--optimizer", type=str,
                        choices=["SGD", "Adam"], default="SGD", help="optimizer")
    parser.add_argument("--sync-bn", action="store_true",
                        help="use SyncBatchNorm, only for DDP training")
    parser.add_argument("--workers", type=int, default=8, help="max dataloader workers")
    parser.add_argument("--project", default="runs/train",
                        help="save to project/name")
    parser.add_argument("--name", default="exp", help="save to project/name")
    parser.add_argument("--exist-ok", action="store_true",
                        help="existing project/name ok, do not increment")
    parser.add_argument("--quad", action="store_true", help="quad dataloader")
    parser.add_argument("--linear-lr", action="store_true", help="linear learning rate")
    parser.add_argument("--label-smoothing", type=float,
                        default=0.0, help="label smoothing epsilon")
    parser.add_argument("--patience", type=int, default=100,
                        help="EarlyStopping patience (epochs without improvement)")
    parser.add_argument("--freeze", type=int, default=0, help="number of layers to freeze")
    parser.add_argument("--save_period", type=int,
                        default=-1, help="save a checkpoint every x epochs")
    parser.add_argument("--local_rank", type=int,
                        default=-1, help="DDP parameter, do not modify")
    parser.add_argument("--world_size", type=int,
                        default=1, help="DDP parameter, do not modify")
    parser.add_argument("--eval_interval", type=int,
                        default=1, help="validation interval (epochs)")

    # Label Studio integration arguments
    parser.add_argument("--label-studio-url", type=str,
                        default="", help="Label Studio server URL")
    parser.add_argument("--label-studio-api-key", type=str,
                        default="", help="Label Studio API key")
    parser.add_argument("--update-progress",
                        action="store_true", help="send training progress to Label Studio")

    opt = parser.parse_args()

    return opt


def main(opt, callbacks=None):
    model = YOLO(opt.weights)
    results = model.train(data=opt.data, device=opt.device, project=opt.project, name=opt.name,
                          epochs=opt.epochs, workers=opt.workers, batch=opt.batch_size)


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

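Many of the parameters have default values; configure them as needed. Note that main only forwards data, device, project, name, epochs, workers and batch_size to model.train; the remaining flags are parsed but not used.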
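To make the flag-to-attribute mapping concrete, here is a trimmed-down sketch using a handful of the flags from train.py. The build_parser name is only for illustration; the behaviour shown is standard argparse.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A trimmed-down parser with a few of the flags from train.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", type=str, default="")
    parser.add_argument("--batch_size", type=int, default=16)
    # Three spellings, one destination: the first long name decides it.
    parser.add_argument("--imgsz", "--img", "--img-size", type=int, default=640)
    # Dashes in a flag name become underscores in the attribute name.
    parser.add_argument("--exist-ok", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
opt = build_parser().parse_args(
    ["--weights", "yolo11s.pt", "--img-size", "512", "--exist-ok"])
print(opt.weights, opt.batch_size, opt.imgsz, opt.exist_ok)
```

Flags that are omitted fall back to their defaults, which is why the backend only needs to pass the handful of options it cares about.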

2.3 model.py

Next is the core file, model.py. Its contents are:

from typing import List, Dict, Optional
from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.response import ModelResponse

################## Modified ##################
# Helper function we need
from label_studio_ml.utils import get_image_local_path

# Dependencies required by YOLO
from ultralytics import YOLO

# Dependencies for extra processing
import re
import time
import datetime
from urllib.parse import unquote
from PIL import Image

# Process and thread handling
import subprocess
import threading

# Logging
import logging

# Training
import torch

# File handling
from pathlib import Path
import zipfile
import random
import shutil

########################## Predefined parameters ###########################
logger = logging.getLogger(__name__)
script_dir = Path(__file__).resolve().parent
datasets_root_dir = Path('c:/dl/datasets')
project_name = 'test1'
label_studio_url = 'http://localhost:8080'
api_key = 'xxxxxxxxxxxx'


def custom_get_local_path(url):
    # Prefer the helper from label_studio_ml; fall back to manual
    # resolution if it is unavailable or fails.
    try:
        from label_studio_ml.utils import get_image_local_path
        return get_image_local_path(url)
    except Exception:
        pass

    if url.startswith('/data/local-files'):
        # Extract the relative path from the "d=" query parameter
        match = re.search(r'd=(.*?)(?:&|$)', url)
        if match:
            relative_path = unquote(match.group(1))
            img_path = datasets_root_dir / relative_path
            return img_path

    if url.startswith('/data/upload'):
        relative_path = url.replace('/data/upload/', '').lstrip('/')
        img_path = datasets_root_dir / relative_path
        return img_path

    return url  # Fall back to the original URL


class NewModel(LabelStudioMLBase):

    def setup(self):
        self.set("model_version", "0.0.1")
        self._model_path = script_dir / 'model' / 'best.pt'
        self._model = YOLO(self._model_path)
        self._labels = self._model.names

        self._is_training = False
        self._training_thread = None
        self._last_trained_model = None
        self._training_process = 0
        self._pretrained_path = self._model_path
        # self._pretrained_path = script_dir / 'yolo11' / 'yolo11s.pt'

        self.config = self.parsed_label_config

    def predict(self,
                tasks: List[Dict],
                context: Optional[Dict] = None,
                **kwargs) -> ModelResponse:
        print(f'''\
        Run prediction on {tasks}
        Received context: {context}
        Project ID: {self.project_id}
        Label config: {self.label_config}
        Parsed JSON Label config: {self.parsed_label_config}
        Extra params: {self.extra_params}''')

        ################## Modified ##################
        results = []
        for task in tasks:
            image_path = custom_get_local_path(task['data']['image'])
            logger.info(f"Image path: {image_path}")
            image = Image.open(image_path)
            img_width, img_height = image.size
            pred = self._model(image)
            predictions = []
            for box in pred[0].boxes:
                x_min, y_min, x_max, y_max = map(float, box.xyxy[0].tolist())
                label = self._labels[int(box.cls.item())]
                predictions.append({
                    "from_name": "label",
                    "to_name": "image",
                    "type": "rectanglelabels",
                    "value": {
                        "x": x_min / img_width * 100,
                        "y": y_min / img_height * 100,
                        "width": (x_max - x_min) / img_width * 100,
                        "height": (y_max - y_min) / img_height * 100,
                        "rectanglelabels": [label]
                    },
                    "score": float(box.conf.item())
                })
            results.append({"result": predictions})
        return results
        # return ModelResponse(predictions=[])

    def load_model(self):
        return None

    def fit(self, event, data, **kwargs):
        if self._is_training:
            logger.info("A training job is already running; retry after it finishes.")
            return {"status": "already_training"}

        self._is_training = True
        self._training_process = 0
        self._training_thread = threading.Thread(target=self.train_model,
                                                 args=(data, ))
        self._training_thread.start()

        return {"status": "training_started"}

    def train_model(self, data):
        try:
            logger.info("Starting YOLO model training...")

            project_id = data['project']['id']
            export_dir = self.export_training_data(project_id)
            classes = self.get_classes_from_config()
            data_yaml = self.create_data_yaml(export_dir, classes)
            self._training_process = 10
            command = self.build_train_command(data_yaml)

            logger.info(f"Running command: {' '.join(command)}")
            process = subprocess.Popen(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True
            )

            for line in process.stdout:
                logger.info(line.strip())
                # Parse progress. Assumption: the trainer prints epoch lines
                # that begin with "current/total" (for example "10/100 ...").
                match = re.match(r'\s*(\d+)/(\d+)\s', line)
                if match:
                    current_epoch = int(match.group(1))
                    total_epochs = int(match.group(2))
                    if 0 < current_epoch <= total_epochs:
                        self._training_process = 10 + \
                            int(85 * current_epoch / total_epochs)

            # Treat a non-zero exit code as a failed training run
            return_code = process.wait()
            if return_code != 0:
                raise RuntimeError(f"train.py exited with code {return_code}")

            self._training_process = 95
            new_model_path = self.find_new_model()
            self._last_trained_model = new_model_path
            logger.info(f"Latest trained model: {new_model_path}")
            self._training_process = 100
            logger.info("Training finished!")
        except Exception as e:
            logger.error(f"Training failed: {e}")
        finally:
            self._is_training = False

    def export_training_data(self, project_id):
        from label_studio_sdk import Client

        ls = Client(url=label_studio_url, api_key=api_key)

        project = ls.get_project(project_id)
        export_name =  f'training_data_{int(time.time())}'
        export_dir = script_dir / 'yolo11' / 'datasets' / export_name
        export_file = script_dir / 'yolo11' / 'datasets' / f'{export_name}.zip'

        project.export_tasks(export_type='YOLO',
                             download_all_tasks=True,
                             download_resources=True,
                             export_location=export_file)
        self.convert_export_2_datasets(export_dir, export_file)
        export_file.unlink()
        return export_dir
    
    def convert_export_2_datasets(self, export_dir, export_file):
        f = zipfile.ZipFile(export_file)
        f.extractall(export_dir)
        f.close()

        old_label_path = export_dir / 'labels'
        old_label_files = list(old_label_path.glob('*.txt'))
        new_label_files = []
        for old_file in old_label_files:
            file_name = old_file.name
            print(f"File name: {file_name}")

            if "%5C" in file_name:
                parts = re.split(r'%5C', file_name, flags=re.IGNORECASE)
                if len(parts) > 1:
                    new_path = old_file.with_name(parts[-1])
                    old_file.rename(new_path)
                    new_label_files.append(new_path.stem)
        
        if len(new_label_files) > 0:
            self.random_datasets(export_dir, new_label_files)
            
    def random_datasets(self, export_dir, label_list, train_ratio=0.8):
        for split in ['train', 'val']:
            for subdir in ['images', 'labels']:
                dst_path = export_dir / subdir / split
                # Create the new, empty target folder
                dst_path.mkdir(parents=True, exist_ok=True)

        random.shuffle(label_list)
        n_total = len(label_list)
        n_train = int(n_total * train_ratio)
        train_list = label_list[:n_train]
        val_list = label_list[n_train:]
        key_map = {"train": train_list, "val": val_list}

        # Index the source images by file stem
        src_img_path = datasets_root_dir / project_name
        img_extension = {'.jpg', '.png', '.jpeg'}
        img_map = {}

        for file_path in src_img_path.glob('*'):
            if file_path.is_file():
                ext = file_path.suffix.lower()
                if ext in img_extension:
                    img_map[file_path.stem] = file_path

        for key, files in key_map.items():
            for label_name in files:
                old_label_path = export_dir / 'labels' / f'{label_name}.txt'
                new_label_path = export_dir / 'labels' / key / f'{label_name}.txt'
                old_label_path.replace(new_label_path)
                if label_name not in img_map:
                    logger.warning(f"No source image found for label: {label_name}")
                    continue
                new_img_path = export_dir / "images" / key
                shutil.copy2(str(img_map[label_name]), str(new_img_path) + "/")

    def get_classes_from_config(self):
        classes = []
        for tag_name, tag_config in self.config.items():
            if tag_config['type'] == 'RectangleLabels':
                classes = tag_config['labels']
                break
        return classes

    def create_data_yaml(self, export_dir, classes):
        train_path = export_dir / 'images' / 'train'
        val_path = export_dir / 'images' / 'val'
        config = {
            'train': str(train_path),
            'val': str(val_path),
            'nc': len(classes),
            'names': classes
        }

        config_path = export_dir / 'data.yaml'
        with open(config_path, 'w') as f:
            f.write(f"train: {config['train']}\n")
            f.write(f"val: {config['val']}\n")
            f.write(f"nc: {config['nc']}\n")
            f.write("names: \n")
            for i, name in enumerate(config['names']):
                f.write(f"  {i}: {name}\n")

        return config_path

    def build_train_command(self, data_yaml):
        train_path = script_dir / 'yolo11' / 'train.py'
        runs_path = script_dir / 'yolo11' / 'runs' / 'train'
        base_command = [
            "python", str(train_path), "--batch_size", "16", "--epochs", "100",
            "--data", str(data_yaml), "--device",
            "0" if torch.cuda.is_available() else "cpu", "--project",
            str(runs_path), "--name", f"yolo11_{int(time.time())}"
        ]

        base_command.extend(["--weights", str(self._pretrained_path)])
        return base_command

    def find_new_model(self):
        runs_dir = script_dir / 'yolo11' / 'runs' / 'train'
        if not runs_dir.exists():
            logger.error(f"Directory does not exist: {runs_dir}")
            return None
        best_pt_files = list(runs_dir.rglob('best.pt'))

        if not best_pt_files:
            logger.error(f"No best.pt file found in {runs_dir} or its subdirectories")
            return None

        file_info = []
        for file_path in best_pt_files:
            try:
                stat = file_path.stat()
                ctime = datetime.datetime.fromtimestamp(stat.st_ctime)
                mtime = datetime.datetime.fromtimestamp(stat.st_mtime)

                file_info.append({
                    'path': file_path,
                    'ctime': ctime,  # creation time on Windows; metadata-change time on Unix
                    'mtime': mtime,  # modification time
                    'size': stat.st_size,
                    'experiment': file_path.parent.name,
                    'experiment_path': file_path.parent
                })
            except OSError as e:
                logger.error(f"Cannot access file {file_path}: {e}")

        file_info.sort(key=lambda x: x['ctime'], reverse=True)
        latest_file = file_info[0]
        print(f"Found {len(file_info)} best.pt file(s):")
        for i, info in enumerate(file_info[:5], 1):  # show only the first 5
            print(
                f"{i}. {info['path']} - created: {info['ctime']} - size: {info['size']/1024**2:.2f} MB")

        print("\n" + "=" * 70)
        print(f"Latest best.pt file: {latest_file['path']}")
        print(f"  Created:    {latest_file['ctime']}")
        print(f"  Modified:   {latest_file['mtime']}")
        print(f"  File size:  {latest_file['size']/1024**2:.2f} MB")
        print(f"  Experiment: {latest_file['experiment']}")
        print(f"  Run path:   {latest_file['experiment_path']}")
        print("=" * 70)

        return latest_file['path']

    def get_train_status(self):
        """Return the training status (for API queries)."""
        return {
            "is_training": self._is_training,
            "process": self._training_process,
            "last_trained_model": self._last_trained_model
        }
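The progress estimate in train_model depends on recognising epoch lines in the trainer's output. Below is a minimal, standalone sketch of such a parser. It assumes that epoch lines start with current/total (for example "10/100"), which is how Ultralytics output commonly looks but may differ between versions; the function name parse_progress and the 10 to 95 scaling are illustrative choices.

```python
import re
from typing import Optional

# Matches lines such as "     10/100      2.6G      1.15 ..."
_EPOCH_RE = re.compile(r"^\s*(\d+)/(\d+)\s")


def parse_progress(line: str, base: int = 10, span: int = 85) -> Optional[int]:
    """Map an epoch line to a percentage in [base, base + span], else None."""
    match = _EPOCH_RE.match(line)
    if not match:
        return None
    current, total = int(match.group(1)), int(match.group(2))
    if total <= 0 or not 0 < current <= total:
        return None
    return base + int(span * current / total)
```

Returning None for unrelated lines keeps the read loop simple: the caller only updates the progress value when a number comes back.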

A few brief notes:

  • def fit(self, event, data, **kwargs):

    This function creates a thread that calls train_model to do the work.

  • def train_model(self, data):

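    This function performs the training: it prepares the training data, runs the training command, and logs the output in real time.

    The training itself runs in a subprocess, that is, by invoking train.py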

  • def export_training_data(self, project_id):

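    This function downloads the annotated data from label-studio. However, label-studio does not provide an interface for downloading images and labels together, so some extra processing is required. That is also the purpose of the global variables at the top of the file: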

    script_dir = Path(__file__).resolve().parent 
    datasets_root_dir = Path('c:/dl/datasets')
    project_name = 'test1'
    label_studio_url = 'http://localhost:8080'
    api_key = 'xxxxxx'
    

    The processing covers renaming files, splitting them into training and validation sets, and copying files.

  • def create_data_yaml(self, export_dir, classes):

    Creates a yaml file that describes the training data.

  • def build_train_command(self, data_yaml):

    Builds the command that invokes train.py
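The renaming and splitting steps described for export_training_data can be tried in isolation. This sketch is standalone and its function names are hypothetical; it assumes, as the listing does, that exported label names carry a URL-encoded backslash (%5C) before the real file name.

```python
import random
import re
from typing import Iterable, List, Optional, Tuple


def strip_encoded_prefix(file_name: str) -> str:
    """Keep only the part after the last URL-encoded backslash (%5C)."""
    return re.split(r"%5C", file_name, flags=re.IGNORECASE)[-1]


def split_train_val(stems: Iterable[str],
                    train_ratio: float = 0.8,
                    seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Shuffle the stems and cut them into train and validation lists."""
    items = list(stems)
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_ratio)
    return items[:n_train], items[n_train:]
```

Passing a fixed seed makes the split reproducible, which can help when comparing training runs on the same export.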

Once all of this is in place, automatic training can begin.

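2.4 Training

With the files prepared as described above, restart the backend and click Start Training in the front end. Training then starts, and after waiting long enough you get: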

[Image 2: automatic model training with label-studio]
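After a run finishes, the newest best.pt has to be located among the run folders. A compact sketch of that lookup follows; it is not the author's find_new_model but a shorter variant that sorts by modification time, since st_ctime means creation time only on Windows. The function name find_latest_best is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_latest_best(runs_dir) -> Optional[Path]:
    """Return the most recently modified best.pt under runs_dir, or None."""
    runs_dir = Path(runs_dir)
    if not runs_dir.is_dir():
        return None
    candidates = list(runs_dir.rglob("best.pt"))
    if not candidates:
        return None
    # Pick the file with the largest modification timestamp.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path could then be inspected or copied into the model folder by hand when the new weights should replace the ones used for annotation.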
