SWIFT depends on torch>=1.13; torch>=2.0.0 is recommended.
Here I used torch 2.1.2+cu118 (in a separate environment created just for this).
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
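After installing, a quick check that torch matches the environment above (a minimal sanity check, not part of the original post):

import torch

print(torch.__version__)           # expected to show something like 2.1.2+cu118
print(torch.cuda.is_available())   # must be True before starting training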
All the scripts referenced below are run from the autodl-tmp directory, via sh **.sh.
As mentioned at the beginning, the model used here is not the one under lmms-lab but the llava-hf version.
huggingface-cli download --resume-download llava-hf/llava-onevision-qwen2-7b-ov-hf --local-dir llava-hf/llava-onevision-qwen2-7b-ov-hf
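If the CLI download keeps getting interrupted, the same model can be fetched from Python with huggingface_hub (equivalent to the command above; downloads resume automatically in recent versions):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='llava-hf/llava-onevision-qwen2-7b-ov-hf',
    local_dir='llava-hf/llava-onevision-qwen2-7b-ov-hf',
)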
For the training data format, refer to the Custom Dataset section of the official docs, or to the official multi-image format used for the InternVL models, shown below.
{"query": "Image-1: \nImage-2: \nDescribe the two images in detail." , "response": "xxxxxxxxx", "history": [["Describe the image" , "xxxxxxx"], ["CCCCC", "DDDDD"]], "images": ["image_path1", "image_path2", "image_path3"]}
So I converted my data into the format above; the conversion code is below. It produces the query, response, and images fields (in our data the original key is image, singular). Be aware that if you keep the key as image, no error is raised, but hardly any images actually get loaded and training runs suspiciously fast; I only caught this because I had hand-edited the data format earlier.
import json
from collections import OrderedDict

def convert(input_path, output_path):
    key_order = ['id', 'image_id', 'width_list', 'height_list', 'query', 'response',
                 'images', 'condition', 'eimg_start_idx']
    with open(input_path, 'r', encoding='utf-8') as f_in, \
         open(output_path, 'w', encoding='utf-8') as f_out:
        for line in f_in:
            data = json.loads(line)
            # extract query and response from the conversations list
            query = next(item['value'] for item in data['conversations'] if item['from'] == 'human')
            response = next(item['value'] for item in data['conversations'] if item['from'] == 'gpt')
            # build an ordered dict with the keys in the desired order
            new_data = OrderedDict()
            for key in key_order:
                if key == 'query':
                    new_data[key] = query
                elif key == 'response':
                    new_data[key] = response
                elif key == 'images':  # rename the original 'image' key to 'images'
                    new_data[key] = data['image']
                else:
                    new_data[key] = data[key]
            # write one JSON object per line (jsonl)
            json.dump(new_data, f_out, ensure_ascii=False)
            f_out.write('\n')

convert('path/to/input.jsonl', 'path/to/output.jsonl')
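Before training, it is worth a quick check that the converted file is readable and that every image path it references actually exists on disk. This is a small helper I would add, not from the original post; the file path is a placeholder:

import json
import os

def check_jsonl(path):
    missing = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            record = json.loads(line)
            images = record['images']
            if isinstance(images, str):  # single image stored as a plain string
                images = [images]
            missing += sum(not os.path.exists(p) for p in images)
    print(f'missing image files: {missing}')

check_jsonl('path/to/output.jsonl')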
Then modify the load_file function in swift/swift/llm/template/vision_utils.py; the original post shows the added lines (highlighted in purple) in a screenshot. Their purpose is to let the code resolve the image paths from the jsonl. A sketch of the idea follows.
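Since the screenshot with the exact patch is not reproduced here, the following is only a guess at the idea under my own assumptions: if an image path from the jsonl cannot be found as written, try it again relative to a hard-coded dataset root. IMAGE_ROOT and the helper name are made up for illustration; the real load_file body in vision_utils.py differs.

import os

IMAGE_ROOT = '/root/autodl-tmp/data'  # assumed dataset root; adjust to your layout

def resolve_image_path(path: str) -> str:
    # If the path as written in the jsonl does not exist, retry it relative to IMAGE_ROOT.
    if not os.path.exists(path):
        candidate = os.path.join(IMAGE_ROOT, path)
        if os.path.exists(candidate):
            return candidate
    return path

# Inside load_file, call resolve_image_path(path) before opening the file.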
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model llava-hf/llava-onevision-qwen2-7b-ov-hf \
--model_type llava_onevision_hf \
--train_type lora \
--dataset data/annotations/train.jsonl \
--val_dataset data/annotations/validation.jsonl \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 2048 \
--model_author swift \
--output_dir output \
--model_name swift-robot
The following follows the official docs' Merge-LoRA step for InternVL:
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/v4-20241208-115323/checkpoint-1" \
--merge_lora true
The following follows the official docs' inference (Infer) step for InternVL:
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir "output/v4-20241208-115323/checkpoint-1-merged" \
--load_dataset_config true
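After the merge, the checkpoint can also be loaded directly with transformers for a quick spot check. This is a minimal sketch, not part of the original workflow; the checkpoint path and the test image are placeholders, and it assumes a transformers version that ships LlavaOnevisionForConditionalGeneration (>=4.45):

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

ckpt = "output/v4-20241208-115323/checkpoint-1-merged"  # merged checkpoint from the export step
processor = AutoProcessor.from_pretrained(ckpt)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the image."},
    ]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
image = Image.open("path/to/test.jpg")  # placeholder test image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))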