C--G

6、LLaVA

简介

LLaVA官网

LLaVA使用Vicuna(LLaMA-2)作为LLM $f_\phi(·)$ ，使用预训练的CLIP图像编码器 ViT-L/14 $g(X_v)$ 。

输入图像 $X_v$ ，首先获取feature $Z_v=g(X_v)$ 。考虑到最后一层Transformer前后的网格特征，采用简单的线性层连接图像特征到词嵌入空间，即使用一个可训练的投影矩阵 W 将 $Z_v$ 转换为语言嵌入令牌 $H_v$ （与语言模型中词嵌入空间具有相同的维数）。

简单投影方案是轻量级的，它允许快速迭代以数据为中心的实验。还可以考虑更复杂的方案来连接图像和语言表征，例如Flamingo中的 gated cross-attention 和BLIP-2中的Q-former。

Training

对于每张图像 $X_v$ ，生成多回合对话数据 $(X^1_q, X^1_a，···，X^T_q, X^T_a)$ ，其中 T 为总回合数。将它们组织成一个序列，将所有的回答视为 assistant 响应，并将指令 $X^t_{instruct}$ 在第 t 个回合处为:

这导致了下表中所示的多模态指令跟随序列的统一格式。使用其原始的自回归训练目标对预测 token 执行LLM的指令调优。

具体来说，对于长度为 L 的序列，计算目标答案 $X_a$ 的概率为:

其中，θ 为可训练参数， $X_{instruct,}Xinstruct,<i$

对于上述公式中的条件，显式地添加了 $X_v$ ，以强调图像是基于所有答案的事实，并且为了更好的可读性，省略了 $X_{system-message}$ 和所有前面的 $< STOP >$ 。

对于LLaVA模型训练，考虑一个两阶段的指令调优过程。

Stage one：Pre-training for Feature Alignment

为了在 concept coverage 和训练效率之间取得平衡，将 CC3M 过滤到 595K 图像-文本对。将这些数据对转换为跟随指令的数据（如下图所示）。每个样本都可以视为单回合对话。为了构造输入 $X_{instruct}$ ，对于图像 $X_v$ ，随机采样一个问题 $X_q$ ，这是一个语言指令，要求 assistant 对图像进行简要描述。最基本的预测答案 $X_a$ 是原始的标题。在训练中，保持视觉编码器和LLM权值不变，并最大化公式(3)的似然值，只有可训练参数 θ = W(投影矩阵)。这样，图像特征 $H_v$ 可以与预训练的LLM词嵌入对齐。

这个阶段可以理解为为冻结的LLM训练一个兼容的视觉标记器。

利用仅语言的GPT-4或ChatGPT作为强大的教师(两者都只接受文本作为输入)，以创建包含视觉内容的指令跟随数据。

具体来说，为了将图像编码为其视觉特征以提示纯文本GPT，使用两种类型的符号表示:

通常从不同角度描述视觉场景的字幕;

边界框通常对场景中的物体进行定位，每个边界框对物体概念及其空间位置进行编码。

Stage two：Fine-tuning End-to-End.

保持视觉编码器权值不变，并不断更新投影层和LLM的预训练权值;即，在公式(3)中，可训练的参数是θ = {W， φ}。

考虑两个具体的用例场景:

Multimodal Chatbot.通过对158K语言图像指令跟踪数据进行微调来开发聊天机器人。在这三种类型的响应中，会话是多回合的，而其他两种是单回合的。它们在训练中被统一采样。

Science QA.在ScienceQA基准上研究，这是第一个大规模的多模态科学问题数据集，它用详细的讲座和解释注释了答案。每个问题都以自然语言或图像的形式提供上下文。该 assistant 以自然语言提供推理过程，并从多个选项中选择答案。对于(2)中的训练，将数据组织为单回合对话，问题和上下文作为 $X_{instruct}$ ，推理和答案作为 $X_a$ 。

模型搭建

github

环境搭建

Clone this repository and navigate to LLaVA folder

git clone https://github.com/haotian-liu/LLaVA.git cd LLaVA

Install Package

conda create -n llava python=3.10 -y conda activate llava pip install --upgrade pip # enable PEP 660 support pip install -e .

Install additional packages for training cases

pip install -e ".[train]" pip install flash-attn --no-build-isolation

下载预训练权重

liuhaotian/llava-v1.5-13b 权重

sudo apt-get install git-lfs git init # Make sure you have git-lfs installed (https://git-lfs.com) git lfs install git clone https://huggingface.co/liuhaotian/llava-v1.5-13b # if you want to clone without large files – just their pointers # prepend your git clone with the following env var: GIT_LFS_SKIP_SMUDGE=1

加载模型

# 分词器 tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False) # 配置文件 cfg_pretrained = AutoConfig.from_pretrained(model_base) # config.json # 主model model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=cfg_pretrained, **kwargs) # 加载图像映射token 的权重 mm_projector_weights = torch.load(os.path.join(model_base, 'mm_projector.bin'), map_location='cpu') mm_projector_weights = {k: v.to(torch.float16) for k, v in mm_projector_weights.items()} model.load_state_dict(mm_projector_weights, strict=False) # 是否修改input embedding mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False) # False mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)# False if mm_use_im_patch_token: tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True) if mm_use_im_start_end: tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True) model.resize_token_embeddings(len(tokenizer)) # 加载CLIP 图像编码器 # LlavaMetaForCausalLM.get_vision_tower() --> LlavaLlamaForCausalLM.get_model() --> LlavaMetaModel.get_vision_tower() vision_tower = model.get_vision_tower() if not vision_tower.is_loaded: vision_tower.load_model() vision_tower.to(device=device, dtype=torch.float16) image_processor = vision_tower.image_processor if hasattr(model.config, "max_sequence_length"): context_len = model.config.max_sequence_length else: context_len = 2048 tokenizer, model, image_processor, context_len

config.json

{ "_name_or_path": "llava-v1.5-13b", "architectures": [ "LlavaLlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "freeze_mm_mlp_adapter": false, "freeze_mm_vision_resampler": false, "hidden_act": "silu", "hidden_size": 5120, "image_aspect_ratio": "pad", "initializer_range": 0.02, "intermediate_size": 13824, "max_length": 4096, "max_position_embeddings": 4096, "mm_hidden_size": 1024, "mm_projector_type": "mlp2x_gelu", "mm_resampler_type": null, "mm_use_im_patch_token": false, "mm_use_im_start_end": false, "mm_vision_select_feature": "patch", "mm_vision_select_layer": -2, "mm_vision_tower": "openai/clip-vit-large-patch14-336", "model_type": "llava", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "tune_mm_mlp_adapter": false, "tune_mm_vision_resampler": false, "unfreeze_mm_vision_tower": false, "use_cache": true, "use_mm_proj": true, "vocab_size": 32000 }

主model

LlavaLlamaForCausalLM
图像编码器+映射+大NLP

class LlavaConfig(LlamaConfig): model_type = "llava" # Llama NLP 模型图像编码器映射 class LlavaLlamaModel(LlavaMetaModel, LlamaModel): config_class = LlavaConfig def __init__(self, config: LlamaConfig): super(LlavaLlamaModel, self).__init__(config) # 主模型 class LlavaLlamaForCausalLM(LlamaForCausalLM, LlavaMetaForCausalLM): config_class = LlavaConfig def __init__(self, config): super(LlamaForCausalLM, self).__init__(config) # Llama NLP 模型图像编码器映射 self.model = LlavaLlamaModel(config) self.pretraining_tp = config.pretraining_tp # 1 self.vocab_size = config.vocab_size # 32000 self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) # 5120 3200 # Initialize weights and apply final processing self.post_init() def get_model(self): return self.model def forward( self, input_ids: torch.LongTensor = None, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.LongTensor] = None, past_key_values: Optional[List[torch.FloatTensor]] = None, inputs_embeds: Optional[torch.FloatTensor] = None, labels: Optional[torch.LongTensor] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, images: Optional[torch.FloatTensor] = None, return_dict: Optional[bool] = None, ) -> Union[Tuple, CausalLMOutputWithPast]: if inputs_embeds is None: ( input_ids, position_ids, attention_mask, past_key_values, inputs_embeds, labels ) = self.prepare_inputs_labels_for_multimodal( input_ids, position_ids, attention_mask, past_key_values, labels, images ) return super().forward( input_ids=input_ids, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, inputs_embeds=inputs_embeds, labels=labels, use_cache=use_cache, output_attentions=output_attentions, output_hidden_states=output_hidden_states, return_dict=return_dict ) def prepare_inputs_for_generation(self, input_ids, past_key_values=None, inputs_embeds=None, **kwargs): images = kwargs.pop("images", None) _inputs = super().prepare_inputs_for_generation( input_ids, past_key_values=past_key_values, inputs_embeds=inputs_embeds, **kwargs ) if images is not None: _inputs['images'] = images return _inputs AutoConfig.register("llava", LlavaConfig) AutoModelForCausalLM.register(LlavaConfig, LlavaLlamaForCausalLM)

图像编码器

LlavaMetaModel

class LlavaMetaModel: def __init__(self, config): super(LlavaMetaModel, self).__init__(config) if hasattr(config, "mm_vision_tower"): # clip 图像编码器 self.vision_tower = build_vision_tower(config, delay_load=True) # 图像特征映射到token self.mm_projector = build_vision_projector(config) def get_vision_tower(self): vision_tower = getattr(self, 'vision_tower', None) if type(vision_tower) is list: vision_tower = vision_tower[0] return vision_tower def initialize_vision_modules(self, model_args, fsdp=None): vision_tower = model_args.vision_tower mm_vision_select_layer = model_args.mm_vision_select_layer mm_vision_select_feature = model_args.mm_vision_select_feature pretrain_mm_mlp_adapter = model_args.pretrain_mm_mlp_adapter self.config.mm_vision_tower = vision_tower if self.get_vision_tower() is None: vision_tower = build_vision_tower(model_args) if fsdp is not None and len(fsdp) > 0: self.vision_tower = [vision_tower] else: self.vision_tower = vision_tower else: if fsdp is not None and len(fsdp) > 0: vision_tower = self.vision_tower[0] else: vision_tower = self.vision_tower vision_tower.load_model() self.config.use_mm_proj = True self.config.mm_projector_type = getattr(model_args, 'mm_projector_type', 'linear') self.config.mm_hidden_size = vision_tower.hidden_size self.config.mm_vision_select_layer = mm_vision_select_layer self.config.mm_vision_select_feature = mm_vision_select_feature if getattr(self, 'mm_projector', None) is None: self.mm_projector = build_vision_projector(self.config) else: # In case it is frozen by LoRA for p in self.mm_projector.parameters(): p.requires_grad = True if pretrain_mm_mlp_adapter is not None: mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location='cpu') def get_w(weights, keyword): return {k.split(keyword + '.')[1]: v for k, v in weights.items() if keyword in k} self.mm_projector.load_state_dict(get_w(mm_projector_weights, 'mm_projector'))

clip 图像编码器

build_vision_tower

def build_vision_tower(vision_tower_cfg, **kwargs): # openai/clip-vit-large-patch14-336 vision_tower = getattr(vision_tower_cfg, 'mm_vision_tower', getattr(vision_tower_cfg, 'vision_tower', None)) is_absolute_path_exists = os.path.exists(vision_tower) if is_absolute_path_exists or vision_tower.startswith("openai") or vision_tower.startswith("laion"): return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs) raise ValueError(f'Unknown vision tower: {vision_tower}')

CLIPVisionTower

class CLIPVisionTower(nn.Module): def __init__(self, vision_tower, args, delay_load=False): super().__init__() self.is_loaded = False self.vision_tower_name = vision_tower self.select_layer = args.mm_vision_select_layer self.select_feature = getattr(args, 'mm_vision_select_feature', 'patch') if not delay_load: self.load_model() else: self.cfg_only = CLIPVisionConfig.from_pretrained(self.vision_tower_name) def feature_select(self, image_forward_outs): # 从前向传播的输出中获取隐藏状态（特征表示） image_features = image_forward_outs.hidden_states[self.select_layer] # 根据选择的特征进行处理 if self.select_feature == 'patch': # 选择除了第一个位置（CLS位置）之外的所有位置的特征 image_features = image_features[:, 1:] elif self.select_feature == 'cls_patch': # 选择所有位置的特征 image_features = image_features else: raise ValueError(f'Unexpected select feature: {self.select_feature}') return image_features @torch.no_grad() def forward(self, images): if type(images) is list: image_features = [] for image in images: image_forward_out = self.vision_tower(image.to(device=self.device, dtype=self.dtype).unsqueeze(0), output_hidden_states=True) image_feature = self.feature_select(image_forward_out).to(image.dtype) image_features.append(image_feature) else: # 对单个图像进行前向传播，获取隐藏状态 image_forward_outs = self.vision_tower(images.to(device=self.device, dtype=self.dtype), output_hidden_states=True) # 从隐藏状态中选择特征 image_features = self.feature_select(image_forward_outs).to(images.dtype) return image_features @torch.no_grad() def forward(self, images): if type(images) is list: image_features = [] for image in images: image_forward_out = self.vision_tower(image.to(device=self.device, dtype=self.dtype).unsqueeze(0), output_hidden_states=True) image_feature = self.feature_select(image_forward_out).to(image.dtype) image_features.append(image_feature) else: image_forward_outs = self.vision_tower(images.to(device=self.device, dtype=self.dtype), output_hidden_states=True) image_features = self.feature_select(image_forward_outs).to(images.dtype) return image_features @property def dummy_feature(self): return torch.zeros(1, self.hidden_size, device=self.device, dtype=self.dtype) @property def dtype(self): return self.vision_tower.dtype @property def device(self): return self.vision_tower.device @property def config(self): if self.is_loaded: return self.vision_tower.config else: return self.cfg_only @property def hidden_size(self): return self.config.hidden_size @property def num_patches(self): return (self.config.image_size // self.config.patch_size) ** 2

图像特征映射到token

build_vision_projector

def build_vision_projector(config, delay_load=False, **kwargs): projector_type = getattr(config, 'mm_projector_type', 'linear') if projector_type == 'linear': # mm_hidden_size：1024 # hidden_size 5120 return nn.Linear(config.mm_hidden_size, config.hidden_size) mlp_gelu_match = re.match(r'^mlp(\d+)x_gelu$', projector_type) if mlp_gelu_match: mlp_depth = int(mlp_gelu_match.group(1)) modules = [nn.Linear(config.mm_hidden_size, config.hidden_size)] for _ in range(1, mlp_depth): modules.append(nn.GELU()) modules.append(nn.Linear(config.hidden_size, config.hidden_size)) return nn.Sequential(*modules)

GELU激活函数

研究者表明，收到dropout、ReLU等机制的影响，它们都希望将不重要的激活信息规整为0，我们可以理解为，对于输入的值，我们根据它的情况乘上1或者0，更数学一点的描述是，对于每一个输入x，其服从标准的正太分布N(0,1) ,它会乘上一个伯努利分布 $Bernoulli(\phi(x))$ ,其中， $KaTeX parse error: Undefined control sequence: \leqx at position 12: \phi(x)=P(X\̲l̲e̲q̲x̲)$ 。

GELU ：高斯误差线性单元激活函数，随着x的降低，它被归零的概率会升高。对于ReLU来说，这个界限就是0，输入少于零就会被归为0，这一类激活函数，不仅保留了概率性，同时也保留了对输入的依赖性。

在最近的Transformer模型（谷歌的BERT和OpenAI的GPT-2）中得到了应用,GELU的论文来自2016年，但是最近才引起关注，这种激活函数的形式为：

一般情况下会使用：

可得出来，这就是某些函数（比如双曲正切函数tanh）与近似数值的组合，详细的介绍可以参看下面的链接：

On the GELU Activation Function

我们可以看看GELU到底长什么样子，其函数图像（左）及其导数图像（右）如下图所示：

主模型功能实现类（重点）

class LlavaMetaForCausalLM(ABC): @abstractmethod def get_model(self): pass def get_vision_tower(self): return self.get_model().get_vision_tower() def encode_images(self, images): image_features = self.get_model().get_vision_tower()(images) image_features = self.get_model().mm_projector(image_features) return image_features def prepare_inputs_labels_for_multimodal( self, input_ids, position_ids, attention_mask, past_key_values, labels, images ): vision_tower = self.get_vision_tower() if vision_tower is None or images is None or input_ids.shape[1] == 1: if past_key_values is not None and vision_tower is not None and images is not None and input_ids.shape[1] == 1: target_shape = past_key_values[-1][-1].shape[-2] + 1 attention_mask = torch.cat((attention_mask, torch.ones( (attention_mask.shape[0], target_shape - attention_mask.shape[1]), dtype=attention_mask.dtype, device=attention_mask.device )), dim=1) position_ids = torch.sum(attention_mask, dim=1).unsqueeze(-1) - 1 return input_ids, position_ids, attention_mask, past_key_values, None, labels if type(images) is list or images.ndim == 5: concat_images = torch.cat([image for image in images], dim=0) image_features = self.encode_images(concat_images) split_sizes = [image.shape[0] for image in images] image_features = torch.split(image_features, split_sizes, dim=0) image_features = [x.flatten(0, 1).to(self.device) for x in image_features] else: image_features = self.encode_images(images).to(self.device) # TODO: image start / end is not implemented here to support pretraining. if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False): raise NotImplementedError # Let's just add dummy tensors if they do not exist, # it is a headache to deal with None all the time. # But it is not ideal, and if you have a better idea, # please open an issue / submit a PR, thanks. _labels = labels _position_ids = position_ids _attention_mask = attention_mask if attention_mask is None: attention_mask = torch.ones_like(input_ids, dtype=torch.bool) else: attention_mask = attention_mask.bool() if position_ids is None: position_ids = torch.arange(0, input_ids.shape[1], dtype=torch.long, device=input_ids.device) if labels is None: labels = torch.full_like(input_ids, IGNORE_INDEX) # remove the padding using attention_mask -- TODO: double check input_ids = [cur_input_ids[cur_attention_mask] for cur_input_ids, cur_attention_mask in zip(input_ids, attention_mask)] labels = [cur_labels[cur_attention_mask] for cur_labels, cur_attention_mask in zip(labels, attention_mask)] new_input_embeds = [] new_labels = [] cur_image_idx = 0 for batch_idx, cur_input_ids in enumerate(input_ids): num_images = (cur_input_ids == IMAGE_TOKEN_INDEX).sum() # 没有图像标记 if num_images == 0: cur_image_features = image_features[cur_image_idx] cur_input_embeds_1 = self.get_model().embed_tokens(cur_input_ids) # 连接文本嵌入和空的图像特征（占位符，确保张量的维度一致性） cur_input_embeds = torch.cat([cur_input_embeds_1, cur_image_features[0:0]], dim=0) new_input_embeds.append(cur_input_embeds) new_labels.append(labels[batch_idx]) cur_image_idx += 1 continue # 存在图像标记 ## 分割成不包含图像标记的子序列 image_token_indices = [-1] + torch.where(cur_input_ids == IMAGE_TOKEN_INDEX)[0].tolist() + [cur_input_ids.shape[0]] cur_input_ids_noim = [] cur_labels = labels[batch_idx] cur_labels_noim = [] for i in range(len(image_token_indices) - 1): cur_input_ids_noim.append(cur_input_ids[image_token_indices[i]+1:image_token_indices[i+1]]) cur_labels_noim.append(cur_labels[image_token_indices[i]+1:image_token_indices[i+1]]) ## 计算每个子序列的长度，用于后续的嵌入操作 split_sizes = [x.shape[0] for x in cur_labels_noim] ## 对不包含图像标记的文本标识符进行嵌入操作 cur_input_embeds = self.get_model().embed_tokens(torch.cat(cur_input_ids_noim)) ## 将嵌入结果按照子序列长度切分 cur_input_embeds_no_im = torch.split(cur_input_embeds, split_sizes, dim=0) # 构建新的嵌入和标签列表 cur_new_input_embeds = [] cur_new_labels = [] for i in range(num_images + 1): cur_new_input_embeds.append(cur_input_embeds_no_im[i]) cur_new_labels.append(cur_labels_noim[i]) # 添加相应的图像特征，并用 IGNORE_INDEX 填充对应的标签 if i < num_images: cur_image_features = image_features[cur_image_idx] cur_image_idx += 1 cur_new_input_embeds.append(cur_image_features) cur_new_labels.append(torch.full((cur_image_features.shape[0],), IGNORE_INDEX, device=cur_labels.device, dtype=cur_labels.dtype)) cur_new_input_embeds = torch.cat(cur_new_input_embeds) cur_new_labels = torch.cat(cur_new_labels) new_input_embeds.append(cur_new_input_embeds) new_labels.append(cur_new_labels) # Truncate sequences to max length as image embeddings can make the sequence longer tokenizer_model_max_length = getattr(self.config, 'tokenizer_model_max_length', None) if tokenizer_model_max_length is not None: new_input_embeds = [x[:tokenizer_model_max_length] for x in new_input_embeds] new_labels = [x[:tokenizer_model_max_length] for x in new_labels] # 对输入的嵌入进行填充，使得它们具有相同的长度 # Combine them max_len = max(x.shape[0] for x in new_input_embeds) batch_size = len(new_input_embeds) new_input_embeds_padded = [] new_labels_padded = torch.full((batch_size, max_len), IGNORE_INDEX, dtype=new_labels[0].dtype, device=new_labels[0].device) attention_mask = torch.zeros((batch_size, max_len), dtype=attention_mask.dtype, device=attention_mask.device) position_ids = torch.zeros((batch_size, max_len), dtype=position_ids.dtype, device=position_ids.device) for i, (cur_new_embed, cur_new_labels) in enumerate(zip(new_input_embeds, new_labels)): cur_len = cur_new_embed.shape[0] if getattr(self.config, 'tokenizer_padding_side', 'right') == "left": new_input_embeds_padded.append(torch.cat(( torch.zeros((max_len - cur_len, cur_new_embed.shape[1]), dtype=cur_new_embed.dtype, device=cur_new_embed.device), cur_new_embed ), dim=0)) if cur_len > 0: new_labels_padded[i, -cur_len:] = cur_new_labels attention_mask[i, -cur_len:] = True position_ids[i, -cur_len:] = torch.arange(0, cur_len, dtype=position_ids.dtype, device=position_ids.device) else: new_input_embeds_padded.append(torch.cat(( cur_new_embed, torch.zeros((max_len - cur_len, cur_new_embed.shape[1]), dtype=cur_new_embed.dtype, device=cur_new_embed.device) ), dim=0)) if cur_len > 0: new_labels_padded[i, :cur_len] = cur_new_labels attention_mask[i, :cur_len] = True position_ids[i, :cur_len] = torch.arange(0, cur_len, dtype=position_ids.dtype, device=position_ids.device) new_input_embeds = torch.stack(new_input_embeds_padded, dim=0) if _labels is None: new_labels = None else: new_labels = new_labels_padded if _attention_mask is None: attention_mask = None else: attention_mask = attention_mask.to(dtype=_attention_mask.dtype) if _position_ids is None: position_ids = None return None, position_ids, attention_mask, past_key_values, new_input_embeds, new_labels def initialize_vision_tokenizer(self, model_args, tokenizer): if model_args.mm_use_im_patch_token: tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True) self.resize_token_embeddings(len(tokenizer)) if model_args.mm_use_im_start_end: num_new_tokens = tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True) self.resize_token_embeddings(len(tokenizer)) if num_new_tokens > 0: input_embeddings = self.get_input_embeddings().weight.data output_embeddings = self.get_output_embeddings().weight.data input_embeddings_avg = input_embeddings[:-num_new_tokens].mean( dim=0, keepdim=True) output_embeddings_avg = output_embeddings[:-num_new_tokens].mean( dim=0, keepdim=True) input_embeddings[-num_new_tokens:] = input_embeddings_avg output_embeddings[-num_new_tokens:] = output_embeddings_avg if model_args.tune_mm_mlp_adapter: for p in self.get_input_embeddings().parameters(): p.requires_grad = True for p in self.get_output_embeddings().parameters(): p.requires_grad = False if model_args.pretrain_mm_mlp_adapter: mm_projector_weights = torch.load(model_args.pretrain_mm_mlp_adapter, map_location='cpu') embed_tokens_weight = mm_projector_weights['model.embed_tokens.weight'] assert num_new_tokens == 2 if input_embeddings.shape == embed_tokens_weight.shape: input_embeddings[-num_new_tokens:] = embed_tokens_weight[-num_new_tokens:] elif embed_tokens_weight.shape[0] == num_new_tokens: input_embeddings[-num_new_tokens:] = embed_tokens_weight else: raise ValueError(f"Unexpected embed_tokens_weight shape. Pretrained: {embed_tokens_weight.shape}. Current: {input_embeddings.shape}. Numer of new tokens: {num_new_tokens}.") elif model_args.mm_use_im_patch_token: if model_args.tune_mm_mlp_adapter: for p in self.get_input_embeddings().parameters(): p.requires_grad = False for p in self.get_output_embeddings().parameters(): p.requires_grad = False

Trainging

两阶段训练超参数设置

训练就算了，pass

Finetuning

finetuning 才是主流，且有卡的情况。

Finetune LLaVA on Custom Datasets

json数据格式：A sample JSON for finetuning LLaVA for generating tag-style captions for Stable Diffusion

[ { "id": "997bb945-628d-4724-b370-b84de974a19f", "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg", "conversations": [ { "from": "human", "value": "\nWrite a prompt for Stable Diffusion to generate this image." }, { "from": "gpt", "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. " }, ] }, ... ]

Training script with DeepSpeed ZeRO-3: finetune.sh
任务特定数据有限，使用 LoRA 从 LLaVA 检查点进行微调:finetune_lora.sh
任务的数据量足够，从 LLaVA（lora）检查点进行微调，然后进行全模型微调：finetune_task.sh（finetune_task_lora.sh）

这里使用finetune_task.sh

#!/bin/bash deepspeed llava/train/train_mem.py \ # config --deepspeed ./scripts/zero3.json \ # base bmodel --model_name_or_path liuhaotian/llava-v1.5-13b \ --version v1 \ # data --data_path ./playground/data/llava_v1_5_mix665k.json \ --image_folder ./playground/data \ # image encoding model --vision_tower openai/clip-vit-large-patch14-336 \ --mm_projector_type mlp2x_gelu \ ##clip-vit 取倒数第二层特征输出作为图像编码 --mm_vision_select_layer -2 \ --mm_use_im_start_end False \ --mm_use_im_patch_token False \ ## 将图片扩展为正方形再进行图像预处理 --image_aspect_ratio pad \ # 数据采样 --group_by_modality_length True \ --bf16 True \ --output_dir ./checkpoints/llava-v1.5-13b-task \ --num_train_epochs 1 \ --per_device_train_batch_size 16 \ --per_device_eval_batch_size 4 \ --gradient_accumulation_steps 1 \ --evaluation_strategy "no" \ --save_strategy "steps" \ --save_steps 50000 \ --save_total_limit 1 \ --learning_rate 2e-5 \ --weight_decay 0. \ --warmup_ratio 0.03 \ --lr_scheduler_type "cosine" \ --logging_steps 1 \ --tf32 True \ --model_max_length 2048 \ --gradient_checkpointing True \ --dataloader_num_workers 4 \ --lazy_preprocess True \ --report_to wandb

zero3.json
训练相关

{ "fp16": { "enabled": "auto", "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": "auto" }, "train_micro_batch_size_per_gpu": "auto", "train_batch_size": "auto", "gradient_accumulation_steps": "auto", "zero_optimization": { "stage": 3, "overlap_comm": true, "contiguous_gradients": true, "sub_group_size": 1e9, "reduce_bucket_size": "auto", "stage3_prefetch_bucket_size": "auto", "stage3_param_persistence_threshold": "auto", "stage3_max_live_parameters": 1e9, "stage3_max_reuse_distance": 1e9, "stage3_gather_16bit_weights_on_model_save": true } }

train_mem.py

# 检查硬件 from llava.train.llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn replace_llama_attn_with_flash_attn() from llava.train.train import train if __name__ == "__main__": train()

检查硬件

def replace_llama_attn_with_flash_attn(): cuda_major, cuda_minor = torch.cuda.get_device_capability() if cuda_major < 8: warnings.warn( "Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward." "ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593" ) # Disable the transformation of the attention mask in LlamaModel as the flash attention transformers.models.llama.modeling_llama.LlamaModel._prepare_decoder_attention_mask = ( _prepare_decoder_attention_mask ) # 修改Llama的attention transformers.models.llama.modeling_llama.LlamaAttention.forward = forward

模型fintune入口——train.py

def train(): global local_rank # 解析命令行参数并将其分配给相应的数据类中的对象 parser = transformers.HfArgumentParser( (ModelArguments, DataArguments, TrainingArguments)) model_args, data_args, training_args = parser.parse_args_into_dataclasses() # 当前进程所使用的 GPU 设备的索引 local_rank = training_args.local_rank # 统一数据类型 compute_dtype = (torch.float16 if training_args.fp16 else (torch.bfloat16 if training_args.bf16 else torch.float32)) bnb_model_from_pretrained_args = {} if training_args.bits in [4, 8]: from transformers import BitsAndBytesConfig bnb_model_from_pretrained_args.update(dict( device_map={"": training_args.device}, load_in_4bit=training_args.bits == 4, load_in_8bit=training_args.bits == 8, quantization_config=BitsAndBytesConfig( load_in_4bit=training_args.bits == 4, load_in_8bit=training_args.bits == 8, llm_int8_skip_modules=["mm_projector"], llm_int8_threshold=6.0, llm_int8_has_fp16_weight=False, bnb_4bit_compute_dtype=compute_dtype, bnb_4bit_use_double_quant=training_args.double_quant, bnb_4bit_quant_type=training_args.quant_type # {'fp4', 'nf4'} ) )) # 加载模型 ## openai/clip-vit-large-patch14-336 if model_args.vision_tower is not None: if 'mpt' in model_args.model_name_or_path: config = transformers.AutoConfig.from_pretrained(model_args.model_name_or_path, trust_remote_code=True) config.attn_config['attn_impl'] = training_args.mpt_attn_impl model = LlavaMPTForCausalLM.from_pretrained( model_args.model_name_or_path, config=config, cache_dir=training_args.cache_dir, **bnb_model_from_pretrained_args ) else: # liuhaotian/llava-v1.5-13b model = LlavaLlamaForCausalLM.from_pretrained( model_args.model_name_or_path, cache_dir=training_args.cache_dir, **bnb_model_from_pretrained_args ) else: model = transformers.LlamaForCausalLM.from_pretrained( model_args.model_name_or_path, cache_dir=training_args.cache_dir, **bnb_model_from_pretrained_args ) model.config.use_cache = False ## 冻结权重 if model_args.freeze_backbone: model.model.requires_grad_(False) if training_args.bits in [4, 8]: from peft import prepare_model_for_kbit_training model.config.torch_dtype=(torch.float32 if training_args.fp16 else (torch.bfloat16 if training_args.bf16 else torch.float32)) model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=training_args.gradient_checkpointing) ## 开启编码器映射层权重 if training_args.gradient_checkpointing: if hasattr(model, "enable_input_require_grads"): model.enable_input_require_grads() else: def make_inputs_require_grad(module, input, output): output.requires_grad_(True) model.get_input_embeddings().register_forward_hook(make_inputs_require_grad) if training_args.lora_enable: # 加载lora from peft import LoraConfig, get_peft_model lora_config = LoraConfig( r=training_args.lora_r, lora_alpha=training_args.lora_alpha, target_modules=find_all_linear_names(model), lora_dropout=training_args.lora_dropout, bias=training_args.lora_bias, task_type="CAUSAL_LM", ) if training_args.bits == 16: if training_args.bf16: model.to(torch.bfloat16) if training_args.fp16: model.to(torch.float16) rank0_print("Adding LoRA adapters...") model = get_peft_model(model, lora_config) # 加载分词器 if 'mpt' in model_args.model_name_or_path: tokenizer = transformers.AutoTokenizer.from_pretrained( model_args.model_name_or_path, cache_dir=training_args.cache_dir, model_max_length=training_args.model_max_length, padding_side="right" ) else: tokenizer = transformers.AutoTokenizer.from_pretrained( model_args.model_name_or_path, cache_dir=training_args.cache_dir, model_max_length=training_args.model_max_length, padding_side="right", use_fast=False, ) # v1 if model_args.version == "v0": if tokenizer.pad_token is None: smart_tokenizer_and_embedding_resize( special_tokens_dict=dict(pad_token="[PAD]"), tokenizer=tokenizer, model=model, ) elif model_args.version == "v0.5": tokenizer.pad_token = tokenizer.unk_token else: tokenizer.pad_token = tokenizer.unk_token if model_args.version in conversation_lib.conv_templates: conversation_lib.default_conversation = conversation_lib.conv_templates[model_args.version] else: conversation_lib.default_conversation = conversation_lib.conv_templates["vicuna_v1"] # 初始化图像编码器映射层 if model_args.vision_tower is not None: model.get_model().initialize_vision_modules( model_args=model_args, fsdp=training_args.fsdp ) vision_tower = model.get_vision_tower() vision_tower.to(dtype=torch.bfloat16 if training_args.bf16 else torch.float16, device=training_args.device) data_args.image_processor = vision_tower.image_processor data_args.is_multimodal = True model.config.image_aspect_ratio = data_args.image_aspect_ratio model.config.tokenizer_padding_side = tokenizer.padding_side model.config.tokenizer_model_max_length = tokenizer.model_max_length # 开启图像编码映射权重 model.config.tune_mm_mlp_adapter = training_args.tune_mm_mlp_adapter = model_args.tune_mm_mlp_adapter if model_args.tune_mm_mlp_adapter: model.requires_grad_(False) for p in model.get_model().mm_projector.parameters(): p.requires_grad = True # 关闭图像编码映射权重 model.config.freeze_mm_mlp_adapter = training_args.freeze_mm_mlp_adapter if training_args.freeze_mm_mlp_adapter: for p in model.get_model().mm_projector.parameters(): p.requires_grad = False if training_args.bits in [4, 8]: model.get_model().mm_projector.to(dtype=compute_dtype, device=training_args.device) model.config.mm_use_im_start_end = data_args.mm_use_im_start_end = model_args.mm_use_im_start_end model.config.mm_projector_lr = training_args.mm_projector_lr training_args.use_im_start_end = model_args.mm_use_im_start_end model.config.mm_use_im_patch_token = model_args.mm_use_im_patch_token model.initialize_vision_tokenizer(model_args, tokenizer=tokenizer) if training_args.bits in [4, 8]: from peft.tuners.lora import LoraLayer for name, module in model.named_modules(): if isinstance(module, LoraLayer): if training_args.bf16: module = module.to(torch.bfloat16) if 'norm' in name: module = module.to(torch.float32) if 'lm_head' in name or 'embed_tokens' in name: if hasattr(module, 'weight'): if training_args.bf16 and module.weight.dtype == torch.float32: module = module.to(torch.bfloat16) # 制作输入数据 data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args) trainer = LLaVATrainer(model=model, tokenizer=tokenizer, args=training_args, **data_module) if list(pathlib.Path(training_args.output_dir).glob("checkpoint-*")): trainer.train(resume_from_checkpoint=True) else: trainer.train() trainer.save_state() model.config.use_cache = True if training_args.lora_enable: state_dict = get_peft_state_maybe_zero_3( model.named_parameters(), training_args.lora_bias ) non_lora_state_dict = get_peft_state_non_lora_maybe_zero_3( model.named_parameters() ) if training_args.local_rank == 0 or training_args.local_rank == -1: model.config.save_pretrained(training_args.output_dir) model.save_pretrained(training_args.output_dir, state_dict=state_dict) torch.save(non_lora_state_dict, os.path.join(training_args.output_dir, 'non_lora_trainables.bin')) else: safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)

三个dataclass

制作数据集

make_supervised_data_module

def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer, data_args) -> Dict: """Make dataset for supervised fine-tuning.""" train_dataset = LazySupervisedDataset(tokenizer=tokenizer, data_path=data_args.data_path, # ./playground/data/llava_v1_5_mix665k.json data_args=data_args) """Make collator for supervised fine-tuning.""" data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer) return dict(train_dataset=train_dataset, eval_dataset=None, data_collator=data_collator)

train_dataset

class LazySupervisedDataset(Dataset): """Dataset for supervised fine-tuning.""" def __init__(self, data_path: str, tokenizer: transformers.PreTrainedTokenizer, data_args: DataArguments): super(LazySupervisedDataset, self).__init__() # 加载json文件 list_data_dict = json.load(open(data_path, "r")) rank0_print("Formatting inputs...Skip in lazy mode") # 分词器 self.tokenizer = tokenizer self.list_data_dict = list_data_dict self.data_args = data_args def __len__(self): return len(self.list_data_dict) @property def lengths(self): # 计算token长度 length_list = [] for sample in self.list_data_dict: img_tokens = 128 if 'image' in sample else 0 length_list.append(sum(len(conv['value'].split()) for conv in sample['conversations']) + img_tokens) return length_list @property def modality_lengths(self): # 计算modality token长度 length_list = [] for sample in self.list_data_dict: cur_len = sum(len(conv['value'].split()) for conv in sample['conversations']) cur_len = cur_len if 'image' in sample else -cur_len length_list.append(cur_len) return length_list def __getitem__(self, i) -> Dict[str, torch.Tensor]: sources = self.list_data_dict[i] if isinstance(i, int): sources = [sources] assert len(sources) == 1, "Don't know why it is wrapped to a list" # FIXME if 'image' in sources[0]: image_file = self.list_data_dict[i]['image'] image_folder = self.data_args.image_folder processor = self.data_args.image_processor image = Image.open(os.path.join(image_folder, image_file)).convert('RGB') # 图片预处理 if self.data_args.image_aspect_ratio == 'pad': # 图片转换为正方形 def expand2square(pil_img, background_color): width, height = pil_img.size if width == height: return pil_img elif width > height: result = Image.new(pil_img.mode, (width, width), background_color) result.paste(pil_img, (0, (width - height) // 2)) return result else: result = Image.new(pil_img.mode, (height, height), background_color) result.paste(pil_img, ((height - width) // 2, 0)) return result image = expand2square(image, tuple(int(x * 255) for x in processor.image_mean)) image = processor.preprocess(image, return_tensors='pt')['pixel_values'][0] else: image = processor.preprocess(image, return_tensors='pt')['pixel_values'][0] # 对conversations进行处理 sources = preprocess_multimodal( copy.deepcopy([e["conversations"] for e in sources]), self.data_args) else: sources = copy.deepcopy([e["conversations"] for e in sources]) # QA NLP 处理 data_dict = preprocess( sources, self.tokenizer, has_image=('image' in self.list_data_dict[i])) if isinstance(i, int): data_dict = dict(input_ids=data_dict["input_ids"][0], labels=data_dict["labels"][0]) # image exist in the data if 'image' in self.list_data_dict[i]: data_dict['image'] = image elif self.data_args.is_multimodal: # image does not exist in the data, but the model is multimodal crop_size = self.data_args.image_processor.crop_size data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width']) return data_dict def preprocess_multimodal( sources: Sequence[str], data_args: DataArguments ) -> Dict: is_multimodal = data_args.is_multimodal if not is_multimodal: return sources for source in sources: for sentence in source: if DEFAULT_IMAGE_TOKEN in sentence['value']: # # 去掉和移除字符串的开头和结尾的空白字符（例如空格、制表符、换行符等） sentence['value'] = sentence['value'].replace(DEFAULT_IMAGE_TOKEN, '').strip() # 在句子开头加上 \n sentence['value'] = DEFAULT_IMAGE_TOKEN + '\n' + sentence['value'] # 移除字符串的开头和结尾的空白字符（例如空格、制表符、换行符等） sentence['value'] = sentence['value'].strip() if "mmtag" in conversation_lib.default_conversation.version: sentence['value'] = sentence['value'].replace(DEFAULT_IMAGE_TOKEN, '' + DEFAULT_IMAGE_TOKEN + '') replace_token = DEFAULT_IMAGE_TOKEN if data_args.mm_use_im_start_end: replace_token = DEFAULT_IM_START_TOKEN + replace_token + DEFAULT_IM_END_TOKEN sentence["value"] = sentence["value"].replace(DEFAULT_IMAGE_TOKEN, replace_token) return sources def preprocess( sources: Sequence[str], tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False ) -> Dict: """ Given a list of sources, each is a conversation list. This transform: 1. Add signal '### ' at the beginning each sentence, with end signal '\n'; 2. Concatenate conversations together; 3. Tokenize the concatenated conversation; 4. Make a deepcopy as the target. Mask human words with IGNORE_INDEX. """ if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.PLAIN: return preprocess_plain(sources, tokenizer) if conversation_lib.default_conversation.sep_style == conversation_lib.SeparatorStyle.LLAMA_2: return preprocess_llama_2(sources, tokenizer, has_image=has_image) if conversation_lib.default_conversation.version.startswith("v1"): return preprocess_v1(sources, tokenizer, has_image=has_image) # 使用这个 if conversation_lib.default_conversation.version == "mpt": return preprocess_mpt(sources, tokenizer) ...# 后面的不重要 def preprocess_v1( sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False ) -> Dict: # 获取conversation模板类 conv = conversation_lib.default_conversation.copy() roles = {"human": conv.roles[0], "gpt": conv.roles[1]} # Apply prompt templates conversations = [] for i, source in enumerate(sources): if roles[source[0]["from"]] != conv.roles[0]: # Skip the first one if it is not from human source = source[1:] conv.messages = [] for j, sentence in enumerate(source): role = roles[sentence["from"]] assert role == conv.roles[j % 2], f"{i}" conv.append_message(role, sentence["value"]) conversations.append(conv.get_prompt()) # Tokenize conversations if has_image: input_ids = torch.stack( [tokenizer_image_token(prompt, tokenizer, return_tensors='pt') for prompt in conversations], dim=0) else: input_ids = tokenizer( conversations, return_tensors="pt", padding="longest", max_length=tokenizer.model_max_length, truncation=True, ).input_ids targets = input_ids.clone() assert conv.sep_style == conversation_lib.SeparatorStyle.TWO # Mask targets sep = conv.sep + conv.roles[1] + ": " for conversation, target in zip(conversations, targets): total_len = int(target.ne(tokenizer.pad_token_id).sum()) rounds = conversation.split(conv.sep2) cur_len = 1 target[:cur_len] = IGNORE_INDEX for i, rou in enumerate(rounds): if rou == "": break parts = rou.split(sep) if len(parts) != 2: break parts[0] += sep if has_image: round_len = len(tokenizer_image_token(rou, tokenizer)) instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 2 else: round_len = len(tokenizer(rou).input_ids) instruction_len = len(tokenizer(parts[0]).input_ids) - 2 target[cur_len: cur_len + instruction_len] = IGNORE_INDEX cur_len += round_len target[cur_len:] = IGNORE_INDEX if cur_len < tokenizer.model_max_length: if cur_len != total_len: target[:] = IGNORE_INDEX print( f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}." f" (ignored)" ) return dict( input_ids=input_ids, labels=targets, ) default_conversation = conv_vicuna_v1 conv_vicuna_v1 = Conversation( system="A chat between a curious user and an artificial intelligence assistant. " "The assistant gives helpful, detailed, and polite answers to the user's questions.", roles=("USER", "ASSISTANT"), version="v1", messages=(), offset=0, sep_style=SeparatorStyle.TWO, sep=" ", sep2="", )

data_collator

# 批量化、填充和限制输入长度 @dataclass class DataCollatorForSupervisedDataset(object): """Collate examples for supervised fine-tuning.""" tokenizer: transformers.PreTrainedTokenizer def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]: input_ids, labels = tuple([instance[key] for instance in instances] for key in ("input_ids", "labels")) input_ids = torch.nn.utils.rnn.pad_sequence( input_ids, batch_first=True, padding_value=self.tokenizer.pad_token_id) labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=IGNORE_INDEX) input_ids = input_ids[:, :self.tokenizer.model_max_length] labels = labels[:, :self.tokenizer.model_max_length] batch = dict( input_ids=input_ids, labels=labels, attention_mask=input_ids.ne(self.tokenizer.pad_token_id), ) if 'image' in instances[0]: images = [instance['image'] for instance in instances] if all(x is not None and x.shape == images[0].shape for x in images): batch['images'] = torch.stack(images) else: batch['images'] = images return batch

为什么wal会提升数据库性能浩澜大大数据库
由于对于一个数据库内会存在很多张表，那么当数据库更新表数据时（1）直接写入磁盘实际写入的位置，会根据表的不同对应到不同的磁盘位置，在写入数据的时候，就会不停的寻找磁盘地址，找到地址后再去写入，对于机械硬盘来说，无规律的寻址是非常耗时的，对应SSD来说虽然性能提升很多，但是也会消耗时间；（2）先写入日志，在写入磁盘（WAL）WAL的过程，由于总是按照在文件末尾追加，只要找到文件写入位置，写入修改后，
【Python】一文详细介绍 py格式文件高斯小哥 Python基础【高质量合集】python 新手入门学习
【Python】一文详细介绍py格式文件个人主页：高斯小哥高质量专栏：Matplotlib之旅：零基础精通数据可视化、Python基础【高质量合集】、PyTorch零基础入门教程希望得到您的订阅和支持~创作高质量博文(平均质量分92+)，分享更多关于深度学习、PyTorch、Python领域的优质内容！（希望得到您的关注~）文章目录一、py格式文件简介二、如何创建和编辑py格式文件三、如何运行py
Android和IOS应用开发-Flutter应用让屏幕在 app 运行期间保持常亮的方法江上清风山间明月 Flutter android ios flutter KeepAlive 屏幕常亮 wakelock 熄屏
文章目录Flutter应用让屏幕在app运行期间保持常亮的方法方法一：使用系统插件方法二：使用Widgets注意事项Flutter应用让屏幕在app运行期间保持常亮的方法在Flutter开发中，可以使用以下两种方法让屏幕在app运行期间保持常亮：方法一：使用系统插件Flutter社区中已经有很多相关插件可供使用，比如wakelock:https://pub.dev/packages/wakeloc
数据结构奇妙旅程之深入解析快速排序山间漫步人生路数据结构排序算法算法
快速排序（QuickSort）是一种高效的排序算法，它使用了分治法的策略来将一个数组排序。其基本思想是选择一个基准元素，通过一趟排序将待排序的数据分割成独立的两部分，其中一部分的所有数据都比基准元素小，另一部分的所有数据都比基准元素大，然后再按此方法对这两部分数据分别进行快速排序，整个排序过程可以递归进行，以此达到整个数据变成有序序列。工作原理选择基准：从待排序的序列中选一个元素作为基准（pivo
python抓包与解包_Python—网络抓包与解包（pcap、dpkt） weixin_39691055 python抓包与解包
pcap安装[root@localhost~]#pipinstallpypcap抓包与解包#-*-coding:utf-8-*-importpcap,dpktimportre,threading,requests__black_ip=['103.224.249.123','203.66.1.212']#抓包：param1eth_name网卡名，如：eth0,eth3。param2p_type日志捕
打印出1-100的奇数。（C语言）王多鱼001 C语言 c语言算法数据结构
代码：#includeintmain(){for(inti=1;i<101;i++){if(i%2==1){printf("%d,",i);}}return0;}
生活中的鸡毛蒜皮-----心情琐碎记录安家妈妈
陪孩子打预防针回来时的小发现今天天气特别的好，阳光灿烂，太阳晒得人暖融融的。可惜这么好的天气就不去郊游，而是去打预防针疫苗。孩子已经六岁了，这是最后的一次接种疫苗打针。昨天晚上接到电话，还有一点担心孩子会害怕，会不会紧张，来医院会不会怕到不敢进去。试想哪一个孩子听到打针会不紧张呢？结果过程居然顺利的不可思议，没有紧张也没有害怕，也没有反复的问。来到社区医院的大楼，还觉得非常有趣好玩的样子。为了让她
下载Android源码赛非斯
repoinit-uhttps://mirrors.tuna.tsinghua.edu.cn/git/AOSP/platform/manifest-bandroid-10.0.0_r411.首先下载repo：a）终端运行gitclonegit://codeaurora.org/tools/repo.gitb）mkdir~/binc）拷贝repo到~/bin下面，修改repo权限，chmoda+x~
读思001 ‖ 变负能为正能，变压力为动力你不懂夜的黑
今天起开始写一个言说文集连载，重点为读写思考收获和感想，也收录生活和工作中开悟到的点滴，仍然是一个碎片式的思考积累。希望这样的思考能启迪我的生活智慧，开悟我的思想境界，也算是一个修心的过程吧。这个连载不定期更新，重在积累生活和工作中的随思碎思，或许也是一厢情愿的一个梦。也或许这个梦是我坚持说下去的一个重要理由。读思001变负能为正能，变压力为动力1从来没有一种哲学能解决一切问题，也从来没有一种药能
华为OD机试 - 单向链表中间节点（Java & JS & Python & C & C++）华为OD题库华为od 链表 java
须知哈喽，本题库完全免费，收费是为了防止被爬，大家订阅专栏后可以私信联系退款。感谢支持文章目录须知题目描述输出描述解析代码题目描述给定一个单链表L，请编写程序输出L中间结点保存的数据。如果有两个中间结点，则输出第二个中间结点保存的数据。例如：给定L为1→7→5，则输出应该为7；给定L为1→2→3→4，则输出应该为3；输入描述每个输入包含1个测试用例。每个测试用例：第一行给出链表首结点的地址、结点总
llama.cpp 编译安装@Ubuntu skywalk8163 项目实践人工智能 llama ubuntu linux 人工智能
在Kylin和Ubuntu编译llama.cpp，具体参考：llama模型c语言推理@FreeBSD-CSDN博客现在代码并编译：gitclonehttps://github.com/ggerganov/llama.cppcdllama.cppmkdirbuildcdbuildcmake..cmake--build.--configRelease#可选安装makeinstall#或可选添加路径ex
python 推导式(派生、衍生) sanduo112 人工智能 python windows 开发语言
python推导式一、推导式(派生、衍生)1.Python推导式是一种独特的数据处理方式，可以从一个数据序列构建另一个新的数据序列的结构体。2.列表(list)推导式3.字典(dict)推导式4.集合(set)推导式5.元组(tuple)推导式二、代码概述一、推导式(派生、衍生)1.Python推导式是一种独特的数据处理方式，可以从一个数据序列构建另一个新的数据序列的结构体。Python支持各种数
uni-app实现步骤条夏夏的码农 uni-app
实现如图样式html部分代码如下投资期限与收益0?'active':'default'">募集开始1?'active':'default'">募集结束2?'active':'default'">产品成立3?'active':'default'">产品到期0?'active-step1':'step1'">1?'active-st
数据分析：低代码平台助力大数据时代的飞跃发展快乐非自愿数据分析低代码大数据
随着信息技术的突飞猛进，我们身处于一个数据量空前增长的时代——大数据时代。在这个时代背景下，数据分析已经成为企业决策、政策制定、科学研究等众多领域不可或缺的重要工具。然而，面对海量的数据和日益复杂多变的分析需求，传统的数据分析方法往往捉襟见肘，难以应对。幸运的是，低代码平台的兴起为大数据分析注入了新的活力，成为推动大数据时代发展的重要力量。低代码平台，顾名思义，是一种通过少量甚至无需编写代码，就能
数据挖掘|数据预处理|基于Python的数据标准化方法皖山文武数据挖掘数据建模与分析 python 数据挖掘开发语言
基于Python的数据标准化方法1.z-score方法2.极差标准化方法3.最大绝对值标准化方法在数据分析之前，通常需要先将数据标准化（Standardization），利用标准化后的数据进行数据分析，以避免属性之间不同度量和取值范围差异造成数据对分析结果的影响。1.z-score方法Z-score方法是基于原始数据的均值和标准差来进行数据标准化的，处理后的数据均值为0，方差为1，符合标准正态分布
为千佩蓉：为家庭放弃事业的男人没出息吗？北京朵多教育
为家庭放弃事业的男人没出息吗我们在搭档的过程中会遇到一些问题，因此要做好心理准备，这样才不会陷入抱怨或双输的局面。关键不是谁比谁能力强，而是夫妻之间如何取长补短。家庭要赢需要每个人的付出和牺牲，大家一起分担和负责，这样才能享受相爱的自由、安全和归属感。我们需要做的牺牲包括自己的时间、自己的一些爱好，甚至事业发展。为千认为做好父亲是非常重要的事，所以他不仅要出席在孩子们的生活中，还要积极地参与和带领
CSV指南：Python程序获取大型CSV文件行数孤独打铁匠Julian 笔记经验分享 python
本指南提供了几种使用Python来获取大型CSV文件行数的方法，并解释了每种方法的适用场景。方法1:使用csv.reader处理复杂CSV文件当你的CSV文件中包含多行字段（即某些字段的值中包含换行符）时，使用csv.reader是一个可靠的选择，因为它能够正确处理这些复杂情况。这个方法适用于大多数大小的CSV文件，但是对于非常大的文件，读取整个文件可能会占用较多的时间和内存。对于极大的文件，考虑
常用API---初始化，话题和服务对象创建，回旋/头函数，时间（如：获取当前时刻，持续时间执行频率，休眠，定时器等），运行频率，定时器，节点生命周期，日志相关API 李卓璐机器人人工智能 c++
初始化ROS节点ros::init(argc,argv,“name”,options);作用：ros初始化函数参数： 1.argc—封装实参的个数（n+1个->有一个是自身）； 2.argv—封装参数的数组； 3.name—为节点命名（唯一性）； 4.options—节点启动项，解决节点重名问题。返回值：void使用： 1.argc和argv的使用：如果按照ROS中的特定格式传入实参，那么RO
helm 部署 Kube-Prometheus + Grafana + 钉钉告警部署 Kube-Prometheus zxj19880502 grafana prometheus
背景角色IPK8S版本容器运行时k8s-master-1172.16.16.108v1.24.1containerd://1.6.8k8s-node-1172.16.16.109v1.24.1containerd://1.6.8k8s-node-2172.16.16.110v1.24.1containerd://1.6.8安装kube-prometheusmkdir-p/data/yaml/kub
yarn的安装和使用全网最详细教程 zxj19880502 yarn npm
一、yarn的简介：Yarn是facebook发布的一款取代npm的包管理工具。二、yarn的特点：速度超快。Yarn缓存了每个下载过的包，所以再次使用时无需重复下载。同时利用并行下载以最大化资源利用率，因此安装速度更快。超级安全。在执行代码之前，Yarn会通过算法校验每个安装包的完整性。超级可靠。使用详细、简洁的锁文件格式和明确的安装算法，Yarn能够保证在不同系统上无差异的工作。三、yarn的
直返APP所属的公司是何时成立的?它的发展历程和业务范围好项目高省
直返APP为我们带来了返利购物的便利，那么这款APP所属的公司是如何成立的呢？它的背后又有怎样的发展历程和业务范围呢？让我们一起探寻。【高省】APP（高佣金领导者）是一个自用省钱佣金高，分享推广赚钱多的平台，百度有几百万篇报道，运行三年，稳定可靠。高省APP，是2021年推出的平台，0投资，0风险、高省APP佣金更高，模式更好，终端用户不流失。高省是公认的返利最高的软件。古楼导师高省邀请码5558
中原焦点团队38期王芳芳坚持分享第236天，20230630总约练134次，来访113次，咨8次，观察员13次芳芳王
学习焦点的初心是想拯救孩子，孩子由于沉迷游戏，成绩下滑，在学习的过程中发现是自己的教育方式出了状况。经过半年的学习，一些焦点的基本技巧，如接纳、欣赏、倾听、同理心、尊重等都有了一定的了解。但在实际应用时仍然存在很多问题，感觉自己仍然没有放下对孩子成绩的期望，仍然把握不住对孩子管理的度。我该如何去陪伴好孩子？多用心去听课，并加强反思，多约练。去思考如何让自己快乐起来？
图论记录之最短路迪杰斯特拉 Just right 算法图论 java 开发语言
简述思想这个思想能用一句话来概括，精简到的极致:每次找到一个最短距离的点并更新起点到各个点的最短距离如果要可视化的话，B站搜索Dijksra算法，有视频讲解伪代码写到这里，其实是想整一个动画的，这样效果更好点，但由于种种原因所以就拖一下intdijkstr(){dist[1]=0;其余的点的距离全部初始化为真无穷，不要写成int的最大值迭代n次将不在s中的，且距离最近的点给tsj即先到t，再加上t
【转载】SSD测试第一神器——FIO running_sheep
转自：[http://www.ssdfans.com]对于SSD性能测试来说，最好的工具莫过于FIO了。FIO是Jens开发的一个开源测试工具，功能非常强大，本文就只介绍其中一些基本功能。线程，队列深度，Offset，同步异步，DirectIO，BIO使用FIO之前，首先要有一些SSD性能测试的基础知识。线程指的是同时有多少个读或写任务在并行执行，一般来说，CPU里面的一个核心同一时间只能运行一个
大前端-postcss安装使用指南黑夜照亮前行的路 postcss
PostCSS是一款强大的CSS处理工具，可以用来自动添加浏览器前缀、代码合并、代码压缩等，提升代码的可读性，并支持使用最新的CSS语法。以下是一份简化的PostCSS安装使用指南：一、安装PostCSS在你的项目目录中，通过npm（NodePackageManager）来安装PostCSS。打开命令行窗口，输入以下命令：bash复制代码npminstallpostcss--save-dev这将把
谷歌浏览器驱动Chromedriver（114-120版本）文件以及驱动下载教程 pigerr杨 Python python chrome drivers
ChromeDriver官方网站GitHub||GoogleChromeLabs/chrome-for-testingChromeDriver113-125_JSONChromeforTestingavailability123-125zip白月黑羽Python基础|进阶|Qt图形界面|Django|自动化测试|性能测试|JS语言|JS前端|原理与安装
通俗易懂：什么是Java虚拟机（JVM）？它的主要作用是什么？大龄下岗程序员 mysql java mysql spring
Java虚拟机（JavaVirtualMachine,JVM）是一种软件实现的抽象计算机，它负责执行Java字节码（Bytecode）。Java程序并不是直接在物理计算机上运行，而是先由Java编译器将源代码编译成与平台无关的字节码，然后由JVM负责读取字节码并在实际硬件架构上运行。JVM的主要作用包括以下几个方面：1.跨平台性-JVM是Java语言“一次编写，到处运行”（WriteOnce,Ru
大创项目推荐深度学习 opencv python 公式识别(图像识别机器视觉) laafeer python
文章目录0前言1课题说明2效果展示3具体实现4关键代码实现5算法综合效果6最后0前言优质竞赛项目系列，今天要分享的是基于深度学习的数学公式识别算法实现该项目较为新颖，适合作为竞赛课题方向，学长非常推荐！学长这里给一个题目综合评分(每项满分5分)难度系数：3分工作量：4分创新点：4分更多资料,项目分享：https://gitee.com/dancheng-senior/postgraduate1课题
标定系列——基于OpenCV实现普通相机、鱼眼相机不同标定板下的标定（五） JANGHIGH 标定 opencv
标定系列——基于OpenCV实现相机标定（五）说明代码解析VID5.xmlin_VID5.xmlcamera_calibration.cpp说明该程序可以实现多种标定板的相机标定工作代码解析VID5.xmlimages/CameraCalibration/VID5/xx1.jpgimages/CameraCalibration/VID5/xx2.jpgimages/CameraCalibratio
认识一个金苹果（一）紫藤11
群里来了一个有趣的人，很喜欢她的有才和有趣，就是嘴碎点，很好奇，她是男生？还是女生？是90后？还是80后，太多疑问，但我又不想问她，只在群里跟她斗嘴，享受这个过程。跟她斗嘴的过程，跟我跟我儿子一样，总是抬杠，但又喜欢。不管她是男生还是女生，她的文采真是很好，看她的文章，直接有读下去的欲望，有想看续集的想法，她很努力向上，也很有荣誉感，但又不想让别人看穿，总是一个吊儿郎当的样子。明明跟同桌，那么友爱
如何用ruby来写hadoop的mapreduce并生成jar包 wudixiaotie mapreduce
ruby来写hadoop的mapreduce，我用的方法是rubydoop。怎么配置环境呢： 1.安装rvm：不说了网上有 2.安装ruby：由于我以前是做ruby的，所以习惯性的先安装了ruby，起码调试起来比jruby快多了。 3.安装jruby： rvm install jruby然后等待安
java编程思想 -- 访问控制权限百合不是茶 java 访问控制权限单例模式
访问权限是java中一个比较中要的知识点,它规定者什么方法可以访问,什么不可以访问一:包访问权限; 自定义包: package com.wj.control; //包 public class Demo { //定义一个无参的方法 public void DemoPackage(){ System.out.println("调用
[生物与医学]请审慎食用小龙虾 comsci 生物
现在的餐馆里面出售的小龙虾,有一些是在野外捕捉的,这些小龙虾身体里面可能带有某些病毒和细菌,人食用以后可能会导致一些疾病,严重的甚至会死亡..... 所以,参加聚餐的时候,最好不要点小龙虾...就吃养殖的猪肉,牛肉,羊肉和鱼,等动物蛋白质
org.apache.jasper.JasperException: Unable to compile class for JSP: 商人shang maven 2.2 jdk1.8
环境： jdk1.8 maven tomcat7-maven-plugin 2.0 原因： tomcat7-maven-plugin 2.0 不知吃 jdk 1.8，换成 tomcat7-maven-plugin 2.2就行，即 <plugin>
你的垃圾你处理掉了吗?GC oloz GC
前序:本人菜鸟，此文研究学习来自网络，各位牛牛多指教　 1.垃圾收集算法的核心思想　　Java语言建立了垃圾收集机制，用以跟踪正在使用的对象和发现并回收不再使用(引用)的对象。该机制可以有效防范动态内存分配中可能发生的两个危险：因内存垃圾过多而引发的内存耗尽，以及不恰当的内存释放所造成的内存非法引用。　　垃圾收集算法的核心思想是：对虚拟机可用内存空间，即堆空间中的对象进行识别
shiro 和 SESSSION 杨白白 shiro
shiro 在web项目里默认使用的是web容器提供的session，也就是说shiro使用的session是web容器产生的，并不是自己产生的，在用于非web环境时可用其他来源代替。在web工程启动的时候它就和容器绑定在了一起，这是通过web.xml里面的shiroFilter实现的。通过session.getSession()方法会在浏览器cokkice产生JESSIONID，当关闭浏览器，此
移动互联网终端淘宝客如何实现盈利小桔子移動客戶端淘客淘寶App
2012年淘宝联盟平台为站长和淘宝客带来的分成收入突破30亿元，同比增长100%。而来自移动端的分成达1亿元，其中美丽说、蘑菇街、果库、口袋购物等App运营商分成近5000万元。可以看出，虽然目前阶段PC端对于淘客而言仍旧是盈利的大头，但移动端已经呈现出爆发之势。而且这个势头将随着智能终端(手机，平板)的加速普及而更加迅猛
wordpress小工具制作 aichenglong wordpress 小工具
wordpress 使用侧边栏的小工具，很方便调整页面结构小工具的制作过程 1 在自己的主题文件中新建一个文件夹(如widget)，在文件夹中创建一个php(AWP_posts-category.php) 小工具是一个类,想侧边栏一样，还得使用代码注册，他才可以再后台使用，基本的代码一层不变 <?php class AWP_Post_Category extends WP_Wi
JS微信分享 AILIKES js
// 所有功能必须包含在 WeixinApi.ready 中进行 WeixinApi.ready(function(Api) { // 微信分享的数据 var wxData = { &nb
封装探讨百合不是茶 JAVA面向对象封装
//封装属性方法将某些东西包装在一起，通过创建对象或使用静态的方法来调用，称为封装；封装其实就是有选择性地公开或隐藏某些信息，它解决了数据的安全性问题，增加代码的可读性和可维护性在 Aname类中申明三个属性，将其封装在一个类中：通过对象来调用例如 1： //属性将其设为私有姓名 name 可以公开
jquery radio/checkbox change事件不能触发的问题 bijian1013 JavaScript jquery
我想让radio来控制当前我选择的是机动车还是特种车，如下所示： <html> <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"><
AngularJS中安全性措施 bijian1013 JavaScript AngularJS 安全性 XSRF JSON漏洞
在使用web应用中，安全性是应该首要考虑的一个问题。AngularJS提供了一些辅助机制，用来防护来自两个常见攻击方向的网络攻击。一.JSON漏洞当使用一个GET请求获取JSON数组信息的时候（尤其是当这一信息非常敏感，
[Maven学习笔记九]Maven发布web项目 bit1129 maven
基于Maven的web项目的标准项目结构 user-project user-core user-service user-web src
【Hive七】Hive用户自定义聚合函数(UDAF) bit1129 hive
用户自定义聚合函数，用户提供的多个入参通过聚合计算(求和、求最大值、求最小值)得到一个聚合计算结果的函数。问题：UDF也可以提供输入多个参数然后输出一个结果的运算，比如加法运算add(3，5)，add这个UDF需要实现UDF的evaluate方法,那么UDF和UDAF的实质分别究竟是什么？ Double evaluate(Double a, Double b)
通过 nginx-lua 给 Nginx 增加 OAuth 支持 ronin47
前言：我们使用Nginx的Lua中间件建立了OAuth2认证和授权层。如果你也有此打算，阅读下面的文档，实现自动化并获得收益。SeatGeek 在过去几年中取得了发展，我们已经积累了不少针对各种任务的不同管理接口。我们通常为新的展示需求创建新模块，比如我们自己的博客、图表等。我们还定期开发内部工具来处理诸如部署、可视化操作及事件处理等事务。在处理这些事务中，我们使用了几个不同的接口来认证： &n
利用tomcat-redis-session-manager做session同步时自定义类对象属性保存不上的解决方法 bsr1983 session
在利用tomcat-redis-session-manager做session同步时，遇到了在session保存一个自定义对象时，修改该对象中的某个属性，session未进行序列化，属性没有被存储到redis中。在 tomcat-redis-session-manager的github上有如下说明： Session Change Tracking As noted in the &qu
《代码大全》表驱动法-Table Driven Approach-1 bylijinnan java 算法
关于Table Driven Approach的一篇非常好的文章： http://www.codeproject.com/Articles/42732/Table-driven-Approach package com.ljn.base; import java.util.Random; public class TableDriven { public
Sybase封锁原理 chicony Sybase
昨天在操作Sybase IQ12.7时意外操作造成了数据库表锁定，不能删除被锁定表数据也不能往其中写入数据。由于着急往该表抽入数据，因此立马着手解决该表的解锁问题。无奈此前没有接触过Sybase IQ12.7这套数据库产品，加之当时已属于下班时间无法求助于支持人员支持，因此只有借助搜索引擎强大的
java异常处理机制 CrazyMizzz java
java异常关键字有以下几个，分别为 try catch final throw throws 他们的定义分别为 try： Opening exception-handling statement. catch： Captures the exception. finally： Runs its code before terminating
hive 数据插入DML语法汇总 daizj hive DML 数据插入
Hive的数据插入DML语法汇总1、Loading files into tables语法：1) LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]解释：1)、上面命令执行环境为hive客户端环境下： hive>l
工厂设计模式 dcj3sjt126com 设计模式
使用设计模式是促进最佳实践和良好设计的好办法。设计模式可以提供针对常见的编程问题的灵活的解决方案。工厂模式工厂模式（Factory）允许你在代码执行时实例化对象。它之所以被称为工厂模式是因为它负责“生产”对象。工厂方法的参数是你要生成的对象对应的类名称。 Example #1 调用工厂方法（带参数） <?phpclass Example{
mysql字符串查找函数 dcj3sjt126com mysql
FIND_IN_SET(str,strlist) 假如字符串str 在由N 子链组成的字符串列表strlist 中，则返回值的范围在1到 N 之间。一个字符串列表就是一个由一些被‘,’符号分开的自链组成的字符串。如果第一个参数是一个常数字符串，而第二个是type SET列，则 FIND_IN_SET() 函数被优化，使用比特计算。如果str不在strlist 或st
jvm内存管理 easterfly jvm
一、JVM堆内存的划分分为年轻代和年老代。年轻代又分为三部分：一个eden,两个survivor。工作过程是这样的：e区空间满了后，执行minor gc，存活下来的对象放入s0, 对s0仍会进行minor gc，存活下来的的对象放入s1中，对s1同样执行minor gc，依旧存活的对象就放入年老代中；年老代满了之后会执行major gc，这个是stop the word模式，执行
CentOS-6.3安装配置JDK-8 gengzg centos
JAVA_HOME=/usr/java/jdk1.8.0_45 JRE_HOME=/usr/java/jdk1.8.0_45/jre PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib export JAVA_HOME
【转】关于web路径的获取方法 huangyc1210 Web 路径
假定你的web application 名称为news,你在浏览器中输入请求路径： http://localhost:8080/news/main/list.jsp 则执行下面向行代码后打印出如下结果： 1、 System.out.println(request.getContextPath()); //可返回站点的根路径。也就是项
php里获取第一个中文首字母并排序远去的渡口数据结构 PHP
很久没来更新博客了，还是觉得工作需要多总结的好。今天来更新一个自己认为比较有成就的问题吧。最近在做储值结算，需求里结算首页需要按门店的首字母A-Z排序。我的数据结构原本是这样的： Array ( [0] => Array ( [sid] => 2885842 [recetcstoredpay] =&g
java内部类 hm4123660 java 内部类匿名内部类成员内部类方法内部类
　在Java中，可以将一个类定义在另一个类里面或者一个方法里面，这样的类称为内部类。内部类仍然是一个独立的类，在编译之后内部类会被编译成独立的.class文件，但是前面冠以外部类的类名和$符号。内部类可以间接解决多继承问题,可以使用内部类继承一个类，外部类继承一个类，实现多继承。 &nb
Caused by: java.lang.IncompatibleClassChangeError: class org.hibernate.cfg.Exten zhb8015
maven pom.xml关于hibernate的配置和异常信息如下，查了好多资料，问题还是没有解决。只知道是包冲突，就是不知道是哪个包....遇到这个问题的分享下是怎么解决的。。 maven pom: <dependency> <groupId>org.hibernate</groupId> <ar
Spark 性能相关参数配置详解－任务调度篇 Stark_Summer spark cache cpu 任务调度 yarn
随着Spark的逐渐成熟完善, 越来越多的可配置参数被添加到Spark中来, 本文试图通过阐述这其中部分参数的工作原理和配置思路, 和大家一起探讨一下如何根据实际场合对Spark进行配置优化。由于篇幅较长，所以在这里分篇组织，如果要看最新完整的网页版内容，可以戳这里：http://spark-config.readthedocs.org/，主要是便
css3滤镜 wangkeheng html css
经常看到一些网站的底部有一些灰色的图标，鼠标移入的时候会变亮，开始以为是js操作src或者bg呢，搜索了一下，发现了一个更好的方法：通过css3的滤镜方法。 html代码： <a href='' class='icon'><img src='utv.jpg' /></a> css代码： .icon{-webkit-filter: graysc

6、LLaVA

简介

Training

Stage one：Pre-training for Feature Alignment

Stage two：Fine-tuning End-to-End.

模型搭建

环境搭建

下载预训练权重

加载模型

主model

图像编码器

clip 图像编码器

图像特征映射到token

主模型功能实现类（重点）

Trainging

Finetuning

模型fintune入口——train.py

制作数据集

你可能感兴趣的:(#,代码重建运行过程,python)