Multi-class classification refers to classification tasks in which the target variable (label) has more than two classes, for example handwritten digit recognition (10 classes) or news topic categorization.
Softmax converts a neural network's raw outputs (logits) into a probability distribution:
$$
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K
$$
Suppose the network's raw outputs for 3 classes are $\mathbf{z} = [2.0, 1.0, 0.1]$:
$$
\begin{aligned}
\sigma(\mathbf{z})_1 &= \frac{e^{2.0}}{e^{2.0} + e^{1.0} + e^{0.1}} \approx 0.659 \\
\sigma(\mathbf{z})_2 &= \frac{e^{1.0}}{e^{2.0} + e^{1.0} + e^{0.1}} \approx 0.242 \\
\sigma(\mathbf{z})_3 &= \frac{e^{0.1}}{e^{2.0} + e^{1.0} + e^{0.1}} \approx 0.099
\end{aligned}
$$
The resulting probability distribution is [0.659, 0.242, 0.099], so class 1 is predicted.
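As a quick check of the arithmetic, the distribution can be reproduced with a NumPy one-liner (no stability trick is needed for logits this small):

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])
print(np.exp(z) / np.exp(z).sum())  # [0.659 0.242 0.099] (rounded)
```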
In a naive implementation, large logits overflow `np.exp`. Subtracting the row-wise maximum before exponentiating avoids this without changing the result, since softmax is invariant to adding a constant to every logit:

```python
exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
softmax = exp_z / np.sum(exp_z, axis=1, keepdims=True)
```
When Softmax is paired with cross-entropy loss, their gradients are derived together, and the combined form is remarkably simple:
$$
\frac{\partial J}{\partial z_i} = \hat{y}_i - y_i
$$
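This identity is easy to verify numerically. The sketch below (illustrative, using a hypothetical one-hot target `y`) compares the analytic gradient $\hat{y} - y$ with a central-difference gradient of the cross-entropy loss:

```python
import numpy as np

def softmax_vec(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax_vec(z)))  # y is one-hot

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])   # hypothetical one-hot target
analytic = softmax_vec(z) - y   # y_hat - y

eps = 1e-6
numeric = np.array([(cross_entropy(z + eps * np.eye(3)[i], y) -
                     cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
                    for i in range(3)])
print(np.allclose(analytic, numeric))  # True
```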
```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtract max to prevent overflow
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Example: 3 samples, 4 classes
logits = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.5, 1.0, 2.0, 3.0],
                   [-1.0, 0.0, 1.0, 2.0]])
probabilities = softmax(logits)
print("Probability distribution:\n", probabilities)
```
In PyTorch, `nn.Softmax` is applied along the class dimension (`dim=1` for a batch of logits):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]])
softmax = nn.Softmax(dim=1)
probabilities = softmax(logits)
print(probabilities)
```
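Note that for training, PyTorch's `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so no explicit Softmax layer is placed before it:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # takes logits, not probabilities
logits = torch.tensor([[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]])
targets = torch.tensor([2, 0])     # integer class indices, not one-hot
print(criterion(logits, targets).item())
```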
The TensorFlow/Keras equivalent:

```python
import tensorflow as tf
from tensorflow.keras.layers import Softmax

logits = tf.constant([[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]])
softmax = Softmax(axis=-1)
probabilities = softmax(logits)
print(probabilities.numpy())
```
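Keras losses can likewise consume raw logits via `from_logits=True`, which is more numerically stable than applying Softmax and then taking the logarithm; a minimal sketch:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
y_true = tf.constant([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])  # one-hot targets
logits = tf.constant([[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]])
print(loss_fn(y_true, logits).numpy())
```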
| Task type | Output activation | Loss function | Label format |
|---|---|---|---|
| Binary classification | Sigmoid | Binary cross-entropy | 0 or 1 |
| Multi-class classification | Softmax | Categorical cross-entropy | One-hot encoding |
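For the one-hot format in the last column, Keras provides `to_categorical` (integer labels can instead be used directly with `sparse_categorical_crossentropy`):

```python
from tensorflow.keras.utils import to_categorical

labels = [0, 2, 1]
print(to_categorical(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```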
In Keras, the loss and optimizer go to `compile()`; `class_weight`, which re-weights the loss for imbalanced classes, is an argument of `fit()`, not `compile()` (`x_train` and `y_train` below are placeholders):

```python
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          class_weight={0: 1, 1: 2, 2: 1})  # double the weight of class 1
```
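Rather than hand-picking the weights, they can be derived from the label frequencies; one common route (assuming scikit-learn is available, with hypothetical integer labels `y_train`) is:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 1, 2, 2])  # hypothetical integer labels
weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1, 2]), y=y_train)
class_weight = dict(enumerate(weights))
print(class_weight)  # rarer classes receive larger weights
```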