Machine Learning 2021 (Hung-yi Lee) Study Notes: Episode 1

  • Introduction to Basic Machine Learning Concepts
    • Machine Learning
    • Different Types of Functions
    • How to Find the Function
    • To Modify the Model
    • Model Bias
  • References

Introduction to Basic Machine Learning Concepts

Machine Learning

$\approx$ Looking for a Function

Different Types of Functions

(1) Regression

The function outputs a scalar.
e.g. prediction

(2) Classification

Given options (classes), the function outputs the correct one.
e.g. spam filtering
e.g. AlphaGo: positions on the board $\rightarrow F \rightarrow$ a position on the board, i.e. one class $\in 19 \times 19$ classes

(3) Structured Learning

Create something with structure (such as an image or a document).

How to Find the Function

Step 0: Prepare the Dataset
Step 1: Define the Model

Based on domain knowledge, we can start with a guess: a function with unknown parameters, such as $y = b + w x^1$.

  • $y = b + w x^1$ is called the model (see the code sketch after this list).
  • $y$: no. of views on 2/26.
  • $x^1$: no. of views on 2/25, called the feature.
  • $w$ and $b$ are unknown parameters (learned from data); $w$ is called the weight and $b$ the bias.
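
Here is a minimal Python sketch of this model; the parameter values are the illustrative ones from the example in Step 2 below ($b = 0.5\mathrm{k}$, $w = 1$), not parameters actually fitted in the lecture:

```python
# A minimal sketch of the model y = b + w * x^1.
# w and b are normally learned from data; the values below are illustrative.
def model(x1, w, b):
    """Predict today's view count (in k) from yesterday's count x1 (in k)."""
    return b + w * x1

# Hypothetical input: 4.8k views yesterday, with b = 0.5k and w = 1.
print(model(4.8, w=1.0, b=0.5))  # -> 5.3 (k views)
```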

Step 2: Define the Loss from Training Data

  • Loss is a function of the parameters: $L(b, w)$.
  • Loss evaluates how good a set of parameter values is.
  • e.g. Suppose $b = 0.5\mathrm{k}$ and $w = 1$, i.e. $L(0.5\mathrm{k}, 1)$; then $y = 0.5\mathrm{k} + 1 \cdot x^1$.
    If $e_n = |y - \hat{y}|$, the loss can be $L = \frac{1}{N} \sum_{n} e_n$, called Mean Absolute Error (MAE).
    If $e_n = (y - \hat{y})^2$, the loss can be $L = \frac{1}{N} \sum_{n} e_n$, called Mean Squared Error (MSE).
  • If $y$ and $\hat{y}$ are both probability distributions, we may use Cross-Entropy. (Both MAE and MSE are sketched in code after this list.)
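
As a quick sketch of the two losses above, using NumPy with made-up prediction and label arrays (none of these numbers come from the lecture):

```python
import numpy as np

# Placeholder model outputs y and ground-truth labels y_hat (in k views).
y_pred = np.array([5.3, 4.9, 5.1])
y_true = np.array([5.0, 5.2, 5.1])

mae = np.mean(np.abs(y_pred - y_true))   # L = (1/N) * sum_n |y - y_hat|   (MAE)
mse = np.mean((y_pred - y_true) ** 2)    # L = (1/N) * sum_n (y - y_hat)^2 (MSE)
print(mae, mse)
```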

Step 3: Optimization

Optimization: $w^*, b^* = \arg\min_{w, b} L$
We may use Gradient Descent.
[Figure: gradient descent illustration]

  • (1) One unknown parameter: $w^* = \arg\min_{w} L$
    • (Randomly) pick an initial value $w^0$.
    • Compute $\left.\frac{\partial L}{\partial w}\right|_{w=w^0}$.
    • Update $w$ iteratively:
      $w^1 \leftarrow w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w^0}$
  • (2) Two unknown parameters: $w^*, b^* = \arg\min_{w, b} L$ (a runnable sketch follows this list)
    • (Randomly) pick initial values $w^0$ and $b^0$.
    • Compute $\left.\frac{\partial L}{\partial w}\right|_{w=w^0, b=b^0}$ and $\left.\frac{\partial L}{\partial b}\right|_{w=w^0, b=b^0}$.
    • Update $w$ and $b$ iteratively:
      $w^1 \leftarrow w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w^0, b=b^0}$
      $b^1 \leftarrow b^0 - \eta \left.\frac{\partial L}{\partial b}\right|_{w=w^0, b=b^0}$
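
The two-parameter case translates almost line by line into code. Below is a minimal runnable sketch using the MSE loss; the data, initial values, learning rate, and iteration count are all made-up placeholders, not values from the lecture:

```python
import numpy as np

# Made-up training data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([2.1, 3.9, 6.2, 7.8])

w, b = 0.0, 0.0   # (randomly) picked initial values w^0, b^0
eta = 0.01        # learning rate

# Gradient descent on L(b, w) = (1/N) * sum_n (b + w*x_n - y_hat_n)^2.
for _ in range(1000):
    err = b + w * x - y_hat           # prediction error for each sample
    grad_w = np.mean(2 * err * x)     # dL/dw at the current (w, b)
    grad_b = np.mean(2 * err)         # dL/db at the current (w, b)
    w -= eta * grad_w                 # w <- w - eta * dL/dw
    b -= eta * grad_b                 # b <- b - eta * dL/db

print(w, b)  # w approaches ~2, b approaches ~0
```

Each loop iteration is exactly the update rule above; stopping after a fixed number of iterations is one common choice.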

As we can see, the size of each update step is determined by $\eta$ and by the partial derivative; $\eta$ is called the Learning Rate.
Also, the updates may get stuck at a Local Minimum, whereas our goal is the Global Minimum. In practice, however, we rarely run into local minima in real machine learning, so this is not actually a big problem.
Hyperparameters: the parameters we have to set ourselves (such as $\eta$, the number of sigmoids, the batch size, and the number of layers).

To Modify the Model

Error Surface: the graph of the loss function.
[Figure: error surface]
This part is quite interesting. From the error surface we can see that the view counts vary periodically, with a cycle of roughly 7 days: the troughs fall on Friday and Saturday, and the counts start climbing again on Sunday. (I once saw a poll on Weibo asking "When do you think a week starts and ends?"; the top answer was "it starts on Sunday and ends on Friday", and commenters agreed the anxiety already kicks in on Sunday. As a college student I know the feeling: Sunday is for rushing through a week's worth of homework, who has time to study machine learning (in Prof. Lee's tone).) So, to lower the loss, we can try models that look back 7 days, 28 days, 56 days, and so on. As the table below shows, the loss does drop, but the $L'$ of the 56-day model is almost identical to that of the 28-day model, so adding more history does not keep improving things. We should also mind the various settings, such as the learning rate $\eta$, observe the plots carefully, look for patterns or anomalies, and modify the model accordingly. (A code sketch of the 7-day model follows the table.)

Model                               L (2017-2020)   L' (2021)
$y = b + w x^1$                     0.48k           0.58k
$y = b + \sum_{j=1}^{7} w_j x_j$    0.38k           0.49k
$y = b + \sum_{j=1}^{28} w_j x_j$   0.33k           0.46k
$y = b + \sum_{j=1}^{56} w_j x_j$   0.32k           0.46k
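
For completeness, here is what the 7-day model looks like in code; a sketch with placeholder feature values and parameters (the real $w_j$ and $b$ would be learned by gradient descent as in Step 3):

```python
import numpy as np

# Extended model: y = b + sum_{j=1}^{7} w_j * x_j.
# Placeholder values; views are in units of k.
x = np.array([4.8, 4.9, 5.1, 5.0, 4.2, 4.0, 4.6])  # views on the last 7 days
w = np.full(7, 0.14)                               # placeholder weights w_1..w_7
b = 0.5                                            # placeholder bias

y = b + w @ x   # the dot product implements sum_j w_j * x_j
print(y)        # predicted views for the next day
```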

Model Bias

As we can see, linear models are too simple and have severe limitations.
The limitation that keeps a model from describing the true situation well is called model bias.
We need more sophisticated, more flexible models.

References

YouTube: 機器學習2021, Hung-yi Lee
