Learning Decision Trees

This week was busy, and I went home for the holidays. The original study plan also included the Naive Bayes module, but I did not finish it and will catch up next week. Fortunately I have touched on this material before; the current study leans more toward the underlying theory, so it is harder, but still manageable for now.

A quick review of what I learned this week:

1. How a computer reasons when making recommendations

[Figure: recommending apps]

[Figure: decision tree]

2. Entropy and its formula

For a node whose samples fall into $n$ classes with proportions $p_1, \dots, p_n$, the entropy is

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$

In the two-class case this reduces to $H = -p \log_2 p - (1 - p) \log_2 (1 - p)$.
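To make the formula concrete, here is a minimal sketch in plain Python (the `entropy` helper and its example labels are my own, not from the lesson):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

# A node split 4/4 between two classes is maximally impure:
print(entropy(['a'] * 4 + ['o'] * 4))  # 1.0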

3. Information Gain

Of the three splitting methods below, which one gives us the most information about the data?

[Figure: information gain]

Information gain = the change in entropy

At every node of a decision tree we can compute the entropy of the data at the parent node and the entropies of the two child nodes; the information gain is the difference between the parent's entropy and the weighted average of the children's entropies.

The formula for information gain:

$$\text{Gain} = H(\text{parent}) - \sum_{j} \frac{m_j}{m}\, H(\text{child}_j)$$

where $m$ is the number of samples at the parent node and $m_j$ is the number of samples in child $j$.

When building a decision tree, at each node we choose the split that yields the largest information gain, as in the sketch below.
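Building on the `entropy` helper above, a small sketch of this calculation (the function name and toy labels are my own):

def information_gain(parent, children):
    """H(parent) minus the size-weighted average entropy of the children."""
    m = len(parent)
    weighted = sum(len(child) / m * entropy(child) for child in children)
    return entropy(parent) - weighted

# A split that separates the two classes perfectly recovers the full bit:
print(information_gain(['a', 'a', 'o', 'o'], [['a', 'a'], ['o', 'o']]))  # 1.0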

4. Hyperparameters

(1) Maximum depth

(2) Minimum number of samples per leaf

(3) Minimum number of samples per split

(4) Maximum number of features

5. Decision Trees in sklearn

>>> from sklearn.tree import DecisionTreeClassifier
>>> model = DecisionTreeClassifier()
>>> model.fit(x_values, y_values)  # x_values, y_values: training features and labels
>>> print(model.predict([[0.2, 0.8], [0.5, 0.4]]))
[0. 1.]

Hyperparameters

When we define the model, we can specify the hyperparameters. In practice, the most common ones are:

max_depth: The maximum number of levels in the tree.

min_samples_leaf: The minimum number of samples allowed in a leaf.

min_samples_split: The minimum number of samples required to split an internal node.

max_features: The number of features to consider when looking for the best split.

For example, here we define a model where the maximum depth of the tree, max_depth, is 7, and the minimum number of samples in each leaf, min_samples_leaf, is 10.

>>> model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=10)

>>> from sklearn.metrics import accuracy_score
>>> y_pred = model.predict(x_values)  # predictions on the data being evaluated
>>> acc = accuracy_score(y_values, y_pred)
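Putting the pieces together, here is a self-contained sketch; the toy dataset and the min_samples_leaf=1 choice are mine (with only six samples, a leaf minimum of 10 would prevent any split):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy 2-D data, made up for illustration:
# class 0 clusters lower-left, class 1 upper-right.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.8, 0.9], [0.9, 0.8], [0.7, 0.9]]
y = [0, 0, 0, 1, 1, 1]

model = DecisionTreeClassifier(max_depth=7, min_samples_leaf=1)
model.fit(X, y)

y_pred = model.predict(X)
print(accuracy_score(y, y_pred))  # 1.0 on this easily separable data

On real data we would evaluate on a held-out test set rather than the training set, since a deep tree can easily overfit.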
