Decision Trees

A decision tree is a basic method for classification and regression. It consists of nodes and directed edges, and the nodes are divided into internal nodes and leaf nodes: an internal node represents a test on a feature (attribute), and a leaf node represents a class.

In essence, decision tree learning induces a set of classification rules from the training data set.
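Once learned, such a tree can be stored as nested mappings: each internal node maps a feature name to its branches, and each leaf is a class label. The sketch below is only an illustration (fish_tree and classify are hypothetical names, not part of the code in this post):

# Illustrative only: a tree over the 'no surfacing' / 'flippers' features used later.
fish_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

def classify(tree, feature_names, sample):
	"""Walk the nested-dict tree until a leaf (class label) is reached."""
	if not isinstance(tree, dict):
		return tree                                  # reached a leaf
	feature = next(iter(tree))                       # feature tested at this node
	branch = tree[feature][sample[feature_names.index(feature)]]
	return classify(branch, feature_names, sample)

# classify(fish_tree, ['no surfacing', 'flippers'], [1, 1]) -> 'yes'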

Computing information entropy

For a data set whose samples fall into $n$ classes with proportions $p_1, \dots, p_n$, the Shannon entropy is

$$ H = -\sum_{i=1}^{n} p_i \log_2 p_i $$
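For example, the five-sample toy data set built below has two 'yes' and three 'no' labels, so

$$ H = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} \approx 0.971 $$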

# trees.py -- compute the Shannon entropy of a data set
from math import log

def calcShannonEnt(dataSet):
	"""Return the Shannon entropy of dataSet; the class label is the last column of each row."""
	numEntries = len(dataSet)
	labelCounts = {}
	for featVec in dataSet:
		currentLabel = featVec[-1]
		if currentLabel not in labelCounts:
			labelCounts[currentLabel] = 0
		labelCounts[currentLabel] += 1          # count every occurrence, not just the first
	shannonEnt = 0.0
	for key in labelCounts:
		prob = float(labelCounts[key]) / numEntries
		shannonEnt -= prob * log(prob, 2)
	return shannonEnt

def createDataSet():
	"""Return a toy data set: columns are [no surfacing, flippers, class label]."""
	dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
	labels = ['no surfacing', 'flippers']
	return dataSet, labels
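As a quick cross-check (a sketch, not part of trees.py), the same entropy can be computed directly from the label column with collections.Counter:

from collections import Counter
from math import log

def entropy(labels):
	"""Shannon entropy of an iterable of class labels."""
	n = len(labels)
	return -sum((c / n) * log(c / n, 2) for c in Counter(labels).values())

# entropy([row[-1] for row in createDataSet()[0]]) -> about 0.971, same as calcShannonEnt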

Test code

from importlib import reload   # reload is not a builtin in Python 3
import trees

reload(trees)                  # pick up edits to trees.py without restarting the interpreter
myDat, labels = trees.createDataSet()
print(myDat)

print(trees.calcShannonEnt(myDat))
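# expected: roughly 0.971 for the two-'yes' / three-'no' toy data set, matching the hand calculation above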

myDat[0][-1] = 'maybe'         # introduce a third class label
print(trees.calcShannonEnt(myDat))
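# expected: roughly 1.371 -- entropy increases as the labels become more mixed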

