Machine Learning (3.2) -- PCA Dimensionality Reduction: Iris Data Demo

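PCA (principal component analysis) is a dimensionality reduction method used in machine learning.
About the dataset used in this example: the data are iris flower measurements, 150 samples in total,
divided into 3 classes: setosa, versicolor, and virginica.
Each class has 50 samples, and each sample consists of 4 attribute values and one class label.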

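This example uses these 150 samples to show the effect of dimensionality reduction.
Each sample has 4 attributes, so the distribution of the 150 points cannot be plotted directly.
With PCA we can reduce the 4 attributes to 2 dimensions and then draw the points on a 2D plane.
In the final plot produced with plt, the boundaries between the groups of points are fairly clear. This 2D scatter plot can also help in understanding the KNN algorithm.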
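
As a compact illustration of the idea in the paragraph above, here is a minimal sketch of the same steps (center, covariance, eigendecomposition, projection) written as one function. It is not the script used in this post: the function name `pca_project` and the random toy data are made up for the example, and it uses `np.cov` and `np.linalg.eigh` rather than the hand-written covariance function below.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    X = np.asarray(X, dtype=np.float64)
    centered = X - X.mean(axis=0)            # subtract the per-column mean
    cov = np.cov(centered, rowvar=False)     # sample covariance (divides by n-1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return centered @ eigvecs[:, order], eigvals[order]

# Hypothetical toy data: 200 points in 4 dimensions, with a different spread per axis.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.3, 0.1])

projected, top_eigvals = pca_project(toy, 2)
print(projected.shape)   # (200, 2)
print(top_eigvals)       # variances along the two kept directions
```

The variance of each projected column equals the corresponding eigenvalue, which is why keeping the largest eigenvalues keeps the most spread in the data.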

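There is a lot of data here, so the effect on individual values is hard to see.
If you want to follow PCA through the changes in the data, with a more detailed walk-through of the computation,
see Machine Learning (3.1) -- PCA Dimensionality Reduction: Basic Principles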

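Another article also uses the iris features as data, to implement the nearest-neighbor algorithm (KNN):

Machine Learning (2) -- Nearest-Neighbor Algorithm (KNN)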

# -*- coding:utf-8 -*-  
data='''5.1,3.5,1.4,0.2,Iris-setosa
        4.9,3.0,1.4,0.2,Iris-setosa
        4.7,3.2,1.3,0.2,Iris-setosa
        4.6,3.1,1.5,0.2,Iris-setosa
        5.0,3.6,1.4,0.2,Iris-setosa
        5.4,3.9,1.7,0.4,Iris-setosa
        4.6,3.4,1.4,0.3,Iris-setosa
        5.0,3.4,1.5,0.2,Iris-setosa
        4.4,2.9,1.4,0.2,Iris-setosa
        4.9,3.1,1.5,0.1,Iris-setosa
        5.4,3.7,1.5,0.2,Iris-setosa
        4.8,3.4,1.6,0.2,Iris-setosa
        4.8,3.0,1.4,0.1,Iris-setosa
        4.3,3.0,1.1,0.1,Iris-setosa
        5.8,4.0,1.2,0.2,Iris-setosa
        5.7,4.4,1.5,0.4,Iris-setosa
        5.4,3.9,1.3,0.4,Iris-setosa
        5.1,3.5,1.4,0.3,Iris-setosa
        5.7,3.8,1.7,0.3,Iris-setosa
        5.1,3.8,1.5,0.3,Iris-setosa
        5.4,3.4,1.7,0.2,Iris-setosa
        5.1,3.7,1.5,0.4,Iris-setosa
        4.6,3.6,1.0,0.2,Iris-setosa
        5.1,3.3,1.7,0.5,Iris-setosa
        4.8,3.4,1.9,0.2,Iris-setosa
        5.0,3.0,1.6,0.2,Iris-setosa
        5.0,3.4,1.6,0.4,Iris-setosa
        5.2,3.5,1.5,0.2,Iris-setosa
        5.2,3.4,1.4,0.2,Iris-setosa
        4.7,3.2,1.6,0.2,Iris-setosa
        4.8,3.1,1.6,0.2,Iris-setosa
        5.4,3.4,1.5,0.4,Iris-setosa
        5.2,4.1,1.5,0.1,Iris-setosa
        5.5,4.2,1.4,0.2,Iris-setosa
        4.9,3.1,1.5,0.1,Iris-setosa
        5.0,3.2,1.2,0.2,Iris-setosa
        5.5,3.5,1.3,0.2,Iris-setosa
        4.9,3.1,1.5,0.1,Iris-setosa
        4.4,3.0,1.3,0.2,Iris-setosa
        5.1,3.4,1.5,0.2,Iris-setosa
        5.0,3.5,1.3,0.3,Iris-setosa
        4.5,2.3,1.3,0.3,Iris-setosa
        4.4,3.2,1.3,0.2,Iris-setosa
        5.0,3.5,1.6,0.6,Iris-setosa
        5.1,3.8,1.9,0.4,Iris-setosa
        4.8,3.0,1.4,0.3,Iris-setosa
        5.1,3.8,1.6,0.2,Iris-setosa
        4.6,3.2,1.4,0.2,Iris-setosa
        5.3,3.7,1.5,0.2,Iris-setosa
        5.0,3.3,1.4,0.2,Iris-setosa
        7.0,3.2,4.7,1.4,Iris-versicolor
        6.4,3.2,4.5,1.5,Iris-versicolor
        6.9,3.1,4.9,1.5,Iris-versicolor
        5.5,2.3,4.0,1.3,Iris-versicolor
        6.5,2.8,4.6,1.5,Iris-versicolor
        5.7,2.8,4.5,1.3,Iris-versicolor
        6.3,3.3,4.7,1.6,Iris-versicolor
        4.9,2.4,3.3,1.0,Iris-versicolor
        6.6,2.9,4.6,1.3,Iris-versicolor
        5.2,2.7,3.9,1.4,Iris-versicolor
        5.0,2.0,3.5,1.0,Iris-versicolor
        5.9,3.0,4.2,1.5,Iris-versicolor
        6.0,2.2,4.0,1.0,Iris-versicolor
        6.1,2.9,4.7,1.4,Iris-versicolor
        5.6,2.9,3.6,1.3,Iris-versicolor
        6.7,3.1,4.4,1.4,Iris-versicolor
        5.6,3.0,4.5,1.5,Iris-versicolor
        5.8,2.7,4.1,1.0,Iris-versicolor
        6.2,2.2,4.5,1.5,Iris-versicolor
        5.6,2.5,3.9,1.1,Iris-versicolor
        5.9,3.2,4.8,1.8,Iris-versicolor
        6.1,2.8,4.0,1.3,Iris-versicolor
        6.3,2.5,4.9,1.5,Iris-versicolor
        6.1,2.8,4.7,1.2,Iris-versicolor
        6.4,2.9,4.3,1.3,Iris-versicolor
        6.6,3.0,4.4,1.4,Iris-versicolor
        6.8,2.8,4.8,1.4,Iris-versicolor
        6.7,3.0,5.0,1.7,Iris-versicolor
        6.0,2.9,4.5,1.5,Iris-versicolor
        5.7,2.6,3.5,1.0,Iris-versicolor
        5.5,2.4,3.8,1.1,Iris-versicolor
        5.5,2.4,3.7,1.0,Iris-versicolor
        5.8,2.7,3.9,1.2,Iris-versicolor
        6.0,2.7,5.1,1.6,Iris-versicolor
        5.4,3.0,4.5,1.5,Iris-versicolor
        6.0,3.4,4.5,1.6,Iris-versicolor
        6.7,3.1,4.7,1.5,Iris-versicolor
        6.3,2.3,4.4,1.3,Iris-versicolor
        5.6,3.0,4.1,1.3,Iris-versicolor
        5.5,2.5,4.0,1.3,Iris-versicolor
        5.5,2.6,4.4,1.2,Iris-versicolor
        6.1,3.0,4.6,1.4,Iris-versicolor
        5.8,2.6,4.0,1.2,Iris-versicolor
        5.0,2.3,3.3,1.0,Iris-versicolor
        5.6,2.7,4.2,1.3,Iris-versicolor
        5.7,3.0,4.2,1.2,Iris-versicolor
        5.7,2.9,4.2,1.3,Iris-versicolor
        6.2,2.9,4.3,1.3,Iris-versicolor
        5.1,2.5,3.0,1.1,Iris-versicolor
        5.7,2.8,4.1,1.3,Iris-versicolor
        6.3,3.3,6.0,2.5,Iris-virginica
        5.8,2.7,5.1,1.9,Iris-virginica
        7.1,3.0,5.9,2.1,Iris-virginica
        6.3,2.9,5.6,1.8,Iris-virginica
        6.5,3.0,5.8,2.2,Iris-virginica
        7.6,3.0,6.6,2.1,Iris-virginica
        4.9,2.5,4.5,1.7,Iris-virginica
        7.3,2.9,6.3,1.8,Iris-virginica
        6.7,2.5,5.8,1.8,Iris-virginica
        7.2,3.6,6.1,2.5,Iris-virginica
        6.5,3.2,5.1,2.0,Iris-virginica
        6.4,2.7,5.3,1.9,Iris-virginica
        6.8,3.0,5.5,2.1,Iris-virginica
        5.7,2.5,5.0,2.0,Iris-virginica
        5.8,2.8,5.1,2.4,Iris-virginica
        6.4,3.2,5.3,2.3,Iris-virginica
        6.5,3.0,5.5,1.8,Iris-virginica
        7.7,3.8,6.7,2.2,Iris-virginica
        7.7,2.6,6.9,2.3,Iris-virginica
        6.0,2.2,5.0,1.5,Iris-virginica
        6.9,3.2,5.7,2.3,Iris-virginica
        5.6,2.8,4.9,2.0,Iris-virginica
        7.7,2.8,6.7,2.0,Iris-virginica
        6.3,2.7,4.9,1.8,Iris-virginica
        6.7,3.3,5.7,2.1,Iris-virginica
        7.2,3.2,6.0,1.8,Iris-virginica
        6.2,2.8,4.8,1.8,Iris-virginica
        6.1,3.0,4.9,1.8,Iris-virginica
        6.4,2.8,5.6,2.1,Iris-virginica
        7.2,3.0,5.8,1.6,Iris-virginica
        7.4,2.8,6.1,1.9,Iris-virginica
        7.9,3.8,6.4,2.0,Iris-virginica
        6.4,2.8,5.6,2.2,Iris-virginica
        6.3,2.8,5.1,1.5,Iris-virginica
        6.1,2.6,5.6,1.4,Iris-virginica
        7.7,3.0,6.1,2.3,Iris-virginica
        6.3,3.4,5.6,2.4,Iris-virginica
        6.4,3.1,5.5,1.8,Iris-virginica
        6.0,3.0,4.8,1.8,Iris-virginica
        6.9,3.1,5.4,2.1,Iris-virginica
        6.7,3.1,5.6,2.4,Iris-virginica
        6.9,3.1,5.1,2.3,Iris-virginica
        5.8,2.7,5.1,1.9,Iris-virginica
        6.8,3.2,5.9,2.3,Iris-virginica
        6.7,3.3,5.7,2.5,Iris-virginica
        6.7,3.0,5.2,2.3,Iris-virginica
        6.3,2.5,5.0,1.9,Iris-virginica
        6.5,3.0,5.2,2.0,Iris-virginica
        6.2,3.4,5.4,2.3,Iris-virginica
        5.9,3.0,5.1,1.8,Iris-virginica'''

import numpy as np
import matplotlib.pyplot as plt
data=data.replace(' ','').split('\n')
data=list(filter(lambda x: len(x)>0,data))
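data=[x.split(',')[0:4] for x in data]  # the fifth field is the class label; PCA does not need it, so drop it
# float64 instead of float16 avoids rounding error in the sums below;
# the printed eigenvalues may then differ slightly in their later digits from the values quoted further down
data=np.array(data).astype(np.float64)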

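# 1. Compute the mean of each column (dataAvg)
dataAvg=np.average(data,axis=0)
# 2. Subtract the mean from every value (dataAdjust)
# (value minus mean; the opposite sign gives the same covariance and only mirrors the projection)
dataAdjust=data-dataAvg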


# 3. Build the covariance matrix
# Sample covariance of two columns of the centered data
def covariance(getDataAdjust,index1,index2):
    x=getDataAdjust[:,index1:index1+1]
    y=getDataAdjust[:,index2:index2+1]
    n=x.shape[0]
    return (x*y).sum()/(n-1)

# 4x4 matrix: entry (i, j) is the covariance of attribute i and attribute j
CovMatrix=[[covariance(dataAdjust,i,j) for j in range(4)] for i in range(4)]

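# 4. Compute the eigenvalues and eigenvectors of the covariance matrix
e1,e2=np.linalg.eig(CovMatrix)
print('------------------------')
print('Eigenvalues of the covariance matrix:')
print(e1)  # the original run printed [4.22396988 0.24215651 0.07857844 0.02377251]
print('------------------------')
print('Eigenvectors of the covariance matrix (one per column):')
print(e2)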

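# 5. Reduce the dimensionality
# To reduce to 2 dimensions, keep the eigenvectors that belong to the two largest
# eigenvalues (about 4.22 and 0.24 here). np.linalg.eig does not guarantee any
# ordering, so sort explicitly instead of assuming the first two columns.
order=np.argsort(e1)[::-1]
finalX=e2[:,order[0]:order[0]+1]
finalY=e2[:,order[1]:order[1]+1]

# Project the centered data onto the two chosen eigenvectors (matrix multiplication)
finalDataX=np.matmul(dataAdjust,finalX)
finalDataY=np.matmul(dataAdjust,finalY)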


# 6. Draw the scatter plot
# Set the X and Y axis ranges; floor/ceil so that no extreme point is clipped
plt.axis([np.floor(finalDataX.min()),np.ceil(finalDataX.max())
          ,np.floor(finalDataY.min()),np.ceil(finalDataY.max())])
plt.scatter(finalDataX[0:50],finalDataY[0:50], c='r')  # first 50 samples: setosa, in red
plt.scatter(finalDataX[50:100],finalDataY[50:100], c='g')  # middle 50 samples: versicolor, in green
plt.scatter(finalDataX[100:],finalDataY[100:], c='b')  # last 50 samples: virginica, in blue
plt.show()
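
One way to judge how much information the 2D plot keeps is to compare the eigenvalues, since each one is the variance along its eigenvector. The short sketch below is an addition to the script, not part of it: it reuses the four eigenvalues quoted in the comment of step 4, and the names `ratio` and `kept` are chosen only for this example.

```python
import numpy as np

# Eigenvalues as quoted in the comment of step 4 above.
eigvals = np.array([4.22396988, 0.24215651, 0.07857844, 0.02377251])

ratio = eigvals / eigvals.sum()   # share of the total variance per component
kept = ratio[:2].sum()            # share kept by the two largest components

print(ratio.round(4))
print(f"variance kept by 2 components: {kept:.2%}")
```

With these values the first two components account for roughly 98% of the total variance, which is consistent with the three classes remaining visibly separated after the reduction.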

