2-7节 k-近邻算法|手写识别系统|机器学习实战-学习笔记

文章原创,最近更新:2018-08-11

本章节的主要内容是:
重点介绍项目案例1: 手写识别系统的完整代码

1.KNN项目案例介绍:

项目案例2:

手写识别系统

项目概述:
  • 构造一个能识别数字 0 到 9 的基于 KNN 分类器的手写数字识别系统。
  • 需要识别的数字是存储在文本文件中的具有相同的色彩和大小:宽高是 32 像素 * 32 像素的黑白图像。
开发流程:
  • 收集数据:提供文本文件。
  • 准备数据:编写函数 img2vector(), 将图像格式转换为分类器使用的向量格式
  • 分析数据:在 Python 命令提示符中检查数据,确保它符合要求
  • 训练算法:此步骤不适用于 KNN
  • 测试算法:编写函数使用提供的部分数据集作为测试样本,测试样本与非测试样本的区别在于测试样本是已经完成分类的数据,如果预测分类与实际类别不同,则标记为一个错误
  • 使用算法:本例没有完成此步骤,若你感兴趣可以构建完整的应用程序,从图像中提取数字,并完成数字识别,美国的邮件分拣系统就是一个实际运行的类似系统
数据集介绍

数据来源于《机器学习实战》第二章 k邻近算法,具体如下:

  • 文件夹trainingDigits 中包含了大约 2000 个例子,每个例子内容如下图所示,每个数字大约有 200 个样本.
    手写数字数据集的例子
  • 文件夹 testDigits 中包含了大约 900 个测试数据。
  • 使用 文件trainingDigits中的数据训练分类器,使用文件 testDigits 中的数据测试分类器的效果.

trainingDigits文件夹中某个文件的内容如下所示:(备注: testDigits文件格式类似,不再展示)

00000000000001111000000000000000
00000000000011111110000000000000
00000000001111111111000000000000
00000001111111111111100000000000
00000001111111011111100000000000
00000011111110000011110000000000
00000011111110000000111000000000
00000011111110000000111100000000
00000011111110000000011100000000
00000011111110000000011100000000
00000011111100000000011110000000
00000011111100000000001110000000
00000011111100000000001110000000
00000001111110000000000111000000
00000001111110000000000111000000
00000001111110000000000111000000
00000001111110000000000111000000
00000011111110000000001111000000
00000011110110000000001111000000
00000011110000000000011110000000
00000001111000000000001111000000
00000001111000000000011111000000
00000001111000000000111110000000
00000001111000000001111100000000
00000000111000000111111000000000
00000000111100011111110000000000
00000000111111111111110000000000
00000000011111111111110000000000
00000000011111111111100000000000
00000000001111111110000000000000
00000000000111110000000000000000
00000000000011000000000000000000

trainingDigits文件夹的文件的存储方式如下所示:(备注: testDigits文件夹存储类似,不再展示)

2.手写识别系统项目

第二章是描述K近邻算法的,算法本质就是寻找距离最近的点,这个距离可以是欧式距离,也可以是其他,这本书采用的就是欧式距离了。K近邻算法主要是用来分类的,比如我新输入一个数据,要判断他属于哪个类别,用这个算法就很合适了,简单实用。

2.1准备数据:将图像转换为测试向量

首先我们创建一个名为kNN.py的文件,然后我们就创建一个函数img2vector(),输入到kNN.py这个文件.

img2vector()这个函数的主要作用是将图像数据转换为向量.

def img2vector(filename):
    """
    将图像数据转换为向量
    
    filename:图片文件,因为我们的输入数据的图片格式是32*32
    return:一维矩阵
    
    该函数将图像转换为向量:该函数创建1*1024的numpy数组,然后打开给定的文件,
    循环读出文件的前32行,并将每行的头32个字符值存储在numpy数组中,最后返回数组.
    """
    returnVect = np.zeros((1,1024))
    fr=open(filename)
    for i in range(32):
        lineStr = fr.readline()
        for j in range(32):
            returnVect[0,32*i+j] = int(lineStr[j])
    return returnVect

测试代码及其结果如下:

import kNN

testVector=kNN.img2vector("testDigits/0_13.txt")

testVector[0,0:31]
Out[8]: 
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.])

testVector[0,32:63]
Out[9]: 
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.]

相关知识点:
知识点1:
文件内容的读取,通过文中的数据集testDigits/0_13.txt进行理解.

  • .read(size=-1)
    读入全部内容,如果给出参数,读入前size长度
fr=open("testDigits/0_13.txt")
for i in range(32):
    lineStr1 = fr.read()
    print("打印第",i,"行",lineStr1)

输出结果如下:

打印第 0 行 00000000000000111100000000000000
00000000000011111110000000000000
 ----------略-------------------
00000000000011111111111000000000
00000000000000111111110000000000

打印第 1 行 
打印第 2 行 
----略-----
打印第 30 行 
打印第 31 行 
  • .readline(size=-1)
    读入一行内容,如果给出参数,读入该行前size长度
fr=open("testDigits/0_13.txt")
for i in range(32):
    lineStr1 = fr.readline()
    print("打印第",i,"行",lineStr1)

输出结果为:

打印第 0 行 00000000000000111100000000000000
打印第 1 行 00000000000011111110000000000000
 -------------------略---------------------
打印第 30 行 00000000000011111111111000000000
打印第 31 行 00000000000000111111110000000000
  • .readlines(hint=-1)
    读入文件所有行,以每行为元素形成列表如果给出参数,读入前hint行
fr=open("testDigits/0_13.txt")
for i in range(32):
    lineStr1 = fr.readlines()
    print("打印第",i,"行",lineStr1)

输出结果为:

打印第 0 行 ['00000000000000111100000000000000\n', '00000000000011111110000000000000\n', '00000000000111111111000000000000\n', '00000000001111111111100000000000\n', '00000000001111111111100000000000\n', '00000000111111111111110000000000\n', '00000000011111111111111000000000\n', '00000000111111110111111000000000\n', '00000001111110000000111100000000\n', '00000001111110000000011100000000\n', '00000011111110000000011100000000\n', '00000011111100000000011100000000\n', '00000011111100000000011100000000\n', '00000001111100000000001111000000\n', '00000001111000000000000111000000\n', '00000011110000000000000111000000\n', '00000011110000000000000111000000\n', '00000011110000000000000111000000\n', '00000011110000000000000111000000\n', '00000011110000000000000111000000\n', '00000000111000000000000011100000\n', '00000000111000000000000011100000\n', '00000000111100000000000111100000\n', '00000000111100000000000111100000\n', '00000000111110000000011111100000\n', '00000000011111000000111111000000\n', '00000000011111111111111110000000\n', '00000000001111111111111111000000\n', '00000000000111111111111111000000\n', '00000000000111111111111110000000\n', '00000000000011111111111000000000\n', '00000000000000111111110000000000\n']
打印第 1 行 []
打印第 2 行 []
 ----略-------
打印第 30 行 []
打印第 31 行 []

2.2测试算法:使用k近邻算法识别手写数字

创建一个函数handwritingClassTest(),输入到kNN.py这个文件.

handwritingClassTest()这个函数的主要作用是手写字体识别模块:训练集和测试集.

def handwritingClassTest():
    """
    手写字体识别模块:训练集和测试集
    """
    #训练集:每个文件中的数据进行识别
    
    # 存放训练集labels
    hwLabels = []
    # 以列表形式获取trainingDigits文件夹所有文件名称
    trainingFileList = listdir('trainingDigits')
    # trainingDigits文件夹所有文件个数
    m = len(trainingFileList)
    # 创建m行1024列,0矩阵
    trainingMat = np.zeros((m,1024))
    for i in range(m):
        # 读取 trainingFileList第i个数据文件名称
        fileNameStr = trainingFileList[i]  
        # split文件,通过识别”.“,[0]代表除去后面的,即txt
        fileStr = fileNameStr.split('.')[0] 
        # split文件,通过识别”_”,[0]除去了0_3后面的序号3,保留0
        classNumStr = int(fileStr.split('_')[0])
        # 存放通过文件名称识别出来的labels
        hwLabels.append(classNumStr)
        # 存放不同标签下的具体数据
        # 调用函数img2vector每行放一个1×1024的向量
        trainingMat[i,:]=img2vector('trainingDigits/%s'\
                                      % fileNameStr)
        
    # 测试集:每个文件中的数据进行识别,得出参考向量inX
    testFileList = listdir('testDigits')
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        # 调用k近邻算法
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d,the real answer is: %d" % (classifierResult, classNumStr))
        if(classifierResult != classNumStr):errorCount += 1.0
    print("\nthe totao number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount/float(mTest)))

测试代码及其结果如下:

>import kNN
>kNN.handwritingClassTest()

the classifier came back with: 0,the real answer is: 0
the classifier came back with: 0,the real answer is: 0
-----------------------略-----------------------------
the classifier came back with: 9,the real answer is: 9
the classifier came back with: 9,the real answer is: 9

the totao number of errors is: 13

the total error rate is: 0.013742

通过以上代码可以知道:
k-近邻算法识别手写数字数据集,错误率为1.3%。

相关知识点:

知识点1:os.listdir(path)
os.listdir() 方法用于返回指定的文件夹包含的文件或文件夹的名字的列表。这个列表以字母顺序。 它不包括 '.' 和'..' 即使它在文件夹中。只支持在 Unix, Windows 下使用。

listdir()方法语法格式:os.listdir(path)

  • 参数:path -- 需要列出的目录路径
  • 返回值:返回指定路径下的文件和文件夹列表。
from os import listdir
path='trainingDigits'
dirs = listdir( path )
print(dirs)

输出结果如下:

['0_0.txt', '0_1.txt', '0_10.txt', '0_100.txt', '0_101.txt', '0_102.txt', '0_103.txt', '0_104.txt', '0_105.txt', '0_106.txt', '0_107.txt', '0_108.txt', '0_109.txt', '0_11.txt', '0_110.txt', '0_111.txt', '0_112.txt', '0_113.txt', '0_114.txt', '0_115.txt', '0_116.txt', '0_117.txt', '0_118.txt', '0_119.txt', '0_12.txt', '0_120.txt', '0_121.txt', '0_122.txt', '0_123.txt', '0_124.txt', '0_125.txt', '0_126.txt', '0_127.txt', '0_128.txt', '0_129.txt', '0_13.txt', '0_130.txt', '0_131.txt', '0_132.txt', '0_133.txt', '0_134.txt', '0_135.txt', '0_136.txt', '0_137.txt', '0_138.txt', '0_139.txt', '0_14.txt', '0_140.txt', '0_141.txt', '0_142.txt', '0_143.txt', '0_144.txt', '0_145.txt', '0_146.txt', '0_147.txt', '0_148.txt', '0_149.txt', '0_15.txt', '0_150.txt', '0_151.txt', '0_152.txt', '0_153.txt', '0_154.txt', '0_155.txt', '0_156.txt', '0_157.txt', '0_158.txt', '0_159.txt', '0_16.txt', '0_160.txt', '0_161.txt', '0_162.txt', '0_163.txt', '0_164.txt', '0_165.txt', '0_166.txt', '0_167.txt', '0_168.txt', '0_169.txt', '0_17.txt', '0_170.txt', '0_171.txt', '0_172.txt', '0_173.txt', '0_174.txt', '0_175.txt', '0_176.txt', '0_177.txt', '0_178.txt', '0_179.txt', '0_18.txt', '0_180.txt', '0_181.txt', '0_182.txt', '0_183.txt', '0_184.txt', '0_185.txt', '0_186.txt', '0_187.txt', '0_188.txt', '0_19.txt', '0_2.txt', '0_20.txt', '0_21.txt', '0_22.txt', '0_23.txt', '0_24.txt', '0_25.txt', '0_26.txt', '0_27.txt', '0_28.txt', '0_29.txt', '0_3.txt', '0_30.txt', '0_31.txt', '0_32.txt', '0_33.txt', '0_34.txt', '0_35.txt', '0_36.txt', '0_37.txt', '0_38.txt', '0_39.txt', '0_4.txt', '0_40.txt', '0_41.txt', '0_42.txt', '0_43.txt', '0_44.txt', '0_45.txt', '0_46.txt', '0_47.txt', '0_48.txt', '0_49.txt', '0_5.txt', '0_50.txt', '0_51.txt', '0_52.txt', '0_53.txt', '0_54.txt', '0_55.txt', '0_56.txt', '0_57.txt', '0_58.txt', '0_59.txt', '0_6.txt', '0_60.txt', '0_61.txt', '0_62.txt', '0_63.txt', '0_64.txt', '0_65.txt', '0_66.txt', '0_67.txt', '0_68.txt', '0_69.txt', '0_7.txt', '0_70.txt', '0_71.txt', '0_72.txt', '0_73.txt', '0_74.txt', '0_75.txt', '0_76.txt', '0_77.txt', '0_78.txt', '0_79.txt', '0_8.txt', '0_80.txt', '0_81.txt', '0_82.txt', '0_83.txt', '0_84.txt', '0_85.txt', '0_86.txt', '0_87.txt', '0_88.txt', '0_89.txt', '0_9.txt', '0_90.txt', '0_91.txt', '0_92.txt', '0_93.txt', '0_94.txt', '0_95.txt', '0_96.txt', '0_97.txt', '0_98.txt', '0_99.txt', '1_0.txt', '1_1.txt', '1_10.txt', '1_100.txt', '1_101.txt', '1_102.txt', '1_103.txt', '1_104.txt', '1_105.txt', '1_106.txt', '1_107.txt', '1_108.txt', '1_109.txt', '1_11.txt', '1_110.txt', '1_111.txt', '1_112.txt', '1_113.txt', '1_114.txt', '1_115.txt', '1_116.txt', '1_117.txt', '1_118.txt', '1_119.txt', '1_12.txt', '1_120.txt', '1_121.txt', '1_122.txt', '1_123.txt', '1_124.txt', '1_125.txt', '1_126.txt', '1_127.txt', '1_128.txt', '1_129.txt', '1_13.txt', '1_130.txt', '1_131.txt', '1_132.txt', '1_133.txt', '1_134.txt', '1_135.txt', '1_136.txt', '1_137.txt', '1_138.txt', '1_139.txt', '1_14.txt', '1_140.txt', '1_141.txt', '1_142.txt', '1_143.txt', '1_144.txt', '1_145.txt', '1_146.txt', '1_147.txt', '1_148.txt', '1_149.txt', '1_15.txt', '1_150.txt', '1_151.txt', '1_152.txt', '1_153.txt', '1_154.txt', '1_155.txt', '1_156.txt', '1_157.txt', '1_158.txt', '1_159.txt', '1_16.txt', '1_160.txt', '1_161.txt', '1_162.txt', '1_163.txt', '1_164.txt', '1_165.txt', '1_166.txt', '1_167.txt', '1_168.txt', '1_169.txt', '1_17.txt', '1_170.txt', '1_171.txt', '1_172.txt', '1_173.txt', '1_174.txt', '1_175.txt', '1_176.txt', '1_177.txt', '1_178.txt', '1_179.txt', '1_18.txt', '1_180.txt', '1_181.txt', '1_182.txt', '1_183.txt', '1_184.txt', '1_185.txt', '1_186.txt', '1_187.txt', '1_188.txt', '1_189.txt', '1_19.txt', '1_190.txt', '1_191.txt', '1_192.txt', '1_193.txt', '1_194.txt', '1_195.txt', '1_196.txt', '1_197.txt', '1_2.txt', '1_20.txt', '1_21.txt', '1_22.txt', '1_23.txt', '1_24.txt', '1_25.txt', '1_26.txt', '1_27.txt', '1_28.txt', '1_29.txt', '1_3.txt', '1_30.txt', '1_31.txt', '1_32.txt', '1_33.txt', '1_34.txt', '1_35.txt', '1_36.txt', '1_37.txt', '1_38.txt', '1_39.txt', '1_4.txt', '1_40.txt', '1_41.txt', '1_42.txt', '1_43.txt', '1_44.txt', '1_45.txt', '1_46.txt', '1_47.txt', '1_48.txt', '1_49.txt', '1_5.txt', '1_50.txt', '1_51.txt', '1_52.txt', '1_53.txt', '1_54.txt', '1_55.txt', '1_56.txt', '1_57.txt', '1_58.txt', '1_59.txt', '1_6.txt', '1_60.txt', '1_61.txt', '1_62.txt', '1_63.txt', '1_64.txt', '1_65.txt', '1_66.txt', '1_67.txt', '1_68.txt', '1_69.txt', '1_7.txt', '1_70.txt', '1_71.txt', '1_72.txt', '1_73.txt', '1_74.txt', '1_75.txt', '1_76.txt', '1_77.txt', '1_78.txt', '1_79.txt', '1_8.txt', '1_80.txt', '1_81.txt', '1_82.txt', '1_83.txt', '1_84.txt', '1_85.txt', '1_86.txt', '1_87.txt', '1_88.txt', '1_89.txt', '1_9.txt', '1_90.txt', '1_91.txt', '1_92.txt', '1_93.txt', '1_94.txt', '1_95.txt', '1_96.txt', '1_97.txt', '1_98.txt', '1_99.txt', '2_0.txt', '2_1.txt', '2_10.txt', '2_100.txt', '2_101.txt', '2_102.txt', '2_103.txt', '2_104.txt', '2_105.txt', '2_106.txt', '2_107.txt', '2_108.txt', '2_109.txt', '2_11.txt', '2_110.txt', '2_111.txt', '2_112.txt', '2_113.txt', '2_114.txt', '2_115.txt', '2_116.txt', '2_117.txt', '2_118.txt', '2_119.txt', '2_12.txt', '2_120.txt', '2_121.txt', '2_122.txt', '2_123.txt', '2_124.txt', '2_125.txt', '2_126.txt', '2_127.txt', '2_128.txt', '2_129.txt', '2_13.txt', '2_130.txt', '2_131.txt', '2_132.txt', '2_133.txt', '2_134.txt', '2_135.txt', '2_136.txt', '2_137.txt', '2_138.txt', '2_139.txt', '2_14.txt', '2_140.txt', '2_141.txt', '2_142.txt', '2_143.txt', '2_144.txt', '2_145.txt', '2_146.txt', '2_147.txt', '2_148.txt', '2_149.txt', '2_15.txt', '2_150.txt', '2_151.txt', '2_152.txt', '2_153.txt', '2_154.txt', '2_155.txt', '2_156.txt', '2_157.txt', '2_158.txt', '2_159.txt', '2_16.txt', '2_160.txt', '2_161.txt', '2_162.txt', '2_163.txt', '2_164.txt', '2_165.txt', '2_166.txt', '2_167.txt', '2_168.txt', '2_169.txt', '2_17.txt', '2_170.txt', '2_171.txt', '2_172.txt', '2_173.txt', '2_174.txt', '2_175.txt', '2_176.txt', '2_177.txt', '2_178.txt', '2_179.txt', '2_18.txt', '2_180.txt', '2_181.txt', '2_182.txt', '2_183.txt', '2_184.txt', '2_185.txt', '2_186.txt', '2_187.txt', '2_188.txt', '2_189.txt', '2_19.txt', '2_190.txt', '2_191.txt', '2_192.txt', '2_193.txt', '2_194.txt', '2_2.txt', '2_20.txt', '2_21.txt', '2_22.txt', '2_23.txt', '2_24.txt', '2_25.txt', '2_26.txt', '2_27.txt', '2_28.txt', '2_29.txt', '2_3.txt', '2_30.txt', '2_31.txt', '2_32.txt', '2_33.txt', '2_34.txt', '2_35.txt', '2_36.txt', '2_37.txt', '2_38.txt', '2_39.txt', '2_4.txt', '2_40.txt', '2_41.txt', '2_42.txt', '2_43.txt', '2_44.txt', '2_45.txt', '2_46.txt', '2_47.txt', '2_48.txt', '2_49.txt', '2_5.txt', '2_50.txt', '2_51.txt', '2_52.txt', '2_53.txt', '2_54.txt', '2_55.txt', '2_56.txt', '2_57.txt', '2_58.txt', '2_59.txt', '2_6.txt', '2_60.txt', '2_61.txt', '2_62.txt', '2_63.txt', '2_64.txt', '2_65.txt', '2_66.txt', '2_67.txt', '2_68.txt', '2_69.txt', '2_7.txt', '2_70.txt', '2_71.txt', '2_72.txt', '2_73.txt', '2_74.txt', '2_75.txt', '2_76.txt', '2_77.txt', '2_78.txt', '2_79.txt', '2_8.txt', '2_80.txt', '2_81.txt', '2_82.txt', '2_83.txt', '2_84.txt', '2_85.txt', '2_86.txt', '2_87.txt', '2_88.txt', '2_89.txt', '2_9.txt', '2_90.txt', '2_91.txt', '2_92.txt', '2_93.txt', '2_94.txt', '2_95.txt', '2_96.txt', '2_97.txt', '2_98.txt', '2_99.txt', '3_0.txt', '3_1.txt', '3_10.txt', '3_100.txt', '3_101.txt', '3_102.txt', '3_103.txt', '3_104.txt', '3_105.txt', '3_106.txt', '3_107.txt', '3_108.txt', '3_109.txt', '3_11.txt', '3_110.txt', '3_111.txt', '3_112.txt', '3_113.txt', '3_114.txt', '3_115.txt', '3_116.txt', '3_117.txt', '3_118.txt', '3_119.txt', '3_12.txt', '3_120.txt', '3_121.txt', '3_122.txt', '3_123.txt', '3_124.txt', '3_125.txt', '3_126.txt', '3_127.txt', '3_128.txt', '3_129.txt', '3_13.txt', '3_130.txt', '3_131.txt', '3_132.txt', '3_133.txt', '3_134.txt', '3_135.txt', '3_136.txt', '3_137.txt', '3_138.txt', '3_139.txt', '3_14.txt', '3_140.txt', '3_141.txt', '3_142.txt', '3_143.txt', '3_144.txt', '3_145.txt', '3_146.txt', '3_147.txt', '3_148.txt', '3_149.txt', '3_15.txt', '3_150.txt', '3_151.txt', '3_152.txt', '3_153.txt', '3_154.txt', '3_155.txt', '3_156.txt', '3_157.txt', '3_158.txt', '3_159.txt', '3_16.txt', '3_160.txt', '3_161.txt', '3_162.txt', '3_163.txt', '3_164.txt', '3_165.txt', '3_166.txt', '3_167.txt', '3_168.txt', '3_169.txt', '3_17.txt', '3_170.txt', '3_171.txt', '3_172.txt', '3_173.txt', '3_174.txt', '3_175.txt', '3_176.txt', '3_177.txt', '3_178.txt', '3_179.txt', '3_18.txt', '3_180.txt', '3_181.txt', '3_182.txt', '3_183.txt', '3_184.txt', '3_185.txt', '3_186.txt', '3_187.txt', '3_188.txt', '3_189.txt', '3_19.txt', '3_190.txt', '3_191.txt', '3_192.txt', '3_193.txt', '3_194.txt', '3_195.txt', '3_196.txt', '3_197.txt', '3_198.txt', '3_2.txt', '3_20.txt', '3_21.txt', '3_22.txt', '3_23.txt', '3_24.txt', '3_25.txt', '3_26.txt', '3_27.txt', '3_28.txt', '3_29.txt', '3_3.txt', '3_30.txt', '3_31.txt', '3_32.txt', '3_33.txt', '3_34.txt', '3_35.txt', '3_36.txt', '3_37.txt', '3_38.txt', '3_39.txt', '3_4.txt', '3_40.txt', '3_41.txt', '3_42.txt', '3_43.txt', '3_44.txt', '3_45.txt', '3_46.txt', '3_47.txt', '3_48.txt', '3_49.txt', '3_5.txt', '3_50.txt', '3_51.txt', '3_52.txt', '3_53.txt', '3_54.txt', '3_55.txt', '3_56.txt', '3_57.txt', '3_58.txt', '3_59.txt', '3_6.txt', '3_60.txt', '3_61.txt', '3_62.txt', '3_63.txt', '3_64.txt', '3_65.txt', '3_66.txt', '3_67.txt', '3_68.txt', '3_69.txt', '3_7.txt', '3_70.txt', '3_71.txt', '3_72.txt', '3_73.txt', '3_74.txt', '3_75.txt', '3_76.txt', '3_77.txt', '3_78.txt', '3_79.txt', '3_8.txt', '3_80.txt', '3_81.txt', '3_82.txt', '3_83.txt', '3_84.txt', '3_85.txt', '3_86.txt', '3_87.txt', '3_88.txt', '3_89.txt', '3_9.txt', '3_90.txt', '3_91.txt', '3_92.txt', '3_93.txt', '3_94.txt', '3_95.txt', '3_96.txt', '3_97.txt', '3_98.txt', '3_99.txt', '4_0.txt', '4_1.txt', '4_10.txt', '4_100.txt', '4_101.txt', '4_102.txt', '4_103.txt', '4_104.txt', '4_105.txt', '4_106.txt', '4_107.txt', '4_108.txt', '4_109.txt', '4_11.txt', '4_110.txt', '4_111.txt', '4_112.txt', '4_113.txt', '4_114.txt', '4_115.txt', '4_116.txt', '4_117.txt', '4_118.txt', '4_119.txt', '4_12.txt', '4_120.txt', '4_121.txt', '4_122.txt', '4_123.txt', '4_124.txt', '4_125.txt', '4_126.txt', '4_127.txt', '4_128.txt', '4_129.txt', '4_13.txt', '4_130.txt', '4_131.txt', '4_132.txt', '4_133.txt', '4_134.txt', '4_135.txt', '4_136.txt', '4_137.txt', '4_138.txt', '4_139.txt', '4_14.txt', '4_140.txt', '4_141.txt', '4_142.txt', '4_143.txt', '4_144.txt', '4_145.txt', '4_146.txt', '4_147.txt', '4_148.txt', '4_149.txt', '4_15.txt', '4_150.txt', '4_151.txt', '4_152.txt', '4_153.txt', '4_154.txt', '4_155.txt', '4_156.txt', '4_157.txt', '4_158.txt', '4_159.txt', '4_16.txt', '4_160.txt', '4_161.txt', '4_162.txt', '4_163.txt', '4_164.txt', '4_165.txt', '4_166.txt', '4_167.txt', '4_168.txt', '4_169.txt', '4_17.txt', '4_170.txt', '4_171.txt', '4_172.txt', '4_173.txt', '4_174.txt', '4_175.txt', '4_176.txt', '4_177.txt', '4_178.txt', '4_179.txt', '4_18.txt', '4_180.txt', '4_181.txt', '4_182.txt', '4_183.txt', '4_184.txt', '4_185.txt', '4_19.txt', '4_2.txt', '4_20.txt', '4_21.txt', '4_22.txt', '4_23.txt', '4_24.txt', '4_25.txt', '4_26.txt', '4_27.txt', '4_28.txt', '4_29.txt', '4_3.txt', '4_30.txt', '4_31.txt', '4_32.txt', '4_33.txt', '4_34.txt', '4_35.txt', '4_36.txt', '4_37.txt', '4_38.txt', '4_39.txt', '4_4.txt', '4_40.txt', '4_41.txt', '4_42.txt', '4_43.txt', '4_44.txt', '4_45.txt', '4_46.txt', '4_47.txt', '4_48.txt', '4_49.txt', '4_5.txt', '4_50.txt', '4_51.txt', '4_52.txt', '4_53.txt', '4_54.txt', '4_55.txt', '4_56.txt', '4_57.txt', '4_58.txt', '4_59.txt', '4_6.txt', '4_60.txt', '4_61.txt', '4_62.txt', '4_63.txt', '4_64.txt', '4_65.txt', '4_66.txt', '4_67.txt', '4_68.txt', '4_69.txt', '4_7.txt', '4_70.txt', '4_71.txt', '4_72.txt', '4_73.txt', '4_74.txt', '4_75.txt', '4_76.txt', '4_77.txt', '4_78.txt', '4_79.txt', '4_8.txt', '4_80.txt', '4_81.txt', '4_82.txt', '4_83.txt', '4_84.txt', '4_85.txt', '4_86.txt', '4_87.txt', '4_88.txt', '4_89.txt', '4_9.txt', '4_90.txt', '4_91.txt', '4_92.txt', '4_93.txt', '4_94.txt', '4_95.txt', '4_96.txt', '4_97.txt', '4_98.txt', '4_99.txt', '5_0.txt', '5_1.txt', '5_10.txt', '5_100.txt', '5_101.txt', '5_102.txt', '5_103.txt', '5_104.txt', '5_105.txt', '5_106.txt', '5_107.txt', '5_108.txt', '5_109.txt', '5_11.txt', '5_110.txt', '5_111.txt', '5_112.txt', '5_113.txt', '5_114.txt', '5_115.txt', '5_116.txt', '5_117.txt', '5_118.txt', '5_119.txt', '5_12.txt', '5_120.txt', '5_121.txt', '5_122.txt', '5_123.txt', '5_124.txt', '5_125.txt', '5_126.txt', '5_127.txt', '5_128.txt', '5_129.txt', '5_13.txt', '5_130.txt', '5_131.txt', '5_132.txt', '5_133.txt', '5_134.txt', '5_135.txt', '5_136.txt', '5_137.txt', '5_138.txt', '5_139.txt', '5_14.txt', '5_140.txt', '5_141.txt', '5_142.txt', '5_143.txt', '5_144.txt', '5_145.txt', '5_146.txt', '5_147.txt', '5_148.txt', '5_149.txt', '5_15.txt', '5_150.txt', '5_151.txt', '5_152.txt', '5_153.txt', '5_154.txt', '5_155.txt', '5_156.txt', '5_157.txt', '5_158.txt', '5_159.txt', '5_16.txt', '5_160.txt', '5_161.txt', '5_162.txt', '5_163.txt', '5_164.txt', '5_165.txt', '5_166.txt', '5_167.txt', '5_168.txt', '5_169.txt', '5_17.txt', '5_170.txt', '5_171.txt', '5_172.txt', '5_173.txt', '5_174.txt', '5_175.txt', '5_176.txt', '5_177.txt', '5_178.txt', '5_179.txt', '5_18.txt', '5_180.txt', '5_181.txt', '5_182.txt', '5_183.txt', '5_184.txt', '5_185.txt', '5_186.txt', '5_19.txt', '5_2.txt', '5_20.txt', '5_21.txt', '5_22.txt', '5_23.txt', '5_24.txt', '5_25.txt', '5_26.txt', '5_27.txt', '5_28.txt', '5_29.txt', '5_3.txt', '5_30.txt', '5_31.txt', '5_32.txt', '5_33.txt', '5_34.txt', '5_35.txt', '5_36.txt', '5_37.txt', '5_38.txt', '5_39.txt', '5_4.txt', '5_40.txt', '5_41.txt', '5_42.txt', '5_43.txt', '5_44.txt', '5_45.txt', '5_46.txt', '5_47.txt', '5_48.txt', '5_49.txt', '5_5.txt', '5_50.txt', '5_51.txt', '5_52.txt', '5_53.txt', '5_54.txt', '5_55.txt', '5_56.txt', '5_57.txt', '5_58.txt', '5_59.txt', '5_6.txt', '5_60.txt', '5_61.txt', '5_62.txt', '5_63.txt', '5_64.txt', '5_65.txt', '5_66.txt', '5_67.txt', '5_68.txt', '5_69.txt', '5_7.txt', '5_70.txt', '5_71.txt', '5_72.txt', '5_73.txt', '5_74.txt', '5_75.txt', '5_76.txt', '5_77.txt', '5_78.txt', '5_79.txt', '5_8.txt', '5_80.txt', '5_81.txt', '5_82.txt', '5_83.txt', '5_84.txt', '5_85.txt', '5_86.txt', '5_87.txt', '5_88.txt', '5_89.txt', '5_9.txt', '5_90.txt', '5_91.txt', '5_92.txt', '5_93.txt', '5_94.txt', '5_95.txt', '5_96.txt', '5_97.txt', '5_98.txt', '5_99.txt', '6_0.txt', '6_1.txt', '6_10.txt', '6_100.txt', '6_101.txt', '6_102.txt', '6_103.txt', '6_104.txt', '6_105.txt', '6_106.txt', '6_107.txt', '6_108.txt', '6_109.txt', '6_11.txt', '6_110.txt', '6_111.txt', '6_112.txt', '6_113.txt', '6_114.txt', '6_115.txt', '6_116.txt', '6_117.txt', '6_118.txt', '6_119.txt', '6_12.txt', '6_120.txt', '6_121.txt', '6_122.txt', '6_123.txt', '6_124.txt', '6_125.txt', '6_126.txt', '6_127.txt', '6_128.txt', '6_129.txt', '6_13.txt', '6_130.txt', '6_131.txt', '6_132.txt', '6_133.txt', '6_134.txt', '6_135.txt', '6_136.txt', '6_137.txt', '6_138.txt', '6_139.txt', '6_14.txt', '6_140.txt', '6_141.txt', '6_142.txt', '6_143.txt', '6_144.txt', '6_145.txt', '6_146.txt', '6_147.txt', '6_148.txt', '6_149.txt', '6_15.txt', '6_150.txt', '6_151.txt', '6_152.txt', '6_153.txt', '6_154.txt', '6_155.txt', '6_156.txt', '6_157.txt', '6_158.txt', '6_159.txt', '6_16.txt', '6_160.txt', '6_161.txt', '6_162.txt', '6_163.txt', '6_164.txt', '6_165.txt', '6_166.txt', '6_167.txt', '6_168.txt', '6_169.txt', '6_17.txt', '6_170.txt', '6_171.txt', '6_172.txt', '6_173.txt', '6_174.txt', '6_175.txt', '6_176.txt', '6_177.txt', '6_178.txt', '6_179.txt', '6_18.txt', '6_180.txt', '6_181.txt', '6_182.txt', '6_183.txt', '6_184.txt', '6_185.txt', '6_186.txt', '6_187.txt', '6_188.txt', '6_189.txt', '6_19.txt', '6_190.txt', '6_191.txt', '6_192.txt', '6_193.txt', '6_194.txt', '6_2.txt', '6_20.txt', '6_21.txt', '6_22.txt', '6_23.txt', '6_24.txt', '6_25.txt', '6_26.txt', '6_27.txt', '6_28.txt', '6_29.txt', '6_3.txt', '6_30.txt', '6_31.txt', '6_32.txt', '6_33.txt', '6_34.txt', '6_35.txt', '6_36.txt', '6_37.txt', '6_38.txt', '6_39.txt', '6_4.txt', '6_40.txt', '6_41.txt', '6_42.txt', '6_43.txt', '6_44.txt', '6_45.txt', '6_46.txt', '6_47.txt', '6_48.txt', '6_49.txt', '6_5.txt', '6_50.txt', '6_51.txt', '6_52.txt', '6_53.txt', '6_54.txt', '6_55.txt', '6_56.txt', '6_57.txt', '6_58.txt', '6_59.txt', '6_6.txt', '6_60.txt', '6_61.txt', '6_62.txt', '6_63.txt', '6_64.txt', '6_65.txt', '6_66.txt', '6_67.txt', '6_68.txt', '6_69.txt', '6_7.txt', '6_70.txt', '6_71.txt', '6_72.txt', '6_73.txt', '6_74.txt', '6_75.txt', '6_76.txt', '6_77.txt', '6_78.txt', '6_79.txt', '6_8.txt', '6_80.txt', '6_81.txt', '6_82.txt', '6_83.txt', '6_84.txt', '6_85.txt', '6_86.txt', '6_87.txt', '6_88.txt', '6_89.txt', '6_9.txt', '6_90.txt', '6_91.txt', '6_92.txt', '6_93.txt', '6_94.txt', '6_95.txt', '6_96.txt', '6_97.txt', '6_98.txt', '6_99.txt', '7_0.txt', '7_1.txt', '7_10.txt', '7_100.txt', '7_101.txt', '7_102.txt', '7_103.txt', '7_104.txt', '7_105.txt', '7_106.txt', '7_107.txt', '7_108.txt', '7_109.txt', '7_11.txt', '7_110.txt', '7_111.txt', '7_112.txt', '7_113.txt', '7_114.txt', '7_115.txt', '7_116.txt', '7_117.txt', '7_118.txt', '7_119.txt', '7_12.txt', '7_120.txt', '7_121.txt', '7_122.txt', '7_123.txt', '7_124.txt', '7_125.txt', '7_126.txt', '7_127.txt', '7_128.txt', '7_129.txt', '7_13.txt', '7_130.txt', '7_131.txt', '7_132.txt', '7_133.txt', '7_134.txt', '7_135.txt', '7_136.txt', '7_137.txt', '7_138.txt', '7_139.txt', '7_14.txt', '7_140.txt', '7_141.txt', '7_142.txt', '7_143.txt', '7_144.txt', '7_145.txt', '7_146.txt', '7_147.txt', '7_148.txt', '7_149.txt', '7_15.txt', '7_150.txt', '7_151.txt', '7_152.txt', '7_153.txt', '7_154.txt', '7_155.txt', '7_156.txt', '7_157.txt', '7_158.txt', '7_159.txt', '7_16.txt', '7_160.txt', '7_161.txt', '7_162.txt', '7_163.txt', '7_164.txt', '7_165.txt', '7_166.txt', '7_167.txt', '7_168.txt', '7_169.txt', '7_17.txt', '7_170.txt', '7_171.txt', '7_172.txt', '7_173.txt', '7_174.txt', '7_175.txt', '7_176.txt', '7_177.txt', '7_178.txt', '7_179.txt', '7_18.txt', '7_180.txt', '7_181.txt', '7_182.txt', '7_183.txt', '7_184.txt', '7_185.txt', '7_186.txt', '7_187.txt', '7_188.txt', '7_189.txt', '7_19.txt', '7_190.txt', '7_191.txt', '7_192.txt', '7_193.txt', '7_194.txt', '7_195.txt', '7_196.txt', '7_197.txt', '7_198.txt', '7_199.txt', '7_2.txt', '7_20.txt', '7_200.txt', '7_21.txt', '7_22.txt', '7_23.txt', '7_24.txt', '7_25.txt', '7_26.txt', '7_27.txt', '7_28.txt', '7_29.txt', '7_3.txt', '7_30.txt', '7_31.txt', '7_32.txt', '7_33.txt', '7_34.txt', '7_35.txt', '7_36.txt', '7_37.txt', '7_38.txt', '7_39.txt', '7_4.txt', '7_40.txt', '7_41.txt', '7_42.txt', '7_43.txt', '7_44.txt', '7_45.txt', '7_46.txt', '7_47.txt', '7_48.txt', '7_49.txt', '7_5.txt', '7_50.txt', '7_51.txt', '7_52.txt', '7_53.txt', '7_54.txt', '7_55.txt', '7_56.txt', '7_57.txt', '7_58.txt', '7_59.txt', '7_6.txt', '7_60.txt', '7_61.txt', '7_62.txt', '7_63.txt', '7_64.txt', '7_65.txt', '7_66.txt', '7_67.txt', '7_68.txt', '7_69.txt', '7_7.txt', '7_70.txt', '7_71.txt', '7_72.txt', '7_73.txt', '7_74.txt', '7_75.txt', '7_76.txt', '7_77.txt', '7_78.txt', '7_79.txt', '7_8.txt', '7_80.txt', '7_81.txt', '7_82.txt', '7_83.txt', '7_84.txt', '7_85.txt', '7_86.txt', '7_87.txt', '7_88.txt', '7_89.txt', '7_9.txt', '7_90.txt', '7_91.txt', '7_92.txt', '7_93.txt', '7_94.txt', '7_95.txt', '7_96.txt', '7_97.txt', '7_98.txt', '7_99.txt', '8_0.txt', '8_1.txt', '8_10.txt', '8_100.txt', '8_101.txt', '8_102.txt', '8_103.txt', '8_104.txt', '8_105.txt', '8_106.txt', '8_107.txt', '8_108.txt', '8_109.txt', '8_11.txt', '8_110.txt', '8_111.txt', '8_112.txt', '8_113.txt', '8_114.txt', '8_115.txt', '8_116.txt', '8_117.txt', '8_118.txt', '8_119.txt', '8_12.txt', '8_120.txt', '8_121.txt', '8_122.txt', '8_123.txt', '8_124.txt', '8_125.txt', '8_126.txt', '8_127.txt', '8_128.txt', '8_129.txt', '8_13.txt', '8_130.txt', '8_131.txt', '8_132.txt', '8_133.txt', '8_134.txt', '8_135.txt', '8_136.txt', '8_137.txt', '8_138.txt', '8_139.txt', '8_14.txt', '8_140.txt', '8_141.txt', '8_142.txt', '8_143.txt', '8_144.txt', '8_145.txt', '8_146.txt', '8_147.txt', '8_148.txt', '8_149.txt', '8_15.txt', '8_150.txt', '8_151.txt', '8_152.txt', '8_153.txt', '8_154.txt', '8_155.txt', '8_156.txt', '8_157.txt', '8_158.txt', '8_159.txt', '8_16.txt', '8_160.txt', '8_161.txt', '8_162.txt', '8_163.txt', '8_164.txt', '8_165.txt', '8_166.txt', '8_167.txt', '8_168.txt', '8_169.txt', '8_17.txt', '8_170.txt', '8_171.txt', '8_172.txt', '8_173.txt', '8_174.txt', '8_175.txt', '8_176.txt', '8_177.txt', '8_178.txt', '8_179.txt', '8_18.txt', '8_19.txt', '8_2.txt', '8_20.txt', '8_21.txt', '8_22.txt', '8_23.txt', '8_24.txt', '8_25.txt', '8_26.txt', '8_27.txt', '8_28.txt', '8_29.txt', '8_3.txt', '8_30.txt', '8_31.txt', '8_32.txt', '8_33.txt', '8_34.txt', '8_35.txt', '8_36.txt', '8_37.txt', '8_38.txt', '8_39.txt', '8_4.txt', '8_40.txt', '8_41.txt', '8_42.txt', '8_43.txt', '8_44.txt', '8_45.txt', '8_46.txt', '8_47.txt', '8_48.txt', '8_49.txt', '8_5.txt', '8_50.txt', '8_51.txt', '8_52.txt', '8_53.txt', '8_54.txt', '8_55.txt', '8_56.txt', '8_57.txt', '8_58.txt', '8_59.txt', '8_6.txt', '8_60.txt', '8_61.txt', '8_62.txt', '8_63.txt', '8_64.txt', '8_65.txt', '8_66.txt', '8_67.txt', '8_68.txt', '8_69.txt', '8_7.txt', '8_70.txt', '8_71.txt', '8_72.txt', '8_73.txt', '8_74.txt', '8_75.txt', '8_76.txt', '8_77.txt', '8_78.txt', '8_79.txt', '8_8.txt', '8_80.txt', '8_81.txt', '8_82.txt', '8_83.txt', '8_84.txt', '8_85.txt', '8_86.txt', '8_87.txt', '8_88.txt', '8_89.txt', '8_9.txt', '8_90.txt', '8_91.txt', '8_92.txt', '8_93.txt', '8_94.txt', '8_95.txt', '8_96.txt', '8_97.txt', '8_98.txt', '8_99.txt', '9_0.txt', '9_1.txt', '9_10.txt', '9_100.txt', '9_101.txt', '9_102.txt', '9_103.txt', '9_104.txt', '9_105.txt', '9_106.txt', '9_107.txt', '9_108.txt', '9_109.txt', '9_11.txt', '9_110.txt', '9_111.txt', '9_112.txt', '9_113.txt', '9_114.txt', '9_115.txt', '9_116.txt', '9_117.txt', '9_118.txt', '9_119.txt', '9_12.txt', '9_120.txt', '9_121.txt', '9_122.txt', '9_123.txt', '9_124.txt', '9_125.txt', '9_126.txt', '9_127.txt', '9_128.txt', '9_129.txt', '9_13.txt', '9_130.txt', '9_131.txt', '9_132.txt', '9_133.txt', '9_134.txt', '9_135.txt', '9_136.txt', '9_137.txt', '9_138.txt', '9_139.txt', '9_14.txt', '9_140.txt', '9_141.txt', '9_142.txt', '9_143.txt', '9_144.txt', '9_145.txt', '9_146.txt', '9_147.txt', '9_148.txt', '9_149.txt', '9_15.txt', '9_150.txt', '9_151.txt', '9_152.txt', '9_153.txt', '9_154.txt', '9_155.txt', '9_156.txt', '9_157.txt', '9_158.txt', '9_159.txt', '9_16.txt', '9_160.txt', '9_161.txt', '9_162.txt', '9_163.txt', '9_164.txt', '9_165.txt', '9_166.txt', '9_167.txt', '9_168.txt', '9_169.txt', '9_17.txt', '9_170.txt', '9_171.txt', '9_172.txt', '9_173.txt', '9_174.txt', '9_175.txt', '9_176.txt', '9_177.txt', '9_178.txt', '9_179.txt', '9_18.txt', '9_180.txt', '9_181.txt', '9_182.txt', '9_183.txt', '9_184.txt', '9_185.txt', '9_186.txt', '9_187.txt', '9_188.txt', '9_189.txt', '9_19.txt', '9_190.txt', '9_191.txt', '9_192.txt', '9_193.txt', '9_194.txt', '9_195.txt', '9_196.txt', '9_197.txt', '9_198.txt', '9_199.txt', '9_2.txt', '9_20.txt', '9_200.txt', '9_201.txt', '9_202.txt', '9_203.txt', '9_21.txt', '9_22.txt', '9_23.txt', '9_24.txt', '9_25.txt', '9_26.txt', '9_27.txt', '9_28.txt', '9_29.txt', '9_3.txt', '9_30.txt', '9_31.txt', '9_32.txt', '9_33.txt', '9_34.txt', '9_35.txt', '9_36.txt', '9_37.txt', '9_38.txt', '9_39.txt', '9_4.txt', '9_40.txt', '9_41.txt', '9_42.txt', '9_43.txt', '9_44.txt', '9_45.txt', '9_46.txt', '9_47.txt', '9_48.txt', '9_49.txt', '9_5.txt', '9_50.txt', '9_51.txt', '9_52.txt', '9_53.txt', '9_54.txt', '9_55.txt', '9_56.txt', '9_57.txt', '9_58.txt', '9_59.txt', '9_6.txt', '9_60.txt', '9_61.txt', '9_62.txt', '9_63.txt', '9_64.txt', '9_65.txt', '9_66.txt', '9_67.txt', '9_68.txt', '9_69.txt', '9_7.txt', '9_70.txt', '9_71.txt', '9_72.txt', '9_73.txt', '9_74.txt', '9_75.txt', '9_76.txt', '9_77.txt', '9_78.txt', '9_79.txt', '9_8.txt', '9_80.txt', '9_81.txt', '9_82.txt', '9_83.txt', '9_84.txt', '9_85.txt', '9_86.txt', '9_87.txt', '9_88.txt', '9_89.txt', '9_9.txt', '9_90.txt', '9_91.txt', '9_92.txt', '9_93.txt', '9_94.txt', '9_95.txt', '9_96.txt', '9_97.txt', '9_98.txt', '9_99.txt']

3.完整的代码:

import numpy as np
from os import listdir
import operator

def img2vector(filename):
    """
    将图像数据转换为向量
    
    filename:图片文件,因为我们的输入数据的图片格式是32*32
    return:一维矩阵
    
    该函数将图像转换为向量:该函数创建1*1024的numpy数组,然后打开给定的文件,
    循环读出文件的前32行,并将每行的头32个字符值存储在numpy数组中,最后返回数组.
    """
    returnVect = np.zeros((1,1024))
    fr=open(filename)
    for i in range(32):
        lineStr = fr.readline()
        for j in range(32):
            returnVect[0,32*i+j] = int(lineStr[j])
    return returnVect

def classify0(inX,dataSet,labels,k):
    """
    inX:用于分类的输入向量
    dataSet:输入的训练样本集
    lables:标签向量
    k:表示用于选择最近邻居的数目
    
    预测数据所在分类可在输入下列命令
    kNN.classify0([0,0], group, labels, 3)
    """
    # array的shape函数返回指定维度的大小,如dataset为n*m的矩阵,
    # 则dataset.shape[0]返回n,dataset.shape[1]返回m,dataset.shape返回n,m
    dataSetSize = dataSet.shape[0]
    # tile函数简单的理解,它的功能是重复某个数组。比如tile(A,n),功能是将数组A重复n次,构成一个新的数组
    # 所以此处tile(inX,(dataSetSize,1))的作用是将inX重复复制dataSetSize次,以便与训练样本集的样本个数一致
    # 减去dataSet就是求出其差值,所以diffMat为一个差值矩阵
    diffmat=np.tile(inX,(dataSetSize,1))-dataSet
    #距离度量,度量公式为欧氏距离
    sqdiffmat=diffmat**2
    # 将矩阵的每一行相加,axis用于控制是行相加还是列相加
    sqdistances=sqdiffmat.sum(axis=1)
    #开方
    distances=sqdistances**0.5
    # 根据距离排序从小到大的排序,返回对应的索引位置
    sortedDistIndicies=distances.argsort()
    # 选择距离最小的k个点
    classcount={}
   
    for i in range(k):
        # 找到该样本标签的类型
        voteIlabel=labels[sortedDistIndicies[i]]
        # 字典的get方法,list.get(k,d) 其中 get相当于一条if...else...语句,参数k在字典中,字典将返回list[k];如果参数k不在字典中则返回参数d
        classcount[voteIlabel]=classcount.get(voteIlabel,0)+1
        # 字典的 items() 方法,以列表返回可遍历的(键,值)元组数组。
        # sorted 中的第2个参数 key=operator.itemgetter(1) 这个参数的意思是先比较第几个元素
        sortedClasscount = sorted(classcount.items(),key=operator.itemgetter(1),reverse=True)
        # 返回最符合的标签
        return sortedClasscount[0][0]

def handwritingClassTest():
    """
    手写字体识别模块:训练集和测试集
    """
    #训练集:每个文件中的数据进行识别
    
    # 存放训练集labels
    hwLabels = []
    # 以列表形式获取trainingDigits文件夹所有文件名称
    trainingFileList = listdir('trainingDigits')
    # trainingDigits文件夹所有文件个数
    m = len(trainingFileList)
    # 创建m行1024列,0矩阵
    trainingMat = np.zeros((m,1024))
    for i in range(m):
        # 读取 trainingFileList第i个数据文件名称
        fileNameStr = trainingFileList[i]  
        # split文件,通过识别”.“,[0]代表除去后面的,即txt
        fileStr = fileNameStr.split('.')[0] 
        # split文件,通过识别”_”,[0]除去了0_3后面的序号3,保留0
        classNumStr = int(fileStr.split('_')[0])
        # 存放通过文件名称识别出来的labels
        hwLabels.append(classNumStr)
        # 存放不同标签下的具体数据
        # 调用函数img2vector每行放一个1×1024的向量
        trainingMat[i,:]=img2vector('trainingDigits/%s'\
                                      % fileNameStr)
        
    # 测试集:每个文件中的数据进行识别,得出参考向量inX
    testFileList = listdir('testDigits')
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        # 调用k近邻算法
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d,the real answer is: %d" % (classifierResult, classNumStr))
        if(classifierResult != classNumStr):errorCount += 1.0
    print("\nthe totao number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount/float(mTest)))

你可能感兴趣的:(2-7节 k-近邻算法|手写识别系统|机器学习实战-学习笔记)