Programming Exercise 7: K-means Clustering and Principal Component Analysis (Part 2: PCA)

        Hello everyone, I'm Mac Jiang. Today I'd like to share my implementation of Part 2, PCA (Principal Component Analysis), of Coursera - Stanford University - Machine Learning - Programming Exercise 7: K-means Clustering and Principal Component Analysis. The first part was covered in an earlier post: http://blog.csdn.net/a1015553840/article/details/50877623. This second part focuses on the implementation of PCA. My code is correct, but it is not necessarily the best solution; if you have better ideas, please leave a comment so we can discuss and learn from each other. You are welcome to repost this article, but please credit the source. Thanks!


        PCA (Principal Component Analysis) is an algorithm for reducing the dimensionality of data. Dimensionality reduction has three main benefits: (1) it reduces storage space; (2) it shrinks the amount of data, speeding up learning algorithms; (3) it can reduce data to 3D or 2D so it can be visualized.

        This exercise consists of two main parts:

        (1) The first part demonstrates the theory and implementation of PCA.

        [Figure 1]                       [Figure 2]

        (2) The second part uses PCA to reduce the dimensionality of face images, and then reconstructs the high-dimensional images from the resulting low-dimensional data.

         [Figure 3]       [Figure 4]


Data sets: ex7data1.mat --- data for the first part of the exercise (demonstrating the PCA theory)

              ex7faces.mat --- data for the second part of the exercise (dimensionality reduction of face images)

    Functions: ex7_pca.m --- driver script that runs the PCA exercise step by step

              pca.m --- implementation of PCA; code needs to be completed!

              projectData.m --- projects n-dimensional data down to K dimensions using the PCA results; code needs to be completed!

              recoverData.m --- reconstructs n-dimensional data from the low-dimensional (K-dimensional) data; code needs to be completed!

              The exercise mainly uses these four files, three of which require you to complete the code.


        The steps PCA uses to reduce data from n dimensions to K dimensions are:

            (1) Preprocess the data: feature scaling and mean normalization (a sketch of this helper appears after this list).

               Mean normalization: compute the mean mu of the data and subtract it from every example (x - mu), so the data is centered around 0.

               Feature scaling: since different features can differ in magnitude by orders of magnitude, scale them to a comparable range, ideally within [-1, 1].

             (2) Compute the covariance matrix: sigma = 1/m * X' * X

             (3) Compute the eigenvectors of the covariance matrix sigma: [U, S, V] = svd(sigma). Here U is the matrix of eigenvectors; to reduce to K dimensions, simply take its first K columns, U_reduce = U(:, 1:K). S is a diagonal matrix used to compute how much variance is retained.

             (4) Project the data down to K dimensions: Z = X * U_reduce

             (5) Reconstruct the data back to n dimensions: X_approx = Z * U_reduce'

          Note that each column of U is an eigenvector of the covariance matrix sigma, and the eigenvector matrix is orthogonal: U' * U = I and U * U' = I.
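
        The preprocessing in step (1) is handled by the provided featureNormalize.m. As a minimal sketch of what such a helper does (featureNormalizeSketch is a hypothetical name; the file shipped with the exercise may differ in details):

function [X_norm, mu, sigma] = featureNormalizeSketch(X)
% Center each feature at zero and scale it by its standard deviation.
mu = mean(X);                              % per-feature mean
X_norm = bsxfun(@minus, X, mu);            % mean normalization: x - mu
sigma = std(X_norm);                       % per-feature standard deviation
X_norm = bsxfun(@rdivide, X_norm, sigma);  % feature scaling to a comparable range
end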


1. Walkthrough of each part of ex7_pca.m

%% Initialization
clear ; close all; clc
%% ================== Part 1: Load Example Dataset  ===================
%  We start this exercise by using a small dataset that is easy to
%  visualize
%
fprintf('Visualizing example dataset for PCA.\n\n');
%  The following command loads the dataset. You should now have the 
%  variable X in your environment
load ('ex7data1.mat');

%  Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;
fprintf('Program paused. Press enter to continue.\n');
pause;

%% =============== Part 2: Principal Component Analysis ===============
%  You should now implement PCA, a dimension reduction technique. You
%  should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');
%  Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);      % feature scaling and mean normalization
%  Run PCA
[U, S] = pca(X_norm);
%  Compute mu, the mean of each feature

%  Draw the eigenvectors centered at mean of data. These lines show the
%  directions of maximum variations in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;

fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Dimension Reduction ===================
%  You should now implement the projection step to map the data onto the 
%  first k eigenvectors. The code will then plot the data in this reduced 
%  dimensional space.  This will show you what the data looks like when 
%  using only the corresponding eigenvectors to reconstruct it.
%
%  You should complete the code in projectData.m
%
fprintf('\nDimension reduction on example dataset.\n\n');

%  Plot the normalized dataset (returned from pca)
plot(X_norm(:, 1), X_norm(:, 2), 'bo');
axis([-4 3 -4 3]); axis square

%  Project the data onto K = 1 dimension
K = 1;
Z = projectData(X_norm, U, K);
fprintf('Projection of the first example: %f\n', Z(1));
fprintf('\n(this value should be about 1.481274)\n\n');

X_rec  = recoverData(Z, U, K);
fprintf('Approximation of the first example: %f %f\n', X_rec(1, 1), X_rec(1, 2));
fprintf('\n(this value should be about  -1.047419 -1.047419)\n\n');

%  Draw lines connecting the projected points to the original points
hold on;
plot(X_rec(:, 1), X_rec(:, 2), 'ro');
for i = 1:size(X_norm, 1)
    drawLine(X_norm(i,:), X_rec(i,:), '--k', 'LineWidth', 1);
end
hold off
fprintf('Program paused. Press enter to continue.\n');
pause;

%% =============== Part 4: Loading and Visualizing Face Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  The following code will load the dataset into your environment
%
fprintf('\nLoading face dataset.\n\n');

%  Load Face dataset
load ('ex7faces.mat')

%  Display the first 100 faces in the dataset
displayData(X(1:100, :));

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =========== Part 5: PCA on Face Data: Eigenfaces  ===================
%  Run PCA and visualize the eigenvectors which are in this case eigenfaces
%  We display the first 36 eigenfaces.
%
fprintf(['\nRunning PCA on face dataset.\n' ...
         '(this might take a minute or two ...)\n\n']);

%  Before running PCA, it is important to first normalize X by subtracting 
%  the mean value from each feature
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);

%  Visualize the top 36 eigenvectors found
displayData(U(:, 1:36)');            % display the first 36 eigenfaces (each eigenvector is a 1024-dimensional image shown as 32*32)

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 6: Dimension Reduction for Faces =================
%  Project images to the eigen space using the top k eigenvectors 
%  If you are applying a machine learning algorithm 
fprintf('\nDimension reduction for face dataset.\n\n');

K = 100;
Z = projectData(X_norm, U, K);

fprintf('The projected data Z has a size of: ')
fprintf('%d ', size(Z));

fprintf('\n\nProgram paused. Press enter to continue.\n');
pause;

%% ==== Part 7: Visualization of Faces after PCA Dimension Reduction ====
%  Project images to the eigen space using the top K eigen vectors and 
%  visualize only using those K dimensions
%  Compare to the original input, which is also displayed

fprintf('\nVisualizing the projected (reduced dimension) faces.\n\n');

K = 100;
X_rec  = recoverData(Z, U, K);    % reconstruct 1024-dimensional face images from the K = 100 dimensional projections

% Display normalized data
subplot(1, 2, 1);
displayData(X_norm(1:100,:));
title('Original faces');
axis square;

% Display reconstructed data from only k eigenfaces
subplot(1, 2, 2);
displayData(X_rec(1:100,:));
title('Recovered faces');
axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;

        Part 1: Load Example Dataset --- load the sample data for the PCA demonstration and plot it.

        Part 2: Principal Component Analysis --- first preprocess the data (feature scaling, mean normalization), then run PCA to obtain U and S.

        Part 3: Dimension Reduction --- use the first K columns of U (U_reduce) to project the data to a lower dimension (here from 2D down to 1D) and plot the result.

        Part 4: Loading and Visualizing Face Data --- load the face data set and visualize it; each face is a 32*32 = 1024 dimensional vector.

        Part 5: PCA on Face Data: Eigenfaces --- run the completed PCA on the 1024-dimensional faces and display the first 36 eigenfaces (the top 36 eigenvectors).

        Part 6: Dimension Reduction for Faces --- project the face data onto the top K = 100 eigenvectors, producing the 100-dimensional representation Z.

        Part 7: Visualization of Faces after PCA Dimension Reduction --- display the original faces side by side with the faces reconstructed from the 100-dimensional data, so the differences are easy to see.

2. Implementation of pca.m

function [U, S] = pca(X)
%PCA Run principal component analysis on the dataset X
%   [U, S] = pca(X) computes eigenvectors of the covariance matrix of X
%   Returns the eigenvectors U, the eigenvalues (on diagonal) in S
%
% Useful values
[m, n] = size(X);
% You need to return the following variables correctly.
U = zeros(n);
S = zeros(n);
% ====================== YOUR CODE HERE ======================
% Instructions: You should first compute the covariance matrix. Then, you
%               should use the "svd" function to compute the eigenvectors
%               and eigenvalues of the covariance matrix. 
%
% Note: When computing the covariance matrix, remember to divide by m (the
%       number of examples).
%
sigma = X' * X / m;     % compute the covariance matrix
[U,S,V] = svd(sigma);   % use svd to obtain the eigenvector matrix U and the diagonal matrix S of eigenvalues
% =========================================================================
end
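
        Because svd returns an orthogonal matrix, a quick way to sanity-check the output of pca.m is the snippet below (this check is my own addition, not part of the assignment):

[X_norm, mu, sigma] = featureNormalize(X);   % normalize first, as ex7_pca.m does
[U, S] = pca(X_norm);
disp(norm(U' * U - eye(size(U, 1))));        % U' * U should be the identity, so expect a value near 0
disp(diag(S)');                              % eigenvalues, in decreasing order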
3. Implementation of projectData.m
function Z = projectData(X, U, K)
%PROJECTDATA Computes the reduced data representation when projecting only 
%on to the top k eigenvectors
%   Z = projectData(X, U, K) computes the projection of 
%   the normalized inputs X into the reduced dimensional space spanned by
%   the first K columns of U. It returns the projected examples in Z.
%

% You need to return the following variables correctly.
Z = zeros(size(X, 1), K);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the projection of the data using only the top K 
%               eigenvectors in U (first K columns). 
%               For the i-th example X(i,:), the projection on to the k-th 
%               eigenvector is given as follows:
%                    x = X(i, :)';
%                    projection_k = x' * U(:, k);
%
Z = X * U(:,1:K);   % project X onto the first K eigenvectors to get its K-dimensional representation Z
% =============================================================
end
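
        A practical note: the diagonal matrix S returned by pca.m can be used to choose K automatically, for example the smallest K that retains 99% of the variance. A possible sketch (not required by the exercise):

s = diag(S);                        % eigenvalues of the covariance matrix
retained = cumsum(s) / sum(s);      % fraction of variance kept by the top k components
K = find(retained >= 0.99, 1);      % smallest K retaining at least 99% of the variance
Z = projectData(X_norm, U, K);      % project onto that many components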
4. Implementation of recoverData.m
function X_rec = recoverData(Z, U, K)
%RECOVERDATA Recovers an approximation of the original data when using the 
%projected data
%   X_rec = RECOVERDATA(Z, U, K) recovers an approximation of the 
%   original data that has been reduced to K dimensions. It returns the
%   approximate reconstruction in X_rec.
%

% You need to return the following variables correctly.
X_rec = zeros(size(Z, 1), size(U, 1));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the approximation of the data by projecting back
%               onto the original space using the top K eigenvectors in U.
%
%               For the i-th example Z(i,:), the (approximate)
%               recovered data for dimension j is given as follows:
%                    v = Z(i, :)';
%                    recovered_j = v' * U(j, 1:K)';
%
%               Notice that U(j, 1:K) is a row vector.
%               
X_rec = Z * U(:,1:K)';   % reconstruct an approximation of X, mapping from K dimensions back to n dimensions
% =============================================================
end
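
        Putting the three functions together on ex7data1.mat, a round trip with K = 1 (as in Part 3 of ex7_pca.m) might look like the sketch below; the mean squared reconstruction error gives a rough sense of how much information the projection loses:

load('ex7data1.mat');                          % provides X
[X_norm, mu, sigma] = featureNormalize(X);
[U, S] = pca(X_norm);
K = 1;
Z = projectData(X_norm, U, K);                 % n-dimensional -> K-dimensional
X_rec = recoverData(Z, U, K);                  % K-dimensional -> n-dimensional approximation
err = mean(sum((X_norm - X_rec) .^ 2, 2));     % mean squared reconstruction error
fprintf('Mean reconstruction error: %f\n', err);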



From:http://blog.csdn.net/a1015553840/article/details/50879343

