Hello everyone, I'm Mac Jiang. Today I'd like to share the second part of Coursera - Stanford University - Machine Learning - Programming Exercise 7: K-means Clustering and Principal Component Analysis, namely the implementation of PCA (Principal Component Analysis). The implementation of the first part was presented in an earlier blog post: http://blog.csdn.net/a1015553840/article/details/50877623. This second part covers the implementation of PCA. The code I wrote is correct, but it is not necessarily the best; if you have better ideas, please leave a comment so we can discuss and learn from each other. You are of course welcome to repost this article, but please cite the source before doing so. Thanks!
PCA (Principal Component Analysis) is an algorithm for reducing the dimensionality of data. Dimensionality reduction has three main benefits: (1) it reduces storage space; (2) it reduces the amount of data, which speeds up subsequent algorithms; (3) it allows data to be reduced to 3D or 2D so that it can be visualized.
This exercise consists of two main parts:
(1) The first part demonstrates the theory behind PCA and its implementation.
(2) The second part applies PCA to reduce the dimensionality of face images, and then uses the resulting low-dimensional data to reconstruct the high-dimensional images.
Data sets: ex7data1.mat --- data for the first part, demonstrating the theory of PCA
ex7faces.mat --- data for the second part, dimensionality reduction of face images
Files: ex7_pca.m --- driver script for the PCA exercise; it controls the flow of the program
pca.m --- implementation of PCA; code to be completed!
projectData.m --- projects the n-dimensional data down to k dimensions using the PCA output; code to be completed!
recoverData.m --- reconstructs n-dimensional data from the low-dimensional (K-dimensional) data; code to be completed!
The exercise uses these four files; three of them require code to be completed.
The steps of PCA for reducing data from n dimensions to k dimensions are as follows (a minimal end-to-end sketch follows this list):
(1) Preprocess the data: feature scaling + mean normalization.
Mean normalization: compute the mean mu of the data and subtract it from every example, x - mu, so that the data is centered around 0.
Feature scaling: because different features can differ by orders of magnitude, they need to be scaled to the same order of magnitude, ideally within the range [-1, 1].
(2) Compute the covariance matrix: sigma = 1/m * X' * X
(3) Compute the eigenvectors of the covariance matrix sigma: [U, S, V] = svd(sigma). Here U is the matrix of eigenvectors; to reduce to K dimensions, simply take its first K columns, U_reduce = U(:, 1:K). S is a diagonal matrix used to compute how much variance is retained.
(4) Project the data onto K dimensions: Z = X * U_reduce
(5) Reconstruct the data back to n dimensions: X_approx = Z * U_reduce'
Note that every column of U is an eigenvector of the covariance matrix sigma, and the eigenvector matrix is orthogonal: U' * U = I and U * U' = I.
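Putting the five steps together, here is a minimal end-to-end sketch of the pipeline. This is my own illustration rather than part of the exercise files; it assumes X is an m x n data matrix such as the one loaded from ex7data1.mat, and K <= n.

% Minimal PCA pipeline sketch (illustration only)
[m, n] = size(X);

% (1) Mean normalization + feature scaling
mu = mean(X);
sd = std(X);
X_norm = (X - repmat(mu, m, 1)) ./ repmat(sd, m, 1);

% (2) Covariance matrix
Sigma = (1 / m) * (X_norm' * X_norm);

% (3) Eigenvectors via SVD
[U, S, V] = svd(Sigma);

% (4) Project onto the top K eigenvectors
K = 1;
U_reduce = U(:, 1:K);
Z = X_norm * U_reduce;

% (5) Reconstruct an approximation in the original n dimensions
X_approx = Z * U_reduce';

% Fraction of variance retained by the first K components, read off diag(S)
s = diag(S);
variance_retained = sum(s(1:K)) / sum(s);

The last two lines show how the diagonal matrix S mentioned in step (3) is used: the ratio of the first K diagonal entries to their total is the fraction of variance retained by the projection.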
1. ex7_pca.m: what each part does
%% Initialization
clear ; close all; clc

%% ================== Part 1: Load Example Dataset  ===================
%  We start this exercise by using a small dataset that is easy to
%  visualize
%
fprintf('Visualizing example dataset for PCA.\n\n');

%  The following command loads the dataset. You should now have the
%  variable X in your environment
load ('ex7data1.mat');

%  Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =============== Part 2: Principal Component Analysis ===============
%  You should now implement PCA, a dimension reduction technique. You
%  should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');

%  Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);   % feature scaling and mean normalization

%  Run PCA
[U, S] = pca(X_norm);

%  Compute mu, the mean of each feature

%  Draw the eigenvectors centered at mean of data. These lines show the
%  directions of maximum variations in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;

fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Dimension Reduction ===================
%  You should now implement the projection step to map the data onto the
%  first k eigenvectors. The code will then plot the data in this reduced
%  dimensional space. This will show you what the data looks like when
%  using only the corresponding eigenvectors to reconstruct it.
%
%  You should complete the code in projectData.m
%
fprintf('\nDimension reduction on example dataset.\n\n');

%  Plot the normalized dataset (returned from pca)
plot(X_norm(:, 1), X_norm(:, 2), 'bo');
axis([-4 3 -4 3]); axis square

%  Project the data onto K = 1 dimension
K = 1;
Z = projectData(X_norm, U, K);
fprintf('Projection of the first example: %f\n', Z(1));
fprintf('\n(this value should be about 1.481274)\n\n');

X_rec = recoverData(Z, U, K);
fprintf('Approximation of the first example: %f %f\n', X_rec(1, 1), X_rec(1, 2));
fprintf('\n(this value should be about  -1.047419 -1.047419)\n\n');

%  Draw lines connecting the projected points to the original points
hold on;
plot(X_rec(:, 1), X_rec(:, 2), 'ro');
for i = 1:size(X_norm, 1)
    drawLine(X_norm(i,:), X_rec(i,:), '--k', 'LineWidth', 1);
end
hold off

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =============== Part 4: Loading and Visualizing Face Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  The following code will load the dataset into your environment
%
fprintf('\nLoading face dataset.\n\n');

%  Load Face dataset
load ('ex7faces.mat')

%  Display the first 100 faces in the dataset
displayData(X(1:100, :));

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =========== Part 5: PCA on Face Data: Eigenfaces  ===================
%  Run PCA and visualize the eigenvectors which are in this case eigenfaces
%  We display the first 36 eigenfaces.
%
fprintf(['\nRunning PCA on face dataset.\n' ...
         '(this might take a minute or two ...)\n\n']);

%  Before running PCA, it is important to first normalize X by subtracting
%  the mean value from each feature
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);

%  Visualize the top 36 eigenvectors found
displayData(U(:, 1:36)');   % display the top 36 eigenfaces, each a 32*32 = 1024 dimensional vector

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 6: Dimension Reduction for Faces =================
%  Project images to the eigen space using the top k eigenvectors
%  If you are applying a machine learning algorithm
fprintf('\nDimension reduction for face dataset.\n\n');

K = 100;
Z = projectData(X_norm, U, K);

fprintf('The projected data Z has a size of: ')
fprintf('%d ', size(Z));

fprintf('\n\nProgram paused. Press enter to continue.\n');
pause;

%% ==== Part 7: Visualization of Faces after PCA Dimension Reduction ====
%  Project images to the eigen space using the top K eigen vectors and
%  visualize only using those K dimensions
%  Compare to the original input, which is also displayed

fprintf('\nVisualizing the projected (reduced dimension) faces.\n\n');

K = 100;
X_rec = recoverData(Z, U, K);   % reconstruct the 100-dimensional data back into 1024-dimensional face images

% Display normalized data
subplot(1, 2, 1);
displayData(X_norm(1:100,:));
title('Original faces');
axis square;

% Display reconstructed data from only k eigenfaces
subplot(1, 2, 2);
displayData(X_rec(1:100,:));
title('Recovered faces');
axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;
Part 1: Load Example Dataset --- load the sample data for the PCA demonstration and plot it
Part 2: Principal Component Analysis --- first preprocess the data (feature scaling, mean normalization), then run PCA to obtain U and S
Part 3: Dimension Reduction --- use the first K columns of U (U_reduce) to reduce the dimensionality of the data (here from 2D to 1D) and plot the result
Part 4: Loading and Visualizing Face Data --- load the face data and visualize it; each face image is 32*32 = 1024 dimensional
Part 5: PCA on Face Data: Eigenfaces --- run the completed PCA on the 1024-dimensional face data and display the top 36 eigenvectors (eigenfaces)
Part 6: Dimension Reduction for Faces --- project the 1024-dimensional faces onto the top K = 100 eigenvectors, producing 100-dimensional data Z
Part 7: Visualization of Faces after PCA Dimension Reduction --- reconstruct the 1024-dimensional images from Z and display them next to the originals so the difference between them is easy to see
2. Implementation of pca.m
function [U, S] = pca(X)
%PCA Run principal component analysis on the dataset X
%   [U, S, X] = pca(X) computes eigenvectors of the covariance matrix of X
%   Returns the eigenvectors U, the eigenvalues (on diagonal) in S
%

% Useful values
[m, n] = size(X);

% You need to return the following variables correctly.
U = zeros(n);
S = zeros(n);

% ====================== YOUR CODE HERE ======================
% Instructions: You should first compute the covariance matrix. Then, you
%               should use the "svd" function to compute the eigenvectors
%               and eigenvalues of the covariance matrix.
%
% Note: When computing the covariance matrix, remember to divide by m (the
%       number of examples).
%

sigma = X' * X / m;       % compute the covariance matrix
[U, S, V] = svd(sigma);   % use svd to obtain the eigenvector matrix U and the diagonal matrix S

% =========================================================================

end
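A quick way to sanity-check this implementation (my own snippet, not part of the exercise files) is to run it on the normalized example data: ex7_pca.m expects the top eigenvector to be about (-0.707107, -0.707107), and U should be orthogonal.

load('ex7data1.mat');
[X_norm, mu, sigma] = featureNormalize(X);
[U, S] = pca(X_norm);
disp(U(:, 1));                              % should be roughly [-0.707107; -0.707107]
disp(max(max(abs(U' * U - eye(size(U, 1))))));   % should be close to 0, since U is orthogonal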
3. Implementation of projectData.m
function Z = projectData(X, U, K)
%PROJECTDATA Computes the reduced data representation when projecting only
%on to the top k eigenvectors
%   Z = projectData(X, U, K) computes the projection of
%   the normalized inputs X into the reduced dimensional space spanned by
%   the first K columns of U. It returns the projected examples in Z.
%

% You need to return the following variables correctly.
Z = zeros(size(X, 1), K);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the projection of the data using only the top K
%               eigenvectors in U (first K columns).
%               For the i-th example X(i,:), the projection on to the k-th
%               eigenvector is given as follows:
%                    x = X(i, :)';
%                    projection_k = x' * U(:, k);
%

Z = X * U(:, 1:K);   % representation of X in the reduced K-dimensional space

% =============================================================

end
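The instructions comment describes the projection one example and one eigenvector at a time; the single line Z = X * U(:, 1:K) is simply the vectorized form of that loop. A small sketch of my own (reusing X_norm and U from the previous snippet) showing that the two agree:

% Loop form, following the comment in projectData.m
K = 1;
Z_loop = zeros(size(X_norm, 1), K);
for i = 1:size(X_norm, 1)
    x = X_norm(i, :)';
    for k = 1:K
        Z_loop(i, k) = x' * U(:, k);
    end
end

% Vectorized form used above
Z_vec = X_norm * U(:, 1:K);

disp(max(max(abs(Z_loop - Z_vec))));   % should be essentially 0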
4. Implementation of recoverData.m
function X_rec = recoverData(Z, U, K)
%RECOVERDATA Recovers an approximation of the original data when using the
%projected data
%   X_rec = RECOVERDATA(Z, U, K) recovers an approximation the
%   original data that has been reduced to K dimensions. It returns the
%   approximate reconstruction in X_rec.
%

% You need to return the following variables correctly.
X_rec = zeros(size(Z, 1), size(U, 1));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the approximation of the data by projecting back
%               onto the original space using the top K eigenvectors in U.
%
%               For the i-th example Z(i,:), the (approximate)
%               recovered data for dimension j is given as follows:
%                    v = Z(i, :)';
%                    recovered_j = v' * U(j, 1:K)';
%
%               Notice that U(j, 1:K) is a row vector.
%

X_rec = Z * U(:, 1:K)';   % reconstruct X: map the K-dimensional data back to n dimensions

% =============================================================

end
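As a final round-trip check (again my own snippet, reusing X_norm, U and S from the pca.m snippet above), ex7_pca.m expects the recovered first example to be about (-1.047419, -1.047419); the fraction of variance retained by the projection can also be read off the diagonal of S, as described in step (3) earlier.

K = 1;
Z = projectData(X_norm, U, K);
X_rec = recoverData(Z, U, K);
fprintf('Approximation of the first example: %f %f\n', X_rec(1, 1), X_rec(1, 2));   % about -1.047419 -1.047419

s = diag(S);
fprintf('Variance retained with K = %d: %f\n', K, sum(s(1:K)) / sum(s));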