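This simple classification model assumes that the feature vectors of each class are normally distributed (though not necessarily independently distributed). The whole data distribution is therefore assumed to be a Gaussian mixture, with one component per class. Given the training data, the algorithm estimates the mean vector and the covariance matrix of every class and then uses them for prediction.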
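Written out as formulas, the model described above corresponds to the textbook Gaussian decision rule below. This is a sketch: the $1/n_k$ normalisation of the covariance and the log-prior term $\log P(k)$ are standard choices assumed here, and the text does not say which exact variant OpenCV uses. Here $x_i$ are training vectors, $y_i$ their labels, and $n_k$ the number of training vectors in class $k$; the constant $-\tfrac{d}{2}\log 2\pi$ is the same for every class and is dropped.

```latex
\mu_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i,
\qquad
\Sigma_k = \frac{1}{n_k}\sum_{i:\,y_i = k} (x_i - \mu_k)(x_i - \mu_k)^{\top}

\hat{y}(x) = \arg\max_k \Big[\log P(k) - \tfrac{1}{2}\log\det\Sigma_k
             - \tfrac{1}{2}(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k)\Big]
```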
CvNormalBayesClassifier: a Bayes classifier for normally distributed data
class CvNormalBayesClassifier : public CvStatModel
{
public:
    CvNormalBayesClassifier();
    virtual ~CvNormalBayesClassifier();

    CvNormalBayesClassifier( const CvMat* _train_data, const CvMat* _responses,
        const CvMat* _var_idx=0, const CvMat* _sample_idx=0 );

    virtual bool train( const CvMat* _train_data, const CvMat* _responses,
        const CvMat* _var_idx = 0, const CvMat* _sample_idx=0, bool update=false );

    virtual float predict( const CvMat* _samples, CvMat* results=0 ) const;
    virtual void clear();

    virtual void save( const char* filename, const char* name=0 );
    virtual void load( const char* filename, const char* name=0 );

    virtual void write( CvFileStorage* storage, const char* name );
    virtual void read( CvFileStorage* storage, CvFileNode* node );
protected:
    ...
};
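The class above is OpenCV's implementation. To make the train/predict cycle concrete without depending on OpenCV, here is a minimal standalone C++17 sketch of the same idea: estimate a mean vector and a full covariance matrix per class, then pick the class with the highest Gaussian log-likelihood. The names ClassModel, trainGaussian and predictGaussian are hypothetical; the small ridge added to each covariance diagonal (to keep it invertible when a class has few samples) and the log-prior term are assumptions of this sketch, not documented OpenCV behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major matrix

// Per-class Gaussian parameters (hypothetical layout for this sketch).
struct ClassModel {
    double label;
    double logPrior;  // log(n_k / n)
    Vec mean;         // mu_k
    Mat chol;         // lower-triangular L with Sigma_k + ridge*I = L * L^T
    double logDet;    // log det(Sigma_k + ridge*I)
};

// Cholesky factorisation A = L L^T; throws if A is not positive definite.
Mat cholesky(const Mat& a) {
    const std::size_t d = a.size();
    Mat l(d, Vec(d, 0.0));
    for (std::size_t i = 0; i < d; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double s = a[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
            if (i == j) {
                if (s <= 0.0) throw std::runtime_error("covariance not positive definite");
                l[i][i] = std::sqrt(s);
            } else {
                l[i][j] = s / l[j][j];
            }
        }
    }
    return l;
}

// Fits one Gaussian per label; each row of x is one sample (row-sample layout).
// Assumes x is non-empty and all rows have the same length.
std::vector<ClassModel> trainGaussian(const Mat& x, const std::vector<double>& y,
                                      double ridge = 1e-6) {
    std::map<double, std::vector<std::size_t>> rowsByLabel;
    for (std::size_t i = 0; i < y.size(); ++i) rowsByLabel[y[i]].push_back(i);
    const std::size_t d = x[0].size();
    std::vector<ClassModel> models;
    for (const auto& [label, rows] : rowsByLabel) {
        const double n = static_cast<double>(rows.size());
        Vec mean(d, 0.0);
        for (std::size_t r : rows)
            for (std::size_t j = 0; j < d; ++j) mean[j] += x[r][j] / n;
        Mat cov(d, Vec(d, 0.0));  // maximum-likelihood covariance (divide by n_k)
        for (std::size_t r : rows)
            for (std::size_t i = 0; i < d; ++i)
                for (std::size_t j = 0; j < d; ++j)
                    cov[i][j] += (x[r][i] - mean[i]) * (x[r][j] - mean[j]) / n;
        for (std::size_t i = 0; i < d; ++i) cov[i][i] += ridge;
        Mat l = cholesky(cov);
        double logDet = 0.0;
        for (std::size_t i = 0; i < d; ++i) logDet += 2.0 * std::log(l[i][i]);
        models.push_back({label, std::log(n / static_cast<double>(y.size())), mean, l, logDet});
    }
    return models;
}

// Returns the label maximising log P(k) - 0.5*log det(Sigma_k) - 0.5*Mahalanobis^2.
double predictGaussian(const std::vector<ClassModel>& models, const Vec& sample) {
    double best = -std::numeric_limits<double>::infinity(), bestLabel = 0.0;
    for (const ClassModel& m : models) {
        const std::size_t d = m.mean.size();
        // Forward substitution solves L z = (x - mu); then z.z = (x-mu)^T Sigma^-1 (x-mu).
        Vec z(d, 0.0);
        double maha = 0.0;
        for (std::size_t i = 0; i < d; ++i) {
            double s = sample[i] - m.mean[i];
            for (std::size_t k = 0; k < i; ++k) s -= m.chol[i][k] * z[k];
            z[i] = s / m.chol[i][i];
            maha += z[i] * z[i];
        }
        const double score = m.logPrior - 0.5 * m.logDet - 0.5 * maha;
        if (score > best) { best = score; bestLabel = m.label; }
    }
    return bestLabel;
}
```

Factorising each covariance once at training time means prediction needs only one forward substitution per class instead of an explicit matrix inverse.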
Training the model
bool CvNormalBayesClassifier::train( const CvMat* _train_data, const CvMat* _responses, const CvMat* _var_idx = 0, const CvMat* _sample_idx=0, bool update=false );
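This method trains the normal Bayes classifier. It follows the conventions of the generic train method, with the following limitations: only the CV_ROW_SAMPLE data layout is supported; the input variables must all be ordered; the output variable is categorical (that is, the elements of _responses must be integers, although the vector itself may have type CV_32FC1); missing measurements are not supported.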
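The requirement that _responses hold whole numbers even when stored as CV_32FC1 can be checked before calling train. A minimal sketch follows; responsesAreCategorical and distinctLabels are hypothetical helpers, and rejecting non-integer values outright is an assumption of this sketch (the text does not say how the library reacts to them).

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Hypothetical helper: true when every response is a finite whole number,
// i.e. usable as a class label even though it is stored as a float.
bool responsesAreCategorical(const std::vector<float>& responses) {
    for (float r : responses) {
        if (!std::isfinite(r) || std::floor(r) != r) return false;
    }
    return true;
}

// Hypothetical helper: the distinct class labels, in ascending order.
std::vector<int> distinctLabels(const std::vector<float>& responses) {
    std::set<int> labels;
    for (float r : responses) labels.insert(static_cast<int>(r));
    return std::vector<int>(labels.begin(), labels.end());
}
```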
In addition, there is an update flag that specifies whether the model should be trained from scratch (update=false) or updated using the new training data (update=true).
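One way to support update=true is to keep per-class running sums rather than only the final mean and covariance, so that new rows can be folded in later. The sketch below stores, per class, the sample count, the sum of x and the sum of x xᵀ, and derives the maximum-likelihood mean and covariance from them. IncrementalGaussianStats is a hypothetical name, and this is an illustration of the technique, not a description of OpenCV's internal layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Running per-class sums from which the mean and covariance can be recovered.
struct ClassStats {
    double count = 0.0;
    Vec sum;         // sum of x
    Mat productSum;  // sum of x * x^T
};

class IncrementalGaussianStats {  // hypothetical class for illustration
public:
    // update == false: discard previous statistics (train from scratch);
    // update == true: add the new rows to the statistics already collected.
    void train(const Mat& x, const std::vector<int>& y, bool update = false) {
        if (!update) stats_.clear();
        for (std::size_t r = 0; r < x.size(); ++r) {
            ClassStats& s = stats_[y[r]];
            const std::size_t d = x[r].size();
            if (s.sum.empty()) {
                s.sum.assign(d, 0.0);
                s.productSum.assign(d, Vec(d, 0.0));
            }
            s.count += 1.0;
            for (std::size_t i = 0; i < d; ++i) {
                s.sum[i] += x[r][i];
                for (std::size_t j = 0; j < d; ++j) s.productSum[i][j] += x[r][i] * x[r][j];
            }
        }
    }

    double count(int label) const { return stats_.at(label).count; }

    Vec mean(int label) const {
        const ClassStats& s = stats_.at(label);
        Vec m(s.sum.size());
        for (std::size_t i = 0; i < m.size(); ++i) m[i] = s.sum[i] / s.count;
        return m;
    }

    // Maximum-likelihood covariance: E[x x^T] - mu mu^T.
    Mat covariance(int label) const {
        const ClassStats& s = stats_.at(label);
        const Vec m = mean(label);
        Mat c(m.size(), Vec(m.size()));
        for (std::size_t i = 0; i < m.size(); ++i)
            for (std::size_t j = 0; j < m.size(); ++j)
                c[i][j] = s.productSum[i][j] / s.count - m[i] * m[j];
        return c;
    }

private:
    std::map<int, ClassStats> stats_;
};
```

Computing the covariance as E[x xᵀ] − μμᵀ is compact but can lose precision when feature values are large relative to their spread; centring the features first reduces that risk.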
Predicting the response for an unknown sample or set of samples
float CvNormalBayesClassifier::predict( const CvMat* samples, CvMat* results=0 ) const;
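This method estimates the most probable class for each input vector. The input vectors (one or more) are stored as rows of the matrix samples. For multiple input vectors, the predictions are written to the output vector results, one element per row. For a single input vector, the predicted class is also the method's return value.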
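The calling convention for one versus many rows can be sketched as follows. predictRows and its classifyOne callback are hypothetical: the callback stands in for the per-row Gaussian decision, and returning the first row's label when several rows are passed is an assumption, since the text only specifies the return value for a single input.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Row = std::vector<double>;

// Hypothetical helper mirroring the described contract of predict(samples, results):
// each element of `samples` is one input row; if `results` is non-null it receives
// one predicted label per row. The return value is the first row's label (assumption),
// which for a single-row input is simply the prediction.
float predictRows(const std::vector<Row>& samples, std::vector<float>* results,
                  const std::function<float(const Row&)>& classifyOne) {
    float first = 0.0f;
    if (results) results->assign(samples.size(), 0.0f);
    for (std::size_t k = 0; k < samples.size(); ++k) {
        const float label = classifyOne(samples[k]);
        if (results) (*results)[k] = label;
        if (k == 0) first = label;
    }
    return first;
}
```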
// Example of using the Bayes classifier API in OpenCV
// Original environment: Windows XP + VS2008 + OpenCV 2.3.0
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace cv;
using namespace std;

// Training set: 10 samples with 12-dimensional feature vectors;
// the first column of each row is the sample's class label
double inputArr[10][13] =
{
    { 1,0.708333,1,1,-0.320755,-0.105023,-1,1,-0.419847,-1,-0.225806,0,1 },
    {-1,0.583333,-1,0.333333,-0.603774,1,-1,1,0.358779,-1,-0.483871,0,-1 },
    { 1,0.166667,1,-0.333333,-0.433962,-0.383562,-1,-1,0.0687023,-1,-0.903226,-1,-1 },
    {-1,0.458333,1,1,-0.358491,-0.374429,-1,-1,-0.480916,1,-0.935484,0,-0.333333 },
    {-1,0.875,-1,-0.333333,-0.509434,-0.347032,-1,1,-0.236641,1,-0.935484,-1,-0.333333 },
    {-1,0.5,1,1,-0.509434,-0.767123,-1,-1,0.0534351,-1,-0.870968,-1,-1 },
    { 1,0.125,1,0.333333,-0.320755,-0.406393,1,1,0.0839695,1,-0.806452,0,-0.333333 },
    { 1,0.25,1,1,-0.698113,-0.484018,-1,1,0.0839695,1,-0.612903,0,-0.333333 },
    { 1,0.291667,1,1,-0.132075,-0.237443,-1,1,0.51145,-1,-0.612903,0,0.333333 },
    { 1,0.416667,-1,1,0.0566038,0.283105,-1,1,0.267176,-1,0.290323,0,1 }
};

// Feature vector of one test sample
double testArr[] =
{
    0.25,1,1,-0.226415,-0.506849,-1,-1,0.374046,-1,-0.83871,0,-1
};

int main()
{
    Mat trainData(10, 12, CV_32FC1); // feature vectors of the training samples
    for (int i = 0; i < 10; i++)
    {
        for (int j = 0; j < 12; j++)
        {
            trainData.at<float>(i, j) = (float)inputArr[i][j+1];
        }
    }

    Mat trainResponse(10, 1, CV_32FC1); // class labels of the training samples
    for (int i = 0; i < 10; i++)
    {
        trainResponse.at<float>(i, 0) = (float)inputArr[i][0];
    }

    CvNormalBayesClassifier nbc;
    bool trainFlag = nbc.train(trainData, trainResponse); // train the Bayes classifier
    if (!trainFlag)
    {
        cout << "train error..." << endl;
        return -1;
    }
    cout << "train over..." << endl;
    nbc.save("normalBayes.xml"); // the .xml extension selects OpenCV's XML storage format

    CvNormalBayesClassifier testNbc;
    testNbc.load("normalBayes.xml");

    Mat testSample(1, 12, CV_32FC1); // build the test sample
    for (int i = 0; i < 12; i++)
    {
        testSample.at<float>(0, i) = (float)testArr[i];
    }

    float flag = testNbc.predict(testSample); // predicted class label
    cout << "flag = " << flag << endl;
    return 0;
}