Understanding the Compressive Tracking Source Code
[email protected]
http://blog.csdn.net/zouxy09
In my earlier post introducing the tracking algorithm of the paper "Real-Time Compressive Tracking", I promised to study its C++ source code afterwards, but other things got in the way at the time. Today I logged into the blog and saw a reader comment on that post saying one part was unclear. That reminded me to keep my promise, so I just spent a few hours going through the C++ source and annotating it in detail. I hope it is of some help, and I thank that reader. Of course, since I have only just entered this field, there are surely places I could not follow or misunderstood, and I welcome everyone's corrections.
Below is the project page for the algorithm. It contains the paper above, the Matlab and C++ versions of the code, test data, a demo, and so on:
http://www4.comp.polyu.edu.hk/~cslzhang/CT/CT.htm
My earlier post introducing the compressive tracking algorithm of "Real-Time Compressive Tracking":
http://blog.csdn.net/zouxy09/article/details/8118360
Many thanks to Kaihua Zhang et al. for the paper "Real-Time Compressive Tracking", and to Yang Xian, who wrote and contributed its C++ implementation.
The C++ code is concise, clear, and elegant. Also, as its author pointed out, the places in my annotations I was unsure about (the ones I marked with question marks) are explained in his comments under this post. Many thanks to Yang Xian for the guidance.
Enough preamble; below is the source code with my annotations. Because the code's flow is very clear, I won't summarize it separately. The project contains three files: CompressiveTracker.cpp, CompressiveTracker.h and RunTracker.cpp. Since RunTracker.cpp is much the same as run_tld.cpp in the TLD algorithm, I won't annotate it here; see my earlier post:
TLD (Tracking-Learning-Detection): Study and Source Code Notes (Part 4)
http://blog.csdn.net/zouxy09/article/details/7893032
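Since RunTracker.cpp is not annotated below, a minimal driver sketch may help orient you: it shows the intended call order of the tracker's two public methods, init() on the first frame and processFrame() on every later frame. This is my own illustration, not the real RunTracker.cpp; the video path and the initial box here are made up, and the real file instead loads the test data and the initial rectangle from the project's data files.

#include <opencv2/opencv.hpp>
#include "CompressiveTracker.h"
using namespace cv;

int main()
{
	VideoCapture cap("test.avi");           // hypothetical input video
	Rect box(100, 100, 50, 50);             // hypothetical initial target box
	CompressiveTracker ct;
	Mat frame, gray;

	if (!cap.read(frame)) return -1;
	cvtColor(frame, gray, CV_BGR2GRAY);     // the tracker expects a single-channel image
	ct.init(gray, box);                     // train the classifier on the first frame

	while (cap.read(frame))
	{
		cvtColor(frame, gray, CV_BGR2GRAY);
		ct.processFrame(gray, box);         // box is updated to the new target position
		rectangle(frame, box, Scalar(0, 0, 255));
		imshow("CT", frame);
		if (waitKey(1) == 27) break;        // Esc to quit
	}
	return 0;
}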
Here is the annotated source code:
CompressiveTracker.h
/************************************************************************
* Real-Time Compressive Tracking
*   Kaihua Zhang, Lei Zhang, Ming-Hsuan Yang, ECCV 2012
* C++ implementation by Yang Xian; English annotations added for this post.
************************************************************************/
#pragma once
#include <opencv2/opencv.hpp>
#include <vector>

using std::vector;
using namespace cv;

class CompressiveTracker
{
public:
	CompressiveTracker(void);
	~CompressiveTracker(void);

private:
	int featureMinNumRect;                // fewest rectangles per compressed feature (2)
	int featureMaxNumRect;                // upper bound on rectangles per feature (4, exclusive)
	int featureNum;                       // number of compressed features n (50)
	vector<vector<Rect>> features;        // the random rectangles of each feature
	vector<vector<float>> featuresWeight; // their weights: the nonzero entries of the measurement matrix R
	int rOuterPositive;                   // radius for sampling positive boxes around the target
	vector<Rect> samplePositiveBox;       // positive training boxes
	vector<Rect> sampleNegativeBox;       // negative training boxes
	int rSearchWindow;                    // search-window radius used for detection
	Mat imageIntegral;                    // integral image of the current frame
	Mat samplePositiveFeatureValue;       // feature values of the positive samples (featureNum x #boxes)
	Mat sampleNegativeFeatureValue;       // feature values of the negative samples
	// Gaussian parameters of the naive Bayes classifier, one (mu, sigma)
	// pair per feature and per class, updated online with Eq. 6 of the paper.
	vector<float> muPositive;
	vector<float> sigmaPositive;
	vector<float> muNegative;
	vector<float> sigmaNegative;
	float learnRate;                      // learning rate lambda in Eq. 6
	vector<Rect> detectBox;               // candidate boxes scanned at detection time
	Mat detectFeatureValue;               // their feature values
	RNG rng;                              // random number generator

private:
	void HaarFeature(Rect& _objectBox, int _numFeature);
	void sampleRect(Mat& _image, Rect& _objectBox, float _rInner, float _rOuter, int _maxSampleNum, vector<Rect>& _sampleBox);
	void sampleRect(Mat& _image, Rect& _objectBox, float _srw, vector<Rect>& _sampleBox);
	void getFeatureValue(Mat& _imageIntegral, vector<Rect>& _sampleBox, Mat& _sampleFeatureValue);
	void classifierUpdate(Mat& _sampleFeatureValue, vector<float>& _mu, vector<float>& _sigma, float _learnRate);
	void radioClassifier(vector<float>& _muPos, vector<float>& _sigmaPos, vector<float>& _muNeg, vector<float>& _sigmaNeg,
	                     Mat& _sampleFeatureValue, float& _radioMax, int& _radioMaxIndex);
public:
	void processFrame(Mat& _frame, Rect& _objectBox);
	void init(Mat& _frame, Rect& _objectBox);
};
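Two small notes from me: the only public interface is init() on the first frame and processFrame() on every later frame, as in the driver sketch near the top of this post. Also, writing vector<vector<Rect>> without a space between the closing angle brackets is only legal from C++11 on; a strict C++03 compiler wants vector<vector<Rect> > (Visual Studio accepts >> as an extension, which fits the VS2008 advice in the comments below).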
CompressiveTracker.cpp
#include "CompressiveTracker.h"
#include <math.h>
#include <float.h>	// FLT_MAX
#include <iostream>
using namespace cv;
using namespace std;

// Constructor: set the tracker's parameters.
CompressiveTracker::CompressiveTracker(void)
{
	featureMinNumRect = 2;
	featureMaxNumRect = 4;	// each feature uses 2 or 3 rectangles (floor of a draw in [2,4))
	featureNum = 50;	// dimensionality n of the compressed feature vector v
	rOuterPositive = 4;	// positive samples lie within this radius of the target
	rSearchWindow = 25;	// detection searches within this radius of the last position
	muPositive = vector<float>(featureNum, 0.0f);	// Gaussian parameters of the
	muNegative = vector<float>(featureNum, 0.0f);	// positive/negative class,
	sigmaPositive = vector<float>(featureNum, 1.0f);	// one (mu, sigma) pair
	sigmaNegative = vector<float>(featureNum, 1.0f);	// per feature
	learnRate = 0.85f;	// learning rate lambda; larger values adapt more slowly
}

CompressiveTracker::~CompressiveTracker(void)
{
}
/* Build the compressed Haar-like features. Each of the _numFeature features
 * is a weighted sum of 2 or 3 rectangles placed at random inside the object
 * box. The rectangles act as the nonzero entries of the sparse random
 * measurement matrix R in the paper: feature i of a box is
 *   v_i = sum_k featuresWeight[i][k] * rectSum(features[i][k]),
 * with weights +-1/sqrt(numRect). */
void CompressiveTracker::HaarFeature(Rect& _objectBox, int _numFeature)
{
	// One vector of rectangles and one vector of weights per feature.
	features = vector<vector<Rect>>(_numFeature, vector<Rect>());
	featuresWeight = vector<vector<float>>(_numFeature, vector<float>());

	int numRect;
	Rect rectTemp;
	float weightTemp;

	for (int i=0; i<_numFeature; i++)
	{
		// Number of rectangles of feature i: floor of a uniform draw in
		// [featureMinNumRect, featureMaxNumRect) = [2, 4), i.e. 2 or 3.
		numRect = cvFloor(rng.uniform((double)featureMinNumRect, (double)featureMaxNumRect));

		for (int j=0; j<numRect; j++)
		{
			// Random position and size inside the object box. The coordinates
			// are relative to the box, so the same rectangles can be reused on
			// every candidate box in every later frame. (The -3 and -2 margins
			// keep the rectangle from running past the box border.)
			rectTemp.x = cvFloor(rng.uniform(0.0, (double)(_objectBox.width - 3)));
			rectTemp.y = cvFloor(rng.uniform(0.0, (double)(_objectBox.height - 3)));
			rectTemp.width = cvCeil(rng.uniform(0.0, (double)(_objectBox.width - rectTemp.x - 2)));
			rectTemp.height = cvCeil(rng.uniform(0.0, (double)(_objectBox.height - rectTemp.y - 2)));
			features[i].push_back(rectTemp);

			// Random weight +-1/sqrt(numRect): the sign is (-1)^b with b drawn
			// uniformly from {0, 1}; the 1/sqrt factor normalizes the sum.
			weightTemp = (float)pow(-1.0, cvFloor(rng.uniform(0.0, 2.0))) / sqrt(float(numRect));
			featuresWeight[i].push_back(weightTemp);
		}
	}
}
/* Draw training boxes around the target. A candidate whose top-left corner
 * lies at squared distance d^2 from the target's corner is kept, with
 * probability prob, if _rOuter^2 <= d^2 < _rInner^2. Calling this with
 * _rOuter = 0 gives positive samples near the target; calling it with a
 * ring away from the target gives negative samples. On average roughly
 * _maxSampleNum boxes are kept. */
void CompressiveTracker::sampleRect(Mat& _image, Rect& _objectBox, float _rInner, float _rOuter, int _maxSampleNum, vector<Rect>& _sampleBox)
{
	// Sampled boxes must stay inside the image.
	int rowsz = _image.rows - _objectBox.height - 1;
	int colsz = _image.cols - _objectBox.width - 1;
	// Work with squared radii to avoid a sqrt per candidate.
	float inradsq = _rInner*_rInner;
	float outradsq = _rOuter*_rOuter;

	int dist;

	// Bounding rectangle of the inner circle, clipped to the valid region.
	int minrow = max(0,(int)_objectBox.y-(int)_rInner);
	int maxrow = min((int)rowsz-1,(int)_objectBox.y+(int)_rInner);
	int mincol = max(0,(int)_objectBox.x-(int)_rInner);
	int maxcol = min((int)colsz-1,(int)_objectBox.x+(int)_rInner);

	int i = 0;

	// Keep each candidate with this probability so that, in expectation, about
	// _maxSampleNum of the candidates in the bounding rectangle survive.
	float prob = ((float)(_maxSampleNum))/(maxrow-minrow+1)/(maxcol-mincol+1);

	int r;
	int c;

	_sampleBox.clear();
	Rect rec(0,0,0,0);

	for( r=minrow; r<=(int)maxrow; r++ )
		for( c=mincol; c<=(int)maxcol; c++ ){
			// Squared distance between the candidate's corner and the target's corner.
			dist = (_objectBox.y-r)*(_objectBox.y-r) + (_objectBox.x-c)*(_objectBox.x-c);

			// Accept if the random test passes and the corner lies inside the ring.
			if( rng.uniform(0.,1.) < prob && dist < inradsq && dist >= outradsq ){
				rec.x = c;
				rec.y = r;
				rec.width = _objectBox.width;
				rec.height = _objectBox.height;

				_sampleBox.push_back(rec);

				i++;
			}
		}

	_sampleBox.resize(i);
}
/* Detection-time overload: collect every box whose top-left corner lies
 * within radius _srw of the current target position (no subsampling). */
void CompressiveTracker::sampleRect(Mat& _image, Rect& _objectBox, float _srw, vector<Rect>& _sampleBox)
{
	int rowsz = _image.rows - _objectBox.height - 1;
	int colsz = _image.cols - _objectBox.width - 1;
	float inradsq = _srw*_srw;

	int dist;

	int minrow = max(0,(int)_objectBox.y-(int)_srw);
	int maxrow = min((int)rowsz-1,(int)_objectBox.y+(int)_srw);
	int mincol = max(0,(int)_objectBox.x-(int)_srw);
	int maxcol = min((int)colsz-1,(int)_objectBox.x+(int)_srw);

	int i = 0;

	int r;
	int c;

	Rect rec(0,0,0,0);
	_sampleBox.clear();

	for( r=minrow; r<=(int)maxrow; r++ )
		for( c=mincol; c<=(int)maxcol; c++ ){
			dist = (_objectBox.y-r)*(_objectBox.y-r) + (_objectBox.x-c)*(_objectBox.x-c);

			// Keep every box inside the search radius.
			if( dist < inradsq ){
				rec.x = c;
				rec.y = r;
				rec.width = _objectBox.width;
				rec.height = _objectBox.height;

				_sampleBox.push_back(rec);

				i++;
			}
		}

	_sampleBox.resize(i);
}
/* Compute the compressed feature vector of every sample box from the
 * integral image: entry (i, j) of _sampleFeatureValue is feature i of box j,
 * i.e. the weighted sum of the pixel sums of feature i's rectangles. */
void CompressiveTracker::getFeatureValue(Mat& _imageIntegral, vector<Rect>& _sampleBox, Mat& _sampleFeatureValue)
{
	int sampleBoxSize = _sampleBox.size();
	_sampleFeatureValue.create(featureNum, sampleBoxSize, CV_32F);
	float tempValue;
	int xMin;
	int xMax;
	int yMin;
	int yMax;

	for (int i=0; i<featureNum; i++)
	{
		for (int j=0; j<sampleBoxSize; j++)
		{
			tempValue = 0.0f;
			for (size_t k=0; k<features[i].size(); k++)
			{
				// The feature rectangles are stored relative to the object box,
				// so the sample box's corner is added to get image coordinates.
				xMin = _sampleBox[j].x + features[i][k].x;
				xMax = _sampleBox[j].x + features[i][k].x + features[i][k].width;
				yMin = _sampleBox[j].y + features[i][k].y;
				yMax = _sampleBox[j].y + features[i][k].y + features[i][k].height;

				// Pixel sum of the rectangle via the integral image I:
				// S = I(yMin,xMin) + I(yMax,xMax) - I(yMin,xMax) - I(yMax,xMin).
				tempValue += featuresWeight[i][k] *
					(_imageIntegral.at<float>(yMin, xMin) +
					_imageIntegral.at<float>(yMax, xMax) -
					_imageIntegral.at<float>(yMin, xMax) -
					_imageIntegral.at<float>(yMax, xMin));
			}
			_sampleFeatureValue.at<float>(i,j) = tempValue;
		}
	}
}
/* Update the Gaussian parameters (mu_i, sigma_i) of one class online,
 * following Eq. 6 of the paper:
 *   mu    <- lambda*mu + (1-lambda)*mu_new
 *   sigma <- sqrt( lambda*sigma^2 + (1-lambda)*sigma_new^2
 *                  + lambda*(1-lambda)*(mu - mu_new)^2 )
 * where mu_new and sigma_new are the mean and standard deviation of the
 * feature over the current samples (one row of _sampleFeatureValue). */
void CompressiveTracker::classifierUpdate(Mat& _sampleFeatureValue, vector<float>& _mu, vector<float>& _sigma, float _learnRate)
{
	Scalar muTemp;
	Scalar sigmaTemp;

	for (int i=0; i<featureNum; i++)
	{
		// Mean and stddev of feature i over all current samples.
		meanStdDev(_sampleFeatureValue.row(i), muTemp, sigmaTemp);

		_sigma[i] = (float)sqrt( _learnRate*_sigma[i]*_sigma[i] + (1.0f-_learnRate)*sigmaTemp.val[0]*sigmaTemp.val[0]
			+ _learnRate*(1.0f-_learnRate)*(_mu[i]-muTemp.val[0])*(_mu[i]-muTemp.val[0]));

		_mu[i] = _mu[i]*_learnRate + (1.0f-_learnRate)*muTemp.val[0];
	}
}
/* Naive Bayes ratio classifier ("radio" is the original code's spelling of
 * "ratio"). For every candidate box j it evaluates
 *   H(v) = sum_i [ log p(v_i | positive) - log p(v_i | negative) ]
 * with Gaussian class conditionals (the common 1/sqrt(2*pi) factor cancels
 * in the ratio and is omitted), and returns the best score and the index of
 * the box attaining it. The 1e-30 terms guard against division by zero and
 * log(0). */
void CompressiveTracker::radioClassifier(vector<float>& _muPos, vector<float>& _sigmaPos, vector<float>& _muNeg, vector<float>& _sigmaNeg,
										 Mat& _sampleFeatureValue, float& _radioMax, int& _radioMaxIndex)
{
	float sumRadio;

	_radioMax = -FLT_MAX;
	_radioMaxIndex = 0;
	float pPos;
	float pNeg;
	int sampleBoxNum = _sampleFeatureValue.cols;

	for (int j=0; j<sampleBoxNum; j++)	// for each candidate box
	{
		sumRadio = 0.0f;
		for (int i=0; i<featureNum; i++)	// accumulate the log-ratio over features
		{
			pPos = exp( (_sampleFeatureValue.at<float>(i,j)-_muPos[i])*(_sampleFeatureValue.at<float>(i,j)-_muPos[i]) / -(2.0f*_sigmaPos[i]*_sigmaPos[i]+1e-30) ) / (_sigmaPos[i]+1e-30);
			pNeg = exp( (_sampleFeatureValue.at<float>(i,j)-_muNeg[i])*(_sampleFeatureValue.at<float>(i,j)-_muNeg[i]) / -(2.0f*_sigmaNeg[i]*_sigmaNeg[i]+1e-30) ) / (_sigmaNeg[i]+1e-30);

			sumRadio += log(pPos+1e-30) - log(pNeg+1e-30);
		}
		if (_radioMax < sumRadio)
		{
			_radioMax = sumRadio;
			_radioMaxIndex = j;
		}
	}
}
/* Initialize the tracker with the first frame and its target box. */
void CompressiveTracker::init(Mat& _frame, Rect& _objectBox)
{
	// Generate the random feature templates; they stay fixed from now on,
	// so the same (relative) rectangles are measured in every frame.
	HaarFeature(_objectBox, featureNum);

	// Positive samples: boxes within radius rOuterPositive (=4) of the target.
	// Negative samples: about 100 boxes in the ring [rOuterPositive+4, rSearchWindow*1.5).
	sampleRect(_frame, _objectBox, rOuterPositive, 0, 1000000, samplePositiveBox);
	sampleRect(_frame, _objectBox, rSearchWindow*1.5, rOuterPositive+4.0, 100, sampleNegativeBox);

	// Integral image of the (single-channel) frame.
	integral(_frame, imageIntegral, CV_32F);

	// Compressed features of both sample sets.
	getFeatureValue(imageIntegral, samplePositiveBox, samplePositiveFeatureValue);
	getFeatureValue(imageIntegral, sampleNegativeBox, sampleNegativeFeatureValue);

	// First estimate of the Gaussian parameters of both classes.
	classifierUpdate(samplePositiveFeatureValue, muPositive, sigmaPositive, learnRate);
	classifierUpdate(sampleNegativeFeatureValue, muNegative, sigmaNegative, learnRate);
}
/* Process one new frame: detect the target, then update the classifier. */
void CompressiveTracker::processFrame(Mat& _frame, Rect& _objectBox)
{
	// Collect every candidate box within the search window around the
	// previous target position.
	sampleRect(_frame, _objectBox, rSearchWindow, detectBox);

	integral(_frame, imageIntegral, CV_32F);

	getFeatureValue(imageIntegral, detectBox, detectFeatureValue);
	int radioMaxIndex;
	float radioMax;

	// The candidate with the highest naive Bayes ratio becomes the new target.
	radioClassifier(muPositive, sigmaPositive, muNegative, sigmaNegative, detectFeatureValue, radioMax, radioMaxIndex);
	_objectBox = detectBox[radioMaxIndex];

	// Resample positive and negative boxes around the new position,
	// exactly as in init().
	sampleRect(_frame, _objectBox, rOuterPositive, 0.0, 1000000, samplePositiveBox);
	sampleRect(_frame, _objectBox, rSearchWindow*1.5, rOuterPositive+4.0, 100, sampleNegativeBox);

	getFeatureValue(imageIntegral, samplePositiveBox, samplePositiveFeatureValue);
	getFeatureValue(imageIntegral, sampleNegativeBox, sampleNegativeFeatureValue);

	// Online update of the classifier (Eq. 6).
	classifierUpdate(samplePositiveFeatureValue, muPositive, sigmaPositive, learnRate);
	classifierUpdate(sampleNegativeFeatureValue, muNegative, sigmaNegative, learnRate);
}
Copyright notice: this post is the blogger's original work; please do not repost without permission.
Selected exchanges from the comments (translated):
Q: I don't know how to fix errors like this; have you run into this problem?
A: Building with VS2008 + OpenCV 2.4.2 works.
Q: The code looks like a simplified version of MIL, though; what does the blogger think?
Q: About these lines in getFeatureValue:
	// so the box coordinates must be added to get its position in the whole image
	xMin = _sampleBox[j].x + features[i][k].x;
	xMax = _sampleBox[j].x + features[i][k].x + features[i][k].width;
	yMin = _sampleBox[j].y + features[i][k].y;
	yMax = _sampleBox[j].y + features[i][k].y + features[i][k].height;
Could you explain what this part does? Why add the sample box's coordinates to the feature's coordinates?
A: Suppose Beijing is the origin (0,0), Shanghai sits at (10,-20) relative to Beijing, and Guangzhou sits at (-20,-30) relative to Shanghai. Where is Guangzhou relative to Beijing? You just add the two offsets, and that is exactly what happens here.
A (another reader): Take face tracking as an example. Every time the Haar features are computed, the relative positions and weights of the small rectangles extracted from each _sampleBox must be the same as in the first frame (relative in the sense that a small rectangle's actual position is still obtained by adding the _sampleBox coordinates). Only then is it guaranteed that, if a randomly drawn rectangle happened to land on the nose in frame 1, every later frame measures that same "supposedly the nose" region, rather than re-randomizing each time. The face may move around the image during tracking, but the nose's position relative to the face stays fixed, so it only makes sense to compare each _sampleBox's "supposedly the nose" Haar feature with the Haar feature of the real nose in the previous frame's target. In fact, exactly one accurately placed _sampleBox has its "supposedly the nose" region fall right on the nose (and there the Bayes classifier scores highest); that _sampleBox is the target box tracked in the current frame!
[I wrote a lot and am not sure I expressed it clearly; if my understanding is wrong, please point it out!]
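In code terms, the same addition of origins looks like this (a toy example of mine, with made-up numbers):

#include <opencv2/opencv.hpp>
#include <cstdio>
using namespace cv;

int main()
{
	Rect sampleBox(120, 80, 60, 60);  // a candidate box, in image coordinates (made up)
	Rect featRect(10, 25, 8, 6);      // one feature rectangle, stored relative to the box
	// Absolute position of the small rectangle = box origin + relative offset,
	// just like the Beijing/Shanghai/Guangzhou example above.
	printf("top-left corner in the image: (%d, %d)\n",
	       sampleBox.x + featRect.x, sampleBox.y + featRect.y);  // prints (130, 105)
	return 0;
}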
A: That is the problem of drift and target loss in online tracking. I feel it is a fairly broad question; there are quite a few strategies at present, for example motion analysis, reconstruction error, keeping a history of templates to match against, and so on. As for a concrete recommendation, I have no experience with this myself. Guidance from the experts is welcome!
Q: Could you explain the pPos/pNeg computation in radioClassifier and the parameter update in classifierUpdate (the line annotated "this model-parameter update formula is Eq. 6 in the paper")?
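To my understanding, those two fragments implement the ratio classifier and the parameter update of the paper (Eqs. 4 and 6; lambda is learnRate, and the common factor 1/sqrt(2*pi) is dropped because it cancels in the ratio):

H(v) = \sum_{i=1}^{n} \log \frac{p(v_i \mid y=1)}{p(v_i \mid y=0)}, \qquad p(v_i \mid y=1) \sim \mathcal{N}(\mu_i^1, \sigma_i^1), \quad p(v_i \mid y=0) \sim \mathcal{N}(\mu_i^0, \sigma_i^0)

\mu^1 \leftarrow \lambda \mu^1 + (1-\lambda)\mu, \qquad \sigma^1 \leftarrow \sqrt{\lambda (\sigma^1)^2 + (1-\lambda)\sigma^2 + \lambda(1-\lambda)(\mu^1 - \mu)^2}

where mu and sigma are the mean and standard deviation of that feature over the current batch of samples (the meanStdDev call in the code).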
A: First, for an original sample, its features are described by a collection of rectangle features (each the sum of all pixel intensities inside a rectangle). A rectangle can sit at any position in the image and take any size, so even a small detection window contains a very large number of rectangle features. Each rectangle feature (the pixel sum inside it) is one x_j, and together they form the sample's raw feature vector x, whose dimensionality is huge. (How do you compute the pixel sum of an arbitrary rectangle? With the integral image, which serves the same purpose as convolving with the paper's bank of multi-scale rectangle filters, since the convolution kernels are all ones.)
The raw feature x is far too high-dimensional, so we reduce it: after projection each sample has only n = 50 features, the vector v. Each component v_i is a weighted sum of the intensities of 2 or 3 randomly chosen rectangles, that is, v_i = sum_j r_ij * x_j, where r_ij is the element in row i, column j of the random matrix R; each row has at most 4 nonzero entries (here only 2 or 3 are drawn). So we only need the products of those nonzero entries with the corresponding x_j (the pixel sum of rectangle j). How do we decide which entries of a row are nonzero? They are generated randomly, by the construction of the random matrix described in the paper. The low-dimensional feature v obtained this way preserves almost all of the information in the high-dimensional x.
Not sure my understanding is right; if there are mistakes, corrections are welcome.
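In matrix form, the reduction described above is simply (with n = 50, m the number of all possible rectangle features, so m is much larger than n, and each row of R holding at most four nonzero entries):

v = R x, \qquad v \in \mathbb{R}^{n}, \quad x \in \mathbb{R}^{m}, \quad R \in \mathbb{R}^{n \times m}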
Q: Two things puzzle me:
1. The premise for the RIP matrix's coefficients being sqrt(2) or sqrt(3) is that s is 2 or 3, but in the experiments s is set to m/4 while the program still uses sqrt(2) or sqrt(3); and I don't quite see why s should equal the number of sub-windows.
2. The program divides by sqrt(2) or sqrt(3) rather than multiplying, which seems odd.
A: The paper says that when s is 2 or 3, the matrix satisfies the Johnson-Lindenstrauss lemma.
A: Heh, I think your understanding is correct.
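For reference, the sparse random matrix in the paper is, as far as I remember (it follows the very sparse random projection construction):

r_{ij} = \sqrt{s} \times \begin{cases} +1 & \text{with probability } 1/(2s) \\ 0 & \text{with probability } 1 - 1/s \\ -1 & \text{with probability } 1/(2s) \end{cases}

so for s = 2 or 3 the nonzero entries are plus or minus sqrt(2) or sqrt(3), while the code uses plus or minus 1/sqrt(numRect). My guess regarding question 2: rescaling a feature by a constant rescales both classes' Gaussian parameters identically, so the naive Bayes ratio (and hence the chosen box) is unaffected by the constant factor, and dividing instead of multiplying does no harm.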
Q: In that case, for experiments on real surveillance video, would TLD give better results? The features the two use seem quite similar; is TLD's advantage its semi-supervised learning mechanism? I know little about TLD; how do you see it?
Comment: Tracking on 06_car got worse with a large search window. It was also several times slower, though I haven't measured exactly. My initial suspicion is that a simple integral-image difference is too weak as the target feature.