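This paper introduces the well-known HOG (Histogram of Oriented Gradients) feature
and uses an SVM as the classifier to perform pedestrian detection.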
adopting linear SVM based human detection as a test case
HOG features perform extremely well on this task.
we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection
a) We study the issue of feature sets for human detection, showing that locally normalized Histogram of Oriented Gradient (HOG) descriptors provide excellent performance relative to other existing feature sets
The proposed descriptors are reminiscent of edge orientation histograms [4,5], SIFT descriptors [12] and shape contexts [1], but they are computed on a dense grid of uniformly spaced cells and they use overlapping local contrast normalizations for improved performance
For simplicity and speed, we use linear SVM as a baseline classifier throughout the study.
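To make the "linear SVM as a baseline classifier" step concrete, here is a minimal NumPy sketch. It is not the solver used in the paper: it assumes plain subgradient descent on the regularised hinge loss, and the function names, the hyperparameters (`lam`, `lr`, `epochs`) and the random 4-dimensional toy vectors standing in for HOG descriptors are all hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * ||w||^2 + mean(hinge loss).

    X: (n, d) feature vectors; y: labels in {-1, +1}.
    All hyperparameters are illustrative assumptions.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0            # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def score(X, w, b):
    """Signed decision value; positive means 'person' in this toy setup."""
    return X @ w + b


# Toy stand-ins for descriptors of positive and negative windows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 0.3, (50, 4)), rng.normal(-1.0, 0.3, (50, 4))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(score(X, w, b)) == y)
```

At detection time only `score` is needed, so classifying a window costs one dot product with its descriptor, which is part of why a linear model is attractive for speed.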
See [6] for a survey. Papageorgiou et al [18] describe a pedestrian detector based on a polynomial SVM using rectified Haar wavelets as input descriptors
Depoortere et al give an optimized version of this
In contrast, our detector uses a simpler architecture with a single detection window, but appears to give significantly higher performance on pedestrian images
The method is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid
The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients
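The image is divided into small cells, and the gradient magnitude and orientation are computed for each cell. Several cells are then grouped into a block, and contrast normalization is applied over that block, which makes the feature robust to changes in illumination intensity.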
In practice this is implemented by dividing the image window into small spatial regions (“cells”), for each cell accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them
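A well-known and successful use of gradient histograms is the SIFT algorithm.
Shape Context is built on an idea similar to cells and blocks, but it relies only on pixel counts and does not use gradient orientation histograms, which does make it faster to run.
The strength of HOG and SIFT is that they capture an object's local gradients and their orientations.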
RGB and LAB colour spaces give comparable results, but restricting to grayscale reduces performance by 1.5% at 10^-4 FPPW
Gamma correction is turned off; see the blog post.
Detector performance is sensitive to the way in which gradients are computed, but the simplest scheme turns out to be the best
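The authors computed image gradients with both derivative masks and Gaussian smoothing, trying several mask sizes and values of σ. Performance was best with σ = 0 (no smoothing) and the [-1, 0, 1] mask.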
Using larger masks always seems to decrease performance
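The simplest scheme described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the function name `centred_gradients`, the choice to leave border pixels at zero, and the unsigned 0 to 180 degree orientation are assumptions.

```python
import numpy as np


def centred_gradients(gray):
    """Apply the [-1, 0, 1] mask along x and y, with no smoothing (sigma = 0)."""
    gray = np.asarray(gray, dtype=float)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Centred difference; the one-pixel border is left at zero (an assumption).
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    return gx, gy, magnitude, orientation


# A horizontal intensity ramp: the gradient points along x everywhere inside.
ramp = np.tile(np.arange(5, dtype=float), (5, 1))
gx, gy, mag, ang = centred_gradients(ramp)
```

Because the mask spans two pixels, a ramp that rises by 1 per pixel gives a response of 2 at interior pixels.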
Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centred on it, and the votes are accumulated into orientation bins over local spatial regions that we call cells
(2) To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centres in both orientation and position.
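Note that each pixel's vote into a bin is weighted by its gradient magnitude. A clipped magnitude or the squared magnitude could also be used, but in practice the raw magnitude works best.
The number of bins has a large effect on final performance, so it should be chosen with care; in practice 9 bins works well. The orientation range can be either [0, 360] or [0, 180] degrees, and the latter works better.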
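The voting step can be sketched as follows. This is a minimal illustration under stated assumptions: it interpolates in orientation only (not in position), places the 9 bin centres at 0, 20, ..., 160 degrees, and the function name `cell_histogram` is hypothetical.

```python
import numpy as np


def cell_histogram(magnitude, orientation_deg, n_bins=9):
    """Magnitude-weighted orientation histogram over [0, 180) degrees.

    Each pixel splits its vote linearly between the two nearest bin centres.
    """
    bin_width = 180.0 / n_bins
    mag = np.ravel(np.asarray(magnitude, dtype=float))
    pos = (np.ravel(np.asarray(orientation_deg, dtype=float)) % 180.0) / bin_width
    lower = np.floor(pos).astype(int)
    frac = pos - lower                      # distance past the lower bin centre
    hist = np.zeros(n_bins)
    # np.add.at accumulates correctly when several pixels hit the same bin.
    np.add.at(hist, lower % n_bins, (1.0 - frac) * mag)
    np.add.at(hist, (lower + 1) % n_bins, frac * mag)   # wraps 160 -> 0 degrees
    return hist


h_mid = cell_histogram([3.0], [30.0])    # halfway between the 20 and 40 degree bins
h_wrap = cell_histogram([2.0], [170.0])  # halfway between the 160 and 0 degree bins
```

The wrap-around in the last bin reflects the unsigned range: an orientation of 180 degrees is the same edge direction as 0 degrees.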
(1) Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so effective local contrast normalization turns out to be essential for good performance
(2) In fact, we typically overlap the blocks so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block
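A quick calculation shows what overlapping blocks do to the descriptor size. The values below (a 128×64 window, 8×8-pixel cells, 2×2-cell blocks, 9 bins) are assumed for illustration, and `hog_descriptor_length` is a hypothetical helper.

```python
def hog_descriptor_length(window=(128, 64), cell=8, block=2, stride=1, n_bins=9):
    """Descriptor length for a window given as (height, width) in pixels.

    block and stride are measured in cells.
    """
    cells_y, cells_x = window[0] // cell, window[1] // cell
    blocks_y = (cells_y - block) // stride + 1
    blocks_x = (cells_x - block) // stride + 1
    return blocks_y * blocks_x * block * block * n_bins


overlapping = hog_descriptor_length(stride=1)      # 15 * 7 blocks * 36 values
non_overlapping = hog_descriptor_length(stride=2)  # 8 * 4 blocks * 36 values
print(overlapping, non_overlapping)  # 3780 1152
```

With a one-cell stride, an interior cell falls inside four different 2×2 blocks, so its histogram appears four times in the descriptor, each time normalized against a different neighbourhood.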
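The figure shows that L2-norm, L2-hys, and L1-sqrt perform about equally well, while L1-norm is noticeably worse. Compared with the result without normalization, local contrast normalization is clearly necessary.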
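The four block normalization schemes compared here can be written compactly. This is a sketch: the function name `normalize_block`, the value of `eps`, and the 0.2 clipping threshold for L2-hys are assumptions, and the input is assumed to be a non-negative block vector (concatenated cell histograms).

```python
import numpy as np


def normalize_block(v, scheme="L2-hys", eps=1e-3, clip=0.2):
    """Normalize one block vector v with the named scheme."""
    v = np.asarray(v, dtype=float)
    if scheme == "L2-norm":
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L2-hys":
        # L2-norm, clip large values, then renormalize.
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
        v = np.minimum(v, clip)
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L1-norm":
        return v / (np.sum(np.abs(v)) + eps)
    if scheme == "L1-sqrt":
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))
    raise ValueError("unknown scheme: " + scheme)


block = np.array([3.0, 4.0])
l2 = normalize_block(block, "L2-norm")           # close to [0.6, 0.8]
l2_bright = normalize_block(10 * block, "L2-norm")  # same block, 10x contrast
```

Scaling the block by a constant, as a brightness or contrast change would, leaves the normalized vector essentially unchanged, which is the invariance the normalization is meant to provide.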
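HOG outperforms wavelets and approaches that smooth the image beforehand, because the most informative content in an image comes from abrupt edges; smoothing or blurring discards much of that information. The size of the gradient mask also matters: in the paper's experiments the small [-1, 0, 1] mask worked best, and larger masks reduced performance.
In addition, local contrast normalization is essential when extracting HOG features. It reduces the effect of different lighting conditions on the image gradients.
Because time was limited, I hastily wrote a fairly rough implementation, mainly to understand the algorithm; I will revise it when I have time.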
import numpy as np
import matplotlib.pyplot as plt

cell_size = (8, 8)    # cell height and width, in pixels
block_size = (4, 4)   # block height and width, in cells
block_stride = 2      # step between neighbouring blocks, in cells
n_bins = 9            # orientation bins over [0, 180) degrees


def gamma(input_img, g=1 / 2.2):
    # Convert to grayscale if needed, then apply gamma compression.
    out_img = np.asarray(input_img, dtype=float)
    if out_img.ndim == 3:
        out_img = out_img.mean(axis=2)
    out_img = (out_img + 0.5) / 256.0
    out_img = np.power(out_img, g)
    return out_img * 256 - 0.5


def compute_Hog(img, show=False):
    img = np.asarray(img, dtype=float)
    height, width = img.shape

    # Centred [-1, 0, 1] derivative in x and y; border pixels stay zero.
    img_x = np.zeros_like(img)
    img_y = np.zeros_like(img)
    img_x[:, 1:-1] = img[:, 2:] - img[:, :-2]
    img_y[1:-1, :] = img[2:, :] - img[:-2, :]

    img_res = np.sqrt(img_x ** 2 + img_y ** 2)      # gradient magnitude
    img_theta = np.arctan2(img_y, img_x) % np.pi    # unsigned orientation
    if show:
        plt.imshow(img_res, cmap='gray')
        plt.show()

    # Per-cell orientation histograms; partial cells at the border are dropped.
    c_h, c_w = cell_size
    bin_width = np.pi / n_bins
    cell_all = []
    for y in range(0, height - c_h + 1, c_h):
        cell_row = []
        for x in range(0, width - c_w + 1, c_w):
            cell_g = img_res[y:y + c_h, x:x + c_w].flatten()
            cell_t = img_theta[y:y + c_h, x:x + c_w].flatten()
            cell = np.zeros(n_bins)
            for theta, amp in zip(cell_t, cell_g):
                # Split the magnitude linearly between the two nearest bins
                # and accumulate (the votes must add up, not overwrite).
                pos = theta / bin_width
                cur = int(pos) % n_bins
                frac = pos - int(pos)
                cell[cur] += (1 - frac) * amp
                cell[(cur + 1) % n_bins] += frac * amp
            cell_row.append(cell)
        cell_all.append(cell_row)

    # Group cells into overlapping blocks and L2-normalize each block.
    b_h, b_w = block_size
    eps = 1e-3
    res = []
    for y in range(0, len(cell_all) - b_h + 1, block_stride):
        for x in range(0, len(cell_all[0]) - b_w + 1, block_stride):
            block = []
            for b_y in range(y, y + b_h):
                for b_x in range(x, x + b_w):
                    block.extend(cell_all[b_y][b_x])
            # L2-norm
            block_np = np.array(block)
            block_np = block_np / np.sqrt(np.sum(block_np ** 2) + eps ** 2)
            res.extend(block_np)
    return np.array(res)


if __name__ == '__main__':
    img_path = 'F:\\DataSet\\MIT_persons_jpg\\per00001.jpg'
    img = plt.imread(img_path)
    img = gamma(img)
    hog = compute_Hog(img, show=True)
    print(hog.shape)