HOG Image Feature Extraction and Its scikit-image (skimage) Implementation

Unrealluver

Preface

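HOG (Histogram of Oriented Gradients) was first proposed by Dalal and Triggs in a CVPR 2005 paper to tackle pedestrian detection. It later became one of the most widely used features for describing local image texture in computer vision and pattern recognition. As the name suggests, HOG first computes the gradients in different directions within a region of the image, accumulates them into histograms, and then post-processes the histograms into feature vectors of a chosen dimension. These features can then be fed into a classifier.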
Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, 1: 886-893.
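Dalal's PhD thesis describes and extends the method in more detail. Anyone who has used HOG will likely be impressed by how much it improves recognition, and reading the thesis also offers some useful lessons about how to approach research.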
Dalal N. Finding people in images and videos[D]. Institut National Polytechnique de Grenoble-INPG, 2006.
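We have just finished Gang-ge's image processing course, and friends have been asking about details of using HOG, which is what prompted this post. Many of you have probably formed your own impressions of the HOG function in Python's skimage library during your course projects. So let's take the skimage implementation as our basis, map it step by step onto the paper, and dig into the algorithm. Its signature in the skimage version discussed here is as follows (newer releases replace `multichannel` with `channel_axis`):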

hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3),
        block_norm='L2-Hys', visualize=False, transform_sqrt=False,
        feature_vector=True, multichannel=None)

Image Normalization

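The main purpose of this step is to preprocess the image and reduce the influence of factors such as illumination. A power-law (gamma) transform is applied: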
$f(I) = I^{ \gamma }$

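With intensities scaled to [0, 1], choosing $\gamma < 1$ raises the overall gray level of the image, while $\gamma > 1$ lowers it. The gray level largely determines how bright the image looks: the lower it is, the darker the image, and vice versa.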

if transform_sqrt:
    image = np.sqrt(image)

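skimage's implementation of this step is extremely concise: with `transform_sqrt=True` it simply takes the square root of the image, which is the gamma transform with $\gamma = 0.5$.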
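A minimal sketch of the gamma transform, assuming intensities have already been scaled to [0, 1]; `gamma_correct` is a hypothetical helper, not a skimage function. It shows that $\gamma < 1$ brightens, $\gamma > 1$ darkens, and that $\gamma = 0.5$ matches the `np.sqrt` used by `transform_sqrt`.

```python
import numpy as np


def gamma_correct(image, gamma):
    """Hypothetical helper: apply f(I) = I ** gamma to an image scaled to [0, 1]."""
    image = np.asarray(image, dtype=float)
    if image.min() < 0 or image.max() > 1:
        raise ValueError("expected intensities in [0, 1]")
    return image ** gamma


img = np.array([[0.04, 0.25], [0.49, 1.0]])

brighter = gamma_correct(img, 0.5)   # gamma < 1: values in (0, 1) grow
darker = gamma_correct(img, 2.0)     # gamma > 1: values in (0, 1) shrink

print(np.allclose(brighter, np.sqrt(img)))   # True: same as transform_sqrt
```

Note that on raw 0 to 255 values a square root shrinks every value above 1, so the "brighter/darker" reading only holds after scaling to [0, 1].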

Image Smoothing

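This step removes noise from the grayscale image, typically by smoothing it with a discrete Gaussian kernel at different scales σ. In Dalal's experiments, however, moving from σ=0 to σ=2 reduces the recall rate from 89% to 80% at a fixed FPPW (false positives per window), i.e. HOG actually performs worse after smoothing. We reached a similar conclusion in our own experiments. The explanation is intuitive: HOG is built on edge gradients, and smoothing blurs edges and can destroy the gradient information HOG relies on.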
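To see why smoothing can hurt, here is a small numpy-only sketch. The helpers `gaussian_kernel1d`, `smooth` and `peak_gradient` are hypothetical, not skimage functions: they blur an ideal vertical step edge with a separable Gaussian and measure the strongest central-difference gradient. Larger σ spreads the edge over more pixels and lowers the peak gradient that HOG would vote with.

```python
import numpy as np


def gaussian_kernel1d(sigma, radius=None):
    """Hypothetical helper: normalized 1-D Gaussian kernel."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def smooth(image, sigma):
    """Hypothetical helper: separable Gaussian blur with edge padding; sigma 0 copies."""
    if sigma == 0:
        return image.astype(float).copy()
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    # convolve every row, then every column; "valid" mode strips the padding again
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def peak_gradient(image):
    """Largest horizontal central difference I(x+1) - I(x-1)."""
    return np.abs(image[:, 2:] - image[:, :-2]).max()


step = np.zeros((16, 16))
step[:, 8:] = 1.0   # ideal vertical edge

for sigma in (0, 1, 2):
    print(sigma, round(float(peak_gradient(smooth(step, sigma))), 3))
```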

Gradient Computation

First we compute the gradient at each pixel. Let $I(x, y)$ denote the value of the pixel at $(x, y)$. The horizontal and vertical gradients at each pixel are then:
$G_{x}(x, y)=I(x+1, y)-I(x-1, y)$

$G_{y}(x, y)=I(x, y+1)-I(x, y-1)$
[Figure: horizontal and vertical gradients]

Treating $G_{x}$ and $G_{y}$ as the two components of the gradient vector, its magnitude and direction $\alpha$ are:
$$
\begin{aligned}
G(x, y) &=\sqrt{G_{x}(x, y)^{2}+G_{y}(x, y)^{2}} \\
\alpha &=\arctan \frac{G_{y}(x, y)}{G_{x}(x, y)}
\end{aligned}
$$
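The difference, magnitude and angle formulas can be checked with a few lines of numpy. This is a sketch rather than skimage's code: `gradient_polar` is a hypothetical helper, it uses `np.arctan2` (which handles $G_{x} = 0$, unlike a plain arctan of the ratio), and it folds angles into [0°, 180°) for unsigned gradients, as in the 9-bin setup used later. Border pixels get zero gradient, mirroring `_hog_channel_gradient`.

```python
import numpy as np


def gradient_polar(image):
    """Hypothetical helper: central differences, magnitude and unsigned angle (degrees)."""
    image = np.asarray(image, dtype=float)
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]    # G_x = I(x+1, y) - I(x-1, y)
    gy[1:-1, :] = image[2:, :] - image[:-2, :]    # G_y = I(x, y+1) - I(x, y-1)
    magnitude = np.hypot(gx, gy)                  # sqrt(G_x^2 + G_y^2)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned: fold into [0, 180)
    return gx, gy, magnitude, angle


# intensity grows by 1 per column (x) and by 2 per row (y)
img = np.add.outer(2 * np.arange(5), np.arange(5))
gx, gy, mag, ang = gradient_polar(img)
print(gx[2, 2], gy[2, 2], round(mag[2, 2], 3), round(ang[2, 2], 1))   # 2.0 4.0 4.472 63.4
```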

    if image.dtype.kind == 'u':
        # convert uint image to float
        # to avoid problems with subtracting unsigned numbers
        image = image.astype('float')

    if multichannel:
        g_row_by_ch = np.empty_like(image, dtype=np.double)
        g_col_by_ch = np.empty_like(image, dtype=np.double)
        g_magn = np.empty_like(image, dtype=np.double)

        for idx_ch in range(image.shape[2]):
            g_row_by_ch[:, :, idx_ch], g_col_by_ch[:, :, idx_ch] = \
                _hog_channel_gradient(image[:, :, idx_ch])
            g_magn[:, :, idx_ch] = np.hypot(g_row_by_ch[:, :, idx_ch],
                                            g_col_by_ch[:, :, idx_ch])

        # For each pixel select the channel with the highest gradient magnitude
        idcs_max = g_magn.argmax(axis=2)
        rr, cc = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]),
                             indexing='ij',
                             sparse=True)
        g_row = g_row_by_ch[rr, cc, idcs_max]
        g_col = g_col_by_ch[rr, cc, idcs_max]
    else:
        g_row, g_col = _hog_channel_gradient(image)

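As the implementation shows, an unsigned-integer image is first converted to float; otherwise subtracting a larger uint value from a smaller one would wrap around and produce a large positive number instead of a negative one.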
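The wrap-around is easy to reproduce: numpy's `uint8` array arithmetic works modulo 2^8, so a negative difference silently becomes a large positive one.

```python
import numpy as np

row = np.array([200, 10, 50], dtype=np.uint8)

wrapped = row[2:] - row[:-2]                              # 50 - 200 in uint8 arithmetic
correct = row.astype(float)[2:] - row.astype(float)[:-2]  # subtract after casting to float

print(wrapped, correct)   # [106] [-150.]
```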
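Next comes the multichannel check. For a multichannel image, the gradient is computed separately for each channel, and for every pixel the gradient of the channel with the largest magnitude is kept; for a grayscale image, the gradient is computed just once. The gradient code is as follows: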

import numpy as np


def _hog_channel_gradient(channel):
    """Compute unnormalized gradient image along `row` and `col` axes.

    Parameters
    ----------
    channel : (M, N) ndarray
        Grayscale image or one of image channel.

    Returns
    -------
    g_row, g_col : channel gradient along `row` and `col` axes correspondingly.
    """
    g_row = np.empty(channel.shape, dtype=np.double)
    g_row[0, :] = 0
    g_row[-1, :] = 0
    g_row[1:-1, :] = channel[2:, :] - channel[:-2, :]
    g_col = np.empty(channel.shape, dtype=np.double)
    g_col[:, 0] = 0
    g_col[:, -1] = 0
    g_col[:, 1:-1] = channel[:, 2:] - channel[:, :-2]

    return g_row, g_col
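The multichannel branch earlier keeps, for every pixel, the gradient of the channel with the largest magnitude. skimage does this with `np.meshgrid` fancy indexing; the sketch below is an equivalent formulation with `np.take_along_axis`, on random stand-in data, to make the selection rule explicit. It assumes a channels-last `(M, N, C)` array whose per-channel gradients are already computed.

```python
import numpy as np

rng = np.random.default_rng(0)
g_row_by_ch = rng.normal(size=(4, 5, 3))   # stand-in per-channel row gradients (M, N, C)
g_col_by_ch = rng.normal(size=(4, 5, 3))   # stand-in per-channel column gradients

g_magn = np.hypot(g_row_by_ch, g_col_by_ch)
idcs_max = g_magn.argmax(axis=2)           # strongest channel for each pixel

# pick that channel's gradient at every pixel; result shape (M, N)
g_row = np.take_along_axis(g_row_by_ch, idcs_max[..., None], axis=2)[..., 0]
g_col = np.take_along_axis(g_col_by_ch, idcs_max[..., None], axis=2)[..., 0]

# same result as the meshgrid-based indexing in skimage
rr, cc = np.meshgrid(np.arange(4), np.arange(5), indexing="ij", sparse=True)
assert np.array_equal(g_row, g_row_by_ch[rr, cc, idcs_max])
print(g_row.shape)   # (4, 5)
```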

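Next, neighbouring pixels are grouped into cells (and cells into blocks). The possible geometries are rectangular R-HOG and circular C-HOG, and C-HOG itself has two variants: one with a single circular central cell and one whose central cell is divided into angular sectors. Dalal's paper states: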

We evaluated two variants of the C-HOG geometry, ones with a single circular central cell (similar to the GLOH feature), and ones whose central cell is divided into angular sectors as in shape contexts. We present results only for the circular-centre variants, as these have fewer spatial cells than the divided centre ones and give the same performance in practice.

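Since the divided-centre variant needs more spatial cells but performs essentially the same as the single-centre one, in practice the choice is between R-HOG and (single-centre) C-HOG. Here we use R-HOG, the commonly used structure, which is also the one skimage's `hog` implements.
The next step is choosing the `pixels_per_cell` parameter. Choosing (4, 4), for example, means each cell consists of 4×4 pixels. For each cell we then tally the gradient vectors of its pixels in a histogram whose horizontal axis is the gradient angle. For simplicity, the angle range is covered by a number of bins: Dalal's paper uses 9, while the official skimage demo uses 8. Here we take 9 as the example.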

[Figure: example with orientations = 8]

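With 9 bins, the range 0° to 180° is split into bins of 20° each (for 0° to 360° the sign of the gradient direction would also have to be taken into account), forming the horizontal axis of the histogram; the vertical axis accumulates the gradient magnitudes of the pixels that fall into each bin.
Likewise, we choose the `cells_per_block` parameter, for example also (4, 4), so each block is made up of 4×4 cells. The feature vector of one block then has length 4×4×9 = 144.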
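For a single cell, the voting step can be sketched with `np.histogram`: unsigned angles in [0°, 180°) are split into 9 bins of 20°, and each pixel adds its gradient magnitude to its bin. `cell_histogram` is a hypothetical helper using hard bin assignment; Dalal's paper interpolates votes between neighbouring bins, and skimage's compiled routine may differ in such details.

```python
import numpy as np


def cell_histogram(magnitude, angle_deg, orientations=9):
    """Hypothetical helper: magnitude-weighted histogram of unsigned angles for one cell."""
    edges = np.linspace(0, 180, orientations + 1)   # 0, 20, 40, ..., 180 for 9 bins
    hist, _ = np.histogram(angle_deg, bins=edges, weights=magnitude)
    return hist


# a 4x4 cell: the left half points at 10 degrees, the right half at 95 degrees
angle = np.array([[10, 10, 95, 95]] * 4, dtype=float)
magnitude = np.array([[1, 2, 3, 4]] * 4, dtype=float)

hist = cell_histogram(magnitude, angle)
print(hist)   # 12 in bin 0 (0-20 deg), 28 in bin 4 (80-100 deg), 0 elsewhere
```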

[Figure: C-HOG illustration, ppc = cpb = (4, 4)]

    s_row, s_col = image.shape[:2]
    c_row, c_col = pixels_per_cell
    b_row, b_col = cells_per_block

    n_cells_row = int(s_row // c_row)  # number of cells along row-axis
    n_cells_col = int(s_col // c_col)  # number of cells along col-axis

    # compute orientations integral images
    orientation_histogram = np.zeros((n_cells_row, n_cells_col, orientations))

    _hoghistogram.hog_histograms(g_col, g_row, c_col, c_row, s_col, s_row,
                                 n_cells_col, n_cells_row,
                                 orientations, orientation_histogram)

    # now compute the histogram for each cell
    hog_image = None

    if visualize:
        from .. import draw

        radius = min(c_row, c_col) // 2 - 1
        orientations_arr = np.arange(orientations)
        # set dr_arr, dc_arr to correspond to midpoints of orientation bins
        orientation_bin_midpoints = (
            np.pi * (orientations_arr + .5) / orientations)
        dr_arr = radius * np.sin(orientation_bin_midpoints)
        dc_arr = radius * np.cos(orientation_bin_midpoints)
        hog_image = np.zeros((s_row, s_col), dtype=float)
        for r in range(n_cells_row):
            for c in range(n_cells_col):
                for o, dr, dc in zip(orientations_arr, dr_arr, dc_arr):
                    centre = tuple([r * c_row + c_row // 2,
                                    c * c_col + c_col // 2])
                    rr, cc = draw.line(int(centre[0] - dc),
                                       int(centre[1] + dr),
                                       int(centre[0] + dc),
                                       int(centre[1] - dr))
                    hog_image[rr, cc] += orientation_histogram[r, c, o]

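Here `hog_histograms` comes from `_hoghistogram`, a compiled (Cython) extension module, so its body is not visible from the Python code itself (its `.pyx` source lives in the scikit-image repository). Presumably it computes the per-cell histograms by scanning over the cells, and it is compiled for efficiency.
There is also the `visualize` branch, which corresponds to the parameter asking whether to return a HOG visualization image. If it is True, the `draw` module is imported and, for every cell and every orientation bin, a short line through the cell centre is drawn, with its intensity accumulated from that bin's histogram value.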
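A pure-numpy stand-in for the per-cell accumulation can make the idea concrete. `cell_histograms` below is a hypothetical sketch, not the compiled `_hoghistogram.hog_histograms`: it crops the image to whole cells (as `n_cells_row = int(s_row // c_row)` implies), bins each pixel's unsigned angle, and accumulates magnitudes with `np.add.at`. It simply sums the votes; any per-cell rescaling the compiled routine may apply is left out.

```python
import numpy as np


def cell_histograms(g_row, g_col, pixels_per_cell=(8, 8), orientations=9):
    """Hypothetical helper: (n_cells_row, n_cells_col, orientations) summed magnitudes."""
    c_row, c_col = pixels_per_cell
    n_cells_row = g_row.shape[0] // c_row
    n_cells_col = g_row.shape[1] // c_col
    # drop pixels that do not fill a whole cell
    g_row = g_row[:n_cells_row * c_row, :n_cells_col * c_col]
    g_col = g_col[:n_cells_row * c_row, :n_cells_col * c_col]

    magnitude = np.hypot(g_row, g_col)
    angle = np.rad2deg(np.arctan2(g_row, g_col)) % 180   # unsigned, [0, 180)
    bins = np.minimum((angle / (180 / orientations)).astype(int), orientations - 1)

    # cell index of every pixel
    rows = np.arange(magnitude.shape[0]) // c_row
    cols = np.arange(magnitude.shape[1]) // c_col
    r_idx, c_idx = np.meshgrid(rows, cols, indexing="ij")

    hist = np.zeros((n_cells_row, n_cells_col, orientations))
    np.add.at(hist, (r_idx, c_idx, bins), magnitude)      # unbuffered accumulation
    return hist


rng = np.random.default_rng(1)
g_row, g_col = rng.normal(size=(2, 34, 20))   # 34x20 gradients -> 4x2 cells of 8x8
hist = cell_histograms(g_row, g_col)
print(hist.shape)   # (4, 2, 9)
```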
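The only geometry in the visualization branch is the bin midpoint `np.pi * (orientations_arr + .5) / orientations`. For 9 bins it evaluates to 10°, 30°, ..., 170°, the centres of the 20° bins; `dr_arr` and `dc_arr` then scale its sine and cosine by the drawing radius. A quick check:

```python
import numpy as np

orientations = 9
orientations_arr = np.arange(orientations)
orientation_bin_midpoints = np.pi * (orientations_arr + .5) / orientations

print(np.rad2deg(orientation_bin_midpoints).round(6))   # 10, 30, ..., 170 degrees
```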

Normalization

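Normalizing local contrast compresses the influence of illumination, shading and edge contrast on the image. This step is done per block; since blocks overlap, a cell may belong to several blocks and is normalized separately in each of them.
Let $v$ be the unnormalized feature vector of a block. There are usually four normalization options: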
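- $L_{1}\text{-norm}: v \leftarrow \frac{v}{\lVert v\rVert_{1}+\xi}$
- $L_{1}\text{-sqrt}: v \leftarrow \sqrt{\frac{v}{\lVert v\rVert_{1}+\xi}}$
- $L_{2}\text{-norm}: v \leftarrow \frac{v}{\sqrt{\lVert v\rVert_{2}^{2}+\xi^{2}}}$, where the small constant $\xi$ prevents division by zero
- $L_{2}\text{-Hys}$: apply the $L_{2}$-norm, clip every component of $v$ at a maximum of 0.2, then renormalize.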

skimage's block normalization function supports all four methods described above:

import numpy as np


def _hog_normalize_block(block, method, eps=1e-5):
    if method == 'L1':
        out = block / (np.sum(np.abs(block)) + eps)
    elif method == 'L1-sqrt':
        out = np.sqrt(block / (np.sum(np.abs(block)) + eps))
    elif method == 'L2':
        out = block / np.sqrt(np.sum(block ** 2) + eps ** 2)
    elif method == 'L2-Hys':
        out = block / np.sqrt(np.sum(block ** 2) + eps ** 2)
        out = np.minimum(out, 0.2)
        out = out / np.sqrt(np.sum(out ** 2) + eps ** 2)
    else:
        raise ValueError('Selected block normalization method is invalid.')

    return out
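To see what the extra clipping in `L2-Hys` buys, the sketch below re-implements just the `L2` and `L2-Hys` branches (same formulas and `eps` as the function above, repeated so the snippet runs on its own; `l2` and `l2_hys` are hypothetical names) and applies them to a block dominated by one strong bin.

```python
import numpy as np


def l2(block, eps=1e-5):
    """Hypothetical helper mirroring the 'L2' branch."""
    return block / np.sqrt(np.sum(block ** 2) + eps ** 2)


def l2_hys(block, eps=1e-5, clip=0.2):
    """Hypothetical helper mirroring the 'L2-Hys' branch."""
    out = l2(block, eps)
    out = np.minimum(out, clip)   # cap any single component at 0.2
    return l2(out, eps)           # renormalize


block = np.array([10.0, 1.0, 1.0, 1.0])   # one bin dominates
a, b = l2(block), l2_hys(block)
print(a.max() / a.min(), b.max() / b.min())   # ratio drops from 10 to about 2
```

Plain `L2` keeps the 10:1 ratio, so one very strong edge still dominates the block; after clipping, the weaker bins carry relatively more weight.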

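Now look at the actual procedure. `n_blocks_row`, the number of blocks along the rows, equals the number of cells along the rows minus the number of rows in `cells_per_block`, plus 1; the columns are computed the same way. From this we can infer the feature vector length: the per-block dimension 4×4×9 times the number of blocks. With 8 cells along each axis (for example a 32×32 image with 4×4-pixel cells), there are (8−4+1)×(8−4+1) = 25 blocks, giving 144 × 25 = 3600 dimensions. The same rule gives the dimension for any other choice of parameters.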

    n_blocks_row = (n_cells_row - b_row) + 1
    n_blocks_col = (n_cells_col - b_col) + 1
    normalized_blocks = np.zeros((n_blocks_row, n_blocks_col,
                                  b_row, b_col, orientations))

    for r in range(n_blocks_row):
        for c in range(n_blocks_col):
            block = orientation_histogram[r:r + b_row, c:c + b_col, :]
            normalized_blocks[r, c, :] = \
                _hog_normalize_block(block, method=block_norm)
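The dimension rule fits in a few lines. `hog_feature_length` is a hypothetical helper that mirrors the `n_cells_*` and `n_blocks_*` arithmetic in the code above (integer division for cells, `n_cells - cells_per_block + 1` blocks per axis), assuming `feature_vector=True` so that the result is the flattened length. The 32×32 image size in the first call is an assumed example that yields 8 cells per axis.

```python
def hog_feature_length(image_shape, pixels_per_cell=(8, 8),
                       cells_per_block=(3, 3), orientations=9):
    """Hypothetical helper: length of the flattened HOG vector for a 2-D image."""
    n_cells = [s // p for s, p in zip(image_shape[:2], pixels_per_cell)]
    n_blocks = [n - b + 1 for n, b in zip(n_cells, cells_per_block)]
    if min(n_blocks) < 1:
        raise ValueError("image is too small for one block")
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return n_blocks[0] * n_blocks[1] * per_block


print(hog_feature_length((32, 32), (4, 4), (4, 4)))   # 3600
print(hog_feature_length((128, 64), (8, 8), (2, 2)))  # 3780 (64x128 window, 2x2-cell blocks)
```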
