A Maximum-value Perfect Matching for Mining Hard Samples



  • MVP Matching: A Maximum-value Perfect Matching for Mining Hard Samples, with Application to Person Re-identification

    Full paper link:
    http://openaccess.thecvf.com/content_ICCV_2019/html/Sun_MVP_Matching_A_Maximum-Value_Perfect_Matching_for_Mining_Hard_Samples_ICCV_2019_paper.html

    The source code is here.

    Introduction

    Hard Samples

    1. The appearance of different pedestrians may be highly similar;

    2. The pose of a person may vary significantly as time and location change;

    3. The lighting conditions under some cameras are sometimes poor.

    These hard samples can strongly slow down the convergence rate of metric learning, which pulls similar samples into clusters while pushing dissimilar ones apart.

    Worse still, the learned embedding metric and feature representation could be heavily biased by these hard samples.

    Deep metric learning

    • The seminal contrastive loss [1] and triplet loss [2] learn semantic information within image pairs or triplets based on siamese-like networks [3], but they do not make full use of all the available information within a batch of samples.

    • Batch-all triplet loss [4] and N-pair loss [5] were developed to remedy this flaw, but they do not pay enough attention to hard samples and require expensive resampling techniques to boost performance.

    • Lifted loss [6] and quadruplet loss [7] only consider hard negative mining while ignoring hard positive samples.

    • Batch-hard triplet loss [4] mines the hardest positive and negative pairs simply based on feature distances. Its performance is easily influenced by outlier samples (e.g., indistinguishable or mislabeled images in person re-ID datasets), which may be regarded as the hardest pairs by many other samples simultaneously and thus cause oscillation during the metric learning process.

    [Figure 1]

    Conventional batch-wise loss objectives for metric learning are optimized using all available information within training batches, so that all similar positive pairs with large ground distances and all dissimilar negative pairs with small ground distances are emphasized simultaneously. This leads to two problems:

    1. Overtraining:

      Similar positive pairs with small distances would still be optimized (e.g., $x_1$ and $x_4$).

    2. Counteracting:

      Worse, if we view the optimization as a whole rather than sample by sample, the metric learning process of these methods is vulnerable to oscillation: since a hard sample might be emphasized by many samples simultaneously, their updates can cancel each other out (e.g., $x_2$ and $x_5$).

    In contrast, metric learning with the MVP matching based loss objective guarantees that each sample selects one exclusive hard positive pair and one exclusive hard negative pair.

    Thus, the learned MVP matching can handle unbalanced samples through the adaptive edge weights in the bipartite graph, and the adverse optimization caused by other anchors, i.e., the overtraining and counteracting shown as black arrows in Figure 1(c), is effectively eliminated.

    As a consequence, the convergence rate and recognition performance of metric learning could be improved significantly.

    Method

    [Figure 2]

    Adaptive Weighting for Positives and Negatives

    We divide the complete bipartite graph $G(U,V,E)$ into two bipartite graphs $G_P$ and $G_N$ for positive and negative pairs, respectively.

    Adaptive weights $M_{ij}^+$ and $M_{ij}^-$ are assigned to each edge $(i, j)$ in $G_P$ and $G_N$:

    $$
    M_{ij}^+(x_i, x_j; f)=
    \begin{cases}
    ||f(x_i)-f(x_j)||_2^2-\alpha,\quad&\mbox{if}\quad\delta_{ij}^+=1\\
    0,&\mbox{if}\quad\delta_{ij}^+=0
    \end{cases}
    $$

    where

    $$
    \delta_{ij}^+=
    \begin{cases}
    1,\quad&\mbox{if}\quad ||f(x_i)-f(x_j)||_2^2\gt\alpha\\
    0,&\mbox{if}\quad ||f(x_i)-f(x_j)||_2^2\le\alpha
    \end{cases}
    $$

    where $\alpha$ is a learnable variable.

    This weighting prevents the overtraining of positive samples that occurs with the contrastive loss, which demands that similar pairs be pulled as close together as possible.
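    For instance (a toy example of ours, not from the paper): with $\alpha=0.5$, a positive pair at squared distance $0.8$ receives weight $M_{ij}^+=0.3$, while a positive pair already at squared distance $0.3\le\alpha$ receives weight $0$ and is not optimized further.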

    $$
    M_{ij}^-(x_i, x_j; f)=
    \begin{cases}
    \beta-||f(x_i)-f(x_j)||_2^2,\quad&\mbox{if}\quad\delta_{ij}^-=1\\
    0,&\mbox{if}\quad\delta_{ij}^-=0
    \end{cases}
    $$

    where

    $$
    \delta_{ij}^-=
    \begin{cases}
    1,\quad&\mbox{if}\quad ||f(x_i)-f(x_j)||_2^2\lt\beta\\
    0,&\mbox{if}\quad ||f(x_i)-f(x_j)||_2^2\ge\beta
    \end{cases}
    $$

    where $\beta = \epsilon + \alpha$.

    This weighting penalizes dissimilar pairs within the margin $\beta$ and ignores the others, which prevents the counteracting.
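    As a minimal sketch of how these weights could be computed in practice (assuming PyTorch; the function name `adaptive_weights` and the default `epsilon` are our own, not from the paper):

    ```python
    import torch

    def adaptive_weights(feats, alpha, epsilon=0.1):
        # Sketch of the adaptive edge weights M+ and M- for one batch.
        #   feats:   (n, d) embeddings f(x_i)
        #   alpha:   learnable scalar threshold
        #   epsilon: margin offset, so that beta = epsilon + alpha
        # Pairwise squared Euclidean distances ||f(x_i) - f(x_j)||_2^2.
        dist = torch.cdist(feats, feats, p=2).pow(2)
        beta = epsilon + alpha
        # M+ = dist - alpha where delta+ = 1 (i.e., dist > alpha), else 0.
        m_pos = torch.where(dist > alpha, dist - alpha, torch.zeros_like(dist))
        # M- = beta - dist where delta- = 1 (i.e., dist < beta), else 0.
        m_neg = torch.where(dist < beta, beta - dist, torch.zeros_like(dist))
        return m_pos, m_neg
    ```

    The two `torch.where` calls implement the indicators $\delta_{ij}^+$ and $\delta_{ij}^-$ directly on the distance matrix; restricting the weights to positive or negative pairs is handled later by the pair labels $y_{ij}$.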

    MVP Matching for Hard Sample Pairs

    We define matching variables $T_{ij}^+$ and $T_{ij}^-$ for each edge $(i, j)$ on the two weighted bipartite graphs $G_P$ and $G_N$:

    $$
    t_{ij}=
    \begin{cases}
    1,\quad&\mbox{indicates a matched pair}\\
    0,&\mbox{indicates an unmatched pair}
    \end{cases}
    $$

    where $t_{ij}\in T^{+}$ (or $T^{-}$). The MVP matching maximizes the total edge weight subject to one-to-one matching constraints:

    $$
    \max_{T_{ij}\ge 0}\sum_{i,j=1}^n T_{ij}^{+(\mathrm{or}\,-)}M_{ij}^{+(\mathrm{or}\,-)}
    $$

    $$
    \mbox{s.t.}\quad\sum_{j=1}^n T_{ij}^{+(\mathrm{or}\,-)}=1,\quad\sum_{i=1}^n T_{ij}^{+(\mathrm{or}\,-)}=1
    $$

    This assignment problem can be solved with the Kuhn-Munkres (Hungarian) assignment algorithm [8].
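    Since this is a standard linear assignment problem, the matching step can be prototyped (our own illustration, not the paper's code) with SciPy's implementation of the same Hungarian method:

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def mvp_matching(weights):
        # Solve the one-to-one matching that maximizes the total edge weight.
        #   weights: (n, n) array of edge weights M_ij on a bipartite graph.
        # Returns a binary matrix T with T[i, j] = 1 for matched pairs.
        # linear_sum_assignment implements the Hungarian (Kuhn-Munkres)
        # method; maximize=True yields the maximum-value perfect matching.
        rows, cols = linear_sum_assignment(weights, maximize=True)
        T = np.zeros_like(weights, dtype=float)
        T[rows, cols] = 1.0
        return T
    ```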

    Definition 3.1. Feasible labeling:

    $$
    l:\forall x_i, x_j\in U\cup V\rightarrow \mathbb{R}
    $$

    $$\mbox{s.t.}\quad M_{ij}\le l(x_i)+l(x_j)$$

    Definition 3.2. Equality graph:

    $$
    G_l=\lbrace\forall (i, j):M_{ij}=l(x_i)+l(x_j)\rbrace
    $$

    Theorem 3.3. If $l$ is a feasible labeling function and $T$ is an arbitrary perfect matching (i.e., a one-to-one matching) in the equality graph $G_l$, then $T$ must be a maximum-value perfect (MVP) matching $T^*$.

    Proof: For any perfect matching $T$ in $G(U, V, E)$:

    $$
    \sum_{(x_i, x_j)\in T}M_{ij}\le\sum_{x_i\in U}l(x_i)+\sum_{x_j\in V}l(x_j)
    $$

    Equality holds only when $T$ is a perfect matching in $G_l$:

    $$
    \sum_{(x_i, x_j)\in T^*}M_{ij}=\sum_{x_i\in U}l(x_i)+\sum_{x_j\in V}l(x_j)
    $$

    Hence such a matching attains the upper bound shared by all perfect matchings, so it must be the maximum-value perfect matching $T^*$.
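    To make the bound concrete, consider a toy example (ours, not from the paper): let $M=\begin{pmatrix}3&1\\2&4\end{pmatrix}$ with labeling $l(u_1)=3$, $l(u_2)=4$, $l(v_1)=l(v_2)=0$, which is feasible since every $M_{ij}\le l(x_i)+l(x_j)$. The equality graph $G_l$ contains exactly the edges $(1,1)$ and $(2,2)$, which form a perfect matching of value $3+4=7$; this equals $\sum_i l(x_i)+\sum_j l(x_j)$, the upper bound on any perfect matching, so it is the MVP matching $T^*$.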

    [Algorithm 1]

    Batch-wise MVP Matching based Loss

    $$
    \begin{split}
    L(x_i, x_j; f)&=L^++L^-\\
    &=\sum_{i,j=1}^n y_{ij}T_{ij}^+M_{ij}^+ + \sum_{i,j=1}^n (1-y_{ij})T_{ij}^-M_{ij}^-
    \end{split}
    $$

    where

    $$
    y_{ij}=
    \begin{cases}
    1,\quad &\mbox{if samples $x_i$ and $x_j$ are deemed similar}\\
    0,&\mbox{otherwise}
    \end{cases}
    $$

    Gradients for the update:

    $$
    \frac{\partial L}{\partial f(x_i)}=\sum_{j=1}^n 2(f(x_i)-f(x_j))\left(\delta_{ij}^+y_{ij}T_{ij}^+-\delta_{ij}^-(1-y_{ij})T_{ij}^-\right)
    $$

    $$
    \frac{\partial L}{\partial f(x_j)}=-\sum_{i=1}^n 2(f(x_i)-f(x_j))\left(\delta_{ij}^+y_{ij}T_{ij}^+-\delta_{ij}^-(1-y_{ij})T_{ij}^-\right)
    $$

    Since $\alpha$ is a learnable variable, its gradient is:

    $$
    \frac{\partial L}{\partial \alpha}=\sum_{i,j=1}^n\left[-y_{ij}\delta_{ij}^++(1-y_{ij})\delta_{ij}^-\right]
    $$
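    Putting the pieces together, a batch-wise MVP loss could look like the following sketch (reusing the hypothetical `adaptive_weights` and `mvp_matching` helpers above; the assignment step is non-differentiable, so $T^+$ and $T^-$ are held constant during backpropagation, consistent with the gradients above):

    ```python
    import torch

    def mvp_loss(feats, labels, alpha, epsilon=0.1):
        # Sketch of the batch-wise MVP matching based loss.
        #   feats: (n, d) embeddings; labels: (n,) identity labels;
        #   alpha: learnable threshold (e.g., a torch.nn.Parameter).
        m_pos, m_neg = adaptive_weights(feats, alpha, epsilon)

        # y_ij = 1 if x_i and x_j share the same identity, else 0.
        y = (labels[:, None] == labels[None, :]).float()

        # Solve the two assignments on detached weights: T+ and T- act as
        # constants w.r.t. the gradients, matching the update rules above.
        t_pos = torch.from_numpy(
            mvp_matching((y * m_pos).detach().cpu().numpy())).to(feats)
        t_neg = torch.from_numpy(
            mvp_matching(((1 - y) * m_neg).detach().cpu().numpy())).to(feats)

        # L = sum_ij y_ij T+_ij M+_ij + sum_ij (1 - y_ij) T-_ij M-_ij
        return (y * t_pos * m_pos).sum() + ((1 - y) * t_neg * m_neg).sum()
    ```

    Because `alpha` enters `m_pos` and `m_neg` differentiably, autograd handles its gradient automatically, in the spirit of the $\partial L/\partial\alpha$ formula above.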

    References

    [1] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 539–546. IEEE, 2005. [link]

    [2] Gal Chechik, Varun Sharma, Uri Shalit, and Samy Bengio. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11(Mar):1109–1135, 2010. [link]

    [3] Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In Advances in Neural Information Processing Systems, pages 737–744, 1994. [link]

    [4] Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017. [link]

    [5] Kihyuk Sohn. Improved deep metric learning with multiclass n-pair loss objective. In Advances in Neural Information Processing Systems, pages 1857–1865, 2016. [link]

    [6] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4004–4012, 2016. [link]

    [7] Weihua Chen, Xiaotang Chen, Jianguo Zhang, and Kaiqi Huang. Beyond triplet loss: a deep quadruplet network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 403–412, 2017. [link]

    [8] Harold W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955. [link]


    If you like my articles, feel free to follow my personal blog.


 
