A Maximum-value Perfect Matching for Mining Hard Samples

MVP Matching: A Maximum-value Perfect Matching for Mining Hard Samples, with Application to Person Re-identification
The source code is here.
Introduction
Hard Samples

The appearance of different pedestrians may be highly similar;

The pose of a person may vary significantly across time and space;

The lighting conditions under some cameras are sometimes poor.
These hard samples can strongly slow down the convergence of metric learning, which works by pulling similar samples together into clusters while pushing dissimilar ones apart.
Worse still, the learned embedding metric and feature representation could be heavily biased by these hard samples.
Deep metric learning

The seminal contrastive loss [1] and triplet loss [2] learn semantic information from image pairs or triplets using siamese-like networks [3], but they do not make full use of all the information available within a batch of samples.

Batch-all triplet loss [4] and N-pair loss [5] were developed to remedy this flaw, but they do not pay enough attention to hard samples and require expensive resampling techniques to boost performance.

Lifted loss [6] and quadruplet loss [7] only consider hard negative sample mining while ignoring hard positive samples.

Batch-hard triplet loss [4] mines the hardest positive and negative for each anchor based simply on feature distances. Its performance is easily influenced by outlier samples (e.g., indistinguishable or mislabeled images in person re-ID datasets), which can be regarded as the hardest pairs by many other samples simultaneously and lead to oscillation during the metric learning process.
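The batch-hard strategy above can be sketched in a few lines (an illustrative NumPy version, not the authors' code; the margin value is an arbitrary choice):

```python
import numpy as np

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss [4], sketched: for each anchor, take the
    farthest same-identity sample and the nearest different-identity
    sample inside the batch, then apply a hinge with a margin."""
    # Pairwise Euclidean distances between all features in the batch.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = labels[:, None] == labels[None, :]         # positive-pair mask
    pos_dist = np.where(same, dist, -np.inf).max(1)   # hardest positive per anchor
    neg_dist = np.where(~same, dist, np.inf).min(1)   # hardest negative per anchor
    return np.maximum(pos_dist - neg_dist + margin, 0.0).mean()

# Two well-separated identities: the hinge is inactive, so the loss is zero.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
labels = np.array([0, 0, 1, 1])
loss = batch_hard_triplet_loss(feats, labels)  # -> 0.0
```

Because each anchor independently picks its single hardest positive and negative, one outlier image can become the "hardest" pair for many anchors at once, which is exactly the oscillation problem described above.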
Conventional batch-wise loss objectives for metric learning are optimized using all available information within training batches, so that all similar positive pairs with large distances and all dissimilar negative pairs with small distances are emphasized simultaneously. This causes two problems:

Overtraining:
Similar positive pairs that already have small distances would still be optimized (e.g., $x_1$ and $x_4$).
Counteracting:
Worse, if we treat the optimization as a whole rather than as individual sample pairs, the metric learning process of these methods is vulnerable to oscillation: since the hard samples might be emphasized by many anchors simultaneously, their updates can cancel each other out (e.g., $x_2$ and $x_5$).
In contrast, metric learning with the MVP matching based loss objective guarantees that each sample selects one exclusive hard positive pair and one exclusive hard negative pair.
Thus, the learned MVP matching can deal with unbalanced samples via the adaptive edge weights in the bipartite graph, and the adverse optimization (e.g., the overtraining and counteracting yielded by other anchors, shown as black arrows in Figure 1(c)) is effectively eliminated.
As a consequence, the convergence rate and recognition performance of metric learning could be improved significantly.
Method
Adaptive Weighting for Positives and Negatives
We divide the complete bipartite graph $G(U,V,E)$ into two bipartite graphs $G_P$ and $G_N$ for positive and negative pairs respectively. An adaptive weight $M_{ij}^+$ or $M_{ij}^-$ is assigned to each edge $(i, j)$ in $G_P$ and $G_N$:
$$
M_{ij}^+(x_i, x_j; f)=
\begin{cases}
\|f(x_i)-f(x_j)\|_2^2-\alpha,\quad&\mbox{if}\quad\delta_{ij}^+=1\\
0,&\mbox{if}\quad\delta_{ij}^+=0
\end{cases}
$$
where
$$
\delta_{ij}^+=
\begin{cases}
1,\quad&\mbox{if}\quad\|f(x_i)-f(x_j)\|_2^2\gt\alpha\\
0,&\mbox{if}\quad\|f(x_i)-f(x_j)\|_2^2\le\alpha
\end{cases}
$$
and $\alpha$ is a learnable variable. This weighting prevents the overtraining of positive samples seen in the contrastive loss, which demands that similar pairs be pulled as close together as possible.
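As a concrete sketch, the positive-pair weights $M_{ij}^+$ can be computed batch-wise as follows (an illustrative NumPy version; in the paper $\alpha$ is learnable, whereas here it is a plain float, and the example embeddings are made up):

```python
import numpy as np

def positive_weights(f_u, f_v, alpha):
    """Adaptive positive-pair weights M+ on the bipartite graph G_P.
    f_u, f_v: embeddings of the two graph sides, shape (n, d).
    alpha: threshold (learnable in the paper, a constant here)."""
    # Squared Euclidean distances ||f(x_i) - f(x_j)||_2^2 for all pairs.
    d2 = ((f_u[:, None, :] - f_v[None, :, :]) ** 2).sum(-1)
    delta = (d2 > alpha).astype(float)  # delta_ij^+ = 1 only for hard positives
    return delta * (d2 - alpha)         # M_ij^+ = d2 - alpha if hard, else 0

fu = np.array([[0.0, 0.0], [1.0, 0.0]])
fv = np.array([[0.0, 2.0], [1.0, 0.1]])
M_pos = positive_weights(fu, fv, alpha=0.5)
# The already-close pair (fu[1], fv[1]) gets weight 0, so it is not over-trained.
```

Only pairs whose distance exceeds $\alpha$ receive a positive weight, so easy positives drop out of the matching entirely.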
$$
M_{ij}^-(x_i, x_j; f)=
\begin{cases}
\beta-\|f(x_i)-f(x_j)\|_2^2,\quad&\mbox{if}\quad\delta_{ij}^-=1\\
0,&\mbox{if}\quad\delta_{ij}^-=0
\end{cases}
$$
where
$$
\delta_{ij}^-=
\begin{cases}
1,\quad&\mbox{if}\quad\|f(x_i)-f(x_j)\|_2^2\lt\beta\\
0,&\mbox{if}\quad\|f(x_i)-f(x_j)\|_2^2\ge\beta
\end{cases}
$$
and $\beta = \epsilon + \alpha$. This weighting penalizes dissimilar pairs within the margin $\beta$ and ignores the others, which prevents counteracting.
MVP Matching for Hard Sample Pairs
We define the matching variables $T_{ij}^+$ and $T_{ij}^-$ for each edge $(i, j)$ on the two weighted bipartite graphs $G_P$ and $G_N$:
$$
t_{ij}=
\begin{cases}
1,\quad&\mbox{indicates a matched pair}\\
0,&\mbox{indicates an unmatched one}
\end{cases}
$$
where $t_{ij}\in T^{+(\mathrm{or}-)}$. The MVP matching is the solution of
$$\max_{T_{ij}\ge 0}\sum_{i,j=1}^nT_{ij}^{+(\mathrm{or}-)}M_{ij}^{+(\mathrm{or}-)}$$
$$\mbox{s.t.}\quad\sum_{j=1}^nT_{ij}^{+(\mathrm{or}-)}=1,\quad\sum_{i=1}^nT_{ij}^{+(\mathrm{or}-)}=1,$$
which can be solved by the Kuhn-Munkres assignment (KA) algorithm [8].
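For intuition, here is a brute-force version of this objective (illustrative only; it enumerates all permutations, which is fine for a tiny example, whereas a real implementation would use the polynomial-time Kuhn-Munkres algorithm [8]):

```python
import itertools
import numpy as np

def mvp_matching(M):
    """Maximum-value perfect matching on a complete bipartite graph
    with weight matrix M (n x n), by exhaustive search over the n!
    one-to-one matchings. Returns the matching matrix T and its value."""
    n = M.shape[0]
    best_perm, best_val = None, -np.inf
    for perm in itertools.permutations(range(n)):
        # A permutation assigns each row i exactly one column perm[i],
        # so both row and column sums of T are 1 (a perfect matching).
        val = sum(M[i, j] for i, j in enumerate(perm))
        if val > best_val:
            best_perm, best_val = perm, val
    T = np.zeros_like(M)
    for i, j in enumerate(best_perm):
        T[i, j] = 1.0  # T_ij = 1 marks a matched pair
    return T, best_val

M = np.array([[3.0, 1.0],
              [2.0, 4.0]])
T, val = mvp_matching(M)  # picks edges (0,0) and (1,1), total value 7
```

Each row and column of $T$ sums to one, so every sample is matched to exactly one exclusive hard pair, which is precisely what rules out the counteracting updates discussed earlier.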
Definition 3.1 (Feasible labeling): a function $l:\forall x_i, x_j\in U\cup V\rightarrow \mathbb{R}$ such that
$$M_{ij}\le l(x_i)+l(x_j).$$
Definition 3.2 (Equality graph): $G_l=\lbrace\forall (i, j):M_{ij}=l(x_i)+l(x_j)\rbrace$.
Theorem 3.3. If $l$ is a feasible labeling function and $T$ is an arbitrary perfect matching (i.e., a one-to-one matching) in the equality graph $G_l$, then $T$ must be a maximum-value perfect (MVP) matching $T^*$.
Proof: For any perfect matching $T$ in $G(U, V, E)$:
$$\sum_{i, j}^{(x_i, x_j)\in T}M_{ij}\le\sum_i^{x_i\in U}l(x_i)+\sum_j^{x_j\in V}l(x_j).$$
Equality holds only when $T$ is a matching in $G_l$:
$$\sum_{i, j}^{(x_i, x_j)\in T^*}M_{ij}=\sum_i^{x_i\in U}l(x_i)+\sum_j^{x_j\in V}l(x_j).$$
Batch-wise MVP Matching based Loss
$$
\begin{split}
L(x_i, x_j; f)&=L^++L^-\\
&=\sum_{ij}^ny_{ij}T_{ij}^+M_{ij}^++\sum_{ij}^n(1-y_{ij})T_{ij}^-M_{ij}^-
\end{split}
$$
where
$$
y_{ij}=
\begin{cases}
1,\quad &\mbox{if samples $x_i$ and $x_j$ are deemed similar}\\
0,&\mbox{otherwise}
\end{cases}
$$
Update rules:
$$\frac{\partial L}{\partial f(x_i)}=\sum_{j=1}^n2(f(x_i)-f(x_j))\left(\delta_{ij}^+y_{ij}T_{ij}^+-\delta_{ij}^-(1-y_{ij})T_{ij}^-\right)$$
$$\frac{\partial L}{\partial f(x_j)}=-\sum_{i=1}^n2(f(x_i)-f(x_j))\left(\delta_{ij}^+y_{ij}T_{ij}^+-\delta_{ij}^-(1-y_{ij})T_{ij}^-\right)$$
Since $\alpha$ is a learnable variable:
$$\frac{\partial L}{\partial \alpha}=\sum_{i,j=1}^n\left[-y_{ij}T_{ij}^+\delta_{ij}^++(1-y_{ij})T_{ij}^-\delta_{ij}^-\right]$$
References
[1] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 539–546. IEEE, 2005. [link]
[2] Gal Chechik, Varun Sharma, Uri Shalit, and Samy Bengio. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11(Mar):1109–1135, 2010. [link]
[3] Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In Advances in Neural Information Processing Systems, pages 737–744, 1994. [link]
[4] Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017. [link]
[5] Kihyuk Sohn. Improved deep metric learning with multi-class N-pair loss objective. In Advances in Neural Information Processing Systems, pages 1857–1865, 2016. [link]
[6] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, and Silvio Savarese. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4004–4012, 2016. [link]
[7] Weihua Chen, Xiaotang Chen, Jianguo Zhang, and Kaiqi Huang. Beyond triplet loss: a deep quadruplet network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 403–412, 2017. [link]
[8] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2):83–97, 1955. [link]
