CS231n Assignment 1 Implementation Notes




    The LaTeX rendering here does not handle matrices well and may look garbled; the linked version is easier to read 😇 git blog link

    1.kNN

    The distance to compute (no loops needed, for maximum speed):

    $$d_2(I_1,I_2)=\sqrt{ \sum_p(I^p_1-I^p_2)^2}$$

    • Derivation:

    $$\mathbf{A}=\left[ \begin{matrix} a^1_1 & a^2_1 & a^3_1 \\ a^1_2 & a^2_2 & a^3_2 \end{matrix} \right] \qquad \mathbf{B}=\left[ \begin{matrix} b^1_1 & b^2_1 & b^3_1 \\ b^1_2 & b^2_2 & b^3_2 \\ b^1_3 & b^2_3 & b^3_3 \end{matrix} \right]$$

    $$\mathbf{AB^T}=\left[ \begin{matrix} \sum_{k=1}^{3}{a^k_1b^k_1} & \sum_{k=1}^{3}{a^k_1b^k_2} & \sum_{k=1}^{3}{a^k_1b^k_3} \\ \sum_{k=1}^{3}{a^k_2b^k_1} & \sum_{k=1}^{3}{a^k_2b^k_2} & \sum_{k=1}^{3}{a^k_2b^k_3} \end{matrix} \right]$$

    $$\mathbf{A_{sq}}=\left[ \begin{matrix} \sum_{k=1}^{3}{(a^k_1)^2} & \sum_{k=1}^{3}{(a^k_1)^2} & \sum_{k=1}^{3}{(a^k_1)^2} \\ \sum_{k=1}^{3}{(a^k_2)^2} & \sum_{k=1}^{3}{(a^k_2)^2} & \sum_{k=1}^{3}{(a^k_2)^2} \end{matrix} \right]$$

    $$\mathbf{B_{sq}}=\left[ \begin{matrix} \sum_{k=1}^{3}{(b^k_1)^2} & \sum_{k=1}^{3}{(b^k_2)^2} & \sum_{k=1}^{3}{(b^k_3)^2} \\ \sum_{k=1}^{3}{(b^k_1)^2} & \sum_{k=1}^{3}{(b^k_2)^2} & \sum_{k=1}^{3}{(b^k_3)^2} \end{matrix} \right]$$

    From this, the matrix of pairwise Euclidean distances between the row vectors of A and B is $$\sqrt{\mathbf{A_{sq}+B_{sq}-2AB^T}}$$ (square root taken element-wise).

    Code implementation

    # two loops: one test/train pair at a time
    dists[i][j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
    
    # one loop: one test row against the whole training set
    dists[i] = np.sqrt(np.sum(np.square(self.X_train - X[i]), axis = 1))
    
    # no loops: the whole distance matrix at once via the expansion above; extremely fast
    dists = np.sqrt(-2*np.dot(X, self.X_train.T) + np.sum(np.square(self.X_train), axis = 1) + np.transpose([np.sum(np.square(X), axis = 1)]))
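
    As a quick standalone check (the array shapes below are toy values), the naive loop version and the fully vectorized version should produce the same distance matrix:

    import numpy as np
    
    X_train = np.random.randn(50, 3072)   # toy "training set"
    X = np.random.randn(10, 3072)         # toy "test set"
    
    # naive double loop, one distance at a time
    dists_loop = np.array([[np.sqrt(np.sum(np.square(x - xt))) for xt in X_train] for x in X])
    
    # fully vectorized version from above
    dists_vec = np.sqrt(-2*np.dot(X, X_train.T) + np.sum(np.square(X_train), axis = 1) + np.transpose([np.sum(np.square(X), axis = 1)]))
    
    print(np.allclose(dists_loop, dists_vec))   # expect True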
    

    Some observations about kNN

    1. The decision boundary of the k-NN classifier is not linear. (non-linear)
    2. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the test phase, a test point is classified by taking the most frequent label among the k training samples closest to it, so almost all of the computation happens at prediction time. (testing is slow; see the sketch below)
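
    A minimal sketch of that test-phase prediction, assuming dists is the (num_test, num_train) distance matrix computed above and y_train holds integer class labels (the function name and signature here are illustrative, not the assignment's exact code):

    import numpy as np
    
    def predict_labels(dists, y_train, k=1):
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test, dtype=y_train.dtype)
        for i in range(num_test):
            # labels of the k training samples closest to test point i
            closest_y = y_train[np.argsort(dists[i])[:k]]
            # majority vote; ties go to the smaller label
            y_pred[i] = np.argmax(np.bincount(closest_y))
        return y_pred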

    2.SVM

    loss function

    • $$L_i=\sum_{j\neq y_i}\max(0,s_j-s_{y_i}+\Delta)$$ (where $$s_j$$ is the score assigned to class $$j$$)
      • Sanity check: with W set to the all-zero matrix, the loss equals the number of classes minus 1 (for $$\Delta=1$$); a toy numeric check follows below.
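
    A toy numeric illustration of that sanity check (the shapes and random data below are made up):

    import numpy as np
    
    num_classes, dim, num_train = 10, 3073, 5
    W = np.zeros((dim, num_classes))                 # all-zero weights
    X = np.random.randn(num_train, dim)
    y = np.random.randint(num_classes, size=num_train)
    
    scores = X.dot(W)                                # every score is 0
    margins = np.maximum(0, scores - scores[np.arange(num_train), y][:, None] + 1)
    margins[np.arange(num_train), y] = 0             # the j == y_i term is excluded
    print(margins.sum() / num_train)                 # prints 9.0 == num_classes - 1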

    Regularization

    • $$L=\frac{1}{N}\sum_i\sum_{j\neq y_i}\left[\max(0,f(x_i;W)_j-f(x_i;W)_{y_i}+\Delta)\right]+\lambda \sum_k \sum_l W^2_{k,l}$$
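
    In the code further below, the regularization term is written with an extra 0.5 factor (so that 0.5 * reg plays the role of λ), which makes its gradient contribution simply reg * W:

    loss += 0.5 * reg * np.sum(W * W)   # L2 regularization term
    dW += reg * W                       # gradient of the regularization term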

    Gradient computation

    • $$\nabla_{w_{y_i}}L_i=-\left(\sum_{j\neq y_i}\mathbb{1}(w_j^Tx_i-w^T_{y_i}x_i+\Delta>0)\right)x_i$$ (the indicator $$\mathbb{1}(\cdot)$$ is 1 if the condition is true and 0 otherwise)

      margin = (margin > 0) * 1                          # indicator: 1 where a margin is violated
      row_sum = np.sum(margin, axis=1)                   # number of violated margins per sample
      margin[np.arange(num_train), y] = -row_sum         # the correct class gets minus that count
      dW = X.T.dot(margin)/num_train + reg*W             # data gradient plus regularization gradient
      

    SGD

    • stochastic gradient descent: instead of the full training set, each update estimates the gradient on a small random minibatch, which makes every step much cheaper.
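
    A minimal training-loop sketch of SGD, assuming the svm_loss_vectorized function defined in the next block; the hyperparameter values here are placeholders:

    import numpy as np
    
    def train_sgd(X, y, num_classes, learning_rate=1e-7, reg=2.5e4,
                  num_iters=1500, batch_size=200):
        num_train, dim = X.shape
        W = 0.001 * np.random.randn(dim, num_classes)    # small random initialization
        for it in range(num_iters):
            # sample a random minibatch (sampling with replacement keeps it simple)
            idx = np.random.choice(num_train, batch_size)
            loss, grad = svm_loss_vectorized(W, X[idx], y[idx], reg)
            W -= learning_rate * grad                    # vanilla SGD step
        return W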

    Complete implementation

    def svm_loss_vectorized(W, X, y, reg):
        loss = 0.0
        dW = np.zeros(W.shape) # initialize the gradient as zero
        scores = X.dot(W)        
        num_classes = W.shape[1]
        num_train = X.shape[0] 
    
        scores_correct = scores[np.arange(num_train), y]   # pick out each sample's score for its correct class
        scores_correct = np.reshape(scores_correct, (num_train, -1))  # N by 1
        margins = scores - scores_correct + 1    # N by C
        margins = np.maximum(0,margins)
        margins[np.arange(num_train), y] = 0
        loss += np.sum(margins) / num_train
        loss += 0.5 * reg * np.sum(W * W)
        
        margins[margins > 0] = 1
        row_sum = np.sum(margins, axis=1)                  # shape (N,)
        margins[np.arange(num_train), y] = -row_sum        
        dW += np.dot(X.T, margins)/num_train + reg * W     # D by C
    
        return loss, dW
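
    To verify the analytic gradient, it can be compared against a centered-difference numerical gradient. A simple standalone sketch (the function name and the idea of sampling a few random entries are illustrative):

    import numpy as np
    
    def grad_check(W, X, y, reg, h=1e-5, num_checks=5):
        _, grad = svm_loss_vectorized(W, X, y, reg)
        for _ in range(num_checks):
            ix = tuple(np.random.randint(d) for d in W.shape)   # a random weight entry
            old = W[ix]
            W[ix] = old + h
            loss_plus, _ = svm_loss_vectorized(W, X, y, reg)
            W[ix] = old - h
            loss_minus, _ = svm_loss_vectorized(W, X, y, reg)
            W[ix] = old                                         # restore
            numeric = (loss_plus - loss_minus) / (2 * h)
            print('numerical: %f  analytic: %f' % (numeric, grad[ix]))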
    

    Understanding W

    • The visualized SVM weights look like an average template (outline) of the corresponding objects, which is what they are expected to respond to. Because the score is the inner product between the sample and the corresponding weight vector, getting a higher score for the correct label requires that weight vector to be more parallel to the sample. (The weights essentially sketch the outline of each class; see the visualization sketch below.)
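
    A sketch of how such a visualization can be produced, assuming CIFAR-10 inputs flattened to 32x32x3 with a bias dimension appended as the last row of W (as in the assignment setup); the plotting details are illustrative:

    import numpy as np
    import matplotlib.pyplot as plt
    
    def show_weights(W, class_names):
        w = W[:-1, :].reshape(32, 32, 3, -1)          # drop the bias row: one 32x32x3 image per class
        w_min, w_max = w.min(), w.max()
        for i, name in enumerate(class_names):
            plt.subplot(2, 5, i + 1)
            img = 255.0 * (w[:, :, :, i] - w_min) / (w_max - w_min)   # rescale to 0..255 for display
            plt.imshow(img.astype('uint8'))
            plt.axis('off')
            plt.title(name)
        plt.show()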

    3.Softmax

    loss function

    • $$L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}$$
    shift_scores = scores - np.max(scores, axis = 1).reshape(-1,1)   # subtract each row's max for numerical stability
    softmax_output = np.exp(shift_scores)/np.sum(np.exp(shift_scores), axis = 1).reshape(-1,1)
    loss = -np.sum(np.log(softmax_output[range(N), list(y)]))        # cross-entropy of the correct classes
    loss /= N
    loss +=  0.5* reg * (np.sum(W1 * W1) + np.sum(W2 * W2))          # L2 regularization (two-layer-net version, hence W1 and W2)
    

    Gradient computation

    • When $$j \neq y_i$$: $$\frac {\partial{L_i}} {\partial{W_j}} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } \frac{\partial f_j}{\partial W_j} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } X_i^T$$

      When $$j = y_i$$: $$\frac {\partial{L_i}} {\partial{W_{y_i}}} = \left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} } - 1\right) X_i^T$$

    dscores = softmax_output.copy()
    dscores[range(N), list(y)] -= 1  # the extra -1 corresponds to the j == y_i case
    dscores /= N
    grads['W2'] = h_output.T.dot(dscores) + reg * W2
    

    Then apply the chain rule to backpropagate level by level:

    grads['b2'] = np.sum(dscores, axis = 0)
    
    dh = dscores.dot(W2.T)           # backprop into the hidden layer
    dh_ReLu = (h_output > 0) * dh    # ReLU gate: gradient flows only where the activation was positive
    grads['W1'] = X.T.dot(dh_ReLu) + reg * W1
    grads['b1'] = np.sum(dh_ReLu, axis = 0)
    

    Practical issue: numerical stability

    • Subtract the maximum score before normalizing:
    f = np.array([123, 456, 789]) # example with 3 classes and each having large scores
    p = np.exp(f) / np.sum(np.exp(f)) # Bad: Numeric problem, potential blowup
    
    # instead: first shift the values of f so that the highest number is 0:
    f -= np.max(f) # f becomes [-666, -333, 0]
    p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
    

    Comparison with SVM

    • The Softmax classifier is never fully satisfied with the scores it produces: the correct class could always have a higher probability and the incorrect classes lower ones, so the loss can always decrease further. The SVM, by contrast, is happy once the margins are satisfied and does not micromanage the exact scores beyond that constraint. A toy comparison follows below.
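
    A toy example of that difference, with made-up scores where every SVM margin is already satisfied:

    import numpy as np
    
    scores = np.array([10.0, -2.0, 3.0])   # hypothetical scores; class 0 is the correct one
    y = 0
    
    # SVM hinge loss with Delta = 1: all margins are met, so the loss is exactly 0
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0
    svm_loss = margins.sum()
    
    # Softmax cross-entropy loss: still positive, and keeps shrinking as the score gap grows
    p = np.exp(scores - scores.max()) / np.sum(np.exp(scores - scores.max()))
    softmax_loss = -np.log(p[y])
    
    print(svm_loss, softmax_loss)   # 0.0 vs a small positive number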

 
