Bagging and other ensemble methods



  • Bagging and other ensemble methods

    Bagging is a technique for reducing generalization error by combining several models.

    Ensemble

    Suppose we have $k$ models, and model $i$ makes an error $\epsilon_{i}$ on each example. Assume the errors have zero mean, variance $\mathbb E[\epsilon_{i}^{2}] = v$, and covariance $\mathbb E[\epsilon_{i}\epsilon_{j}] = c$. The expected squared error of the ensemble (the average of the members' predictions) is:

    $$\mathbb E\left[\left(\frac{1}{k}\sum_{i}\epsilon_{i}\right)^{2}\right] = \frac{1}{k^{2}}\,\mathbb E\left[\sum_{i}\left(\epsilon_{i}^{2} + \sum_{j \neq i} \epsilon_{i}\epsilon_{j}\right)\right] = \frac{1}{k}v + \frac{k-1}{k}c$$

    So when $c = v$, the ensemble clearly does not help at all. However, when $c = 0$, the expected squared error of the ensemble is only $\frac{1}{k}v$, which is much smaller than $v$. On average, the ensemble performs at least as well as any of its members, and if the members make independent errors, the ensemble performs significantly better than its members.
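    To see the effect numerically, here is a minimal NumPy sketch (the particular values of $k$, $v$, and $c$ are arbitrary assumptions for illustration): it draws correlated zero-mean errors with the stated variance and covariance and compares the empirical ensemble error against $\frac{1}{k}v + \frac{k-1}{k}c$.

```python
import numpy as np

# Illustrative check of the ensemble-error formula (values of k, v, c are assumptions).
rng = np.random.default_rng(0)
k, v, c = 10, 1.0, 0.3
n_samples = 200_000

# Covariance matrix with v on the diagonal and c off the diagonal.
cov = np.full((k, k), c)
np.fill_diagonal(cov, v)

# Each row holds the k models' errors on one example; the ensemble error is their mean.
errors = rng.multivariate_normal(np.zeros(k), cov, size=n_samples)
ensemble_error = errors.mean(axis=1)

print("empirical :", np.mean(ensemble_error ** 2))
print("predicted :", v / k + (k - 1) / k * c)
```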

    Bagging

    Bagging is a method that allows the same kind of model, training algorithm and objective function to be reused several times.

    First, construct $k$ different datasets. Each dataset has the same number of examples as the original dataset, and each is constructed by sampling with replacement from the original dataset. Model $i$ is then trained on dataset $i$.
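    A minimal sketch of this procedure is below (the base model, toy data, and function names are assumptions for illustration, not part of the original notes): each member is fit on a bootstrap sample of the same size as the original dataset, and predictions are averaged.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, k=10, seed=0):
    """Train k members, each on a bootstrap sample (with replacement) of size len(X)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)          # sample n indices with replacement
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Average the members' predictions (regression case).
    return np.mean([m.predict(X) for m in models], axis=0)

# Toy usage:
X = np.random.rand(200, 3)
y = X.sum(axis=1) + 0.1 * np.random.randn(200)
models = bagging_fit(X, y)
print(bagging_predict(models, X[:5]))
```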

    Some tips to train $n$ models

    The goal is to have the members mainly make partially independent errors. Differences in the following are usually enough (see the sketch after this list):

    • random initialization
    • random selection of mini-batches
    • differences in hyperparameters
    • different outcomes of non-deterministic implementations of neural networks
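    The sketch below illustrates the first two points (the framework, architecture, optimizer, and toy data are assumptions for illustration): each member differs only in its random initialization and mini-batch ordering, and the ensemble output is the average of the members' predictions.

```python
import torch
import torch.nn as nn

def train_member(X, y, seed, epochs=50, batch_size=32):
    torch.manual_seed(seed)                          # different random initialization per member
    model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        perm = torch.randperm(len(X))                # different mini-batch ordering per member
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            loss = nn.functional.mse_loss(model(X[idx]), y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Toy data and a 5-member ensemble.
X = torch.rand(256, 3)
y = X.sum(dim=1, keepdim=True)
members = [train_member(X, y, seed=s) for s in range(5)]

# Average the members' predictions to form the ensemble output.
with torch.no_grad():
    ensemble_pred = torch.stack([m(X) for m in members]).mean(dim=0)
print(ensemble_pred.shape)
```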

 

