August 03, 2013

Running optimizer...

**Issue**: Likelihood variance is collapsing to zero.

**Guess**: The training ML differs from the naive ML and isn't penalizing the noise correctly.

**Test 1**: save the params, then compare the training ML and the naive ML.
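A minimal sketch of this check (`training_ml` and `naive_ml` are hypothetical names standing in for the two routines being compared, not the actual function names):

```
% Hypothetical sketch of Test 1; training_ml and naive_ml are
% placeholders for the actual marginal-likelihood routines.
save('params_snapshot.mat', 'train_params');
fprintf('training ML: %.6f\n', training_ml(train_params, data_));
fprintf('naive ML:    %.6f\n', naive_ml(train_params, data_));
```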

Same result.

The marginal likelihood is monotonically increasing as the noise variance \(\sigma_n^2\) approaches zero. This shouldn't be happening: as the likelihood collapses to a delta, the marginal likelihood should become equal to the prior evaluated at the (virtual) observed locations.

\begin{align}
\lim_{\sigma_n \to 0} p(D) &= \lim_{\sigma_n \to 0} \int p_0(x) \, p_l(D \mid x) \, dx & \text{(Marginalization)}\\
&= \lim_{\sigma_n \to 0} \int p_0(x) \, f(D - x) \, dx & \text{(Convolution; } f \text{ is the Gaussian noise density)} \\
&= \int p_0(x) \, \delta(D - x) \, dx & \text{(} f \to \delta \text{ as } \sigma_n \to 0 \text{)} \\
&= p_0(D)
\end{align}
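As a sanity check on this limit, here's a toy 1-D numerical sketch (the \(N(0, 1)\) prior and closed-form marginal are assumptions of the toy setup, not the actual model): the marginal likelihood should converge to the prior density rather than grow without bound as \(\sigma_n^2 \to 0\).

```
% Toy 1-D check of the delta limit: prior p0 = N(0, 1), Gaussian noise
% with variance sn2, so the marginal is p(D) = N(D; 0, 1 + sn2).
% Illustrative only; not the actual model.
gauss = @(x, v) exp(-x.^2 ./ (2*v)) ./ sqrt(2*pi*v);
D = 0.5;
for sn2 = [1 1e-2 1e-4 1e-6]
    fprintf('sn2 = %g: p(D) = %.6f\n', sn2, gauss(D, 1 + sn2));
end
fprintf('prior p0(D) = %.6f\n', gauss(D, 1));  % the limiting value
```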

This implies there's a bug in the code causing this phenomenon; the mathematical model is not the cause.

Found the bug. When computing the likelihood precision \(S\) from the unscaled precision \(S_0\), the noise variance \(\sigma_n^2\) was multiplied instead of divided:

```
S = S0 * sigma_n;  % incorrect: multiplies by the noise variance
S = S0 / sigma_n;  % corrected: divides by the noise variance
```

Had some trouble with noise sigmas being too low. Solved in two ways:

- Attempts 1-3: clamp to a minimum value, then add a penalty proportional to the amount that was clamped (see the sketch after this list).
- Attempt 4: offset by the minimum value before transforming.
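Roughly, the clamp-and-penalty approach looks like this sketch (the function and parameter names are illustrative, not the actual implementation):

```
% Illustrative sketch of the clamp-and-penalty approach (Attempts 1-3).
% clamp_with_penalty, v_min, and penalty_scale are hypothetical names.
function [v_clamped, penalty] = clamp_with_penalty(v, v_min, penalty_scale)
    excess = max(v_min - v, 0);        % amount that had to be clamped
    v_clamped = max(v, v_min);         % enforce the minimum variance
    penalty = penalty_scale * excess;  % added to the cost being minimized
end
```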

Command:

```
train_params_done = tr_train(test_Corrs_ll, train_params, data_, 400);
```

**Attempt #1**

Handle extreme values by incurring a penalty for variances smaller than 1e-5. Result stored in `train_params_done_1`:

```
train_params_done_1 =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 1
perturb_rate_variance: 1
perturb_position_variance: 1
perturb_scale: 1
```

Surprisingly small smoothing variance. Surprisingly low rate variance.

**Attempt #2**

Testing sensitivity to the magnitude of the penalty term. Scaled up the penalty by 1000. Results in `train_params_done_2`:

```
train_params_done_2 =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 1
perturb_rate_variance: 1
perturb_position_variance: 1
perturb_scale: 1
```

Summary: no change.

**Attempt #3**

Testing sensitivity to the threshold where the penalty term starts to be incurred. Set MIN_NOISE_VARIANCE to 1e-3 and MIN_SMOOTHING_VARIANCE to 1e-7 (previously both 1e-5). Results in `train_params_done_3`:

```
train_params_done_3 =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 1
perturb_rate_variance: 1
perturb_position_variance: 1
perturb_scale: 1
```

Summary: no change.

**Attempt #4**

Handled small variances by offsetting by the minimum value before transforming:

```
% converting param to state variable
x(1) = log(smoothing_variance - MIN_SMOOTHING_VARIANCE);
% converting state variable back to param
smoothing_variance = exp(x(1)) + MIN_SMOOTHING_VARIANCE;
```

This is more elegant than the penalty hack used in the previous three attempts: since \(\exp(x)\) is positive for any real \(x\), the recovered variance stays above the minimum without any penalty term. Results:

```
train_params_done =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 1
perturb_rate_variance: 1
perturb_position_variance: 1
perturb_scale: 1
```

Summary: no change.

Successfully trained model.

Required **115 function evaluations**, taking **112 s**.

Optimal marginal likelihood: **23714.937760**.

Optimal parameters:

```
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
```

Surprised at how small noise_variance was, considering the calibration noise. However, the maximum-likelihood reconstruction did look pretty good when unity-aspect-ratio axis scaling is used.

Getting weird results. Halted after the second iteration; it took 250 function evaluations, and the perturb_smoothing_variance gradient is zero. Results:

```
train_params_done =
smoothing_variance: 1.5003e-05
noise_variance: 1.0087e-04
position_variance: 7.9393e+04
rate_variance: 0.7879
perturb_smoothing_variance: 0.2500
perturb_rate_variance: 1.5183
perturb_position_variance: 2.5159e+04
perturb_scale: 48.6163
```

Recall that the new model's MLs aren't deeply tested; there's probably a bug in there (in the kernel implementation? in the kernel theory? in the conversion from mats to kernel?). Will continue tomorrow.
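One way to localize it would be a finite-difference check of the ML gradient. This is a sketch, assuming a hypothetical `tr_ml(params, data_)` that evaluates the new model's marginal likelihood for a parameter struct; if the analytic gradient w.r.t. `perturb_smoothing_variance` really is zero, the central difference should agree.

```
% Hypothetical finite-difference gradient check; tr_ml is a placeholder
% for whatever evaluates the new model's marginal likelihood.
h = 1e-6;
p_plus = train_params_done;
p_minus = train_params_done;
p_plus.perturb_smoothing_variance = p_plus.perturb_smoothing_variance + h;
p_minus.perturb_smoothing_variance = p_minus.perturb_smoothing_variance - h;
fd_grad = (tr_ml(p_plus, data_) - tr_ml(p_minus, data_)) / (2*h);
fprintf('central-difference gradient: %g\n', fd_grad);
```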

Posted by Kyle Simek