Troubleshooting training of the OU perturbation model (train/tr_curves_ml.m).
Found a bug in tr_curves_ml.m::ou_kernel -- smoothing_variance was used where perturb_smoothing_variance should have been (a copy-paste bug). Also fixed the same bug in sqexp_kernel.
The optimizer appears to run much more smoothly now, probably because smoothing_variance previously appeared in two different parts of the equation, creating a strong correlation between parameters. The resulting likelihood ridge could have caused slow progress.
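For the record, a minimal sketch of the shape of the fix, assuming standard OU and squared-exponential covariances; the argument names and parameterization are illustrative, and the real functions in tr_curves_ml.m may differ:

function K = ou_kernel(t1, t2, variance, perturb_smoothing_variance)
    % Ornstein-Uhlenbeck covariance: variance * exp(-|dt| / length_scale).
    dt = abs(t1(:) - t2(:)');
    K = variance .* exp(-dt ./ sqrt(perturb_smoothing_variance));  % was: smoothing_variance
end

function K = sqexp_kernel(t1, t2, variance, perturb_smoothing_variance)
    % Squared-exponential covariance: variance * exp(-dt^2 / (2 * ls^2)).
    dt = t1(:) - t2(:)';
    K = variance .* exp(-dt.^2 ./ (2 * perturb_smoothing_variance));  % was: smoothing_variance
end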
...
Attempt #1
Initial params:
train_params =
smoothing_variance: 0.2500
noise_variance: 100
position_variance: 90000
rate_variance: 2.2500
perturb_smoothing_variance: 0.2500
perturb_rate_variance: 2.2500
perturb_position_variance: 90000
perturb_scale: 1
Stopped after 3 iterations. Results:
train_params_done_1 =
smoothing_variance: 0.4501
noise_variance: 2.5982e-04
position_variance: 6.5359e+04
rate_variance: 3.9645
perturb_smoothing_variance: 3.3437e-06
perturb_rate_variance: 0.7268
perturb_position_variance: 3.0785e+04
perturb_scale: 0.5867
Final ML: 28423.438936
The final ML (2.84e4) is greater than the no-perturb model's (2.37e4). This is expected, since this model has more parameters, so we can get a tighter fit (maybe overfitting).
I like that noise variance is much smaller (close to the minimum--do I need to lower the floor?). This is expected, because the only source of IID noise is the rasterization process.
Position standard deviation is much higher than before (255 vs. 126), as is rate standard deviation (2.0 vs. 0.47).
Perturb variances are harder to explain.
Perturb smoothing variance is low, which is nice to see, because the worst of the perturbations arise from miscalibrations (rate and position perturbations), not deformations. But it's still far smaller than I expected. Honestly, I was expecting it to be almost the same as smoothing_variance, but I realize now it makes sense for it to be smaller.
Perturb position variance is much, much larger than I expected. Standard deviation is 175.4, which basically says each plant has 17 cm of IID perturbations from the unobserved mean plant. I have no good explanation for this, except maybe that the initial value was ridiculously large.
Perturb scale seems reasonable.
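Quick sanity check of the standard deviations quoted above (square roots of the fitted variances; the cm reading assumes positions are measured in millimeters):

sqrt(6.5359e+04)  % position std         -> ~255.7
sqrt(3.9645)      % rate std             -> ~1.99
sqrt(3.0785e+04)  % perturb position std -> ~175.5, i.e. ~17 cm if units are mm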
Attempt #2
Test sensitivity to initialization. Initial values:
train_params_2 =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 0.0500
perturb_rate_variance: 0.1000
perturb_position_variance: 1
perturb_scale: 2
The first four were initialized from the trained no-perturb model.
Note the biggest change: perturb_position_variance went from 90000 to 1.
Terminated after 46 iterations; much better than the 3 iterations of the last attempt.
train_params_done_2 =
smoothing_variance: 0.0019
noise_variance: 0.0721
position_variance: 1.6267e+04
rate_variance: 0.2050
perturb_smoothing_variance: 6.7471e-10
perturb_rate_variance: 1.3201e-04
perturb_position_variance: 403.7082
perturb_scale: 2.4888e+03
Final ML: 25163.231449
Weird that this is lower than the poorly-initialized model's (2.52e4 vs. 2.84e4). Consider initializing using a combination of Attempt #1's and Attempt #2's initial values.
Smoothing variance is much smaller than before, noise variance is larger.
Position variance is in the same order of magnitude, but about a quarter of Attempt #1's value.
Rate variance is an OoM smaller.
Perturb smoothing variance is still near-zero.
Perturb rate variance is 3 OoM smaller.
Perturb position variance is 2 OoM smaller.
Perturb scale is extremely high. At that scale the perturbation kernel is basically flat, so the regular and perturb variances just sum.
Conclusions: The result is very sensitive to initialization.
Discussion: When perturb scale is this high, there is a ridge in the likelihood, because the perturb variances play the same role as the normal variances. Note that when perturb scale goes to infinity, this model degenerates to the no-perturb model. Notice also that the final ML isn't much better than the no-perturb model's. Possibly once perturb scale exploded, the model could no longer improve, so constraining perturb scale to a bounded range may be a good idea (sigmoid transform; see the sketch below).
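A minimal sketch of that idea, assuming the optimizer works on an unconstrained parameter vector; the bounds lo and hi are illustrative, not values from the real code:

% Map an unconstrained theta through a scaled sigmoid so perturb_scale
% stays in (lo, hi) during optimization.
lo = 0.1; hi = 10;
to_scale   = @(theta) lo + (hi - lo) ./ (1 + exp(-theta));
from_scale = @(s) -log((hi - lo) ./ (s - lo) - 1);
theta0 = from_scale(1);           % start the optimizer at perturb_scale = 1
perturb_scale = to_scale(theta0); % recovers 1 inside the likelihood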