Troubleshooting training of the OU perturbation model (train/tr_curves_ml.m).
Found a bug in tr_curves_ml.m::ou_kernel -- smoothing_variance was used where perturb_smoothing_variance should have been (a copy-paste bug). Also fixed the same bug in sqexp_kernel.
The optimizer appears to run much more smoothly now, probably because smoothing_variance previously appeared in two different parts of the equation, creating a strong correlation between parameters. The resulting likelihood ridge could have caused slow progress.
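For the record, a minimal sketch of the shape of the fix, assuming standard OU and squared-exponential covariances; the argument names and parameterization are illustrative, and the real functions in tr_curves_ml.m may differ:

function K = ou_kernel(t1, t2, variance, perturb_smoothing_variance)
    % Ornstein-Uhlenbeck covariance: variance * exp(-|dt| / length_scale).
    dt = abs(t1(:) - t2(:)');
    K = variance .* exp(-dt ./ sqrt(perturb_smoothing_variance));  % was: smoothing_variance
end

function K = sqexp_kernel(t1, t2, variance, perturb_smoothing_variance)
    % Squared-exponential covariance: variance * exp(-dt^2 / (2 * ls^2)).
    dt = t1(:) - t2(:)';
    K = variance .* exp(-dt.^2 ./ (2 * perturb_smoothing_variance));  % was: smoothing_variance
end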
...
Attempt #1
Initial params:
train_params =
smoothing_variance: 0.2500
noise_variance: 100
position_variance: 90000
rate_variance: 2.2500
perturb_smoothing_variance: 0.2500
perturb_rate_variance: 2.2500
perturb_position_variance: 90000
perturb_scale: 1
Stopped after 3 iterations. Results:
train_params_done_1 =
smoothing_variance: 0.4501
noise_variance: 2.5982e-04
position_variance: 6.5359e+04
rate_variance: 3.9645
perturb_smoothing_variance: 3.3437e-06
perturb_rate_variance: 0.7268
perturb_position_variance: 3.0785e+04
perturb_scale: 0.5867
Final ML: 28423.438936
The final ML (2.84e4) is greater than the no-perturb model's (2.37e4). This is expected, since this model has more parameters, so we can get a tighter fit (maybe overfitting).
I like that noise variance is much smaller (close to the minimum--do I need to lower the floor?). This is expected, because the only source of IID noise is the rasterization process.
Position standard deviation is much higher than before (255 vs. 126), as is rate standard deviation (2.0 vs. 0.47).
Perturb variances are harder to explain.
Perturb smoothing variance is low, which is nice to see, because the worst of the perturbations arise from miscalibrations (rate and position perturbations), not deformations. But it's still far smaller than I expected. Honestly, I was expecting it to be almost the same as smoothing_variance, but I realize now it makes sense for it to be smaller.
Perturb position variance is much, much larger than I expected. Standard deviation is 175.4, which basically says each plant has 17 cm of IID perturbations from the unobserved mean plant. I have no good explanation for this, except maybe that the initial value was ridiculously large.
Perturb scale seems reasonable.
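Quick sanity check of the standard deviations quoted above (square roots of the fitted variances; the cm reading assumes positions are measured in millimeters):

sqrt(6.5359e+04)  % position std         -> ~255.7
sqrt(3.9645)      % rate std             -> ~1.99
sqrt(3.0785e+04)  % perturb position std -> ~175.5, i.e. ~17 cm if units are mm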
Attempt #2
Test sensitivity to initialization. Initial values:
train_params_2 =
smoothing_variance: 0.0025
noise_variance: 0.1231
position_variance: 1.6676e+04
rate_variance: 0.2212
perturb_smoothing_variance: 0.0500
perturb_rate_variance: 0.1000
perturb_position_variance: 1
perturb_scale: 2
The first four were initialized from the trained no-perturb model.
Note the biggest change: perturb_position_variance went from 90000 to 1.
Terminated after 46 iterations; much better than the 3 iterations of the last attempt.
train_params_done_2 =
smoothing_variance: 0.0019
noise_variance: 0.0721
position_variance: 1.6267e+04
rate_variance: 0.2050
perturb_smoothing_variance: 6.7471e-10
perturb_rate_variance: 1.3201e-04
perturb_position_variance: 403.7082
perturb_scale: 2.4888e+03
Final ML: 25163.231449
Weird that this is lower than the poorly-initialized model's (2.52e4 vs. 2.84e4). Consider initializing using a combination of Attempt #1's and Attempt #2's initial values.
Smoothing variance is much smaller than before, noise variance is larger.
Position variance is in the same order of magnitude, but about a quarter of Attempt #1's value.
Rate variance is an OoM smaller.
Perturb smoothing variance is still near-zero.
Perturb rate variance is 3 OoM smaller.
Perturb position variance is 2 OoM smaller.
Perturb scale is extremely high. At that scale the perturbation kernel is basically flat, so the regular and perturb variances just sum.
Conclusions: The result is very sensitive to initialization.
Discussion: When perturb scale is this high, there is a ridge in the likelihood, because the perturb variances play the same role as the normal variances. Note that when perturb scale goes to infinity, this model degenerates to the no-perturb model. Notice also that the final ML isn't much better than the no-perturb model's. Possibly once perturb scale exploded, the model could no longer improve, so constraining perturb scale to a bounded range may be a good idea (sigmoid transform; see the sketch below).
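A minimal sketch of that idea, assuming the optimizer works on an unconstrained parameter vector; the bounds lo and hi are illustrative, not values from the real code:

% Map an unconstrained theta through a scaled sigmoid so perturb_scale
% stays in (lo, hi) during optimization.
lo = 0.1; hi = 10;
to_scale   = @(theta) lo + (hi - lo) ./ (1 + exp(-theta));
from_scale = @(s) -log((hi - lo) ./ (s - lo) - 1);
theta0 = from_scale(1);           % start the optimizer at perturb_scale = 1
perturb_scale = to_scale(theta0); % recovers 1 inside the likelihood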