[Work Log] Implementing, testing Hessian

November 29, 2013

Spent most of the week implementing and testing the Hessian, now done.

The biggest issue was correcting the math error mentioned in the previous post. Developed a general equation for the second derivitive of the kernel matrix, K, and updated all formulas in the writeup to incorporate it.

Finished hessian test program, test/test_ml_derivs.h, which compares the analytical solution to one obtained by finite differences. The error that arises in the Hessian during finite differences was shockingly large and unstable -- lowering the delta below 1e-4 caused very large errors (> 1%) in finite differences. This was a significant source of confusion and frustration during testing. In order to track down the source of errors, performed numerical and analytical first and second derivatives of all intermediate values, and tracked growth of error through the equations. In the end, error appeared to grow slowly through each computation, with the largest error arising at the beginning, in K'.

It also took a long time to find a reasonable way to compare the analytical and numerical gradients. Absolute differences were deceiving, because small absolute diferences can actually be large in terms of percentages. On the other hand, when the true gradient is near zero, the percent error skyrocketed. Ultimately followed the lead of Rasmussen's checkgrad.m, which divides the determinant of the difference by the determinant of the sum. Was able to confirm results to an error of 1e-3.

Implemented Hessian in curve_ml_gradient_3.m, refactoring existing computation to share as much as possible with the hessian. New function runs 4x slower than the gradient alone in a 1500-dimensional problem, which isn't bad considering the numerical Hessian would take 15002 = 2.25 million times more running time!

Still need to test against a few random elements of the numerical Hessian.

Next step is to run fminunc with the hessian to optimize indices after calling corr_to_likelihood. If that fixes our problems, we should roll it into corr_to_likelihood and which could simplify the code significantly.

Posted by Kyle Simek
blog comments powered by Disqus