October 28, 2013

Re-ran with method 2. Still getting negative eigenvalues.

Possibly non-symmetry issue.

Change gears -- work on fast implementation, then resume debugging.

Goal: approximate \(\left(K + \Sigma_D \right)^{-1}\), using a low-rank approximation of K; where \(\Sigma_D\) is the likelihood covariance.

In our case \(\Sigma_D\) has infinite covariance, so this inverse doesn't exist. However, if we work with precision matrix \(\Lambda = \Sigma_D^{-1}\), and use the decomposition \( S' S = \Lambda \) we can replace this expression with the equivalent \[ S' \left( S K S' + I \right)^{-1} S \] Even though S is rank-deficient, the inverse in this expression does exist, thanks to the finite positive values being added to the diagonal.

We approximate the above expression as follows. Let \(K\) be an \(N\) by \(N\) matrix. We can take the eigendecomposition \(K = V D V'\) and approximate it with \(\tilde K = \tilde V \tilde D \tilde V'\), where \(\tilde V\) and \(\tilde D\) consist of the first \(n\) eigenvalues and eigenvectors of \(K\) respectively. We can use this low-rank approximation with the [Woodbury matrix identity](http://en.wikipedia.org/wiki/Woodbury_matrix_identity) to approximate the above inverse in \(O(n^3)\) time, rather than \(O(N^3)\). The Woodbury identity is \[ (A + U C V )^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1} \] Setting \(A = I\), \(U = V' = S \tilde V \) and \(C = \tilde D \), we get: \[ \left( S K S' + I \right)^{-1} = I - S \tilde V (\tilde D^{-1} + \tilde V' S' S \tilde V)^{-1} \tilde V' S' \]

It remains to find \(\tilde V\) and \(\tilde D\) efficiently. Naive eigenvalue decomposition takes \(O(N^3)\) time, which isn't any better than direct inversion. Sections 4.3.2 and 8.1 of Williams and Rasmussen show how to approximate \(n\) eigenfunctions and eigenvalues from \(n\) data points in \(O(n^3)\) time. Eigenvectors \(\tilde V\) arise by evaluating the approximate eigenfunctions at the appropriate indices.

Substitutuing back into the original expression by surrounding with (\(S'\) and \(S\)), we get the final expression: \[ \begin{align} \approx & S' \left ( I - S \tilde V (\tilde D^{-1} + \tilde V' S' S \tilde V)^{-1} \tilde V' S' \right) S \\ =&\Lambda - \Lambda \tilde V (\tilde D^{-1} + \tilde V' \Lambda \tilde V)^{-1} \tilde V' \Lambda \end{align} \]

Implemented two different implementations of "nystrom solve" (`tools/nystrom_solve.m`

). Crude testing shows big speedup, but stalling on a big eigenvalue decomposition. Replacing with a call to "chol", waiting on visual results.

...

"chol" gives much faster results. (no need to symmetrize) But results are junk -- blank screen.

Oops, bug in call to chol; fixed.

still getting black. Posterior covariance is not positive definite; eigenvalues are negative.

When I force covariance to zero (to view the mean value), it's still in the ballpark but wonky (same as before).

**Clue**: When re-using (thinned) input covariance matrix and indices, result looks sensible. Clearly, we have a problem with how K_star is computed (and likelihood K_star_star, too).

Comparing different K_star's...

Looks like on-diagonal elements differ, off-diagonals are okay.

...

Got it! Wacv was trained using no-perturb model, but is being tested with OU-perturb.

TODO: edit get_all_Wacv and/or get_wacv_result to receive a model-type... done.

reducing test to block (1,1)

possible sources of disparity: indices, function

checked indices -- same.

check params -- model-type doesnt match.

First view now looks good in matlab.

Running all views in likelihood_server.

...

Means look great!. Now on to random perturbed.

...

Cholesky claims posterior covariance isn't positive definite.

Possibly a bug in construct_attachment_covariance_3?

Re-running with K_star_star from original covariance matrix. Still gives "not positive definite" errors.

Resorting to eigenvalue decomposition method.

Results look good! perturbation from mode is *TINY*, at least in the ground truthed WACV dataset. If this is true for hyptoheses in the full system, we can probably get away with simply taking the mean and saving significant computation of the covariacne matrix.

nystrom factor: 50

nystrom factor: 20

nystrom factor: 10

Nystrom factor apparently has a huge affect on reconstruction quality. Possibly better choice of thinned index set (compared to uniform sampling) will improve sensitivity, but a brief attempt at this (grouping xyz indices ) had net negative effect.

Needs more thought, possibly should resort to subset of data (SD) or subset of regressors (SR) method, or exploit markov structure.

- Nystrom method is implemented and provides huge speedups without obvious in rendered results.
- Spent many hours debugging issues with covariance matrix. Turned out to not be bug in theory or implementation of the math, but from an unexpected hard-coded value in wacvin wacv results.

Nystrom thinning: 10 ll2 spacing: 3 Unclear

- what is best thinning amount for Nystrom method?
- what is best spacing for output index set
- are perturbations always virtually nil?

- refactor construct_attachment_covariance_* into a single implementation.
- optimize
`curve_tree_ml_2.m`