September 03, 2013

Bug: when attaching, we were getting incorrect results for the estimated location of the attachment. Our computation used a subset of data points on the parent curve near the branch point to estimate the distribution of the branch point's position. We were simply taking a submatrix of the full (decomposed) precision matrix, which wasn't correct, due to the way the full decomposed matrix was constructed.

When using a subset of the variables, we need to be able to extract the corresponding submatrix of the noise precision and the prior.

Let \(s^\top s = S\), and let \(P\) be a matrix that selects rows of \(S\). If you want to decompose the submatrix \(P S P^\top\), in general you cannot simply take the submatrix \(P s P^\top\). There are at least two special cases where this is valid (illustrated in the sketch after this list):

- \(s\) is a Cholesky decomposition of \(S\). But since \(S\) is degenerate in our case, we can't use Cholesky.
- \(S\) is block-diagonal, and \(P\) doesn't split up the blocks.
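
As a quick illustration of the second case (all matrices here are hypothetical stand-ins, not from our codebase): a selection that keeps the blocks intact yields a valid decomposition of the submatrix, while one that splits a block generally does not.

```
% Block-diagonal S = s'*s, built from two hypothetical blocks.
A = randn(2);  B = randn(3);
s = blkdiag(A, B);                      % block-diagonal decomposition of S
S = s' * s;                             % equals blkdiag(A'*A, B'*B)

% P selects the second block intact: P*s*P' is a valid decomposition.
P = [zeros(3,2), eye(3)];
err_ok  = norm((P*s*P')' * (P*s*P') - P*S*P')    % ~ 0

% Q splits the first block: Q*s*Q' is generally NOT a valid decomposition.
Q = [zeros(2,1), eye(2), zeros(2,2)];
err_bad = norm((Q*s*Q')' * (Q*s*Q') - Q*S*Q')    % nonzero in general
```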

I thought that condition 2 was being satisfied, because I foolishly assumed the eigenvector matrix produced by eigenvalue decomposition preserved the block structure. I had to modify `correspondence/flatten_sort_and_reverse.m` to perform the eigenvalue decomposition on each block individually. I had previously determined that operating on groups of five blocks at once is faster than operating on them individually, but we'll have to sacrifice this speed-up for now.
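
Here's a minimal sketch of the per-block approach (the block contents are hypothetical stand-ins): decomposing each block separately guarantees the resulting factor is block-diagonal, whereas `eig` on the full matrix sorts eigenvalues globally and can interleave columns from different blocks.

```
% Decompose each PSD block S_i = s_i' * s_i individually, then assemble.
blocks = {randn(4), randn(5)};            % hypothetical per-curve blocks
s_cell = cell(size(blocks));
for i = 1:numel(blocks)
    Si = blocks{i}' * blocks{i};          % PSD, possibly degenerate
    [V, D] = eig(Si);
    s_cell{i} = sqrt(max(D, 0)) * V';     % s_i such that s_i' * s_i = Si
end
s = blkdiag(s_cell{:});                   % block structure preserved by construction
```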

Okay, fixed the precision problem. After attaching stem 2 to stem 1, Corrs{2}.mu_b now looks reasonable.

Still getting really low MLs...

Bingo! Found a bug in `curve_tree_ml_2.m` that was introduced when we started allowing non-zero means when evaluating our conditional normal distribution. Recall the equation for the marginal likelihood (reproduced from this post):

\[
\begin{align}
p(x) &= \frac{Z_{l,3d}}{Z_l}\frac{1}{Z} \exp\left\{-\tfrac{1}{2}\, x^\top S(S + S K S)^{-1} S x \right\} \\
&= \frac{Z_{l,3d}}{Z_l}\frac{1}{Z} \exp\left\{-\tfrac{1}{2}\, x^\top s^\top(I + s K s^\top)^{-1} s x \right\} \\
&= \frac{Z_{l,3d}}{Z_l}\, \mathcal{N}\!\left(sx ;\, 0,\, I + s K s^\top \right)
\end{align}
\]

Aside: in the second line, we've substituted the decomposition \(s^\top s = S\), which we use in practice.

However, this equation assumes a zero mean. If we include a nonzero mean, the \(s\) matrix distributes to both \(x\) and \(\mu\):

\[
\begin{align}
p(x) &= \frac{Z_{l,3d}}{Z_l}\frac{1}{Z} \exp\left\{-\tfrac{1}{2}\,(x-\mu)^\top s^\top(I + s K s^\top)^{-1} s (x - \mu) \right\} \\
&= \frac{Z_{l,3d}}{Z_l}\, \mathcal{N}\!\left(sx ;\, s\mu,\, I + s K s^\top \right)
\end{align}
\]
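
Concretely, here's a minimal sketch of evaluating this corrected log-density (the variable names are hypothetical stand-ins, not the actual code in `curve_tree_ml_2.m`):

```
% Hypothetical stand-ins for the decomposed precision, data, mean, and kernel.
s  = randn(3, 5);  x = randn(5, 1);  mu = randn(5, 1);  K = eye(5);

% Evaluate log N(s*x; s*mu, I + s*K*s').  The mean must be projected
% through s along with the data, i.e. form s*(x - mu), not s*x - mu.
r     = s * (x - mu);
Sigma = eye(size(s, 1)) + s * K * s';
n     = size(s, 1);
log_p = -0.5 * (r' * (Sigma \ r)) - 0.5 * log(det(Sigma)) - 0.5 * n * log(2*pi);
```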

In our implementation, we weren't multiplying \(\mu\) by \(s\). Fixing this seems to resolve the low-ML issue:

```
>> Corrs = attach(Corrs, 2, 1, 12, -12, params);
>> curve_tree_ml_2(Corrs, params, data_)
ans =
    2.0903e+04
>> Corrs_ind = attach(Corrs, 2, 0, 0, 0, params);
>> curve_tree_ml_2(Corrs_ind, params, data_)
ans =
    2.0804e+04
```

In the case above, the ML is roughly maximized when the branch index is -12 and the start index is 12, which agrees with a visual inspection of the data.

Next:

- Get attachment, reversal, and branch points from ground truth.
- Store `ml_2d` with the correspondence; update it on merge, and use it during the ML computation instead of `data_`.
- Branching in training ML.
- Training with branching.
- Branch-wise reconstruction.

Posted by Kyle Simek