October 29, 2013

Realization: nystrom may be tricky, because the same thinning factor probably won't be appropriate for all scenarios (e.g. short curves).

Possible approach: always take first and last inidices of a curve; thin longer curves more aggressively.

Better idea: use markov structure.

Main ideas: Use inheritence sampling to sample fully tree. Two rules (1) Parent curve posterior should only depend on it's observed points and a small number of points on its children. (2) Child curves depend also on the sampled parent points.

**Parent points**

The main question here is how many and which observed points of child-curves should be included to compute posterior? Heuristic approach: start with child points nearest to branch-point and work outward until information gain is negligible.

Information gain is the change in entropy of the branch-point posterior, conditioned on the active set. Entropy is proportional to the determinant of the covariance matrix.

Let bp be the branch point, and I be the active set.

\[
Entropy(bp | I) \propto K(bp, bp) - K(bp, np) [K(np, np) + \Sigma_y(np)]^{-1} K(np, bp)
\]

Where Sigma_y is the observation covariance. We may choose to use the prior conditional entropy, which is a simpler stand-in for the posterior conditional entropy.

\[
Entropy(bp | I) \propto K(bp, bp) - K(bp, np) [K(np, np)]^{-1} K(np, bp)
\]

Note that the entropy is now only a function of the indices of the observations, not the observations themselves. Thus, each observation only contribues one dimension to the matrix inversion, instead of three, making the computation 3^{3} times faster. This is only sensible if the observation variance is relatively uniform between observations, which isn't necessarilly true, but it may be servicable heuristic if thresholds are set appropriately high.

Opportunities exist to do rank-one updates to inverse, but probably not necessary, as the matrices should be small.

**Child curves**

Note that child-curves can also be parent-curves, so the previous approach for determining the active set should be followed.

We need to estimate the conditional prior probability for the curve, conditioned on its sampled parent. If the sampled parent contains the branch point, bp, the conditional prior covariance is already available in the track's pre-computed prior_K field, and the mean is bp. In general, bp isn't sampled, so we must use the nearby points on the parent to find the conditional prior of bp first and sample it explicitly.

Let bp be a branch point, cp be the child point, and nbp be the "neighbors" of bp on the parent curve, i.e. the markov blanket of bp on the parent.

Given bp, the child points are distributed by

\[
cp \mid bp \sim \mathcal{N}(bp, K_{prior})
\]

The branch point is distributed by

\[
bp \mid pnp \sim \mathcal{N}(\mu, \Sigma)
\]

where

\[
\mu = k(bp, nbp) k(nbp, nbp)^{-1} nbp \\
\Sigma = k(bp, bp) - k(bp, pnp) K(pnp, pnp)^{-1} K(pnp, bp)
\]

The a neighborhood size of four points around bp should be sufficient, since this model is piecewise cubic, but it may be worthwhile to experiment with more, since we aren't in bottleneck territory.

Now that we have conditional priors, we can construct the conditional posterior using the standard formula.

Ran into a snag when trying to incorporate the noise-free conditional values that arise during ancestral sampling. Derived an elegant solution; see the writeup here.

For now, using a constant number of points on parent, instead of using the entropy method above. Could add later; basic approach: precompute all covariances and add/remove one at a time.

...

Seems to be working, but slow as hell (slower than nystrom method). profiling

...

Construct_attachment_covariance_3 is called 486 times!

- Construct non view-specific matrices outside of view loop.
- nystrom for large curves?
- do all covariances at once? saves the computation of sibling and parent objecst?

implemented a first attempt at speedup; producing faulty results.

- Finish optimized ancestral sampling

Optimizing posterior-sampling for pixel likelihood →
← Mixing Noisy and Noise-free values in GP Posterior