December 10, 2013

Continuing along the inspiration from the yesterday's Pascal Fua paper investigating length constraint terms for index optimization.

It's difficult, because length implies known structure, which in turn implies known indices. So constraining indices based on length feels like a circular argument. This has been a sticking point in my thinking for a long time. But we can formulate this in terms of expectations, namely, given an index set, the *expected segment length* should be equal to difference of their indices.

Let \( L_i = |x_i - x_{i-1}| \) be the distance between points with adjacent indices.

Expected length given and index set, \(t\) is:

\[
\begin{align}
\mathrm{E}[L_i^2] &= \mathrm{E}[\| x_i - x_{i-1}\|^2] \\
&= \mathrm{E}[x_i^2 - 2 x_i x_{i-1} + x_{i-1}^2 ] \\
&= \mathrm{E}[x_i^2] - 2 \mathrm{E}[x_i x_{i-1}] + \mathrm{E}[x_{i-1}^2 ] \\
&= \left( \mathrm{Cov}[x_i] + \left( \mathrm{E}[x_i] \right) \right) - 2 \left( \mathrm{Cov}[x_i, x_{i-1}] + \mathrm{E}[x_i]\mathrm{E}[x_{i-1}] \right) + \left( \mathrm{Cov}[x_{i-1}] + \left( \mathrm{E}[x_{i-1}] \right) \right)\\
&= \left(k(x_i,x_i) + k(x_{i-1}, x_{i-1} - 2 k(x_i, x_{i-1}) \right) + \left( \mu_i^2 + \mu_{i-1}^2 - 2 \mu_i \mu_{i-1} \right)\\
&= D K_i D^\top + (D \mu_{i,i-1})^\top D \mu_{i,i-1}
\end{align}
\]

Where D is the differencing matrix, \(D = (1, -1)\).

Observe that each dimension is independent in the squared distance equation, so we can treat each dimension separately in the equation above.

For now, we'll aproximate by ignoring the first term, since nearby indices should be nearly 100% corellated. So all that remains is the squared difference of the reconstructed points.

We'll consider two different energy functions. The first is exactly what we'd like to minimize, but it needs an approximation.

\[
E_i = \left( \mathrm{E}[L_i] - \Delta_{t_i} \right)^2
\approx \left( \sqrt{\mathrm{E}[L_i^2]} - \Delta_{t_i} \right)^2
\]

We can't compute \(\mathrm{E}[L_i] \) directly, so we approximate it by taking the square root of the expected square length. The square root function is concave, so Jensen's inequality tells us that this approximation will never under-estimate the expected length, and since our main objective is to prevent index shrinkage, overestimating is preferred to underestimating. The result is \(D \mu \), the adjacent differences of the posterior mean.

TODO: writeup derivation for gradient for \(E\) and \(\mu\)

Implemented analytical Jacobian for \(\mu\) in `posterior_mu_gradient.m`

. Some error in results, according to `test/test_mu_deriv.m`

, but passes the inspection test. Overall, diagonal term looks okay, so error is probably in derivation of dZ. Particularly damning is that the quality metric isn't sensitive to the delta step size.

Should probably test Jacobian dZ. It's undergone some changes today.

...

Yep, subtle but significant error. Inspection suggests `term2`

is the culprit. Can we focus on the individual faulty elements and check the partial derivatives?