August 21, 2013

Continuing from yesterday, let's convert our conditional gaussian distribution to a gaussian function over \(x_c\) and \(\mu_b\).

Let \(II\) represent a stack of identity matrices:

\[
II = \left( \begin{array}{c} I\\I\\...\\I \end{array}\right)
\]
The exponential expression for the conditional Gaussian distribution is
\[
-\frac{1}{2} (x_C - II \mu_b)^\top \left( \Sigma_C + II \Sigma_b II^\top \right )^{-1}(x_C - II \mu_b)
\]
Let's convert this to a linear function over \((x_C, \mu_b)\), instead of just \(x_C\):
\[
-\frac{1}{2} \left(\begin{array}{c}x_C \\ \mu_b \end{array} \right )^\top \left ( \begin{array}{cc}I & -II\end{array}\right )^\top \left( \Sigma_C + II \Sigma_b II^\top \right )^{-1}\left ( \begin{array}{cc}I & -II\end{array}\right )\left(\begin{array}{c}x_C \\ \mu_b \end{array}\right )
\]

Recall that \(\mu_b\) is a linear function of the data-points in the Markov-blanket, \(y_\mathcal{MB}\) (reproduced from yesterday's post)

\[
\begin{align}
\mu_b &= K_*^\top \left(K_{\mathcal{MB}} + S^{-1}\right)^{-1} y_{MB} \\
&= K_*^\top K^{-1}_{y_\mathcal{MB}} y_\mathcal{MB} \\
\end{align}
\]
Rewriting the expression as a function of \((x_C, y_\mathcal{MB})\):
\[
\begin{align}
&=
-\frac{1}{2} \left(\begin{array}{c}x_C \\ K_* K^{-1}_{y_\mathcal{MB}} y_\mathcal{MB} \end{array} \right )^\top
\left ( \begin{array}{cc}I & -II\end{array}\right )^\top
\left( \Sigma_C + II \Sigma_b II^\top \right )^{-1}
\left ( \begin{array}{cc}I & -II\end{array}\right )
\left(\begin{array}{c}x_C \\ K_* K^{-1}_{y_\mathcal{MB}} y_\mathcal{MB} \end{array}\right )
\\
&=
-\frac{1}{2} \left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array} \right )^\top
\left ( \begin{array}{cc}I & -II K_* K^{-1}_{y_\mathcal{MB}} \end{array}\right )^\top
\left( \Sigma_C + II \Sigma_b II^\top \right )^{-1}
\left ( \begin{array}{cc}I & -II K_* K^{-1}_{y_\mathcal{MB}}\end{array}\right )
\left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array}\right )
\end{align}
\]
And expanding \(\Sigma_b\):
\[
\begin{align}
\Sigma_b &= K_b - K_*^\top \left(K_{\mathcal{MB}} + S^{-1}\right)^{-1} K_* \\
&= K_b - K_*^\top K^{-1}_{y_\mathcal{MB}} K_*
\end{align}
\]
The final expression is:
\[
\begin{align}
&=
-\frac{1}{2} \left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array} \right )^\top
\left ( \begin{array}{cc}I & -II K_* K^{-1}_{y_\mathcal{MB}} \end{array}\right )^\top
\left( \Sigma_C + II \left(K_b - K_*^\top K^{-1}_{y_\mathcal{MB}} K_* \right) II^\top \right )^{-1}
\left ( \begin{array}{cc}I & -II K_* K^{-1}_{y_\mathcal{MB}}\end{array}\right )
\left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array}\right ) \\
&=
-\frac{1}{2} \left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array} \right )^\top
\Lambda_{C \mid \mathcal{MB}}
\left(\begin{array}{c}x_C \\ y_\mathcal{MB} \end{array}\right )
\end{align}
\]
the normalization constant is:
\[
Z =
(2 \pi)^\frac{k}{2} \left | \Sigma_C + II \left(K_b - K_*^\top K^{-1}_{y_\mathcal{MB}} K_* \right) II^\top \right |^\frac{1}{2}
\]

See `exp_2013_08_21_clique_tree_test.m`

- sample N points for curve 1, \(C_1\)
- sample N points for curve 2, \(C_2\)
- offset curve 1: \(C_2 = C_2 + C_1(:,5)\)
- add noise to \(C_1\) and \(C_2\) to get data \(y_1\),\(y_2\)
- add noise to \(C_1\) and \(C_2\) to get data \(y_1\), \(y_2\).
- Construct full prior, \(\Sigma\) (see below for definition).
- Evaluate ML directly from \(p(y_1, y_2) = \mathcal{N}(0, (\Sigma + \sigm_n I))\).
- Construct ML decomposed: \(p(y_2 | y_1) p(y_1) \)
- (didn't implement) Construct clique tree using
- Node 1: \((0, \Sigma_1)\)
- Node 2: \((0, inv(\Lambda_{C\mid \mathcal{MB}}))\)
- multiply by corresponding noise nodes.

- (didn't implement) Marginalize clique tree. compare against result in 7.

I originally thought the result from 8 was only an approximation (mainly because I hadn't originally written it out that way). In fact, it's an exact computation, but it isn't very useful in this form, because deep trees still exhibit linear growth of the condition-set, meaning cubic growth in running time. In practice, we can replace \(y_1\) to the data markov-blanket, \(y_{1,\mathcal{MB}}\). The data markov-blanket is naturally larger than the prior m.b., which is a single point, but if the noise is low relative to the point-spacing, the data m.b. should still be relatively small. Thus, we can approximate the ML with a decomposed form that avoids cubic growth.

The alternative is to use the clique tree from step 9., but the former is simpler to implement, because we don't have to use the crazy linear form we developed above, and we don't have to do any message-passing. We just need \(\Sigma_b\) and \(\mu_b\).

Since this operation won't be run during production, we can just implement it the naive way.

But if we wanted a fast version, we could to a forward-pass approximation, i.e. find the max for the root curve, and then pass that as data to the child curves.

A full forward-backward algorithm probably wouldn't be too hard, but probably not worth the trouble at the moment.

- apply attacment to trackset
- guess branch spacing
- construct independent cliques, given branch spacing.
- Guess branch point
- construct markov-blanket, \(\mathcal{MB}\).
- construct \(sigma_b\) and \(\mu_b\) from parent \(\mathcal{MB}\).

`attachment/attach.m`

`attachment/att_set_start_index.m`

`attachment/att_set_branch_index.m`

- Next
- pre-flatten and pre_sort ll_*.
- remove all calls to flatten_inputs

- precompute likelihood covariance matrix.
- remove logic from
`curve_ml5.m`

- remove logic from
- compute full branching ML using cached values.
- identify connected components
- topological sort
- root-to-leaf evaluation over each CC.

- pre-flatten and pre_sort ll_*.