It occurred to me that naively doing camera fitting causes strong correlations between camera pose and reconstruction position. Originally, I planned on alternating between optimizing the conditional curve density \(p(x_i | y_i, c_i)\) and the conditional camera density \(p(c_i | y_i, x_i)\). The problem here is that the optimal curve \(x_i\) for the current camera will be very close to the evidence \(y_i\), so optimizing the camera will only move the camera slightly. We get into a situation where we move the camera slightly, which allows us to move the curve slightly, which allows us to move the camera slightly, and so on. This is analogous to the problem in Gibbs sampling with strongly correlated variables, and, as there, the solution is to integrate out one of the correlated variables.
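To make the slow-alternation problem concrete, here is a minimal sketch on a toy problem, not the actual camera/curve model. It assumes a two-variable quadratic objective \(f(a, b) = \tfrac{1}{2}(a^2 - 2\rho a b + b^2)\), which is the negative log-density of a zero-mean Gaussian with correlation \(\rho\) up to scaling; the function name `alternating_sweeps` and the stand-in variables `a` and `b` are hypothetical.

```python
def alternating_sweeps(rho, tol=1e-6, max_sweeps=100_000):
    """Count alternating-minimization sweeps on a correlated 2D quadratic.

    Toy objective (an assumption for illustration):
        f(a, b) = 0.5 * (a**2 - 2*rho*a*b + b**2), minimized at (0, 0).
    `a` stands in for the camera and `b` for the curve.
    """
    a, b = 1.0, 1.0
    for sweep in range(1, max_sweeps + 1):
        a = rho * b  # exact minimizer over a with b held fixed
        b = rho * a  # exact minimizer over b with a held fixed
        if max(abs(a), abs(b)) < tol:
            return sweep
    return max_sweeps


# Weak correlation converges in a handful of sweeps; strong correlation
# needs far more, because each sweep only shrinks the error by rho**2.
print(alternating_sweeps(0.5), alternating_sweeps(0.99))
```

Each conditional update is optimal on its own, yet the pair creeps toward the joint optimum when \(\rho\) is near 1, which is the behavior described above.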
We should be optimizing \(p(c_i | Y)\) instead of \(p(c_i | x_i, Y)\). For now, we'll assume the prior over \(c_i\) is flat, so this reduces to optimizing the likelihood function \(p(Y | c_i)\). Let \(Y_- = y_{1:i-1}\) and let \(Y_+ = y_{i+1:n}\).
Below are the definitions of the terms above.
Here, \(\mu_0\) is the 3D prior mean, \(\pi_c(X)\) is the projection of 3D point \(X\), \(J_c\) is the Jacobian of \(\pi_c\) evaluated at \(\mu_i\), \(\Sigma_*\) is the posterior covariance of \((x_{i-1}, x_{i+1})\), \(\Sigma_y\) is the likelihood covariance, and \(K_*\) is the prior cross-covariance between \(x_i\) and \((x_{i-1}, x_{i+1})\).
The integral above is a convolution that represents the sum of random variables. We represent this sum below, where \(\epsilon_M \sim \mathcal{N}(0, M) \).
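As a reminder of the general identity this relies on, stated with generic symbols \(m\), \(A\), \(B\), and \(H\) that are placeholders rather than quantities from this post: if \(x \sim \mathcal{N}(m, A)\) and \(y = Hx + \epsilon_B\) with \(\epsilon_B\) independent of \(x\), then marginalizing over \(x\) gives

\[
p(y) = \int \mathcal{N}(y;\, Hx,\, B)\, \mathcal{N}(x;\, m,\, A)\, dx = \mathcal{N}(y;\, Hm,\, H A H^\top + B).
\]

Equivalently, \(y = Hm + H\epsilon_A + \epsilon_B\): a fixed mean plus a sum of independent zero-mean Gaussian terms, whose covariances add.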
Both the covariance and prior depend on the camera (because the camera determines \(\Sigma_y\)), but if we assume the covariance is nearly isotropic, maximizing the expression above is equivalent to minimizing the norm of the residuals, \(\| Y - \pi_c(\mu_i) \|\).
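A quick numerical check of that equivalence, as a sketch under the assumption that the covariance is exactly \(\sigma^2 I\) and does not change between candidates. The helper `gaussian_logpdf` and the random residuals are made up for illustration; they are not the model's actual residuals.

```python
import numpy as np


def gaussian_logpdf(r, cov):
    """Log-density of a zero-mean Gaussian at residual r, via Cholesky."""
    d = r.size
    L = np.linalg.cholesky(cov)          # cov = L @ L.T
    z = np.linalg.solve(L, r)            # whitened residual
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (z @ z + logdet + d * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
residuals = rng.normal(size=(5, 4))      # five candidate residual vectors
cov = 0.3 * np.eye(4)                    # isotropic covariance (assumed fixed)

loglik = np.array([gaussian_logpdf(r, cov) for r in residuals])
norms = np.linalg.norm(residuals, axis=1)

# With a fixed isotropic covariance, the candidate with the highest
# log-likelihood is the one with the smallest residual norm.
best_by_loglik = int(np.argmax(loglik))
best_by_norm = int(np.argmin(norms))
print(best_by_loglik, best_by_norm)
```

The log-determinant term is constant here only because the covariance is held fixed; if it varied with the camera, the two criteria could disagree.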
Recall that we never explicitly have an expression for the data Gaussian, so \(Y\) isn't known. We could derive this from the posterior and the prior, but a simple approximation is to just use the posterior mean, under the weak assumption that the likelihood is much more peaked than the prior. The optimization procedure is then:
In practice, isotropy isn't necessarily a good assumption, so we can transform the residuals by the square-root inverse covariance before evaluating the error. The square root and inverse operations will get expensive if performed at each iteration, so we can either update them only every Nth iteration or keep the isotropic assumption; my intuition says the isotropic assumption should be good enough. It would be nice to have a better argument for this.
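The whitening step could look something like the sketch below. It assumes a Cholesky factor is an acceptable "square root" (it is not the symmetric square root, but it yields the same squared error), and the names `whiten`, `CachedWhitener`, and `refresh_every` are hypothetical.

```python
import numpy as np


def whiten(residual, cov):
    """Return z = L^{-1} r, where cov = L @ L.T, so z @ z = r^T cov^{-1} r."""
    L = np.linalg.cholesky(cov)
    return np.linalg.solve(L, residual)


class CachedWhitener:
    """Whitener that refactors the covariance only every `refresh_every` calls."""

    def __init__(self, refresh_every=10):
        self.refresh_every = refresh_every
        self._calls = 0
        self._L = None

    def __call__(self, residual, cov):
        # Recompute the (expensive) factorization only on refresh calls;
        # otherwise reuse the stale factor from the last refresh.
        if self._L is None or self._calls % self.refresh_every == 0:
            self._L = np.linalg.cholesky(cov)
        self._calls += 1
        return np.linalg.solve(self._L, residual)


# Toy check on a random symmetric positive-definite covariance.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
cov = A @ A.T + 0.1 * np.eye(3)
r = rng.normal(size=3)

z = whiten(r, cov)
mahalanobis_sq = float(r @ np.linalg.solve(cov, r))
print(float(z @ z), mahalanobis_sq)
```

Reusing a stale factor trades accuracy for speed; whether that is acceptable depends on how quickly the covariance changes as the camera moves, which this sketch does not model.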
Posted by Kyle Simek