To implement efficient camera refinement, I need to derive the Jacobian of the residual vector w.r.t. the camera parameters.
The parameterization of camera orientation deserves special discussion. We prefer a parameterization that is free of constraints, so quaternions and rotation matrices aren't an option, leaving Euler angles or an axis-angle vector. Both parameterizations have singularities, but we can avoid them by centering the parameterization at the camera's current orientation. If we don't expect to drift too far from the current orientation, this should be okay; otherwise we can reparameterize after every step. We follow Hartley and Zisserman's approach and use the axis-angle parameterization. All other parameters are used directly, with no transformation.
Let the vector \(\mathbf{t}_r\) represent a rotation of angle \(\|\mathbf{t}_r\|\) around axis \(\hat{\mathbf{t}}_r\), and let \(R_{\mathbf{t}_r}\) be the corresponding rotation matrix. Let \(K\) be the intrinsic matrix, and let \(\mathbf{t}_0\) be the translation vector.
The camera \(P\) transforms a point from world coordinates to homogeneous image coordinates as follows:
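One form consistent with the definitions in this section, assuming the incremental rotation \(R_{\mathbf{t}_r}\) is applied on the left of the camera's current rotation \(R\) (so that the camera-frame point \(R(\mathbf{X} - \mathbf{t}_0)\) is recovered at \(\mathbf{t}_r = 0\)), is

\[
x_h = K \, R_{\mathbf{t}_r} \, R \, (\mathbf{X} - \mathbf{t}_0)
\]

The left-multiplication order is an assumption; it is the choice under which the small rotation acts directly on the point in camera coordinates.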
We seek the Jacobian of this transformation centered on the current camera, \(P\):
When centered at the current camera, the rotation vector is zero, \(\mathbf{t}_r = (0,0,0)\). For small \(\mathbf{t}_r\), the rotation matrix is approximated to first order by \( R_{\mathbf{t}_r} \approx I + [\mathbf{t}_r]_\times \). Since \([\mathbf{t}_r]_\times \mathbf{X} = -[\mathbf{X}]_\times \mathbf{t}_r\), the Jacobian of the rotated point \( R_{\mathbf{t}_r} \mathbf{X}\) w.r.t. \(\mathbf{t}_r\) is \( -[\mathbf{X}]_\times \), and the Jacobian of \(x_h\) is
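Applying the chain rule, with the small rotation acting on the camera-frame point and \(K\) multiplying the result on the left (the composition order is an assumption), this works out to

\[
J_{\mathbf{t}_r} = \left. \frac{\partial x_h}{\partial \mathbf{t}_r} \right|_{\mathbf{t}_r = 0} = -K \, [\mathbf{X}_c]_\times
\]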
where \(\mathbf{X}_c = R (\mathbf{X} - \mathbf{t}_0)\) is the point in camera coordinates.
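The rotation Jacobian is easy to get wrong by a sign, so a finite-difference check is worth having. The sketch below compares the closed form \(-K[\mathbf{X}_c]_\times\) (the expression the chain rule gives when the small rotation acts on the camera-frame point) against central differences of \(\mathbf{t}_r \mapsto K R_{\mathbf{t}_r} \mathbf{X}_c\) around zero. It is only an illustration: the helper names, the intrinsic matrix, and the test point are made up, and the exact rotation is computed with Rodrigues' formula, which this section does not otherwise use.

```python
import math

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) applied to w gives v x w."""
    x, y, z = v
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rodrigues(t):
    """Exact rotation matrix for the axis-angle vector t (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in t))
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        return I
    S = skew([c / theta for c in t])
    S2 = matmul(S, S)
    a, b = math.sin(theta), 1.0 - math.cos(theta)
    return [[I[i][j] + a * S[i][j] + b * S2[i][j] for j in range(3)] for i in range(3)]

def analytic_jacobian(K, X_c):
    """Closed form -K [X_c]_x for the derivative of K R_t X_c at t = 0."""
    return [[-v for v in row] for row in matmul(K, skew(X_c))]

def numeric_jacobian(K, X_c, eps=1e-6):
    """Central differences of t -> K R_t X_c around t = 0."""
    cols = []
    for k in range(3):
        t_plus = [eps if i == k else 0.0 for i in range(3)]
        t_minus = [-c for c in t_plus]
        f_plus = matvec(K, matvec(rodrigues(t_plus), X_c))
        f_minus = matvec(K, matvec(rodrigues(t_minus), X_c))
        cols.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows

# Made-up intrinsics and camera-frame point, for illustration only.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
X_c = [0.3, -0.2, 4.0]

J_a = analytic_jacobian(K, X_c)
J_n = numeric_jacobian(K, X_c)
max_err = max(abs(a - n) for ra, rn in zip(J_a, J_n) for a, n in zip(ra, rn))
print("largest analytic/numeric discrepancy:", max_err)
```

The same pattern (perturb one parameter, difference the projection) can be reused to check the translation and intrinsic blocks.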
The other derivatives are straightforward to derive.
The Jacobian w.r.t. translation is:
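Differentiating \(K R (\mathbf{X} - \mathbf{t}_0)\) with respect to \(\mathbf{t}_0\) at \(\mathbf{t}_r = 0\), and taking \(\mathbf{X}_c = R(\mathbf{X} - \mathbf{t}_0)\) as the convention for camera coordinates, gives

\[
J_{\mathbf{t}_0} = \frac{\partial x_h}{\partial \mathbf{t}_0} = -K R
\]

Under the alternative convention \(\mathbf{X}_c = R\mathbf{X} + \mathbf{t}_0\), the same derivative would simply be \(K\).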
The Jacobian w.r.t. intrinsic parameters is
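The exact form depends on how \(K\) is parameterized, which is not fixed above. As an illustration only, assume the common five-parameter form

\[
K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\]

with parameters ordered \((f_x, f_y, s, c_x, c_y)\); these symbols and their ordering are assumptions. Since \(x_h = K \mathbf{X}_c\) at \(\mathbf{t}_r = 0\), and writing \(\mathbf{X}_c = (x_{c,1}, x_{c,2}, x_{c,3})\), each entry of \(x_h\) is linear in the parameters and

\[
J_K = \begin{pmatrix}
x_{c,1} & 0 & x_{c,2} & x_{c,3} & 0 \\
0 & x_{c,2} & 0 & 0 & x_{c,3} \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]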
We've derived the Jacobian of the transformation from world to homogeneous image coordinates w.r.t. each camera parameter. To get the Jacobian of the residuals, it remains to transform to nonhomogeneous screen coordinates.
The Jacobian of this is
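Taking the map to be \((x_1, x_2) = (x_{h,1}/x_{h,3},\; x_{h,2}/x_{h,3})\), and assuming the last row of \(K\) is \((0, 0, 1)\) so that \(x_{h,3} = x_{c,3}\), the quotient rule gives

\[
J_\mathbf{x} = \frac{1}{x_{c,3}} \begin{pmatrix} 1 & 0 & -x_1 \\ 0 & 1 & -x_2 \end{pmatrix}
\]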
where \(x_{c,3}\) is the point's depth in camera coordinates, and \((x_1, x_2)\) is the point in nonhomogeneous image coordinates.
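As a small sanity check of the dehomogenization step, the sketch below evaluates the \(2 \times 3\) Jacobian \(\frac{1}{w}\begin{pmatrix} 1 & 0 & -x_1 \\ 0 & 1 & -x_2 \end{pmatrix}\) of \((x_{h,1}/w,\; x_{h,2}/w)\), where \(w\) is the third homogeneous coordinate, and compares it with central differences. The function names and the sample point are hypothetical.

```python
def dehomogenize(x_h):
    """Map homogeneous image coordinates (3-vector) to pixel coordinates (2-vector)."""
    return [x_h[0] / x_h[2], x_h[1] / x_h[2]]

def dehomogenize_jacobian(x_h):
    """2x3 Jacobian of dehomogenize() with respect to x_h."""
    x1, x2 = dehomogenize(x_h)
    w = x_h[2]
    return [[1.0 / w, 0.0, -x1 / w],
            [0.0, 1.0 / w, -x2 / w]]

# Made-up homogeneous image point, for illustration only.
x_h = [1520.0, 800.0, 4.0]
J = dehomogenize_jacobian(x_h)

# Central-difference check, one input coordinate at a time.
eps = 1e-6
max_err = 0.0
for k in range(3):
    plus = list(x_h)
    minus = list(x_h)
    plus[k] += eps
    minus[k] -= eps
    fd = [(p - m) / (2 * eps) for p, m in zip(dehomogenize(plus), dehomogenize(minus))]
    for i in range(2):
        max_err = max(max_err, abs(fd[i] - J[i][k]))

print("largest analytic/numeric discrepancy:", max_err)
```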
The Jacobian of the residuals w.r.t. camera parameters is then
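Chaining the dehomogenization Jacobian \(J_\mathbf{x}\) with the three per-parameter Jacobians of \(x_h\), and assuming the parameters are ordered as intrinsics, rotation, translation (an ordering chosen here for illustration),

\[
J = J_\mathbf{x} \begin{pmatrix} J_K & J_{\mathbf{t}_r} & J_{\mathbf{t}_0} \end{pmatrix}
\]

If the residual is defined as observed minus predicted, the whole expression picks up a minus sign.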
In what follows, we'll drop the \(J_\mathbf{x}\) and use \(J_K\) to refer to the Jacobian of the residuals (and likewise for \(J_{\mathbf{t}_r}\) and \(J_{\mathbf{t}_0}\)). In other words, let \(J_K \leftarrow J_\mathbf{x} J_K\).
We now derive the Jacobian of all residuals in all views, where cameras share the same intrinsic parameters.
Let \(J_{K_{ij}}\) be the Jacobian of the residuals of the \(j\)th point in view \(i\) w.r.t. the intrinsic parameters, and let \(J_{\mathbf{R}_{ij}}\) be the same w.r.t. rotation. Let \(J_{\mathbf{t}_i}\) be the Jacobian w.r.t. translation in view \(i\).
The full Jacobian is a block matrix with form:
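One layout consistent with the notation above, assuming \(m\) views and a parameter ordering of shared intrinsics first, followed by each view's rotation and translation, is shown here. For compactness, \(J_{K_i}\) and \(J_{\mathbf{R}_i}\) denote the per-point intrinsic and rotation blocks of view \(i\) stacked vertically over the points visible in that view, and \(J_{\mathbf{t}_i}\) is read the same way; this stacking shorthand and the column ordering are assumptions.

\[
J = \begin{pmatrix}
J_{K_1} & J_{\mathbf{R}_1} & J_{\mathbf{t}_1} & 0 & 0 & \cdots & 0 & 0 \\
J_{K_2} & 0 & 0 & J_{\mathbf{R}_2} & J_{\mathbf{t}_2} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
J_{K_m} & 0 & 0 & 0 & 0 & \cdots & J_{\mathbf{R}_m} & J_{\mathbf{t}_m}
\end{pmatrix}
\]

The first block column is dense because every residual depends on the shared intrinsics. The remaining columns are block diagonal because a residual in view \(i\) depends only on that view's rotation and translation.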