January 03, 2014

Read two papers. The first was David Sbarra's paper for the FIRE project, modeling sleep disturbances vs. depression symptoms. The second is a dense SLAM paper from CVPR 2013.

Decomposing

by David A. Sbarra and John J. B. Allen

David's paper, which will likely play a role in the FIRE project this Spring semester. Uses time-series data of ~100 people at 5 time points describing two variables: mood and sleep disruptions. A dynamical model (first order diff-eq) is developed to model the data, which captures drift, self-regulation, and inter-variable coupling.

Since the model is trained and evaluated on the same data, it isn't clear how predictive this model would be on held-out data. Chi-squared test is used to determine which parameters are meaningful; I'm not familiar enough with this method to comment on it, but the Bayesian literatures constant struggle with model selection suggests that the classical (i.e. frequentist) methods like this may need scruitiny.

Synthetic data is drawn from the resulting fit for 5 subjects, using observations for initial values. I couldn't compare it to the true data or maybe I misunderstood the plot -- perhaps David can elaborate on this.

A vector-field plot nicely illustrates the flow of the dynamical system.

Can 6 parameters really model this data well? Need to get familiar with the data. Perhaps a good candidate nonparametric modeling?

Ideas:

- Make Bayesian by using LDS similar to Jinyan's work.
- Model the direction at each grid-point independently, using GP to enforce smoothness and avoid overfitting.
- Consider higher-order diff-eq.
- Use a GP-LVM model to handle non-linear dynamics.

Uses GP-LVM-based shape priors (cars) to improve existing dense SLAM implementation. Manages to heavilly leverage parallelism in CPU and GPU to achieve (arguably) real-time performance. Use Feltzenswalb car detector to find consistent detections in two views and perform rough pose estimation. Uses color-based (3D histogram) and depth-based energy functions to refine pose and shape; authors derive the gradient of these energy functions w.r.t. rigid transformations and GP-LVM latent variables. Tractibility of GP-LVM addressed by taking lowest N frequencies from a DCT; reduced 128x128x128 to 25x25x25.

How is the GP-LVM trained? Where is the training data coming from?

Related papers:

- Newcombe, A. J. Davison, S. Izadi, P. Kohli, O. Hilliges, J. Shotton, D. Molyneaux, S. Hodges, D. Kim, and A. Fitzgibbon, “KinectFusion: Real-time dense surface mapping and tracking,” Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pp. 127–136, 2011.

The implicit level-set representation and the merging of depth images are the same as those from KinectFusion. Also, KinectFusion's stereo results are used to evaluate their monocular results.

- Bao and S. Savarese, “Semantic structure from motion,” presented at the Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011, pp. 2025–2032.

Spiritual predecessor of this paper -- use detectors to improve structure from motion.

Posted by Kyle Simek