[Work Log] Reading: Semantic SLAM w/ GPLVM shape priors; FIRE reading

January 03, 2014

Read two papers. The first was David Sbarra's paper for the FIRE project, modeling sleep disturbances vs. depression symptoms. The second is a dense SLAM paper from CVPR 2013.


Decomposing Depression: On the Prospective and Reciprocal Dynamics of Mood and Sleep Disturbances

by David A. Sbarra and John J. B. Allen

This is David's paper, which will likely play a role in the FIRE project this Spring semester. It uses time-series data from ~100 people at 5 time points, describing two variables: mood and sleep disruptions. A dynamical model (a first-order differential equation) is fit to the data; it captures drift, self-regulation, and inter-variable coupling.
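To make the model structure concrete, here is a minimal sketch of a first-order coupled system with one drift, one self-regulation, and one coupling term per variable (six parameters in total). The linear form, the parameter names, and the Euler integration are my assumptions for illustration, not the paper's actual parameterization.

```python
def flow(mood, sleep, params):
    """Time derivatives of an assumed linear first-order coupled system.

    d(mood)/dt  = a0 + a1*mood  + a2*sleep
    d(sleep)/dt = b0 + b1*sleep + b2*mood

    a0, b0: drift; a1, b1: self-regulation; a2, b2: coupling.
    These names are hypothetical, chosen only for this sketch.
    """
    a0, a1, a2, b0, b1, b2 = params
    d_mood = a0 + a1 * mood + a2 * sleep
    d_sleep = b0 + b1 * sleep + b2 * mood
    return d_mood, d_sleep


def simulate(mood0, sleep0, params, dt=0.01, t_end=4.0):
    """Forward-Euler integration starting from observed initial values.

    Returns a list of (time, mood, sleep) tuples, including the start.
    """
    mood, sleep = mood0, sleep0
    n_steps = int(round(t_end / dt))
    traj = [(0.0, mood, sleep)]
    for k in range(1, n_steps + 1):
        d_mood, d_sleep = flow(mood, sleep, params)
        mood += dt * d_mood
        sleep += dt * d_sleep
        traj.append((k * dt, mood, sleep))
    return traj
```

Evaluating `flow` on a grid of (mood, sleep) values gives the arrows for a vector-field plot, and `simulate` started at a subject's first observation gives a synthetic trajectory for that subject.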

Since the model is trained and evaluated on the same data, it isn't clear how predictive this model would be on held-out data. A chi-squared test is used to determine which parameters are meaningful; I'm not familiar enough with this method to comment on it, but the Bayesian literature's constant struggle with model selection suggests that classical (i.e. frequentist) methods like this may need scrutiny.

Synthetic data is drawn from the resulting fit for 5 subjects, using observations for initial values. I couldn't see how to compare it to the true data, or maybe I misunderstood the plot; perhaps David can elaborate on this.

A vector-field plot nicely illustrates the flow of the dynamical system.

Can 6 parameters really model this data well? Need to get familiar with the data. Perhaps a good candidate for nonparametric modeling?


Dense Reconstruction Using 3D Object Shape Priors

Uses GP-LVM-based shape priors (cars) to improve an existing dense SLAM implementation. Manages to heavily leverage parallelism on CPU and GPU to achieve (arguably) real-time performance. Uses the Felzenszwalb car detector to find consistent detections in two views and perform rough pose estimation. Uses color-based (3D histogram) and depth-based energy functions to refine pose and shape; the authors derive the gradients of these energy functions w.r.t. the rigid transformation and the GP-LVM latent variables. Tractability of the GP-LVM is addressed by keeping only the lowest N frequencies of a DCT, which reduces each 128x128x128 shape volume to 25x25x25 coefficients.
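The DCT truncation step can be sketched as follows. This is only an illustration of the general idea, assuming an orthonormal DCT-II applied along each axis of a cubic volume; the paper's exact transform and scaling may differ, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (row k is frequency k)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # rescale the DC row so all rows have unit norm
    return m


def compress(volume, keep):
    """Return the keep^3 lowest-frequency DCT coefficients of a cubic volume."""
    n = volume.shape[0]
    d = dct_matrix(n)[:keep]  # keep x n: low-frequency rows only
    # Apply the truncated transform along each of the three axes.
    return np.einsum('ai,bj,ck,ijk->abc', d, d, d, volume, optimize=True)


def decompress(coeffs, n):
    """Reconstruct an n^3 volume, treating all dropped coefficients as zero."""
    keep = coeffs.shape[0]
    d = dct_matrix(n)[:keep]
    return np.einsum('ai,bj,ck,abc->ijk', d, d, d, coeffs, optimize=True)
```

With the sizes quoted above, `compress(volume, 25)` on a 128x128x128 array leaves 25^3 = 15,625 coefficients instead of 128^3 = 2,097,152 voxels, which is the dimensionality the GP-LVM would then have to handle.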

How is the GP-LVM trained? Where is the training data coming from?

Related papers:

The implicit level-set representation and the merging of depth images are the same as those from KinectFusion. Also, KinectFusion's stereo results are used to evaluate their monocular results.

Spiritual predecessor of this paper -- use detectors to improve structure from motion.

Posted by Kyle Simek