[Work Log] FIRE - background reading, thinking and planning

March 04, 2014

Structure Learning

The fact that we have a 500-dimensional dynamical system offers opportunities to learn high-dimensional relationships. I spent some time looking into structure learning.

Variational Bayes

Zoubin Ghahramani's slides on variational Bayes

Variational EM could learn an LDS with unknown structure, or a switching state-space model.

Dirichlet Process

Video: Inferring structure from data, Tom Griffiths, Cognitive Science and Machine Learning Summer School, 2010

This could perhaps be used to learn LDS structure. Griffiths shows a noisy-or model with unknown dimension, using a Dirichlet process prior. At its center is a rank-deficient matrix constructed from the product of NxM and MxN matrices, where M < N. We could use something similar to learn a sparse dynamics matrix, but we're dealing with continuous-valued data rather than binary-valued. If this can be done, there must be existing literature on it; will dig further.
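To make the rank argument concrete, the construction can be written out as below. The symbols A, B and C are labels introduced here for illustration; they are not notation taken from the talk.

```latex
A = B\,C, \qquad
B \in \mathbb{R}^{N \times M},\quad
C \in \mathbb{R}^{M \times N},\quad
M < N
\quad\Longrightarrow\quad
\operatorname{rank}(A) \le M < N .
```

A matrix built this way has N^2 entries but is described by only 2NM numbers, which is where the reduction in model complexity would come from.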

Parameter Learning

This tech report from Ghahramani and Hinton derives the EM algorithm for learning LDS parameters and latent states jointly. For the M step, they derive expressions for the full transition matrix, plus the system and observation noise covariances, and the initial state. For the E step, they use a Kalman filter for forward estimation followed by a backward recursion. We're more interested in a sparse representation of the dynamics.
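As a reminder of what the E step involves, here is a minimal sketch of the forward (Kalman filter) and backward (smoothing) passes for a scalar LDS. It is written in Python for illustration and is not the report's code. The parameter names (a, c, q, r, mu0, v0) and the sample observations are assumptions made up for this example. A full E step would also need the cross-covariances between consecutive states, which are omitted here.

```python
def kalman_filter(ys, a, c, q, r, mu0, v0):
    """Forward pass for x_t = a*x_{t-1} + w, y_t = c*x_t + v (all scalars).

    Returns filtered means/variances and the one-step predictions
    that the backward pass needs.
    """
    means, variances, pred_means, pred_vars = [], [], [], []
    m_pred, v_pred = mu0, v0  # prior on the first state
    for y in ys:
        pred_means.append(m_pred)
        pred_vars.append(v_pred)
        gain = v_pred * c / (c * c * v_pred + r)  # Kalman gain
        m = m_pred + gain * (y - c * m_pred)      # measurement update
        v = (1.0 - gain * c) * v_pred
        means.append(m)
        variances.append(v)
        m_pred = a * m                            # time update
        v_pred = a * a * v + q
    return means, variances, pred_means, pred_vars


def backward_smoother(means, variances, pred_means, pred_vars, a):
    """Backward recursion: estimates of x_t given the whole sequence."""
    s_means = list(means)
    s_vars = list(variances)
    for t in range(len(means) - 2, -1, -1):
        j = variances[t] * a / pred_vars[t + 1]   # smoother gain
        s_means[t] = means[t] + j * (s_means[t + 1] - pred_means[t + 1])
        s_vars[t] = variances[t] + j * j * (s_vars[t + 1] - pred_vars[t + 1])
    return s_means, s_vars


# Made-up observations and parameters, purely for illustration.
ys = [1.0, 1.2, 0.9, 1.1]
f_means, f_vars, p_means, p_vars = kalman_filter(
    ys, a=1.0, c=1.0, q=0.1, r=0.5, mu0=0.0, v0=1.0)
s_means, s_vars = backward_smoother(f_means, f_vars, p_means, p_vars, a=1.0)
```

The smoothed variances are never larger than the filtered ones, since the backward pass adds information from later observations.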

Parameter Estimation for Linear Dynamical Systems [pdf]

MARSS

MARSS: Multivariate Autoregressive State-space Models for Analyzing Time-series Data, by Elizabeth E. Holmes, Eric J. Ward, and Kellie Wills

This R package handles learning and inference in general state-space models, with unspecified or semi-specified model structure. It uses EM to infer the model and latent state jointly. On the plus side, it handles missing data, and the model is very flexible. On the downside, it doesn't handle i.i.d. data in an obvious way (although it could handle a small number of replicates by repeating matrix blocks). We could probably implement some form of the TIES model, sans prior.

Other R Packages

Modeling

Kobus pointed out that the short time-frame means second-order dynamics likely won't emerge.

Matlab CSV File parsing

Matlab only handles CSV files with numeric-valued fields. We have dates and strings (and possibly others), so we need to write our own CSV parser.

I'm creating a .meta file for each .csv file, which describes the column names and datatypes. Currently there are only three ways to parse a column: "numeric", "mm/dd/yy date", and "ignore". Possibly more in the future.
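The idea can be sketched as follows. This is a Python illustration under stated assumptions, not the project's Matlab code. The meta-file layout (one "name,type" line per column), the short type label "date" standing in for the mm/dd/yy date type, the presence of a header row in the CSV, and the sample data are all hypothetical.

```python
import csv
import io
from datetime import datetime


def parse_meta(meta_text):
    """Return a list of (column_name, column_type) pairs, one per CSV column."""
    columns = []
    for line in meta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, col_type = [part.strip() for part in line.split(",", 1)]
        columns.append((name, col_type))
    return columns


def parse_field(text, col_type):
    """Convert one raw field; empty or unparsable values become None."""
    text = text.strip()
    try:
        if col_type == "numeric":
            return float(text)
        if col_type == "date":
            return datetime.strptime(text, "%m/%d/%y").date()
    except ValueError:
        return None
    raise ValueError("unknown column type: " + col_type)


def parse_csv(csv_text, columns):
    """Return {column_name: [values]} for every column not marked 'ignore'."""
    data = {name: [] for name, col_type in columns if col_type != "ignore"}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # assumed header row; column names come from the meta file
    for row in reader:
        for (name, col_type), field in zip(columns, row):
            if col_type != "ignore":
                data[name].append(parse_field(field, col_type))
    return data


# Made-up sample input, purely for illustration.
meta_text = "value,numeric\nwhen,date\ncomment,ignore\n"
csv_text = 'value,when,comment\n12.5,03/04/14,windy\n,03/05/14,"dry, hot"\n'
table = parse_csv(csv_text, parse_meta(meta_text))
```

Keeping the type information in a separate file means that adding a new column type only requires a new branch in parse_field.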

Wrote code for parsing meta files and csv files (all code is in projects/fire/trunk/src).

Misc notes about data

Coverage

The following applies to parsable fields only.

Coverage map is below (rows are fields, columns are records)

Data coverage
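A coverage map of this kind could be computed along the following lines. This is a minimal Python sketch; it assumes the parsed data is held as a dictionary of per-field value lists with None marking a missing entry, which is a hypothetical layout, and the sample values are made up.

```python
def coverage_map(table):
    """Return {field: [bool per record]}; True where a value is present."""
    return {field: [value is not None for value in values]
            for field, values in table.items()}


def coverage_fraction(cov):
    """Fraction of records covered, per field."""
    return {field: sum(flags) / len(flags) for field, flags in cov.items()}


# Made-up sample data, purely for illustration.
table = {
    "value": [12.5, None, 3.0, 7.25],
    "count": [1.0, 2.0, 2.0, 1.0],
}
cov = coverage_map(table)
frac = coverage_fraction(cov)
fully_covered = [field for field, flags in cov.items() if all(flags)]
```

Each entry of cov corresponds to one row of the map (a field), and each flag in it to one column (a record).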

Datatypes statistics

How many fully covered fields have more than two values?
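One way to answer this question is sketched below in Python. It assumes the same hypothetical layout as elsewhere in these notes' sketches (a dictionary of per-field value lists, with None for a missing entry), and the sample data is made up.

```python
def fields_with_more_than_two_values(table):
    """Names of fields with no missing entries and more than two distinct values."""
    selected = []
    for field, values in table.items():
        fully_covered = all(value is not None for value in values)
        if fully_covered and len(set(values)) > 2:
            selected.append(field)
    return selected


# Made-up sample data, purely for illustration.
table = {
    "a": [1.0, 2.0, 3.0, 1.0],   # fully covered, three distinct values
    "b": [0.0, 1.0, 1.0, 0.0],   # fully covered, but only two distinct values
    "c": [5.0, None, 6.0, 7.0],  # not fully covered
}
result = fields_with_more_than_two_values(table)
```

Fields with only one or two distinct values are effectively constants or binary flags, so this count gives a rough idea of how many fields carry richer signal.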

Next pass:

Should probably read the unreadable fields as text.

Posted by Kyle Simek