The fact that we have a 500-dimensional dynamical system offers opportunities to learn high-dimensional relationships. I spent some time looking into structure learning.
Variational Bayes
Zoubin Ghahramani's slides on variational Bayes
Variational EM could learn an LDS with unknown structure, or a switching state-space model.
Dirichlet Process
Video: Inferring structure from data, Tom Griffiths, Cognitive Science and Machine Learning Summer School, 2010
This might be usable for learning LDS structure. Griffiths shows a noisy-or model with unknown dimension, using a Dirichlet process prior. At its center is a rank-deficient matrix constructed from the product of NxM and MxN matrices, where M < N. Something similar could be used to learn a sparse dynamics matrix, but we're dealing with continuous-valued data rather than binary-valued. If this can be done, there must be existing literature on it; will dig further.
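As a quick sanity check on the rank-deficient construction, here is a minimal sketch (not Griffiths' model, just the linear algebra): the product of an NxM and an MxN matrix can never have rank above M. The sizes, the random integer entries, and the helper names `matmul` and `rank` are all made up for illustration.

```python
from fractions import Fraction
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank by Gaussian elimination, using exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


random.seed(0)
N, M = 6, 2  # illustrative sizes with M < N
B = [[random.randint(-3, 3) for _ in range(M)] for _ in range(N)]  # N x M
C = [[random.randint(-3, 3) for _ in range(N)] for _ in range(M)]  # M x N
A = matmul(B, C)  # N x N, but rank(A) <= M
print("rank of A:", rank(A), "(N =", N, ", M =", M, ")")
```

Note that low rank is a constraint on the whole matrix and does not by itself make individual entries zero.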
Parameter Learning
This tech report from Ghahramani and Hinton derives the EM algorithm for learning LDS parameters and latent states jointly. For the M step, they derive expressions for the full transition matrix, plus the system and observation noise covariances and the initial state. For the E step, they use a Kalman filter for forward estimation followed by a backward recursion. We, however, are more interested in a sparse representation of the dynamics.
Parameter Estimation for Linear Dynamical Systems[pdf]
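To make the E step concrete, here is a minimal sketch of the forward Kalman pass followed by a backward (Rauch-Tung-Striebel) smoothing recursion, reduced to a scalar state and a scalar observation. This is not the report's code: the model x_t = a*x_{t-1} + w_t, y_t = c*x_t + v_t, the parameter names, and the toy observations are all assumptions chosen for illustration, and the M step is left out.

```python
def kalman_filter(ys, a, c, q, r, mu0, p0):
    """Forward pass for a scalar LDS; returns filtered and predicted moments."""
    means, variances, pred_means, pred_vars = [], [], [], []
    m_pred, p_pred = mu0, p0  # prior on the first state
    for y in ys:
        pred_means.append(m_pred)
        pred_vars.append(p_pred)
        k = p_pred * c / (c * c * p_pred + r)  # Kalman gain
        m = m_pred + k * (y - c * m_pred)      # measurement update
        p = (1.0 - k * c) * p_pred
        means.append(m)
        variances.append(p)
        m_pred = a * m                         # time update
        p_pred = a * a * p + q
    return means, variances, pred_means, pred_vars


def rts_smoother(means, variances, pred_means, pred_vars, a):
    """Backward recursion that refines the filtered estimates."""
    sm, sp = list(means), list(variances)
    for t in range(len(means) - 2, -1, -1):
        j = variances[t] * a / pred_vars[t + 1]  # smoother gain
        sm[t] = means[t] + j * (sm[t + 1] - pred_means[t + 1])
        sp[t] = variances[t] + j * j * (sp[t + 1] - pred_vars[t + 1])
    return sm, sp


# Toy observations and parameters, made up for illustration only.
ys = [1.0, 1.2, 0.9, 1.1]
a, c, q, r = 0.9, 1.0, 0.1, 0.5
filt_m, filt_p, pred_m, pred_p = kalman_filter(ys, a, c, q, r, mu0=0.0, p0=1.0)
smooth_m, smooth_p = rts_smoother(filt_m, filt_p, pred_m, pred_p, a)
print(smooth_m)
print(smooth_p)
```

The smoothed means and variances are what the M step would consume as expected sufficient statistics (together with the cross-time covariances, which this sketch does not compute).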
MARSS
Multivariate Auto-Regressive State-Space Models for Analyzing Time-series Data by Elizabeth E. Holmes, Eric J. Ward, Kellie Wills
This R package handles learning and inference in general state-space models, with unspecified or semi-specified model structure. It uses EM to infer the model and latent state jointly. On the plus side, it handles missing data, and the model is very flexible. On the down side, it doesn't handle i.i.d. data in an obvious way (although it could handle a small number by repeating matrix blocks). We could probably implement some form of the TIES model, sans prior.
Other R Packages
Kobus pointed out that the short time-frame means second-order dynamics likely won't emerge.
Matlab's built-in csvread only handles CSV files with numeric-valued fields. We have dates and strings (and possibly others), so we need to write our own CSV parser.
I'm creating a .meta file for each .csv file, which describes column names and datatypes. Currently there are only three ways to parse a column: "numeric", "mm/dd/yy date", and "ignore". Possibly more in the future.
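The meta-driven parsing idea can be sketched as follows. This is a Python illustration, not the Matlab code from this project. The meta layout (one `name,type` line per column), the absence of a header row in the CSV, the treatment of empty fields as missing, and the function and column names are all assumptions made up for the example.

```python
import csv
import io
from datetime import datetime

# One parser per supported column type; "ignore" columns are skipped entirely.
PARSERS = {
    "numeric": float,
    "mm/dd/yy date": lambda s: datetime.strptime(s, "%m/%d/%y").date(),
}


def read_meta(text):
    """Parse hypothetical meta lines of the form 'column_name,type'."""
    columns = []
    for line in text.splitlines():
        if line.strip():
            name, kind = line.split(",", 1)
            columns.append((name.strip(), kind.strip()))
    return columns


def read_csv(text, columns):
    """Parse CSV text into a list of dicts, one per record."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        record = {}
        for (name, kind), field in zip(columns, row):
            if kind == "ignore":
                continue
            field = field.strip()
            # Empty fields are recorded as missing (None).
            record[name] = PARSERS[kind](field) if field else None
        records.append(record)
    return records


# Made-up inputs for illustration.
meta_text = "acres,numeric\nstart,mm/dd/yy date\ncomment,ignore\n"
csv_text = "12.5,07/04/11,windy\n,07/05/11,calm\n"
records = read_csv(csv_text, read_meta(meta_text))
print(records)
```

Keeping the parsers in a lookup table means a new column type only needs one new entry.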
Wrote code for parsing meta files and csv files (all code is in projects/fire/trunk/src):
io/fire_read_meta.m
- Reads a .meta file into a 1xN structure array.
io/fire_read_csv.m
- Reads a .csv file, using the corresponding .meta file to determine names and datatypes.
The following applies only to parsable fields.
Coverage map is below (rows are fields, columns are records)
How many fully covered fields have more than two values?
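One way to answer this kind of question is sketched below, assuming the parsed data is held as a list of per-record dicts with `None` marking a missing value. That representation, the field names, and the toy values are assumptions for illustration, not the project's actual data.

```python
def coverage_map(records, fields):
    """Rows are fields, columns are records; True where a value is present."""
    return [[rec.get(f) is not None for rec in records] for f in fields]


def fully_covered_with_variety(records, fields, min_distinct=3):
    """Fields present in every record that take at least min_distinct values."""
    selected = []
    for f, row in zip(fields, coverage_map(records, fields)):
        if all(row) and len({rec[f] for rec in records}) >= min_distinct:
            selected.append(f)
    return selected


# Toy records, made up for illustration.
records = [
    {"a": 1.0, "b": 0.0, "c": None},
    {"a": 2.0, "b": 0.0, "c": 5.0},
    {"a": 3.0, "b": 1.0, "c": 6.0},
]
fields = ["a", "b", "c"]
coverage = coverage_map(records, fields)
informative = fully_covered_with_variety(records, fields)
print(coverage)
print(informative)
```

Here "more than two values" is read as at least three distinct values, which is why `min_distinct` defaults to 3.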
Next pass:
Should probably read the unreadable fields as text.
Posted by Kyle Simek