[Work Log] FIRE - improving fitting

May 26, 2014
Project FIRE
Subproject Piecewise Linear Clustering
Working path projects/fire/trunk/src/piecewise_linear
SVN Revision unknown (see text)
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Run: Refactoring kmeans, fixed bug

Description: Refactored k-means to make replicates easier to run. Also fixed a bug in how collapsed clusters are handled. Results:


    10 itns
    training error: 3.61348
    test error: 4.60147

    15 itns
    training error: 3.60094
    test error: 4.58815

    20 itns
    training error: 3.60094
    test error: 4.58815


    10 itns
    training error: 3.5833
    test error: 4.57327

    15 itns
    training error: 3.58389
    test error: 4.57315

    20 itns
    training error: 3.58411
    test error: 4.57321


Neither of these results matches what I was getting on Friday; test error in particular is worse. Did I break something?

Found it: error in evaluation code arising from bad copy/paste.

Another issue: should be using centered_data.txt, not data.txt.

Run: multiple repetitions

Description: Run kmeans with 10 repetitions.


trivial model error: 3.55057

10 itns
training error: 3.54682
test error: 4.51326

20 itns
training error: 3.53998
test error: 4.52958

Training and testing error improve over the single-initialization version. Test error is slightly worse for the 20-iteration run, possibly due to overfitting.
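For reference, a minimal sketch of the replicate idea: run k-means from several random initializations and keep the run with the lowest within-cluster squared error. This is plain k-means on scalar points, not the piecewise linear variant used in this project. The function names and the policy of reseeding a collapsed (empty) cluster at a random data point are assumptions for illustration, not necessarily what the actual fix does.

```python
import random

def kmeans_once(points, k, iterations, rng):
    """One k-means run on scalar points; returns (centers, sse)."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centers[j] = sum(members) / len(members)
            else:
                # Collapsed cluster: reseed at a random data point
                # (assumed policy, for illustration only).
                centers[j] = rng.choice(points)
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse

def kmeans_replicates(points, k, iterations=10, replicates=10, seed=0):
    """Run several random restarts and keep the one with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, iterations, rng) for _ in range(replicates)]
    return min(runs, key=lambda run: run[1])
```

Selecting on training SSE means more replicates can only lower the reported training error; it says nothing about test error, which is consistent with the 20-iteration result above.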

Run: New baseline - use centered data

Description: Perform 10 replicates of k-means using centered log-transformed data.

Trivial model error: 1.40502

10 itns
training error: 1.10196
test error: 1.11679

15 itns
training error: 1.09979
test error: 1.09903

20 itns
training error: 1.09756
test error: 1.09137
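A rough sketch of the preprocessing assumed for this run: take logs first, then subtract the mean. The function name is hypothetical, the values are assumed to be strictly positive, and the order of operations (log, then center) is inferred from the phrase "centered log-transformed"; how centered_data.txt was actually produced is not recorded here.

```python
import math

def center_log(values):
    """Log-transform strictly positive values, then subtract their mean."""
    logged = [math.log(v) for v in values]
    mean = sum(logged) / len(logged)
    return [v - mean for v in logged]
```

After this transform the values have mean zero, so a zero-intercept model is no longer penalized for an offset in the data.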

Run: compare against null model

Description: Do we do better or worse with a constant model (slope zero, intercept zero)? Results: see previous runs; "trivial model" results have been added.

It is interesting that the trivial model performs better on raw data than the cluster model. With rescaled data, the cluster model performs better.
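A minimal sketch of the trivial-model baseline, assuming the reported error is a root-mean-square error (the log does not say which metric is used). The function names are hypothetical.

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def trivial_model_error(targets):
    """Error of the model with slope zero and intercept zero (always predicts 0)."""
    return rmse([0.0] * len(targets), targets)
```

Under this assumption, the trivial error on centered targets is just their standard deviation, while on uncentered targets it also includes the offset of the mean from zero.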

Run: continuous model (aborted)

Description: Re-run using the continuous model. Details:

Will need to re-run all experiments.

Run: baseline (rerun)

Description: Re-run baseline fitting of centered data using discontinuous model, with 10 repetitions.


Trivial model error: 1.58913
Single cluster error: 1.58365
training error: 1.61625
test error: 1.64005

Discussion: the trivial model outperforms the clustered model. This shouldn't be happening; need to investigate.
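One sanity check that may help narrow this down, sketched under assumptions: scalar x, each cluster's line fit by ordinary least squares with a free intercept, and error measured on the same training points. Under those assumptions the summed squared error of the per-cluster fits cannot exceed that of a single line, for any assignment of points to clusters, so a training error above the single-cluster error points at the fitting or evaluation code rather than at the clustering. All names below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares (slope, intercept) for scalar x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # If all x are identical, the best line is the constant mean of y.
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x

def sse(xs, ys, slope, intercept):
    """Sum of squared residuals of the line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def single_vs_clustered_sse(xs, ys, labels):
    """Return (single_sse, clustered_sse) on the same training points."""
    single = sse(xs, ys, *fit_line(xs, ys))
    clustered = 0.0
    for label in set(labels):
        cx = [x for x, l in zip(xs, labels) if l == label]
        cy = [y for y, l in zip(ys, labels) if l == label]
        clustered += sse(cx, cy, *fit_line(cx, cy))
    return single, clustered
```

If clustered_sse comes out larger than single_sse on the training set, the per-cluster fits are not the least-squares fits for their assigned points, or the evaluation is pairing points with the wrong cluster's line.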

Posted by Kyle Simek