Description: Refactored k-means to make replicates easier to run. Also fixed a bug in how collapsed clusters are handled.
Results:
BEFORE REPOPULATION FIX
10 itns
----------
training error: 3.61348
test error: 4.60147
15 itns
--------
training error: 3.60094
test error: 4.58815
20 itns
----------
training error: 3.60094
test error: 4.58815
AFTER REPOPULATION FIX
10 itns
---------
training error: 3.5833
test error: 4.57327
15 itns
---------
training error: 3.58389
test error: 4.57315
20 itns
---------
training error: 3.58411
test error: 4.57321
Discussion:
Neither of these results matches what I was getting on Friday. Testing error in particular is worse. Did I break something?
Found it: error in evaluation code arising from bad copy/paste.
Another issue: should be using centered_data.txt, not data.txt.
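The log does not record how the repopulation fix works, so the following is only a minimal sketch of one common approach: when a cluster collapses (ends up with no members), re-seed it at the point that is currently worst fit by the surviving centers. The function names (`assign`, `update_centers`, `kmeans`) and the farthest-point re-seeding rule are assumptions for illustration, not the actual code behind these results.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Return the index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def update_centers(points, labels, centers):
    """Recompute each center as the mean of its members.

    Collapsed (empty) clusters are repopulated instead of being left stale.
    Assumed rule: re-seed at the points farthest from any surviving center.
    """
    dim = len(points[0])
    new_centers, empty = [], []
    for k in range(len(centers)):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)])
        else:
            new_centers.append(None)
            empty.append(k)
    if empty:
        filled = [c for c in new_centers if c is not None]
        # Rank points from worst fit to best fit under the surviving centers.
        ranked = sorted(points,
                        key=lambda p: min(sq_dist(p, c) for c in filled),
                        reverse=True)
        for k, p in zip(empty, ranked):
            new_centers[k] = list(p)
    return new_centers

def kmeans(points, k, iterations, rng):
    """Plain Lloyd iterations with the repopulation step above."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centers)
        centers = update_centers(points, labels, centers)
    return centers, assign(points, centers)
```

Without a step like this, a collapsed cluster keeps its old center and effectively reduces the number of clusters in use, which could account for part of the gap between the BEFORE and AFTER numbers.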
Description: Run kmeans with 10 repetitions.
Issues:
Results:
trivial model error: 3.55057
10 itns
-------
training error: 3.54682
test error: 4.51326
20 itns
------------
training error: 3.53998
test error: 4.52958
Training and testing error improve over the single-initialization version. Test error is slightly worse for the 20-iteration run, possibly due to overfitting.
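For reference, a rough sketch of running several repetitions and keeping one. The log does not say how the winning repetition is chosen; this sketch assumes it is the one with the lowest training error. `best_of_replicates`, `toy_fit`, and `toy_error` are hypothetical names, and the toy data is made up purely to exercise the function.

```python
import random

def best_of_replicates(fit, training_error, n_replicates, seed=0):
    """Run `fit` n_replicates times with different random initializations.

    Keeps the model with the lowest training error (assumed selection rule).
    `fit` takes a random.Random instance and returns a model.
    """
    master = random.Random(seed)
    best_model, best_err = None, float("inf")
    for _ in range(n_replicates):
        # Give every replicate its own reproducible random stream.
        model = fit(random.Random(master.getrandbits(32)))
        err = training_error(model)
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err

# Toy stand-in for a clustering fit: pick two data points as centers.
data = [0.0, 1.0, 2.0, 9.0, 10.0, 11.0]

def toy_fit(rng):
    return sorted(rng.sample(data, 2))

def toy_error(centers):
    # Mean squared distance from each point to its nearest center.
    return sum(min((x - c) ** 2 for c in centers) for x in data) / len(data)

single_model, single_err = best_of_replicates(toy_fit, toy_error, 1)
multi_model, multi_err = best_of_replicates(toy_fit, toy_error, 10)
```

Selecting on training error can only lower (or tie) the training error relative to a single initialization; it says nothing about test error, which is consistent with the 20-iteration test error moving the other way.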
Description: Perform 10 replicates of k-means using centered log-transformed data.
Trivial model error: 1.40502
10 itns
-----------
training error: 1.10196
test error: 1.11679
15 itns
----------
training error: 1.09979
test error: 1.09903
20 itns
--------------
training error: 1.09756
test error: 1.09137
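A minimal sketch of the preprocessing this entry refers to, under several assumptions the log does not confirm: the natural log is used, the log is applied before centering, all values are strictly positive, and centering subtracts the per-column mean. `log_center` is a hypothetical helper name.

```python
import math

def log_center(rows):
    """Log-transform every value, then subtract each column's mean.

    Assumes strictly positive inputs and natural log; returns the
    transformed rows together with the column means that were removed.
    """
    logged = [[math.log(v) for v in row] for row in rows]
    n = len(logged)
    means = [sum(col) / n for col in zip(*logged)]
    centered = [[v - m for v, m in zip(row, means)] for row in logged]
    return centered, means
```

The errors in this entry are on the log scale, so they are not directly comparable with the raw-data errors in the earlier entries.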
Description: Do we do better or worse with a constant model? (slope zero, intercept zero)
Results: see previous runs; "trivial model" results have been added.
Discussion:
It is interesting that, on raw data, the trivial model performs better than the cluster model. With rescaled data, the cluster model performs better.
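A sketch of how a trivial-model baseline like this could be computed. The error metric is not stated anywhere in the log; root-mean-square error is assumed here, and `rms_error` and `trivial_model` are hypothetical names.

```python
import math

def rms_error(predict, xs, ys):
    """Root-mean-square error of `predict` over paired inputs and targets."""
    total = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(ys))

def trivial_model(x):
    """Slope zero, intercept zero: always predicts 0."""
    return 0.0
```

Under this assumed metric, the trivial model's error is just the root mean square of the targets, so on centered targets it equals their standard deviation. Any fitted model that does worse than this on test data is not generalizing.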
Description: Re-run using the continuous model.
Details:
Will need to re-run all experiments.
Description: Re-run baseline fitting of centered data using the discontinuous model, with 10 repetitions.
Results:
Trivial model error: 1.58913
Single cluster error: 1.58365
training error: 1.61625
test error: 1.64005
Discussion: The trivial model outperforms the clustered model. This shouldn't be happening; need to investigate.
Posted by Kyle Simek