[Work Log] FIRE - data prep, analyzing clustering

May 20, 2014
Project: FIRE
Subproject: Piecewise Linear Clustering
Working path: projects/fire/trunk/src/piecewise_linear
SVN Revision: unknown (see text)
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Appending radiation data to full data table

Ran into some issues that required refactoring when trying to add radiation date columns to fire_all.csv. Progress was slow because the parsing code takes 2-3 minutes to read and write all the data. I managed to speed it up slightly by skipping the missing-data check when no missing keys are specified.
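
The shortcut can be sketched roughly as follows. This is an illustration in Python, not the actual parsing code (which isn't shown in this log); `parse_rows` and `missing_keys` are hypothetical names, and I'm assuming a "missing key" is a string such as "NA" that marks an absent value.

```python
import csv
import io


def parse_rows(text, missing_keys=None):
    """Parse CSV text into a list of dicts, one per data row.

    missing_keys is a hypothetical collection of strings (e.g. "NA")
    that mark a missing value. When it is empty or None, the per-cell
    check is skipped entirely.
    """
    reader = csv.DictReader(io.StringIO(text))
    if not missing_keys:
        # Fast path: nothing to look for, so return the rows untouched.
        return list(reader)

    missing = set(missing_keys)
    rows = []
    for row in reader:
        # Slow path: replace any cell that matches a missing key with None.
        rows.append({k: (None if v in missing else v) for k, v in row.items()})
    return rows
```

The saving comes from never touching individual cells on the fast path, which only matters if the per-cell check is a meaningful share of the run time.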

Some of the bugs fixed:

Finally (!) merged and committed radiation columns into fire_all.csv. Committed and wrote a summary for Warren.

Analyzing cluster membership

Description: Are the membership values output by our clustering code related to treatment type?
Method: Modify the preprocessing code to output a MATLAB struct instead of a text file. Modify the clustering code to output memberships as a text file instead of a color image. Visualize both cluster membership and ground-truth treatment type, then compare.

Results:

Treatment            p(cluster == 1 | treatment)
Overall              74.7%
None                 95.5%
Radiation only       44.0%
Chemotherapy only    73.9%
Both                 85.2%
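
For reference, conditional rates like these can be estimated by counting, given one cluster label and one treatment label per record. The sketch below is in Python for illustration and is not the code used to produce the numbers above; `cluster_rate_by_treatment` and its arguments are hypothetical names.

```python
from collections import Counter


def cluster_rate_by_treatment(clusters, treatments, target=1):
    """Estimate p(cluster == target | treatment) for each treatment label.

    clusters and treatments are parallel sequences, one entry per record.
    Returns a dict mapping each treatment label (plus "Overall") to the
    fraction of its records assigned to the target cluster.
    """
    if len(clusters) != len(treatments):
        raise ValueError("clusters and treatments must have the same length")

    # Number of records per treatment, and how many of those hit the target.
    totals = Counter(treatments)
    hits = Counter(t for c, t in zip(clusters, treatments) if c == target)

    rates = {t: hits[t] / n for t, n in totals.items()}
    if len(clusters) > 0:
        # Marginal rate over all records, ignoring treatment.
        rates["Overall"] = sum(hits.values()) / len(clusters)
    return rates
```

Called with the membership labels and the ground-truth treatment labels, this returns one fraction per treatment; multiplying by 100 gives percentages in the same form as the table.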

So, cluster 3 strongly corresponds to "some treatment" and most likely implies radiation only.

Next steps

Posted by Kyle Simek