Ran into some issues that required refactoring when adding radiation date columns to fire_all.csv. Progress was slow because the parsing code itself is very slow: it takes 2–3 minutes to read and write all the data. I sped it up slightly by skipping the missing-data check when no missing keys are specified.
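A minimal sketch of the speed-up, assuming the parser reads rows with Python's `csv.DictReader` and filters out rows with empty values in a caller-supplied list of keys. The function name `read_rows`, the `missing_keys` parameter, and the `radiation_date` column are all hypothetical stand-ins, not the project's actual parser. The point is the early return: when there are no keys to check, no row is inspected at all.

```python
import csv
import io


def read_rows(f, missing_keys=None):
    """Parse CSV rows from file-like `f` (hypothetical sketch).

    `missing_keys` names columns that must be non-empty. When it is None
    or empty, the per-row missing-data check is skipped entirely.
    """
    reader = csv.DictReader(f)
    if not missing_keys:
        # Fast path: nothing to check, so return rows without inspecting them.
        return list(reader)
    # Slow path: keep only rows where every listed key has a non-empty value.
    # `or ""` covers short rows, where DictReader fills missing fields with None.
    return [row for row in reader
            if all((row.get(k) or "").strip() for k in missing_keys)]


data = "id,radiation_date\n1,2010-01-05\n2,\n"
print(len(read_rows(io.StringIO(data))))                      # 2
print(len(read_rows(io.StringIO(data), ["radiation_date"])))  # 1
```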
Some of the bugs fixed:
Finally (!) merged the radiation columns into fire_all.csv and committed the changes. Also wrote a summary for Warren.
Description: Are the membership values output by our clustering code related to treatment type?
Method: Modify the preprocessing code to output a MATLAB struct instead of a text file, and modify the clustering code to output memberships as a text file instead of a color image. Then visualize cluster membership alongside ground-truth treatment type and compare the two.
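The p(cluster == 1 | treatment) values in the results are per-treatment proportions: among cases with a given treatment, the fraction assigned to cluster 1. A minimal sketch of that computation, assuming the memberships and treatment labels have already been loaded as two aligned lists (one entry per case). The function name and the toy data are hypothetical and are not the real results.

```python
from collections import Counter


def p_cluster_given_treatment(memberships, treatments, cluster=1):
    """Return {treatment: fraction of that treatment's cases in `cluster`},
    plus an "Overall" entry computed across all cases."""
    if len(memberships) != len(treatments):
        raise ValueError("memberships and treatments must have the same length")
    if not treatments:
        raise ValueError("need at least one case")
    totals = Counter(treatments)
    # Count, per treatment, how many cases were assigned to the target cluster.
    hits = Counter(t for m, t in zip(memberships, treatments) if m == cluster)
    result = {t: hits[t] / n for t, n in totals.items()}
    result["Overall"] = sum(hits.values()) / len(treatments)
    return result


# Toy data only, for illustration.
memberships = [1, 1, 3, 1, 3, 1]
treatments = ["None", "None", "Radiation only", "Radiation only", "Both", "Both"]
print(p_cluster_given_treatment(memberships, treatments))
```

Note that the "Overall" figure is a weighted average of the per-treatment figures, weighted by how many cases received each treatment.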
Results:
Treatment | p(cluster == 1 \| treatment) |
---|---|
Overall | 74.7% |
None | 95.5% |
Radiation only | 44.0% |
Chemotherapy only | 73.9% |
Both | 85.2% |
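So, cluster 3 strongly corresponds to "some treatment": untreated cases almost always land in cluster 1 (95.5%), while treated cases fall outside it much more often. Radiation-only cases fall outside cluster 1 most often (56.0%), so cluster 3 is most characteristic of radiation only. Whether membership in cluster 3 most likely implies radiation only also depends on how common each treatment is, which this table does not show.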
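The table gives p(cluster | treatment), whereas "cluster 3 most likely implies radiation only" is a claim about p(treatment | cluster). Bayes' rule relates the two. The sketch below assumes only two clusters, so that cluster 3 is exactly "not cluster 1"; with that assumption, p(c_3 | t) = 1 − p(cluster == 1 | t), e.g. 0.560 for radiation only and 0.045 for no treatment. The priors p(t) (the fraction of cases with each treatment) are not recorded in this entry.

$$
p(t \mid c_3) = \frac{p(c_3 \mid t)\, p(t)}{\sum_{t'} p(c_3 \mid t')\, p(t')},
\qquad p(c_3 \mid t) = 1 - p(\text{cluster} = 1 \mid t)
$$

Since the denominator is shared, radiation only is the most likely treatment given cluster 3 exactly when $0.560\, p(\text{rad}) \ge p(c_3 \mid t')\, p(t')$ for every other treatment $t'$, i.e. against $0.261\, p(\text{chemo})$, $0.148\, p(\text{both})$, and $0.045\, p(\text{none})$.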