[Work Log] FIRE - cluster w/ missing data

May 13, 2014

Project	FIRE
Subproject	Piecewise Linear Clustering (tests)
Working path	projects/fire/trunk/src/piecewise_linear/test
SVN Revision	unknown (see text)

Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Testing missing data in cluster model

Run #1 - enable missing data

Segfault resulting from an empty cluster. Writing a routine to create a new cluster from the worst point.
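
A minimal Python sketch of the re-seeding idea, not the actual routine from this project. It assumes "worst point" means the point with the largest fit residual, and the names `reseed_empty_clusters`, `assignments`, and `residuals` are hypothetical.

```python
def reseed_empty_clusters(assignments, residuals, n_clusters):
    """Return a copy of `assignments` in which every cluster owns a point.

    assignments: cluster index for each point.
    residuals:   fit error for each point (larger means worse fit).
    Assumption: an empty cluster is re-seeded with the worst-fitting point
    taken from a cluster that has more than one member.
    """
    assignments = list(assignments)
    counts = [0] * n_clusters
    for c in assignments:
        counts[c] += 1

    for k in range(n_clusters):
        if counts[k] > 0:
            continue
        # Only take points from clusters that can spare a member,
        # so that fixing one empty cluster never creates another.
        candidates = [i for i, c in enumerate(assignments) if counts[c] > 1]
        if not candidates:
            raise ValueError("not enough points to fill every cluster")
        worst = max(candidates, key=lambda i: residuals[i])
        counts[assignments[worst]] -= 1
        assignments[worst] = k
        counts[k] = 1
    return assignments
```

Restricting the candidates to clusters with more than one member is a design choice of this sketch; without it, stealing a point could empty the donor cluster and bring the segfault back.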

...

Still getting weird results. Clusters are collapsing constantly.

Even a single missing value screws up results. There must be a bug in my initial-estimate script.

...

BUG: true/false condition swapped when deciding whether to use missing-data-enabled line fitting.
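
To make the kind of switch involved concrete, here is a small Python sketch. It is an illustration only: the log does not say how the missing-data-enabled fit works, so this version simply drops missing (NaN) values before an ordinary least-squares fit. All function names are hypothetical.

```python
import math


def fit_line(t, y):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(t)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all t values are identical")
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def fit_line_missing(t, y):
    """Assumed missing-data variant: ignore points whose y value is NaN."""
    observed = [(ti, yi) for ti, yi in zip(t, y) if not math.isnan(yi)]
    return fit_line([p[0] for p in observed], [p[1] for p in observed])


def fit(t, y, missing_enabled):
    # The flag must select the NaN-aware fit when it is True;
    # swapping the two branches lets NaNs poison the plain fit.
    if missing_enabled:
        return fit_line_missing(t, y)
    return fit_line(t, y)
```

With the branches swapped, data containing even one NaN goes through the plain fit and comes back with NaN coefficients, which matches the symptom of a single missing value ruining the results.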

...

Several bugs related to computing epsilon. Fixed after several hours :-/

...

It seems we can continue to increase the missing percentage indefinitely, without the clustering suffering (or at least until an entire observation becomes missing, which isn't handled).

Likely the small amount of noise is helping us a lot here. We'll see how it works on real data.
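
One way to set up this kind of test is sketched below in Python. It is not the script used for these runs. It assumes each row of `data` is one observation, and it never masks the last remaining value in a row, since a fully missing observation isn't handled. The name `mask_missing` and its parameters are hypothetical.

```python
import math
import random


def mask_missing(data, fraction, seed=0):
    """Return a copy of `data` with roughly `fraction` of its cells set to NaN.

    data: list of rows, each row a list of floats (one row per observation).
    At least one value is always left observed in every row.
    """
    rng = random.Random(seed)
    masked = [list(row) for row in data]
    cells = [(i, j) for i, row in enumerate(masked) for j in range(len(row))]
    rng.shuffle(cells)

    target = int(round(fraction * len(cells)))
    remaining = [len(row) for row in masked]  # observed values left per row
    n_masked = 0
    for i, j in cells:
        if n_masked >= target:
            break
        if remaining[i] <= 1:
            continue  # keep at least one observed value in this row
        masked[i][j] = math.nan
        remaining[i] -= 1
        n_masked += 1
    return masked
```

Calling this with increasing values of `fraction` on the same synthetic data, with a fixed `seed`, gives a repeatable sweep over the missing percentage.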

Real FIRE data

High-level Tasks

merge radiation data from Laura (into demograph dataset?)
for each subject, find the dates of first chemo, last chemo, first rad, and last rad
write results in FIRE data format

Reading and merging radiation data:

construct out_db cols: subject_ID, had_radiation
if row has start and end date,
- if out_db already has start or end date, record error
- else record start and end date
assert all "radiation=yes" have start and end date
- can already see 57533 fails this test
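
The steps above can be sketched in Python as follows. The actual implementation is the MATLAB script named below; this is only an illustration. The row layout (dicts with `subject_id`, `radiation`, `start`, `end`) is an assumption, and the final check collects failures in a list instead of asserting, so that every offending subject is reported.

```python
def merge_radiation(rows):
    """Merge per-row radiation records into one record per subject.

    rows: iterable of dicts with hypothetical keys
          'subject_id', 'radiation' ('yes'/'no'), 'start', 'end'
          (start/end are None when no date was recorded).
    Returns (out_db, errors).
    """
    out_db = {}
    errors = []
    for row in rows:
        sid = row["subject_id"]
        rec = out_db.setdefault(
            sid, {"had_radiation": False, "start": None, "end": None}
        )
        if row.get("radiation") == "yes":
            rec["had_radiation"] = True

        start, end = row.get("start"), row.get("end")
        if start is not None and end is not None:
            if rec["start"] is not None or rec["end"] is not None:
                # out_db already has dates for this subject: record an error
                errors.append((sid, "duplicate start/end dates"))
            else:
                rec["start"], rec["end"] = start, end

    # Every subject marked radiation=yes should have both dates.
    for sid, rec in out_db.items():
        if rec["had_radiation"] and (rec["start"] is None or rec["end"] is None):
            errors.append((sid, "radiation=yes but missing start or end date"))
    return out_db, errors
```

The same function shape would apply to the chemo dates, with the column names changed.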

Do same for chemo dates.

Merge chemo and rad.

compute "type"

...

All implemented in in_progress/process_treatment_dates.m.

Data consistency issues

some subjects' surgeries occur after treatment
- 57527, 57563, 575139, 575145
only 97 of 136 subjects have immune data
for some subjects, the data sources disagree about treatment type
- 57517 - no histo dates, but hist = 70
- 575146, 575156 - demo:rad = yes, but no dates in rad_supl

Posted by Kyle Simek

KLS Research Blog ← Nothing to see here...