[Work Log] Fire: merging immunity data

March 31, 2014

Having written code for guessing visit number from dates in the Immunity dataset, I ran it and found several records with significant error. Spent a long time inspecting each one in detail, and found lots of data-entry errors (wrong year, wrong month, transposed digits, miskeyed subject IDs, etc.). In one case, the visit number was off by one in the self-report data, which only became apparent when the immunity data was introduced and an extra "halfway-between" visit was present. Was able to correct about 30 of these errors confidently; in the end, only about 10 out of 700 records still had an error greater than 10 days.
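For reference, a minimal sketch of the kind of check described above. It is not the actual code; it assumes the visit number is guessed by matching each immunity sample date to the nearest known visit date for that subject, and that "error" is the gap in days. The names `guess_visit`, `flag_suspects`, and the data layout are hypothetical; only the 10-day threshold comes from this log.

```python
from datetime import date

MAX_ERROR_DAYS = 10  # threshold mentioned in this log; the rest is assumed


def guess_visit(sample_date, visit_dates):
    """Return (visit_number, error_days) for the known visit nearest to sample_date.

    visit_dates maps visit number -> date of that visit for one subject.
    error_days is signed: positive means the sample is later than the visit.
    """
    visit, when = min(
        visit_dates.items(),
        key=lambda item: abs((sample_date - item[1]).days),
    )
    return visit, (sample_date - when).days


def flag_suspects(samples, visits_by_subject):
    """Return the samples that need manual inspection.

    samples is an iterable of (subject_id, sample_date) pairs.
    A sample is flagged when its subject ID is unknown (possibly miskeyed)
    or when its nearest visit is more than MAX_ERROR_DAYS away.
    """
    suspects = []
    for subject_id, sample_date in samples:
        visit_dates = visits_by_subject.get(subject_id)
        if not visit_dates:
            suspects.append((subject_id, sample_date, None, None))
            continue
        visit, error_days = guess_visit(sample_date, visit_dates)
        if abs(error_days) > MAX_ERROR_DAYS:
            suspects.append((subject_id, sample_date, visit, error_days))
    return suspects
```

A wrong-year or wrong-month entry shows up here as a large gap to the nearest visit, while a miskeyed subject ID either has no visits at all or lands far from that subject's schedule.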

Wrote code to merge new datasets into the full database; used it to merge newly-cleaned immunity data into the full database.
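A rough sketch of what such a merge could look like, assuming records are keyed by subject ID and visit number and stored as plain dictionaries. The function name `merge_dataset`, the field names, and the keep-existing-value-on-conflict policy are all assumptions for illustration, not the actual implementation.

```python
def merge_dataset(database, new_records, key_fields=("subject_id", "visit")):
    """Merge new_records into database in place and return any conflicts.

    database maps a key tuple (by default (subject_id, visit)) to a dict of fields.
    New fields are added to matching rows; unmatched records create new rows.
    If a field already exists with a different value, the existing value is
    kept and the disagreement is reported instead of being overwritten.
    """
    conflicts = []
    for record in new_records:
        key = tuple(record[field] for field in key_fields)
        row = database.setdefault(key, {})
        for field, value in record.items():
            if field in row and row[field] != value:
                conflicts.append((key, field, row[field], value))
            else:
                row[field] = value
    return conflicts
```

Returning the conflicts rather than silently overwriting keeps any remaining data-entry disagreements visible for manual review.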

Next step: write export-to-CSV routine

Posted by Kyle Simek