I’ve been trying to puzzle through what form of covid case and injection data I’d need in order to properly get at the questions I try to understand here. The problem, of course, is that the data available to us from US state health departments is always digested and aggregated in ways that are hard to make sense of.
My thought was to construct an anonymized way to present patients’ injection and positive-case histories, so that one can try to disentangle the biases buried in the publicly presented numbers. With that in hand, step 1 would be to politely send a request for data in that format to each of the 50 US state health departments[1], in the hope that at least one might come through with something. Failing that, step 2 might be to go the FOIA route, though I have no idea how to tackle that yet. First we just try asking.
Step zero, though, is to settle on the form of data to ask for, and I think I have a good start on that. The idea is to ask for an anonymized event history per person:
Format: either CSV or JSON (though we could probably make do with anything, employing varying quantities of expletives)
Patient ID (non-identifying but unique to the individual)
Birth Year
Sex
Sequence of events, one entry per occurrence, for this Patient ID:
COVID Positive: Week of Year, Year, Test Type
Hospitalization: Week of Year, Year, Duration of Stay, Diagnosis Code
Fatality: Week of Year, Year, Cause/Diagnosis Code
COVID Vaccination Dose: Week of Year, Year, Type, VAERS ID if exists/known
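To make the request concrete, here’s a rough Python sketch of one patient’s record in this format, written out both as nested JSON (one object per patient) and as a flat CSV (one row per event). The field names (`kind`, `week`, `year`, `test_type`, `duration_days`, `diagnosis_code`, `vaccine_type`, `vaers_id`) are my own hypothetical choices, not anything a state actually uses, and the example patient is made up.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical field names. kind is one of "covid_positive",
    # "hospitalization", "fatality", "vaccination".
    kind: str
    week: int                             # week of year
    year: int
    test_type: Optional[str] = None       # COVID Positive
    duration_days: Optional[int] = None   # Hospitalization: duration of stay
    diagnosis_code: Optional[str] = None  # Hospitalization diagnosis / fatality cause
    vaccine_type: Optional[str] = None    # COVID Vaccination Dose
    vaers_id: Optional[str] = None        # Vaccination dose, if exists/known


@dataclass
class PatientHistory:
    patient_id: str  # anonymized, but unique per person
    birth_year: int
    sex: str
    events: List[Event] = field(default_factory=list)


CSV_COLUMNS = ["patient_id", "birth_year", "sex", "kind", "week", "year",
               "test_type", "duration_days", "diagnosis_code",
               "vaccine_type", "vaers_id"]


def to_json(patients: List[PatientHistory]) -> str:
    # Nested layout: one object per patient holding its own event list.
    return json.dumps([asdict(p) for p in patients], indent=2)


def to_csv(patients: List[PatientHistory]) -> str:
    # Flat "long" layout: one row per event, patient fields repeated.
    # A patient with no events still gets one row with blank event columns.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for p in patients:
        for e in p.events or [None]:
            row = {"patient_id": p.patient_id, "birth_year": p.birth_year, "sex": p.sex}
            if e is not None:
                row.update(asdict(e))
            writer.writerow(row)  # None values are written as empty fields
    return buf.getvalue()


# Illustrative, made-up record (not real data).
example = PatientHistory("a1b2c3", 1958, "F", [
    Event("vaccination", week=3, year=2021, vaccine_type="Moderna"),
    Event("covid_positive", week=30, year=2021, test_type="PCR"),
    Event("hospitalization", week=31, year=2021, duration_days=6, diagnosis_code="U07.1"),
])
print(to_csv([example]))
```

The flat CSV repeats the patient fields on every row, which is wasteful but easy to load into a spreadsheet; the JSON keeps each person’s history together. Either would do.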
I’d request that the data be published along with the other data they make public, on the department of health website or wherever. I’d also hope it would be updated and maintained, but getting anything at all would be a bonus.
Some notes on why I’m asking for things this way:
Certainly this would need to be anonymized, so I’m asking them to invent a unique identifier for each individual so that they don’t have to worry about privacy.
Anonymity is why I am limiting dates to the week of the year, though I might be solving a problem here before actually having it. The actual date would be more useful, perhaps to help separate “from covid” from “with covid” cases; for things like hospital stays it may come down to the day. I have, though, seen anonymized data where the dates were blurred to prevent using other records to identify the individual.
VAERS ID would be nice, but the state may not have that information on its own. And I’m sure it’ll send up a warning flare. This might be better as a follow-up request if I get anything.
I was also toying with a “known not to be vaccinated”[2] flag, since I wouldn’t otherwise have any way to distinguish unknown status from unvaccinated.
I’d like to have events from Jan 1, 2020 to the present.
Please add a comment if you see something missing, or anything here that could make an answer less likely or excessively delayed.
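On the anonymized Patient ID: one way a department could do this, sketched in Python under my own assumptions, is a keyed hash (HMAC-SHA256) of whatever internal record number they already use, with the key kept in-house. The same person maps to the same ID in every release, so updates could be joined, but outsiders can’t recompute the mapping. A plain unkeyed hash would be weaker, since anyone could hash guessed identifiers and compare. `pseudonymous_id`, the record-number format, and the key below are all hypothetical.

```python
import hashlib
import hmac


def pseudonymous_id(internal_record_id: str, secret_key: bytes) -> str:
    # HMAC-SHA256 of the department's own internal identifier (hypothetical).
    # Same person + same key -> same ID in every release; without the key,
    # nobody outside can map IDs back or check guesses.
    mac = hmac.new(secret_key, internal_record_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


# Hypothetical key; in practice it would never leave the department.
KEY = b"hypothetical-secret-held-only-by-the-department"
print(pseudonymous_id("MRN-000123", KEY))
```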
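On limiting dates to week of year: the “Week of Year” and “Year” pair is only unambiguous if both come from the same calendar convention (CDC’s MMWR weeks, for example, start on Sunday rather than Monday). Assuming ISO 8601 weeks, which is my assumption and not necessarily what a given state uses, Python’s standard library handles the conversion. The catch is that around New Year the ISO year can differ from the calendar year, so the request should ask for the ISO year alongside the ISO week. `blur_to_week` and `week_start` are hypothetical helper names.

```python
from datetime import date
from typing import Tuple


def blur_to_week(d: date) -> Tuple[int, int]:
    # ISO 8601: weeks start on Monday, and week 1 is the week holding the
    # year's first Thursday. Report the ISO year with the ISO week; around
    # New Year it can differ from d.year.
    iso_year, iso_week, _weekday = d.isocalendar()
    return iso_year, iso_week


def week_start(iso_year: int, iso_week: int) -> date:
    # Monday of the given ISO week: the earliest date the event could have had.
    return date.fromisocalendar(iso_year, iso_week, 1)


print(blur_to_week(date(2021, 1, 1)))    # (2020, 53)
print(blur_to_week(date(2024, 12, 30)))  # (2025, 1)
print(week_start(2020, 53))              # 2020-12-28
```

Since only the week survives, an interval between two blurred events is known only to within about six days either way, which is exactly the resolution lost for the “from covid” vs “with covid” question.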
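On the “known not to be vaccinated” flag: without it, the absence of a dose record is ambiguous, since it could mean unvaccinated or a dose that never made it into the state’s records. Here’s a minimal Python sketch of the three-way status I’d want, assuming the flag means “no doses as of the data extract” (my assumption about how a state would define it). `VaxStatus` and `vax_status` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class VaxStatus(Enum):
    VACCINATED = "vaccinated"
    KNOWN_UNVACCINATED = "known_unvaccinated"
    UNKNOWN = "unknown"


def vax_status(event_kinds: Iterable[str], known_unvaccinated: bool) -> VaxStatus:
    # event_kinds: kinds of events recorded for one Patient ID.
    # known_unvaccinated: the hypothetical extra flag from the request.
    has_dose = any(k == "vaccination" for k in event_kinds)
    if has_dose and known_unvaccinated:
        # A recorded dose contradicts the flag; surface it rather than guess.
        raise ValueError("dose recorded for a patient flagged as known unvaccinated")
    if has_dose:
        return VaxStatus.VACCINATED
    if known_unvaccinated:
        return VaxStatus.KNOWN_UNVACCINATED
    # No dose on record and no flag: don't count these as unvaccinated.
    return VaxStatus.UNKNOWN


print(vax_status(["covid_positive"], known_unvaccinated=False))  # VaxStatus.UNKNOWN
```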
From my side, if I had data structured like this, I believe I could do a more proper analysis of the relationship between shots and cases. There would be time dependence, and the ability to evaluate natural immunity and re-infection. All sorts of things to play with.
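As an example of the time-dependent analysis, here’s a rough Python sketch that walks one person’s timeline and, for each infection episode, records how many doses preceded it and how many weeks had passed since the last one. It assumes events arrive as (kind, ISO year, ISO week) tuples, and treats positives closer together than `min_gap_weeks` (a hypothetical threshold; 13 weeks is roughly 90 days) as the same episode, so later episodes count as re-infections. Hospitalizations and fatalities are ignored here, and all names are my own.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical event layout: (kind, iso_year, iso_week).
Event = Tuple[str, int, int]

# Monday of ISO week 1 of 2020, i.e. the week containing Jan 1, 2020.
ORIGIN = date(2019, 12, 30)


def week_index(iso_year: int, iso_week: int) -> int:
    # Whole weeks since the start of the requested window.
    return (date.fromisocalendar(iso_year, iso_week, 1) - ORIGIN).days // 7


def infection_episodes(events: List[Event],
                       min_gap_weeks: int = 13) -> List[Dict[str, Optional[int]]]:
    # Within a week the true order is unknown; this sorts doses after other
    # events of the same week, so a same-week dose isn't counted as "prior".
    timeline = sorted(events, key=lambda e: (week_index(e[1], e[2]), e[0] == "vaccination"))
    episodes: List[Dict[str, Optional[int]]] = []
    doses = 0
    last_dose: Optional[int] = None
    last_positive: Optional[int] = None
    for kind, year, week in timeline:
        wk = week_index(year, week)
        if kind == "vaccination":
            doses += 1
            last_dose = wk
        elif kind == "covid_positive":
            # Positives closer than min_gap_weeks to the previous positive
            # are folded into the same episode.
            if last_positive is None or wk - last_positive >= min_gap_weeks:
                episodes.append({
                    "week_index": wk,
                    "episode": len(episodes) + 1,
                    "doses_before": doses,
                    "weeks_since_last_dose": None if last_dose is None else wk - last_dose,
                })
            last_positive = wk
    return episodes


# Made-up history for illustration only.
history = [
    ("vaccination", 2021, 10),
    ("vaccination", 2021, 14),
    ("covid_positive", 2021, 30),
    ("covid_positive", 2021, 31),  # same episode as week 30
    ("covid_positive", 2022, 2),   # 23 weeks later: a re-infection
]
for ep in infection_episodes(history):
    print(ep)
```

Because dates are blurred to the week, a dose and a positive in the same week can’t be ordered; not counting a same-week dose as prior is a judgment call that would need stating in any real analysis.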
I don’t particularly expect much willingness to divulge this, but in principle I think the data should be in hand for any state health department.
Thoughts?
[1] I’m actually very much motivated here by Joel Smalley’s open letters to Governors (FL, TX, TN, IN: YES, keep going!!!)
[2] Yes, I’m using the “v” word here. I am using it in the equally meaningless sense it has now been redefined to carry.