This page explains some of the methodology and reasoning behind how the other Covid pages are composed.
Most reporting on the web or on TV shows positive tests.
Unfortunately, positive tests are probably the worst metric to use,
because testing varies so widely (when it is done at all). Deaths,
while not perfect, seem to be a better indicator. Deaths, however,
are a lagging indicator: a death probably comes 14 to 21 days after
a positive test would have shown up, if testing were in fact done
and evaluated immediately. The RT-PCR tests are taking up to 8 days
to be evaluated. How are they then reported: on the date the
positive is confirmed, or on the date the test was taken?
Deaths have their own issues as well, as the New York Times has reported. Early on, some deaths were not reported as Covid when in fact they were. Deaths outside hospitals continue to produce what can be called "false negatives". Supposedly NY is now (as of April 9) requiring deaths outside hospitals to be reported as Covid if that is what was suspected. If this is true, NY should see a bump, similar to when China changed its reporting criteria for positive tests at the beginning of the pandemic.
The data has been compiled by either the New York Times (for the US)
or Johns Hopkins (US and all countries). It is stored on GitHub and
can be pulled as a "csv" file, which can then be parsed in a Linux
environment. I originally started with just the Johns Hopkins data,
but for the US I have moved to the NYT. This is because, at least
according to their logs, the NYT will update prior dates with
corrections if needed. The Johns Hopkins data for states is a new
data set each day, covering just that day, so an error found for,
say, 3 days ago won't get corrected; at best the current day will
have a more accurate cumulative total. The JH data for countries is
different: each country has its own row, and the last entry in the
row is the current cumulative figure. It may be that they do
correct errors in that.
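As a rough sketch of the parsing step, the Python below reads a small inline sample laid out like the NYT state file (columns date, state, fips, cases, deaths, as I understand that file) and converts the cumulative totals into daily counts. The sample numbers are made up, and the function names are hypothetical; this is not the script actually used to build these pages.

```python
import csv
import io

# Inline sample in the assumed NYT state-file layout.
# The numbers are invented for illustration only.
SAMPLE_CSV = """date,state,fips,cases,deaths
2020-04-01,Colorado,08,100,10
2020-04-01,New York,36,1000,100
2020-04-02,Colorado,08,130,14
2020-04-02,New York,36,1300,150
2020-04-03,Colorado,08,170,15
2020-04-03,New York,36,1700,230
"""

def cumulative_deaths(csv_text, state):
    """Return a date-sorted list of (date, cumulative deaths) for one state."""
    rows = csv.DictReader(io.StringIO(csv_text))
    series = [(r["date"], int(r["deaths"])) for r in rows if r["state"] == state]
    return sorted(series)

def daily_deaths(series):
    """Difference consecutive cumulative totals to get per-day counts.

    The first date is dropped because there is no earlier total to
    subtract from it.
    """
    return [(date, total - prev_total)
            for (_, prev_total), (date, total) in zip(series, series[1:])]

colorado = cumulative_deaths(SAMPLE_CSV, "Colorado")
print(daily_deaths(colorado))
```

Because the daily counts are derived from the cumulative column, a correction to an earlier date in the source file changes the daily values on either side of it, which is one reason a source that revises prior dates is easier to work with.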
That being said, the two datasets disagree with each other and, in
the case of Colorado, with the state's "official" website.
In addition, the Colorado website updates without a log file explaining the updates. Compare the two Colorado columns. There was a major update on April 10 for data through April 9.
The picture below shows a comparison (cells are highlighted when
there is a delta from the previous day):
There are two types of visualization used on these pages. The
original type simply plots the total number of deaths against the
date, with the y axis on a log scale. Eventually, when the number
of deaths stops increasing, the slope of the curve will go to zero
(a horizontal line). As the slope approaches zero, this type of
chart becomes less useful.
A better chart, then, is the daily death rate, which is what the second type of chart shows. For those, the death rate is per million inhabitants, with the population taken from Google. This data can be very noisy, so each day's value is an average of the current day and the five previous days. It is a straight (unweighted) average. That is what is done on the roll6 page.
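The per-million rate and the 6 day straight average can be sketched in a few lines of Python. The daily counts and the population here are made up, the function names are hypothetical, and skipping the first five days (which lack a full window) is my assumption, not necessarily how the roll6 page treats them.

```python
def per_million(daily_counts, population):
    """Scale raw daily counts to deaths per million inhabitants."""
    return [count * 1_000_000 / population for count in daily_counts]

def rolling_average(values, window=6):
    """Straight average of each day and the (window - 1) days before it.

    Days without a full window of history are skipped (an assumption).
    """
    averages = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        averages.append(sum(chunk) / window)
    return averages

daily = [0, 2, 1, 5, 3, 7, 12, 6]   # invented daily deaths
population = 2_000_000               # hypothetical population
rates = per_million(daily, population)
print(rolling_average(rates))
```

With 8 days of input and a 6 day window, only the last 3 days get a smoothed value.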
For New York State, however, I want to show the daily values against a 2 day average, a 3 day average, and so on through a 6 day average. You can see that the 6 day average does a good job of smoothing but also lags the daily.
What can be seen is twofold. First, the average is much less noisy
than the daily. Second, because it averages over 6 days, the
average lags the daily by a few days.
There are other ways to smooth the curve:
1) change the number of days averaged over
2) give the more recent days a higher weight than the earlier days
3) use a fairly sophisticated Holt-Winters moving average filter. Mark Handley uses this method.
4) others?
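Option 2 can be sketched as below. The linearly increasing weights (1 for the oldest day up to 6 for the current day) are purely an illustrative assumption, and the function name is hypothetical; nothing like this is used on these pages.

```python
def weighted_rolling_average(values, weights=(1, 2, 3, 4, 5, 6)):
    """Weighted average over a trailing window.

    weights are ordered oldest day first, current day last. With equal
    weights this reduces to the straight rolling average.
    """
    window = len(weights)
    total_weight = sum(weights)
    result = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        result.append(sum(w * v for w, v in zip(weights, chunk)) / total_weight)
    return result

ramp = [0, 1, 2, 3, 4, 5, 6, 7]   # invented, steadily rising daily values
print(weighted_rolling_average(ramp, weights=(1,) * 6))  # straight average
print(weighted_rolling_average(ramp))                    # recent days weighted more
```

On a steadily rising series the weighted version sits closer to the current day's value than the straight average does, so it lags less, at the cost of passing through more of the day-to-day noise.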
Overall, for the purposes of this visualization (not modelling),
the 6 day rolling average originally proposed by Kevin Drum of
Mother Jones does pretty well, I think.
Main (Roll6) Covid Page
Cumulative Covid Page
Eagle vs Summit Covid Page
This page maintained by Don Samuels