People love to read too much into dumb headlines. So many people shared the “US surpasses Italy in confirmed coronavirus cases” news today. Guess what? It’s fucking meaningless! We could’ve looked at the growth rate some days ago and known this was going to happen. Additionally, once again, confirmed cases aren’t a great metric (more below). If I’m going to spend brain cycles on coronavirus, I want to think critically, go beneath the surface, and explore data myself… OR make memes. Not share these normie-ass articles.
The second-order effects of these braindead articles are odd. I’ve even seen them lead some of my Facebook friends (who are a lost cause by now – they still think this is nothing more than the flu) to proclaim that “WELL ACTUALLY Italy still has more cases per capita.” Wow, they must be so smart to know how to divide a number by the population of a country. Many of these people like to talk about how the media is incentivized to spin stories to attract clicks / views, and how because of this we can’t trust the media. In some ways, I share their disdain…
I started poking around at some of the different sources of COVID-related data out there. covidtracking.com is really doing a lot of the thankless work around cleaning data from many different sources – very much appreciated. I have a public notebook with some code here: https://colab.research.google.com/drive/1qr6G1dxj9I1OlAdxqlZECFQdqfV9RDps
I’m trying to find the right questions to ask – they should be answerable from available data, help people get a clearer view of what is actually going on, and not already discussed ad nauseam. I’m hesitant to do any forecasting because of my lack of domain knowledge, and I certainly don’t want to add more noise to an already incredibly exhausting topic.
Testing vs Confirmed cases
There was a good thread by a buddy on the high correlation between number of tests and confirmed cases.
FYI: the growth in positive test results per @COVID19Tracking data maps almost 1:1 with the growth in tests performed : R^2 of 0.97. In other words, reporting new and daily case counts is really just reporting on the ramp up in testing capability. @DanRosenheck @Noahpinion pic.twitter.com/T0aZBmVZLm
— Kostya Medvedovsky (@kmedved) March 27, 2020
I learned from one of the replies that finding R^2 between two time series can lead to spurious correlations though. Curious to see the follow up on this. Or I guess I could do the work myself here…
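To see why that reply matters, here is a minimal sketch using made-up data, not the covidtracking.com numbers. It builds two independent random walks that both drift upward, standing in for any two series that grow over time. The drift and noise values and the helper names (`r_squared`, `random_walk`, `diffs`) are assumptions for illustration only. The R^2 on the raw levels comes out high simply because both series trend upward, while the R^2 on the day-over-day changes is much closer to zero.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def random_walk(rng, steps, drift=1.0, noise=0.5):
    """Cumulative sum of (drift + Gaussian noise): a toy upward-trending series."""
    level, walk = 0.0, []
    for _ in range(steps):
        level += drift + rng.gauss(0.0, noise)
        walk.append(level)
    return walk


def diffs(series):
    """Step-to-step changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the toy example is repeatable
series_a = random_walk(rng, 200)
series_b = random_walk(rng, 200)  # generated independently of series_a

r2_levels = r_squared(series_a, series_b)
r2_changes = r_squared(diffs(series_a), diffs(series_b))

print(f"R^2 on levels:  {r2_levels:.3f}")
print(f"R^2 on changes: {r2_changes:.3f}")
```

This doesn’t say the tests-vs-positives relationship is spurious (more testing really should find more cases). It only shows that a high R^2 between two trending series is weak evidence on its own, and that comparing changes rather than levels is one common sanity check.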
so reading from the source, it's a bit nuanced.
5564 tests on "likely" people (prob similar criteria to US). 685 positive. (12%)
6163 tests on the general population. 52 positive. (0.8%) https://t.co/Koeyjk5loP
— canzhi is at home right now (@canzhiye) March 27, 2020
To my knowledge, Iceland is the only country thus far to do random testing of the general population. They lead the world in tests per 1,000 people at roughly 30. For comparison, the two highest US states are NY and WA, at 7 and 6 respectively. We have data on Iceland’s randomized tests for 3/20, 3/23, and 3/25, so maybe we can do something with this data. A simple back-of-the-envelope calculation suggests there would be roughly 2.5M people in the US with the virus if we extrapolate using Iceland’s rate. There are so many other variables that could render the previous sentence completely incorrect that I almost didn’t want to include it here. Will need to think more about how to best use this data.
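For anyone who wants to redo the back-of-the-envelope math, here is a rough sketch. The test counts are the ones quoted in the tweet; the US population is a round number I’m assuming here, and the variable names are just for illustration. Depending on how you round the rate and which population figure you pick, the naive extrapolation lands somewhere in the 2.5M to 3M range, and all the caveats about extrapolating from Iceland to the US still apply.

```python
# Counts quoted in the tweet (Iceland, late March 2020).
targeted_tests, targeted_positive = 5564, 685   # people considered "likely" cases
general_tests, general_positive = 6163, 52      # general-population screening

# Assumption: a round figure for the US population, not taken from the source.
US_POPULATION = 330_000_000

targeted_rate = targeted_positive / targeted_tests
general_rate = general_positive / general_tests

# Naive extrapolation: pretend the US has the same prevalence as
# Iceland's general-population sample. This ignores timing, demographics,
# sampling bias, and test accuracy.
naive_us_infections = general_rate * US_POPULATION

print(f"targeted positivity:           {targeted_rate:.1%}")
print(f"general-population positivity: {general_rate:.1%}")
print(f"naive US extrapolation:        {naive_us_infections / 1e6:.1f}M")
```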
More on data
Speaking of data, the NYT released their county-level coronavirus data. I felt kind of misled by their article though.
this is excellent! can you also release this data? pic.twitter.com/SKwAJo8XCT
— canzhi is at home right now (@canzhiye) March 27, 2020
Not that county-level data isn’t useful, but it seems like getting information about those who actually got the virus would be very useful, or at the very least, interesting to analyze. Perhaps there is county-level demographic and health data that is easily accessible that I could merge with the NYT data… If I remember to go look for it and find it, I’ll add it to the notebook.
In looking at the testing data from covidtracking.com more closely (with Kostya), I found that the quality was not ideal. There were quite a few days where NY just didn’t report any negative test results… Such is real-world data, I guess. With sports, the data quality is usually much higher. When I’m modeling sports, I feel pretty confident that whatever data is captured is done in a relatively consistent way across games and seasons (still not perfect). It really is a great playground.
I’ve heard conflicting information about the efficacy of masks. The WHO dismisses their usefulness as a protective measure for healthy people, and Michael Osterholm (an epidemiologist) echoed that on Joe Rogan’s podcast a couple of weeks ago. However, I have also seen a document compiling a list of studies that claim that masks are helpful! There’s just so much stuff on the internet that you will get exactly what you search for.
I don’t know what to believe. Now that the level of seriousness has been ratcheted up, I feel much more aware of how easily misinformation or even half-truths can be spread on social media. This doesn’t feel like the malicious fake news campaigns from the 2016 election, but more an overconfidence in something where the truth is not known. If anyone asks me if masks are effective, I’ll fall back to “I don’t know,” but if I have to go to the grocery store, I’ll take the Pascal’s wager approach.