If you are like me, you probably check the coronavirus case tally every day, with the numbers increasing at a frightening rate: 1,000,000 cases worldwide in early April, and already over 2,000,000 today, just 2 weeks later. However, we can all agree on one thing: that number is not correct. It only reflects how many people tested positive, which leads us to the question: how many cases are there in reality?
In this article, we will investigate together how the data available on the coronavirus can be used to estimate the diagnosis efficiency of each country. This will reveal severe disparities that can be linked to the political decisions, healthcare system or testing capacity of each country. We will also see that many countries with similar policies fall within the same range of values.
With this article, we aim to illustrate in a pedagogical way how logical reasoning and data analysis work together to extract insights from raw data. The numbers shown here should, however, under no circumstances be taken as the truth, as they rely on a fair number of assumptions.
So, how can we reconstruct the missing data?
It all starts from the following two hypotheses:
There are of course several influencing factors, namely the age distribution of the population, lifestyle (obesity), the quality of healthcare… but these factors are unlikely to change the fatality rate by more than a factor of 2 or 3 (going from 2% in South Korea to maybe 4% or 6% in some countries?).
However, we actually see countries with a fatality rate around 30%, which is 15 times higher than in South Korea! So what accounts for most of the difference? This is what we will explore, but let’s first look at how we can accurately calculate the observed fatality rate:
Delay between contamination and death
To measure the fatality rate more accurately, we need to assign each death to the day on which that patient was diagnosed. Indeed, the time between diagnosis and death, combined with the exponential growth of cases, produces an apparent fatality rate that is not constant in time and that underestimates reality.
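One way to see why is a simplified model. The assumptions here are ours and are for illustration only: cumulative diagnosed cases C(t) grow exponentially at a rate r, and a fixed fraction f of diagnosed people die exactly τ days after their diagnosis. The symbols C, D, f, r and τ are introduced for this sketch.

```latex
C(t) = C_0\, e^{rt}, \qquad D(t) = f\, C(t-\tau)
\quad\Longrightarrow\quad
\frac{D(t)}{C(t)} = f\, e^{-r\tau} < f ,
\qquad
\frac{D(t)}{C(t-\tau)} = f .
```

Under these assumptions, the uncorrected ratio understates f by a factor of e^{-rτ}. As growth slows and r decreases, the uncorrected ratio drifts upward toward f, which is why it does not stay constant over time. Dividing by the case count from τ days earlier removes that effect.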
Because the data is not known for each patient, we will have to use an average duration to shift the curve in time. Let’s keep in mind that this average duration will also differ from country to country, depending on the testing capabilities and quality / availability of healthcare.
Looking at the countries most affected by the virus, we observe delays typically ranging between 7 and 11 days. We therefore decided to apply an average shift of 9 days to every country when calculating the corrected fatality rate.
Before this time correction, the apparent fatality rate for Belgium appeared to rise from 5% to 10% between 31 March and 9 April. After the time correction, it appears constant over that period, but at a rate of 20%!
At this point, it is important to note that this does not mean 20% of infected people die, as not every infected person is diagnosed. The apparently high rate is due to the very low testing capacity in Belgium, where most tests are performed on people requiring hospitalisation. We may also note that Belgium decided to include in its death count any suspected death in retirement homes regardless of diagnosis, which increases this rate even further.
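The shifted calculation can be sketched in a few lines of Python. This is a minimal illustration and not the exact code behind the charts: the function name `shifted_fatality_rate`, the use of cumulative counts and the synthetic data are all assumptions made for the example.

```python
from typing import List, Optional


def shifted_fatality_rate(
    cum_cases: List[float],
    cum_deaths: List[float],
    shift_days: int = 9,
) -> List[Optional[float]]:
    """Cumulative deaths on day t divided by cumulative cases on day t - shift_days.

    Returns None for days where no shifted case count is available.
    """
    if len(cum_cases) != len(cum_deaths):
        raise ValueError("both series must cover the same days")
    rates: List[Optional[float]] = []
    for day, deaths in enumerate(cum_deaths):
        ref_day = day - shift_days
        if ref_day < 0 or cum_cases[ref_day] == 0:
            rates.append(None)
        else:
            rates.append(deaths / cum_cases[ref_day])
    return rates


# Synthetic data (not real figures): cases double every 3 days and
# 20% of diagnosed cases die exactly 9 days after diagnosis.
days = range(30)
cum_cases = [100 * 2 ** (d / 3) for d in days]
cum_deaths = [0.2 * cum_cases[d - 9] if d >= 9 else 0.0 for d in days]

naive = shifted_fatality_rate(cum_cases, cum_deaths, shift_days=0)
corrected = shifted_fatality_rate(cum_cases, cum_deaths, shift_days=9)

print(f"naive rate on day 20:     {naive[20]:.3f}")
print(f"corrected rate on day 20: {corrected[20]:.3f}")
```

On this synthetic series the uncorrected rate is 2.5% while the shifted rate recovers the 20% built into the data. Real data is noisier, and a single 9-day shift is only an average across patients and countries.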
A brief aside on South Korea
Let’s now take a closer look at South Korea’s strategy to fight the virus. The fatality rate there is amongst the lowest of all countries, at only 2%. Why is that?
South Korea has performed very aggressive testing, using apps on every citizen’s phone to track their movements. This means that from the moment a positive case is detected, the government can rapidly identify everybody who crossed paths with the infected person and warn them so that they can isolate themselves and get tested. This is how South Korea managed to contain the virus without needing a lockdown, saving both lives and its economy. And while 100% diagnosis is never achieved, it is reasonable to assume that this strategy allowed them to diagnose a high proportion of cases, probably close to 100%.
How can we then estimate the diagnosis efficiency for each country?
With everything we have just seen, we can make some assumptions and extrapolate what percentage of infected people are actually diagnosed in each country. The three assumptions are:
From the observed fatality rate and the three assumptions above, we can estimate the real number of cases per country and what proportion of these real cases have been diagnosed. The results are shown in the interactive chart below:
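The extrapolation can be sketched as follows. This is one plausible reading of the reasoning above and not necessarily the exact calculation behind the chart. It assumes (our simplification) that the true fatality rate is the same everywhere and equal to South Korea's roughly 2%, where diagnosis is taken to be close to 100%. The function names and the `reference_rate` parameter are hypothetical.

```python
def diagnosis_efficiency(observed_rate: float, reference_rate: float = 0.02) -> float:
    """Estimated share of real cases that were diagnosed.

    observed_rate:  time-corrected fatality rate among diagnosed cases.
    reference_rate: ASSUMED true fatality rate (South Korea's ~2% in the text).
    """
    if observed_rate <= 0:
        raise ValueError("observed_rate must be positive")
    # If a country observes 10x the reference rate, it presumably sees
    # only about 1 in 10 of its real cases. Capped at 100%.
    return min(1.0, reference_rate / observed_rate)


def estimated_real_cases(
    diagnosed_cases: float,
    observed_rate: float,
    reference_rate: float = 0.02,
) -> float:
    """Scale diagnosed cases up by the estimated diagnosis efficiency."""
    return diagnosed_cases / diagnosis_efficiency(observed_rate, reference_rate)


# Example with the corrected Belgian rate quoted in the text (20%).
belgium_efficiency = diagnosis_efficiency(0.20)
print(f"estimated diagnosis efficiency: {belgium_efficiency:.0%}")
```

With the 20% corrected rate quoted for Belgium, this gives a diagnosis efficiency of about 10%, meaning roughly ten real cases for every diagnosed one. The result is only as good as the reference rate, which the text itself notes may plausibly vary by a factor of 2 or 3 between countries.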
We observe fluctuations and evolutions over time:
We have thus managed to link data to events and to understand the situation. The same reasoning applies to a wide range of fields (politics, public health, economics… or the performance of your business) and provides insights that help make better decisions.
While the trends shown here hold, please note that because of the high number of unknowns surrounding COVID-19, the numbers are only rough approximations and should not be reused out of context.