The Uncanny Consistency of Covid-19 Age-Based Fatality Data

There’s a deep underlying consistency in Covid-19 age-based fatality data — but you have to do some math to see it

Steve McConnell
Towards Data Science
Jul 24, 2020

You are probably aware that Covid-19 is deadlier for older people than for younger people. Reporting of that general pattern has been consistent, but the specifics have varied wildly: for people over 80 years old, South Korea has reported a 7% fatality rate, China 15%, Spain 21%, and Sweden 36%.

How can there be so much difference in fatality rates?

I wanted to understand the real difference in age-based fatality from Covid-19. As I investigated, I found that data from US states and other countries shows a remarkable consistency in the effect of age on fatality. But you have to look at the data the right way to see the consistency.

Photo by Sergiu Vălenaș on Unsplash

The Short Version, for People Who Aren’t Interested in Math

While the absolute fatality rates reported as percentages vary significantly, the ratios of fatality rates from one age band to another are extremely consistent.

Applying the overall infection fatality rate (IFR) I described in a previous article, you end up with the age-based fatality rates shown in this table.

Table A — Age-based infection fatality rates (IFRs) for Covid-19 (95% confidence ranges).

Because the age-related ratios are so consistent, there’s little variability in the fatality rates for each age band, even when you allow for variability in the overall fatality rate (the 0.4%, 0.5%, and 0.6% numbers at the bottom of the table). This allows people to accurately determine their personal risk levels and the risk levels of other people they care about.

The fatality risk to people age 80+ is 100 times as high as to ages 30–39, 200 times as high as to ages 20–29, and 400 times as high as to ages 0–19.

The Long Version — With Lots of Math

To understand age-based fatality, I needed data on both the number of positive tests by age and the number of fatalities by age.

New York has logged the most Covid-19 deaths of any state by a wide margin. To get the most reliable data on age-based fatality rates, I initially wanted to use the state with the largest numbers.

New York publishes the number of fatalities by age, but unfortunately, it does not publish the number of positive tests by age. So I had to look elsewhere.

Small States to the Rescue

I searched for states that had published both the numbers of positive tests and fatalities by age. I reviewed the Covid-19 websites of all 50 states and the District of Columbia — which was a frustrating exercise. Many states publish age data on fatality, but not positive tests. Many publish age data on tests, but not fatality. A couple publish both, but use age bands that are misaligned with the 10-year age bands used by other states and countries (e.g., California uses age bands of 0–4, 5–17, 18–34, 35–49 rather than 0–19, 20–29, 30–39, 40–49).

Ten states publish both positive test data and fatality data by age, with meaningful age bands.

Table 1 summarizes the age-based CFRs (case fatality rates) I calculated for these states. These are crude fatality rates because they are computed by dividing the total number of deaths by the total number of positive tests. The results are approximate — both because positive tests are not a very consistent proxy for infections and because these fatality rates don’t include future deaths that will be associated with the most recent positive tests.
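As a minimal sketch of that crude calculation, the function below divides deaths by positive tests within each age band. The function name and the counts are hypothetical, chosen only to show the arithmetic; they are not taken from any state's published data.

```python
# Crude case fatality rate (CFR) per age band: deaths divided by positive tests.
# Band labels and counts are hypothetical, not any state's actual data.

def crude_cfr(deaths: dict[str, int], positives: dict[str, int]) -> dict[str, float]:
    """Return deaths / positive tests for every band present in both inputs."""
    return {
        band: deaths[band] / positives[band]
        for band in deaths
        if band in positives and positives[band] > 0
    }


if __name__ == "__main__":
    deaths = {"0-19": 1, "20-29": 4, "80+": 300}               # hypothetical
    positives = {"0-19": 2_000, "20-29": 4_000, "80+": 1_500}  # hypothetical
    for band, rate in crude_cfr(deaths, positives).items():
        print(f"{band}: {rate:.2%}")
```

Because the denominator is positive tests rather than infections, these values move with testing volume, which is why they are called crude.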

Table 1 — Crude CFRs for “Small” States

Variations in the states’ fatality rates are similar to variations in the country-level rates. Fatality rates in the 80+ age category range from 18% in New Hampshire to 36% in Connecticut, which is similar to the range from 7% in South Korea to 36% in Sweden.

Because none of these states has recorded anywhere near the number of deaths that New York has, and some recorded fewer than 100, I thought it would be most accurate to pool the state data into a weighted average, with weighting based on the number of deaths. The result of that pooling is shown in the right-most column in Table 1.
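The pooling step can be sketched as a weighted mean. The article says only that the weighting was based on the number of deaths, so this sketch assumes each state's weight is its total death count; the function name, state labels, and figures are hypothetical.

```python
# Death-weighted average of per-state crude CFRs for one age band.
# Assumption: each state's weight is its total death count.
# State labels and numbers are hypothetical.

def pooled_cfr(cfr_by_state: dict[str, float], deaths_by_state: dict[str, int]) -> float:
    """Weighted mean of the CFRs, weighting each state by its death count."""
    total_weight = sum(deaths_by_state[s] for s in cfr_by_state)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(cfr_by_state[s] * deaths_by_state[s] for s in cfr_by_state)
    return weighted_sum / total_weight


if __name__ == "__main__":
    cfr_80_plus = {"state_a": 0.18, "state_b": 0.36, "state_c": 0.25}  # hypothetical
    total_deaths = {"state_a": 80, "state_b": 4_000, "state_c": 1_000}  # hypothetical
    print(f"pooled 80+ CFR: {pooled_cfr(cfr_80_plus, total_deaths):.1%}")
```

With this weighting, a state with only a few dozen deaths barely moves the pooled figure, which is the intent of pooling noisy small-state data.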

Comparison of US Fatality Rates to Other Countries

I wanted to see how the US age-based fatality rates compared to rates reported by other countries. Table 2 shows the comparison, again based on crude CFRs.

Table 2 — Crude age-based CFRs for the US “small” states and selected countries.

Because of the significant variations in test practices across countries, I didn’t expect that the fatality rates would be directly comparable, and indeed Table 2 shows significant variability. Any pattern in age-based fatality rates remains well-hidden when the data is presented this way.

Unhiding the Amazing Consistency in Age-Based Fatality Ratios

To facilitate an apples-to-apples comparison and potentially unhide patterns in the data, I normalized the fatality rates. (Normalizing is a process of putting different datasets onto the same scale.) I treated the rate in the 80+ category as 1.0 in each region, and then I scaled the other fatality rates proportionately to that. The result is shown in Table 3.

Table 3 — Normalized fatality ratios by age for US “small” states and selected countries.
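A minimal sketch of the normalization, using hypothetical rates for two regions whose absolute 80+ rates differ by almost a factor of three. Dividing every band by the region's own 80+ rate puts both on the same scale, and in this made-up example both regions land at (approximately) 0.01 for ages 30–39. The function name is hypothetical.

```python
# Normalize age-band fatality rates so the reference band (80+) equals 1.0.
# Input rates are hypothetical, not the values behind Table 2.

def normalize_to_reference(rates: dict[str, float], reference: str = "80+") -> dict[str, float]:
    """Divide every band's rate by the reference band's rate."""
    ref = rates[reference]
    if ref == 0:
        raise ValueError("reference rate must be non-zero")
    return {band: rate / ref for band, rate in rates.items()}


if __name__ == "__main__":
    region_a = {"30-39": 0.002, "80+": 0.20}   # hypothetical crude CFRs
    region_b = {"30-39": 0.0007, "80+": 0.07}  # hypothetical crude CFRs
    print(normalize_to_reference(region_a))
    print(normalize_to_reference(region_b))
```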

With data normalized around age 80+, the ratios become very interesting:

  • At ages 20–29, ratios range from 0.00 to 0.01 across all regions.
  • At ages 30–39, every region reported a ratio that rounds to 0.01.
  • At ages 40–49, ratios range from 0.01 to 0.03.

Regardless of the data source, fatality ratios for all age bands are strikingly consistent.

Why Consistency in Age-Based Ratios Matters

More reports than I can count have expressed confusion over the differences in fatality rates reported for different countries. People have speculated that the virus has different levels of fatality in different locations.

The striking consistency in these ratios suggests otherwise. It supports the idea that the virus behaves consistently. The variability in reported death rates is a result of different levels of testing being performed, not the result of variability in behavior of the virus. This is easy to believe, because we already know that there’s been huge variability in testing practices.

Revisiting New York State’s Age-Based Fatality Ratios

The normalization calculation depends on the ratios of positive tests in each age band, but it does not depend on the absolute numbers of tests.
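To see why, write the normalized ratio for age band a in terms of deaths D_a and positive tests T_a (this notation is introduced here for illustration and is not the article's). The crude CFR is c_a = D_a / T_a, and normalizing to the 80+ band gives:

```latex
r_a = \frac{c_a}{c_{80+}}
    = \frac{D_a / T_a}{D_{80+} / T_{80+}}
    = \frac{D_a}{D_{80+}} \cdot \frac{T_{80+}}{T_a}

% Multiplying every band's test count by the same factor k leaves r_a unchanged:
\frac{D_a}{D_{80+}} \cdot \frac{k\,T_{80+}}{k\,T_a}
    = \frac{D_a}{D_{80+}} \cdot \frac{T_{80+}}{T_a}
    = r_a
```

Tests enter only through T_{80+}/T_a, which equals the ratio of the two bands' shares of all positive tests, so the overall testing volume cancels out.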

Because New York state doesn’t provide age breakdowns of its test data, I wondered whether the level of infections there might be high enough that fatality ratios could be calculated by using the overall population of the state instead. This would amount to assuming that New York’s population had been uniformly infected across age bands. That seemed plausible considering how widespread the pandemic has been in New York.
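In code, the population-based variant simply swaps population counts in for positive-test counts before normalizing. This is a sketch under the uniform-infection assumption just described; the function name and all counts are hypothetical, not New York's actual figures.

```python
# Population-based normalized fatality ratios. Population stands in for
# positive tests, which assumes infection spread evenly across age bands.
# All counts are hypothetical.

def population_based_ratios(
    deaths: dict[str, int], population: dict[str, int], reference: str = "80+"
) -> dict[str, float]:
    """Deaths per capita in each band divided by deaths per capita in the reference band."""
    per_capita = {band: deaths[band] / population[band] for band in deaths}
    ref = per_capita[reference]
    if ref == 0:
        raise ValueError("reference band has no deaths")
    return {band: value / ref for band, value in per_capita.items()}


if __name__ == "__main__":
    deaths = {"30-39": 300, "80+": 9_000}              # hypothetical
    population = {"30-39": 2_700_000, "80+": 800_000}  # hypothetical
    print(population_based_ratios(deaths, population))
```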

I tested this theory by calculating normalized ratios as shown in Table 4, with New York state added as the right-most column.

New York’s normalized fatality ratios do fall within the ranges already established in every age band except 70–79, which suggests that the population-based calculation approach is reasonable.

Table 4 — Normalized fatality ratios by age with New York state added.

Analysis With Samplings From Additional Areas

I ran a similar analysis with population-based ratios for New York City, and with test-based and population-based ratios for the entire United States. (US data was not available when I began the analysis but became available before I wrote this article.) The full set of results is listed in Table 5.

Table 5 — Expanded set of normalized fatality ratios by age.

The additional data points did not change the overall picture very much. Most of the additional points stayed within the range that had been established by the initial “small states + selected countries” analysis. The only points outside the initial range were US-population-based points in the 60–69 and 70–79 age bands.

One interesting implication of the close tracking between test-based and population-based data is that it suggests each age group is being tested roughly in proportion to its population. There’s been speculation that younger people get tested more, or get tested less, than older people. If that were true, you would see it reflected in differences between the population-based ratios (US and New York) and the test-based ratios (all the others). To the extent that the US test-based ratios are higher than the corresponding US population-based ratios, the only disproportion the data supports is that every age group under 80 is being tested at a lower rate, relative to its population, than the 80+ group.
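The reasoning in that last step can be checked with a little algebra: dividing a band's population-based ratio by its test-based ratio leaves that band's tests per capita divided by the 80+ band's tests per capita. The sketch below computes both ratios from raw counts and confirms the identity; the helper names and counts are hypothetical.

```python
# Relating test-based and population-based normalized ratios.
# All counts are hypothetical.

def normalized_ratio(band_deaths: float, band_denominator: float,
                     ref_deaths: float, ref_denominator: float) -> float:
    """(band deaths / band denominator) / (80+ deaths / 80+ denominator)."""
    return (band_deaths / band_denominator) / (ref_deaths / ref_denominator)


def relative_testing_rate(test_based: float, population_based: float) -> float:
    """Band's tests per capita divided by the 80+ band's tests per capita.

    A value below 1 means the band is tested less, relative to its population,
    than the 80+ band, which shows up as a test-based ratio above the
    population-based one.
    """
    return population_based / test_based


if __name__ == "__main__":
    deaths, tests, pop = 100, 1_000, 100_000            # hypothetical band
    deaths_80, tests_80, pop_80 = 1_000, 2_000, 20_000  # hypothetical 80+ band
    r_test = normalized_ratio(deaths, tests, deaths_80, tests_80)
    r_pop = normalized_ratio(deaths, pop, deaths_80, pop_80)
    direct = (tests / pop) / (tests_80 / pop_80)
    print(relative_testing_rate(r_test, r_pop), direct)
```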

Final Age-Based Fatality Ratios

The availability of several data samples provides a basis for calculating confidence intervals, which simplifies the picture and narrows the results. The 95% confidence intervals are shown in Table 6.

Table 6 – 95% confidence intervals for age-based fatality ratios.

The confidence intervals are narrow, which is what you expect intuitively when there’s so much consistency in the data points.
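The article doesn't say exactly how these intervals were computed. One common approach, assumed here, is a Student's t interval for the mean of the regional ratios in each age band. In the sketch below the ratios are hypothetical, and the t critical value is passed in because Python's standard library does not include the t distribution.

```python
import math
import statistics


def mean_confidence_interval(samples: list[float], t_crit: float) -> tuple[float, float]:
    """Mean +/- t_crit * (sample standard deviation / sqrt(n))."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    center = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical 30-39 normalized ratios from ten regions (not Table 5's values).
    ratios = [0.008, 0.010, 0.011, 0.009, 0.012, 0.010, 0.007, 0.011, 0.010, 0.012]
    # 2.262 is the two-sided 95% Student's t critical value for 9 degrees of freedom.
    low, high = mean_confidence_interval(ratios, t_crit=2.262)
    print(f"95% CI for the mean ratio: {low:.4f} to {high:.4f}")
```

The interval shrinks with the square root of the number of regions, which is one reason several consistent samples produce narrow intervals.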

The data in the table supports interesting conclusions, including:

  • The fatality risk roughly doubles, or more, from one age band to the next.
  • The fatality risk to people age 80+ is 100 times as high as to ages 30–39, 200 times as high as to ages 20–29, and 400 times as high as to ages 0–19. (This last factor looks like it should be 300 due to rounding in Table 6, but it’s actually 400.)
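The multiples in these bullets also imply an average growth factor per age band if you assume geometric growth between bands, an assumption made here only for illustration. The sketch uses just the rounded multiples quoted above; the function name is hypothetical.

```python
# Implied average fatality multiplier per 10-year age band, assuming
# geometric growth between two bands. Inputs are the rounded multiples
# quoted in the text; the geometric model is an illustrative assumption.

def per_band_multiplier(total_multiple: float, band_steps: int) -> float:
    """Geometric-mean growth factor per band step."""
    return total_multiple ** (1 / band_steps)


if __name__ == "__main__":
    # 80+ vs 30-39 (100x) spans five steps: 40-49, 50-59, 60-69, 70-79, 80+.
    print(round(per_band_multiplier(100, 5), 2))  # prints 2.51
    # 30-39 vs 20-29 is one step: 200x / 100x.
    print(round(per_band_multiplier(200 / 100, 1), 2))  # prints 2.0
```

Under that assumption, the 100x gap between ages 30–39 and 80+ works out to about 2.5x per band, while the step from 20–29 to 30–39 is 2x.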

The Last Step: From Ratios to Meaningful Age-Based Fatality Rates

Combining these fatality ratios with the overall fatality rate analysis I wrote about in “New Data Shows a Lower Covid-19 Fatality Rate,” we can calculate meaningful age-based fatality rates, with confidence intervals. As I stated in that article, the overall fatality rate in the US ranges from 0.4% to 0.6% with a mean of 0.5%. I used a Monte Carlo simulation to account for the interaction effects of variability in the age-based calculations and in the overall fatality rates.

The results of the Monte Carlo simulation are shown in Table A (the same table at the beginning of this article).

Table A Revisited — Results of a Monte Carlo simulation of the interaction between variability in age-based Covid-19 fatality rates and variability in overall Covid-19 fatality rates. Ranges shown are 95% confidence intervals.
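The article doesn't describe the simulation's internals, so the following is only a minimal sketch of how such a Monte Carlo simulation might be set up. It assumes normally distributed inputs, an overall IFR of 0.5% with about 95% of draws between 0.4% and 0.6%, and that the overall IFR equals the infection-weighted sum of the age-band IFRs. The band list is cut to four bands, and every ratio, standard deviation, and infection share is hypothetical.

```python
import random
import statistics

# Hypothetical inputs, NOT the article's Table 6 values or real infection data.
# Only four age bands are modeled to keep the sketch short.
RATIO_MEAN = {"0-19": 0.0025, "20-29": 0.005, "30-39": 0.01, "80+": 1.0}
RATIO_SD = {"0-19": 0.0005, "20-29": 0.001, "30-39": 0.002, "80+": 0.0}
INFECTION_SHARE = {"0-19": 0.30, "20-29": 0.25, "30-39": 0.35, "80+": 0.10}  # sums to 1

OVERALL_IFR_MEAN = 0.005       # 0.5%
OVERALL_IFR_SD = 0.001 / 1.96  # about 95% of draws fall between 0.4% and 0.6%


def simulate(trials: int = 20_000, seed: int = 1) -> dict[str, tuple[float, float]]:
    """Approximate 95% interval (2.5th and 97.5th percentiles) of IFR per band."""
    rng = random.Random(seed)
    draws: dict[str, list[float]] = {band: [] for band in RATIO_MEAN}
    for _ in range(trials):
        overall = rng.gauss(OVERALL_IFR_MEAN, OVERALL_IFR_SD)
        # Draw a normalized ratio per band; clip at zero so no rate goes negative.
        ratios = {b: max(0.0, rng.gauss(RATIO_MEAN[b], RATIO_SD[b])) for b in RATIO_MEAN}
        # overall = sum(share_b * ratio_b) * ifr_80_plus, so solve for ifr_80_plus.
        weighted = sum(INFECTION_SHARE[b] * ratios[b] for b in ratios)
        ifr_80_plus = overall / weighted
        for b, ratio in ratios.items():
            draws[b].append(ratio * ifr_80_plus)
    intervals = {}
    for b, values in draws.items():
        cuts = statistics.quantiles(values, n=40)  # 39 cut points at 2.5% steps
        intervals[b] = (cuts[0], cuts[-1])
    return intervals


if __name__ == "__main__":
    for band, (low, high) in simulate().items():
        print(f"{band}: {low:.3%} to {high:.3%}")
```

Sampling both sources of uncertainty together in each trial is what captures their interaction; combining their separate intervals by hand would miss it.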

I validated the results of the Monte Carlo simulation by doing a straight calculation of the confidence intervals for two independent samples. That produced very similar results.
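The article doesn't spell out that calculation. One standard analytic check, assumed here rather than taken from the article, treats an age-band IFR as the product of two independent estimates: X, an overall-rate factor, and Y, the band's normalized ratio, with means μ_X, μ_Y and standard deviations σ_X, σ_Y. The exact mean and variance of the product, and an approximate 95% interval that holds when the product is close to normally distributed, are:

```latex
\mathbb{E}[XY] = \mu_X \mu_Y
\qquad
\operatorname{Var}(XY) = \mu_X^2 \sigma_Y^2 + \mu_Y^2 \sigma_X^2 + \sigma_X^2 \sigma_Y^2

\mu_X \mu_Y \pm 1.96\,\sqrt{\operatorname{Var}(XY)}
```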

As you can see from the table, for teens and pre-teens, even with worst-case assumptions, the high end of the fatality rate is no more than 0.02%. On the other hand, for people in the 80+ range, even with best-case assumptions, the low end of the fatality range is at least 3.4%.

This data shows why it’s largely meaningless to talk about an overall fatality rate for Covid-19, except for statistical purposes. With a 400-fold variation in fatality rate from young to old, the overall fatality rate grossly overstates the risk for younger people and grossly understates the risk for older people.

Conclusions

Data on age-based fatality of Covid-19 from different locations initially appears to be quite disparate. Upon closer examination, that data in fact shows a high degree of convergence. This convergence makes it possible to calculate age-based fatality rates for Covid-19 with narrow confidence intervals.

Having a clear picture of age-based fatality rates provides people with the information they need to make well-informed personal risk decisions for themselves, their families, and other people they care about, regardless of whether they are on the high or low end of the risk spectrum.

This is Part 2 of a 5-part series on the fatality rates of Covid-19:

  • Part 1: Establishing an overall base IFR for the US
  • Part 2: Age-based IFRs
  • Part 3: Variation in IFRs across states and countries based on demographics
  • Part 4: IFRs with and without co-morbidities
  • Part 5: IFRs by individual year of age and co-morbidity status

More Details on the Covid-19 Information Website

For more US and state-level data, check out my Covid-19 Information website.

My Background

For 20 years I have focused on understanding the data analytics of software development, including quality, productivity, and estimation. The techniques I’ve learned from working with noisy data, bad data, uncertainty, and forecasting all apply to Covid-19.

References

South Korea fatality rate.

China, Spain, and Sweden fatality rate.

Colorado state data.

Connecticut state data.

Illinois state data.

Indiana state data.

Maryland state data.

Massachusetts state data.

Mississippi state data.

New Hampshire state data.

New York state data.

North Dakota state data.

US age-based positive test and fatality graphs and report and links to data downloads from the CDC.

Virginia state data.

Author of Code Complete, Dog Walker, Motorcyclist, Cinephile, DIYer, Rotarian. See stevemcconnell.com.