Covid-19 Comorbidities are the Elephant in the Room
The presence or absence of comorbidities is not just a big deal. It’s a Godzilla-eating-a-major-city size deal. Why hasn’t anyone quantified how it affects risk to individuals?
A comorbidity is a condition that a person already has before they contract Covid-19. This is also known as a preexisting condition. Common comorbidities include diabetes, obesity, heart disease, hypertension, dementia, and cancer.
You’ve probably read statements like, “90% of Covid-19 deaths involve comorbidities,” and I have too. Those statements leave me wondering, “How does that affect me?” Because I don’t have any comorbidities, they give me a general sense that I’m at lower risk. But how much lower risk, exactly?
The analysis in this article answers those questions.
If you’re a healthy 21-year-old, your odds of dying from Covid-19 are about 1 in 100,000, and that’s if you even get Covid-19 in the first place. But if you’re 21 with a preexisting condition, you might as well be a healthy 60-year-old as far as Covid-19 risk is concerned.
If you’re the parent of a high school student with no preexisting conditions, your child’s chance of dying from Covid-19 (if they even get it) is about 1 in 100,000. If your child is under 11 years old, the odds are literally 1 in a million.
On the other hand, if you’re 60 years old with a heart condition and thinking, “Covid-19 isn’t such a big deal,” think again. Your odds of dying from Covid-19 are approximately 1 in 100.
In this article, I’ll explain the math behind these statements. At the end of the article, I’ll provide year-of-age-specific tables that you can use to better understand your personal risk level.
Data Needed to Calculate Risk With and Without Comorbidities
We need three pieces of data to calculate age-specific fatality rates with and without comorbidities:
- Overall infection fatality rate (IFR), by year of age
- Percentage of Covid-19 deaths involving comorbidities, by year of age
- Percentage of the population with comorbidities, by year of age
Part 4 of this series explained how to calculate IFRs by year of age and presented the results, so we have that first bit of data covered. Let’s look at the second bit of data we need on the percentage of Covid-19 deaths involving comorbidities, by year of age.
Prevalence of Comorbidities in Covid-19 Fatalities
The CDC reports that, overall, only 6% of deaths involving Covid-19 list Covid-19 as the only cause. The other 94% of Covid-19 deaths involved one or more comorbid conditions [source].
Figure 1 shows a summary of recent CDC data on the presence of specific comorbidities.
The specific comorbidities vary significantly across ages. Vascular dementia plays a role in 22% of the fatalities in the 85+ age band, but 0% in the 0–24 age band. Obesity is the opposite: it factors into 22% of the youngest group’s deaths, but 0% of the oldest group’s.
Even though the details vary, the average number of comorbidities involved in a Covid-19 fatality does not change much across age bands. It runs from a low of 2.02 in the 0–24 age band to a high of 2.37 in the 65–74 age band.
Does that mean we can treat comorbidity as an age-independent factor?
At the end of May, the CDC published a “Coronavirus Disease 2019 Case Surveillance” report [source]. It listed the percentage of deaths that included comorbidities for each age group, which is shown in Figure 2.
At first glance, the data in the table appears to show that the effect of comorbidities does vary by age, with 67% of deaths in the 0–9 age group involving comorbidities, rising to about 95% for ages 30 and higher.
However, the sample size “n” (total deaths) is so small for the youngest three age groups that I don’t consider that data to be reliable.
For ages 30+, the effect of comorbidities is consistent. The data in Figure 1 also implies low variability across age bands. Consequently, I believe that it is accurate to treat the CDC’s figure of 94% of deaths involving comorbidities as an age-independent constant for purposes of the calculations described in this article.
Prevalence of Comorbidities in the General Population, by Age
The remaining data we need to calculate age-specific fatality risks is the percentage of the population that has comorbidities, by year of age. I reviewed four sources for this information:
- A 2017 CDC report on the prevalence of hypertension by age group [source].
- A 2020 CDC report on the prevalence of diabetes by age group [source].
- A 2018 CDC Survey on health statistics by age, which included coronary heart disease, hypertension, and stroke [source].
- A 2015 CDC report on the percentage of US adults with specific chronic conditions of arthritis, asthma, cancer, cardiovascular disease, COPD, and diabetes [source].
To make a long story short, I took the data from these reports, applied the transformations needed to make the different sources usable together, and arrived at the estimated prevalence of comorbidities by age shown in Figure 3. (The gory details of the data and the transformations are described in the Notes section.)
As shown in Figure 3, the prevalence of comorbidities varies significantly by age, which means we need to account for that in our age-based fatality calculations.
A Little Algebra Goes a Long Way
We now have all the bits of data we need, at least in approximate form:
- Overall infection fatality rate (IFR), by year of age (from Part 4 of this series)
- Percentage of Covid-19 deaths involving comorbidities, by year of age (we decided we didn’t need to adjust for year of age and can use 94% as a constant for all ages)
- Percentage of the population with comorbidities, by year of age (from the preceding section of this article)
We still have some work to do to calculate the output data we’re interested in. Once again making a long story short, this involves using algebra to produce equations with which we can calculate IFRs with and without comorbidities. Here are the equations:
These are identified as Equations 4 and 5 to follow the explanation in the Notes section. Take a look at that section if you want to see the algebra used to derive them.
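As a rough illustration of what these two equations compute, here is a minimal Python sketch based on the derivation in the Notes section: the IFR with comorbidities is the overall IFR times the share of deaths involving comorbidities, divided by the share of the population with comorbidities, and the IFR without comorbidities is the same calculation using the complementary shares. The function name, parameter names, and input values are hypothetical placeholders, not figures from the tables in this article.

```python
def split_ifr(total_ifr, comorbid_pop_share, comorbid_death_share=0.94):
    """Split an overall IFR into IFRs with and without comorbidities.

    total_ifr            -- overall IFR for one age, as a fraction (0.001 = 0.1%)
    comorbid_pop_share   -- fraction of that age's population with comorbidities
    comorbid_death_share -- fraction of Covid-19 deaths involving comorbidities
    """
    if not 0.0 < comorbid_pop_share < 1.0:
        raise ValueError("comorbid_pop_share must be strictly between 0 and 1")
    # Equation 4 (as described in the Notes): the comorbid share of deaths,
    # spread over the comorbid share of the population.
    comorbid_ifr = total_ifr * comorbid_death_share / comorbid_pop_share
    # Equation 5: the remaining deaths, spread over the remaining population.
    healthy_ifr = (total_ifr * (1.0 - comorbid_death_share)
                   / (1.0 - comorbid_pop_share))
    return comorbid_ifr, healthy_ifr


# Hypothetical inputs, for illustration only.
comorbid_ifr, healthy_ifr = split_ifr(total_ifr=0.001, comorbid_pop_share=0.40)
print(f"IFR with comorbidities:    {comorbid_ifr:.5%}")
print(f"IFR without comorbidities: {healthy_ifr:.5%}")
```

A useful sanity check on any implementation is that the two results, weighted by their population shares, add back up to the overall IFR.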
Single-Year-of-Age-Specific Risk Factors, Including Comorbidity Status
With that data collection, data transformation, and algebra as prologue, we finally get to calculate the results, which are shown in Figure 4.
The overall age-based IFRs in the second column are the same as those described in Part 4 of this series. The % Population w/ Comorbidities column uses the estimates from Figure 3. Additional interpolation has been applied to those estimates to arrive at single-year-of-age estimates. (The interpolation approach is described in Part 4 of this series.)
The resulting IFRs with and without comorbidities are shown in the columns on the right.
The IFR numbers in the figure have been rounded to one significant digit, which is an appropriate degree of precision considering the approximate nature of some of the input data.
To provide a different perspective on the same calculations, Figure 5 presents the same data with IFRs expressed as ratios rather than percentages.
The ratios in this table are presented as round numbers. Some of the percentages listed in Figure 4 and the ratios listed in Figure 5 do not match exactly because of rounding.
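To see why rounding can make a percentage and its "1 in N" ratio disagree, here is a small Python sketch. It assumes percentages are rounded to one significant digit (as stated above) and, as a guess on my part, that ratios are rounded to two significant digits. The helper name and the sample IFR are hypothetical.

```python
import math


def round_sig(x, digits=1):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


ifr = 0.0013  # hypothetical unrounded IFR (0.13%)

rounded_ifr = round_sig(ifr, digits=1)       # what a percentage column would show
rounded_n = round_sig(1.0 / ifr, digits=2)   # what a "1 in N" column would show

print(f"percentage view: {rounded_ifr:.1%}")
print(f"ratio view:      1 in {rounded_n:,.0f}")
print(f"ratio implied by the rounded percentage: 1 in {1.0 / rounded_ifr:,.0f}")
```

Both views start from the same unrounded number, but each is rounded separately, so the rounded percentage does not convert exactly into the rounded ratio.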
Using This Discussion to Assess Your Personal Risk Level
Not all preexisting conditions are the same. If you are using this article to help assess your personal risk level, be sure to read the CDC’s discussion about the effects of specific preconditions.
A quantitative understanding of the effect of preexisting conditions dramatically improves our understanding of who’s really at risk from Covid-19.
All other things being equal, at younger ages, we see 100-fold differences in risk depending on the presence or absence of preexisting conditions.
At older ages, the risk differences drop to as low as 2.5- to 3-fold, mainly because so many people at the older ages have preexisting conditions.
Who’s really at high risk from Covid-19? Suppose a person believes that any fatality rate higher than the seasonal flu’s constitutes “high risk.” The seasonal flu’s fatality rate is about 1 in 750 [source]. By that definition, healthy people age 61 and younger are not at high risk, and neither are people with preexisting conditions age 26 and younger. People over age 61 need to be careful, and so do people over the age of 26 who have preexisting conditions.
Setting “high risk” to be the same as the seasonal flu is conservative. If your risk tolerance is different, you will make a different assessment. The purpose of this series has been to help you to do that.
This is Part 5 of a 5-part series on the fatality rates of Covid-19:
- Part 1: Establishing an overall base IFR for the US
- Part 2: Age-based IFRs
- Part 3: Variation in IFRs across states and countries based on demographics
- Part 4: IFRs by individual year of age
- Part 5: IFRs with and without comorbidities
The analysis in this article is limited by several factors:
- Data on the prevalence of comorbidities at older ages is combined from multiple sources. More accurate data on the presence of comorbidities would improve the accuracy of the numbers in Figures 4 and 5.
- Data on comorbidity at younger ages is estimated. Again, better data in this area would make more accurate estimates of age-based risk possible.
- Differences in IFRs at ages younger than 31 are entirely due to age-based IFRs. The percentage of population with comorbidities is treated as the same for all ages under 31 for reasons described in the Notes section.
- Data on the exact percentage of Covid-19 deaths involving comorbidities is not completely consistent. The CDC reports that 94% of Covid-19 deaths involve comorbidities, but New York state reports that the number is 90% [source]. If the real number is 90% rather than 94%, that will not change the risk numbers for people with comorbidities much, but it will increase the risk numbers for people without comorbidities, especially at the younger ages (e.g., at ages under 11, the odds change from 1 in 1,200,000 to 1 in 700,000).
- Calculating individual year of age factors using interpolation introduces possible error at individual ages (though less than is introduced by using the average for a wide age band).
- This analysis has not accounted for differences by gender. Gender appears to be a potentially significant factor, especially because far more men than women have died from Covid-19, even though more women than men have preexisting conditions.
Despite these disclaimers, I believe the specific numbers listed in Figures 4 and 5 will provide individuals with a better general idea of their personal risk factors than a whole-population IFR does. If you know of better data sources, I welcome pointers to them.
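The sensitivity to the 94% versus 90% figure mentioned in the limitations above can be checked with a few lines of Python. This sketch assumes the relationship described in the Notes section, with the overall IFR and the prevalence of comorbidities held fixed, so that only the share of deaths involving comorbidities changes. The function names are hypothetical.

```python
def rescale_healthy_one_in_n(one_in_n, old_share, new_share):
    """Rescale a '1 in N' risk for people WITHOUT comorbidities.

    Their IFR is proportional to (1 - share of deaths with comorbidities),
    so N scales by the inverse of that ratio.
    """
    return one_in_n * (1.0 - old_share) / (1.0 - new_share)


def rescale_comorbid_one_in_n(one_in_n, old_share, new_share):
    """Rescale a '1 in N' risk for people WITH comorbidities.

    Their IFR is proportional to the share of deaths with comorbidities.
    """
    return one_in_n * old_share / new_share


# The under-11 figure quoted in the limitations list.
healthy_n = rescale_healthy_one_in_n(1_200_000, old_share=0.94, new_share=0.90)
print(f"without comorbidities: 1 in {healthy_n:,.0f}")

# A hypothetical '1 in 1,000' risk for someone with comorbidities.
comorbid_n = rescale_comorbid_one_in_n(1_000, old_share=0.94, new_share=0.90)
print(f"with comorbidities:    1 in {comorbid_n:,.0f}")
```

The healthy-population figure moves by a factor of 0.10 / 0.06, which is why it shifts so much more than the figure for people with comorbidities, which moves by only 0.90 / 0.94.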
Notes on the Data Transformation Performed on the Prevalence-of-Comorbidity Data
I was surprised at how difficult it was to find usable data on the prevalence of comorbidities. I reviewed four sources for this information (note the differences in age bands across the reports):
- A 2017 CDC report on the prevalence of hypertension by age group [source]. This report uses age bands of 18–39, 40–59, and 60+.
- A 2020 CDC report on the prevalence of diabetes by age group [source]. This report uses age bands of 18–44, 45–64, and 65+.
- A 2018 CDC Survey on health statistics by age, which included coronary heart disease, hypertension, and stroke [source]. This report uses age bands of 18–44, 45–64, 65–74, and 75+. I used this report’s data for hypertension rather than the 2017 report’s, both because this report was more recent and because it provided an additional age band.
- A 2015 CDC report on the percentage of US adults with specific chronic conditions of arthritis, asthma, cancer, cardiovascular disease, COPD, and diabetes [source]. This report uses age bands of 55–64 and 65+.
To calculate the percentage of the population, by age, with comorbidities, I started with the 2015 report, which specifically lists the percentage of people who have various numbers of preexisting conditions, aka comorbidities. That data is shown in Figure N-1.
Ideally this report would have provided data for all ages, but unfortunately it includes data only for people age 55+.
To fill in the data for younger ages, I used the 2018 report on hypertension and heart disease and the 2020 report on diabetes. Combining the data from those reports produced Figure N-2.
Information in the columns for Hypertension, Diabetes, and Coronary Heart Disease is taken directly from the reports. I calculated the numbers in the right-most column, Estimate for 1 or More, as if the diseases were independent random events.
Under that assumption, the probability of having any of the three comorbidities is Prob(a) + Prob(b) + Prob(c) - Prob(a and b) - Prob(b and c) - Prob(a and c) + Prob(a and b and c), where each joint probability is the product of the individual probabilities. This calculation assumes each of the events is independent of the others, which is not necessarily accurate in this case.
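Here is a minimal Python sketch of that calculation. It treats the three conditions as independent, so each "a and b" term becomes a product, and it cross-checks the result against the simpler "one minus the chance of having none" form. The function names and prevalence values are hypothetical, not numbers from the CDC reports.

```python
def prob_at_least_one(p_a, p_b, p_c):
    """Inclusion-exclusion for three events, assuming they are independent."""
    return (p_a + p_b + p_c
            - p_a * p_b - p_b * p_c - p_a * p_c
            + p_a * p_b * p_c)


def prob_at_least_one_complement(*probs):
    """Same quantity, computed as 1 minus the probability of having none."""
    prob_none = 1.0
    for p in probs:
        prob_none *= (1.0 - p)
    return 1.0 - prob_none


# Hypothetical prevalences for one age band (not taken from the reports).
hypertension, diabetes, heart_disease = 0.20, 0.10, 0.05

estimate = prob_at_least_one(hypertension, diabetes, heart_disease)
check = prob_at_least_one_complement(hypertension, diabetes, heart_disease)
print(f"estimate for 1 or more: {estimate:.3f} (cross-check: {check:.3f})")
```

Because these conditions tend to occur together in practice, the independence assumption likely overstates how many distinct people have at least one of them, which is one reason the absolute numbers need the adjustment described next.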
These calculations should provide meaningful ratios among the age bands, but the absolute numbers are questionable. The solution is to combine the data from Figures N-1 and N-2 to make the data more accurate.
I decided to use this approach:
- Line up the age bands from the two figures so that I have a reference age band that’s the same for both data sets.
- Compare the number I estimated for Estimate for 1 or More in the reference age band to the CDC’s number from the 2015 report for the same age band.
- Adjust my Estimate for 1 or More number for the reference age band so that it matches the CDC’s number for that age band.
- Apply that same adjustment factor to the younger age bands for my Estimates for 1 or More.
Lining Up the Age Bands
This is pretty straightforward, and the calculation would have been easy — except that the four different reports use four different sets of age bands, and none of them line up exactly. That means I needed to use the interpolation approach I described in Part 4 of this series to create comparable age bands.
So that’s what I did. I used the interpolation approach to transform the two data sets so that they used matching age bands, and then I proceeded with steps 1–4. Figure N-3 shows a summary of that work.
Column (a) contains age bands regularized to the common 10-year age bands.
Column (b) contains the age-based factors calculated from Figure N-2’s right-most Estimate for 1 or More column. The only change in these values from Figure N-2 to Column (b) is that the factors have been interpolated to be correct for the revised age bands.
One point that might not be intuitive is the 13.5% factor for ages 20–29. The youngest age band in Figure N-2 for which there is data is 18–44. The population-weighted midpoint of that age band is 30.7. Interpolation does not provide any basis for determining values below the midpoint, so every age below 30.7 is assigned the midpoint value of 13.5%. (This constraint carries through the rest of the calculations used to create Figures 4 and 5, too.)
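The clamping behavior described above can be sketched in Python as follows. The exact interpolation method is the one from Part 4 of this series. This sketch simply assumes linear interpolation between band midpoints, with ages outside the range of midpoints assigned the nearest midpoint's value. Only the 30.7 midpoint and its 13.5% value come from this article; the other midpoints and values are hypothetical.

```python
def interpolate_clamped(age, midpoints, values):
    """Linearly interpolate between band midpoints; clamp outside them."""
    if age <= midpoints[0]:
        return values[0]   # below the youngest midpoint: reuse its value
    if age >= midpoints[-1]:
        return values[-1]  # above the oldest midpoint: reuse its value (assumed)
    for i in range(len(midpoints) - 1):
        x0, x1 = midpoints[i], midpoints[i + 1]
        if x0 <= age <= x1:
            y0, y1 = values[i], values[i + 1]
            return y0 + (y1 - y0) * (age - x0) / (x1 - x0)
    raise ValueError("midpoints must be sorted in increasing order")


midpoints = [30.7, 54.0, 72.0]  # 30.7 is from the text; the rest are hypothetical
values = [0.135, 0.40, 0.70]    # 0.135 is from the text; the rest are hypothetical

for age in (10, 21, 30, 40):
    share = interpolate_clamped(age, midpoints, values)
    print(f"age {age}: {share:.1%}")
```

Every age at or below 30.7 comes back as 13.5%, which is why the estimated prevalence is flat across the youngest ages.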
Column (c) contains the interpolated, age-based factors calculated from Figure N-1’s 1 Chronic Disease column.
That completes Steps 1 and 2 of the process I defined.
Calculating the Adjustment Factor
Column (d) computes the ratios between the two preceding data columns, (b) and (c). The reference age bands (for Step 3) are 60–69 and 70–79. The ratios from those age bands are averaged to produce the adjustment factor of 1.15 that’s used in column (e).
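The adjustment step can be sketched in Python like this. The sketch assumes the factor is the CDC value divided by my independent-events estimate, averaged over the reference bands and then applied multiplicatively to the younger bands. All function names and numbers are hypothetical and chosen only to keep the arithmetic easy to follow. They are not the values in Figure N-3.

```python
def adjustment_factor(estimates, cdc_values, reference_bands):
    """Average, over the reference bands, of CDC value / estimated value."""
    ratios = [cdc_values[band] / estimates[band] for band in reference_bands]
    return sum(ratios) / len(ratios)


def apply_adjustment(estimates, factor, bands):
    """Scale the estimates for the given bands by the adjustment factor."""
    return {band: estimates[band] * factor for band in bands}


# Hypothetical "Estimate for 1 or More" values by age band.
estimates = {"20-29": 0.10, "30-39": 0.15, "60-69": 0.40, "70-79": 0.60}
# Hypothetical CDC values for the reference bands only.
cdc_values = {"60-69": 0.50, "70-79": 0.69}

factor = adjustment_factor(estimates, cdc_values, ["60-69", "70-79"])
adjusted = apply_adjustment(estimates, factor, ["20-29", "30-39"])

print(f"adjustment factor: {factor:.2f}")
for band, share in adjusted.items():
    print(f"{band}: {share:.1%}")
```

Averaging the ratios across two reference bands, rather than relying on one, smooths out some of the noise in either band.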
Calculating the Estimated Percent of Population with Comorbidities
Column (f) contains the resulting estimate of the percentage of population with at least one comorbidity. Ages 60–69, 70–79, and 80+ use the 2015 CDC data directly. Ages 20–29, 30–39, 40–49, and 50–59 use the adjustment factor. There is no data in any of the reports for groups younger than age 18, so the 0–19 age band uses the same percentage as the closest age group, 20–29. This data is shown in Figure 3 in the main article.
Creation of this data was an exercise in approximation. If better data becomes available, future calculations can be modified to use that better data.
Notes on the Algebra Needed to Calculate IFRs With and Without Comorbidities
Figure N-4 summarizes the data elements needed to calculate the IFRs and how we got them (or how we will get them).
The results we need algebra for are represented as ComorbidIFR and HealthyIFR.
Calculation of those factors is based on the fact that total deaths are the sum of the deaths with comorbidities and those without, which is expressed in Equation 1.
For most purposes, it’s easier to work with percentages than with population numbers, so we divide both sides of Equation 1 by TotalPop, which produces Equation 1a.
Of course the 100% is implied, so that simplifies to Equation 1b.
In essence, Equation 1 says that the IFR for the comorbid population times the percentage of the population in that category, plus the IFR for the healthy population times the percentage of the population in that category, must add up to the overall IFR for the whole population.
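Written out from the description above, Equations 1, 1a, and 1b would take roughly the following form. TotalPop, TotalIFR, ComorbidIFR, and HealthyIFR are the names used in the text. ComorbidPop and HealthyPop (and their percentage forms) are names assumed here for the two population groups.

```latex
\begin{align*}
\text{TotalIFR} \times \text{TotalPop}
  &= \text{ComorbidIFR} \times \text{ComorbidPop}
   + \text{HealthyIFR} \times \text{HealthyPop}
  && \text{(1)} \\
\text{TotalIFR} \times 100\%
  &= \text{ComorbidIFR} \times \text{ComorbidPop\%}
   + \text{HealthyIFR} \times \text{HealthyPop\%}
  && \text{(1a)} \\
\text{TotalIFR}
  &= \text{ComorbidIFR} \times \text{ComorbidPop\%}
   + \text{HealthyIFR} \times \text{HealthyPop\%}
  && \text{(1b)}
\end{align*}
```

Here ComorbidPop% = ComorbidPop / TotalPop and HealthyPop% = HealthyPop / TotalPop, so the two percentages sum to 100%.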
We already know the TotalIFR, so we don’t need to solve for that. (The calculation of TotalIFR is explained in Part 1 of this series.)
Equation 2 shows how we rearrange the terms to solve for HealthyIFR.
We still have two unknowns in Equation 2 — both ComorbidIFR and HealthyIFR. We can address that by bringing in Equation 3, which involves only the variables for comorbidity.
This equation basically says there are two ways to calculate the portion of overall IFR contributed by the population with comorbidities.
We can rearrange the terms in Equation 3 to solve for ComorbidIFR, and we know that the number for ComorbidDeaths% is approximately 94%. Those changes give us Equation 4.
With this equation it is possible to calculate the IFR for people with comorbidities using the overall IFR and the percentage of population with comorbidities.
Similarly, we can substitute the results of Equation 4 into Equation 2, which makes it possible to calculate the IFR for people without comorbidities, as shown in Equation 5.
In essence, Equation 5 subtracts the comorbidity part from the total, and that leaves the healthy part.
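For readers who want the algebra in one place, Equations 2 through 5 can be written roughly as follows, going by the descriptions above. ComorbidPop% and HealthyPop% are names assumed here for the percentages of the population with and without comorbidities (so HealthyPop% = 100% − ComorbidPop%). The other names are the ones used in the text.

```latex
\begin{align*}
\text{HealthyIFR}
  &= \frac{\text{TotalIFR} - \text{ComorbidIFR} \times \text{ComorbidPop\%}}
          {\text{HealthyPop\%}}
  && \text{(2)} \\
\text{ComorbidIFR} \times \text{ComorbidPop\%}
  &= \text{TotalIFR} \times \text{ComorbidDeaths\%}
  && \text{(3)} \\
\text{ComorbidIFR}
  &= \frac{\text{TotalIFR} \times 94\%}{\text{ComorbidPop\%}}
  && \text{(4)} \\
\text{HealthyIFR}
  &= \frac{\text{TotalIFR} - \text{TotalIFR} \times 94\%}{\text{HealthyPop\%}}
   = \frac{\text{TotalIFR} \times 6\%}{100\% - \text{ComorbidPop\%}}
  && \text{(5)}
\end{align*}
```

Equation 5 follows from substituting the left side of Equation 3, with ComorbidDeaths% set to 94%, for the ComorbidIFR × ComorbidPop% term in Equation 2.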
More Details on the Covid-19 Information Website
For more US and state-level data, check out my Covid-19 Information website.
I have been focused for 20 years on understanding the data analytics of software development, including quality, productivity, and estimation. The techniques I’ve learned from working with noisy data, bad data, uncertainty, and forecasting all apply to COVID-19.