COVID-19: Are we Testing the Highest Risk Communities?

In our last post, we learned that there are a large number of communities in the United States that are at risk for COVID-19 related morbidity and mortality. These communities are characterized by a large fraction of people who are elderly or have diseases that affect us all including autoimmune disorders, heart disease, asthma, COPD, cancer, kidney disease, and diabetes. Are we testing these communities? In this post, we dive deeper into the risk COVID-19 poses across the U.S. alongside current testing patterns.

The Chinese CDC has been documenting the risk factors for those infected with coronavirus, and they have reported that those with the most severe outcomes (admission to intensive care, being put on a ventilator, or death) often had a co-morbid disease, including cardiovascular disease, kidney disease, diabetes, pulmonary disease, and cancer. Additionally, it appears that older individuals are at significantly greater risk.

At least 60% of the US population has at least one of these health conditions, and 16% of the population is over the age of 65 according to the US Census. We need to protect the health of all, and the social distancing measures currently being put in place by multiple states are instrumental in protecting individuals at risk.

Of the states that report a high rate of positive test results, which communities are at the highest risk, that is, have the largest fraction of people with relevant co-morbidities? Early in the epidemic, we observed an unfortunate explosion of cases in Washington state affecting a vulnerable population, the elderly.

First, let’s examine case reporting and testing results across the states, courtesy of The COVID Tracking Project, an awesome data source and API with state-level COVID-19 testing information.

Of those being tested, how many are positive? The answer to this question gives us a very rough idea of the prevalence of COVID-19 (emphasis on the “very”, because not everyone is being tested).

As of this writing (03/17/20), more than 10-20% of people tested for COVID-19 are positive in states such as New Jersey, Louisiana, Massachusetts, and Pennsylvania.
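To make the calculation concrete, here is a minimal sketch in R of the positivity rate used throughout this post: positive tests divided by the sum of positive and negative tests. The state codes, counts, and the data frame name `toy_tests` are made up for illustration and are not real COVID Tracking Project data. The sketch also computes `total` from the two counts, which is an assumption of the example.

```r
# Hypothetical state-level counts (illustrative only, not real data)
toy_tests <- data.frame(
  state    = c("AA", "BB", "CC"),
  positive = c(30, 5, 12),
  negative = c(120, 95, 28),
  stringsAsFactors = FALSE
)

# Total completed tests and the fraction that came back positive
toy_tests$total         <- toy_tests$positive + toy_tests$negative
toy_tests$covid_percent <- toy_tests$positive / toy_tests$total

# Keep states with at least 100 tests and at least 15% positive
keep <- toy_tests$total >= 100 & toy_tests$covid_percent >= 0.15
high_positivity <- toy_tests[keep, ]
print(high_positivity)
```

The minimum-test threshold matters because a state with very few completed tests can show a high fraction positive purely by chance.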

Let’s now use geo-coded co-morbidity prevalence information from the Centers for Disease Control and Prevention (CDC) (see our blog post describing how to obtain and analyze these data here and our GitHub repo with full code here) to assess what communities within these states are at highest risk.

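library(dplyr)    # joins, grouping, filtering
library(ggplot2)  # plotting
library(ggrepel)  # geom_text_repel(), geom_label_repel()
library(ggthemes) # theme_fivethirtyeight()

# Attach state-level testing counts to each tract and compute the fraction positive
fh_covid_cases <- fh_cities %>% left_join(current_covid, by=c('stateabbr'='state')) %>% mutate(covid_percent = positive / (positive+negative))

# Keep the 3 highest-scoring tracts per state; grouping by covid_percent stands in
# for grouping by state, since every tract in a state shares one value
top3_como_covid <- fh_covid_cases %>% group_by(covid_percent) %>% top_n(3, comorbidity_risk_score) %>% ungroup() %>% filter(covid_percent >= .15, total >= 100)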

p <- ggplot(fh_covid_cases %>% filter(covid_percent >= .15, total >= 100), aes(covid_percent*100, comorbidity_risk_score, size=population_2010))
p <- p + geom_jitter(alpha=0.5, color='pink')
p <- p + geom_text_repel(data=top3_como_covid, aes(covid_percent*100, comorbidity_risk_score, label=placename))
p <- p + geom_label_repel(data=top3_como_covid %>% group_by(stateabbr) %>% top_n(1, comorbidity_risk_score), aes(covid_percent*100, -2, label=stateabbr), size=3)
p <- p + theme_fivethirtyeight() + theme(axis.title = element_text(), legend.position = 'none') + labs(x = 'Percent tested positive in state [min tests of 100]', y = 'Comorbidity Risk Score')

p

Here we plot on the x-axis the percent that tested positive in the state (e.g. New Jersey had greater than 50% of those tested test positive) and on the y-axis the Comorbidity Risk Score. We introduced preliminary work towards computing this COVID-19 Risk Score in our previous post. The score is a function of disease prevalence at the census tract level and includes the prevalence of the following conditions: diabetes, asthma, cancer, stroke, and chronic pulmonary disorder, as well as disease risk factors such as obesity and smoking. We restrict our results to states that have tested at least 100 people and where at least 15% of those tests have been reported as positive (as of March 17, 2020). Each pink point is a census tract, and we label the tracts that have the highest risk scores for each state. A high risk score indicates a high burden of COVID-19 risk factors in the community or census tract. The size of each text label is proportional to the population of the tract.
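The scoring algorithm itself is not reproduced in this post. As a rough illustration only, one simple way to combine several prevalence estimates into a single tract-level score is to standardize each one (z-score) and average them. The tract IDs, prevalence values, column names, and the z-score averaging recipe below are all assumptions made for this sketch, not necessarily the method behind the plot.

```r
# Hypothetical tract-level prevalence estimates in percent (illustrative only)
toy_tracts <- data.frame(
  tract    = c("T1", "T2", "T3", "T4"),
  diabetes = c(8, 12, 15, 9),
  asthma   = c(9, 10, 12, 8),
  obesity  = c(25, 33, 40, 28),
  stringsAsFactors = FALSE
)

# Standardize each prevalence column to mean 0 and standard deviation 1
conditions <- c("diabetes", "asthma", "obesity")
z_scores <- scale(toy_tracts[, conditions])

# Average the standardized values into one score per tract (an assumed recipe)
toy_tracts$toy_risk_score <- rowMeans(z_scores)

# Rank tracts from highest to lowest score
ranked <- toy_tracts[order(-toy_tracts$toy_risk_score), c("tract", "toy_risk_score")]
print(ranked)
```

Under this recipe a score near zero means a tract is close to average across the included conditions, and negative scores are possible.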

For example, in Louisiana, Shreveport has multiple communities that have a large co-morbidity score. As of March 17, 2020, Shreveport, Louisiana also has ~27 reported cases (the state also has 3 deaths associated with COVID-19 as of this writing).

Courtesy Louisiana Department of Public Health. As of 3/17/2020

Another community, Pembroke Pines, sits in the heart of Broward County, Florida (more here). While the state of Florida fortunately has a smaller fraction testing positive for COVID-19 (among all individuals tested, see first figure), Broward County is a crucible of difficult COVID-19 associated co-morbidities.

But first, how much do our assumptions about which diseases are “important” inputs to the COVID-19 risk score determine which communities are highlighted? Second, how “strong” are the signals estimated by the COVID-19 community risk score?

To answer these two questions, many factors come into play, such as (1) the individual diseases we assume are the strongest risk factors for COVID-19 and (2) the algorithm by which we combine the individual disease prevalence estimates. Another hidden threat is lack of appropriate testing of the communities (the signal on the x-axis). We tried to mitigate one source of potential error by excluding smaller communities whose disease prevalence estimates might be “noisy”.
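One way to probe the first factor is a leave-one-out check: recompute the score with each disease dropped in turn and see whether the top-ranked community changes. The sketch below does this on made-up data, assuming a simple recipe (the mean of z-scored prevalences). That recipe, the helper `score_tracts`, and all values are assumptions of this sketch rather than the scoring method used for the plots.

```r
# Hypothetical tract-level prevalence estimates in percent (illustrative only)
toy_tracts <- data.frame(
  tract    = c("T1", "T2", "T3", "T4"),
  diabetes = c(8, 12, 15, 9),
  asthma   = c(9, 14, 10, 8),
  obesity  = c(25, 33, 40, 28),
  stringsAsFactors = FALSE
)
conditions <- c("diabetes", "asthma", "obesity")

# Assumed scoring recipe: mean of z-scored prevalences across the given columns
score_tracts <- function(df, cols) {
  rowMeans(scale(df[, cols, drop = FALSE]))
}

# Top tract using all conditions
top_all <- toy_tracts$tract[which.max(score_tracts(toy_tracts, conditions))]

# Top tract after leaving out each condition in turn
top_without <- sapply(conditions, function(dropped) {
  kept <- setdiff(conditions, dropped)
  toy_tracts$tract[which.max(score_tracts(toy_tracts, kept))]
})

print(top_all)
print(top_without)
```

If the top tract changes when a single condition is dropped, the ranking is sensitive to that condition, and the choice to include it deserves scrutiny.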

A way to dissect the signal for the COVID-19 risk score is to examine the contribution of each of the diseases, and re-highlight top communities by their disease-specific score:

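# Attach place names, population, and state testing results to the long-format disease scores
fh_covid_long <- fh_covid_long %>% left_join(fh_covid_cases_sens %>% select(fips_place_tract, placename, stateabbr, population_2010, covid_percent, total) %>% unique(), by='fips_place_tract')

# Keep only the text before the first underscore as the lower-case disease name
fh_covid_long$disease <- tolower(unlist(lapply(strsplit(fh_covid_long$disease, "_"), function(arr) {arr[[1]]})))

# For each state (proxied by covid_percent) and disease, keep the highest-scoring tract
top_disease_covid <- fh_covid_long %>% group_by(covid_percent, disease) %>% top_n(1, score) %>% ungroup() %>% filter(covid_percent >= .15, total >= 100)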

p <- ggplot(fh_covid_long %>% filter(covid_percent >= .15, total >= 100), aes(covid_percent*100, score, size=population_2010))
p <- p + geom_jitter(alpha=0.5, color='pink')
p <- p + geom_label_repel(data=fh_covid_long %>% group_by(stateabbr, disease) %>% top_n(1, score) %>% ungroup() %>% filter(covid_percent >= .15, total >= 100), aes(covid_percent*100, 0, label=stateabbr), size=3)
p <- p + facet_wrap(.~disease, nrow=2)
p <- p + geom_text_repel(data=top_disease_covid, aes(covid_percent*100, score, label=placename))
p <- p + theme_fivethirtyeight() + theme(axis.title = element_text(), legend.position = 'none') + labs(x = 'Percent tested positive in state [min tests of 100]', y = 'Disease-Specific Risk Score')

p

The plot above shows a “disease-specific risk score”; again, the higher the score, the higher the prevalence of a candidate COVID-19 risk factor (e.g., cancer, chronic asthma, heart disease, and diabetes). Comparing this plot to the previous one, we can see that the disease-specific risk score highlights different (and similar!) communities.
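A simple way to compare the communities highlighted by two scores is with set operations on their top-ranked lists. The sketch below uses placeholder community names and hypothetical vectors (`top_composite`, `top_diabetes`); it is not drawn from the actual results.

```r
# Hypothetical top-ranked communities under each score (placeholder names)
top_composite <- c("Community A", "Community B", "Community C")
top_diabetes  <- c("Community B", "Community C", "Community D")

# Communities flagged by both scores
shared <- intersect(top_composite, top_diabetes)

# Communities flagged only by the composite score, or only by the diabetes score
only_composite <- setdiff(top_composite, top_diabetes)
only_diabetes  <- setdiff(top_diabetes, top_composite)

print(shared)
print(only_composite)
print(only_diabetes)
```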

We examine the similarities between the COVID-19 risk communities and specific disease: 

And differences:

It is clear that a priority for the United States is to “flatten the curve” to prevent the spread of infection in vulnerable communities such as those that we highlight above. Furthermore, we need to make tests accessible and develop data-driven approaches of identifying pockets of highest risk to mitigate the burden of COVID-19.




Written by Chirag Patel and Arjun Manrai on March 17, 2020