Using Social Determinants of Health to Fight COVID-19 and Get the Economy Back on Its Feet
By Wayne Gearey, PhD, and Rob Sentz
Graphic design and editing by Hannah Grieser, Levi Law, and Gwen Burrow
Publish Date: May 20, 2020
The data in this paper is at the MSA (metropolitan statistical area) level. If you would like to see a county-level analysis, please contact Rob Sentz: firstname.lastname@example.org.
In this paper, we use data on the social determinants of health, drawing on the rich labor market and demographic data available across the United States, to better explain why certain regions have seen a bigger impact from COVID-19 while others have gone relatively unscathed.
Emsi has developed a Health Risk Index in order to better understand where and why COVID-19 has spread, and to help policymakers create localized responses to this (or any future) virus. The index accounts for the key factors that have given the coronavirus a stronger foothold in some regions while other places have been largely spared. We believe the Health Risk Index will help communities use data to make better decisions. This data-driven approach can help leaders achieve the following:
To do this work, leaders need comprehensive, interdisciplinary data—at the local level—from social scientists, epidemiologists, experts in logistics, data scientists, and economists. Such data will help us make better decisions that combat the virus, allow Americans to keep their jobs and support their families, and keep civil liberties intact.
It is for this reason that Emsi created the Health Risk Index.
The index features data on the social determinants of health, which we believe provide key insights to help us understand which geographical areas are prime targets for COVID-19 and which areas are not. Once we can differentiate between likely hotspots for the virus (like New York City) and lower-risk areas (like San Jose), we can create corresponding strategies.
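The index features four notable risk factors, each of which varies from community to community across America: preconditions, population density, workplace interaction, and population health.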
The truth is, multiple factors have shaped the spread of the virus, and these factors vary from place to place. Not every region shares the same risks. Therefore, not every region should employ the same response.
When COVID-19 began taking the world by storm, initial predictions were alarming. Medical experts knew very little about the virus or how it would affect various populations and communities. What we did know was that the virus was taking a severe toll in some parts of the world, and nobody wanted their town to become the next Wuhan or Lombardy.
So in the absence of solid data, we relied on predictive models, which projected overburdened hospitals and millions of fatalities. These sobering projections—combined with uncertainty about the nature of the virus—bred a great deal of fear and anxiety among medical experts, elected officials, media, business leaders, family, and friends.
The result: a steep downturn in the US economy. Sweeping policies at the state and national levels placed all of America—urban, suburban, and rural alike—under the same orders as hard-hit centers like New York City.
The virus has now claimed many victims: those who have lost lives and loved ones, those who have lost livelihoods, and those suffering from mental illness, domestic violence, suicidal thoughts, and hunger. Since March 14, more than 36 million jobless claims have been filed. Mental health and suicide hotlines are seeing huge spikes in traffic, doctors and police are reporting more cases of domestic violence, and nearly 23% of households say they lack money to get enough food.
And so we must ask ourselves: Is there a more targeted, data-driven way to respond to the pandemic? Can we address concerns about the virus and the economy at the same time?
The answer to both these questions is yes. The Health Risk Index shows where the virus is likely to hit hard and key areas to focus on to limit the spread. It also uses the four risk factors listed above to explain why the virus affects regions as differently as it does individuals.
While we haven’t learned everything we need to know about COVID-19, we do know more today than we did two months ago:
To illustrate this last point, let’s compare Sweden and Michigan, each with a population of roughly 10 million. Both have made headlines: Sweden for keeping its economy open, Michigan for its extensive lockdown. We would expect the two to have very different results, but they don’t.
As of May 13, Sweden has 28,582 reported cases and 3,529 deaths, which amounts to roughly 2,858 cases and 353 deaths per 1M people. Michigan currently has 49,391 reported cases (about 1.7 times Sweden’s count) and 4,714 deaths, which amounts to roughly 4,939 cases and 471 deaths per 1M. Given Michigan’s strict policies, we might expect the state’s numbers to be significantly lower than Sweden’s. However, that simply isn’t the case.
So here we have two regions of very similar size, one with a strict policy and the other with a relaxed policy, yet the stricter policy has not produced better results.
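The per-capita figures for Sweden and Michigan come from dividing each count by population and scaling to one million residents. A minimal Python sketch of that arithmetic, assuming (as the text does) populations of exactly 10 million for both; `per_million` is our own illustrative helper, not part of the Health Risk Index. Sweden’s deaths work out to 352.9 per million before rounding.

```python
def per_million(count, population):
    """Convert a raw count into a rate per 1,000,000 residents."""
    return count / population * 1_000_000


# Counts as of May 13, 2020, as reported in the text. Both populations are
# rounded to 10 million, matching the text's arithmetic.
regions = {
    "Sweden": {"cases": 28_582, "deaths": 3_529, "population": 10_000_000},
    "Michigan": {"cases": 49_391, "deaths": 4_714, "population": 10_000_000},
}

for name, r in regions.items():
    cases = per_million(r["cases"], r["population"])
    deaths = per_million(r["deaths"], r["population"])
    print(f"{name}: {cases:,.0f} cases and {deaths:,.0f} deaths per 1M")

# Michigan's reported case count relative to Sweden's
ratio = regions["Michigan"]["cases"] / regions["Sweden"]["cases"]
print(f"Michigan/Sweden case ratio: {ratio:.2f}")
```

Running it prints 2,858 cases and 353 deaths per 1M for Sweden, 4,939 cases and 471 deaths per 1M for Michigan, and a case ratio of 1.73.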
By contrast, California and New York state have very similar policies, yet divergent outcomes. If we compare California (population 40M) to New York (population 20M), basic math and logic would suggest that California, with twice the population of New York, should have twice the cases and deaths. But this has not happened. New York has had 350,848 cases and 27,290 deaths compared to California’s 73,172 cases and 2,974 deaths. That amounts to more than 18,000 cases (and 1,400 deaths) per 1M in New York, but only 1,852 cases (and 75 deaths) per 1M in California.
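The “basic math” expectation can be made explicit by comparing the California-to-New York ratio we would see if outcomes scaled with population against the ratios actually reported. This is a minimal sketch of the arithmetic only, using the rounded populations from the text (40M and 20M); it is not part of the index itself.

```python
# Reported totals from the text. Populations are the rounded figures the
# text uses (California ~40M, New York ~20M), so ratios are approximate.
california = {"cases": 73_172, "deaths": 2_974, "population": 40_000_000}
new_york = {"cases": 350_848, "deaths": 27_290, "population": 20_000_000}

# If outcomes scaled with population, California/New York would equal this.
expected_ratio = california["population"] / new_york["population"]

# What actually happened, expressed as California/New York ratios.
case_ratio = california["cases"] / new_york["cases"]
death_ratio = california["deaths"] / new_york["deaths"]

print(f"expected ratio if proportional: {expected_ratio:.2f}")
print(f"observed case ratio:  {case_ratio:.2f}")
print(f"observed death ratio: {death_ratio:.2f}")
```

Instead of the expected 2.00, California ends up with roughly 0.21 times New York’s cases and 0.11 times its deaths.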
In our first example we have two places with different policies and similar outcomes. In our second example we have two places with the same policy and very different outcomes. Why?
Both of these examples lead us to believe that we need more data to explain why the virus has impacted certain regions more than others. This is why we would like to make a case for a more targeted approach to the virus. By narrowing our focus to the city and county level, and by exploring the local underlying health indicators, we can better understand the widely varied impact of COVID-19, assess the risks, and customize the strategies for any region.
The Health Risk Index allows us to assess the risk of COVID-19 by region, and then model where the virus will likely have the biggest impact. This analysis is based on the four population health indicators listed above.
We combined these four factors to create an overall risk index. Risk indices range from 0 to 1. A low-risk index score, for example, might be 0.1, while a high-risk index score could be 0.8.
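The paper does not describe how the four factors are combined into a single 0-to-1 score. One common approach, shown here purely as an illustration, is to rescale each factor across regions with min-max normalization and then take a weighted average. Equal weights, min-max scaling, and the factor key names below are all our assumptions, not Emsi’s documented method, so this sketch will not reproduce the published scores.

```python
FACTORS = (
    "preconditions",
    "population_density",
    "workplace_interaction",
    "population_health",
)


def min_max(values):
    """Rescale numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(regions, weights=None):
    """Combine raw factor scores into a 0-1 risk score per region.

    regions: {region_name: {factor_name: raw_score}}, where a higher raw
    score means higher risk (an assumption of this sketch).
    weights: optional {factor_name: weight}; defaults to equal weights.
    """
    names = list(regions)
    if weights is None:
        weights = {factor: 1.0 for factor in FACTORS}
    total_weight = sum(weights[factor] for factor in FACTORS)
    scores = {name: 0.0 for name in names}
    for factor in FACTORS:
        # ASSUMPTION: each factor is min-max scaled across the regions supplied.
        scaled = min_max([regions[name][factor] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weights[factor] * value
    # Dividing by the total weight keeps every score between 0 and 1.
    return {name: score / total_weight for name, score in scores.items()}
```

Because scaling happens across whatever regions are passed in, running the function separately on Tier 1 and Tier 2 MSAs would keep each score relative to similarly sized peers; a different emphasis only requires passing a `weights` dictionary.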
These risk factors provide a detailed, accurate description of the spread of the virus. Our belief is that such data can identify which regions are:
We applied these risk factors to the entire US in two steps:
First, we viewed the data at the city or metro level instead of the state level, which is where most policy is being made but which doesn’t provide enough granularity. We separated cities into Tier 1 markets (metro areas with a labor force over 1M) and Tier 2 markets (smaller metros with labor forces between 100K and 1M) so we could compare cities with cities of similar size. Apples to apples.
Second, we combined data from multiple sources to calculate risk, then visualized the results on a map to identify the prevalence of each risk factor by region.
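The tiering step can be sketched as a classification by labor force followed by a ranking within each tier, so that each city is compared only with peers of similar size. `market_tier` and `rank_within_tiers` are hypothetical helpers. We assume a Tier 2 floor of 100K (the range given for Tier 2 cities later in the paper), treat a labor force of exactly 1M as Tier 2, and label everything below 100K as rural; the paper doesn’t say how boundary cases are handled.

```python
TIER_1_FLOOR = 1_000_000  # Tier 1: "labor force over 1M"
TIER_2_FLOOR = 100_000    # ASSUMPTION: Tier 2 starts at 100K


def market_tier(labor_force):
    """Assign a market tier from labor force size (hypothetical helper)."""
    if labor_force > TIER_1_FLOOR:
        return "Tier 1"
    if labor_force >= TIER_2_FLOOR:
        return "Tier 2"
    return "Rural"


def rank_within_tiers(msas):
    """Group MSAs by tier, then sort each group by risk index, highest first.

    msas: iterable of (name, labor_force, risk_index) tuples.
    """
    tiers = {}
    for name, labor_force, risk in msas:
        tiers.setdefault(market_tier(labor_force), []).append((name, risk))
    for members in tiers.values():
        members.sort(key=lambda item: item[1], reverse=True)
    return tiers
```

Ranking only within a tier keeps a small metro from being judged against New York simply because of size.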
NOTE: For the purpose of this analysis, we conducted our research at the MSA (metropolitan statistical area) level. If you would like to discuss county level analysis, please contact us.
The result is a risk index that bears a very tight correlation to the real-life COVID-19 impact, in terms of both cases and deaths. The next two charts portray actual COVID-19 cases or deaths. The Health Risk Index very closely predicts the number of cases or deaths per city, based on that city’s unique risk factors. For instance, the risk index accurately predicts that NYC would have the most significant trouble with COVID-19, whereas the impact in cities like San Jose and even Orlando would be much less significant.
*For more on this, see our methodology.
The US map below displays the number of cases per 100K people, with red indicating the highest incidence rates. Notice how the East Coast is a hotbed for the virus, while the incidence rate diminishes further west.
Note: Numbers are approximate because they are based on COVID-19 cases that are updated daily. These numbers are current as of May 13, 2020.
Now let’s take a closer look at specific MSAs.
As the first chart illustrates, NYC (with a risk index of 0.94) ranks the highest, followed by Los Angeles (0.78) and, significantly lower, Philadelphia (0.66). For the sake of comparison, we also included two medium-high risk cities (Dallas, 0.53, and Las Vegas, 0.36), and a low-risk city (San Jose, 0.29).
Note that cities with three or four high-risk scores have also seen worse outbreaks in real life. A possible exception is Los Angeles, which hasn’t been as ravaged as East Coast cities (likely because its population has fewer preconditions).
In contrast, cities with just one higher-risk factor have largely seen minimal impact. For instance, San Jose, which has avoided an extreme outbreak, is high risk only for workplace interaction. And because many Bay Area tech companies quickly shifted to working from home, the chances of spread in the workplace fell even further.
The point here is to illustrate the huge variation between large cities. To highlight this, we will discuss the key differences between NYC and San Jose further down.
For Tier 2 cities, we again showcase a representative few. Milwaukee (0.81) and Trenton (0.71) are among the most at risk, largely because of high risk in population health and population density. New Orleans’ high risk score (0.69) sadly corresponds with the scene on the ground: by mid-March, the Big Easy was outstripping New York in cases per capita. The city has high risk for preconditions, population density, and workplace interaction. These three factors, combined with the seedbed event of Mardi Gras, likely turbocharged the overall impact of the virus.
Note that Greeley, Colorado, with low overall risk and only moderate risk for workplace interaction (0.36), has seen a greater impact than a place like Boise, Idaho. Boise scores higher than Greeley on our Health Risk Index because of population density and preconditions, yet it has escaped the heavier COVID-19 impact seen in Greeley, where the JBS meat-processing plant had 245 confirmed cases among its 6,000 employees. This outbreak has made Colorado the national leader in COVID-19 deaths connected to meat-processing plants. We will elaborate on these factors in the section on rural areas below.
The New York MSA has been hit harder than any region in the US. Why? Because it was a petri dish for the virus on all fronts.
Forbes recently reported that incidence rates (cases per 100,000 people) are the best publicly reported metric we have for estimating risk of exposure. With more than 2,000 confirmed cases for every 100,000 people in NYC, the average New Yorker is more likely to come into contact with an infected person than someone in San Jose, for example, where the incidence rate is only 117 cases per 100,000 people.
Add to this an aging population (including many people with high-risk preconditions), high-density working conditions, a teeming public transit system, and dense population, and the conditions in NYC were right for a perfect storm. The incidence rates we see there reflect this:
Let’s walk through the four risk factors and contrast NYC (a high-risk Tier 1 city) with San Jose (a low-risk Tier 1 city) to illustrate how the Big Apple was set up to be the nation's COVID-19 epicenter, and why San Jose was spared the same.
The first risk factor is preconditions. According to the World Health Organization, the preconditions for COVID-19 include underlying medical conditions such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer. For our preconditions index, we modeled region-specific data on the number of people who did any of the following within the past year:
We discovered a strong correlation between the preconditions index and known COVID-19 incidence rates, cases, and deaths. This correlation is painfully obvious in NYC. San Jose, on the other hand, doesn’t have nearly the concentration of underlying medical conditions in its residents.
Future mitigation: In the short term, regions can do very little to fix health preconditions. However, this data indicates that fostering healthier lifestyles is a key long-term preventative measure against a virus like COVID-19.
The second risk factor is population density. High-density environments—cruise ships, public transportation, multi-family living arrangements—can make a population more vulnerable to epidemics because of the frequent contact between people. We see a strong correlation between population density and COVID-19 incidence rates. NYC has over 26,000 people per square mile, making it the densest Tier 1 city in the US. San Jose, on the other hand, has 5,823 people per square mile—less than a fourth the density of NYC. Populations that are more spread out tend to reduce this risk factor.
Future mitigation: Cities with high population density should focus on protecting dense, high-traffic areas. They can reduce the number of passengers allowed at one time on public transportation, limit lines, regularly sanitize areas, and encourage travelers to use personal protective equipment (PPE).
The third risk factor is workplace interaction. The number of workplace interactions (which is based on industry-specific workplace density) correlates strongly with viral spread. NYC has numerous industries with high-density work environments, so its workplace interaction index is high. While San Jose also has its share of high-density work environments, it has significantly fewer than NYC, so its risk is moderate.
Future mitigation: Employers should adjust business operations to reduce risk to employees. They can divide shared work spaces to minimize interaction, transition employees to remote work, and relocate crowded activities to more open areas. This will be more challenging for certain industries (such as food processing) that depend on many employees working in close proximity.
The fourth risk factor is population health. A population’s health risk rises primarily with age, but it is also shaped by education level and income. As a result, vulnerable populations could suffer disproportionately from the spread of COVID-19. NYC has an extremely high population health risk, while San Jose does not.
Future mitigation: Nursing homes especially are a prime target for COVID-19. The New York Times found that residents and workers in nursing homes account for 33% of all coronavirus deaths in the US. Another study found that in 14 states, that number increases to 50%. Regions with a high concentration of nursing homes and a significant elderly population should therefore create preventative measures to protect nursing homes and elder care facilities.
Interestingly, not one but two Florida cities rank in our top 10 Tier 1 MSAs in terms of risk: Tampa (0.56) and Miami (0.54). In March, Florida looked primed to become the next NYC with its popular beaches, tourist attractions (especially over spring break), and the country’s greatest share of elderly residents: 3.5 million aged 65 and older. Yet, despite all the makings for a tragedy, the Sunshine State has been largely spared. Why?
The answer is that Florida has a few distinct advantages over NYC: a far less dense population, less workplace interaction (many residents are retired), and, despite the age of its residents, fewer health issues than NYC. Single-family homes are also far more common in Florida. In fact, the owner-occupied housing rate in Florida is 65%, while in NYC it’s about 33%. And only 4% of Floridians use public transportation to get to work, as opposed to 56% in NYC.
In addition, Florida’s actions to limit the spread through the state’s many nursing care facilities, its highest-risk sector, have likely contributed heavily to suppressing the kind of outbreaks we have seen in the Northeast. These factors have undoubtedly helped Florida defy the predictions and become an overall success story.
Only 20% of all COVID-19 cases and deaths in the US have occurred in America’s Tier 2 cities (those with labor forces between 100K and 1M), yet in some of these cities the coronavirus has had a bigger impact. Notice how many of the cities in the chart below have a high overall risk index, with high risk scores for two or three of the four factors.
Throughout the outbreak, it has been easier to feel safe in rural America. After all, only 10% of COVID-19 cases and 7% of deaths in the US are in communities with a labor force of less than 100,000, where social distancing is the norm, neighbors are spread out, and public transportation is almost non-existent.
But there are a few notable examples of more significant rural outbreaks. For instance, Albany, Georgia (population 75,000) had the fourth-worst outbreak per capita in the US. The culprit: two funerals. Even Albany’s relatively remote location (40 miles from the nearest interstate and three hours from Atlanta) did not shield it from the virus. And Blaine County, Idaho, home to the world-famous Sun Valley ski resort, was hit harder than any other county in the state. It’s a tourist destination: every year from December to March, roughly 30,000 visitors pour through the area.
Though these two examples are largely outliers, we did find that the greatest risk factor for rural America is workplace interaction. This is because rural communities often rely on industries like food processing, which have higher density work environments. And unlike tech employees in Silicon Valley, these employees can’t work from home. Examples of this include:
So, while it is true that the overall risk in rural America is significantly lower than in urban America, rural communities should still look for ways to limit or ameliorate high-density work situations.
During this (or any future) viral outbreak, no location or person is ever entirely immune from risk. What we can do is use data to respond intelligently. Using risk analysis like the one provided here, we can understand and reduce the particular risk of COVID-19 (or any virus) to our most vulnerable cities, workplaces, and people.
Our goal is for community leaders to use this risk analysis to forge local, customized strategies that preserve lives, livelihoods, and liberty.
Risk analysis, like the one we portray here, tells you just that: risk. It is a great guide that points leaders to those aspects in their community that deserve more attention. But it does not promise or prophesy. Some low-risk communities, like Albany, got walloped. Some high-risk communities, like Orlando, escaped relatively unscathed. However, these outliers do not undermine our understanding of regional risk, and additional local data (about high-density events or targeted policies to protect at-risk people) can generally explain these variances.
If your community has a high risk index, use this data to focus your efforts where the risk is greatest. Conversely, if your community has a low risk index, use the data to remain sane and savvy so that you don’t underreact—or overreact.
When the virus first hit, America’s perceived risk was high across the board. We had no data to the contrary. But now, COVID-19 has manifestly affected people and communities very differently. Now that we have this data, we can increasingly treat this the way we would treat any virus in the body. Like a good immune system, we should isolate and attack the disease while preserving healthy cells (or people) in the process.
To this end, wherever possible, we should attempt to develop strategies at the regional or city level, not necessarily at the state or national level. Clearly, the virus does not care about state lines, as we see in the New York MSA, which spans parts of Connecticut, New Jersey, and Pennsylvania. With the virus running rampant throughout, the metro area would benefit from a regional plan rather than separate state plans.
To create local, custom-made responses, we need data from as many fields of study as possible. We need all hands on deck: virologists, epidemiologists, economists, sociologists, and others. Why? Because data from multiple perspectives provides nuance and helps us craft fine-tuned solutions for each community, rather than a blunt-force, one-size-fits-all reaction that will help some but disadvantage others. As we develop this index, we would also welcome feedback and suggestions.
As Emsi CEO Andrew Crapuchettes pointed out, the COVID-19 crisis and the great economic shutdown are like a tsunami. Taking shelter can be a temporary response, but sooner or later the water will recede and we’ll be cleaning up a huge mess. How can we be ready?
Data is critical, but just as critical is teamwork. Here’s our vision for how we can use the risk analysis in this paper to work together:
State and local leaders can use this risk analysis to understand their biggest risk factors and develop fine-tuned strategies that limit risk for the most vulnerable places and people. This can also help planners determine future potential hotspots, so that reemployment efforts aren’t derailed by new outbreaks.
The pattern of the coronavirus spread makes it clear: not every hospital needs to devote every resource to fighting COVID-19. Regions can allocate hospital staff and adjust requirements for protective gear and medical devices based on their own unique level of risk. This will ensure that hospitals serving major hotspots (like NYC, New Orleans, and Albany) don’t run out of critical equipment, while simultaneously ensuring that hospitals not in virus hotspots don’t shut down due to lack of business. It will also save the lives of Americans suffering from non-COVID illnesses by allowing them to get the treatment they need and by keeping more hospitals open.
To get employees back to work, businesses need to evaluate their locations, the level of social interaction in the workplace, and the health of their workforce. A recent poll indicates that 54% of US employees are concerned about exposure to COVID-19 at work, but 71% also feel confident their employer can manage the back-to-work transition safely. For employees to feel safe, they need to know their employer has taken measured steps to mitigate viral outbreaks: making PPE and hand sanitizer available, requiring symptomatic employees to stay home, and the like.
The Health Risk Index can also help individual Americans make better decisions, minimizing the need for the government to step in. People should know the data, weigh the risks, and make smarter decisions for themselves.
The goal of this research and data is to inform leaders and to connect them to better strategies that will help them:
If you would like to learn more about the Health Risk Index, please contact us. For the sake of simplicity, we limited our discussion to MSA-level data in this paper, but if you are interested in the same data at the county level, we would love to discuss it with you. Contact Rob Sentz for more information: email@example.com.
In this study, we used government census data, market research data, and workforce data to determine the risk and impact of COVID-19 in over 200 MSAs (metropolitan statistical areas) in the United States.
Our methodology incorporates both a data model and a computational model to create a unique index that can predict risk regarding the impact of a virus on a geographical region. Additionally, we used correlation as a validity check.
The data model uses four key risk factors for COVID-19: preconditions, population density, workplace interaction, and population health.
These factors are typically used in community health assessment and location economics. While a number of peer-reviewed studies have examined these factors, few have combined them in an interdisciplinary approach as we do here.
The primary data sources used to create the index are Emsi’s labor market analytics (a comprehensive set of state and federal labor market sources covering industries, occupations, and demographics), O*NET, Johns Hopkins, ESRI, and Synergos.
The computational model analyzes and visualizes the data to show where risk is geographically concentrated. To reveal differences in the distribution of the results, we classified the numeric data using the Jenks natural breaks method (Jenks 1967), which groups values so that differences within each class are minimized and differences between classes are maximized. The results are then attached to several levels of US Census geography, such as MSA and county, to reveal geographic patterns.
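Jenks natural breaks chooses the class boundaries that minimize the total squared deviation of values from their class means. The paper does not say which implementation it used, so the sketch below is a from-scratch version of the exact dynamic-programming formulation (often called Fisher-Jenks); `jenks_breaks` is our own illustrative name, and it returns each class’s upper bound in ascending order.

```python
def jenks_breaks(values, n_classes):
    """Jenks natural breaks via exact dynamic programming (Fisher-Jenks).

    Returns the upper bound of each class, in ascending order. The chosen
    classes minimize the total within-class sum of squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared-deviation cost of any run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total SSD for splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index where the last of those k classes begins.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back from the full data set to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]
```

As an illustration, applying it to the six Tier 1 scores quoted earlier (0.94, 0.78, 0.66, 0.53, 0.36, 0.29) with three classes returns [0.36, 0.66, 0.94]: San Jose and Las Vegas fall in the lowest class, Philadelphia and Dallas in the middle, and Los Angeles and NYC at the top. The triple loop runs in O(k·n²) time, which is fast enough for a few hundred MSAs.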
To validate the results of our computational model, we analyzed the correlation between the index and the number of COVID-19 cases by geography. More specifically, we used a Pearson correlation tool, which computes the Pearson correlation coefficient (typically denoted r) to measure the strength of the linear relationship between two variables.
We used correlation to validate the prediction of the impact of the virus by market. To do this, we examined the relationship between 1) the Emsi Health Risk Index and 2) COVID-19 indicators (including incidence rates, deaths, and confirmed cases) by MSA.
Correlation, in the broadest sense, is a measure of association between variables. Correlation coefficients range from -1.00 to 1.00. A change in one variable is associated with a change in another variable, either in the same direction (positive correlation) or in the opposite direction (negative correlation).
The absolute value (magnitude) of the coefficient indicates strength: the closer it is to 1.00, the stronger the relationship.
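For reference, Pearson’s r divides the covariance of two variables by the product of their standard deviations. The sketch below computes it from scratch; the paper does not name the specific Pearson correlation tool it used, and `pearson_r` is our own illustrative function. On Python 3.10 or later, the standard library’s `statistics.correlation` returns the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of co-deviations (covariance up to a factor of n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In this setting, `xs` would hold the index scores and `ys` a COVID-19 indicator (such as incidence rates) for the same MSAs in the same order; the sign gives the direction and the absolute value the strength.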
We used two samples of MSAs for this linear comparison: Tier 1 markets and Tier 2 markets.
We categorized the results for correlation coefficients using J. D. Evans’s guide. Strength of correlation can be described as follows:
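Magnitude of correlation (absolute value of r): Description of strength
0.00 to 0.19: Very weak
0.20 to 0.39: Weak
0.40 to 0.59: Moderate
0.60 to 0.79: Strong
0.80 to 1.00: Very strong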
The results below show that the Emsi Health Risk Index has a strong positive relationship to the spread of COVID-19 for predicting Tier 1 market risk, and a moderate positive relationship for predicting Tier 2 market risk.
Additionally, we calculated a correlation coefficient for the 866 counties associated with the Tier 1 and Tier 2 markets and found a moderate positive relationship (over 0.5) at the county level. This is significant: even when we applied the model to a larger sample of smaller geographic units, it still showed a meaningful relationship in predicting risk.
Finally, we monitored the model over time, testing the relationship as COVID-19 cases increased. Over that period, the correlation coefficient for confirmed cases has risen.
This methodology shows the importance of an interdisciplinary approach for understanding risk in communities at both a macro and micro level. Below are two examples of literature that support the creation of a risk index:
SVP, Data Science, Emsi
Dr. Wayne Gearey is a location economist with a background in social epidemiology and serves as the senior vice president of data science at Emsi, where he utilizes spatial data and technology to contextualize global markets and submarkets to produce strategic opportunities for clients. Dr. Gearey previously worked at JLL as a global strategist in Location Economics, and developed a network of regional data scientists to connect the firm on Business Intelligence. Dr. Gearey has worked with Amazon, Hitachi, GoPro, Comcast, OOCL, Yahoo, Children's Healthcare of Atlanta, Brinks, BP, Conifer, Twitter, United Healthcare, ANZ Bank, Microsoft, MetLife, CODA, Whirlpool, Shell, ADP, and CustomInk. He currently serves as a professor of Location Economics at the University of Texas at Dallas, where he also sits on the dean’s advisory board. Dr. Gearey holds a PhD in social science from Coventry University in Coventry, UK, a Master of Science degree in data science from the University of Salford in Manchester, UK, a post-graduate diploma in GIS technology from Simon Fraser University in Canada, and a bachelor’s in political science from the University of Calgary. Additionally, Dr. Gearey completed post-graduate work in social epidemiology at Johns Hopkins University.
Rob Sentz is the chief innovation officer at Emsi, where he leads in furthering Emsi’s vision and research around labor market data and its application in higher education, economic development, workforce development, and talent acquisition. Rob also heads up Emsi’s content creation to create meaning out of labor market data and to help a broad array of audiences improve the way they connect people, economies, and work. For the past 14 years, Rob has created newsletters, videos, articles, and white papers, and led courses on a wide range of economic and labor market topics. Under his leadership, Emsi data has become regularly featured by national news outlets. Rob is a Forbes contributor and has also taught as an adjunct faculty member at the NYU School of Professional Studies. He holds bachelor’s and master’s degrees in environmental science.
1. Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", International Yearbook of Cartography 7: 186-190.