Impacts of the Coronavirus on Various Demographics in Maryland from 04/2020 to 11/2021

Thomas J. Garbe O'Connor
University of Maryland, College Park - Computer Science
12/20/2021

Introduction

The global pandemic caused by the Coronavirus, known as COVID-19, has had numerous profound impacts on people around the world. It has also led to a large number of lives lost in the months since the spring of 2020. Using data provided by the state of Maryland in the United States, this project examines how various groups within Maryland have been affected in terms of deaths caused by COVID-19.

Definitions

Probable Case/Death
The CDC provides a definition for "probable cases", stating:

A probable case or death is defined as any one of the following:
• Meets clinical criteria AND epidemiologic linkage with no confirmatory laboratory testing performed for SARS-CoV-2
• Meets presumptive laboratory evidence
• Meets vital records criteria with no confirmatory laboratory evidence for SARS-CoV-2
Any cases and deaths classified as probable are included in CDC case counts. The same applies to any cases and deaths classified as confirmed.

Since probable deaths are included in all CDC counts, they will be included here as well. More information on probable cases can be found at The COVID Tracking Project.

Confirmed Case/Death
As stated by covidtracking.com, a confirmed case or death occurs when the individual's PCR test returns positive. Confirmed deaths will be examined in addition to probable deaths.

About the Data

The data was gathered from the Open Data Portal provided by the state of Maryland. The following datasets were taken from this source and compiled into one dataset:

Probable Death Datasets

  1. MD COVID-19 - Total Probable Deaths Statewide
  2. MD COVID-19 - Total Probable Deaths by Date of Death
  3. MD COVID-19 - Probable Deaths by Gender Distribution
  4. MD COVID-19 - Probable Deaths by Race and Ethnicity Distribution
  5. MD COVID-19 - Probable Deaths by Age Distribution

Confirmed Death Datasets

  1. MD COVID-19 - Total Confirmed Deaths Statewide
  2. MD COVID-19 - Total Confirmed Deaths by Date of Death
  3. MD COVID-19 - Confirmed Deaths by Gender Distribution
  4. MD COVID-19 - Confirmed Deaths by Race and Ethnicity Distribution
  5. MD COVID-19 - Confirmed Deaths by Age Distribution

These datasets were merged on their date, and rows that consisted mostly of missing values were removed. Rows for December 2021 were also removed because only a handful of days had been added to the datasets before the cyberattack on Maryland's Health Department.

Data Retrieval and Cleaning

Each dataset is read from a CSV file provided through a URL, then converted to an appropriately named DataFrame using the Pandas library. Once all of the data is obtained, the datasets are merged together, dropping the unnecessary object ID columns and renaming columns as needed to keep a consistent naming convention. Since all of the numeric, non-date values were imported as floating point numbers, they are converted to integers. Finally, rows missing data are removed, as are the handful of days that came in for December.
This process is done separately for the datasets referring to probable deaths and the datasets referring to confirmed deaths, but a joint dataset is made at the end.
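
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the helper name `clean_and_merge`, the column names (`OBJECTID`, `date`, `male`, `female`, `total`), and the tiny inline tables standing in for the downloaded CSV files are all assumptions, and the values are made up.

```python
import pandas as pd

def clean_and_merge(frames, cutoff="2021-12-01"):
    """Merge per-topic DataFrames on their date column and tidy the result."""
    merged = None
    for frame in frames:
        # Drop the portal's object ID column if present (column name assumed).
        frame = frame.drop(columns=["OBJECTID"], errors="ignore")
        merged = frame if merged is None else merged.merge(frame, on="date", how="outer")
    merged["date"] = pd.to_datetime(merged["date"])
    # Remove rows with missing values, then the partial December 2021 days.
    merged = merged.dropna()
    merged = merged[merged["date"] < pd.Timestamp(cutoff)]
    # Counts arrive as floats; convert every non-date column to integer.
    count_cols = merged.columns.drop("date")
    merged = merged.astype({col: int for col in count_cols})
    return merged.sort_values("date").reset_index(drop=True)

# Tiny stand-ins for two of the portal CSVs (illustrative values only).
gender = pd.DataFrame({
    "OBJECTID": [1, 2, 3],
    "date": ["2021-11-29", "2021-11-30", "2021-12-01"],
    "male": [10.0, 11.0, 12.0],
    "female": [9.0, None, 10.0],
})
total = pd.DataFrame({
    "OBJECTID": [1, 2, 3],
    "date": ["2021-11-29", "2021-11-30", "2021-12-01"],
    "total": [19.0, 21.0, 22.0],
})
probable = clean_and_merge([gender, total])
print(probable)
```

With real data, each stand-in table would instead come from `pd.read_csv` applied to the corresponding portal URL.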

Probable Death Datasets

Confirmed Death Datasets

Joint Probable Death and Confirmed Death Dataset

A joint DataFrame is created by combining the two previous DataFrames, Probable and Confirmed, so that the total number of deaths can be analyzed. Note that the data for confirmed deaths begins at an earlier point in time, so probable death values for those earlier days are filled with zeros, on the assumption that no probable deaths were recorded for those days. The new joint DataFrame is made up of the sums of corresponding columns (e.g. 'female_pdf' and 'female_cdf').
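
One possible way to build such a joint table is sketched below. It assumes both DataFrames are indexed by date, share the same column names, and that the confirmed table covers every date of interest. The helper name `join_deaths` and the sample values are hypothetical.

```python
import pandas as pd

def join_deaths(confirmed, probable):
    """Add probable and confirmed counts column by column, indexed by date."""
    # Align probable rows to the confirmed dates; dates with no probable
    # record are assumed to have zero probable deaths.
    probable_aligned = probable.reindex(confirmed.index, fill_value=0)
    return confirmed + probable_aligned

# Illustrative values only: confirmed data starts one day earlier.
dates = pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"])
confirmed = pd.DataFrame({"female": [3, 5, 8], "male": [4, 6, 9]}, index=dates)
probable = pd.DataFrame({"female": [1, 2], "male": [0, 1]}, index=dates[1:])

joint = join_deaths(confirmed, probable)
print(joint)
```

Using `reindex` with `fill_value=0` keeps the zero-filling assumption explicit in one place instead of leaving missing values to propagate through the sum.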

Data Analysis

Now that the data is gathered into manageable sets, analysis can be done to provide some insight into the deaths recorded for each demographic group provided (i.e., gender, race, age). The following graphs show cumulative deaths within the indicated demographic over time, using the joint dataset of both probable and confirmed COVID-19 deaths.

Gender

'Unknown' is left out of the plot due to it always remaining at zero.

Race

Age

Digging Deeper

Upon initial inspection it appears that older white males are the most common demographic to have died of COVID-19.
Male deaths do not exceed female deaths by a wide margin, so it would be inaccurate to say that the vast majority of deaths from COVID-19 are male. While this near-even split between male and female is not especially revealing, it is worth mentioning that the virus seems to affect men and women about evenly. What might be more insightful, given more data, is seeing cumulative totals for other gender identities such as nonbinary, trans-male, and trans-female. While the rate of death is not suspected to vary by gender alone, these groups might be exposed to higher-risk professions or lack the healthcare access that would allow them to be recorded as either a probable or confirmed case by a healthcare facility.
Furthermore, it seems that the older an individual is, the more likely they are to die from COVID-19, with the '80+' age group making up nearly half of the total deaths. The average age at death was calculated by taking the cumulative counts from November 30th, multiplying each by the middle year of its age group (e.g. 20-29 => 25, 80+ => 85), adding those products together, and dividing by the sum of all cumulative counts (i.e., total deaths). The resulting average age of those who lost their lives to COVID-19 in Maryland is approximately 73 years.
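
The weighted average described above can be written as a short function. The sketch below uses only the two midpoints quoted in the text and made-up counts, so its result is not the 73-year figure. The function name `average_age_at_death` is hypothetical.

```python
def average_age_at_death(counts_by_group, midpoints):
    """Weighted mean of age-group midpoints, weighted by cumulative deaths."""
    total_deaths = sum(counts_by_group.values())
    weighted_sum = sum(midpoints[group] * count
                       for group, count in counts_by_group.items())
    return weighted_sum / total_deaths

# Midpoints follow the convention in the text (20-29 => 25, 80+ => 85).
midpoints = {"20-29": 25, "80+": 85}
# Illustrative counts only, not the Maryland figures.
counts = {"20-29": 10, "80+": 30}

average = average_age_at_death(counts, midpoints)
print(average)  # (25*10 + 85*30) / 40 = 70.0
```

Because the '80+' group has no upper bound, treating 85 as its midpoint is itself an approximation, so the resulting average is best read as a rough estimate.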
The demographic that may need a deeper analysis is race. It seems intuitive that people who identify as White would make up the majority of deaths from COVID-19 because they make up the majority of Maryland's population. The 2020 Census Data shows that people who identify as White make up between 50% and 58.5% of Maryland's population, whereas the second most prevalent group, Black and/or African American, makes up 31.1%. Looking at the number of deaths as a percentage of each group's individual population may provide more insight.

Organizing Race Data

The percentages given by the Census Bureau will be stored for calculations that describe the percentage of lives lost within a specific race population. Also needed for this calculation is the total population of Maryland which can also be found in the 2020 Census Data.
The calculation done for each racial group is as follows:

racePopulation = totalPopulation*racePercentage
percentOfPopulation = raceCumulativeDeaths / racePopulation


Note that the percentage of those who identify as white being used is 50% because this is the value that does not include Hispanic or Latin people. Also note that, per the Census Bureau's website, "Hispanics may be of any race, so also are included in applicable race categories" which may be the reason why the percentages do not add up to 100%.
The 'unknown' category of race provided in the data gathered above will be left out of this analysis due to not having a racial population total to compare it against.
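
A minimal sketch of the calculation above follows. The two population shares are the ones quoted in the text. The total population and the death counts are placeholder values chosen for illustration, not the Census figure or the Maryland counts, and the function name is hypothetical.

```python
def deaths_as_share_of_population(total_population, race_share, cumulative_deaths):
    """Return cumulative deaths divided by each group's estimated population."""
    rates = {}
    for race, share in race_share.items():
        # racePopulation = totalPopulation * racePercentage
        race_population = total_population * share
        # percentOfPopulation = raceCumulativeDeaths / racePopulation
        rates[race] = cumulative_deaths[race] / race_population
    return rates

# Shares quoted in the text (White excluding Hispanic or Latin people).
race_share = {"White": 0.50, "Black/African American": 0.311}
# Placeholder inputs for illustration only.
total_population = 1_000_000
cumulative_deaths = {"White": 500, "Black/African American": 400}

rates = deaths_as_share_of_population(total_population, race_share, cumulative_deaths)
for race, rate in rates.items():
    print(f"{race}: {rate:.6f} ({rate * 100:.4f}%)")
```

The function returns a decimal fraction, and multiplying by 100 gives the percentage form.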

Visualizing Race Data

Now that the values representing the percentage of race population have been calculated and stored, they can be visualized in a plot, bar chart, and table. Numerical percentages are also given in decimal value.

Analyzing Race Data

As the visualizations above show, a larger share of Maryland's Black/African American population has died from the Coronavirus than of its White population. This suggests that the COVID-19 pandemic has had a more severe impact on Black/African American people in Maryland than on other racial groups. The per-population view also makes the impact on other groups easier to understand: for Hispanic or Latin people, the percentage conveys an impact closer to that on White people than the first graph on race did. Likewise, the impact of the pandemic can be better appreciated for other racial communities, such as Asians, because this view does not diminish their comparative loss of life.

In an attempt to see how consistent the observed trends are, linear regression models are trained to predict what proportion of the cumulative death count each racial demographic makes up. In other words, if the number of cumulative deaths equals 7000, the model must accurately predict how many of those 7000 are Black/African American, White, Hispanic, Asian, Other, or Unknown. A separate plot is made for each model to improve readability.
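
As a rough illustration of this kind of model, the sketch below fits a straight line relating one group's cumulative deaths to the statewide cumulative total, then predicts the group's count at a total of 7000. It uses NumPy's `polyfit` on synthetic numbers. The library, the train/predict split, and the data are all assumptions, since the text does not say how the project's models were built.

```python
import numpy as np

def fit_line(total_deaths, group_deaths):
    """Least-squares fit of group_deaths = slope * total_deaths + intercept."""
    slope, intercept = np.polyfit(total_deaths, group_deaths, deg=1)
    return slope, intercept

# Synthetic data: the group makes up about 40% of all deaths.
total = np.array([1000, 2000, 3000, 4000, 5000, 6000], dtype=float)
group = 0.4 * total + 20

# Train on the earlier observations, then predict at a later total.
slope, intercept = fit_line(total[:4], group[:4])
predicted = slope * 7000 + intercept
print(round(predicted))  # about 2820 for this synthetic series
```

One such line would be fitted per racial group, and comparing predictions against the held-out later observations gives a sense of how stable each group's share is.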

Visual Analysis

The models for the African American and White racial groups seem to predict future proportions of total deaths fairly well. The third best model appears to be that for the "Other" racial group. Lastly, the models for the Hispanic and Asian groups are somewhat lacking in accuracy. Inaccuracies in each model are speculated to be due in part to dips in the actual data that may reflect periods of enforced mask mandates, social distancing, etc., and to peaks in the actual data that reflect surges of COVID-19 cases. Additional data on hospitalizations and vaccinations may provide further insight into trends in deaths from COVID-19.

Numerical Analysis

To understand the consistency of trends in deaths for each racial group numerically, Ordinary Least Squares linear regression will be performed in order to provide insight on statistical significance and proportion of variance for each racial group.

First, the R-Squared values of the OLS linear regression show that the models fit the data fairly well, with each value greater than 0.97 and therefore close to the maximum of 1.0. Second, the p-values show statistical significance, with values either at or near 0.00, leading to a rejection of the null hypothesis for each model. Overall, a consistent trend seems to be predictable for the proportion of total COVID-19 deaths that each racial group makes up.
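
For reference, the R-Squared statistic discussed above is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it with NumPy for a simple one-variable fit on made-up data. It does not reproduce the project's OLS output and leaves out p-values, which would require a statistics library.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a one-variable least-squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))         # unexplained variation
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up, nearly linear data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r2 = r_squared(x, y)
print(f"R-squared: {r2:.4f}")
```

A value near 1.0 means the fitted line accounts for almost all of the variation in the data, which is how the values above 0.97 are being read here.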

Conclusion

Through visualization and analysis of the COVID-19 deaths data provided by the state of Maryland, United States, insightful findings have been discovered on the pandemic's impact on different demographics regarding gender, age, and race within Maryland.
Regarding gender, there is not a significant difference in the number of males dying from COVID-19 compared to females, and providing more gender groups may be somewhat more insightful.
Among the age groups, the vast majority of deaths occur in the older population. The largest share of COVID-19 deaths has been among individuals 80 years old or older, and each successively younger decade accounts for fewer deaths than the one above it. It almost goes without saying that the pandemic has had a severe impact on senior citizens compared to younger age groups.
Initial findings seemed to suggest that people of White racial background suffered a more severe impact, with Black/African American people making up the second largest portion, and all other racial groups trailing with far smaller portions. However, examining the number of deaths for each race within its own racial population provided more of a perspective, showing not only that the different groups were more similarly impacted than what was initially evident, but also that people of Black/African American racial background are the most severely impacted.
Linear regression models are able to predict these trends, with error possibly due to surges in cases and/or mask mandates, which conveys the consistency of these findings. Adding data on vaccination rates, hospitalizations, and possibly even individual economic status would likely provide even more information on who is affected by deaths from COVID-19. Understanding data such as this helps conceptualize the additional measures that may need to be taken to care for certain groups. It is clear that no one is safe from this pandemic, and all demographic groups analyzed here are severely impacted by the virus. However, groups that are sustaining more deaths than others need to be recognized so their situations can be addressed.