Introduction

Column

Motivation

Predictive Modelling Of Mortality Rates for Diabetes: An Analysis Of Risk Factors

Diabetes is a chronic disease affecting millions globally and a leading cause of death. Understanding mortality trends and predicting future mortality rates is crucial for public health planning and intervention strategies. Our objective is to identify critical factors that contribute to mortality from diabetes and develop a suitable model for predicting future mortality rates. The outcomes of this study could inform policies and interventions aimed at reducing the burden of diabetes-related mortality.

An overlook of the dataset

This project analyzes mortality rates for diabetes in the United States from 2015 to 2020 and employs various predictive methods in Statistics and Machine Learning to forecast future mortality rates.

Variables:

  • Locations: States, Regions
  • Gender : Female, Male
  • Age: Five-Year age group from age 0-84
  • Race: White, Black or African American, American Indian or Alaska Native, Asian or Pacific Islander
  • Ethnicity: Hispanic or Latino, Not Hispanic or Latino
  • Year: From 2015 - 2020

Resource: CDC WONDER

Column

Glimpse

Mortality

Data Exploration

Column

Age

Ethnicity

Race

Sex

Year

Region

State

Panel Graph

Summary

According to the age distribution, the older the age, the higher the mortality rate of diabetics. the greatest mortality rate was recorded in the 80-84 age group, reaching 5963 per 100,000. More surprisingly, infants younger than one year of age also have a high mortality rate, ranking seventh out of 18 age groups, at 570 per 100,000. The lowest mortality rate was in the 5-9 age group, with a rate of 13 per 100,000.

In terms of ethnic distribution, the death rate for non-Hispanics and Latinos is higher than that for Hispanics, almost twice as high. American Indians or Alaska Natives have the highest mortality rate, with a rate of approximately 1061 per 100,000. Asian and Pacific Islanders have the lowest mortality rate, with a rate of approximately 364 per 100,000. By gender distribution, males have a higher mortality rate than females.

The geographical distribution is clear. According to the regional map, the North Central and Southern regions have the highest mortality rates, even almost identical, at around 729, but when it comes to mortality rates per state, seven of the top ten states are from the Southern region, while the top six states are all from the South.

Since 2015, through 2019, the diabetes death rate has steadily increased in each year based on trends, but not by much. But the mortality rate in 2020 is clearly and significantly higher, and this reason is supposedly related to the outbreak of corona virus in March 2020.

Modeling

Column

Negative Binomial Regression Models

Negative binomial regression is used to predict the count of deaths instead of Poisson regression. As checked, in this dataset, there is a huge gap between the mean and variance of the number of deaths.

Negative binomial regression assumes that the count data follows a negative binomial distribution, and the predictor variables have a linear relationship with the logarithm of the expected count.

Link Function:

\(\log{\mbox{deaths}} = \mbox{Intercept} + b_1x_1+b_2x_2+\cdot + b_kx_k\)

Models Considered

Models Predictors
Model 1 Sex
Model 2 Race
Model 3 Age Group
Model 4 Age Group, Race
Model 5 Race, Sex
Model 6 Age Group, Sex
Model 7 Age Group, Race, Sex
Model 8 Age Group, Race, Sex, Ethnicity
Model 9 Age Group, Race, Sex, Ethnicity, Year
Model 10 Age Group, Race, Sex, Ethnicity, State
Model 11 Age Group, Race, Sex, Ethnicity, Year, State

Column

Model Comparison

d.f. dev disp AIC 2Llik
Model 1 29655 38327.05 0.43 394593.2 -394587.2
Model 2 29653 37035.48 0.54 385368.9 -385358.9
Model 3 29639 36392.88 0.60 380666.2 -380628.2
Model 4 29636 34917.13 0.80 368987.6 -368943.6
Model 5 29652 37023.40 0.54 385282.5 -385270.5
Model 6 29638 36332.84 0.60 380225.4 -380185.4
Model 7 29635 34842.72 0.81 368383.2 -368337.2
Model 8 29634 34187.59 0.94 362565.6 -362517.6
Model 9 29633 34177.46 0.95 362472.7 -362422.7
Model 10 29585 31961.29 1.87 338436.4 -338290.4
Model 11 29584 31940.07 1.88 338160.2 -338012.2

Diagnosis

Column

Model 1

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 2

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 3

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 4

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 5

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 6

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 7

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 8

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 9

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 10

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Model 11

Quantile - Quantile Plot

Probability Density Plot of Death due to Diabetes

Discussion

Column

Result

Estimate Std. Error z value Pr(>|z|) Relative Rates
(Intercept) -83.9575 5.0791 -16.5301 0.0000 0.00
Age_Groups1-4 -1.7116 0.0343 -49.8432 0.0000 0.18
Age_Groups5-9 -2.1792 0.0395 -55.2057 0.0000 0.11
Age_Groups10-14 -1.9433 0.0358 -54.2072 0.0000 0.14
Age_Groups15-19 -0.7250 0.0289 -25.0662 0.0000 0.48
Age_Groups20-24 -0.1570 0.0272 -5.7687 0.0000 0.85
Age_Groups25-29 0.0695 0.0266 2.6083 0.0091 1.07
Age_Groups30-34 0.1861 0.0263 7.0690 0.0000 1.20
Age_Groups35-39 0.3233 0.0260 12.4187 0.0000 1.38
Age_Groups40-44 0.4668 0.0257 18.1496 0.0000 1.59
Age_Groups45-49 0.7794 0.0252 30.8859 0.0000 2.18
Age_Groups50-54 1.1258 0.0248 45.3247 0.0000 3.08
Age_Groups55-59 1.4469 0.0245 59.1772 0.0000 4.25
Age_Groups60-64 1.6606 0.0243 68.2836 0.0000 5.26
Age_Groups65-69 1.7824 0.0242 73.6662 0.0000 5.94
Age_Groups70-74 1.8682 0.0242 77.2892 0.0000 6.48
Age_Groups75-79 1.9507 0.0242 80.6815 0.0000 7.03
Age_Groups80-84 2.0577 0.0242 85.1017 0.0000 7.83
RaceAsian or Pacific Islander -0.3769 0.0203 -18.5608 0.0000 0.69
RaceBlack or African American 1.4995 0.0187 80.3979 0.0000 4.48
RaceWhite 2.8806 0.0178 161.9116 0.0000 17.82
SexMale 0.3888 0.0087 44.8165 0.0000 1.48
EthnicityNot Hispanic or Latino 2.2334 0.0119 187.2333 0.0000 9.33
Year 0.0417 0.0025 16.5794 0.0000 1.04
StateAlaska -0.9074 0.0507 -17.8897 0.0000 0.40
StateArizona 0.3625 0.0407 8.9172 0.0000 1.44
StateArkansas -0.6982 0.0460 -15.1911 0.0000 0.50
StateCalifornia 1.9188 0.0388 49.4843 0.0000 6.81
StateColorado -0.2995 0.0424 -7.0615 0.0000 0.74
StateConnecticut -0.8107 0.0446 -18.1629 0.0000 0.44
StateDelaware -1.7077 0.0508 -33.6317 0.0000 0.18
StateFlorida 1.0673 0.0398 26.7844 0.0000 2.91
StateGeorgia 0.4867 0.0419 11.6120 0.0000 1.63
StateHawaii 0.5450 0.0483 11.2953 0.0000 1.72
StateIdaho -1.4138 0.0530 -26.6877 0.0000 0.24
StateIllinois 0.6257 0.0417 15.0209 0.0000 1.87
StateIndiana -0.2253 0.0433 -5.1987 0.0000 0.80
StateIowa -1.1653 0.0477 -24.4357 0.0000 0.31
StateKansas -1.1252 0.0439 -25.6156 0.0000 0.32
StateKentucky -0.4378 0.0465 -9.4204 0.0000 0.65
StateLouisiana -0.0567 0.0435 -1.3022 0.1928 0.94
StateMaine -1.2720 0.0638 -19.9253 0.0000 0.28
StateMaryland 0.0025 0.0429 0.0574 0.9542 1.00
StateMassachusetts -0.3581 0.0427 -8.3789 0.0000 0.70
StateMichigan 0.1221 0.0414 2.9471 0.0032 1.13
StateMinnesota -0.7054 0.0421 -16.7632 0.0000 0.49
StateMississippi -0.1515 0.0473 -3.2066 0.0013 0.86
StateMissouri -0.2451 0.0437 -5.6122 0.0000 0.78
StateMontana -1.0990 0.0532 -20.6553 0.0000 0.33
StateNebraska -1.5997 0.0485 -32.9612 0.0000 0.20
StateNevada -0.5055 0.0427 -11.8427 0.0000 0.60
StateNew Hampshire -1.3846 0.0641 -21.5898 0.0000 0.25
StateNew Jersey 0.3224 0.0420 7.6727 0.0000 1.38
StateNew Mexico 0.0127 0.0443 0.2871 0.7741 1.01
StateNew York 0.9132 0.0398 22.9527 0.0000 2.49
StateNorth Carolina 0.2324 0.0409 5.6881 0.0000 1.26
StateNorth Dakota -1.6883 0.0566 -29.8230 0.0000 0.18
StateOhio 0.3377 0.0425 7.9365 0.0000 1.40
StateOklahoma -0.2144 0.0418 -5.1280 0.0000 0.81
StateOregon -0.8456 0.0434 -19.4801 0.0000 0.43
StatePennsylvania 0.3950 0.0420 9.3992 0.0000 1.48
StateRhode Island -2.0280 0.0523 -38.7602 0.0000 0.13
StateSouth Carolina -0.1325 0.0441 -3.0073 0.0026 0.88
StateSouth Dakota -0.9940 0.0534 -18.6271 0.0000 0.37
StateTennessee -0.0249 0.0433 -0.5753 0.5651 0.98
StateTexas 1.6531 0.0402 41.1271 0.0000 5.22
StateUtah -0.9879 0.0466 -21.1936 0.0000 0.37
StateVermont -2.0920 0.0668 -31.3332 0.0000 0.12
StateVirginia 0.0695 0.0422 1.6491 0.0991 1.07
StateWashington -0.1205 0.0411 -2.9362 0.0033 0.89
StateWest Virginia -1.0149 0.0531 -19.1294 0.0000 0.36
StateWisconsin -0.6356 0.0423 -15.0323 0.0000 0.53
StateWyoming -2.2503 0.0608 -37.0395 0.0000 0.11

Column

Analysis

The baselines of each factor are :

  • Age group: <1

  • Race: American Indian or Alaska native

  • Ethnicity: Hispanic or Latino

  • Sex: Female

  • State: Alabama

Analysis:

  • Ethnicity: The risk of dying due to diabetes among Non-Hispanic or Latino is about 9.33 times higher than that for the Hispanic or Latino holding all other variables constant.

  • Race: Highest risk of death due to diabetes is reported for white Americans. A person of white is 17.82 times more likely to die due to diabetes than American Indian or Alaska Native on average holding all other variables constant.

  • Sex: Males have about 1.48 times higher rick of dying due to diabetes than females on average holding all other variables constant.

Limitation

  • Not enough variables

  • Inability to optimize results

  • No breakdown of causes of death.

About the author

This dashboard is for my capstone project, under the supervision of Dr. Ying-Ju Tessa Chen.

  • Mathematics major

  • Data analysis minor

  • ENTP

---
title: "Diabetes Mortality"
output: 
  flexdashboard::flex_dashboard:
    theme:
      bootswatch: zephyr
      primary: "#96ACA3"
    orientation: columns
    vertical_layout: fill
    social: menu
    source_code: embed
---

<style type="text/css">
.chart-title {  /* chart_title  */
    font-size: 18px;
  }
body{ /* Normal */
      font_size: 16px
        }
</style>

```{r setup, include=FALSE}
pacman:: p_load(DT, flexdashboard, gridExtra, knitr, MASS, plotly, rpart, rpart.plot, tidyverse, usmap,dplyr,MASS, pacman, ggplot2, faraway)

```

```{r data cleaning, eval=FALSE}
#----------read file and combine them vertically by year.-----------
path = "../Data"
allpopfiles <- list.files(path, pattern = ".txt", full.names = TRUE)

diabetes <- data.frame()

for (i in 1:length(allpopfiles)){
  temp <- read_tsv(allpopfiles[i])
  temp <- temp %>% 
    mutate(Year = str_extract(allpopfiles[i], "20.."))
  diabetes <- rbind.data.frame(diabetes, temp)
}
```

```{r read_data}
diabetes <- read_csv("../Data/diabetes.csv")
#------------create a new variables "region"------------------------
Regions <- cbind.data.frame(state.name, state.region)

Northeast <- Regions$state.name[Regions$state.region=="Northeast"]
South <- Regions$state.name[Regions$state.region=="South"]
North_Central <- Regions$state.name[Regions$state.region=="North Central"]
West <- Regions$state.name[Regions$state.region=="West"]


diabetes <- diabetes %>% 
  rename(Age_Groups = `Five-Year Age Groups`,
         Ethnicity =`Hispanic Origin`,
         Crude_Rate = `Crude Rate`,
         Sex = Gender) %>% 
  dplyr::select(-c(Notes,`State Code`,`Gender Code`,`Race Code`,`Hispanic Origin Code`,`Five-Year Age Groups Code`)) %>%
  mutate(Region = case_when(
    State %in% Northeast ~ "Northeast",
    State %in% South ~ "South",
    State %in% North_Central ~ "North_Central",
    State %in% West ~ "West"))

all_ages_groups <- c("< 1", "1-4", "5-9", "10-14", "15-19",
                     "20-24", "25-29", "30-34", "35-39", "40-44",
                     "45-49", "50-54", "55-59", "60-64", "65-69",
                     "70-74", "75-79", "80-84")

diabetes$Age_Groups <- factor(gsub(" years| year", "",
                                   diabetes$Age_Groups),
                              levels = all_ages_groups)


#-------------create a new variable "Death_Rate" per 10,0000---------
diabetes <- diabetes %>% filter(Population != "Not Applicable")
diabetes$Population <- as.numeric(diabetes$Population)
diabetes <- mutate(diabetes,Mortality_rate = round(Deaths/Population * 100000,2)) %>% 
  dplyr::select(-Crude_Rate)

#--------------Data Cleaning------------------
diabetes[sapply(diabetes, is.character)] <- lapply(diabetes[sapply(diabetes, is.character)], as.factor)
diabetes <- diabetes %>% filter(State != "District of Columbia")
diabetes$Year <- as.numeric(diabetes$Year)
```


Introduction
======================================================================
Column {data-width=450}
----------------------------------------------------------------------

### Motivation

<font size=3>
**Predictive Modelling Of Mortality Rates for Diabetes: An Analysis Of Risk Factors**
</font>

Diabetes is a chronic disease affecting millions globally and a leading cause of death. Understanding mortality trends and predicting future mortality rates is crucial for public health planning and intervention strategies. Our objective is to identify critical factors that contribute to mortality from diabetes and develop a suitable model for predicting future mortality rates. The outcomes of this study could inform policies and interventions aimed at reducing the burden of diabetes-related mortality.



### An overlook of the dataset
This project analyzes mortality rates for diabetes in the United States from 2015 to 2020 and employs various predictive methods in Statistics and Machine Learning to forecast future mortality rates.

Variables: 

- Locations: States, Regions
- Gender : Female, Male
- Age: Five-Year age group from age 0-84
- Race: White, Black or African American, American Indian or Alaska Native, Asian or Pacific Islander
- Ethnicity: Hispanic or Latino, Not Hispanic or Latino
- Year: From 2015 - 2020

Resource: [CDC WONDER](https://wonder.cdc.gov/ucd-icd10.html)

Column {data-width=550}
-----------------------------------------------------------------------

### Glimpse

```{r}
datatable(diabetes, rownames = FALSE,
              colnames = c("State", "Sex", "Age Group", "Race", "Ethnicity", "Death", "Population", "Year", "Region", "Mortality (Per 100,000)"),
              class = "hover",
              options = list(
                columnDefs = list(list(className = 'dt-center', 
                                       targets = 1:5)),
                pageLength = 10,
                initComplete = JS("function(settings, json) {",
      "$(this.api().table().header()).css({'background-color': '#96ACA3', 'color': '#fff'});","}")
))
```


Mortality
======================================================================

```{r, align='center'}
diabetes_state <- diabetes %>% 
  group_by(State) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality_rate = round(Deaths/Population*100000,2)) %>% 
  right_join(Regions,by=c("State" = "state.name")) %>% 
  rename(Region = state.region)
US_map <- us_map("state") %>% filter(full !=  "District of Columbia")

rate_data <- diabetes_state %>% 
  left_join(US_map, by = c("State"="full"))

label <- rate_data %>% 
  group_by(abbr) %>% 
  summarise(x = mean(x), y = mean(y))

p <- ggplot(rate_data, aes(x = x, y = y,color = "black")) +
  geom_polygon(aes(group = group,  fill = Mortality_rate, 
        text = paste0(State,":\n", Mortality_rate, " deaths per 100,000")),
        colour = "black") +
  geom_text(aes(label = abbr),data = label,fontface = "bold",color="white") +
  scale_fill_continuous(low = "#F6F5E9", high = "#6F6360",name = "Mortality rate") +  
  theme_minimal() +
  theme(axis.title.x=element_blank(), axis.text.x=element_blank(),
        axis.ticks.x=element_blank(), axis.title.y=element_blank(),
        axis.text.y =element_blank(), axis.ticks.y=element_blank(), 
        panel.grid.major = element_blank(), 
        panel.background = element_blank())+
  labs(title = "Mortality Rate Per 100,000 in the United States")

ggplotly(p, tooltip = "text", width = 1100, height = 700)

```

Data Exploration
===

Column {.tabset data-width=700}
-----------------------------------------------------------------------

### Age
```{r}
diabetes_age <- diabetes %>% 
  group_by(Age_Groups) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality = round(Deaths/Population*100000,2))

a <- ggplot(diabetes_age,aes(y=reorder(Age_Groups, Mortality),x = Mortality)) + 
  geom_col(fill = "#A28A8F",aes(text = paste0("Mortality: ", Mortality, " deaths per 100,000"))) +
  xlab("Mortality Rate Per 100,000 People") + 
  ylab("Age Group (Years)") + 
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12))

ggplotly(a,tooltip = "text")
```

### Ethnicity
```{r}
diabetes_Ethnicity <- diabetes %>% 
  group_by(Ethnicity) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality = round(Deaths/Population*100000,2))

h <- ggplot(diabetes_Ethnicity,aes(x=Ethnicity,y=Mortality)) + 
  geom_col(fill = "#73564C") + 
  ylab("Mortality Rate Per 100,000 People") + 
  xlab("Ethnicity")+
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12))

ggplotly(h)
```


### Race 
```{r}
diabetes_race <- diabetes %>% 
  group_by(Race) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality= round(Deaths/Population*100000,2))


r <- ggplot(diabetes_race,aes(y=reorder(Race,Mortality),x=Mortality)) + 
  geom_col(fill = "#98A28A",aes(text = paste0("Mortality: ", Mortality, " deaths per 100,000")))+
  xlab("Mortality Rate Per 100,000 People") + 
  ylab("Race") +
  theme(axis.text.y = element_text(size = 12), 
        axis.text.x = element_text(size = 12))

ggplotly(r)
```


### Sex
```{r}
diabetes_sex <- diabetes %>% 
  group_by(Sex) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality = round(Deaths/Population*100000,2))

s <- ggplot(diabetes_sex,aes(x=Sex,y=Mortality)) + 
  geom_col(fill = "#BF9568")+
  ylab("Mortatility Rate Per 100,000 People")+ 
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12))
ggplotly(s)
```

### Year

```{r}
diabetes_year <- diabetes %>% 
  group_by(Year) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality = round(Deaths/Population*100000,2))

y <- (ggplot(diabetes_year,aes(x=Year,y=Mortality))) + 
  geom_col(fill = "#316C6C")+
    ylab("Mortatility Rate Per 100,000 People")+ 
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12))

ggplotly(y)
        


```

### Region

```{r}
## mortality rate by region

diabetes_region <- diabetes %>% 
  group_by(Region) %>% 
  summarise(Mortality = round(sum(Deaths)/sum(Population)*100000, 2))

r <- diabetes_region %>% 
  ggplot(aes(x = Region, y = Mortality)) +
  geom_bar(stat="identity", fill="#637493") +
  ylab("Mortatility Rate Per 100,000 People") +
  geom_polygon(aes(text = paste0("Mortality rate: ", Mortality, " deaths per 100,000")), colour = "black")+ 
  theme(axis.text.y = element_text(size = 12), axis.text.x = element_text(size = 12))

ggplotly(r)
```


### State
```{r}
diabetes_state <- diabetes %>% 
  group_by(State) %>% 
  summarise(Deaths = sum(Deaths), Population = sum(Population)) %>%
  mutate(Mortality= round(Deaths/Population*100000, 2)) %>% 
  right_join(Regions,by=c("State" = "state.name")) %>% 
  rename(Region = state.region)

q <- ggplot(diabetes_state, aes(y=reorder(State,Mortality),
                               x=Mortality,fill = Region)) + 
  geom_col(aes(text = paste0("Mortality: ", Mortality, " deaths per 100,000"))) +
  ggtitle("Mortality rate of 6 years in each state")+
  ylab("States")+
  xlab("Mortality Rate")+
  geom_polygon(aes(text = paste0("Mortality: ", Mortality, " deaths per 100,000")), colour = "black")

ggplotly(q,tooltip = "text")

```


### Panel Graph
```{r}
diabetes_Year <- diabetes %>% 
  group_by(Year, Sex, Race, Ethnicity) %>% 
  summarise(Mortality = round(sum(Deaths)/sum(Population)*100000, 2))

p_year <- diabetes_Year %>% 
  ggplot(aes(x=Mortality, y=Race, fill=Sex)) +
  geom_bar(stat = "identity",
           aes(text = paste0(Sex, Race, ":\n",
                             "Mortality Rate: ", Mortality, " Per 100,000 People."))) + 
  facet_grid(Ethnicity~Year) + 
  xlab("Mortatility rate Per 100,000 People") +
  scale_fill_manual(values = c("Male" = "#96ACA3",
                               "Female" = "#F5A997")) +
  theme(text = element_text(size = 12), legend.position = "none") +
  theme(axis.text.y = element_text(size = 8))  

ggplotly(p_year, tooltip = "text")

```


### Summary

According to the age distribution, the older the age, the higher the mortality rate of diabetics. the greatest mortality rate was recorded in the 80-84 age group, reaching 5963 per 100,000. More surprisingly, infants younger than one year of age also have a high mortality rate, ranking seventh out of 18 age groups, at 570 per 100,000. The lowest mortality rate was in the 5-9 age group, with a rate of 13 per 100,000.

In terms of ethnic distribution, the death rate for non-Hispanics and Latinos is higher than that for Hispanics, almost twice as high. American Indians or Alaska Natives have the highest mortality rate, with a rate of approximately 1061 per 100,000. Asian and Pacific Islanders have the lowest mortality rate, with a rate of approximately 364 per 100,000. By gender distribution, males have a higher mortality rate than females.

The geographical distribution is clear. According to the regional map, the North Central and Southern regions have the highest mortality rates, even almost identical, at around 729, but when it comes to mortality rates per state, seven of the top ten states are from the Southern region, while the top six states are all from the South.

Since 2015, through 2019, the diabetes death rate has steadily increased in each year based on trends, but not by much. But the mortality rate in 2020 is clearly and significantly higher, and this reason is supposedly related to the outbreak of corona virus in March 2020. 

Modeling
===

Column {.tabset data-width=350}
---------------------------------------------------------------------


### Negative Binomial Regression Models

Negative binomial regression is used to predict the count of deaths instead of Poisson regression. As checked, in this dataset, there is a huge gap between the mean and variance of the number of deaths.

Negative binomial regression assumes that the count data follows a negative binomial distribution, and the predictor variables have a linear relationship with the logarithm of the expected count.

**Link Function:**

$\log{\mbox{deaths}} = \mbox{Intercept} + b_1x_1+b_2x_2+\cdot + b_kx_k$

### Models Considered

```{r allmodels}
Models <- paste("Model", 1:11)
Predictors <- c("Sex", "Race", "Age Group", "Age Group, Race", 
               "Race, Sex", "Age Group, Sex", "Age Group, Race, Sex",
               "Age Group, Race, Sex, Ethnicity",
               "Age Group, Race, Sex, Ethnicity, Year",
               "Age Group, Race, Sex, Ethnicity, State",
               "Age Group, Race, Sex, Ethnicity, Year, State"
               )

all_models <- data.frame(Models = Models, Predictors = Predictors)
kable(all_models)
```


Column {data-width=650}
----------------------------------------------------------------------

### Model Comparison

```{r,include=FALSE}

# + offset(log(Population/100000)), link=log,

#Model 1
fit.sex = glm.nb(Deaths ~ Sex, data = diabetes)
p.val1 = pchisq(deviance(fit.sex), df.residual(fit.sex), lower = F)
p.val1

m1 <- round(c(df.residual(fit.sex), deviance(fit.sex), fit.sex$theta, 
               fit.sex$aic, fit.sex$twologlik),2)

#Model 2
fit.race = glm.nb(Deaths ~ Race,data = diabetes)

p.val2 = pchisq(deviance(fit.race), df.residual(fit.race), lower = F)
p.val2

m2 <- round(c(df.residual(fit.race), deviance(fit.race), fit.race$theta, 
              fit.race$aic, fit.race$twologlik),2)

#Model 3
fit.age = glm.nb(Deaths ~ Age_Groups, data = diabetes )

p.val3 = pchisq(deviance(fit.age), df.residual(fit.age), lower = F)
p.val3

m3 <- round(c(df.residual(fit.age), deviance(fit.age), fit.age$theta, 
              fit.age$aic, fit.age$twologlik),2)

#Model 4
fit.age.race = glm.nb(Deaths ~ Age_Groups + Race, data = diabetes)

p.val4 = pchisq(deviance(fit.age.race), df.residual(fit.age.race), lower = F)
p.val4

m4 <- round(c(df.residual(fit.age.race), deviance(fit.age.race), fit.age.race$theta, 
              fit.age.race$aic, fit.age.race$twologlik),2)

#Model 5
fit.sex.race = glm.nb(Deaths ~ Race + Sex, data = diabetes)

p.val5 = pchisq(deviance(fit.sex.race), df.residual(fit.sex.race), lower = F)
p.val5

m5 <- round(c(df.residual(fit.sex.race), deviance(fit.sex.race), fit.sex.race$theta, 
              fit.sex.race$aic, fit.sex.race$twologlik),2)

#Model 6
fit.sex.age = glm.nb(Deaths ~ Age_Groups + Sex, data = diabetes)

p.val6 = pchisq(deviance(fit.sex.age), df.residual(fit.sex.age), lower = F)
p.val6

m6 <- round(c(df.residual(fit.sex.age), deviance(fit.sex.age), fit.sex.age$theta, 
              fit.sex.age$aic, fit.sex.age$twologlik),2)

#Model 7
fit.sex.age.race = glm.nb(Deaths ~ Age_Groups + Race + Sex, data = diabetes)

p.val7 = pchisq(deviance(fit.sex.age.race), df.residual(fit.sex.age.race), lower = F)
p.val7

m7 <- round(c(df.residual(fit.sex.age.race), deviance(fit.sex.age.race), fit.sex.age.race$theta, 
              fit.sex.age.race$aic, fit.sex.age.race$twologlik),2)

#Model 8
fit.sex.age.race.his = glm.nb(Deaths ~ Age_Groups + Race + Sex +  Ethnicity,data = diabetes)

p.val8 = pchisq(deviance(fit.sex.age.race.his), df.residual(fit.sex.age.race.his), lower = F)
p.val8

m8 <- round(c(df.residual(fit.sex.age.race.his), deviance(fit.sex.age.race.his), fit.sex.age.race.his$theta, 
              fit.sex.age.race.his$aic, fit.sex.age.race.his$twologlik),2)

#Model 9
fit.sex.age.race.his.year = glm.nb(Deaths ~ Age_Groups + Race + Sex + Ethnicity  + Year, data = diabetes)

p.val9= pchisq(deviance(fit.sex.age.race.his.year), df.residual(fit.sex.age.race.his.year), lower = F)
p.val9

m9 <- round(c(df.residual(fit.sex.age.race.his.year), deviance(fit.sex.age.race.his.year), fit.sex.age.race.his.year$theta, 
              fit.sex.age.race.his.year$aic, fit.sex.age.race.his.year$twologlik),2)

#Model 10
fit.sex.age.race.his.State = glm.nb(Deaths ~ Age_Groups+ Race + Sex+ Ethnicity + State, data = diabetes)

p.val10= pchisq(deviance(fit.sex.age.race.his.State), df.residual(fit.sex.age.race.his.State), lower = F)
p.val10

m10 <- round(c(df.residual(fit.sex.age.race.his.State), deviance(fit.sex.age.race.his.State), fit.sex.age.race.his.State$theta, 
               fit.sex.age.race.his.State$aic, fit.sex.age.race.his.State$twologlik),2)

#Model 11
fit.sex.age.race.his.year.State = glm.nb(Deaths ~ Age_Groups + 
                                      Race+ Sex + Ethnicity+ Year  + State, data = diabetes)

p.val11= pchisq(deviance(fit.sex.age.race.his.year.State), df.residual(fit.sex.age.race.his.year.State), lower = F)
p.val11

m11 <- round(c(df.residual(fit.sex.age.race.his.year.State), deviance(fit.sex.age.race.his.year.State), fit.sex.age.race.his.year.State$theta, 
               fit.sex.age.race.his.year.State$aic, fit.sex.age.race.his.year.State$twologlik),2)
```

```{r}
################# SUMMARY Table-2 ####################
tab2 <- rbind(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10,m11)
colnames(tab2) = c( "d.f.", "dev", "disp", "AIC", "2Llik")
rownames(tab2) = c("Model 1", "Model 2", "Model 3","Model 4","Model 5","Model 6","Model 7","Model 8","Model 9","Model 10","Model 11")
knitr::kable(tab2, digits=3) 
```


Diagnosis
===

Column {.tabset data-wigth=1000}
----------------------------------------------------------------------

### Model 1

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot


```{r, fig.align = "center"}

halfnorm(residuals(fit.sex), main = " Model 1: Sex")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r, fig.align = "center"}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 1: Sex")
lines(density(predict(fit.sex, type="response")), col = "red")

```

:::
::::::




### Model 2

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.race), 
         main = "Model 2: Race")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 2: Race")
lines(density(predict(fit.race, type="response")), col = "red")

```
:::
::::::


### Model 3

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.age), 
         main = "Model 3: Age Group")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}

#### Probability Density Plot of Death due to Diabetes


```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 3: Age Group")
lines(density(predict(fit.race, type="response")), col = "red")

```

:::
::::::

### Model 4

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.age.race), 
         main = "Model 4: Age Group + Race")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 4: Age Group + Race")
lines(density(predict(fit.age.race, type="response")), col = "red")

```

:::
::::::


### Model 5

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.race), 
         main = "Model 5: Race + Sex")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 5: Race + Sex")
lines(density(predict(fit.sex.race, type="response")), col = "red")

```

:::
::::::


### Model 6

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age), 
         main = "Model 66: Age Group + sex")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 6: Age Group + Sex")
lines(density(predict(fit.sex.age, type="response")), col = "red")

```

:::
::::::


### Model 7

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age.race), 
         main = "Model 7: Age Group + Race + Sex")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 7: Age Group + Race + Sex")
lines(density(predict(fit.sex.age.race, type="response")), col = "red")

```

:::
::::::


### Model 8

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age.race.his), 
         main = "Age Group + Race + Sex + Ethnicity")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 8: Age Group")
lines(density(predict(fit.sex.age.race.his, type="response")), col = "red")

```

:::
::::::


### Model 9

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age.race.his.year), 
         main = "Age Group + Race + Sex + Ethnicity + Year")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 9: Age Group + Race + Sex + Ethnicity + Year")
lines(density(predict(fit.sex.age.race.his.year, type="response")), col = "red")

```

:::
::::::


### Model 10

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age.race.his.State), 
         main = "Age Group + Race + Sex + Ethnicity + State")
abline(a = 0, b = 1, lwd = 2, col = "red")
```

:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 10: Age Group + Race + Sex + Ethnicity + State")
lines(density(predict(fit.sex.age.race.his.State, type="response")), col = "red")

```

:::
::::::


### Model 11

::::::{style="display: flex;"}
:::{.column width=50%}

#### Quantile - Quantile Plot

```{r}
halfnorm(residuals(fit.sex.age.race.his.year.State), 
         main = "Age Group + Race + Sex + Ethnicity + Year + State")
abline(a = 0, b = 1, lwd = 2, col = "red")
```


:::
:::{.column width=50%}


#### Probability Density Plot of Death due to Diabetes

```{r}
plot(density(diabetes$Deaths), main = "Diagnostics of Model 11: Age Group + Race + Sex + Ethnicity + Year + State")
lines(density(predict(fit.sex.age.race.his.year.State, type="response")), col = "red")

```

:::
::::::



Discussion
===

Column {data-width=700}
-----------------------------------------------------------------------

### Result

```{r}
m11_result <- summary(fit.sex.age.race.his.year.State)

m11_result_RR <- round(m11_result[[12]], 4)
m11_result_RR <- cbind(m11_result_RR, round(exp(m11_result_RR[,1]), 2))
colnames(m11_result_RR)[5] <- "Relative Rates"
#DT::datatable(m11_result_RR)

knitr::kable(m11_result_RR)

```



Column {.tabset}
-----------------------------------------------------------------------

### Analysis

The baselines of each factor are :

- Age group: <1

- Race: American Indian or Alaska native

- Ethnicity: Hispanic or Latino

- Sex: Female

- State: Alabama

<font size = 3>
**Analysis:**
</font>

- Ethnicity: The risk of dying due to diabetes among Non-Hispanic or Latino is about 9.33 times higher than that for the Hispanic or Latino holding all other variables constant. 

- Race: Highest risk of death due to diabetes is reported for white Americans. A person of white is 17.82 times more likely to die due to diabetes than American Indian or Alaska Native on average holding all other variables constant. 

- Sex: Males have about 1.48 times higher rick of dying due to diabetes than females on average holding all other variables constant. 


### Limitation

- Not enough variables

- Inability to optimize results

- No breakdown of causes of death.

### About the author

This dashboard is for my capstone project, under the supervision of Dr. Ying-Ju Tessa Chen. 

- Mathematics major

- Data analysis minor

- ENTP