Analysis of Opioid Drug Overdoses and Prescriptions in the U.S. from 2015 - 2020¶

Thomas Cutro

Joeseph Wagner

Project Plan¶

We worked with multiple datasets to analyze possible links between drug overdoses and other factors such as opioid prescriptions and income within the U.S. from 2015-2020. Our first dataset is The Provisional Drug Overdose Death Counts from the Center for Disease Control (CDC). It contains the specific types of overdoses and total counts in every state for the U.S. We plan on comparing this dataset to the Medicare Prescribing Rates Dataset. This dataset contains information sorted by state that describes a multitude of fatures for how states have prescribed opioids for the years we are interested in. We want to investigate if there is a correlation between the number of drug overdoses and the rate at which these drugs are prescribed. The data will help answer this question as it is sorted regionally, so we can study trends in specific places. Additionally, we loaded in a state average income dataset. We want to use this dataset to see if this economic factor has some correlation with drug overdose levels in states. Finally, we brought in a state population dataset from from the U.S. Census website. This enables us to normalize the different datasets we have, and therefore we can compare the trends between states accurately.

Goals¶

We aim to be able to be able to create a model that predicts the rate of drug overdose deaths in individual states within the United States.

Setup¶

Importing necessary libraries for project.

import matplotlib.pyplot as py
import numpy as np
import pandas as pd 
import seaborn as sns

ETL¶

Loading in Drug Overdose Deaths Data from CDC¶

We first load in the data for drug overdose deaths from the CDC website. The data is displayed in the table below. It contains the number of deaths for a class of drugs for each state and year/month combination. The year and month columns will prove especially helpful, as we can examine the date in a much more detailed manner than if we just had the year. This will help to answer our question of whether there is a causation/correlation between drug related deaths and prescriptions, as we can easily visualize trends over time in a detailed manner. We filtered the orginal dataset to remove extraneous columns and keep our data tidy. Additionally, we removed NaN values from the table.

Source: https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm

#Reads in datafile containg data on different drug deaths for every state 2015-2022
drug_deaths = pd.read_csv("https://data.cdc.gov/api/views/xkb8-kh2a/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
#Gets rid of commas in data values column and converts object column to float
drug_deaths["Data Value"]= drug_deaths["Data Value"].str.replace(',','').astype(float)

#Filters dataframe for only select columns we want and renames Data Values column to Deaths
drug_deaths= drug_deaths[["State Name","Year","Month","Indicator","Data Value"]].loc[drug_deaths["Data Value"].isnull()==False]
drug_deaths.columns = drug_deaths.columns.str.replace('Data Value', 'Deaths')
display(drug_deaths)
display(drug_deaths.dtypes)

State Name     object
Year            int64
Month          object
Indicator      object
Deaths        float64
dtype: object

Below we filter the table further to obtain the total number of drug overdose deaths and the number of drug overdose deaths caused by opioids.

#Filters drug deaths dataframe to only contain values for total drug overdose deaths
drug_deaths_total = drug_deaths.loc[drug_deaths["Indicator"]=="Number of Drug Overdose Deaths"]
drug_deaths_total.columns = drug_deaths_total.columns.str.replace("Deaths","Total Drug Deaths")
#Filters drug deaths dataframe to only contain values for opioid drug overdose deaths
drug_deaths_opioids = drug_deaths.loc[drug_deaths["Indicator"]=="Natural, semi-synthetic, & synthetic opioids, incl. methadone (T40.2-T40.4)"]
drug_deaths_opioids.columns = drug_deaths_opioids.columns.str.replace("Deaths","Opioid Deaths")

Loading in Medicare Opioid Prescribing Rates¶

Next, we read in the Medicare Opioid Prescribing Rates for the years 2013-2020. This dataset contains the opioid prescription rates and total number of opioids dispensed for a state in a year. The data is collected by the Centers for Medicare & Medicaid Services, and it is based on claims data for Medicare beneficiaries. Unlike the previous drug overdose dataset, we do not receive a months column because this data is updated annually, so it will be a bit more difficult to visualize this data trend over small time intervals. We initially filter the table to only contain relevant data columns, which from this dataset was the Opioid Prescribing Rate and the Total Number of Opioid Claims.

Source: https://data.cms.gov/summary-statistics-on-use-and-payments/medicare-medicaid-opioid-prescribing-rates/medicaid-opioid-prescribing-rates-by-geography/data

#Reads in datafile containg opioid prescription data
opiod_presc = pd.read_csv("https://raw.githubusercontent.com/tcutro1/final-project/main/PrescriptionRates.csv")
opiod_presc = opiod_presc.loc[opiod_presc['Plan_Type']=="All"]

#Filters dataframe for only select columns we want and renames Geo_Desc column to State Name
opiod_presc = opiod_presc[["Year","Geo_Desc","Tot_Opioid_Clms","Opioid_Prscrbng_Rate"]].loc[opiod_presc["Opioid_Prscrbng_Rate"].isnull()==False]
opiod_presc.columns = opiod_presc.columns.str.replace('Geo_Desc', 'State Name')
display(opiod_presc)
display(opiod_presc.dtypes)

Year                      int64
State Name               object
Tot_Opioid_Clms         float64
Opioid_Prscrbng_Rate    float64
dtype: object

Loading in US Income Data¶

The data loaded in below contains every states average income over a certain amount of years. We will be using this data to test for any trends that income has with drug overdoses. First, we load in the data and then rename the columns to be the value of the first row, which contains the years for each column.

Source: https://nces.ed.gov/programs/digest/d20/tables/dt20_102.30.asp

income = pd.read_csv("https://raw.githubusercontent.com/tcutro1/final-project/main/Income%20Data.csv")
income.columns = income.iloc[1,:]
income = income.iloc[3:55,0:18]

Next, we select the certain columns we want, melt the table to make it easier to combine with our other data, and then rename the columns.

income = income[["State","2015","2016","2017","2018","2019"]]
income = pd.melt(income, id_vars="State")
income.columns = ["State Name","Year","Income"]

Now we reformat the income column, so it can be converted to an int. We also had to fix the formatting of the State's column because a few of them had extra spaces.

income["Year"]= pd.to_numeric(income["Year"])
income["Income"]= income["Income"].str.replace(',','')
income["Income"]= income["Income"].str.replace('$','').astype(float)
income["State Name"] = income["State Name"].apply(lambda x: x.strip())
display(income)
display(income.dtypes)

<ipython-input-34-2b46d3bcec3f>:3: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  income["Income"]= income["Income"].str.replace('$','').astype(float)

State Name     object
Year            int64
Income        float64
dtype: object

Loading in State Populations by Year for Normalization¶

The dataset below was retrieved from the US Census Data. It provides a breakdown of state population by year. First the dataset is read into a dataframe from the URL below.

Source: https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates/2020-evaluation-estimates/2010s-state-total.html

url = 'https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/totals/nst-est2020-popchg2010-2020.csv'
population_df = pd.read_csv(url)

Filtering the dataframe down to the columns desired, which are the state names and population count for each year. Dropping extraneous columns

population_columns = ['NAME', 'POPESTIMATE2010','POPESTIMATE2011', 'POPESTIMATE2012',
       'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015',
       'POPESTIMATE2016', 'POPESTIMATE2017', 'POPESTIMATE2018',
       'POPESTIMATE2019', 'POPESTIMATE2020']
population_df = population_df[population_columns]
population_df = population_df.drop(population_df.index[1:5])

Setting the index for future melting. Removing the "popestimate" from each column so that the years are easily accessible and can be merged into the full dataframe.

population_df.set_index('NAME')
population_df.columns = population_df.columns.str.replace("POPESTIMATE", "")
population_df.head(10)

Melting the dataframe so that it is tidy and not too wide. Each row now represents a state's popuoation for a year ranging from 2010-2020. Setting the Year datatype to numeric.

population_melt = pd.melt(population_df, id_vars=['NAME'])
population_melt.rename(columns={'NAME': 'State Name', 'variable': 'Year', 'value':'Population'}, inplace=True)
population_melt["Year"]= pd.to_numeric(population_melt["Year"])
display(population_melt)
display(population_melt.dtypes)

State Name    object
Year           int64
Population     int64
dtype: object

Combining all Our Data Tables¶

We now combine the different datasets we have into one dataframe. This will allow us to model the data and do more effective anlysis in the following EDA section. We begin by using the groupby() function to get each dataset in the same form so that we can merge them together using the state name and year. We do this for the overdose deaths, opioid prescriptions, incomes, and population datasets.

#Summarizes and groups both dataframes by State Name and year 
deaths_by_year = drug_deaths_total.groupby(["State Name","Year"]).mean()
opioids_deaths_by_year = drug_deaths_opioids.groupby(["State Name","Year"]).mean()
prescr_by_year = opiod_presc.groupby(["State Name","Year"]).mean()
income_by_year = income.groupby(["State Name","Year"]).mean()
population_by_year = population_melt.groupby(["State Name","Year"]).mean()

We then merge the tables into each other one by one. We do this on the state name and year using an inner join.

prescr_and_deaths = deaths_by_year.merge(prescr_by_year, on = ["State Name","Year"],how = "inner")
prescr_and_two_deaths = prescr_and_deaths.merge(opioids_deaths_by_year,on = ["State Name","Year"],how = "inner" )
presc_deaths_income = prescr_and_two_deaths.merge(income_by_year, on= ["State Name","Year"],how = "inner")
presc_deaths_income_pop = presc_deaths_income.merge(population_by_year, on= ["State Name","Year"],how = "inner")

Below we create some new data columns that normalize the deaths and total opioid claims by dividing them with the population of each state for that year.

presc_deaths_income_pop["Opiod Claim Rate"] = presc_deaths_income_pop["Tot_Opioid_Clms"]/presc_deaths_income_pop['Population']
presc_deaths_income_pop["Death Rate"] = (presc_deaths_income_pop["Total Drug Deaths"]/presc_deaths_income_pop['Population'])*10000
display(presc_deaths_income_pop)

Getting averages for the states¶

#Group data by State and combine datasets to get Average Deaths and Average Opiod Prescriptions in one dataframe
avg_state_deaths = drug_deaths_total.groupby(["State Name"])[["Total Drug Deaths"]].mean()
avg_state_deaths_opiods = drug_deaths_opioids.groupby(["State Name"])[["Opioid Deaths"]].mean()
avg_state_presc = opiod_presc.groupby(["State Name"])[['Tot_Opioid_Clms','Opioid_Prscrbng_Rate']].mean()
avg_state_income = income.groupby(["State Name"])[["Income"]].mean()
avg_state_population = population_melt.groupby(["State Name"])[["Population"]].mean()
avg_state = avg_state_deaths.merge(avg_state_presc, on = "State Name", how = "inner" )
avg_state = pd.merge(avg_state,avg_state_deaths_opiods,on = "State Name", how = "inner")
avg_state = pd.merge(avg_state,avg_state_income,on = "State Name", how = "inner")
avg_state = pd.merge(avg_state, avg_state_population, on = "State Name", how = "inner")
avg_state['Opioid_Clms_Rate'] = avg_state['Tot_Opioid_Clms']/avg_state['Population']
avg_state['Deaths_Rate'] = (avg_state["Total Drug Deaths"]/avg_state['Population'])
avg_state["Opioid_Deaths_Rate"] = (avg_state["Opioid Deaths"]/avg_state['Population'])
avg_state

EDA¶

Overall Trends in Drug Overdose Deaths throughout the U.S.¶

Below is a plot of the Drug Deaths per 10,000 people in the United States from 2015-2020. Visualizes the increase in drug overdose deaths over the past five years.

py.plot((normalized_deaths.loc["United States"]["Total Drug Deaths"]/normalized_deaths.loc["United States"]["Population"])*10000)
py.title("Drug Deaths per 10,000 People in United States From 2015-2020")
py.xlabel("Year")
py.ylabel("Death Rate per 10,000 People")

Text(0, 0.5, 'Death Rate per 10,000 People')

To get an inital look at the issue we aim to explore, we graphed the number of drug overdose deaths from 2015 to 2022 for DC, Louisiana, and West Virginia. In order to do this, we grouped the total drug deaths dataframe by state name and year. We plotted the non-normalized graph first to show just how big the issue is, and it is only getting worse. We hope to evince the necessity to take action in certain states based on the high overdose death rates, and other factors that correlate with this.

drug_deaths_summary= drug_deaths_total.groupby(["State Name","Year"]).mean()
state_names = ["District of Columbia", "Louisiana", "West Virginia"]
for i in state_names:
  py.plot(drug_deaths_summary.loc[i])
py.title("Drug Deaths From 2015-2022 For Select States")
py.xlabel("Year")
py.ylabel("Number Of Deaths")

Text(0, 0.5, 'Number Of Deaths')

We then graph the drug overdose deaths per person in DC, Louisiana, and West Virginia from 2015-2020. From this graph, you can tell that many of the states with the highest drug overdose death rates over these years are still trending upward. We plan on focusing on the states with the highest overdose death rates, and look at factors that could be playing a major role in these increases.

normalized_deaths = pd.merge(drug_deaths_summary, population_by_year, on = ["State Name","Year"], how = "inner")
for i in state_names:
  py.plot(normalized_deaths.loc[i]["Total Drug Deaths"]/normalized_deaths.loc[i]["Population"])
py.title("Normalized Drug Deaths based on States Population From 2015-2020")
py.xlabel("Year")
py.ylabel("Death Rate per Person")

Text(0, 0.5, 'Death Rate per Person')

Comparing Drug Overdose Deaths and Total Opioid Claims in each state¶

Below is a scatterplot of the Total Drug Overdose Deaths vs Total Opioid Claims. We plotted this data to see if there was any correlation present between the two variables. Our results manifested the need to normalize our data based on populations. It is clear that states with very large or very small populations will appear as the outliers on this plot.

#Scatter Plot comparing Average Deaths and Average Opiod Prescriptions for each state
py.figure(figsize=(10,5))
py.scatter(avg_state['Tot_Opioid_Clms']**.1,avg_state["Total Drug Deaths"])
py.title('Total Drug Overdose Deaths vs Total Opioid Claims in the U.S.')
py.xlabel("Total Opiod Claims")
py.ylabel("Drug Overdose Deaths")

Text(0, 0.5, 'Drug Overdose Deaths')

The first two entries shown below, Florida and California, represent the states with the most drug overdose deaths and the most total opioid claims respectively. This proves interesting once you look at the state with the least overdose deaths, South Dakota, and the state with the least total opioid claims, Wyoming. While the opioid prescription rates between the different states are similar, the total number of deaths and total opioid claims differs by thousands between the entries.

#state with highest amount of opiod overdoses
display(avg_state.loc[avg_state["Total Drug Deaths"]== avg_state["Total Drug Deaths"].max()])
#state with highest amount of opiod prescriptions
display(avg_state.loc[avg_state["Tot_Opioid_Clms"]== avg_state["Tot_Opioid_Clms"].max()])
#state with lowest amount of opiod overdoses
display(avg_state.loc[avg_state["Total Drug Deaths"]== avg_state["Total Drug Deaths"].min()])
#state with lowest amount of opiod prescriptions
display(avg_state.loc[avg_state["Tot_Opioid_Clms"]== avg_state["Tot_Opioid_Clms"].min()])

Even though California stands out as having the most opiod overdoses and opiod claims, this is primarily due to California's high population. Based on these results, it is clear that we need to normalize this data using the population of each state. Thus, we can visualize trends in our data.

Below is the normalized scatterplot, which displayed the Drug Overdose Death Rate (deaths/population). We added a best fit line to show the data trend that seems to manifest itself on this graph.

#Scatter Plot comparing Average Deaths and Average Opiod Prescriptions for each state

py.figure(figsize=(10,5))
py.scatter(avg_state['Opioid_Clms_Rate'], avg_state["Deaths_Rate"])

#Line of Best Fit
a, b = np.polyfit(avg_state['Opioid_Clms_Rate'], avg_state["Deaths_Rate"], 1)
py.plot(avg_state['Opioid_Clms_Rate'], a*avg_state['Opioid_Clms_Rate']+b)

#Annotating Outliers that are shown below
py.annotate("West Virginia", (avg_state.loc["West Virginia"]["Opioid_Clms_Rate"]-.015, avg_state.loc["West Virginia"]["Deaths_Rate"]-.00003))
py.annotate("South Dakota", (avg_state.loc["South Dakota"]["Opioid_Clms_Rate"]+.002, avg_state.loc["South Dakota"]["Deaths_Rate"]))
py.annotate("District of Columbia", (avg_state.loc["District of Columbia"]["Opioid_Clms_Rate"]+.002, avg_state.loc["District of Columbia"]["Deaths_Rate"]))

py.title('Drug Overdose Deaths Rate vs Opioid Claims Rate')
py.ylabel("Drug Overdose Deaths Rate")
py.xlabel('Opioid Claims Rate')
py.show()

From this graph, it appaears there is a correlation between the rate at which opioids are being claimed and used as medication in a state, and the number of overdose deaths that these states experience. Below we extrapolate a few of the extreme cases to get a better idea of the data we are dealing with. In an upcoming section we will use a correlation matrix to see if we can corroborate these results.

#state with highest rate of opiod overdoses
display(avg_state.loc[avg_state["Deaths_Rate"]== avg_state["Deaths_Rate"].max()])
#state with highest rate of opiod prescriptions
display(avg_state.loc[avg_state["Opioid_Clms_Rate"]== avg_state["Opioid_Clms_Rate"].max()])
#state with lowest rate of opiod overdoses
display(avg_state.loc[avg_state["Deaths_Rate"]== avg_state["Deaths_Rate"].min()])
#state with lowest rate of opiod prescriptions
display(avg_state.loc[avg_state["Opioid_Clms_Rate"]== avg_state["Opioid_Clms_Rate"].min()])

West Virginia immediately stands out with the highest rate of drug overdoses and opioid prescriptions of any state. West Virginia is amidst an opioid epidemic worse than they have ever seen. The high rate of drug usage and overdoses there have created a situation where employees are sparse and children struggle to grow up with parents or friends unaffected by the crisis. More information can be found from the DEA at this link.

Correlation Matrix for Combined Dataframes¶

To further investigate if there was correlations present within the data, specifically whether any features had a correlation with the drug overdose death rate, we took a look at a correlation matrix.

First, we looked at the correlation matrix for our combined table which has the average data for each state from 2015-2020.

py.figure(figsize=(12,10))
cor = avg_state.corr()
sns.heatmap(cor, annot=True, cmap=py.cm.Reds)
py.show()

As expected, the population of a state has a strong correlation with the total drug overdose deaths and the total opioid claims. Of significant interest to us was the correlation value of 0.48 between the overdose death rate and the opioid claims rate. We graphed this relationship in Section B, and noticed a potentially linear relationship. The correlation value further supports our findings.

py.figure(figsize=(12,10))
cor = presc_deaths_income_pop.corr()
sns.heatmap(cor, annot=True, cmap=py.cm.Reds)
py.show()

This correlation matrix does not provide as interesting of results. When the data is not grouped together for each state the results do not seem to be as significant.

Income Correlation¶

After reviewing the correlation matrix, we wanted to visualize the correlation between income and opioid claim rates per state. Although it is slight, the graph below shows correlation between the two variables, and as average income increases opioid prescription rate decreases. I believe the average income across an entire state is not the best representation of economic factors for that region, so other economic data would be needed to further study this.

py.scatter(avg_state['Income'], avg_state["Opioid_Clms_Rate"]**2.8)
py.title("State's Average Income vs Average Opioid Claims Rate")
py.xlabel("Income")
py.ylabel("Opioid Claim Rate")
py.annotate("West Virginia", (avg_state.loc["West Virginia"]["Income"]+1000, avg_state.loc["West Virginia"]["Opioid_Clms_Rate"]**2.8-.0002))
py.show()

Model Question¶

Can we predict the number of drug overdoses a specific state will experience based on data from different years?¶

Our model predicts the drug overdose rate in a state for a given year, based on the features given and data from different years. We identified mutliple states that have climbing overdose death totals, so being able to predict the increase or decrease a state will experience would prove helpful in stopping the overdose problem throughout our country and helping to better prepare different areas of the country. Our independent variables are the state name, opioid prescription rate, income, population, and opioid claim rate. Our dependent variable is the number of drug overdose death rate (deaths per 10,000 people). We will test whether this model is correct by comparing our predicted values for the drug overdose deaths per year to the actual values contained in our datasets. This will be using an 80/20 train and test split for a Linear Regression model

Model¶

Below is a spread of all the variables contained in our current combined dataset. For the model, we will be dropping some of these features which contain repeat information or have an extreme skew.

presc_deaths_income_pop.hist(bins=50, figsize=(20,15))
py.show()

We drop the total opioid claims because we created a normalized rate which is the feature "Opioid Claim Rate". We also drop the "Opioid Deaths" column, which gets into some assumptions we made about the model. We assumed that with information on opioids and the rates at which they are prescribed and claimed, we can predict the drug overdose death rate per 10,000 people. This generalizes opioid information to encompass drug overdoses, but there are a multtitude of other factors that influence this death rate year to year. Another possible issue could stem from people overdosing in a state they are not a resident in, yet their overdose death would count towards that specific state's total. A possible future step would be to examine and account for all the different biases present, something we tried to do but it is not perfect. Additionally, we assumed grouping by state could yield significant results, but we definitely think having data by county and parish would be extremely beneficial. Many states are large and have a huge variation in different prescribing rates, income, and deaths even by county, so being able to examine the regions more specifically would be beneficial.

model_df = presc_deaths_income_pop.drop(["Tot_Opioid_Clms", "Opioid Deaths", "Total Drug Deaths"], axis = 1)

This is the dataframe that will be used for the model. The Death Rate is the target variable we are trying to predict with a Linear Regression model.

model_df

First, the data is split into an 80% training set and 20% test set for model evaluation.

from sklearn.model_selection import train_test_split
X = model_df.drop(["Death Rate"], axis = 1)
y = model_df["Death Rate"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=0)

We fit a linear regression model to the training data. This is done using sklearns Linear Regression function.

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

LinearRegression()

We use the mean absolute error to evaluate our model's ability to predict the death rate. This gives us the distance our prediction was from the true value.

#The mean absolute error of training set
from sklearn.metrics import mean_absolute_error

y_train_pred = lin_reg.predict(X_train)
y_test_pred = lin_reg.predict(X_val)

train_mae = mean_absolute_error(y_train, y_train_pred)
test_mae = mean_absolute_error(y_val, y_test_pred)
print('Train MAE: ' + str(train_mae))
print('Test MAE: ' + str(test_mae))

Train MAE: 0.5373734949536381
Test MAE: 0.6623537393234372

The training set mean absolute error was 0.536. This can be interpreted as the predicted drug overdose death rate being within 0.536 of the actual death rate per 10,0000 people. The test MAE was close, at 0.613.

Conclusions¶

While our model did a decent job of predicting the death rate, bringing in more features and a larger dataset would increase the accuracy of the model. We believe that with improvement we could help states predict trends for their future drug overdose deaths to stop a problem before it gets worse. If we were able to predict the states with the greatest number of opioid deaths, then our model could be used to target regions that need more drug education funding, narcan supplies, and other resources in hopes of lowering the predicted death number to save lives.

	State Name	Year	Month	Indicator	Deaths
6	Alaska	2015	April	Number of Deaths	4133.000000
9	Alaska	2015	April	Percent with drugs specified	88.095238
11	Alaska	2015	April	Number of Drug Overdose Deaths	126.000000
14	Alaska	2015	August	Percent with drugs specified	87.903226
20	Alaska	2015	August	Number of Deaths	4222.000000
...	...	...	...	...	...
53776	New York City	2022	May	Percent with drugs specified	99.544499
53777	New York City	2022	May	Natural & semi-synthetic opioids, incl. methad...	710.000000
53778	New York City	2022	May	Number of Drug Overdose Deaths	2854.000000
53779	New York City	2022	May	Methadone (T40.3)	333.000000
53780	New York City	2022	May	Synthetic opioids, excl. methadone (T40.4)	2255.000000

	Year	State Name	Tot_Opioid_Clms	Opioid_Prscrbng_Rate
0	2020	National	20144712.0	3.21
3	2020	Alabama	182840.0	2.71
6	2020	Alaska	57160.0	4.29
9	2020	Arizona	496944.0	3.68
12	2020	Arkansas	183044.0	3.88
...	...	...	...	...
1233	2013	Virginia	730742.0	6.95
1236	2013	Washington	620072.0	8.82
1239	2013	West Virginia	401118.0	6.92
1242	2013	Wisconsin	899000.0	8.85
1245	2013	Wyoming	41202.0	7.78

	State Name	Year	Income
0	United States	2015	60200.0
1	Alabama	2015	48300.0
2	Alaska	2015	79200.0
3	Arizona	2015	55600.0
4	Arkansas	2015	45300.0
...	...	...	...
255	Virginia	2019	76500.0
256	Washington	2019	78700.0
257	West Virginia	2019	48900.0
258	Wisconsin	2019	64200.0
259	Wyoming	2019	65000.0

	NAME	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020
0	United States	309327143	311583481	313877662	316059947	318386329	320738994	323071755	325122128	326838199	328329953	329484123
5	Alabama	4785514	4799642	4816632	4831586	4843737	4854803	4866824	4877989	4891628	4907965	4921532
6	Alaska	713982	722349	730810	737626	737075	738430	742575	740983	736624	733603	731158
7	Arizona	6407342	6473416	6556344	6634690	6732873	6832810	6944767	7048088	7164228	7291843	7421401
8	Arkansas	2921998	2941038	2952876	2960459	2968759	2979732	2991815	3003855	3012161	3020985	3030522
9	California	37319550	37636311	37944551	38253768	38586706	38904296	39149186	39337785	39437463	39437610	39368078
10	Colorado	5047539	5121900	5193660	5270774	5352637	5454328	5543844	5617421	5697155	5758486	5807719
11	Connecticut	3579173	3588632	3595211	3595792	3595697	3588561	3579830	3575324	3574561	3566022	3557006
12	Delaware	899647	907590	915518	924062	933131	942065	949989	957942	966985	976668	986809
13	District of Columbia	605282	620290	635737	651559	663603	677014	687576	697079	704147	708253	712816

	State Name	Year	Population
0	United States	2010	309327143
1	Alabama	2010	4785514
2	Alaska	2010	713982
3	Arizona	2010	6407342
4	Arkansas	2010	2921998
...	...	...	...
578	Washington	2020	7693612
579	West Virginia	2020	1784787
580	Wisconsin	2020	5832655
581	Wyoming	2020	582328
582	Puerto Rico	2020	3159343

	Total Drug Deaths	Tot_Opioid_Clms	Opioid_Prscrbng_Rate	Opioid Deaths	Income	Population	Opioid_Clms_Rate	Deaths_Rate	Opioid_Deaths_Rate
State Name
Alaska	147.274725	77177.000	6.52125	84.580247	77620.0	7.332014e+05	0.105260	0.000201	0.000115
Arizona	1873.978022	951808.125	6.46500	1245.370968	58820.0	6.864346e+06	0.138660	0.000273	0.000181
Arkansas	444.538462	291670.000	5.96125	368.800000	47440.0	2.980382e+06	0.097863	0.000149	0.000124
California	6625.769231	3534421.625	4.71125	4838.243243	74740.0	3.867048e+07	0.091398	0.000171	0.000125
Colorado	1200.538462	619904.875	8.37375	730.340426	72300.0	5.442315e+06	0.113905	0.000221	0.000134
Connecticut	1127.791209	426297.375	4.86000	900.384615	77820.0	3.581437e+06	0.119030	0.000315	0.000251
Delaware	362.417582	131720.625	5.27625	376.528302	66740.0	9.418551e+05	0.139852	0.000385	0.000400
District of Columbia	345.549451	86177.750	4.00625	238.065934	85380.0	6.693960e+05	0.128740	0.000516	0.000356
Florida	5564.417582	1154344.750	4.14250	5336.125000	55620.0	2.026482e+07	0.056963	0.000275	0.000263
Georgia	1598.791209	839710.875	5.32875	926.488095	58560.0	1.020295e+07	0.082301	0.000157	0.000091
Hawaii	220.362637	137171.125	5.75000	56.462687	80920.0	1.407845e+06	0.097433	0.000157	0.000040
Idaho	255.989011	133868.125	5.75375	184.800000	55880.0	1.674511e+06	0.079945	0.000153	0.000110
Illinois	2787.912088	1078913.250	4.77875	1857.617978	66060.0	1.280115e+07	0.084283	0.000218	0.000145
Indiana	1822.164835	789527.250	5.86250	1629.317073	56220.0	6.619022e+06	0.119282	0.000275	0.000246
Iowa	351.230769	282087.250	5.36250	163.764045	60560.0	3.115596e+06	0.090540	0.000113	0.000053
Kansas	405.428571	188696.750	5.48250	269.250000	59380.0	2.898537e+06	0.065101	0.000140	0.000093
Kentucky	1615.714286	601814.375	4.02000	1350.092593	50500.0	4.424540e+06	0.136017	0.000365	0.000305
Maine	398.329670	119402.750	4.81250	328.945055	57260.0	1.334291e+06	0.089488	0.000299	0.000247
Maryland	2203.648352	712545.750	5.52625	1831.065934	84320.0	5.961754e+06	0.119519	0.000370	0.000307
Massachusetts	2139.923077	570301.500	3.83750	1886.000000	80840.0	6.771586e+06	0.084220	0.000316	0.000279
Michigan	2490.208791	1608325.125	6.36250	2130.194444	57120.0	9.937251e+06	0.161848	0.000251	0.000214
Minnesota	820.285714	537136.125	4.62500	909.062500	71180.0	5.489749e+06	0.097843	0.000149	0.000166
Mississippi	415.505495	310771.125	5.63500	332.666667	45000.0	2.983226e+06	0.104173	0.000139	0.000112
Missouri	1578.824176	560065.875	5.00500	1260.918367	55640.0	6.075600e+06	0.092183	0.000260	0.000208
Montana	138.065934	182078.875	8.43750	76.592593	55180.0	1.033568e+06	0.176165	0.000134	0.000074
Nevada	780.252747	410353.250	7.24250	365.813187	59780.0	2.888736e+06	0.142053	0.000270	0.000127
New Hampshire	428.901099	68196.625	4.28750	373.230769	76440.0	1.339741e+06	0.050903	0.000320	0.000279
New Jersey	2425.054945	669843.000	3.86625	2451.160000	82300.0	8.863289e+06	0.075575	0.000274	0.000277
New Mexico	637.032967	296910.500	5.82500	343.230769	49500.0	2.090117e+06	0.142055	0.000305	0.000164
New York	2361.340659	1620774.125	2.59500	1745.384615	68320.0	1.954416e+07	0.082929	0.000121	0.000089
North Carolina	2416.736264	886286.750	5.58125	1733.055556	54520.0	1.006706e+07	0.088038	0.000240	0.000172
Ohio	4431.615385	1757815.125	4.79625	3359.909091	56580.0	1.162010e+07	0.151274	0.000381	0.000289
Oklahoma	759.109890	378104.750	6.55750	322.461538	52860.0	3.887131e+06	0.097271	0.000195	0.000083
Oregon	666.989011	586564.000	6.78875	295.285714	62840.0	4.036383e+06	0.145319	0.000165	0.000073
Rhode Island	352.032967	112548.750	3.78375	277.604396	66120.0	1.056424e+06	0.106537	0.000333	0.000263
South Carolina	1232.417582	329041.625	4.97750	873.406593	53180.0	4.906767e+06	0.067059	0.000251	0.000178
South Dakota	74.285714	41600.125	4.98625	32.434211	58180.0	8.561392e+05	0.048590	0.000087	0.000038
Tennessee	2221.494505	755051.000	5.82750	1570.715909	53120.0	6.609986e+06	0.114229	0.000336	0.000238
Texas	3341.846154	1090523.500	3.52375	1436.045455	61560.0	2.736909e+07	0.039845	0.000122	0.000052
Utah	634.241758	171272.500	6.22500	372.824176	71600.0	3.001934e+06	0.057054	0.000211	0.000124
Vermont	148.813187	83535.875	5.68625	113.967033	61560.0	6.253857e+05	0.133575	0.000238	0.000182
Virginia	1624.296703	971224.125	7.25875	1208.230769	73820.0	8.345389e+06	0.116379	0.000195	0.000145
Washington	1389.945055	832328.375	6.54375	663.725275	73760.0	7.202045e+06	0.115568	0.000193	0.000092
West Virginia	1005.318681	365201.125	4.30625	798.098901	46120.0	1.832192e+06	0.199325	0.000549	0.000436
Wisconsin	1229.516484	748482.250	6.87250	875.237500	61700.0	5.764201e+06	0.129850	0.000213	0.000152
Wyoming	86.230769	30742.375	6.10500	44.890625	63900.0	5.788710e+05	0.053107	0.000149	0.000078

		Total Drug Deaths	Tot_Opioid_Clms	Opioid_Prscrbng_Rate	Opioid Deaths	Income	Population	Opiod Claim Rate	Death Rate
State Name	Year
Alaska	2015	122.666667	73451.0	7.61	65.000000	79200.0	738430.0	0.099469	1.661182
	2016	132.833333	96490.0	7.90	71.333333	81400.0	742575.0	0.129940	1.788820
	2017	130.916667	97745.0	6.50	60.750000	76300.0	740983.0	0.131913	1.766797
	2018	120.166667	80393.0	5.76	62.916667	75700.0	736624.0	0.109137	1.631316
	2019	128.666667	65394.0	4.56	55.166667	75500.0	733603.0	0.089141	1.753900
...	...	...	...	...	...	...	...	...	...
Wisconsin	2018	1122.833333	591217.0	5.40	710.416667	61900.0	5809319.0	0.101770	1.932814
Wisconsin	2019	1134.750000	503993.0	4.70	743.916667	64200.0	5824581.0	0.086529	1.948209
Wyoming	2017	72.583333	29216.0	5.85	36.111111	63000.0	579994.0	0.050373	1.251450
	2018	63.583333	23396.0	4.94	37.416667	62700.0	579054.0	0.040404	1.098055
	2019	66.416667	19614.0	4.43	37.333333	65000.0	580116.0	0.033810	1.144886

		Opioid_Prscrbng_Rate	Income	Population	Opiod Claim Rate	Death Rate
State Name	Year
Alaska	2015	7.61	79200.0	738430.0	0.099469	1.661182
	2016	7.90	81400.0	742575.0	0.129940	1.788820
	2017	6.50	76300.0	740983.0	0.131913	1.766797
	2018	5.76	75700.0	736624.0	0.109137	1.631316
	2019	4.56	75500.0	733603.0	0.089141	1.753900
...	...	...	...	...	...	...
Wisconsin	2018	5.40	61900.0	5809319.0	0.101770	1.932814
Wisconsin	2019	4.70	64200.0	5824581.0	0.086529	1.948209
Wyoming	2017	5.85	63000.0	579994.0	0.050373	1.251450
	2018	4.94	62700.0	579054.0	0.040404	1.098055
	2019	4.43	65000.0	580116.0	0.033810	1.144886