 17th Feb 2024
 06:03 am
PUN105 Assessment Question
Your task is to write the statistical methods and results sections of a report that focuses on the analysis of country-level COVID-19 data. The data set you are required to analyse is available on the Assessment 2 page of Blackboard. Please note that this is an individual assignment and that you must be logged into Blackboard under your username to access the assessment tasks and data set that have been assigned to you. The data coding manual and the five specific research questions you are required to answer are provided at the end of this document. You will be graded on both your analysis and your interpretation, so you should avoid simple yes/no answers to the questions. Rather, be sure to explain your answer, citing any relevant results. You are expected to include both descriptive and inferential statistics for Questions 1 to 4. Only statistical methods taught in this unit should be used in the assessment; your grade will be negatively affected if alternative statistical methods are used. All questions can be answered using the methods taught.
Solution
Data were downloaded from the Our World in Data website (ourworldindata.org).
 Statistical Methods
This study investigates the association between development indices, vaccination, and the COVID-19 death rate, measured by the death-to-case ratio (DCR). For this purpose, we use the Our World in Data database (ourworldindata.org), which provides a variety of measures on COVID-19, vaccination, and development. Table 1 shows selected variables from this database and their summary statistics; each observation is a country-day. The COVID-19 measures describe the course of the pandemic in each country from 1 January 2020 to 3 June 2022, the date the data were downloaded.
Table 1: Summary Statistics of Selected Variables (Country-Day Observations)

Variable                    N          Mean          Standard deviation
COVID-19 measures
  Total_cases               183,834    3,296,788     2.07e+07
  Total_deaths              165,368    64,774.8      337,522.7
  ICU_patients              25,498     879.6         2,579.7
  People_vaccinated         49,909     1.10e+08      4.71e+08
  DCR (%)                   156,445    2.4           9.6
Development indices
  GDP_per_capita            157,205    19,636.0      20,576.6
  Stringency_index          148,621    52.9          20.814
  Life_expectancy           178,964    73.6          7.4
  Human_development_index   153,621    0.72          0.14
  Extreme_poverty           102,625    13.6          20.0
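The summary statistics in Table 1 follow Stata's tabstat default: each variable's N, mean, and standard deviation use only that variable's non-missing observations, which is why N differs across rows. A minimal Python sketch of that computation, using a hypothetical helper `summarize` and toy values rather than the assessment data, might look like this:

```python
import math
from typing import Optional, Sequence


def summarize(values: Sequence[Optional[float]]) -> tuple[int, float, float]:
    """Return (N, mean, sample SD) over the non-missing entries only.

    Hypothetical helper mirroring tabstat's n/mean/sd statistics, where
    None or NaN marks a missing value.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    n = len(observed)
    mean = sum(observed) / n
    # Sample standard deviation: n - 1 denominator
    var = sum((v - mean) ** 2 for v in observed) / (n - 1)
    return n, mean, math.sqrt(var)


# Hypothetical toy column with one missing value
icu = [10.0, None, 20.0, 30.0]
print(summarize(icu))  # (3, 20.0, 10.0)
```

The n - 1 denominator matches the sample standard deviation that Stata reports.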

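In Table 1, the COVID-19 measures are total cases, total deaths, ICU patients, people vaccinated, and the death-to-case ratio (DCR), which is calculated as

$$\mathrm{DCR} = 100 \times \frac{\text{new\_deaths\_smoothed\_per\_million}}{\text{new\_cases\_smoothed\_per\_million}},$$

that is, 7-day smoothed daily new deaths as a percentage of 7-day smoothed daily new cases.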
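As a minimal sketch of the DCR formula, assuming the two smoothed series are available for each country-day, the hypothetical function `dcr` below reproduces the calculation used in the appendix. Returning None when there are no new cases is an assumption chosen to mirror Stata, which sets the result of a division by zero to missing.

```python
from typing import Optional


def dcr(new_deaths_smoothed_per_million: float,
        new_cases_smoothed_per_million: float) -> Optional[float]:
    """Death-to-case ratio in percent: 100 * deaths / cases.

    Returns None (missing) when the case count is zero, as Stata does
    for a division by zero.
    """
    if new_cases_smoothed_per_million == 0:
        return None
    return 100 * new_deaths_smoothed_per_million / new_cases_smoothed_per_million


print(dcr(0.5, 25.0))  # 2.0
print(dcr(1.0, 0.0))   # None
```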
The DCR reflects several aspects of the pandemic, e.g. how well a government manages it, whether health facilities are adequate, and whether vaccination is effective. Separating these factors, however, requires a careful statistical methodology. As a first step, we assess whether the DCR differs significantly across countries; in other words, whether it varies at all rather than being the same all over the world.
The DCR differs considerably across continents, and each continent differs significantly from Africa. Table 2 reports these differences. The average DCR is highest in South America, followed by Africa. The last two columns of Table 2 show each continent's estimated difference from Africa and its standard error. The differences are estimated with the multivariate linear model

$$\mathrm{DCR}_i = \beta_0 + \beta_1 \mathrm{Asia}_i + \beta_2 \mathrm{Europe}_i + \beta_3 \mathrm{NorthAmerica}_i + \beta_4 \mathrm{Oceania}_i + \beta_5 \mathrm{SouthAmerica}_i + \varepsilon_i,$$

where i indexes country-day observations, each continent dummy equals 1 for observations from that continent and 0 otherwise, and Africa is the reference group. Hence β0 is the mean DCR in Africa and each of β1 to β5 is the difference between that continent's mean DCR and Africa's.
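An alternative approach is to run a simple (univariate) linear regression restricted to observations from Africa and Asia,

$$\mathrm{DCR}_i = \beta_0 + \beta_1 \mathrm{Asia}_i + \varepsilon_i,$$

which yields the same estimate of the Asia-Africa difference β1, although its standard error can differ slightly because the residual variance is then estimated from the Africa and Asia observations only.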
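Because the continent dummies are mutually exclusive, OLS on them has a closed form: the intercept is the reference group's mean DCR and each slope is that continent's mean minus the reference mean. The restricted Africa-Asia regression therefore gives the same Asia coefficient. A minimal Python sketch of this equivalence, with a hypothetical helper `dummy_regression` and toy data (not the assessment data set), is:

```python
from collections import defaultdict


def dummy_regression(groups, y, reference="Africa"):
    """Closed-form OLS of y on a full set of group dummies plus an intercept.

    Returns (beta0, slopes): beta0 is the reference-group mean and each
    slope is that group's mean minus the reference-group mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for g, value in zip(groups, y):
        sums[g] += value
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    beta0 = means[reference]
    slopes = {g: m - beta0 for g, m in means.items() if g != reference}
    return beta0, slopes


# Hypothetical toy data
cont = ["Africa", "Africa", "Asia", "Asia", "Oceania"]
dcr_values = [3.0, 5.0, 2.0, 4.0, 1.0]
print(dummy_regression(cont, dcr_values))  # (4.0, {'Asia': -1.0, 'Oceania': -3.0})
```

Computing group means avoids building the dummy matrix; a general regression routine would return the same point estimates plus standard errors.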
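Table 2: DCR by Continent and Difference with Respect to Africa

Continent        N         Mean DCR    Difference w.r.t. Africa    Standard error
Africa           39,144    .027        (reference)
Asia             34,992    .022        -.0056***                   .0007
Europe           37,192    .024        -.0029***                   .0007
North America    21,188    .022        -.0050***                   .0008
Oceania          4,255     .016        -.0118***                   .0016
South America    9,518     .031        .003**                      .0011

Note: differences and standard errors are from the regression of the DCR on continent dummies, with Africa as the reference group.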
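Each difference in Table 2 can be judged by its t statistic, the estimate divided by its standard error. With tens of thousands of observations the t distribution is essentially normal, so the hypothetical helper `stars` below assigns significance stars from normal critical values, using the star levels set in the appendix's estout command (* 0.10, ** 0.05, *** 0.01). Because the table shows rounded values, borderline rows may not reproduce the reported stars exactly.

```python
def stars(coef: float, se: float) -> str:
    """Significance stars from a large-sample two-sided test of coef = 0.

    Critical values 1.645, 1.960 and 2.576 correspond to p < 0.10,
    p < 0.05 and p < 0.01 under the normal approximation.
    """
    t = abs(coef / se)
    if t > 2.576:
        return "***"
    if t > 1.960:
        return "**"
    if t > 1.645:
        return "*"
    return ""


# Asia row of Table 2: |difference| = 0.0056, SE = 0.0007 (sign does not matter)
print(round(0.0056 / 0.0007, 1), stars(0.0056, 0.0007))  # 8.0 ***
```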
The DCR differs not only across regions but also over time. Figure 1 shows this variation with quadratic fitted trends of the DCR over time for Africa, Asia, North America, and Oceania.
Figure 1: Changes in the DCR over Time, by Continent
We next investigate the association between a range of covariates and the DCR. The covariates include measures of governance quality, development indices, country fixed effects, and time dummies. The following multivariate model can be used for this purpose:
$$\mathrm{DCR}_{it} = \beta_0 + \beta_1 \mathrm{Vaccination}_{it} + \beta_2 \mathrm{Governance}_{it} + \beta_3 \mathrm{Development}_{it} + d_i + \tau_t + \varepsilon_{it}$$

where i indexes countries, t indexes time, and Governance and Development are indices of these two features. Country fixed effects are captured by d_i, and variation over time that is common to all countries is captured by the time dummies τ_t.
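The fixed-effects estimator removes the country effects d_i by demeaning each variable within its country and then running OLS on the demeaned data. A minimal Python sketch of this within transformation for a single regressor, with a hypothetical helper `within_slope` and toy data, and omitting the time dummies for brevity, is:

```python
from collections import defaultdict


def within_slope(ids, x, y):
    """Fixed-effects (within) slope for one regressor, no time dummies.

    Subtracts each country's mean from x and y, then computes the OLS
    slope through the origin on the demeaned values.
    """
    sx, sy, n = defaultdict(float), defaultdict(float), defaultdict(int)
    for i, xi, yi in zip(ids, x, y):
        sx[i] += xi
        sy[i] += yi
        n[i] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        x_dm = xi - sx[i] / n[i]  # demeaned regressor
        y_dm = yi - sy[i] / n[i]  # demeaned outcome
        num += x_dm * y_dm
        den += x_dm * x_dm
    return num / den


# Hypothetical panel: y = country effect (10 or 50) + 2 * x
ids = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [10 + 2 * v for v in x[:3]] + [50 + 2 * v for v in x[3:]]
print(within_slope(ids, x, y))  # 2.0
```

A covariate that never changes within a country demeans to zero, so it contributes nothing to the within estimate; this is why time-invariant country characteristics cannot be estimated separately from the country fixed effects.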
Before presenting the results of this model, we examine the overall association between the vaccination rate and the DCR. Figure 2 shows a scatter plot of the DCR against people vaccinated per hundred, with a fitted quadratic curve, for observations in January 2022. The figure indicates a negative correlation between the two variables, although it does not control for other factors such as regional fixed effects.
Figure 2: Association between Vaccination Rate (x-axis) and the DCR (y-axis), January 2022
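Figures 1 and 2 draw quadratic fitted curves with Stata's qfit, which plots predictions from a linear regression of the y variable on x and x². A minimal Python sketch of that least-squares fit, using a hypothetical helper `quadratic_fit` that solves the 3x3 normal equations directly, and toy data rather than the assessment data, is:

```python
def quadratic_fit(x, y):
    """Least-squares coefficients [a, b, c] of y = a + b*x + c*x**2.

    Builds the normal equations X'X beta = X'y from power sums and solves
    them by Gaussian elimination with partial pivoting.
    """
    s = [sum(xi ** k for xi in x) for k in range(5)]                  # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^k
    a = [[s[r + c] for c in range(3)] + [t[r]] for r in range(3)]     # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    beta = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back-substitution
        beta[r] = (a[r][3] - sum(a[r][c] * beta[c] for c in range(r + 1, 3))) / a[r][r]
    return beta


# Hypothetical toy data lying exactly on y = 1 + 2x - 0.5x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * v - 0.5 * v ** 2 for v in xs]
print([round(b, 6) for b in quadratic_fit(xs, ys)])  # [1.0, 2.0, -0.5]
```

Solving the normal equations is adequate for a three-parameter model like this; for larger or badly scaled problems a QR-based least-squares solver is numerically safer.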
 Results
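Table 3 reports the results of the multivariate model. Column 1 controls for vaccination, governance (stringency index), and income (GDP per capita). Column 2 adds two health-related variables (hospital beds per thousand and life expectancy), and column 3 further adds extreme poverty, the cardiovascular death rate, diabetes prevalence, and access to handwashing facilities. Columns 1-3 are estimated by ordinary least squares (OLS); column 4 uses the fixed-effects estimator, which controls for time-invariant characteristics of the countries, and also time dummies. Because the country fixed effects absorb every covariate that does not change over time within a country, the time-invariant covariates are omitted in column 4 (reported as 0). The sample falls to 9,779 observations in columns 3 and 4 because observations with missing values on the added covariates are dropped.
The coefficients in columns 1-3 can be interpreted in levels. The vaccination variable is total vaccinations per hundred people and the DCR is measured in percent, so the column 1 estimate means that one additional vaccine dose per hundred people is associated with a DCR about 0.0094 percentage points lower, holding the other covariates constant. Equivalently, 100 additional doses per hundred people (roughly one dose per person) is associated with a DCR about 0.94 percentage points lower, which is large relative to the average DCR of 2.4 percent in Table 1. However, such extrapolations should be made with caution, because the coefficients are estimated from local variation in the data. The adjusted R² is small in every column (at most 0.084), so these covariates explain only a small share of the variation in the DCR.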
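As a worked example of the level interpretation, taking the column 1 vaccination coefficient as -0.0094 (negative, since the text describes a reduction in the DCR) and measuring the DCR in percent, the predicted change for an increase of 100 doses per hundred people is:

```latex
\Delta\widehat{\mathrm{DCR}}
  = \hat{\beta}_1 \times \Delta(\text{total\_vaccinations\_per\_hundred})
  = -0.0094 \times 100
  = -0.94 \ \text{percentage points}
```

The same calculation with a change of 1 gives -0.0094 percentage points per additional dose per hundred people; in both cases the other covariates are held fixed.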
Table 3: Multivariate Models of the DCR on Vaccination and Country Characteristics
Dependent variable: DCR

                                (1)           (2)           (3)           (4)
total_vaccinations_per_hundred  -0.0094***    -0.0091***    -0.015***     0.0061***
                                (0.0005)      (0.00056)     (0.0007)      (0.00072)
stringency_index                0.0088***     0.0081***     0.027***      0.00064
                                (0.0019)      (0.0021)      (0.0023)      (0.0026)
gdp_per_capita                  0.000023***   0.000018***   0.000030***   (omitted)
                                (1.6E-06)     (2.1E-06)     (9.1E-06)
hospital_beds_per_thousand                    0.075***      0.31***       (omitted)
                                              (0.014)       (0.033)
life_expectancy                               0.055***      0.050***      (omitted)
                                              (0.0091)      (0.014)
extreme_poverty                                             0.016***      (omitted)
                                                            (0.0033)
cardiovasc_death_rate                                       0.0012**      (omitted)
                                                            (0.00049)
diabetes_prevalence                                         0.098***      (omitted)
                                                            (0.017)
handwashing_facilities                                      0.029***      (omitted)
                                                            (0.0024)
_cons                           3.86***       7.64***       5.81***       2.97***
                                (0.12)        (0.62)        (0.94)        (0.16)
N                               39,779        37,472        9,779         9,779
Adjusted R²                     0.022         0.025         0.084         0.0031

Notes: standard errors in parentheses; * p<0.10, ** p<0.05, *** p<0.01. Columns 1-3: OLS; column 4: fixed-effects estimator. (omitted): time-invariant covariate absorbed by the country fixed effects.
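Table 3 reports the adjusted R² (r2_a), which for the OLS columns penalises R² for the number of regressors. A minimal Python sketch of the standard formula, with a hypothetical helper `adjusted_r2` and illustrative inputs that are not taken from Table 3, is:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for OLS with n observations, k regressors and a constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values for illustration
print(adjusted_r2(0.5, 21, 4))  # 0.375
```

With tens of thousands of observations and at most nine regressors, the penalty factor (n - 1) / (n - k - 1) is very close to 1, so the adjusted and unadjusted R² are nearly identical in the OLS columns.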
 Discussion
Our analysis combines a multivariate model with descriptive figures and simple (univariate) regressions of the DCR on vaccination, time, and region.
Although we control for a variety of country-level features, the interpretation should be restricted to "local effects": the right-hand-side variables change smoothly, without large jumps, so the estimates describe the association for small changes and may not extend to large ones.
In addition, our results may be biased by endogeneity, that is, correlation between the error term and the right-hand-side variables. For example, the vaccination rate may be correlated with the development indices and with other fixed characteristics of the countries. Therefore, vaccination should not be interpreted as a fully exogenous variable. As a suggestion, future studies could use an instrumental-variables approach, instrumenting a country's vaccination rate with factors such as the vaccination rate in neighbouring countries.
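The instrumental-variables idea suggested above can be illustrated with the just-identified (Wald) IV estimator, cov(z, y) / cov(z, x). In the study's setting z would be neighbouring countries' vaccination rate, x a country's own vaccination rate, and y the DCR; the toy numbers below are hypothetical and constructed so that x is correlated with the error term u while z is not. The estimator is valid only if the instrument affects the DCR solely through vaccination (the exclusion restriction), which would need to be argued for real data.

```python
def iv_slope(z, x, y):
    """Just-identified IV slope: sample cov(z, y) / sample cov(z, x).

    Uses sums of cross-products; the 1/n factors cancel in the ratio.
    """
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    cov_zx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return cov_zy / cov_zx


# Hypothetical toy data: u is an unobserved confounder, uncorrelated with z
z = [1.0, 2.0, 3.0, 4.0]                      # instrument
u = [1.0, -1.0, -1.0, 1.0]                    # error term
x = [zi + ui for zi, ui in zip(z, u)]         # endogenous regressor
y = [-0.5 * xi + ui for xi, ui in zip(x, u)]  # true slope is -0.5

print(iv_slope(z, x, y))            # -0.5
print(round(iv_slope(x, x, y), 4))  # OLS slope (z = x), biased: -0.0556
```

Passing x as its own instrument reduces the formula to the OLS slope, which makes the bias from the confounder easy to see.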
 Appendix: Code (in Stata)
set more off
clear all
use covid.dta, clear
* DCR: smoothed new deaths as a percentage of smoothed new cases (missing when cases are zero)
gen dcr=100*new_deaths_smoothed_per_million/new_cases_smoothed_per_million
tabstat total_cases total_deaths icu_patients ///
people_vaccinated dcr gdp_per_capita ///
stringency_index life_expectancy ///
human_development_index extreme_poverty ///
, s( n mean sd) c(s)
* Differences across continents
table continent, c(n dcr mean dcr)
* encode numbers continents alphabetically, so Africa (code 1) is the base category of i.cont_id
encode continent, g(cont_id)
reg dcr i.cont_id
* Time variations
* Dates are YYYY-MM-DD strings, so alphabetical encoding keeps date_id in calendar order
encode date, g(date_id)
twoway (qfit dcr date_id if continent=="Africa") ///
(qfit dcr date_id if continent=="Asia") ///
(qfit dcr date_id if continent=="North America") ///
(qfit dcr date_id if continent=="Oceania"), ///
legend(label(1 "Africa") label(2 "Asia") label(3 "North America") label(4 "Oceania") c(1))
* Indicators for July 2021 and January 2022, then compare the mean DCR in the two months
gen dum2021July=regexm(date,"2021-07")
gen dum2022Jan=regexm(date,"2022-01")
reg dcr i.dum2022Jan if dum2022Jan | dum2021July
* Vaccination and DCR association, January 2022
twoway (scatter dcr people_vaccinated_per_hundred if dum2022Jan)(qfit dcr people_vaccinated_per_hundred if dum2022Jan )
* Multivariate regression
* Declare the panel: countries (iso_code) observed over dates
encode iso_code, g(country_id)
xtset country_id date_id
est clear
* Columns 1-3: OLS, adding covariates step by step
eststo: reg dcr total_vaccinations_per_hundred stringency_index gdp_per_capita
eststo: reg dcr total_vaccinations_per_hundred stringency_index gdp_per_capita hospital_beds_per_thousand life_expectancy
eststo: reg dcr total_vaccinations_per_hundred stringency_index gdp_per_capita hospital_beds_per_thousand life_expectancy extreme_poverty cardiovasc_death_rate diabetes_prevalence handwashing_facilities
* Column 4: fixed-effects (within) estimator with country fixed effects
eststo: xtreg dcr total_vaccinations_per_hundred stringency_index gdp_per_capita hospital_beds_per_thousand life_expectancy extreme_poverty cardiovasc_death_rate diabetes_prevalence handwashing_facilities , fe
estout using TableCOVID, replace order(dcr total_vaccinations_per_hundred stringency_index gdp_per_capita hospital_beds_per_thousand life_expectancy extreme_poverty cardiovasc_death_rate diabetes_prevalence handwashing_facilities ) stats(N r2_a) starlevels(* 0.10 ** 0.05 *** 0.01) cells(b(fmt(a2) star) se(fmt(a2) ))