- 22nd Dec 2022
- 06:03 am
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
```
## 1. Provide appropriate graphs to display frequency distributions for school, age, visperc and flags variables
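The graphs below show the frequency distributions of school, age (ageyr), visperc and flags: bar charts for school and age, and histograms for visperc and flags.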
```{r, echo=FALSE, eval=TRUE}
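# Load the data and the plotting library
holzinger <- read.csv("holzinger.csv")
library(ggplot2)
# attach() lets later chunks refer to columns by name (e.g. school, sex)
attach(holzinger)
# Bar charts for the discrete variables
ggplot(holzinger, aes(x = school)) + geom_bar()
ggplot(holzinger, aes(x = ageyr)) + geom_bar()
# Histograms for the test scores
ggplot(holzinger, aes(x = visperc)) + geom_histogram()
ggplot(holzinger, aes(x = flags)) + geom_histogram()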
```
## 2. Create a variable from ageyr and agemo of those above the median age and those equal to or below median age.
```{r,echo=FALSE, eval=TRUE}
# Total age in months
age_abs <- ageyr * 12 + agemo
med <- median(age_abs)
# 1 = above the median age, 0 = equal to or below it
age_med <- as.factor(as.integer(age_abs > med))
holzinger <- cbind(holzinger, age_med)
```
We have created a median split on age (age_med) and will now compare it with the school variable.
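As a minimal sketch of how the median split behaves, the block below applies the same logic to a handful of made-up ages. The vectors `age_years` and `age_months` and the factor labels are hypothetical and are not taken from the Holzinger data.

```r
# Hypothetical ages in years plus additional months (NOT the Holzinger data)
age_years  <- c(12, 13, 13, 14, 12, 15)
age_months <- c(6, 0, 11, 2, 1, 3)

# Total age in months
age_total <- age_years * 12 + age_months

# Median of the total ages
age_median <- median(age_total)

# TRUE when strictly above the median; ages equal to the median
# fall into the "at or below median" group
age_group <- factor(age_total > age_median,
                    levels = c(FALSE, TRUE),
                    labels = c("at or below median", "above median"))

table(age_group)
```

Labelling the factor levels, rather than keeping 0 and 1, can make the later crosstabulation easier to read.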
### (a) Is a median split on age related to school? (i.e., produce a crosstabulation and chi-square test).
```{r, echo=FALSE, eval=TRUE}
table(school, age_med)
chisq.test(table(school, age_med))
```
The p-value of the chi-square test is $0.000067$, which is extremely small, so we conclude that there is an association between the median split on age and school.
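To illustrate what the chi-square test computes, here is a rough sketch that rebuilds the statistic by hand from a 2x2 table. The counts are invented for illustration and are not the Holzinger crosstabulation. Note that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, so if the crosstabulation above is 2x2, the reported p-value includes that correction.

```r
# Hypothetical 2x2 counts (NOT the Holzinger results)
counts <- matrix(c(30, 20,
                   15, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(school = c("A", "B"),
                                 age_med = c("0", "1")))

# Expected counts under independence:
# row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic without continuity correction
chi_sq  <- sum((counts - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# correct = FALSE switches off the Yates correction, so the result
# matches the hand-computed statistic above
res <- chisq.test(counts, correct = FALSE)
```

Comparing `res$statistic` with `chi_sq` is a quick way to check that the hand calculation and the built-in test agree.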
### (b) Produce a graph comparing educational attainment by household income.
The dataset contains no educational attainment or household income variables, so this graph cannot be produced.
## 3. Using t-tests, are there gender differences on: (a) visperc (b) wordmean (c) addition.
We shall use 0.05 as our level of significance.
A. Visperc
```{r, echo=FALSE, eval=TRUE}
t.test(visperc~sex)
```
There are no significant gender differences in visperc, as the t-test p-value is 0.1597, which is higher than our level of significance of 0.05.
B. Wordmean
```{r, echo=FALSE, eval=TRUE}
t.test(wordmean~sex)
```
There are no significant gender differences in wordmean, as the t-test p-value is 0.8491, which is higher than our level of significance of 0.05. The test is _not_ statistically significant.
C. Addition
```{r, echo=FALSE, eval=TRUE}
t.test(addition~sex)
```
There are significant gender differences in addition, as the t-test p-value is 0.04365, which is lower than our level of significance of 0.05. The test is statistically significant.
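The decision rule used in the three tests above can be sketched in code. The example uses R's built-in `mtcars` data as a stand-in, with `am` (0/1) playing the role of `sex`; it is an illustration only, not a rerun of the Holzinger analysis. By default, `t.test()` runs Welch's test, which does not assume equal variances in the two groups.

```r
# Built-in mtcars data as a stand-in; `am` (0/1) plays the role of `sex`
res <- t.test(mpg ~ am, data = mtcars)  # Welch test: var.equal = FALSE by default

# Compare the p-value with the chosen level of significance
alpha <- 0.05
significant <- res$p.value < alpha

# Student's t-test with pooled variance, for comparison
res_pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
```

Extracting `res$p.value` instead of reading it off the printed output avoids transcription errors when writing up the result.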
## 4. Provide graphs to show these gender differences in (a) visperc (b) wordmean (c) addition
```{r, echo=FALSE, eval=TRUE}
ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")
ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")
ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")
```
## 5. Run a multiple regression with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors
```{r, echo=FALSE, eval=TRUE}
# Fit the multiple regression; data = holzinger avoids relying on attach()
lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition, data = holzinger)
summary(lm1)
```
The regression model with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors is significant, with an $R^2$ of 0.1865. This means that the predictors explain 18.65\% of the variance in visperc. Among the predictor variables, only cubes and wordmean are statistically significant at the 5\% level of significance.
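The reading of $R^2$ as the proportion of variance explained can be checked with a short sketch. It uses the built-in `mtcars` data and a hypothetical model (`mpg ~ wt + hp`) as stand-ins, so the numbers it produces are unrelated to the Holzinger results.

```r
# Built-in mtcars data as a stand-in for the Holzinger variables
fit <- lm(mpg ~ wt + hp, data = mtcars)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared <- 1 - ss_res / ss_tot

# Coefficient table: estimates, standard errors, t values, p-values
coefs <- coef(summary(fit))

# Names of the terms whose p-value is below 0.05
significant_terms <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
```

The same pattern applied to `lm1` would list the significant predictors directly from the coefficient table rather than by inspection of the printed summary.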
## 6. Produce a scatterplot of visperc on cubes. Put the regression line with standard error on the graph.
```{r, echo=FALSE, eval=TRUE}
ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")
```
## Appendix
```{r, echo=TRUE, eval=FALSE}
holzinger <- read.csv("holzinger.csv")
library(ggplot2)
attach(holzinger)
ggplot(holzinger, aes(x = school)) + geom_bar()
ggplot(holzinger, aes(x = ageyr)) + geom_bar()
ggplot(holzinger, aes(x = visperc))+ geom_histogram()
ggplot(holzinger, aes(x = flags))+ geom_histogram()
age_abs = ageyr*12+agemo
med <- median(age_abs)
age_med <- as.factor(as.integer(age_abs>med))
holzinger <- cbind(holzinger, age_med)
table(school, age_med)
chisq.test(table(school, age_med))
t.test(visperc~sex)
t.test(wordmean~sex)
t.test(addition~sex)
ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")
ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")
ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")
lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition)
summary(lm1)
ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")
```