- 22nd Dec 2022
- 06:03 am

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

options(warn = -1)

```

## 1. Provide appropriate graphs to display frequency distributions for school, age, visperc and flags variables

Below are the graphs for school, age, visperc and flags: bar charts for school and ageyr, and histograms for visperc and flags.

```{r, echo=FALSE, eval=TRUE}

holzinger <- read.csv("holzinger.csv")

library(ggplot2)

attach(holzinger)

ggplot(holzinger, aes(x = school)) + geom_bar()

ggplot(holzinger, aes(x = ageyr)) + geom_bar()

ggplot(holzinger, aes(x = visperc))+ geom_histogram()

ggplot(holzinger, aes(x = flags))+ geom_histogram()

```

## 2. Create a variable from ageyr and agemo of those above the median age and those equal to or below median age.

```{r,echo=FALSE, eval=TRUE}

age_abs = ageyr*12+agemo

med <- median(age_abs)

age_med <- as.factor(as.integer(age_abs>med))

holzinger <- cbind(holzinger, age_med)

```

We created a median split on age in months (1 = above the median, 0 = equal to or below the median) and now compare it with the school variable.
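As an illustration of how the split behaves, here is a minimal sketch on made-up data. The data frame `toy`, its values, and the factor labels are assumptions for this example only; only the column names `ageyr` and `agemo` come from the report.

```r
# Hypothetical toy data; values are invented, column names mirror the report.
toy <- data.frame(ageyr = c(12, 13, 12, 14, 13, 11),
                  agemo = c(3, 0, 11, 1, 6, 9))

# Convert age to months so years and months are on one scale.
toy$age_months <- toy$ageyr * 12 + toy$agemo

# Strictly above the median goes to one group; equal to or below goes to the other.
med_toy <- median(toy$age_months)
toy$age_split <- factor(toy$age_months > med_toy,
                        levels = c(FALSE, TRUE),
                        labels = c("at or below median", "above median"))

table(toy$age_split)
```

Labelled factor levels are a design choice here: they make later tables and plots easier to read than a bare 0/1 coding.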

### (a) Is a median split on age related to school? (i.e., produce a crosstabulation and chi-square test).

```{r, echo=FALSE, eval=TRUE}

table(school, age_med)

chisq.test(table(school, age_med))

```

The p-value of the chi-square test is $0.000067$, well below any conventional significance level, so we reject the hypothesis of independence and conclude that the median split on age is associated with school.
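The following sketch shows how the pieces of a chi-square test can be pulled out of the result object. The counts in `toy_tab` are invented and are not the holzinger results. One point worth knowing: for a 2x2 table, `chisq.test()` applies Yates' continuity correction by default, so the reported statistic is slightly smaller than the uncorrected one.

```r
# Hypothetical 2x2 table of counts; not taken from holzinger.csv.
toy_tab <- matrix(c(30, 20,
                    15, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(school = c("A", "B"),
                                  age_split = c("0", "1")))

# Default call: continuity correction is applied because the table is 2x2.
res <- chisq.test(toy_tab)
res$p.value    # p-value of the test
res$expected   # expected counts under independence

# Same test without the continuity correction, for comparison.
res_nocorr <- chisq.test(toy_tab, correct = FALSE)
res_nocorr$statistic
```

Inspecting `res$expected` is a quick way to check the usual rule of thumb that expected counts should not be too small.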

### (b) Produce a graph comparing educational attainment by household income.

The dataset contains no educational attainment or household income variables, so this graph cannot be produced.

## 3. Using t-tests, are there gender differences on: (a) visperc (b) wordmean (c) addition.

We shall use 0.05 as our level of significance.

A. Visperc

```{r, echo=FALSE, eval=TRUE}

t.test(visperc~sex)

```

There are no significant gender differences in visperc: the t-test p-value is 0.1597, which is higher than our significance level of 0.05.

B. Wordmean

```{r, echo=FALSE, eval=TRUE}

t.test(wordmean~sex)

```

There are no significant gender differences in wordmean: the t-test p-value is 0.8491, which is higher than our significance level of 0.05. The test is _not_ statistically significant.

C. Addition

```{r, echo=FALSE, eval=TRUE}

t.test(addition~sex)

```

There are significant gender differences in addition: the t-test p-value is 0.04365, which is lower than our significance level of 0.05. The test is statistically significant.
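The decision rule used in the three tests above can be sketched as follows. The data are simulated and the names `toy`, `score`, `welch`, and `pooled` are assumptions for this example. Note that `t.test()` with a formula runs Welch's unequal-variance test by default; the pooled-variance version requires `var.equal = TRUE`.

```r
# Simulated data for illustration only; not the holzinger scores.
set.seed(1)
toy <- data.frame(sex = rep(c(1, 2), each = 20),
                  score = c(rnorm(20, mean = 50, sd = 10),
                            rnorm(20, mean = 55, sd = 10)))

# Default: Welch two-sample t-test (variances not assumed equal).
welch <- t.test(score ~ sex, data = toy)

# Classical pooled-variance t-test, for comparison.
pooled <- t.test(score ~ sex, data = toy, var.equal = TRUE)

# Decision at the 0.05 level: TRUE means the difference is significant.
alpha <- 0.05
welch$p.value < alpha
```

With a p-value as close to 0.05 as the one for addition, it may be worth checking whether the conclusion is the same under both variants.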

## 4. Provide graphs to show these gender differences in (a) visperc (b) wordmean (c) addition

```{r, echo=FALSE, eval=TRUE}

ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")

```

## 5. Run a multiple regression with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors

```{r, echo=FALSE, eval=TRUE}

lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition)

summary(lm1)

```

The regression model with visperc as the dependent variable and cubes, sencomp, wordmean, and addition as predictors is significant, with an $R^2$ of 0.1865. This means that the predictors explain 18.65\% of the variance in visperc. Among the predictors, only cubes and wordmean are statistically significant at the 5\% level of significance.
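The quantities quoted above can be read directly from the model summary rather than copied from the printed output. This is a minimal sketch on simulated data; `toy`, `x1`, `x2`, and `y` are hypothetical names, not variables from the report.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 + 0.5 * toy$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)
s <- summary(fit)

# Proportion of variance in y explained by the predictors.
s$r.squared

# Coefficient table: estimate, standard error, t value, p-value.
coefs <- s$coefficients

# Keep only the rows whose p-value is below 0.05.
coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
```

Passing `data = toy` to `lm()` keeps the model tied to one data frame, which avoids relying on `attach()`.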

## 6. Produce a scatterplot of visperc on cubes. Put the regression line with standard error on the graph.

```{r, echo=FALSE, eval=TRUE}

ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")

```
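By default, `geom_smooth(method = "lm")` shades a 95% confidence band for the fitted mean around the line. The sketch below computes the same kind of band with base R's `predict()`, on simulated data; `toy`, `grid`, and `band` are hypothetical names for this example.

```r
# Simulated data for illustration only.
set.seed(7)
toy <- data.frame(x = runif(50, 10, 40))
toy$y <- 15 + 0.6 * toy$x + rnorm(50, sd = 5)

fit <- lm(y ~ x, data = toy)

# Evenly spaced x values across the observed range.
grid <- data.frame(x = seq(min(toy$x), max(toy$x), length.out = 25))

# Fitted values with lower and upper 95% confidence limits for the mean.
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

head(cbind(grid, band))
```

The band is narrowest near the mean of `x` and widens toward the ends of the range, which is the shape visible in the plot.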

## Appendix

```{r, echo=TRUE, eval=FALSE}

holzinger <- read.csv("holzinger.csv")

library(ggplot2)

attach(holzinger)

ggplot(holzinger, aes(x = school)) + geom_bar()

ggplot(holzinger, aes(x = ageyr)) + geom_bar()

ggplot(holzinger, aes(x = visperc))+ geom_histogram()

ggplot(holzinger, aes(x = flags))+ geom_histogram()

age_abs = ageyr*12+agemo

med <- median(age_abs)

age_med <- as.factor(as.integer(age_abs>med))

holzinger <- cbind(holzinger, age_med)

table(school, age_med)

chisq.test(table(school, age_med))

t.test(visperc~sex)

t.test(wordmean~sex)

t.test(addition~sex)

ggplot(holzinger, aes(x = as.factor(sex), y = visperc)) + geom_boxplot() + xlab("Gender") + ylab("scores on visual perception test, test 1") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = wordmean)) + geom_boxplot() + xlab("Gender") + ylab("scores on word meaning test, test 9") + labs(title = "Box plot")

ggplot(holzinger, aes(x = as.factor(sex), y = addition)) + geom_boxplot() + xlab("Gender") + ylab("scores on add test, test 10") + labs(title = "Box plot")

lm1 <- lm(visperc ~ cubes + sencomp + wordmean + addition)

summary(lm1)

ggplot(holzinger, aes(y = visperc, x = cubes)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Scatterplot with regression line")

```