- 19th Jan 2023
- 06:03 am
---
title: "Probability and Statistics"
author:
output:
  word_document: default
  pdf_document: default
---
```{r start, include=FALSE}
options(warn = -1)
library(ggplot2)
```
## 1. Assume that out of all buyers of a particular digital camera, 60% buy an extra memory card, 40% buy an extra battery, and 30% buy both a card and a battery. Consider picking a buyer at random, and let A be the event that the buyer purchases a memory card and B the event that the buyer purchases a battery:
### A. Find probabilities P(A) and P(B).
$$ P(A) = 0.6 $$
$$ P(B) = 0.4 $$
### B. What is the likelihood that an optional card was also bought, given that the chosen person bought an extra battery?
$$ P(A|B) = P(A \cap B)/P(B) = 0.3/0.4 = 0.75 $$
### C. What is the likelihood that a spare battery was also purchased, given that the chosen person bought an optional card?
$$ P(B|A) = P(A \cap B)/P(A) = 0.3/0.6 = 0.5 $$
### D. Are the events A and B independent? Justify your answer.
No. The events A and B are not independent, since $P(A|B) = 0.75 \neq 0.6 = P(A)$.
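
The arithmetic for this problem can be checked numerically. The chunk below is a minimal sketch; the variable names (`p_a`, `p_b`, `p_ab`) are illustrative and not part of the original exercise. It recomputes both conditional probabilities from the definition and tests independence through the product rule $P(A \cap B) = P(A)P(B)$.

```{r}
# Probabilities given in the problem statement
p_a  <- 0.6   # P(A): buyer purchases a memory card
p_b  <- 0.4   # P(B): buyer purchases a battery
p_ab <- 0.3   # P(A and B): buyer purchases both

# Conditional probabilities from P(X|Y) = P(X and Y) / P(Y)
p_a_given_b <- p_ab / p_b
p_b_given_a <- p_ab / p_a

# Independence would require P(A and B) = P(A) * P(B)
independent <- isTRUE(all.equal(p_ab, p_a * p_b))

c(p_a_given_b = p_a_given_b, p_b_given_a = p_b_given_a)
independent
```

Since $P(A)P(B) = 0.24$ while $P(A \cap B) = 0.3$, the product rule fails, which agrees with the comparison of $P(A|B)$ and $P(A)$.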
## 2. A business is thinking of putting in four oil wells. Regardless of the outcomes for any other wells, the likelihood of success for each well is 0.40. Each well costs $200,000 to build. Each productive well will be worth $600,000.
Let X be the number of successful wells. Then $X \sim Bin(4, 0.4)$.
### How likely is it that at least one well will be successful?
$$ P(X \geq 1) = \sum_{i=1}^4 {4\choose i}0.4^i0.6^{4-i} = 1 - P(X = 0) $$
We shall use R to calculate this:
```{r}
1 - pbinom(0, 4, 0.4)
```
Hence, $P(X \geq 1) = 0.8704$
### What is the expected number of successes?
The expected number of successes is $0.4 \times 4 = 1.6$
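
The shortcut $E[X] = np$ can be cross-checked against the definition $E[X] = \sum_x x\,P(X = x)$. This is a minimal sketch with illustrative variable names (`n`, `p`, `expected_successes`), using only `dbinom()` from base R.

```{r}
n <- 4     # number of wells
p <- 0.4   # probability that a single well succeeds
x <- 0:n   # possible numbers of successful wells

# E[X] computed directly from the probability mass function
expected_successes <- sum(x * dbinom(x, n, p))
expected_successes
```

The result should agree with $np$ up to floating-point rounding.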
### What is the expected gain?
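The expected gain is $480,000, treating each successful well as a gain of $600,000 and each unsuccessful well as a loss of $200,000: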
```{r}
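x <- 0:4                      # possible numbers of successful wells
y <- 4 - x                    # corresponding numbers of unsuccessful wells
probs <- dbinom(x, 4, 0.4)    # P(X = x) for X ~ Bin(4, 0.4)
# Expected gain: +600,000 per successful well, -200,000 per unsuccessful well
gains <- sum((x * 600000 - y * 200000) * probs)
gains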
```
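
The same figure follows from linearity of expectation, without summing over outcomes. The sketch below uses illustrative variable names. It also shows a second reading of the problem, in which all four wells incur the $200,000 cost and each success returns $600,000. That second reading is an assumption added here for comparison and is not the calculation used above.

```{r}
n <- 4
p <- 0.4
expected_successes <- n * p                    # E[X]
expected_failures  <- n - expected_successes   # E[4 - X]

# Reading used above: +600,000 per success, -200,000 per failure
gain_net <- 600000 * expected_successes - 200000 * expected_failures

# Alternative reading (assumption): every well costs 200,000,
# and each successful well returns 600,000
gain_all_costs <- 600000 * expected_successes - 200000 * n

c(gain_net = gain_net, gain_all_costs = gain_all_costs)
```

The two readings differ only in whether the cost of a successful well is subtracted from its worth, so it helps to state which one is intended when reporting the answer.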
## 3. An insurance company believes that people can be divided into two classes: those who are accident prone and those who are not. Their statistics show that an accident-prone person will have an accident at some time within a fixed 1-year period with probability 0.4, whereas this probability decreases to 0.2 for a non-accident-prone person.
### If we assume that 30 percent of the population is accident prone, what is the probability that a new policyholder will have an accident within a year of purchasing a policy ?
The probability that a new policyholder will have an accident within a year of purchasing a policy is:
$$ P(\text{Accident within 1 year}) = P(\text{Accident} \mid \text{Accident prone})\times P(\text{Accident prone}) + P(\text{Accident} \mid \text{Not accident prone})\times P(\text{Not accident prone}) $$
Hence,
$$ P(\text{Accident within 1 year}) = 0.4\times 0.3 + 0.2\times 0.7 = 0.12+0.14 = 0.26 $$
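
The law of total probability used here can be verified in R, both exactly and by simulation. This is a minimal sketch: the variable names, the seed, and the simulation size are arbitrary choices made for illustration.

```{r}
# Values from the problem statement
p_prone               <- 0.3   # P(accident prone)
p_acc_given_prone     <- 0.4   # P(accident | accident prone)
p_acc_given_not_prone <- 0.2   # P(accident | not accident prone)

# Law of total probability
p_accident <- p_acc_given_prone * p_prone +
  p_acc_given_not_prone * (1 - p_prone)

# Monte Carlo check: draw each policyholder's class, then the accident outcome
set.seed(1)
n_sim    <- 100000
prone    <- rbinom(n_sim, 1, p_prone)
accident <- rbinom(n_sim, 1,
                   ifelse(prone == 1, p_acc_given_prone, p_acc_given_not_prone))

c(exact = p_accident, simulated = mean(accident))
```

The simulated proportion should land close to the exact value, with the gap shrinking as `n_sim` grows.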
### Suppose that a new policyholder has an accident within a year of purchasing a policy. What is the probability that he or she is accident prone?
Writing $A$ for an accident within the year and $AP$ for an accident-prone policyholder, Bayes' theorem gives:
$$ P(AP \mid A) = \frac{P(A \mid AP)\times P(AP)}{P(A \mid AP)\times P(AP) + P(A \mid AP^c)\times P(AP^c)} $$
Hence,
$$ P(AP \mid A) = \frac{0.4\times0.3}{0.4\times0.3 + 0.2\times 0.7} = \frac{0.12}{0.26} \approx 0.4615 $$
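
The Bayes' theorem update can be written as a short vectorised calculation. This is a sketch with illustrative names (`prior`, `likelihood`, `posterior`): the prior is multiplied by the likelihood of an accident in each class, then normalised by the total.

```{r}
# Prior class probabilities and accident probabilities within each class
prior      <- c(prone = 0.3, not_prone = 0.7)
likelihood <- c(prone = 0.4, not_prone = 0.2)

# Joint probability of (class, accident), then normalise to get the posterior
joint     <- prior * likelihood
posterior <- joint / sum(joint)
posterior
```

The `prone` entry is the probability asked for, and the two entries sum to 1 by construction.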
## 4. Investigate the rivers data. The U.S. Geological Survey recorded the lengths (in miles) of several rivers in North America. They are stored in the vector rivers in the datasets package (which ships with base R). Type ?rivers in the R console. Plot its histogram and boxplot. Specify the median and 75% quantile and identify whether there are outliers. Include your R code as well.
```{r}
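data("rivers")
# Histogram of river lengths (miles); bins set explicitly to avoid the default-bins message
ggplot(data.frame(rivers), aes(x = rivers)) + geom_histogram(bins = 30)
# Boxplot of river lengths; points beyond the whiskers are drawn individually
ggplot(data.frame(rivers), aes(y = rivers)) + geom_boxplot()
# Summary statistics: the median and the 75% quantile (3rd Qu.) are read from here
summary(rivers)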
```
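The median of the data is 425 miles. The 75% quantile is 680 miles and the maximum is 3710 miles. There are clearly outliers on the high end, as shown by the long right tail of the histogram and the points above the upper whisker of the boxplot.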
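
To identify the outliers explicitly rather than by eye, one common convention is the 1.5 × IQR rule, which is also the default whisker rule in `geom_boxplot()`. The sketch below applies it to `rivers`; the variable names are illustrative, and other outlier definitions would give different sets.

```{r}
# Quartiles of the river lengths (miles)
q   <- quantile(rivers, c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Fences of the 1.5 * IQR rule
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Observations outside the fences are flagged as outliers
outliers <- rivers[rivers < lower_fence | rivers > upper_fence]
length(outliers)
sort(outliers)
```

Every flagged value lies above the upper fence, consistent with the right-skewed shape of the histogram.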