- 6th Apr 2022
- 06:03 am

* Importing data file

use "/Users/**/Downloads/1590159615_gpa1.dta"

* Use the describe command to list all variables with their storage types and labels

describe

* Further information about each variable can be obtained with the codebook command

codebook

* Generate lcolGPA, the natural log of colGPA

gen lcolGPA = log(colGPA)

* Generate lage, the natural log of age

gen lage = log(age)

* Average high school GPA

mean hsGPA

* Maximum age (reported in the Max column of the summarize output)

summarize age

* So the maximum age is 30
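
* A minimal sketch: summarize stores its results in r(), so the maximum can be displayed directly, assuming summarize age is still the most recent r-class command

display r(max)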

* Multiple linear regression of lcolGPA on the required variables

regress lcolGPA lage hsGPA ACT male skipped

* The table in the Results window shows that the standard error of skipped is 0.0086627 and the standard error of male is 0.019568
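
* As a sketch, the same standard errors can be displayed from the stored estimates, assuming the regression above is still the most recent estimation command

display _se[skipped]
display _se[male]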

* The results show that the value of R-squared is 0.2355

* The number of observations is 141
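
* A short sketch: regress stores R-squared in e(r2) and the number of observations in e(N), so both can be displayed without reading them off the table (this assumes the regression above is still the active estimation)

display e(r2)
display e(N)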

* The output shows that the 95% confidence interval for male is (-0.0323534, 0.0450454)
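
* As a check, the interval can be rebuilt by hand as coefficient +/- critical t value * standard error; this is a sketch that assumes the regression above is still the active estimation, with e(df_r) holding the residual degrees of freedom

display _b[male] - invttail(e(df_r), 0.025)*_se[male]
display _b[male] + invttail(e(df_r), 0.025)*_se[male]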

* Next we need to find which variables are significant at the 1% level of significance.

* The decision rule is: if the p-value for a variable is less than 0.01, the variable is significant at the 1% level; otherwise it is not.

* From the table, only hsGPA has a p-value below 0.01, so only hsGPA is significant at the 1% level of significance.
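
* A minimal sketch of the decision rule: recompute each two-sided p-value from the stored coefficients and standard errors and compare it with 0.01 (assumes the regression above is still the active estimation)

foreach v in lage hsGPA ACT male skipped {
    display "`v': p = " %6.4f 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
}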

* Similarly, if the p-value for a variable is less than 0.2, the variable is significant at the 20% level of significance.

* So hsGPA and skipped are the only variables that are significant at the alpha = 20% level of significance.
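
* One possible cross-check, offered as a sketch: replaying the regression with 80% confidence intervals gives the same answer, because a variable is significant at the 20% level exactly when its 80% interval excludes zero

regress, level(80)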

* Based on these results the model fits poorly: R-squared is only 0.2355 and only hsGPA is significant at the 1% level. The model could be improved by removing insignificant variables and adding more important ones. Here we have learned how to manipulate data and run a multiple linear regression in Stata.
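
* A hedged sketch of one way to follow up, not a definitive specification: first test whether lage, ACT and male are jointly insignificant, then refit with only the variables that were significant at the 20% level and inspect the adjusted R-squared (the choice of which variables to keep is an assumption for illustration)

test lage ACT male

regress lcolGPA hsGPA skipped

display e(r2_a)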