* 6th Apr 2022, 06:03 am
* Import the data file (the clear option discards any data already in memory)
use "/Users/**/Downloads/1590159615_gpa1.dta", clear
* Use the describe command to list all variables with their storage types and labels
describe
* More detailed information about each variable (range, unique values, missing values) can be obtained with the codebook command
codebook
* Create lcolGPA, the natural log of colGPA
gen lcolGPA = log(colGPA)
* Create lage, the natural log of age
gen lage = log(age)
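* Optional sanity check (a sketch, not part of the original analysis): log() returns
* missing for zero or negative values, so confirm that the source variables are
* strictly positive wherever they are non-missing before relying on the logged versions.
assert colGPA > 0 if !missing(colGPA)
assert age > 0 if !missing(age)
* Labelling the new variables makes later output easier to read.
label variable lcolGPA "log of colGPA"
label variable lage "log of age"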
* Average high-school GPA
mean hsGPA
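* mean reports the estimate together with its standard error and 95% CI. If only the
* number itself is needed, a sketch: summarize stores it in r(mean), which can be shown directly.
quietly summarize hsGPA
display "Average high-school GPA: " %5.3f r(mean)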
* Maximum age (reported in the Max column of the summarize output)
summarize age
* So the maximum age is 30
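* Rather than reading the maximum off the table, it can also be taken from r(max),
* which summarize stores after it runs (a minimal sketch).
quietly summarize age
display "Maximum age: " r(max)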
* Multiple linear regression of lcolGPA on lage, hsGPA, ACT, male and skipped
regress lcolGPA lage hsGPA ACT male skipped
* In the Results window, the standard error of skipped is 0.0086627 and the standard error of male is 0.019568
* The R-squared of the regression is 0.2355
* The number of observations is 141
* The 95% confidence interval for the coefficient on male is (-0.0323534, 0.0450454)
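* The figures quoted above are also stored by regress and can be displayed directly.
* A sketch: these lines must run while the regression above is the active estimate.
display "SE of skipped: " _se[skipped]
display "SE of male:    " _se[male]
display "R-squared:     " e(r2)
display "Observations:  " e(N)
* 95% confidence interval for male, rebuilt as b +/- t(0.975, df_r) * se
local tcrit = invttail(e(df_r), 0.025)
display "95% CI for male: (" _b[male] - `tcrit'*_se[male] ", " _b[male] + `tcrit'*_se[male] ")"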
* Next, find which variables are significant at the 1% level of significance.
* Decision rule: a variable is significant at the 1% level if its p-value (column P>|t|) is less than 0.01; otherwise it is not.
* From the table, only hsGPA has a p-value below 0.01, so hsGPA is the only variable significant at the 1% level.
* Similarly, a variable is significant at the 20% level if its p-value is less than 0.20.
* So hsGPA and skipped are the only variables significant at the alpha = 20% level.
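* The 1% decision rule can also be applied programmatically (a sketch): each two-sided
* p-value is recomputed from the t statistic b/se with e(df_r) residual degrees of
* freedom and compared with 0.01. It assumes the regression above is still active.
foreach v in lage hsGPA ACT male skipped {
    local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))
    display "`v'" _col(12) %6.4f `p' _col(22) cond(`p' < 0.01, "significant at 1%", "not significant at 1%")
}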
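* Equivalently (a sketch), a coefficient is significant at the 20% level exactly when its
* 80% confidence interval excludes zero. Replaying the stored regression with level(80)
* shows those intervals without re-estimating the model.
regress, level(80)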
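* Overall, the model explains only about 23.5% of the variation in lcolGPA (R-squared = 0.2355), and only two of the five regressors are significant even at the 20% level, so its explanatory power is weak. It could be improved by removing insignificant variables and adding more important ones.
* This exercise showed how to manipulate data and run a multiple linear regression in Stata.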
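* One possible follow-up to the suggestion above (a sketch, not the original analysis):
* jointly test whether the coefficients not significant at 20% (lage, ACT, male) are zero
* while the full regression is still the active estimate; if the test does not reject,
* a reduced model could be estimated with the remaining regressors.
test lage ACT male
regress lcolGPA hsGPA skipped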