Student note: Just add my ID and run the code. I need the SAS code file in TXT format. My ID is 2128942.

ECO520 Homework 4
Regression Analysis Review on Median House Price by County

Use the following American Community Survey data for the MSAs (Metropolitan Statistical Areas) in the U.S.:

TITLE "American Community Survey Data (2014): Counties in U.S." ;
filename webdat url "https://bigblue.depaul.edu/jlee141/econdata/eco520/acs_msa_2014.csv";
PROC IMPORT OUT= acs_msa DATAFILE= webdat DBMS=CSV REPLACE;
RUN;
proc contents ; run ;

/* Create your own random sample data. Make sure to type your student ID as the seed number.
   Replace your_depaul_id with your student ID (numbers only). */
proc surveyselect data= acs_msa method=srs seed = your_depaul_id n = 100 out= msasample ;
run;

/* Create regression data in $1000 */
Data regdata;
  set msasample ;
  Y = MHPRICE / 1000 ;
  X = MHINCOME / 1000 ;
RUN ;

1. Simple Regression Model (12 points: 1)-3) 1 point each, 4) 5 points, 6), 7) 2 points each)

Let's investigate the relationship between the median house price in $1,000 (Y) and the median household income in $1,000 (X) using SAS.

1) Find the correlation coefficient and covariance between X and Y.
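Below is a minimal sketch of SAS steps that would produce the output for Questions 1 and 2. It assumes the regdata data set created by the DATA step above. PROC CORR with the COV option prints the covariance matrix alongside the Pearson correlations. The REG statement of PROC SGPLOT draws the scatter plot with a fitted least-squares line. The title text is an assumption chosen to match the plot heading; it is not code taken from the assignment.

```sas
/* Q1: simple statistics, covariance matrix, and Pearson correlation.
   COV requests the covariance matrix in addition to the correlations. */
proc corr data=regdata cov;
  var Y X;
run;

/* Q2: scatter plot of Y against X with the fitted regression line.
   TITLE2 leaves the global TITLE from the top of the program in place;
   the title text itself is an assumed label. */
title2 "Scatter Plot with Regression Line";
proc sgplot data=regdata;
  reg x=X y=Y;
run;
title2;  /* cancel TITLE2 so it does not appear on later output */
```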
The CORR Procedure

2 Variables: Y X

Simple Statistics
Variable    N       Mean    Std Dev      Sum    Minimum    Maximum
Y         100  174.03100   81.05497    17403   79.40000  590.60000
X         100   50.31738    8.47661     5032   34.80100   75.68200

Covariance Matrix, DF = 99
                Y             X
Y     6569.907817    464.919250
X      464.919250     71.852860

Pearson Correlation Coefficients, N = 100
Prob > |r| under H0: Rho=0
                Y             X
Y         1.00000       0.67667
                         <.0001
X         0.67667       1.00000
           <.0001

The correlation coefficient between X and Y is 0.67667 (p < .0001), and the covariance between X and Y is 464.919250.

2) Scatter plot between X and Y with regression line

[Figure: Scatter Plot with Regression Line, Y (vertical axis) against X (horizontal axis)]

3) Perform the following regression analysis using the SAS code:

/* Simple Regression: Y = b0 + b1*X + e */
proc reg data= regdata ;
  model Y = X ;
run ;

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read    100
Number of Observations Used    100

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1           297815        297815     82.77   <.0001
Error             98           352606    3598.02176
Corrected Total   99           650421

Root MSE          59.98351    R-Square   0.4579
Dependent Mean   174.03100    Adj R-Sq   0.4523
Coeff Var         34.46714

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -151.54433         36.28500     -4.18     <.0001
X            1              6.47043          0.71120      9.10     <.0001

[Figure: Fit Diagnostics for Y (residual, RStudent, leverage, Q-Q, Cook's D, and residual histogram panels). Inset: Observations 100, Parameters 2, Error DF 98, MSE 3598, R-Square 0.4579, Adj R-Square 0.4523]
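As a check, the summary statistics from Question 1 reproduce the regression estimates from Question 3 through the standard simple-regression identities. The values below are computed from the rounded numbers printed in the output, so the last digit can differ slightly from SAS.

```latex
\begin{aligned}
r   &= \frac{s_{XY}}{s_X\, s_Y} = \frac{464.919250}{8.47661 \times 81.05497} \approx 0.67667 \\
b_1 &= \frac{s_{XY}}{s_X^2} = \frac{464.919250}{71.852860} \approx 6.47043 \\
b_0 &= \bar{Y} - b_1 \bar{X} = 174.03100 - 6.47043 \times 50.31738 \approx -151.544
\end{aligned}
```

Because both Y and X are measured in $1,000, the slope means that each additional $1,000 of median household income is associated with a median house price about $6,470 higher.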
[Figures: Residuals for Y (residuals plotted against X) and Fit Plot for Y (fitted line with 95% confidence limits and 95% prediction limits)]

4) Find or calculate the following statistics using the regression output and explain their meanings.

a. Total Sum of Squares (SST)
The SST is 650421. SST is the total variation of the dependent variable Y around its mean. It is the sum of SSR and SSE (297815 + 352606 = 650421).

b. Regression Sum of Squares (SSR)
The SSR is 297815. SSR is the part of the variation in Y that is explained by the regression model, that is, by X.

c. Error Sum of Squares (SSE)
The SSE is 352606. SSE is the part of the variation in Y that is not explained by the regression model.

d. Variance and Standard Deviation of Y
The variance of Y is 6569.91, taken from the covariance matrix in Question 1. It measures the spread of Y around its mean, in squared units. The standard deviation is 81.05, which is the square root of the variance. It is roughly the typical amount by which Y deviates from its mean of 174.03, in $1000.

e. Variance and Standard Deviation of the Error (e)
The variance of e is estimated by the mean squared error, MSE = 3598.02. It measures how much the actual values of Y differ from the values predicted by the regression model, in squared units. The standard deviation of e is the Root MSE, 59.98 ($1000). It is the typical size of the gap between the actual and predicted values of Y.

f. R-Square and Adjusted R-Square
R-square is 0.4579, so about 45.8% of the variation in Y (median house price) is explained by X (median household income). R-square ranges from 0 to 1. The higher it is, the better the model fits the data. The adjusted R-square is 0.4523. It adjusts R-square for the number of explanatory variables. As a result, it increases only when a newly added variable improves the fit enough to offset the degree of freedom it uses up.

g. Standard Error and Variance of b0
The standard error of b0 is 36.28500, and its variance is 36.28500² ≈ 1316.60. The standard error indicates the uncertainty in the estimated intercept b0. The variance describes how much b0 would vary across repeated samples.
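The figures in items a through g can be cross-checked against each other with the usual simple-regression identities. Here n = 100 observations and k = 1 explanatory variable. The values are computed from the rounded numbers printed in the output.

```latex
\begin{aligned}
SST &= SSR + SSE = 297815 + 352606 = 650421 \\
SST &= (n-1)\,s_Y^2 = 99 \times 6569.907817 \approx 650421 \\
s_Y &= \sqrt{6569.907817} \approx 81.05 \\
R^2 &= \frac{SSR}{SST} = \frac{297815}{650421} \approx 0.4579 \approx r^2 = 0.67667^2 \\
\bar{R}^2 &= 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = 1 - \frac{3598.02}{6569.91} \approx 0.4523 \\
MSE &= \frac{SSE}{n-k-1} = \frac{352606}{98} \approx 3598.02, \qquad \sqrt{MSE} \approx 59.98 \\
\widehat{\operatorname{Var}}(b_0) &= MSE\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)\,s_X^2}\right)
  = 3598.02\left(0.01 + \frac{50.31738^2}{99 \times 71.852860}\right) \approx 1316.60 = 36.28500^2
\end{aligned}
```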
h. Standard Error and Variance of b1
The standard error of b1 is 0.71120, and its variance is 0.71120² ≈ 0.5058. The standard error of b1 measures the precision of the slope coefficient estimated by the regression model. The variance indicates how much the estimated slope would change from sample to sample.

i. t Statistic and P-value of b1
The t statistic of b1 is 9.10, and its p-value is <.0001. The t value is the number of standard errors by which the estimated slope lies away from zero (6.47043 / 0.71120 ≈ 9.10). The p-value is the probability of obtaining a t statistic at least this far from zero if the true slope were 0. A p-value this small is strong evidence against that assumption.

5) Perform the following hypothesis test using the output.
H0: β1 = 0, Ha: β1 ≠ 0
For X, the t statistic = 9.10 and p < .0001. Because p < 0.05, we reject the null hypothesis at the 0.05 significance level. Equivalently, |t| = 9.10 is far larger than the critical value of about 1.98 with 98 degrees of freedom. We conclude that there is a statistically significant relationship between X and Y.
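The quantities used in items h and i and in the test in Question 5 can be reproduced from the printed output with the standard simple-regression formulas. The values are computed from rounded output, so the last digit may differ slightly from SAS. The critical value is the two-sided 5% point of the t distribution with n − 2 = 98 degrees of freedom.

```latex
\begin{aligned}
\widehat{\operatorname{Var}}(b_1) &= \frac{MSE}{(n-1)\,s_X^2} = \frac{3598.02}{99 \times 71.852860} \approx 0.5058 = 0.71120^2 \\
t &= \frac{b_1 - 0}{SE(b_1)} = \frac{6.47043}{0.71120} \approx 9.10 \\
F &= t^2 \approx 9.098^2 \approx 82.77 \\
|t| = 9.10 &> t_{0.025,\,98} \approx 1.98 \;\Rightarrow\; \text{reject } H_0: \beta_1 = 0 \text{ at } \alpha = 0.05
\end{aligned}
```

In simple regression the F test in the Analysis of Variance table and the t test on b1 are the same test. This is why F = 82.77 equals t² and both have p < .0001.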