Question

Student note: I already have the code and the answers. You just need to seed the code with my ID, run it, and give the output back.

SAS Output (Page 1 of 91, file:///C:/Users/local_MVENISHE/Temp/SAS%20Temporary%20Files/_TD28732_IS-SH..., 4/30/2023)

The SAS System

The CONTENTS Procedure

  Data Set Name          WORK.AIRBNB0
  Member Type            DATA
  Engine                 V9
  Created                04/30/2023 22:55:34
  Last Modified          04/30/2023 22:55:34
  Protection
  Data Set Type
  Label
  Data Representation    WINDOWS_64
  Encoding               wlatin1 Western (Windows)
  Observations           5000
  Variables              12
  Indexes                0
  Observation Length     96
  Deleted Observations   0
  Compressed             NO
  Sorted                 NO

Engine/Host Dependent Information

  Data Set Page Size           65536
  Number of Data Set Pages     8
  First Data Page              1
  Max Obs per Page             681
  Obs in First Data Page       657
  Number of Data Set Repairs   0
  ExtendObsCounter             YES
  Filename                     C:\Users\LOC55A~1\Temp\SAS Temporary Files\_TD28732_IS-SHPRD-4_\airbnb0.sas7bdat
  Release Created              9.0401M7
  Host Created                 X64_SRV19
  Owner Name                   DPU-AADDS\MVENISHE
  File Size                    576KB
  File Size (bytes)            589824

Alphabetic List of Variables and Attributes

   #   Variable               Type   Len   Format    Informat
   1   Listing Month          Num    8     BEST12.   BEST32.
  12   PricePerNight          Num    8     BEST12.   BEST32.
   3   accommodates           Num    8     BEST12.   BEST32.
   4   bathrooms              Num    8     BEST12.   BEST32.
   5   bedrooms               Num    8     BEST12.   BEST32.
   6   beds                   Num    8     BEST12.   BEST32.
   7   guests_included        Num    8     BEST12.   BEST32.
   2   host_total_listings    Num    8     BEST12.   BEST32.
   8   minimum_nights         Num    8     BEST12.   BEST32.
   9   number_of_reviews      Num    8     BEST12.   BEST32.
  10   review_scores_rating   Num    8     BEST12.   BEST32.
  11   reviews_per_month      Num    8     BEST12.   BEST32.

The SAS System

The SURVEYSELECT Procedure

  Selection Method        Simple Random Sampling
  Input Data Set          AIRBNB0
  Random Number Seed      2129790
  Sample Size             2000
  Selection Probability   0.4
  Sampling Weight         2.5
  Output Data Set         AIRBNB1
The SAS System

The MEANS Procedure

  Variable                  N          Mean       Std Dev      Minimum        Maximum
  Listing Month          2000     4.3483000     2.2512932    0.3000000     11.6000000
  host_total_listings    2000    53.3640000   200.9729822            0        1283.00
  accommodates           2000     4.8090000     3.2130485    1.0000000     32.0000000
  bathrooms              2000     1.4142500     0.7665284            0     11.0000000
  bedrooms               2000     1.8065000     1.2461277            0     12.0000000
  beds                   2000     2.4745000     2.0867503            0     32.0000000
  guests_included        2000     2.5795000     2.1042561    1.0000000     16.0000000
  minimum_nights         2000     3.7160000    11.4773346    1.0000000    365.0000000
  number_of_reviews      2000    51.6795000    63.3991286    1.0000000    583.0000000
  review_scores_rating   2000    95.3960000     6.0902159   20.0000000    100.0000000
  reviews_per_month      2000     2.3512300     1.9085478    0.0200000     12.5500000
  PricePerNight          2000   147.8865000   126.3896150    6.0000000    953.0000000
  hostclass              1888     1.6912076     0.7782735    1.0000000      3.0000000

The SAS System

The UNIVARIATE Procedure
Variable: Listing Month

Moments

  N                  2000          Sum Weights        2000
  Mean               4.3483        Sum Observations   8696.6
  Std Deviation      2.25129324    Variance           5.06832127
  Skewness           0.34387476    Kurtosis           -0.2967753
  Uncorrected SS     47947         Corrected SS       10131.5742
  Coeff Variation    51.7741012    Std Error Mean     0.05034045

Basic Statistical Measures

  Location             Variability
  Mean     4.348300    Std Deviation          2.25129
  Median   4.300000    Variance               5.06832
  Mode     6.100000    Range                 11.30000
                       Interquartile Range    3.70000

Tests for Location: Mu0=0

  Test          Statistic        p Value
  Student's t   t   86.37786     Pr > |t|    <.0001
  Sign          M   1000         Pr >= |M|   <.0001
  Signed Rank   S   1000500      Pr >= |S|   <.0001

Quantiles (Definition 5)

  Level         Quantile
  100% Max      11.60
  99%           10.25
  95%            8.10
  90%            7.20
  75% Q3         6.10
  50% Median     4.30
  25% Q1         2.40
  10%            1.50
  5%             1.00
  1%             0.50
  0% Min         0.30

Extreme Observations: Lowest, Highest

ECO520 Homework 5
Regression Analysis on Airbnb Price in Chicago

I. Airbnb Price in Chicago (Sample Data)

Let's work on the Airbnb price in Chicago. Here are the selected variables:

● Listing Month: The number of months since listing
● host_total_listings: The total number of listings by the host
● accommodates: Maximum number of people to stay
● bathrooms: The number of bathrooms
● bedrooms: The number of bedrooms
● beds: The number of beds
● guests_included: The number of guests included in the price
● minimum_nights: Minimum nights per rent
● number_of_reviews: Total number of reviews for the rent unit
● review_scores_rating: The average score of the rating for the rent unit
● reviews_per_month: The number of reviews per month
● PricePerNight: Price per night

Here is the SAS code to load the data:

    filename webdat url "https://bigblue.depaul.edu/jlee141/econdata/eco520/airbnb2019.csv";

    /* Import Chicago Community data */
    PROC IMPORT OUT= airbnb0 DATAFILE= webdat DBMS=CSV REPLACE;
    RUN;
    proc contents; run;

    /* Create your own random sample data. Make sure to type your student ID as the seed number.
       Replace your_depaul_id with your student id (only numbers) */
    proc surveyselect data= airbnb0 method=srs seed = your_depaul_id n = 2000 out=airbnb1;
    run;

    /* The following code will create the class of host */
    data airbnb2; set airbnb1;
    if 0 < host_total_listings < 3 then hostclass = 1;
    else if 3 <= host_total_listings < 20 then hostclass = 2;
    else if host_total_listings >= 20 then hostclass = 3;
    /* More variables you would create */
    run;
    proc means; run;

1. In the airbnb2 data step, add the following new variables:
   1) the most popular hosts, who have more than 65 reviews, as popular_host.
   2) big family units that accommodate more than 8 people as big_unit.
   3) long-term rent units that have more than 7 days as minimum nights as longterm.

2. Find any outliers or missing cases on all variables. If necessary, remove the outliers or any missing cases. Show your work in SAS and explain it.

3. Use scatter plots to find variables that potentially have a nonlinear relationship with price.
   Create the square of rooms, the square of beds, and the square of bathrooms. If necessary, create some squared or logarithmic variables to analyze the potential nonlinear relationships.

4. Machine Learning using Regression Analysis: Let's consider creating regression models using a training data set, saving the estimated models, and predicting the prices using the rest of the data as testing data. (Use the example we covered in the PowerPoint slides.) Make sure to include all class and dummy variables you created in 1.

   1) Split the airbnb2 data into 70% training data and 30% testing (validating) data with a seed number of 55555. Estimate regression models with PricePerNight as the dependent variable, using only the training data, with the following options:
      1. Your own best model
      2. Adjusted R square
      3. Stepwise

   2) Perform the out-of-sample prediction using the observations that were not used to estimate the regression models. Find the following statistics and compare the results. Which model is the best in terms of the following statistics?
      1. MSE (mean square error)
      2. RMSE (root mean square error)
      3. MPE (mean percentage error)
      4. MAE (mean absolute error)

All questions need to be typed, with appropriate graphs and tables from SAS, in a PDF file. Submit your SAS code as a separate text file.
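For questions 1 and 3, the extra variables could be added in the same data step that builds hostclass. The following is only a sketch under stated assumptions, not the instructor's solution: it reads "more than 65 reviews" as number_of_reviews > 65, treats "square of rooms" as the square of bedrooms, and the names bedrooms2, beds2, bathrooms2, and ln_price are made up for illustration.

```sas
/* Sketch only: assumes airbnb1 is the PROC SURVEYSELECT sample from the assignment. */
data airbnb2;
    set airbnb1;

    /* Host class as in the assignment. Hosts with 0 (or missing) listings match
       none of the conditions, so hostclass stays missing for them. */
    if 0 < host_total_listings < 3 then hostclass = 1;
    else if 3 <= host_total_listings < 20 then hostclass = 2;
    else if host_total_listings >= 20 then hostclass = 3;

    /* Question 1: 0/1 indicators. A comparison in SAS evaluates to 1 (true) or 0 (false). */
    popular_host = (number_of_reviews > 65);
    big_unit     = (accommodates > 8);
    longterm     = (minimum_nights > 7);

    /* Question 3: squared terms (hypothetical names; "rooms" assumed to mean bedrooms). */
    bedrooms2  = bedrooms**2;
    beds2      = beds**2;
    bathrooms2 = bathrooms**2;

    /* Log price, computed only for positive prices so LOG never sees 0 or less. */
    if PricePerNight > 0 then ln_price = log(PricePerNight);
run;

/* Quick check of the new variables. */
proc means data=airbnb2 n nmiss mean min max;
    var hostclass popular_host big_unit longterm bedrooms2 beds2 bathrooms2 ln_price;
run;
```

The unmatched hosts explain why the MEANS output shows N = 1888 for hostclass while every other variable has N = 2000. The NMISS column from the last step also gives a first look at the missing cases asked about in question 2.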

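For question 4, one possible way to organize the 70/30 split and the out-of-sample statistics is sketched below. It is an illustration under assumptions, not the method from the course slides: the regressor list is a placeholder rather than a "best model", the data set names airbnb_split, airbnb_fit, pred, errors, and stats are hypothetical, airbnb2 is assumed to already contain popular_host, big_unit, and longterm, and MPE is taken to be the mean of (actual - predicted) / actual.

```sas
/* 70/30 split with the seed from the assignment. OUTALL keeps every row and
   adds Selected = 1 (training) or Selected = 0 (testing). */
proc surveyselect data=airbnb2 method=srs samprate=0.70 seed=55555
                  out=airbnb_split outall;
run;

/* Blank out the price on testing rows: PROC REG then estimates on training
   rows only but still writes predicted values for every row. */
data airbnb_fit;
    set airbnb_split;
    if Selected = 1 then y_train = PricePerNight;
    else y_train = .;
run;

/* Placeholder regressors; swap in your own candidates. */
proc reg data=airbnb_fit;
    model y_train = accommodates bathrooms bedrooms beds guests_included
                    popular_host big_unit longterm
                    / selection=stepwise;
    output out=pred p=yhat;
run;
quit;

/* Prediction errors on the testing rows only. */
data errors;
    set pred;
    where Selected = 0 and yhat is not missing;
    err     = PricePerNight - yhat;
    sq_err  = err**2;
    abs_err = abs(err);
    pct_err = err / PricePerNight;   /* assumed MPE definition, as a proportion */
run;

proc means data=errors noprint;
    var sq_err abs_err pct_err;
    output out=stats mean(sq_err)=MSE mean(abs_err)=MAE mean(pct_err)=MPE;
run;

data stats;
    set stats;
    RMSE = sqrt(MSE);
run;

proc print data=stats;
    var MSE RMSE MPE MAE;
run;
```

Repeating the PROC REG step with a different MODEL statement for each candidate model, and keeping the rest unchanged, gives one row of statistics per model to compare. The numbers depend on the student ID used as the sampling seed, so none are shown here.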
Fig: 1