Solved: ECO520 HOMEWORK WEEK3 Clustering Analysis on Mortgage Loan Approval T


Question


Found 8 similar results for your question:

3. The coating experiment, described in Exercise 7 of Chapter 7, studied the effect of different spray parameters on thermal spray coating properties. In the experiment, the authors attempted to produce high-quality alumina (Al2O3) coatings by controlling the fuel ratio (factor A at 1:2.8 and 1:2.0), carrier gas flow rate (factor B at 1.33 and 3.21 L s⁻¹), frequency of detonations (factor C at 2 and 4 Hz), and spray distance (factor D at 180 and 220 mm). To quantify the quality of the coating, the researchers measured multiple response variables. In this example we will examine the porosity (vol. %). The data are shown in the table below and can be downloaded from http://deanvossdraguljic.ietsandbox.net/DeanVossDraguljic/SAS-data.html. A B C D Yijkl A B C D 2 2 2 2 5.95 2 2 2 1 2 2 1 2 1 4.57 4.03 2.17 1 2 2 2 1 2 2 2 2 1 1 1 2 1 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 1 2 1 2 Yijkl 12.28 9.57 6.73 6.07 8.49 4.92 6.95 5.31 1 2 2 3.43 2 1 2 1 1.02 2 1 1 2 4.25 2 1 1 1 2.13 1 1 1 1 (a) [3pts] Run a model with ALL main effects and two-way interaction effects. Write down the SAS code and copy the ANOVA table from SAS. (b) [2pts] The 95% confidence interval for the difference between the two main effects of B is (____). (c) [2pts] Do you believe there are significant interaction effects between B and C? Answer yes or no. (d) [3pts] Generate an interaction plot for the B*C interaction. State how the plot supports your answer in (c). [Hint: use lsmeans B*C to get the least squares estimates, then plot them using any software of your choice.] (e) [3pts] The 90% confidence upper limit for the error variance σ² is ____. Show your work.

2. Consider the reaction time experiment described in Exercise 4 of Chapter 4. a. [3pts] Write down the two-way complete model for the experiment. Remember to explain each term in the model. b. [3pts] Find the sums of squares that are accounted for by the factors and their interaction, i.e., ssA, ssB and ssAB. c. [5pts] Generate an interaction plot. Do you see an obvious interaction between the two factors? Carry out a formal hypothesis test for the interaction. d. [3pts] Test the hypothesis that different elapsed times have the same effect on the reaction time. e. [3pts] Find a 95% confidence interval for the difference between the average reaction time from the auditory cue and the average reaction time from the visual cue. [Hint: this compares the two main effects of the factor cue.] f. [3pts] Find an appropriate confidence interval for the difference between the auditory cue and the visual cue when the elapsed time is 5 seconds.

2. An experiment is to be run with two factors: factor A with 2 levels and factor B with 4 levels. The experimenter would like to examine the pairwise differences between the four levels of factor B, with a simultaneous confidence level of 90%. The experimenter is confident that the two factors do not interact and will employ a two-way main-effects model. Furthermore, the experimenter believes that the mse is unlikely to exceed 25. Find the sample size required for the 90% simultaneous confidence intervals for the pairwise comparisons of the main effects of B to have a width of at most 10. (a) [2pts, no partial credit] The required sample size is at least ____. (b) [2pts] Provide the SAS code.

1. Consider the two-way main-effects model with two factors: Yijt = μ + αi + βj + εijt, i = 1, 2, 3; j = 1, 2, 3; t = 1, 2, …, r. Answer True or False to the following statements. (a) [1pt] μ + α₁ + β₂ is estimable. (b) [1pt] μ + α₁ + (β₁ + β₂) is estimable. (c) [1pt] β₁ (β₂ + β₃) is estimable. (d) [1pt] β₁ (β₂ + β₃)/2 is estimable.

1. In a completely randomized design, there are two factors, A with two levels and B with three levels. Suppose the 6 treatment means are μ11 = 6, μ12 = 10, μ13 = 8, μ21 = 5, μ22 = 5, μ23 = 5. Note that these are the true treatment means and are assumed to be known. Answer the following questions. a. [3pts] Are there interaction effects? Why? [Hint: Use the definition of interactions.] b. [3pts] Find μ, αi, βj and (αβ)ij, i = 1, 2, j = 1, 2, 3, such that μij = μ + αi + βj + (αβ)ij.
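The decomposition asked for in part b can be illustrated with a small Python sketch (Python rather than SAS, purely to show the arithmetic). It assumes sum-to-zero constraints on the effects, which is only one common convention; a different constraint set gives different numbers. The 2x3 table of means is hypothetical and is not the table from the question.

```python
def decompose_cell_means(mu):
    """Split cell means into grand mean, main effects, and interactions.

    Assumes sum-to-zero constraints (an assumption, not stated in the question):
    the alphas, the betas, and every row/column of interactions sum to zero.
    """
    a, b = len(mu), len(mu[0])
    grand = sum(sum(row) for row in mu) / (a * b)
    alpha = [sum(row) / b - grand for row in mu]
    beta = [sum(mu[i][j] for i in range(a)) / a - grand for j in range(b)]
    inter = [[mu[i][j] - grand - alpha[i] - beta[j] for j in range(b)]
             for i in range(a)]
    return grand, alpha, beta, inter

# Hypothetical true treatment means for a 2x3 design (illustration only).
means = [[4.0, 8.0, 6.0],
         [2.0, 2.0, 2.0]]

grand, alpha, beta, inter = decompose_cell_means(means)

# Interaction is present exactly when some (alpha*beta)_ij is nonzero,
# i.e. when the row profiles are not parallel.
has_interaction = any(abs(v) > 1e-12 for row in inter for v in row)
```

Under these constraints, parallel row profiles give interaction terms that are all zero, which is the same check an interaction plot makes visually.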

Student note: I have the code and the answers already. You just need to seed it with my ID, run the code with my ID, and give it back.

SAS Output

The SAS System
The CONTENTS Procedure

Data Set Name          WORK.AIRBNB0
Member Type            DATA
Engine                 V9
Created                04/30/2023 22:55:34
Last Modified          04/30/2023 22:55:34
Observations           5000
Variables              12
Indexes                0
Observation Length     96
Deleted Observations   0
Compressed             NO
Sorted                 NO
Data Representation    WINDOWS_64
Encoding               wlatin1 Western (Windows)

Engine/Host Dependent Information
Data Set Page Size            65536
Number of Data Set Pages      8
First Data Page               1
Max Obs per Page              681
Obs in First Data Page        657
Number of Data Set Repairs    0
ExtendObsCounter              YES
Release Created               9.0401M7
Host Created                  X64_SRV19
File Size                     576KB
File Size (bytes)             589824

Alphabetic List of Variables and Attributes
#    Variable               Type   Len   Format    Informat
1    Listing Month          Num    8     BEST12.   BEST32.
12   PricePerNight          Num    8     BEST12.   BEST32.
3    accommodates           Num    8     BEST12.   BEST32.
4    bathrooms              Num    8     BEST12.   BEST32.
5    bedrooms               Num    8     BEST12.   BEST32.
6    beds                   Num    8     BEST12.   BEST32.
7    guests_included        Num    8     BEST12.   BEST32.
2    host_total_listings    Num    8     BEST12.   BEST32.
8    minimum_nights         Num    8     BEST12.   BEST32.
9    number_of_reviews      Num    8     BEST12.   BEST32.
10   review_scores_rating   Num    8     BEST12.   BEST32.
11   reviews_per_month      Num    8     BEST12.   BEST32.

Page 1 of 91 file:///C:/Users/local_MVENISHE/Temp/SAS%20Temporary%20Files/_TD28732_IS-SH... 4/30/2023

The SURVEYSELECT Procedure
Selection Method        Simple Random Sampling
Input Data Set          AIRBNB0
Random Number Seed      2129790
Sample Size             2000
Selection Probability   0.4
Sampling Weight         2.5
Output Data Set         AIRBNB1

The MEANS Procedure
Variable               N      Mean          Std Dev       Minimum      Maximum
Listing Month          2000   4.3483000     2.2512932     0.3000000    11.6000000
host_total_listings    2000   53.3640000    200.9729822   0            1283.00
accommodates           2000   4.8090000     3.2130485     1.0000000    32.0000000
bathrooms              2000   1.4142500     0.7665284     0            11.0000000
bedrooms               2000   1.8065000     1.2461277     0            12.0000000
beds                   2000   2.4745000     2.0867503     0            32.0000000
guests_included        2000   2.5795000     2.1042561     1.0000000    16.0000000
minimum_nights         2000   3.7160000     11.4773346    1.0000000    365.0000000
number_of_reviews      2000   51.6795000    63.3991286    1.0000000    583.0000000
review_scores_rating   2000   95.3960000    6.0902159     20.0000000   100.0000000
reviews_per_month      2000   2.3512300     1.9085478     0.0200000    12.5500000
PricePerNight          2000   147.8865000   126.3896150   6.0000000    953.0000000
hostclass              1888   1.6912076     0.7782735     1.0000000    3.0000000

The UNIVARIATE Procedure
Variable: Listing Month

Moments
N                 2000         Sum Weights        2000
Mean              4.3483       Sum Observations   8696.6
Std Deviation     2.25129324   Variance           5.06832127
Skewness          0.34387476   Kurtosis           -0.2967753
Uncorrected SS    47947        Corrected SS       10131.5742
Coeff Variation   51.7741012   Std Error Mean     0.05034045

Basic Statistical Measures
Location              Variability
Mean     4.348300     Std Deviation         2.25129
Median   4.300000     Variance              5.06832
Mode     6.100000     Range                 11.30000
                      Interquartile Range   3.70000

Tests for Location: Mu0=0
Test          Statistic       p Value
Student's t   t   86.37786    Pr > |t|    <.0001
Sign          M   1000        Pr >= |M|   <.0001
Signed Rank   S   1000500     Pr >= |S|   <.0001

Quantiles (Definition 5)
Level        Quantile
100% Max     11.60
99%          10.25
95%          8.10
90%          7.20
75% Q3       6.10
50% Median   4.30
25% Q1       2.40
10%          1.50
5%           1.00
1%           0.50
0% Min       0.30

I. Airbnb Price in Chicago (Sample Data) Let's work on the Airbnb price in Chicago.
Here are the selected variables (from ECO520 Homework 5, Regression Analysis on Airbnb Price in Chicago):

● Listing Month: The number of months since listing
● host_total_listings: The total number of listings by the host
● accommodates: Maximum number of people to stay
● bathrooms: The number of bathrooms
● bedrooms: The number of bedrooms
● beds: The number of beds
● guests_included: The number of guests included in the price
● minimum_nights: Minimum nights per rent
● number_of_reviews: Total number of reviews for the rent unit
● review_scores_rating: The average score of the rating for the rent unit
● reviews_per_month: The number of reviews per month
● PricePerNight: Price per night

Here is the SAS code to load the data:

filename webdat url "https://bigblue.depaul.edu/jlee141/econdata/eco520/airbnb2019.csv";
/* Import Chicago Community data */
PROC IMPORT OUT= airbnb0 DATAFILE= webdat DBMS=CSV REPLACE;
RUN;
proc contents; run;

/* Create your own random sample data. Make sure to type your student ID as the seed number.
   Replace your_depaul_id with your student id (only numbers) */
proc surveyselect data= airbnb0 method=srs seed = your_depaul_id n = 2000 out=airbnb1;
run;

/* The following code will create the class of host */
data airbnb2; set airbnb1;
  if 0 < host_total_listings < 3 then hostclass = 1;
  else if 3 <= host_total_listings < 20 then hostclass = 2;
  else if host_total_listings >= 20 then hostclass = 3;
  /* More variables you would create */
run;
proc means; run;

1. In the airbnb2 data step, add the following new variables: 1) the most popular hosts, who have more than 65 reviews, as popular_host; 2) big family units that accommodate more than 8 people as big_unit; 3) long-term rent units that have a minimum of more than 7 nights as longterm.

2. Find any outliers or missing cases on all variables. If necessary, remove the outliers or any missing cases. Show your work in SAS, with an explanation.

3. Use scatter plots to find variables that potentially have a nonlinear relationship with price. Create the square of rooms, the square of beds, and the square of bathrooms. If necessary, create some squared or logarithmic variables to analyze the potential nonlinear relationships.

4. Machine Learning using Regression Analysis: Let's consider creating regression models using a training data set, saving the estimated models, and predicting the prices using the rest of the data as testing data. (Use the example we covered in the PowerPoint slides.) Make sure to include all class and dummy variables you created in 1.

1) Split the airbnb2 data into 70% training data and 30% testing (validating) data with a seed number of 55555. Estimate regression models with PricePerNight as the dependent variable, using only the training data, with the following options: 1. Your own best model 2. Adjusted R square 3. Stepwise

2) Perform the out-of-sample prediction using the observations that were not used to estimate the regression models. Find the following statistics and compare the results. Which model is the best in terms of the following statistics? 1. MSE (mean square error) 2. RMSE (root mean square error) 3. MPE (mean percentage error) 4. MAE (mean absolute error)

All questions need to be typed with appropriate graphs and tables from SAS in a PDF file. Submit your SAS code as a separate text file.
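The four out-of-sample statistics named above can be written out in a few lines. This is a minimal Python sketch (not SAS) of the arithmetic only. It assumes the error is defined as actual minus predicted and that MPE is expressed in percent of the actual value; some courses use the opposite sign. The prices are hypothetical.

```python
import math

def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MPE and MAE for paired actual/predicted values.

    Assumption: error = actual - predicted, and MPE is in percent of actual.
    Actual values must be nonzero for MPE to be defined.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAE": sum(abs(e) for e in errors) / n,
    }

# Hypothetical test-set prices and model predictions (illustration only).
actual = [100.0, 200.0, 50.0]
predicted = [110.0, 180.0, 50.0]
metrics = forecast_errors(actual, predicted)
```

In this toy case the over- and under-predictions cancel in MPE while MAE and RMSE stay positive, which is why the statistics are usually compared together rather than one at a time.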

ECO520 HOMEWORK WEEK3
Clustering Analysis on Mortgage Loan Approval

The Home Mortgage Disclosure Act (HMDA) is a federal law that requires certain financial institutions to provide mortgage data to the public. Congress passed the HMDA in 1975 to promote transparency in the mortgage lending market and protect consumers from discriminatory lending practices. The following variables are from the original data for Illinois in 2020.

● approved: 1 if approved, 0 if not approved
● loan_amount: Mortgage loan amount
● population: Total population size
● median_income: Median family income
● minority: The rate of minority populations
● age_house: The average age of buildings

/* Read Data from bigblue */
filename webdat url "https://bigblue.depaul.edu/jlee141/econdata/eco520/hmda_il.csv";
PROC IMPORT DATAFILE= webdat OUT= hmda DBMS=CSV REPLACE;
RUN;

/* Create your own random sample data. Make sure to type your student ID as the seed number.
   Replace your_depaul_id with your student id (only numbers) */
proc surveyselect data=hmda method=srs seed = your_depaul_id n = 200000 out= myhmda;
run;

/* Creating the census_tract summary table by applying proc sql query */
PROC SQL;
  create table tract_summary as
  select distinct census_tract,
         avg(loan_amount) as ave_amount,
         avg(median_income) as avg_income,
         avg(population) as population,
         avg(minority) as minority,
         avg(approved) as approval_rate
  from myhmda
  group by census_tract;
quit;
proc means data=tract_summary; run;

Three Variable Clustering Analysis (Use the "tract_summary" data)

Let's find the best way to classify the census tracts using three variables: avg_income, population, and minority. Use the clustering method to find the most suitable clusters. (Explain how you came up with the number of clusters and describe why you prefer the one you chose.)

Minimum required work:
● Potential issues with outliers or problems in the data (remove only extreme outliers if necessary)
● Show the best number of clusters using various settings of clusters
● One hierarchical model and one K-Means model, and compare the differences
● Use graphs to illustrate the different clusters
● Name each group utilizing the summary statistics by cluster from the K-Means model
● Using the ANOVA test, find whether the clusters are related to the approval_rate

All questions need to be typed with appropriate graphs and tables from SAS in a PDF file. Submit your SAS code as a separate text file. Do not make a zip file.
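One way to reason about the number of clusters is to standardize the three variables and compare the within-cluster sum of squares (WSS) across several values of k, looking for the point where adding a cluster stops helping much. The following is a rough Python sketch of that idea (not the SAS procedures the assignment expects). The plain k-means loop, the function names, and the nine tract rows are all hypothetical illustrations, and the result depends on the random starting centers.

```python
import random

def standardize(rows):
    """Z-score each column so income, population and minority share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    """Basic Lloyd-style k-means; returns labels, centers and total WSS."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random observations as seeds
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                      # an empty cluster keeps its center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, wss

# Hypothetical tract rows: (avg_income, population, minority). Not HMDA data.
tracts = [[50000, 3000, 0.80], [52000, 3200, 0.75], [51000, 2900, 0.85],
          [120000, 4000, 0.10], [125000, 4200, 0.15], [118000, 3900, 0.12],
          [80000, 9000, 0.40], [82000, 9500, 0.45], [79000, 8800, 0.42]]

z = standardize(tracts)
wss_by_k = {k: kmeans(z, k, seed=1)[2] for k in range(1, 5)}  # elbow inputs
labels, centers, wss = kmeans(z, 3, seed=1)
```

Standardizing first matters because avg_income is in dollars and minority is a rate, so unscaled distances would be driven almost entirely by income. Because the starting centers are random, rerunning with several seeds and keeping the lowest WSS is a common safeguard.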

Use SAS code to answer the following questions using MYCOVID19 data. Submit the SAS code as a txt file and a pdf or doc file for the report. Only include necessary tables or graphs from the SAS output in the report. DO NOT INCLUDE ALL SAS OUTPUT IN THE REPORT.

ECO520 Midterm Project
Topic: COVID19 Data in Wisconsin by Census Tract (25 points total)

Wisconsin COVID-19 data by census tract boundary. All data are laboratory-confirmed cases of COVID-19 that we freeze once a day to verify and ensure that we are reporting accurate information. The number of people with positive/negative test results includes only Wisconsin residents who had their results reported electronically to DHS. Here are descriptions of the variables in the data:

Variable Name            Variable Description
GEOID                    Geographic ID
State                    State
CENSUS_TRACT             Census Tract Number
COUNTY                   County Name
DATE                     Last Date of Report
POSITIVE                 Number of Positive on COVID19 Test
NEGATIVE                 Number of Negative on COVID19 Test
DEATHS                   Number of Deaths by COVID19
HOSP_YES                 Number of Hospitalized by COVID19
HOSP_NO                  Number of Not Hospitalized by COVID19
AREA_LAND                Land Area Size
AREA_WATER               Water Area Size
POPULATION               Total Population
POP_LT18                 Percent of Population that is Less Than 18 Years
POP_65P                  Percent of Population that is 65 Years and Over
HOUS_NO_VEH              Percent of households with no vehicle available
ADULT_LIMITED_ENGLISH    Percent of adults 18 years and over who have limited English ability
ADULT_SPANISH_LENG       Percent of adults 18 years and over who speak Spanish and have limited English ability
POP_BELOWPOV             Percent of Population whose income in the past 12 months is below poverty level
POP_DISABILITY           Percent of Population with a Disability
POP_MEDICAD              Percent of Population with Medicaid/Means-Tested Public Coverage
POP_MEDICARE             Percent of Population with Medicare Coverage
POP_HEALTHINS            Percent of Population with No Health Insurance Coverage
HOUS_NOSMARTPHN          Percent of Households that Have No Smartphone
HOUS_NOINTERNET          Percent of Households with No Internet Access

Here is the SAS code to load the data into your SAS program:

filename webdat url "https://bigblue.depaul.edu/jlee141/econdata/eco520/COVID19_WI22.csv";
proc import datafile=webdat out = COVID19 DBMS = CSV replace;
run;

/* Select 500 randomly selected census tracts in WI using YourDePaulID */
proc surveyselect data= COVID19 method=srs seed=YourDePaulID N=500 out= MYCOVID19;
run;
proc contents data=MYCOVID19; run;

1. Data Steps (2 points)

1) Create the following variables (the recommended variable names are given with each item).
● The percent of POSITIVE by POPULATION: Pct_POSITIVE = 100*POSITIVE/POPULATION
● The percent of DEATH by POPULATION: Pct_DEATH = 100*DEATHS/POPULATION
● The percent of HOSP_YES by POSITIVE: Pct_HOSP_POSITIVE = 100*HOSP_YES/POSITIVE
● The percent of POSITIVE results by TOTAL TESTS: Pct_POSTIVE_TEST = 100*POSITIVE/(NEGATIVE+POSITIVE)
● A category variable for the size of the census tract by POPULATION (SIZE_CLASS). Let's define the size of the census tract: if POPULATION is less than 2000, then SIZE_CLASS = 1; if 2000 <= POPULATION < 5000, then SIZE_CLASS = 2; and SIZE_CLASS = 3 if POPULATION >= 5000.

Remove any observations that have missing or zero for GEOID or POPULATION.
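The derived variables in this data step are simple ratios plus one three-way category, and the order of operations matters: rows with a missing or zero GEOID or POPULATION should be dropped before dividing by POPULATION. Here is a minimal Python sketch of that logic (not SAS). The function name and the sample rows are hypothetical, and returning None when a denominator is zero is an assumption about how to treat undefined percentages.

```python
def derive_covid_vars(row):
    """Return a copy of one tract record with the percent and size variables added.

    Returns None for rows that should be removed (missing or zero GEOID or
    POPULATION). Percentages with a zero denominator are set to None
    (an assumption; the assignment does not say how to treat them).
    """
    if not row.get("GEOID") or not row.get("POPULATION"):
        return None
    pop, pos, neg = row["POPULATION"], row["POSITIVE"], row["NEGATIVE"]
    out = dict(row)
    out["Pct_POSITIVE"] = 100 * pos / pop
    out["Pct_DEATH"] = 100 * row["DEATHS"] / pop
    out["Pct_HOSP_POSITIVE"] = 100 * row["HOSP_YES"] / pos if pos else None
    out["Pct_POSTIVE_TEST"] = 100 * pos / (neg + pos) if (neg + pos) else None
    if pop < 2000:
        out["SIZE_CLASS"] = 1
    elif pop < 5000:
        out["SIZE_CLASS"] = 2
    else:
        out["SIZE_CLASS"] = 3
    return out

# Hypothetical tract records (illustration only, not the Wisconsin data).
rows = [
    {"GEOID": "55000000001", "POPULATION": 4000, "POSITIVE": 400,
     "NEGATIVE": 1600, "DEATHS": 4, "HOSP_YES": 20},
    {"GEOID": "55000000002", "POPULATION": 0, "POSITIVE": 0,
     "NEGATIVE": 0, "DEATHS": 0, "HOSP_YES": 0},
]
cleaned = [r for r in (derive_covid_vars(x) for x in rows) if r is not None]
```

The name Pct_POSTIVE_TEST is spelled as in the assignment's recommended variable list so that the sketch lines up with the handout.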
SAS Output

The SURVEYSELECT Procedure
Selection Method        Simple Random Sampling
Input Data Set          COVID19
Random Number Seed      2124308
Sample Size             500
Selection Probability   0.359195
Sampling Weight         2.784
Output Data Set         MYCOVID19

The CONTENTS Procedure
Data Set Name          WORK.MYCOVID19
Member Type            DATA
Engine                 V9
Created                10/17/2023 17:57:18
Last Modified          10/17/2023 17:57:18
Observations           500
Variables              26
Indexes                0
Observation Length     224
Deleted Observations   0
Compressed             NO
Sorted                 NO
Data Representation    SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding               utf-8 Unicode (UTF-8)

Engine/Host Dependent Information
Data Set Page Size            131072
Number of Data Set Pages      1
First Data Page               1
Max Obs per Page              584
Obs in First Data Page        500
Number of Data Set Repairs    0
Filename                      /saswork/SAS_workCB87000081CD_odaws02-usw2.oda.sas.com/SAS_work1F7B000081CD_odaws02-usw2.oda.sas.com/mycovid19.sas7bdat
Release Created               9.0401M7
Host Created                  Linux
Inode Number                  2663537
Access Permission             rw-r--r--
Owner Name                    u63571857
File Size                     256KB
File Size (bytes)             262144

Alphabetic List of Variables and Attributes
#    Variable
12   ADULT_LIMITED_ENGLISH
13   ADULT_SPANISH_LENG
6    AREA_LAND
7    AREA_WATER
4    CENSUS_TRACT
5    COUNTY
25   DEATHS
26   Date
2    GEOID
24   HOSP_NO
23   HOSP_YES
20   HOUS_NOINTERNET
19   HOUS_NOSMARTPHN
11   HOUS_NO_VEH
1    ID
22   NEGATIVE
8    POPULATION
10   POP_65P
14   POP_BELOWPOV
15   POP_DISABILITY
18   POP_HEALTHINS
9    POP_LT18
16   POP_MEDICAD
17   POP_MEDICARE
21   POSITIVE
3    State

Numeric variables have length 8, format BEST12., and informat BEST32.; the character variables carry the formats $9., $24., $4., and $11.

Student note: I have the code and the answers already. You just need to seed it with my ID, run the code with my ID, and give it back.