Question

Instructions

You will be using the Real Estate data set to build a model to predict what a house should sell for. This model will be used by a real estate agency to help their clients understand what their house should sell for so they can make an educated decision about listing price. Secondarily, the model will be used by a home contractor. S/he would like to be able to tell clients the selling value of adding an additional bathroom.

Part 1 of the project involves the first three steps in the data mining process: sample, explore and modify. You will be preparing the data for model building, which will be done in Part 2 of the project next week. You will need to make decisions regarding data that is in text form, missing data, potentially incorrect data, the inclusion of potential outliers, binning strategy and variable transformation. Please make sure your decisions are justified. Note that the specific requirements and relative weights are outlined in the rubric.

Compute descriptive statistics on all of the continuous variables. Briefly discuss.
Compute the frequencies of ALL of the categorical variables. Briefly discuss.
Run a correlation table. Discuss at least 3 correlations.

The descriptive statistics, frequencies and correlation table should be professionally presented on separate, labeled tabs/worksheets.

There is quite a bit of discussion in this assignment. Rather than putting the discussion in Excel, it is preferred that you prepare this assignment as a Word (or PDF) document and include the relevant Excel output as figures. You are required to submit both the Word/PDF file and your Excel file. Please note that only the Word/PDF file will be graded, so it should contain all the requirements. I am asking you to submit the Excel file in case I have a question about your work; it will only be opened if necessary.

Different people will make different decisions, which may ultimately impact the model they develop. While there are wrong things you could do (like using a 0 for all missing values), there is not one "right" answer.
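To see why a blanket 0 is called out as a wrong choice, a small sketch in Python (standard library only) may help. The square-footage values below are invented for illustration and are not taken from the Real Estate data set. Median imputation plus a "was missing" flag is shown as one possible alternative, not as the required approach.

```python
from statistics import mean, median

# Hypothetical square-footage values; None marks a missing entry.
# These numbers are invented for illustration only.
sqft = [1500, 1800, None, 2100, None, 1600]


def fill_missing(values, fill_value):
    """Return a copy of values with every None replaced by fill_value."""
    return [fill_value if v is None else v for v in values]


# Values that were actually recorded.
observed = [v for v in sqft if v is not None]

# Option A: replace every missing value with 0 (distorts the distribution).
zero_filled = fill_missing(sqft, 0)

# Option B: replace missing values with the median of the observed values.
median_filled = fill_missing(sqft, median(observed))

# Flag column so imputed rows stay distinguishable from observed ones.
was_missing = [int(v is None) for v in sqft]

print("mean of observed values:", mean(observed))
print("mean after zero fill:   ", round(mean(zero_filled), 2))
print("mean after median fill: ", round(mean(median_filled), 2))
print("missing-value flags:    ", was_missing)
```

Zero fill drags the mean far below anything observed, because no real house has 0 square feet. The median fill keeps the centre of the data roughly where it was, and the flag records which rows were imputed so the decision can be documented and revisited.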
Make sure you document and justify the decisions you make. It is fine (perhaps even ideal) to note decisions that were made because this is an academic project. For example: "Given the appropriate resources, I would have ___ to get the missing values for ___. Lacking the resources for this option, I elected to ___, recognizing that this decision ___."

I suspect there will be lots of questions this week as you start working independently with the data. Just a reminder that questions can be posted in the "Ask the Instructor" Discussion Forum so that classmates can participate and also benefit from our discussion.
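The three exploration outputs the assignment asks for (descriptive statistics, frequencies and a correlation table) are meant to be produced in Excel. As a cross-check outside the spreadsheet, they can be sketched in Python using only the standard library. The rows and the column names `price`, `sqft`, `baths` and `style` are hypothetical placeholders, not the actual fields or values of the Real Estate data set.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical rows: invented values and column names, for illustration only.
homes = [
    {"price": 250000, "sqft": 1500, "baths": 2, "style": "Ranch"},
    {"price": 310000, "sqft": 1800, "baths": 2, "style": "Colonial"},
    {"price": 405000, "sqft": 2400, "baths": 3, "style": "Colonial"},
    {"price": 199000, "sqft": 1100, "baths": 1, "style": "Ranch"},
    {"price": 350000, "sqft": 2000, "baths": 3, "style": "Split"},
]


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


numeric = ["price", "sqft", "baths"]
columns = {name: [row[name] for row in homes] for name in numeric}

# 1. Descriptive statistics for each numeric variable.
stats = {name: describe(values) for name, values in columns.items()}

# 2. Frequencies of the categorical variable.
style_counts = Counter(row["style"] for row in homes)

# 3. Correlation table: every pair of numeric variables.
corr = {(a, b): pearson(columns[a], columns[b]) for a in numeric for b in numeric}

for name, summary in stats.items():
    print(name, summary)
print(dict(style_counts))
for (a, b), r in corr.items():
    print(f"{a:>5} vs {b:<5} r = {r:.3f}")
```

In Excel, the same quantities are typically obtained with functions such as `AVERAGE`, `MEDIAN`, `STDEV.S`, `COUNTIF` and `CORREL`. The discussion the rubric asks for (what each statistic or correlation suggests about the data) still has to be written separately in the Word/PDF document.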