12.1 PROJECT 1: DETERMINING THE PRICE OF AIRBNB NIGHTLY RENTALS IN NEW YORK CITY
LO 12.1
Propose the appropriate price to charge for Airbnb rentals using the SOAR analytics model.
Airbnb (Exhibit 12.2) is an online marketplace that connects people who would like to rent an available room (or home) from
someone who owns a room or home. Airbnb is the broker, or intermediary, that matches those wishing to rent with those who have the
accommodations. Airbnb offers listings in over 81,000 cities in 191 countries and has about 150 million users.¹
Exhibit 12.2
Daniel Krason/Shutterstock
In this assignment, we do not give you the same level of detailed lab instructions found in the rest of this textbook. Instead, we provide
general directions. From there, you decide what needs to be done and the best ways to perform the analysis.
Specify the Question
There are many potential questions, but the general question we want to answer is "What predicts the price for a nightly Airbnb rental in
New York City?"
Imagine that you're a homeowner in New York City and you are trying to decide how to set prices for a nightly rental. Which factors are
important? Which are not? What should you include in your model to predict the nightly rental price? Here are some factors you might
consider:
• Location: Is it close to Central Park or the Empire State Building? Is it in Manhattan or the Bronx?
• Size: How large (or small) is the rental unit?
• Furnishings: Are they old and dilapidated, or shiny and new?
• Privacy: Do you need to share the space with the renter, or will the renter have complete privacy?
• Reviews: If past reviews are strong and the rental unit is popular, can you charge more for it?
• Day of the week: Can you charge more on weekends than on weekdays?
Obtain the Data
Now that we've specified the question, it is time to consider which data are available to help us perform the analysis.
The data for this analysis are available on Kaggle (Exhibit 12.3): https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data.
Kaggle, which is a subsidiary of Google, is an online community of data scientists and machine learning practitioners. It
allows users to publish data sets, build models, and work with other members of the community.
[Screenshot of the Kaggle page "New York City Airbnb Open Data: Airbnb listings and metrics in NYC, NY, USA (2019)"]
Exhibit 12.3 Kaggle Site with the New York City Airbnb Open Data
Source: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data. Accessed 8/2/2022.
Browse the site to get some ideas of what you can do with the data. There is plenty of information about the sample, much of it relevant
to the question we have specified.
Step 1: Download the Data and Review the Data Set
Download and open the file named "AB_NYC_2019.csv" in Excel and browse the data. (You may need to sign in using a Google account
and agree to Kaggle's terms of use.) Exhibit 12.4 is the data dictionary describing the data.
Exhibit 12.4 Data Dictionary for Airbnb Listing Data in New York City
(Source of Data: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data. Accessed 8/2/2022.)
Column  Variable Name                   Variable Explanation
A       id                              listing id
B       name                            name of the listing
C       host_id                         host id
D       host_name                       name of the host
E       neighbourhood_group             broad location
F       neighbourhood                   more specific neighborhood
G       latitude                        latitude coordinates
H       longitude                       longitude coordinates
I       room_type                       listing space type (entire house, private room, or shared room)
J       price                           price in dollars
K       minimum_nights                  minimum rental nights required
L       number_of_reviews               how many reviews are made?
M       last_review                     date of most recent review for this listing
N       reviews_per_month               reviews made within the last month
O       calculated_host_listings_count  count of host listings
P       availability_365                number of days available for rental per year
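For readers who want to review the file programmatically as well as in Excel, the following is a minimal Python sketch of the "browse the data" step. It is an illustration only, not part of the project instructions. The three sample rows are invented, the room-type labels in them are an assumption about how the file spells those values, and the helper names `load_listings` and `summarize` are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample rows laid out like Exhibit 12.4 (columns A through P).
# The room_type labels are an assumption; check the spelling in the real file.
SAMPLE_CSV = """id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
1,Sunny loft,11,Ana,Manhattan,Harlem,40.81,-73.94,Entire home/apt,200,2,12,2019-05-01,0.8,1,120
2,Cozy room,12,Ben,Brooklyn,Bushwick,40.70,-73.92,Private room,60,1,0,,,2,300
3,Shared space,13,Cy,Queens,Astoria,40.76,-73.92,Shared room,35,1,4,2018-11-20,0.2,1,365
"""


def load_listings(source):
    """Read listings from an open text file (or file-like object) into a list of dicts."""
    return list(csv.DictReader(source))


def summarize(rows):
    """Return the row count, missing-value counts per column, and room_type frequencies."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or not value.strip():
                missing[column] += 1
    room_types = Counter(row["room_type"] for row in rows)
    return {"rows": len(rows), "missing": dict(missing), "room_types": dict(room_types)}


# With the downloaded file, one would instead open it (encoding is an assumption):
#     with open("AB_NYC_2019.csv", newline="", encoding="utf-8") as f:
#         listings = load_listings(f)
listings = load_listings(io.StringIO(SAMPLE_CSV))
summary = summarize(listings)
print(summary)
```

A summary like this shows at a glance which columns contain blanks and which categories appear in room_type, which is useful to know before choosing variables.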
Step 2: Identify the Dependent (Outcome) Variable
Because we are trying to determine the appropriate nightly rental rate, price is our dependent (outcome) variable.
Step 3: Identify Potential Independent (or Explanatory) Variables
The first step in determining the best data to use is to consider the various explanatory, or independent, variables. Which variables are
most likely to explain the price charged for an Airbnb rental in New York City? Recall that the price of one night's rental is the
dependent (outcome) variable that we are trying to explain (or predict). As an analyst, it is your job to identify the variables that are most
likely to explain the price. Look over the data dictionary to determine which are likely to be the best candidate explanatory variables.
Deliverable 1:
Once you have identified the variables that may explain the price of an Airbnb rental, prepare a document (in Microsoft Word)
and save it as "Project 1, Deliverable 1: Airbnb Price Analysis First Name and Last Name" (inserting your first and last name).
Title the first section of the document "Dependent and Independent Variables". Next, within the document, identify and type the
names of the dependent (outcome) variable and at least five proposed independent variables that you think explain price.
Step 4: Clean and Transform the Data in Preparation for Analysis
To prepare the data for analysis, complete this four-step process:
1. Because some of the listings have no date of last review, they are missing data. To remedy this problem and still be able to include the
listing in the analysis, replace date of last_review with March 28, 2011. Color the font in red so you know you inserted those data.
2. There are also missing data for reviews_per_month. To remedy this problem, insert the value of zero (0) if reviews_per_month is
missing. Color the font in red so you know you inserted those data.
3. Calculate a variable called EntDum (a dummy variable that can carry the value of 1 or 0, which equals 1 if "room_type" = "Entire
House" and 0 if "room_type" = "shared room" or "private room"). To do so, sort the database by room_type and insert a variable and
name it "EntDum".
4. The borough of Manhattan in New York is an important area for tourism, and it may well command a higher price for Airbnb rentals
than accommodations in the other four boroughs (Brooklyn, Queens, Staten Island, and the Bronx). Create a new Manhattan dummy
variable (a dummy variable that can carry the value of 1 or 0) that equals 1 if neighbourhood_group = "Manhattan" and zero (0)
otherwise.
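The four cleaning steps above can also be sketched in Python. This is a hedged illustration, not the textbook's method (which uses Excel). It assumes each listing is a dictionary of strings keyed by the column names in Exhibit 12.4, that dates are stored as YYYY-MM-DD, and that the entire-home label begins with the word "Entire". The names `clean_listing`, `ManhattanDum`, and `imputed_fields` are hypothetical; `imputed_fields` stands in for the red font that marks inserted values.

```python
from datetime import date

# March 28, 2011, the placeholder date named in step 1 (ISO format is an assumption).
PLACEHOLDER_REVIEW_DATE = date(2011, 3, 28).isoformat()


def clean_listing(row):
    """Return a cleaned copy of one listing, following steps 1 through 4."""
    cleaned = dict(row)
    imputed = []  # records which fields were filled in (the "red font")

    # Step 1: replace a missing last_review with the placeholder date.
    if not (cleaned.get("last_review") or "").strip():
        cleaned["last_review"] = PLACEHOLDER_REVIEW_DATE
        imputed.append("last_review")

    # Step 2: replace a missing reviews_per_month with zero.
    if not (cleaned.get("reviews_per_month") or "").strip():
        cleaned["reviews_per_month"] = "0"
        imputed.append("reviews_per_month")

    # Step 3: EntDum is 1 for an entire-home listing, 0 for private or shared rooms.
    room_type = cleaned["room_type"].strip().lower()
    cleaned["EntDum"] = 1 if room_type.startswith("entire") else 0

    # Step 4: Manhattan dummy is 1 for Manhattan, 0 for the other four boroughs.
    borough = cleaned["neighbourhood_group"].strip().lower()
    cleaned["ManhattanDum"] = 1 if borough == "manhattan" else 0

    cleaned["imputed_fields"] = imputed
    return cleaned


# Two invented listings: one complete, one with both review fields blank.
sample = [
    {"room_type": "Entire home/apt", "neighbourhood_group": "Manhattan",
     "last_review": "2019-05-01", "reviews_per_month": "0.8"},
    {"room_type": "Private room", "neighbourhood_group": "Brooklyn",
     "last_review": "", "reviews_per_month": ""},
]
cleaned_rows = [clean_listing(r) for r in sample]
for r in cleaned_rows:
    print(r)
```

Unlike the sort-and-fill approach in Excel, the dummy variables here are computed row by row, so no sorting is needed.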
Analyze the Data
Now that you have obtained the data, you can begin your analysis.
Step 1: Hypothesize Links Between Dependent/Outcome Variable (Price) and the Independent Variables You Chose
What do you hypothesize are the links between the dependent (outcome) variable of price and the independent variables that might
predict it? Look through the list of variables in the data dictionary. Which ones predict the price of Airbnb rentals, and are they positively
or negatively related to price?
Step 2: Select and Run Desired Analytics Techniques to Evaluate Relationships
What types of analysis will help show a relationship between price and the various independent variables? Consider the following:
• Pivot tables: Create a pivot table of price (dependent variable) and each independent variable you proposed (Excel: Insert > Pivot
Table). After running the pivot analysis, do you still believe there is a relationship between the dependent variable and each
independent variable you proposed?
• Correlations: Using the Data Analysis ToolPak (Excel: Data > Analysis > Data Analysis > Correlation), run a correlation between price
(dependent variable) and a few independent (now transformed) variables one by one. After running the correlation analysis, do you
still believe there is a relationship between the dependent variable and each independent variable you proposed?
• Regression analysis: Using the Data Analysis ToolPak (Excel: Data > Analysis > Data Analysis > Regression), run a regression of price
(dependent variable) on your proposed independent variables. (For additional description of regression analysis, see the statistics
review in Chapter 3.)
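To make the three techniques concrete, here is a minimal Python sketch of what each one computes: a group average (the pivot table), a Pearson correlation, and an ordinary least squares regression solved through the normal equations. It is an illustration under stated assumptions, not a substitute for the Excel tools. The four listings are invented so that price equals 50 + 100 × EntDum + 40 × ManhattanDum exactly, and the function names and the `ManhattanDum` column name are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def mean_price_by(rows, key):
    """Pivot-table analogue: average price for each value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["price"])
    return {value: mean(prices) for value, prices in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ols(rows, predictors, outcome="price"):
    """Least squares fit: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    X = [[1.0] + [float(row[p]) for p in predictors] for row in rows]
    y = [float(row[outcome]) for row in rows]
    n = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coefficients = [A[i][n] / A[i][i] for i in range(n)]
    return dict(zip(["intercept"] + list(predictors), coefficients))


# Invented data: price = 50 + 100 * EntDum + 40 * ManhattanDum, with no noise.
rows = [
    {"EntDum": 1, "ManhattanDum": 1, "price": 190},
    {"EntDum": 1, "ManhattanDum": 0, "price": 150},
    {"EntDum": 0, "ManhattanDum": 1, "price": 90},
    {"EntDum": 0, "ManhattanDum": 0, "price": 50},
]
pivot = mean_price_by(rows, "EntDum")
r_ent = pearson([row["EntDum"] for row in rows], [row["price"] for row in rows])
fit = ols(rows, ["EntDum", "ManhattanDum"])
print(pivot, r_ent, fit)
```

Because the sample contains no noise, the regression recovers the coefficients used to build it. With the real listings the fit will be far from exact, and the sign and size of each coefficient are what you would compare against your hypotheses.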
SOARing to Success
Data Tip
Recall that Excel requires you to group the independent variables together in side-by-side columns before you run a