Airbnb Seattle data investigation

Arunkumar
Jul 1, 2020

Introduction

Airbnb, Inc. is a San Francisco-based company that operates an online marketplace and hospitality service. It allows people to lease or rent short-term lodging, including holiday cottages, apartments, homestays, hostel beds and hotel rooms, and to make other bookings such as restaurant reservations.

I had heard a lot about Airbnb, so when Udacity set a challenge to write a blog post, I decided to look at Airbnb's Seattle data from an analyst's perspective. I was curious how data science techniques could reveal trends in this data.

During this project I set out to answer three questions:

  • How is the best price achieved on Airbnb?
  • How are good reviews achieved?
  • Is it possible to make an accurate predictive model for listing price based on machine learning?

Let's now dive into the main questions:

Question 1: How is the best price achieved on Airbnb?

To answer this question, we plot a heatmap of the correlations between the numeric features and focus on the 7th row (price).

Correlation heatmap for the overall dataset (see the 7th row, price)
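The computation behind a heatmap like this can be sketched with pandas. This is a minimal sketch, not the author's notebook code. It assumes that the Seattle `listings.csv` stores `price` as a string such as "$85.00". The tiny inline DataFrame uses made-up values, with column names borrowed from that file.

```python
import pandas as pd


def price_to_float(series: pd.Series) -> pd.Series:
    """Convert price strings such as "$1,250.00" to floats (assumed raw format)."""
    return series.str.replace(r"[$,]", "", regex=True).astype(float)


def price_correlations(listings: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every numeric column with price, strongest first."""
    df = listings.copy()
    df["price"] = price_to_float(df["price"])
    corr = df.select_dtypes("number").corr()  # full matrix, as drawn in a heatmap
    # The "price" column holds the same values as the heatmap's price row
    return corr["price"].drop("price").sort_values(ascending=False)


# Made-up rows for illustration only; not real Seattle listings
listings = pd.DataFrame({
    "price": ["$85.00", "$150.00", "$975.00", "$100.00", "$450.00"],
    "accommodates": [2, 4, 10, 3, 6],
    "bedrooms": [1, 2, 5, 1, 3],
    "number_of_reviews": [207, 43, 20, 38, 17],
})
print(price_correlations(listings))
```

Passing the full `corr` matrix to `seaborn.heatmap` would draw the chart itself. Reading the price column gives the same numbers as reading the price row, because a correlation matrix is symmetric.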

Price vs. number of people accommodated (roughly linear, at least from 1 to 8 guests)

The correlation heatmap and the "accommodates vs price" chart both show that the main driver of price is how many people a listing can accommodate. Over the range shown, the relationship is close to linear. Features that carry largely the same information as capacity are also highly correlated with price. These include the number of bedrooms, beds and bathrooms and the number of guests included.

Average price also varies a lot between neighbourhoods. Magnolia, Downtown and Queen Anne are the three most expensive areas, while Northgate, University District and Delridge are the cheapest. Amenities such as a TV, internet, a washing machine and air conditioning are also important.
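Both comparisons can be reproduced with a `groupby`. The sketch below makes two assumptions about the raw file: neighbourhood names live in a `neighbourhood_group_cleansed` column, and `amenities` is a brace-wrapped list such as `{TV,"Cable TV",Internet}`. The rows themselves are invented.

```python
import pandas as pd


def mean_price_by_area(df, area_col="neighbourhood_group_cleansed"):
    """Average price per area, most expensive first (column name is an assumption)."""
    return df.groupby(area_col)["price"].mean().sort_values(ascending=False)


def amenity_flags(amenities, wanted):
    """One 0/1 column per wanted amenity, parsed from strings like '{TV,"Cable TV",Internet}'."""
    items = (
        amenities.fillna("")              # treat missing amenity lists as empty
        .str.strip("{}")                  # drop the surrounding braces
        .str.replace('"', "", regex=False)
        .str.split(",")
    )
    return pd.DataFrame(
        {name: items.apply(lambda xs: int(name in xs)) for name in wanted},
        index=amenities.index,
    )


# Invented rows for illustration only
listings = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Magnolia", "Magnolia", "Delridge", "Downtown", "Delridge"],
    "price": [180.0, 220.0, 80.0, 160.0, 90.0],
    "amenities": ['{TV,Internet,"Air Conditioning"}', "{TV,Washer}", "{Internet}",
                  "{TV,Internet,Washer}", "{}"],
})
print(mean_price_by_area(listings))

flags = amenity_flags(listings["amenities"], ["TV", "Washer", "Air Conditioning"])
# Mean price for listings with (1) and without (0) a TV
print(listings["price"].groupby(flags["TV"]).mean())
```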

Price is negatively correlated with the number of reviews. This is probably because most listings that have been reviewed have picked up at least some bad reviews.

So, according to this sample of data, to get the best price on Airbnb you should:

  • Accommodate as many people as possible.
  • Be in the right area. Good reviews also matter, but less than capacity.
  • Have a TV and a parking spot, and install a washer and air conditioning.

Question 2: How are good reviews achieved?

As shown above, capacity-related features drive price, and so do amenities that cost the host money. There is one factor that may not influence price directly but can still push up demand and the general appeal of a property: what other guests say about it.

In this section, we look at potential ways of achieving better reviews.

Review score vs. host response rate

The correlation heatmap shows that the factor most strongly linked to good reviews is responding to all guest requests (host response rate). This isn't too surprising. We also plotted review score against host response rate. A higher rate means the host responds to more inquiries promptly.
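Before `host_response_rate` can enter the heatmap, it has to be converted to a number, assuming it is stored as text such as "96%". The sketch below converts it, groups it into bins and averages `review_scores_rating` per bin. The bin edges and sample rows are hypothetical choices for illustration.

```python
import pandas as pd


def percent_to_float(series: pd.Series) -> pd.Series:
    """Turn strings like "96%" into 96.0; unparseable values (e.g. "N/A") become NaN."""
    return pd.to_numeric(series.str.rstrip("%"), errors="coerce")


def review_score_by_response_rate(df, bins=(0, 50, 90, 99, 100)):
    """Mean review score per response-rate bin (hypothetical bin edges)."""
    rate = percent_to_float(df["host_response_rate"])
    buckets = pd.cut(rate, bins=list(bins), include_lowest=True)
    # Listings without a usable response rate fall into no bucket and are skipped
    return df["review_scores_rating"].groupby(buckets, observed=False).mean()


# Invented rows for illustration only
listings = pd.DataFrame({
    "host_response_rate": ["100%", "100%", "95%", "80%", "40%", None],
    "review_scores_rating": [97, 95, 94, 90, 85, 93],
})
print(review_score_by_response_rate(listings))
rate = percent_to_float(listings["host_response_rate"])
print("correlation:", rate.corr(listings["review_scores_rating"]))
```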

It also seems that hosts with many listings get worse reviews than those with one or a few. One might expect many listings to mean a worse response rate because of the larger number of inquiries. However, the correlation between response rate and listings count is very small. I don't have data to support this theory, but I believe guests see listings from hosts with many other listings as plain and impersonal, focused on maximising profit. By contrast, listings from people who live in the same apartment come across as cosier and warmer.

Bathrooms and price are slightly correlated with review score. I believe this reflects the listing's standard, since higher-standard listings get better reviews. Limits on the maximum number of nights are bad for review scores, and availability also matters.
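The review-score correlations discussed above can be listed with `DataFrame.corrwith`. This sketch assumes the column names shown in the hypothetical `FEATURES` list. The five rows are invented so that the signs match the text, so the numbers themselves carry no meaning.

```python
import pandas as pd

# Hypothetical feature list; names follow the Seattle listings file but are assumptions here
FEATURES = ["host_listings_count", "bathrooms", "price", "maximum_nights", "availability_365"]


def review_correlations(df, target="review_scores_rating", features=FEATURES):
    """Pearson correlation of each feature with the review score, most negative first."""
    return df[features].corrwith(df[target]).sort_values()


# Invented rows for illustration only
listings = pd.DataFrame({
    "review_scores_rating": [98, 96, 92, 88, 85],
    "host_listings_count": [1, 1, 3, 10, 40],
    "bathrooms": [2.0, 1.5, 1.0, 1.0, 1.0],
    "price": [150.0, 120.0, 100.0, 90.0, 95.0],
    "maximum_nights": [1125, 1125, 365, 30, 30],
    "availability_365": [300, 250, 200, 100, 50],
})
print(review_correlations(listings))
```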

So to get the best reviews, you should:

  • Not have too many listings
  • Respond to every inquiry you get
  • Not limit the maximum number of nights guests can stay
  • Offer amenities like a parking space, TV and Internet
  • Keep availability high (you might disregard this one if you want to make money :) )

Question 3: Is it possible to make an accurate predictive model for listing price based on machine learning?

I experimented with three machine learning algorithms for this analysis: AdaBoost, support vector machines and random forest.
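A comparison of the three algorithms could look like the sketch below, which scores each one with 5-fold cross-validated R². It uses synthetic data from `make_regression` instead of the listings, so the ranking it prints says nothing about the Airbnb result. The hyperparameters are library defaults or arbitrary choices, not the author's. SVR is wrapped with `StandardScaler` because support vector machines are sensitive to feature scale.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, cv=5, seed=0):
    """Mean cross-validated R² for each candidate model (settings are illustrative)."""
    models = {
        "AdaBoost": AdaBoostRegressor(random_state=seed),
        "SVM (SVR)": make_pipeline(StandardScaler(), SVR()),
        "RandomForest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


# Synthetic stand-in for the prepared listings features and prices
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: mean R2 = {score:.3f}")
```

Cross-validation averages over several train/validation splits, which gives a steadier comparison than a single split.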

To prepare the data for modelling, I trained the models twice, once with missing values imputed and once with them removed, and compared the performance.
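The two preparation strategies can be compared as follows. Median imputation with `SimpleImputer` is an assumed choice, since the post doesn't say which imputation technique was used. The missing values are injected at random into synthetic data. Putting the imputer inside the pipeline means it is fitted on each training fold only, so no information leaks from the validation folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def score_imputed(X, y, cv=5, seed=0):
    """Cross-validated R² after median imputation, keeping every row."""
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestRegressor(n_estimators=100, random_state=seed),
    )
    return cross_val_score(model, X, y, cv=cv, scoring="r2").mean()


def score_dropped(X, y, cv=5, seed=0):
    """Cross-validated R² after dropping any row with a missing value; also returns rows kept."""
    keep = ~np.isnan(X).any(axis=1)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    return cross_val_score(model, X[keep], y[keep], cv=cv, scoring="r2").mean(), int(keep.sum())


rng = np.random.default_rng(0)
X, y = make_regression(n_samples=400, n_features=6, noise=15.0, random_state=0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # knock out about 10% of cells at random

imputed = score_imputed(X_missing, y)
dropped, n_kept = score_dropped(X_missing, y)
print(f"median imputation, all {len(y)} rows: R2 = {imputed:.3f}")
print(f"missing rows dropped ({n_kept} kept): R2 = {dropped:.3f}")
```

The two scores come from different sets of rows, so the comparison is indicative rather than exact.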

In the end, random forest turned out to be the most accurate algorithm for this task, with an R² score of 0.80. This means the model explains about 80% of the variance in price. Apart from amenities, the dataset contained little information about a listing's standard, and I believe much of the remaining unexplained variance in price is related to that standard.
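R² compares the model's squared errors with those of a baseline that always predicts the mean price: R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. The hand calculation below uses four made-up prices and follows the same definition as scikit-learn's `r2_score`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared errors
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # mean-baseline's squared errors
    return 1 - ss_res / ss_tot


prices = [100, 150, 200, 250]      # made-up actual prices
predicted = [110, 140, 210, 240]   # made-up model predictions
print(f"{r_squared(prices, predicted):.3f}")  # prints 0.968
```

Here SS_res = 400 and SS_tot = 12500, so the model leaves 3.2% of the variance unexplained. An R² of 0.80 likewise leaves 20% unexplained. It is not an "accuracy" in the classification sense.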

Summary

I found that the main driver of price is how many people a listing can accommodate. Location is also very important, as are amenities such as a TV, parking, internet and air conditioning.

To get good reviews, it is vital to respond to every inquiry and not to limit the maximum number of nights. Hosts who list many apartments get worse reviews than those who list only their own. Amenities are also important for good reviews.

I also found that a random forest model can predict Airbnb prices with an R² of 0.80, meaning it explains about 80% of the variance in price. I believe a lot of the remaining variance can be explained by the listing's standard, which is not a feature in the dataset.

Full project details are available at: https://github.com/Arunk-rbs/MachineLearning
