Mining data from hotel booking sites
In order to analyze patterns and trends among tourists staying at 5-star hotels in Barcelona, we wanted to obtain as many reviews of 5-star hotels as possible. Given the seasonal nature of tourism, we had to cover all 12 months of the year, including summer, winter, Easter and Christmas.
First issue: which websites should we mine for data? We first listed the main booking sites. Analyzing them, we realized that some of them use Tripadvisor's review system. Therefore, by analyzing Tripadvisor we covered 50% of the major Internet booking sites. We decided to complement it with Hotels.com and Booking.com and, to better cover the Spanish market, we also scraped Atrapalo.com, a major player in the Spanish booking market (27.7% of Barcelona’s visitors in 2010 came from the rest of Spain).
For the period between January 1st 2010 and December 31st 2010, we obtained more than 6,260 reviews: mostly ratings on different variables, plus the country of origin of the tourists and the type of trip.
Keeping in mind that the official statistics of Tourism Barcelona are based on 4,900 interviews, and that 13.6% of tourists who stayed in Barcelona stayed at 5-star hotels, the sample of 5-star hotel tourists in that survey would be limited to 637 interviews. By fetching web reviews, we have managed to multiply that sample almost 10 times. In statistical terms, we have achieved a margin of error of 1.2% (6,260 reviews over a universe of 880,000 tourists at 5-star hotels in 2009), against the 1.4% given in official surveys.
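For readers who want to check these figures, here is a minimal sketch of the usual margin-of-error formula for a proportion. It assumes a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5 and simple random sampling; none of these assumptions are stated in the official survey, and the function name `margin_of_error` is ours, purely for illustration.

```python
import math

def margin_of_error(sample_size, population=None, z=1.96, p=0.5):
    """Margin of error for a proportion (assumed: 95% confidence, p = 0.5)."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population is None:
        # No universe size given: skip the finite population correction.
        return z * standard_error
    # Finite population correction, relevant when the universe size is known.
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc

# 6,260 web reviews over a universe of 880,000 tourists at 5-star hotels.
web_reviews = margin_of_error(6260, population=880_000)

# The 4,900 interviews of the official survey (universe size not given here).
official_survey = margin_of_error(4900)

print(f"Web reviews:     {web_reviews:.1%}")
print(f"Official survey: {official_survey:.1%}")
```

Under those assumptions the two calls give roughly 1.2% and 1.4%, matching the figures quoted above. Passing 637 as the sample size shows how much wider the margin would be for the 5-star subsample of the official survey alone.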
Now it’s time to visualize and analyze this data.