The Augmented Advantage

Extract customer pain points from free-text hotel reviews

Applied sentiment analysis with keyword extraction

Tobias Zwingmann
September 16, 2022

Read time: 5 minutes

Hey there,

In today's edition, we'll take a look at the third (and for a while last) use case related to turning unstructured data into structured data.

Today, we'll not only apply sentiment analysis to hotel booking reviews, but also find out what exactly customers are complaining about.

Let's go!

Problem

Suppose we're working as data analysts for a large hotel. Management is overwhelmed by the variety of customer feedback. Are there really problems that need to be addressed? And if so, what are they?

Our data source is a sample of text files with customer feedback that the hotel has collected over time through its website and booking portals.

Each file contains one review and the file name carries the timestamp of this review.

Hotel management wants to know what the customers' main pain points are and be able to dive deeper if needed.

Let's find out!

Solution Overview

Take a look at the high-level use case architecture:

The analysis layer is pretty simple: In order to analyze many text files automatically, we will use an NLP AI service.

As in the previous use cases, this AI service doesn’t need any training; instead, we can send data to the service and get a response right away (synchronous operation).

In this use case, our goal is to find out whether each piece of customer feedback is negative or positive, which is called sentiment analysis. On top of that, we also want to find the words and phrases that triggered these emotions.

Finally, we want to display this information in a BI dashboard, in our case using Power BI (user layer).

The most difficult part is the data layer. First, we need to get the data in the right form, send it to the AI service, and then retrieve the results in a structured, tabular form for our BI system to process.

To accomplish this, we'll build a small data processing pipeline in Python (you can find the script in the resources below).

Walkthrough

Analysis Layer

First things first - let's activate the text analysis AI service of your choice. I chose Microsoft Azure Cognitive Services for Language, but feel free to try the alternatives listed in the resources below. As a result, we get an API endpoint and credentials.
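To give a feel for what talking to such a service looks like, here is a minimal sketch that assumes the Azure `azure-ai-textanalytics` Python package. The helper names `batches` and `analyze_reviews` are my own placeholders and not taken from the original script. The batch size of 10 reflects my understanding of the per-request document limit for sentiment analysis, so please check the current service limits before relying on it.

```python
def batches(items, size=10):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_reviews(texts, endpoint, key):
    """Send review texts to the service and return one result per text.

    Requires `pip install azure-ai-textanalytics`. The import is done lazily
    so that the batching helper above works without the package installed.
    """
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    results = []
    for batch in batches(texts):
        # show_opinion_mining=True also requests targets and assessments
        results.extend(client.analyze_sentiment(batch, show_opinion_mining=True))
    return results
```

Calling `analyze_reviews(["Basement room are pretty noisy"], endpoint, key)` with your own endpoint and key should return one result object per text; each result that is not an error carries an overall `sentiment` label and `confidence_scores`.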

Data Layer

Then, we'll set up our small data preprocessing script that will essentially:

Read plain-text files from a folder
Send the file contents to the AI service API with our credentials
Collect the results, transform them, and store them in a structured data object
Export the results as flat tables so they can be easily consumed by our BI software
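The first step of that list could look roughly like this. The filename pattern (a `YYYY-MM-DD_HH-MM-SS` timestamp as the file stem) and the function name `load_reviews` are assumptions for illustration only; adapt the format string to however your files are actually named.

```python
from datetime import datetime
from pathlib import Path

# Assumed filename format, e.g. "2022-03-14_09-30-00.txt" (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def load_reviews(folder):
    """Return one dict per .txt file: filename, parsed timestamp, and text."""
    reviews = []
    for path in sorted(Path(folder).glob("*.txt")):
        reviews.append({
            "filename": path.name,
            # The timestamp is parsed from the file name, not the file content.
            "timestamp": datetime.strptime(path.stem, TIMESTAMP_FORMAT),
            "text": path.read_text(encoding="utf-8").strip(),
        })
    return reviews
```

Keeping the filename in every record is what later lets us link sentiments and keywords back to the original review.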

Reading the files and sending the content to the AI service API is easy, but making sense of the results can become a bit tricky. Here's a sample output of what the response of the AI service looks like:

As you might notice from this preview output, the structure of the results is nested.

The AI service provides not only the sentiment score for each text, but also a detailed breakdown for opinions on a word level for each text that was analyzed.

For example, the sentence "Basement room are pretty noisy" (despite the grammar mistake) was recognized by the AI service as negative, and the service identified the target "room" and the assessment "noisy" as the main drivers behind this negative opinion.

Isn’t that pretty?

This is useful for the interpretation of the data, but on the downside it creates some hassle for us to untangle the whole object and convert it back into some nice flat tables.
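Here is a minimal sketch of that untangling, using a plain dictionary that mimics the nesting described above (a document, its sentences, and per sentence the opinions with a target and its assessments). The key names and the score values are made up for illustration and are not the exact response schema of any particular service.

```python
# Hypothetical, simplified response for one review (structure is assumed).
sample_response = {
    "filename": "review_001.txt",
    "sentiment": "negative",
    "scores": {"positive": 0.02, "neutral": 0.08, "negative": 0.90},
    "sentences": [
        {
            "text": "Basement room are pretty noisy",
            "opinions": [
                {
                    "target": {"text": "room", "sentiment": "negative"},
                    "assessments": [{"text": "noisy", "sentiment": "negative"}],
                }
            ],
        }
    ],
}


def flatten(response):
    """Split one nested response into a review-level row and target-level rows."""
    review_row = {
        "filename": response["filename"],
        "sentiment": response["sentiment"],
        **response["scores"],
    }
    target_rows = []
    for sentence in response["sentences"]:
        for opinion in sentence["opinions"]:
            target_rows.append({
                "filename": response["filename"],
                "target": opinion["target"]["text"],
                "sentiment": opinion["target"]["sentiment"],
                "assessments": ", ".join(a["text"] for a in opinion["assessments"]),
            })
    return review_row, target_rows


review_row, target_rows = flatten(sample_response)
print(review_row)
print(target_rows)
```

The review-level row feeds the overall sentiment table, while the target-level rows can be split by sentiment into the positive and negative keyword tables.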

The code needed for this data wrangling can mostly be copy-pasted from the AI service's GitHub repo. I just made a few adjustments:

I extracted only the overall sentiment scores per review and saved them as a flat CSV file, together with the original filename and the timestamp extracted from it.
I created one flat CSV file each for the positive and the negative terms or opinions found in the text documents, again with a reference to the original filename, so these terms can be linked back to their original context and filtered by time.

By the end of this process, we should have 3 CSV files - one for the overall sentiment, and one each for negative and positive target keywords.
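Writing those three files needs nothing beyond Python's built-in `csv` module. The file names, column names, and function names below are placeholders of mine rather than the ones used in the original script.

```python
import csv
from pathlib import Path


def write_csv(path, rows, fieldnames):
    """Write a list of dicts to `path` as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def export_tables(out_dir, sentiments, targets):
    """Write one overall-sentiment file plus one file per target polarity."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    write_csv(out_dir / "sentiments.csv", sentiments,
              ["filename", "timestamp", "sentiment"])
    for polarity in ("positive", "negative"):
        # Keep only the targets whose sentiment matches this polarity.
        rows = [t for t in targets if t["sentiment"] == polarity]
        write_csv(out_dir / f"targets_{polarity}.csv", rows,
                  ["filename", "target", "sentiment"])
```

Every file carries the `filename` column, which is the key we use to connect the tables in the BI tool.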

User Layer

Let's visualize and synthesize our results in Power BI (or any other BI tool).

First, we'll load in the CSV files and add them to our data model:

We will then create relationships between the three tables based on the filename column.

This way we can relate the different targets back to their original file (review) and the overall sentiment of that review.
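Outside of Power BI, the same relationship can be mimicked in a few lines of Python, which is handy for sanity-checking the exported files. This is only an illustration with made-up rows; in the dashboard the join happens in the data model, not in code.

```python
# Made-up rows standing in for two of the exported tables.
sentiments = [
    {"filename": "a.txt", "sentiment": "negative"},
    {"filename": "b.txt", "sentiment": "positive"},
]
negative_targets = [
    {"filename": "a.txt", "target": "room"},
    {"filename": "a.txt", "target": "wifi"},
]

# Index the review-level table by filename (one row per review).
by_filename = {row["filename"]: row for row in sentiments}

# Attach the overall review sentiment to every target row (many-to-one).
joined = [
    {**target, "review_sentiment": by_filename[target["filename"]]["sentiment"]}
    for target in negative_targets
]
print(joined)
```

The sentiment table has one row per review and the target tables can have many rows per review, so the relationship is one-to-many from reviews to targets.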

Before we build the dashboard, let’s quickly recap the situation: management wants to know if something is going on that should be on their radar.

To get a high-level overview of the customer reviews, we need four elements:

The development of customer sentiment over time to check for trends
A list of items that customers complain about (negative targets)
A list of items that customers like (positive targets)
A reference back to the original data so we get more context

We could show this information in many ways.

I decided to build a report consisting of the following visuals:

a line chart containing the overall trend,
two treemaps highlighting the positive and negative targets,
a simple table that lists all the customer feedback with plain text.

As a result, the dashboard looks like this:

(Feel free to download the PBIX file from the resources below and play around with it!)

So what does this dashboard tell us?

For one, we can see that both the positive and negative sentiments seem to have a somewhat steady trend, with daily ups and downs.

If we aggregate the visual to a monthly level, the picture becomes a bit clearer:

In the month-to-month comparison, the negative sentiments seem to have increased strongly.
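The monthly roll-up behind such a chart boils down to grouping by year and month. Here is a small sketch with invented sample rows (the real numbers come from the exported sentiment table, and the function name `monthly_counts` is my own):

```python
from collections import Counter
from datetime import datetime

# Invented sample rows: (timestamp, overall sentiment of the review).
rows = [
    (datetime(2022, 1, 5), "positive"),
    (datetime(2022, 1, 20), "negative"),
    (datetime(2022, 2, 3), "negative"),
    (datetime(2022, 2, 17), "negative"),
]


def monthly_counts(rows):
    """Count reviews per (YYYY-MM, sentiment) pair."""
    counts = Counter()
    for timestamp, sentiment in rows:
        counts[(timestamp.strftime("%Y-%m"), sentiment)] += 1
    return counts


counts = monthly_counts(rows)
for (month, sentiment), n in sorted(counts.items()):
    print(month, sentiment, n)
```

Aggregating to months smooths out the daily noise, which is why the trend is easier to read at that level.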

When we examine the reasons for this, we find that most of the complaints are about the rooms, breakfast and Wi-Fi.

While the rooms are difficult to fix in the short term, the breakfast and Wi-Fi offer clear action items that can be quickly addressed by management.
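The ranking behind the Negative Targets treemap is simply a count of how often each target is mentioned. Here is a sketch of that count with invented targets; lowercasing is my own normalization choice so that "Room" and "room" land in the same bucket.

```python
from collections import Counter

# Invented negative targets, one entry per mention.
negative_targets = ["room", "Room", "breakfast", "wifi", "room", "breakfast"]

# Normalize case, then rank targets by number of mentions.
top_targets = Counter(t.lower() for t in negative_targets).most_common(3)
print(top_targets)  # [('room', 3), ('breakfast', 2), ('wifi', 1)]
```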

We can even dive deeper:

If we want to find out exactly what customers complain about in terms of rooms, we simply click on Rooms in the Negative Targets treemap and find the related original reviews in the table below.

This is a great way to quickly explore the data set and find out more about actual customer feedback.

Even though this dashboard only touches the tip of the text analytics iceberg, I hope you've seen how powerful it can be to analyze text data at scale.

How did you like this week's use case?

Hit reply and let me know!

See you next Friday!

Resources

AI Services

Data & Code

Dashboard

Power BI file

This use case was adapted from my book AI-Powered Business Intelligence (O’Reilly). You can read it in full detail here: https://www.aipoweredbi.com
