My Reviews Are Better Than Yours!

Comparing Azure Cognitive Services vs. OpenAI GPT-3 for analyzing customer reviews

Read time: 8 minutes

Hey there,

In this post, I'll show you how to use AI to analyze customer reviews and identify issues customers are complaining about.

Early readers of this newsletter may remember a very similar post I wrote a few months ago. A lot has happened since then. For example, there is now ChatGPT.

Today we compare how a "classic" text analytics AI service like Azure Cognitive Services (which I used in my earlier post) and a large language model like OpenAI's GPT-3 extract keywords from customer reviews.

The goal is to figure out which AI service is better suited for the task. In the end, we'll have two dashboards to compare the services like this:

Let's go!

Problem Statement

Let's say we're business analysts for a hotel that accepts bookings through online booking platforms such as Expedia, Booking.com or Agoda.

Our hotel receives reviews from customers after their stays, and we want to extract customer complaints from these reviews to improve the customer experience.

Here is an excerpt from some of the hotel reviews we have received:

The reviews were collected as follows: customers could give an overall rating (score) and had the option to fill in two text boxes - one for a positive review (PositiveReview) and one for a negative review (NegativeReview). We also receive other descriptive information, such as when they visited the hotel, what type of room they booked, and whether they traveled with a group.

With this data, even our basic dashboard (without AI) gives us a lot of insights. For example, we can view the average rating score by booking date (line chart) and compare it to the total number of bookings (bar chart). The table on the bottom shows us the list of the reviews. With a page filter we can even view the ratings for a specific room type only.

This gives us a good overview of what is going on at a high level. Nevertheless, the information isn't yet actionable because we don't yet know what exactly the customers are complaining about.

That's where AI will help us.

So let's dive in!

Solution Overview

We'll use two services - Microsoft Azure Cognitive Services for Language and OpenAI GPT-3 (text-davinci-003) - and compare how well they can extract keywords from reviews, which we can then analyze in our BI tool.

Of course, we'd end up choosing only one of these AI services, but to compare the results, let's try them both!

Here is the overall architecture of the use case:

Let's walk through this piece by piece!

Solution Breakdown

Here's a step-by-step guide to our implementation:

Data Layer

To analyze the reviews, we need access to the data. There are several ways to get it. The simplest would be to collect the reviews manually, which of course doesn't scale well.

If we want to extract the reviews from the major booking platforms, we can either build a web scraper or rely on a service that provides this data. I won't go into too much detail here; I trust you to use your favorite search engine to find "[your platform] reviews web scraper" or "[your platform] reviews api".

For our case study, we'll use a Booking.com review dataset from Kaggle (see resources below).

To keep things simple, we analyze reviews for only a single year - otherwise, we leave the data largely untouched.
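Restricting the data to a single year could look roughly like the sketch below. It uses only the Python standard library. The column name ReviewDate and the date format are assumptions for illustration; the actual Kaggle file may name and format its date column differently.

```python
import csv
import io
from datetime import datetime


def filter_reviews_by_year(rows, year, date_field="ReviewDate", date_format="%Y-%m-%d"):
    """Keep only the rows whose date column falls in the given year.

    `date_field` and `date_format` are assumptions; adjust them to the dataset.
    """
    kept = []
    for row in rows:
        try:
            review_date = datetime.strptime(row[date_field], date_format)
        except (KeyError, ValueError):
            continue  # skip rows with a missing or malformed date
        if review_date.year == year:
            kept.append(row)
    return kept


# Tiny in-memory stand-in for the CSV file (made-up rows, not real data).
sample_csv = io.StringIO(
    "ReviewDate,NegativeReview\n"
    "2016-03-01,There was no hair dryer\n"
    "2017-07-15,Room was noisy\n"
    "not-a-date,Broken row\n"
)
reviews_2016 = filter_reviews_by_year(csv.DictReader(sample_csv), 2016)
print(len(reviews_2016))
```

With a real file, the `io.StringIO` object would be replaced by `open(path, newline="", encoding="utf-8")`.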

Analysis Layer

In the analysis layer, we have a small Python script that takes each booking review (or in the case of Azure - batches of booking reviews to speed things up) and sends that to the different AI services:
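The batching idea can be sketched in a few lines. This is a generic helper, not the script used for this post; the function name `batched` and the batch size are illustrative assumptions.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Made-up example reviews.
reviews = ["No hair dryer", "Room was cold", "Too noisy", "I liked everything", "Slow wifi"]
for batch in batched(reviews, batch_size=2):
    # Each batch would be sent to the service in a single request.
    print(len(batch), batch)
```

Sending several reviews per request cuts the number of round trips, which is where the speed-up comes from.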

For Azure Cognitive Services for Language, we use the Key Phrase Extraction service. Given an input text, this service returns a list of key phrases contained in that text. We don't need to perform sentiment analysis here, because the sentiment is already given by the context (the text comes from the "negative review" field). We'll see later how problematic this can become.

For OpenAI GPT-3, we use the text-davinci-003 model for text completion with a prompt written specifically for keyword extraction. This way we get a list of keywords.
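Here is a hedged sketch of what the Azure call could look like with the azure-ai-textanalytics Python SDK. `TextAnalyticsClient.extract_key_phrases` is the SDK method; the helper names `make_azure_client` and `extract_key_phrases`, the batch size of 10, and the fixed language "en" are my assumptions, so check the current service limits before reusing them.

```python
def make_azure_client(endpoint: str, key: str):
    """Create a Text Analytics client (requires `pip install azure-ai-textanalytics`)."""
    # Imported lazily so the rest of this sketch runs without the SDK installed.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    return TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))


def extract_key_phrases(client, reviews_by_id, batch_size=10):
    """Return {review_id: [key phrases]} for a {review_id: text} mapping.

    Empty reviews are skipped; documents the service reports as errors
    get an empty keyword list.
    """
    items = [(rid, text) for rid, text in reviews_by_id.items() if text.strip()]
    keywords = {}
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        documents = [
            {"id": str(rid), "text": text, "language": "en"}  # language is an assumption
            for rid, text in batch
        ]
        # Results come back in the same order as the submitted documents.
        results = client.extract_key_phrases(documents)
        for (rid, _), result in zip(batch, results):
            keywords[rid] = [] if result.is_error else list(result.key_phrases)
    return keywords
```

With real credentials, the call would be `extract_key_phrases(make_azure_client(endpoint, key), {1: "There was no hair dryer"})`.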

So, in both cases, we should get a list of the most important keywords / key phrases for each review, which will eventually give us an overview of which topics or keywords were most frequently mentioned in the context of negative reviews.

Let's take a look at the results:

Azure Cognitive Services for Language

The AI service overall does a pretty good job at retrieving keywords. For example, given the review "There was no hair dryer" it will return the keyword "hair dryer".

That's what we would expect.

However, if a user wrote:

"I liked everything"

Azure Cognitive Services would (correctly) extract the keyword "everything".

But in this case, the customer obviously isn't complaining about everything - they have nothing to complain about!

It's essentially a data problem because the text is in the wrong input field.

Handling these things automatically with Azure Cognitive Services becomes problematic - but thanks to the implementation in our BI, which we'll see later on, it's easy for us to spot these errors.

OpenAI GPT-3

Let's see how well OpenAI's Davinci model performs in keyword extraction. As this model is essentially a text completion model, we need to design a prompt that gives us a list of keywords given some input text.

We can even provide more context such as that we're looking for complaints. In this case, I used the following prompt:

"Extract the complaints as keywords from this negative review:"

[review_text]

"Format output as a comma separated list."

This prompt worked pretty well. For example, the review:

"It takes too long hot water to reach tap/shower in the room" 

results in the following list of keywords:

  • Room

  • hot water

  • tap/shower

  • long

Which is a pretty decent representation of the problem.
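The prompt and the comma-separated answer can be wired together roughly as follows. To keep the sketch runnable offline, the model call is passed in as a function (`complete`), which is a hypothetical name; the stand-in below simply returns the keywords from the example above.

```python
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Extract the complaints as keywords from this negative review:\n\n"
    "{review_text}\n\n"
    "Format output as a comma separated list."
)


def build_prompt(review_text: str) -> str:
    """Insert the review text into the keyword extraction prompt."""
    return PROMPT_TEMPLATE.format(review_text=review_text.strip())


def extract_keywords(review_text: str, complete: Callable[[str], str]) -> List[str]:
    """Send the prompt to `complete` and split the answer on commas."""
    raw = complete(build_prompt(review_text))
    return [part.strip() for part in raw.split(",") if part.strip()]


# With the legacy `openai` Python package (versions before 1.0), `complete`
# could wrap openai.Completion.create(model="text-davinci-003", prompt=prompt,
# temperature=0, max_tokens=60) and return response["choices"][0]["text"].
# That wiring is an assumption, not the script used for this post.
def fake_complete(prompt: str) -> str:
    """Offline stand-in that returns the keyword list from the example review."""
    return " Room, hot water, tap/shower, long"


review = "It takes too long hot water to reach tap/shower in the room"
print(extract_keywords(review, fake_complete))
# ['Room', 'hot water', 'tap/shower', 'long']
```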

Also, when a user writes "Everything was perfect!" the AI service would return "No keywords" which is more in line with what we want compared to "Everything".
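If you'd rather treat answers like "No keywords" as "no complaint" instead of counting them as a keyword, a small clean-up step could handle that. The list of marker phrases below is an assumption; you'd extend it based on what the model actually returns for your data.

```python
# Phrases that signal "no complaint" rather than an actual keyword (assumed list).
NO_COMPLAINT_MARKERS = {"no keywords", "nothing", "none", "n/a"}


def clean_keywords(keywords):
    """Lowercase and de-duplicate keywords, dropping 'no complaint' markers."""
    cleaned = []
    for keyword in keywords:
        normalized = keyword.strip().strip(".").strip().lower()
        if not normalized or normalized in NO_COMPLAINT_MARKERS:
            continue
        if normalized not in cleaned:
            cleaned.append(normalized)
    return cleaned


print(clean_keywords(["No keywords"]))
print(clean_keywords(["Room", "hot water", "room "]))
```

Whether to drop these markers is a design choice: keeping them, as in the dashboards in this post, makes it easy to see how many "negative" reviews contain no complaint at all.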

However, because the model generates its answer by sampling text, the results may vary from run to run. While the Azure Cognitive Services results are deterministic, the same review can produce slightly different keyword lists from OpenAI.

The final outputs of the analysis layer will be 3 files:

  • The original list of reviews, where each review was enriched with an ID

  • The list of keywords extracted by Azure for each review ID

  • The list of keywords extracted by OpenAI for each review ID
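Producing these three files could look like the sketch below. The file names and the column names ReviewID and Keyword are assumptions for illustration; the sample rows reuse the example reviews from this post.

```python
import csv
import os
import tempfile


def add_review_ids(reviews):
    """Return copies of the review rows with a running ReviewID column."""
    return [{"ReviewID": i, **review} for i, review in enumerate(reviews, start=1)]


def keyword_rows(keywords_by_id):
    """Flatten {review_id: [keywords]} into one row per (review, keyword) pair."""
    return [
        {"ReviewID": review_id, "Keyword": keyword}
        for review_id, keywords in keywords_by_id.items()
        for keyword in keywords
    ]


def write_csv(path, rows, fieldnames):
    """Write a list of dict rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


reviews = add_review_ids([
    {"NegativeReview": "There was no hair dryer"},
    {"NegativeReview": "I liked everything"},
])
azure_keywords = {1: ["hair dryer"], 2: ["everything"]}
openai_keywords = {1: ["hair dryer"], 2: ["No keywords"]}

out_dir = tempfile.mkdtemp()  # throwaway folder for the demo
write_csv(os.path.join(out_dir, "reviews.csv"), reviews, ["ReviewID", "NegativeReview"])
write_csv(os.path.join(out_dir, "keywords_azure.csv"), keyword_rows(azure_keywords), ["ReviewID", "Keyword"])
write_csv(os.path.join(out_dir, "keywords_openai.csv"), keyword_rows(openai_keywords), ["ReviewID", "Keyword"])
print(sorted(os.listdir(out_dir)))
```

Storing one row per review and keyword (a "long" table) is what makes the one-to-many relationship in the BI tool straightforward.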

This allows us to build a simple relational model in our BI tool to analyze and explore the data quickly.

User Layer

The user layer is quite simple:

We take the original review data plus the extracted list of keywords with their corresponding review IDs and feed everything into our BI system of choice.

In Power BI, for example, we can then create a data model which links the review IDs from the keywords to the review IDs from the original table:

This way we can easily cross-filter reviews according to their contained keywords and vice versa.
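To make the idea of cross-filtering concrete, here is a small Python sketch of the same one-to-many relationship outside of a BI tool. The column names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def build_keyword_index(keyword_rows):
    """Map each lowercased keyword to the set of review IDs that contain it."""
    index = defaultdict(set)
    for row in keyword_rows:
        index[row["Keyword"].lower()].add(row["ReviewID"])
    return index


def reviews_for_keyword(reviews, index, keyword):
    """Keyword -> reviews: what happens when you click a bar in the chart."""
    ids = index.get(keyword.lower(), set())
    return [review for review in reviews if review["ReviewID"] in ids]


def keywords_for_review(keyword_rows, review_id):
    """Review -> keywords: the filter in the other direction."""
    return [row["Keyword"] for row in keyword_rows if row["ReviewID"] == review_id]


# Made-up sample data.
reviews = [
    {"ReviewID": 1, "NegativeReview": "Airport is much further away than advertised"},
    {"ReviewID": 2, "NegativeReview": "Noise from the airport at night"},
    {"ReviewID": 3, "NegativeReview": "There was no hair dryer"},
]
keyword_table = [
    {"ReviewID": 1, "Keyword": "airport"},
    {"ReviewID": 2, "Keyword": "noise"},
    {"ReviewID": 2, "Keyword": "Airport"},
    {"ReviewID": 3, "Keyword": "hair dryer"},
]

index = build_keyword_index(keyword_table)
for review in reviews_for_keyword(reviews, index, "airport"):
    print(review["ReviewID"], review["NegativeReview"])
```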

So how do both services compare?

Let's take a look at the output!

First, let's inspect the keywords extracted by Azure Cognitive Services:

As you can see from the horizontal bar charts on the right, the most popular keywords were "hotel", "room", "everything" and "airport", followed by "price" and "noise".
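The ranking behind such a bar chart is a plain frequency count over the keyword table. A minimal sketch, using made-up rows rather than the actual results:

```python
from collections import Counter


def top_keywords(keyword_rows, n=5):
    """Return the n most frequent keywords as (keyword, count) pairs."""
    counts = Counter(row["Keyword"].strip().lower() for row in keyword_rows)
    return counts.most_common(n)


# Made-up keyword table; real counts come from the extracted keyword files.
keyword_table = [
    {"ReviewID": 1, "Keyword": "hotel"},
    {"ReviewID": 2, "Keyword": "Hotel"},
    {"ReviewID": 3, "Keyword": "hotel"},
    {"ReviewID": 3, "Keyword": "room"},
    {"ReviewID": 4, "Keyword": "room"},
    {"ReviewID": 5, "Keyword": "airport"},
]
for keyword, count in top_keywords(keyword_table, n=3):
    print(keyword, count)
```

Lowercasing before counting keeps "Hotel" and "hotel" from showing up as separate bars.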

Without diving deeper into the analysis, it seems that customers complained about the room and also the noise, probably (?) caused by the nearby airport.

Thanks to our relational data model, we can now select one of these keywords, e.g. "airport", and see the corresponding reviews in the table at the bottom.

And - surprise - it's actually not (only) the airport noise that people complain about, but also that the airport isn't as close to the hotel as advertised. Here you can see how important it is to look at reviews / keywords in context, and that's exactly what we can do with this dashboard.

On to OpenAI!

This is essentially the same dashboard as before, only this time we're using the OpenAI keywords table. The top 3 keywords are "nothing", "no keywords", and "airport". Looking at the actual reviews, we see that the large language model did a good job of identifying the reviews that didn't actually contain any complaints, and it spotted the topics "airport", "noise" and "room" as the main criticisms from customers.

Just like in the example above, the dashboard allows us to cross-filter these reviews. For example, if we select reviews received in the low season, we see that customers complained mostly about the cold rooms, the lack of restaurants, and the smell in the bathrooms, while in the high season, on average, there wasn't much to complain about.
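The same seasonal comparison can be sketched in code by joining the keyword table to the booking date of each review. Which months count as low season is purely an assumption here, as are the column names and the sample rows.

```python
from collections import Counter
from datetime import date

# Assumption: November to February is low season; adjust for your hotel.
LOW_SEASON_MONTHS = {11, 12, 1, 2}


def season_of(booking_date):
    """Classify a date as 'low' or 'high' season."""
    return "low" if booking_date.month in LOW_SEASON_MONTHS else "high"


def keyword_counts_by_season(reviews, keyword_rows):
    """Count keywords separately for low-season and high-season reviews."""
    season_by_id = {r["ReviewID"]: season_of(r["BookingDate"]) for r in reviews}
    counts = {"low": Counter(), "high": Counter()}
    for row in keyword_rows:
        season = season_by_id.get(row["ReviewID"])
        if season is not None:  # ignore keywords without a matching review
            counts[season][row["Keyword"].lower()] += 1
    return counts


# Made-up sample data.
reviews = [
    {"ReviewID": 1, "BookingDate": date(2016, 1, 10)},
    {"ReviewID": 2, "BookingDate": date(2016, 7, 3)},
]
keyword_table = [
    {"ReviewID": 1, "Keyword": "cold room"},
    {"ReviewID": 1, "Keyword": "smell"},
    {"ReviewID": 2, "Keyword": "Nothing"},
]
counts = keyword_counts_by_season(reviews, keyword_table)
print(counts["low"].most_common(3))
print(counts["high"].most_common(3))
```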

We could continue this exercise by filtering by other criteria such as room or group types, but I'll leave that as an exercise for you! :)

You can recreate the entire demo using the resources linked below.

Conclusion

We saw that both AI services did a pretty good job of identifying key phrases in customer reviews. This helped us better understand what customers were complaining about and attribute those complaints to time, customer groups, or other descriptive criteria, which in turn gave us actionable insights.

Azure's Cognitive AI service managed to produce reliable and deterministic results for keyword extraction. However, it wasn't able to consider more context, such as whether it was really a negative review.

OpenAI's GPT-3, on the other hand, allows us to query more specifically for complaint keywords, but at the same time, results can vary when we rerun the model, and the list of generated keywords can also become quite large.

It's up to you which AI service you want to use - or whether it makes sense to rely on a combination of both.

Personally, I'm leaning towards the OpenAI solution, as it performs about as well as Azure Cognitive Services but is much cheaper!

I hope this use case has given you a good overview and idea of how you can use AI services to gain more insights from what your customers are trying to tell you.

Did you like it? Hit reply and let me know what you found most helpful!

See you next Friday!

Best,

Tobias

Resources

Want to learn more? Here are 3 ways I could help:

  1. Read my book: If you want to further improve your AI/ML skills and apply them to real-world use cases, check out my book AI-Powered Business Intelligence (O'Reilly).

  2. Book a meeting: If you want to pick my brain, book a coffee chat with me so we can discuss more details.

  3. Follow me: I'm regularly sharing free content on LinkedIn and Twitter.
