- The Augmented Advantage
- Posts
- 10x Faster Data Exploration (Part 1)
10x Faster Data Exploration (Part 1)
Learn how to use AI to automatically identify key influencers in your data.
Read time: 5 minutes
Hey there,
Scanning large data sets for relevant patterns is usually a very time-consuming and tedious process.
Today, you'll learn a technique that helped my 10x my data exploration process with the help of AI.
For this use case, we'll leverage some of the basic AI/ML capabilities that are capable of identifying complex patterns in data at scale, and go well beyond manual data combing.
Let's start!
If you like this content, check out my book AI-Powered Business Intelligence for a more detailed walkthrough of today's use case and many more!
Problem
Suppose we're analysts at a fictitious manufacturing company. Sales management has contacted us because they've noticed that sales seem to be slowly recovering after a period of sharp declines. Above all, they want to understand:
Why did sales plummet so dramatically between 2006 and 2010? (Phase A)
What factors explain the slow recovery between 2010 and 2014? (Phase B)
And yes, this is indeed a manufacturer with long sales cycles, and no, we don't know why they didn't ask this question earlier.
Since we also have tons of other work to do, we want to get to initial insights as quickly as possible.
Solution Overview
Our goal is to automatically find patterns in our dataset by scanning all available information with as little human assistance as possible. To this end, we'll use AI/ML-powered techniques to deliver these insights quickly and interactively.
Take a look at the high-level use case architecture:
For this use case, we'll again rely on the built-in features of Power BI. Therefore, we don't need to call an external AI service in the analysis layer, but can solve everything locally, in the user front-end (user layer). Two tools from Power BI would help us with this (Key Influencer and Decomposition Tree). Today we'll rely only on the Key Influencer tool.
Note: Although we are looking at Power BI, other BI platforms provide similiar capabilities. For Tableau, check Einstein Discovery.
The Key Influencer visual helps understand the aspects that influence a particular metric by showing the top contributors to the selected metric. We'll use this tool to find out which factors drove the (rising and falling) revenue trends
Here's how it works in Power BI:
Walkthrough
If you want to follow along with this example, then download the Sales & Marketing sample PBIX.pbix file linked in the resources below.
About the dataset
The dataset contains various sales and marketing data from a fictitious manufacturing company. For our case study, let's assume that all manufacturers in this dataset belong to one holding group so that we can analyze the total revenue as a metric of interest. You can learn more about the dataset from this Power BI website.
Open the Sales & Marketing sample PBIX.pbix file in Power BI Desktop or Power BI Service. (If you're new to Power BI, you might find this resource helpful: Tour the report editor in Power BI.)
Analyzing the negative revenue trend (Phase A)
Let's begin by examining Phase A (decreasing revenue trend). First, create a chart showing the total revenue by year between 2006 and 2010. If you like, try to use the Q&A visual, typing:
> sum of revenue by year between 2006 and 2010.
You should get a chart similar to this one:
Now select this chart and open the Visualizations pane on the right. Click the "Key influencers" icon to convert the line chart into the key influencers visual :
Once you select the new visual type, the chart will automatically update to something that should look similar to this:
Now, this output is probably not what you expected at all and definitely looks a bit weird.
What went wrong?
Open the Visualizations pane and inspect the properties of the key influencers visual. This should look like this:
In the visual properties you'll find two crucial fields: "Analyze" and "Explain by"
As the names suggest, "Analyze" refers to the metric you want to examine, and "Explain by" refers to the different dimensions you could consider to have an impact on the metric.
In our previous line chart, Power BI suggests explaining the dimension "Year" by the field "Sum of Revenue", which makes absolutely no sense. (See, you can't leave everything to the machines.)
To change that, drag "Sum of Revenue" from the "Explain by" field to the "Analyze" field instead of the "Year" field.
From a business perspective, useful candidate fields for "Explain by" include Segment, City, State, Region, Average of Score, Category, and Manufacturer. Add these dimensions to the "Explain by" field by dragging them from the data fields repository on the right until it looks like this:
Note: We left out the Product dimension because too many data points are missing. For example, some products have been replaced by others, some products have just been introduced, and some products were deprecated. If we want to analyze key influencers down to the product level, we should look at a single year rather than a four-year period.
Let's inspect the key influencer chart again. It should look something like this now (make sure you toggle the dropdown menu at the top to "Decrease"):
So this visual intuitively makes more sense than the previous version. Note that this would be on the report page you could share or publish for business users to work with the data themselves.
Let's take a look at this visual to see how it works in detail.
On the left side of the visual, we see which variables Power BI has identified as key influencers of the given metric (in this case, Revenue ).
In our case, we can see that Power BI has identified the attribute "Youth" across the dimensions "Category" and "Segment" as the most important influencing factors leading to a decrease in the Revenue metric.
On the right side, we can see the underlying data distribution for the selected influencer, which shows the average of the Revenue metric across all attributes of the selected dimension (in this example, the Category dimension).
Depending on whether the variable is categorical or continuous, this chart will adapt. So instead of averages we could also see statements like "the outcome z is x times more likely when y happens".
Note: The Key Influencer tool doesn't necessarily consider the lowest value of a dimension as an influencer.
For example, let's take the influencer "Manufacturer is Victoria" in our analysis. If you click on this influencer, you'll see this chart:
As you can see, Victoria is not the manufacturer with the smallest revenue on average, but Salvus.
Now why hasn’t Salvus been identified as a key influencer?
To find out, let's do a quick head-to-head comparison of Salvus and Victoria. Here's how their revenues look like in the period we observed:
As you can see - albeit having the lowest revenue on average - the actual trend of Salvus' revenues isn't negative, but positive. While Victoria's revenues decreased from $6.2 million in 2006 to $4.3 million in 2010, Salvus actually increased its revenues between 2009 and 2010.
That's why Victoria was correctly identified as a key influencer for the declining overall revenue trend, whereas Salvus was not.
AI-powered techniques like the key influencer tool make it easier for us to find such interesting patterns without combing through all of the data manually.
With most of the work done for this exercise, we can now pass the report to our sales colleague to manually review the suggestions and circle back with them if needed.
Now it's your turn!
Take a look at the suggested key Influencers yourself. Can you can spot any other interesting patterns?
Next week, we'll look at how we can analyze Phase B - the increasing revenue trend - with some other little AI assistance.
See you again next Friday!
PS: Did you like today's use case? Hit reply and send "useful" so I can craft more content like this.
Resources
This use case was adapted from my book AI-Powered Business Intelligence (O’Reilly). You can read it in full detail here: https://www.aipoweredbi.com
Reply