Auto-Detect Anomalies In Time Series Data Using ML

Automatically identify data points that are much higher or lower than expected

Read time: 8 minutes

Hey there,

This week, I'm back with another use case for AI in analytics.

Today you'll learn how to plug an AI anomaly detection service into your BI dashboards so you can automatically identify data points that are higher or lower than expected.

Here's what our final dashboard will look like, and we'll use Power BI and Azure Anomaly Detector to build it:

Let's go!

Problem Statement

Let's say we work for an airline and our job is to monitor the taxi-out times of our planes.

Taxi-out time is the time between the plane leaving the gates and actually taking off.

It's the responsibility of airport operators to keep them low. High taxi-out times lead to flight delays.

Natural fluctuation of the taxi-out time at airports is normal, but too many peaks can lead to recurring delays. If they are too high, we would need to have a word with the airport.

The main problem is: what counts as an abnormal data point?

The dashboard below shows the average daily taxi-out time for a single month for seven high-traffic airports.

  • The dashed blue line is the average taxi-out time for each airport

  • The solid red line is the 90th percentile of taxi-out times for each airport

Currently, all daily taxi-out averages that exceed the 90th-percentile threshold are flagged as anomalies that need further investigation.
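For reference, this static rule takes only a few lines to express. Here's a small sketch with made-up taxi-out values for one airport:

```python
import numpy as np

# Average daily taxi-out times in minutes for one airport (illustrative values)
daily_taxi_out = np.array([14.2, 15.1, 13.8, 16.0, 22.5, 14.9, 15.3,
                           14.7, 25.1, 15.0, 14.4, 15.8, 14.1, 15.6])

# Static rule: flag every day that exceeds the 90th percentile of the month
threshold = np.percentile(daily_taxi_out, 90)
flagged = daily_taxi_out > threshold

print(f"90th-percentile threshold: {threshold:.2f} min")
print(f"Days flagged as anomalies: {int(flagged.sum())}")
```

Note how a single extreme day pulls the threshold up for the whole month, which is exactly the weakness discussed below.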

However, this approach does have limits:

  • The 90th-percentile mark is static and does not account for any trends in the data

  • It is sensitive to extreme values, meaning one day of high average taxi-out values can raise the bar to unnecessary heights.

  • It is tedious and time-consuming to look at these charts manually and flag problematic events on a case-by-case basis.

Therefore, the team is looking for an improved approach to identifying peak values in taxi-out times for these airports.

Here's how we tackle this:

Solution Overview

Our goal is to make the overall prediction of abnormal data points more dynamic and customized for each airport.

To achieve that, we'll use the following high-level use case architecture:

Anomaly detection use case architecture

The analysis layer is simple: We use Azure Anomaly Detector as an off-the-shelf AI model that's been pre-trained for us so we can use it directly.

Here's how Azure's Anomaly Detector works:

  1. Analyze a series of events (values) over a period of time (timestamp)

  2. Calculate dynamic upper and lower bounds for the expected value

  3. Flag all values that exceed these bounds as positive (above the upper bound) or negative (below the lower bound) anomalies.
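To make that interface concrete, here's roughly what a request body for the batch ("entire series") detection endpoint looks like. The field names follow the v1 Anomaly Detector REST API, but treat this as a sketch rather than the authoritative schema; the timestamps and values are made up:

```python
import json

# Daily taxi-out averages for one airport (illustrative values)
observations = [
    ("2021-03-01T00:00:00Z", 14.2),
    ("2021-03-02T00:00:00Z", 15.1),
    ("2021-03-03T00:00:00Z", 22.5),
]

# Body for POST {endpoint}/anomalydetector/v1.0/timeseries/entire/detect
payload = {
    "series": [{"timestamp": ts, "value": v} for ts, v in observations],
    "granularity": "daily",
    # Optional tuning knob (0-99); higher sensitivity tends to flag more anomalies
    "sensitivity": 85,
}

print(json.dumps(payload, indent=2))
```

The response mirrors the input: one entry per data point, including the expected value, upper and lower margins, and the anomaly flags we'll use later. Keep in mind the service requires a minimum number of points per series (a real request needs more than the three shown here).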

We can access this service from the user layer via Power BI by running a small Python or R script that prepares our historical flight data from the data layer in the form the AI service expects and calls an API endpoint on Azure.

This way, we can integrate the enhanced version of the taxi-out anomaly detection report into our dashboard.

Note: Power BI and Azure ML can also be integrated natively, without R or Python programming. However, this requires a Pro or Premium Power BI license. Also, the R/Python script gives you more flexibility if you want to connect another AI/AutoML service that's not hosted on Azure.

Setting up Anomaly Detection on Azure

Setting up an Anomaly Detection resource on Azure is simple: Visit portal.azure.com, search for Cognitive Services, and in the Decision section you will find the card “Anomaly detector.” Hit "Create":

After specifying your Azure subscription, resource group and geographic region of the service, you can give it a descriptive name which must be globally unique; think of it as a subdomain for the final endpoint.

Next is pricing. The free tier gives you 10 calls per second and 20,000 transactions per month. Otherwise, costs are $0.314 per 1,000 transactions.

When the service deployment is complete, you can see the keys and endpoints you need to access the service (and no, it's typically not a good idea to share them in a newsletter):

Now we’re all set to start making prediction requests from our newly set-up AI service.

If you want to follow along, check out the resource links below!

Getting Model Predictions with Python or R

In the resources below, you will find the files azure-anomaly-detection-flights.r and azure-anomaly-detection-flights.py. You can use either to follow along.

The script consists of five sections:

Section 0: Loads the required packages and lets you specify your custom keys and endpoints.

Section 1: Contains the function for making the inference request, essentially turning tabular data into a JSON object.

Section 2: Handles the data preparation for bringing data from Power BI into the right format.

Section 3: Submits the inference request for each airport and fetches the results.

Section 4: Joins the new information back to the original Power BI data and returns the updated table.
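Condensed into its core, the Python variant of such a script might look like the sketch below. This is not the actual file from the resources; the endpoint path, key header, and response field names follow the v1 API, and the column names (Date, AvgTaxiOut) are illustrative:

```python
import pandas as pd
import requests

# Section 0: your service credentials (placeholders -- never hard-code real keys)
ENDPOINT = "https://<your-resource-name>.cognitiveservices.azure.com"
KEY = "<your-key>"

def build_payload(df: pd.DataFrame) -> dict:
    """Section 1: turn the tabular data into the JSON body the API expects."""
    return {
        "series": [{"timestamp": ts, "value": v}
                   for ts, v in zip(df["Date"], df["AvgTaxiOut"])],
        "granularity": "daily",
    }

def join_results(df: pd.DataFrame, result: dict) -> pd.DataFrame:
    """Section 4: the response holds one entry per input point, in order."""
    return df.assign(
        upperMargin=result["upperMargins"],
        lowerMargin=result["lowerMargins"],
        isPositiveAnomaly=result["isPositiveAnomaly"],
        isNegativeAnomaly=result["isNegativeAnomaly"],
    )

def detect_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Section 3: submit one airport's series and merge the results back."""
    resp = requests.post(
        f"{ENDPOINT}/anomalydetector/v1.0/timeseries/entire/detect",
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json=build_payload(df),
        timeout=30,
    )
    resp.raise_for_status()
    return join_results(df, resp.json())

# In Power BI, `dataset` is the table handed to the script -- one call per airport:
# dataset = pd.concat(detect_anomalies(g) for _, g in dataset.groupby("Airport"))
```

Grouping by airport before calling the service is what makes the bounds dynamic and customized per airport, rather than one global threshold.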

Calling AI Service From Within Power BI

In our Power BI dashboard file Anomaly_Detection.pbix, we can open Power Query and paste the R/Python code as an additional data-processing step.

Be careful, though: we can't simply pass Date/Time columns from Power Query to R/Python because they come in a proprietary Datetime format. That's why I converted the Datetime values to strings before adding the R/Python script in the example above.
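If you'd rather do the conversion inside the script instead of in Power Query, the pandas equivalent is short. The column names here are illustrative stand-ins for the `dataset` table Power BI hands to the script:

```python
import pandas as pd

# Illustrative stand-in for the `dataset` table passed in by Power BI
dataset = pd.DataFrame({
    "Airport": ["DFW", "DFW"],
    "Date": pd.to_datetime(["2021-03-01", "2021-03-02"]),
    "AvgTaxiOut": [14.2, 22.5],
})

# Convert the datetime column to ISO-formatted strings before any API call
dataset["Date"] = dataset["Date"].dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(dataset["Date"].tolist())
```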

Once you run the steps including the script, you will see the additional columns in your table: upperMargin, lowerMargin, isPositiveAnomaly, and isNegativeAnomaly.

We're particularly interested in the isPositiveAnomaly variable because it indicates when an airport has longer taxi-out times than usual.

The heavy lifting is done.

Now, we can proceed to the dashboard and make our predictions visible for report users.

Building the Dashboard in Power BI

With the new columns, we can simply update our existing dashboard and change the red line to the dynamic upperMargin instead of the static 90th-percentile threshold.

Beyond that, we can assign a marker to all data points that have an isPositiveAnomaly flag of 1.

This way, the red line indicates the decision threshold for flagging an anomaly, and red squares highlight taxi-out anomalies that were identified accordingly.

Here's the resulting dashboard for two airports (DFW and LAX) as examples:

You can clearly see that two days of unusually high taxi-out times were flagged in the case of DFW and a whopping eight days in the case of LAX.

Obviously, this was a pretty busy month - even by Los Angeles standards.

With the isPositiveAnomaly attribute in our dataset, we can now easily generate further analyses or automated reports for this metric.

To find out more about how the anomaly detection works in detail, how to run inference in real time, and general best practices around this API, check out the “Best Practices for Using the Anomaly Detector” resource by Microsoft.

Conclusion

With our AI-powered approach, we helped the operations team detect anomalies faster without adhering to a fixed set of rules.

Of course, we need to monitor the anomaly detection service carefully, which is why I always like to visualize the decision threshold so people can check whether it's appropriate.

I hope you were able to see how automated anomaly detection can add value to your dashboards.

Feel free to replicate this exercise using the resources below.

I hope you enjoyed this use case!

See you again next Friday!

Resources


This content was adapted from my book AI-Powered Business Intelligence (O’Reilly). You can read it in full detail here: https://www.aipoweredbi.com