Use Case Breakdown: Portfolio Optimization Using AutoML and Linear Programming

How predictive and prescriptive analytics fit together, using credit risk optimization as an example

Hey there,

Today I'll walk you through a powerful machine learning use case for credit risk portfolio optimization.

The approach blends predictive analytics (to assess credit default risk) with prescriptive analytics (to selectively approve profitable loans within given budget and risk constraints).

What may seem like a super complex task can be surprisingly simplified with structured data preparation, no-code AutoML, and solid Excel skills.

Let's see how it works!

Problem Statement

Credit risk management is far from simple. It entails more than just sizing up the risk tied to individual borrowers.

Some key challenges include:

  • Unraveling risk indicators

  • Analyzing large volumes of data

  • Navigating regulatory requirements

  • Striking a balance between risk and return

Let's say we're risk analysts at a bank.

Our job is to approve or reject loan applications from small business owners.

We have a pool of applicants, each with a unique profile, credit history, and loan amount.

We can't just choose the loans with the lowest risk because we work under certain constraints:

  • We have to consider the overall risk profile of our loan portfolio.

  • We have to make sure we don't exceed our lending budget.

  • We need to keep the overall risk of the portfolio below a certain threshold to meet regulatory standards and the bank's risk appetite.

(If you thought machine learning was all about predicting the right number, welcome to the world of business logic!)

The question is, how do we determine the optimal mix of loans to approve to maximize return while staying within acceptable risk limits?

Enter the power of AutoML for risk prediction and linear programming for portfolio optimization.

Let's have a look at how this can be done.

Solution Overview

The proposed solution consists of three main layers as shown in the following figure: the User layer, the Data layer, and the Analysis layer.

The User layer is the front-end application we can interact with. In our case, we'll use a simple Excel spreadsheet. It shows the predicted risk for each loan application and allows us to select which loans to approve while staying within the defined constraints. The tool will automatically calculate the total risk and return of the portfolio based on the analyst's selection.

In the Data layer, we collect, prepare, and store all the data we need for our analysis. This includes a historical data set with labels for credit defaults and other attributes about our customers. This data is used to train our AutoML model. And there is an inference dataset consisting of new loan applications. This is the data we actually need to make predictions on.

The Analysis layer is divided into two parts:

  1. A predictive part where we use Azure AutoML to create predictions for the default risk for each loan application in the inference dataset.

  2. A prescriptive part where, given the risk predictions, the loan amounts, and our constraints (budget, maximum loan per client, maximum total risk), we will find the combination of loans that maximizes return while staying within acceptable risk limits. In our example, we'll use the Excel Solver tool to do this.

This layered approach allows us to leverage advanced machine learning and optimization techniques in a user-friendly Excel interface.

A Note on Tool Selection

While this example uses Excel and Azure AutoML for simplicity and relatability, please remember that in a production environment, different considerations would guide the choice of tools.

In a real-world scenario, you'd likely select tools that align with your enterprise architecture and satisfy regulatory governance requirements.

Excel and Azure AutoML are great for illustrating the process, but there's a wide range of powerful platforms and technologies available for these tasks. The fundamental takeaway is the combined power of machine learning and optimization for credit risk management, not the specific tools used here.

Solution Breakdown by Layer

Let's dive deeper into each layer and understand how we can leverage them to make data-driven decisions.

Data Layer

The Data Layer starts with collecting the necessary data.

For our historical training dataset, we gather data from past loan applications, including customer information and whether the loan defaulted (label).

Here’s an example of how this training dataset could look like with “Status” being the label and everything else being the feature variables.

Our inference dataset for new loan applications would look just the same, except that we don't have a label ("status") because we don't yet know whether a loan will default or not.

After gathering the data, we need to clean it up so it's reliable and consistent. This involves removing duplicate entries, handling outliers, etc. It's essential that the data is in a good state before we move on to the next steps.

I can’t stress it enough: Most work actually falls into the data preparation phase!

Analysis Layer

There are two key parts at work here: Risk prediction with AutoML and portfolio optimization with Linear Programming.

The Predictive Part: Risk Assessment with AutoML

We begin by uploading the cleaned historical dataset to Azure AutoML (or any other one you prefer). Start a new AutoML job to predict the labels “default” or “ok” based on all other variables (Classification using Supervised Machine Learning)

Once the model is trained, we use it to predict the default risk for each loan in our inference dataset.

To make the whole process more transparent, we can use explainability features in Azure ML studio to track the individual factors that led to a particular prediction made by the model.

The Prescriptive Part: Portfolio Optimization with Linear Programming

With our risk predictions in hand, we can now optimize our portfolio. For example, we could approximate the associated risk value per loan by multiplying the loan amount by the probability of default that AutoML gave us.

In Excel, we use the Solver function to find the best combination of loans to approve, considering our constraints as for example:

  • Max. risk total: Constant as per our requirement, e.g. 50,000$

  • Max. budget / loans: Constant as per our requirement, e.g. 100,000$

The Solver function uses Linear Programming techniques to maximize or minimize an objective (in our case, expected return) subject to certain constraints (budget, maximum total risk, etc.) as shown in the screenshot below:

User Layer

Finally, we have the User Layer, where our risk analyst interacts with the results. The Excel gets the scored risk data set, for example using Power Query, and gives us a spreadsheet that shows the predicted risk along with the expected return based on the loan amount and interest rate.

The "Approved" column allows the analyst to flag loan applications that should be included ("1") or excluded ("0") from the current portfolio, starting from the solver's recommendations.

This allows the analyst to override the suggestions made by the Solver if necessary and make an informed decision while staying within the defined constraints.

Outcome

The fruit of our hard work is a powerful tool that delivers two major outcomes:

1. Understanding Risk Factors

The AutoML model provides us with insights into the factors that contribute to credit risk. This allows us to identify patterns and trends that can help us make better lending decisions in the future.

2. Optimized Credit Portfolio

With the combination of more accurate risk predictions and Linear Programming, we can optimize our credit portfolio according to our constraints. We can maximize return while keeping risk under control. This leads to more profitable lending decisions and a more resilient portfolio.

The tool also opens up opportunities for further analysis:

  • Identify which loans contribute the most to portfolio risk.

  • Assess the impact of changing constraints on the portfolio.

  • Make data-driven decisions on lending policy.

Overall, this approach provides valuable insights and actionable recommendations to enhance portfolio management and drive profitability. It puts the power of advanced analytics into the hands of the business user, enabling them to make more informed and strategic decisions.

Limitations

While the approach we've outlined is powerful, it's important to remember that it does come with certain limitations:

  1. Data Quality: The effectiveness of our AutoML model relies heavily on the quality of our historical data. If there are inaccuracies or inconsistencies in this data, it may lead to incorrect risk predictions.

  2. Solver Limitations: The Excel Solver has its limitations in terms of the size and complexity of the problem it can handle. For larger portfolios, a more robust optimization tool (like Python!) is necessary.

  3. Static Constraints: In this setup, our constraints (interest rates, budget, maximum loan per client, maximal total risk) are static. However, in reality, these may change over time and need regular updating.

Despite these limitations, the outlined approach still provides a solid starting point for optimizing a credit risk portfolio. It's a flexible model that can be expanded and adjusted to fit specific needs.

Conclusions

By following the architecture and steps outlined above, you can transform a complex, manual process into an automated, data-driven one.

This approach allows you to make more informed decisions about which loans to approve, keeping in mind the overall risk profile of your portfolio and various constraints. It also provides you with a better understanding of the risk factors associated with your portfolio, enabling you to make proactive adjustments as necessary.

I encourage you to test this solution with your own data and experience.

If you want to get the Excel spreadsheet that I created for this use case, just reply to this email and I’ll send it to you.

How did you find today's use case?

I'd love to hear your thoughts!

See you next Friday,

Tobias

Want to learn more? Here are 3 ways I could help:

  1. Read my book: Improve your AI/ML skills and apply them to real-world use cases with AI-Powered Business Intelligence (O'Reilly).

  2. Book a meeting: Let's get to know each other in a coffee chat.

  3. Follow me: I regularly share free content on LinkedIn and Twitter.

Reply

or to participate.