Critical Point

Leveraging Generative AI for Effective Risk Management Part II: Actionable Strategies for Businesses

2023-09-26T00:00:00+10:00

In today’s rapidly evolving business landscape, risk management is more critical than ever before. As industries grow increasingly complex and interconnected, the need for sophisticated strategies to identify, assess, and mitigate risks has become paramount. Enter generative artificial intelligence (AI), a revolutionary technology that has the potential to transform the way businesses design and implement risk controls. In this article, we’ll explore how businesses can harness the power of generative AI to enhance their risk management practices, backed by actionable tips and real-life case studies.

This post is part of a series. Part I can be found here.

Understanding Generative AI in Risk Management

Generative AI involves using algorithms to create new, original content based on patterns and data it has learned from. In the context of risk management, generative AI can be employed to simulate scenarios, model potential risks, and design effective controls. Here’s how businesses can put this innovative technology to work:

1. Identifying Emerging Risks

Generative AI can analyze massive datasets from various sources to identify emerging risks. By recognizing subtle patterns and correlations, businesses can stay ahead of potential threats. For instance, a financial institution could use generative AI to analyze market trends, news sentiment, and economic indicators to predict potential financial crises and to detect policy infringements such as financial crime.

2. Scenario Modeling

Generating realistic risk scenarios is crucial for preparing effective risk controls. Generative AI can simulate a wide range of scenarios, helping businesses understand the possible impacts of different risks. This approach empowers businesses to design controls that are agile and responsive. For example, a financial company could use generative AI to simulate how the market might react to specific types of news and develop strategies to ensure stability.

3. Designing Tailored Controls

Generative AI can assist in designing controls that are tailored to a business’s unique risk profile. By considering multiple variables and data points, AI can suggest controls that are both effective and efficient. An insurance company could employ generative AI to create customized policies for its customers.

Actionable Tips for Businesses

Implementing generative AI for risk management requires a strategic approach. Here are some actionable tips for businesses looking to harness its potential:

1. Data Quality Matters

Generative AI thrives on data. To ensure accurate risk assessments, gather high-quality, relevant data. Clean, comprehensive data sets will enhance the AI’s ability to generate meaningful insights.

2. Collaboration is Key

Engage a cross-functional team to work with the AI system. Risk management involves input from various departments, each with unique insights. Collaborative efforts will lead to more robust risk controls.

3. Human Oversight and Interpretation

While generative AI is powerful, human expertise is irreplaceable. Interpret AI-generated insights through the lens of domain knowledge experts to make informed decisions.

Real-Life Case Studies

Let’s examine how leading businesses have successfully integrated generative AI into their risk management strategies:

Case Study 1: Proactive Supply Chain Management

A global retailer employed generative AI to analyze supply chain data and identify potential disruptions. The AI-generated scenarios allowed the company to optimize inventory levels and establish backup suppliers, minimizing the impact of unforeseen events.

Case Study 2: Healthcare System Enhancement

A healthcare provider used generative AI to predict patient admission surges. By considering factors like weather, disease outbreaks, and historical admission data, the hospital developed staffing strategies to handle influxes efficiently, ensuring optimal patient care.

Conclusion

Generative AI is a game-changer in risk management, enabling businesses to anticipate, plan for, and mitigate a wide array of risks. By harnessing its capabilities, organizations can design tailored controls, model scenarios, and make informed decisions to secure their future. Embrace the power of generative AI, and elevate your risk management practices to new heights.

Remember, while generative AI offers incredible insights, it’s essential to balance its outputs with human judgment. The marriage of cutting-edge technology and human expertise will undoubtedly drive businesses toward more effective risk management in this dynamic world.

Leveraging Generative AI for Effective Risk Management Part I: Introduction to Business Risk

2023-09-01T00:00:00+10:00

In life, there are certainties like death and taxes, but there’s one more constant: risk. The COVID-19 pandemic starkly reminded us of this fact as we grappled with evaluating and reevaluating personal risks with each wave of the pandemic. Businesses face similar challenges, and their ability to manage and mitigate risk plays a crucial role in their success.

This post is part of a series. Part II can be found here.

The Origins of Business Risk

Businesses encounter risk from both external and internal sources. External factors like inflation, supply chain disruptions, geopolitical shifts, climate-related disasters, competition, reputation issues, and cyberattacks can significantly impact an organization’s plans. Internally, poor leadership decisions or unauthorized disclosures of sensitive information can also pose risks. Yet, perhaps the most dangerous risk is missing opportunities for innovation and growth.

The modern era is marked by frequent shocks related to socioeconomic, economic, and climate factors. In 2019 alone, there were 40 weather-related disasters causing over $1 billion in damages each. To stay competitive, organizations must adopt flexible risk management strategies, which involve forecasting new threats, recognizing shifts in existing threats, and forming comprehensive response plans. While there’s no magic formula to navigate crises, a well-structured risk management strategy can shield an organization from critical disruptions.

Understanding Risk Management

Risk management involves identifying, handling, and mitigating threats through various approaches and activities. After recognizing a risk, organizations develop measures to reduce its potential impact. While eliminating risk is ideal, other methods include loss mitigation (like insurance) and redundancy (using backup systems to prevent data loss during outages).

The Three Key Elements of a Comprehensive Risk Management Strategy

A proactive risk management plan comprises three critical components:

1. Detecting Risks and Addressing Vulnerabilities

Organizations must maintain a proactive stance by analyzing how risks might evolve over time, handling systemic risks, and identifying new risks that may emerge.

2. Evaluating Risk Tolerance

Companies should define risk tolerance levels that align with their values, strategies, capabilities, and competitive landscapes. This involves reassessing risk profiles, rejecting some risks unequivocally, and considering the effectiveness of control mechanisms.

3. Choosing a Risk Management Approach

Organizations must decide how to respond when confronted with new risks. This decision-making process should involve leaders from various departments and adapt to changing circumstances.

Developing Adaptable Risk Management

Effective risk management is crucial for survival, especially during severe or abrupt risks. Here are five actions leaders can take:

Reframe the vision for risk management: Set clear goals, define risk levels, and engage in conversations with business leaders to foster well-informed decision-making regarding risk versus reward.
Establish agile risk management procedures: Form cross-functional teams with the authority to make swift risk management decisions.
Leverage data and analytics: Digital tools and data can enhance risk management efforts, providing better insights and predictions.
Cultivate future-ready risk expertise: Equip risk managers with fresh competencies and knowledge to understand evolving risks.
Strengthen risk culture: Foster an organizational mindset that responds swiftly to threats.

The Role of Scenarios in Grasping Uncertainty

Scenario planning helps leaders turn abstract hypotheses into narratives that depict plausible future scenarios. This offers advantages like expanding thinking, identifying likely futures, safeguarding against groupthink, and challenging conventional wisdom.

Insights on Risk in Financial Institutions

According to chief risk officers (CROs), banks face heightened exposure to rapidly evolving market dynamics, climate change, and cybercrime. While the pandemic’s impact on nonfinancial risk is expected to diminish, climate change is anticipated to become a substantial concern. Cybercrime remains a top risk for financial institutions.

Understanding Cyber Risk

Cyber risk encompasses potential digital losses, including financial, reputational, operational, productivity, and regulatory aspects. It can also manifest as physical world losses, such as damage to operational equipment. Cyber threats, like privilege escalation, vulnerability exploitation, and phishing, create the potential for cyber risk.

A Risk-Based Cybersecurity Approach

A risk-based cybersecurity approach prioritizes risk reduction over achieving a specific level of cybersecurity maturity. It focuses on addressing the most critical vulnerabilities effectively. Steps include integrating cybersecurity into enterprise risk management, evaluating vulnerabilities across people, processes, and technology, comprehending threat actors, and monitoring risks against risk appetite.

Prudent Investments in Risk Management

To manage high-consequence, low-likelihood risks (or “big bets”), organizations must prioritize existential threats. A two-by-two risk grid assesses the impact of an event against the certainty of that impact. Investments aimed at safeguarding value propositions can enhance an organization’s resilience.

In a constantly changing world, managing risk effectively is not just about creating plans; it’s about regularly evaluating and updating them to remain relevant and resilient.

The Traveling Salesman Problem

2023-08-29T00:00:00+10:00

The Traveling Salesman Problem (TSP), a captivating conundrum in mathematical optimization, has seamlessly integrated itself into a multitude of industries, reshaping the way we approach efficiency and problem-solving.

The problem consists of finding the shortest route that visits every location exactly once and returns to the starting point.

In the realm of logistics, the TSP takes center stage, offering a beacon of optimization for supply chains and delivery routes. Giants like Amazon leverage TSP-inspired algorithms to orchestrate last-mile deliveries, aligning packages with real-time traffic data and delivery priorities. This dynamic approach shaves miles off routes, minimizes fuel consumption, and ensures prompt deliveries.

Manufacturing, too, bows to the TSP’s prowess. Within bustling factories, where precision and organization reign supreme, the TSP guides the sequencing of production steps. Whether in the assembly of intricate automobiles or the manufacturing of various goods, the TSP minimizes idle time, reduces bottlenecks, and ultimately enhances productivity.

Surprisingly, the TSP extends its influence beyond the realms of logistics and manufacturing. The world of DNA sequencing benefits from its route-optimizing abilities, accelerating genetic research by reducing sequencing time and costs. Additionally, the TSP finds its place in the intricate landscape of circuit design, optimizing signal propagation in integrated circuits and exemplifying the synergy between mathematics and engineering.

Modules

import itertools
import math

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pulp

%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Mon Aug 28 2023

Python implementation: CPython
Python version       : 3.11.1
IPython version      : 8.12.0

matplotlib: 3.7.2
networkx  : 3.1
numpy     : 1.25.2
pulp      : 2.7.0

Watermark: 2.3.1

Data

The data used for this exercise is a dictionary containing the name of the cities and distances between them.

cities = {
'name': []
,'distance': []
}

# Generate a random symmetric distance matrix for n_cities
n_cities = 4
M = np.random.rand(n_cities,n_cities)
cities['distance'] = (M + M.T)/2
for i in range(n_cities):
    cities['name'].append(f'city{i}')
    cities['distance'][i, i] = 0
cities

{'name': ['city0', 'city1', 'city2', 'city3'],
 'distance': array([[0.        , 0.80122558, 0.50666924, 0.13568748],
        [0.80122558, 0.        , 0.64279003, 0.31393952],
        [0.50666924, 0.64279003, 0.        , 0.51592309],
        [0.13568748, 0.31393952, 0.51592309, 0.        ]])}
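The matrix above is a random symmetric matrix, so its entries need not satisfy the triangle inequality. As a minimal sketch of an alternative (not what the rest of this post uses), distances can instead be derived from synthetic 2-D coordinates. The helper name `euclidean_distance_matrix` and the fixed seed are illustrative assumptions.

```python
import math
import random


def euclidean_distance_matrix(n_cities, seed=0):
    """Return (coords, dist) for n_cities random points in the unit square."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    coords = [(rng.random(), rng.random()) for _ in range(n_cities)]
    # dist[i][j] is the straight-line distance between city i and city j
    dist = [[math.dist(p, q) for q in coords] for p in coords]
    return coords, dist


coords, dist = euclidean_distance_matrix(4)
```

Distances built this way are symmetric, have a zero diagonal, and respect the triangle inequality, which makes the resulting tours easier to interpret geometrically.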

We use networkx to plot the graph generated by the distance matrix.

fig, ax = plt.subplots(figsize=(20, 10))
G = nx.DiGraph(cities['distance'])
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, ax=ax);
nx.draw_networkx_labels(G, pos, ax=ax);
nx.draw_networkx_edges(G, pos, ax=ax, width=2);

Formulation

For each $i,j=1,\ldots,n$ with $i\neq j$, let $x_{ij}$ be a binary variable equal to 1 if the tour goes directly from city $i$ to city $j$ and 0 otherwise.

x = pulp.LpVariable.dicts("x", [(i, j) for i in range(len(cities['name'])) for j in range(len(cities['name'])) if i != j], cat='Binary') 

Minimize the total distance traveled, where $d_{ij}$ is the distance between cities $i$ and $j$:

\[\min\sum_{i=1}^n\sum_{j\neq i,j=1}^nd_{ij}x_{ij}\]

# Define the TSP problem
prob = pulp.LpProblem("TSP", pulp.LpMinimize)

# Define the objective function
prob += pulp.lpSum([cities['distance'][i, j] * x[(i, j)] for i in range(len(cities['name'])) for j in range(len(cities['name'])) if i != j])

subject to the constraints below. Each decision variable is binary:

\[x_{ij}\in\{0,1\}\quad i,j=1,\ldots,n,\ i\neq j\]

For each city $j$, the salesman must arrive exactly one time:

\[\sum_{i=1,i\neq j}^n x_{ij}=1\quad j=1,\ldots,n\]

For each city $i$, the salesman must leave exactly one time:

\[\sum_{j=1,j\neq i}^n x_{ij}=1\quad i=1,\ldots,n\]

for i in range(len(cities['name'])):
    prob += pulp.lpSum([x[(i, j)] for j in range(len(cities['name'])) if i != j]) == 1
    prob += pulp.lpSum([x[(j, i)] for j in range(len(cities['name'])) if i != j]) == 1

Subtour elimination constraint: no proper subset $Q$ of cities may contain a closed sub-tour, so the solution returned is a single tour and not a union of smaller tours:

\[\sum_{i\in Q}\sum_{j\in Q,\,j\neq i} x_{ij}\leq|Q|-1\quad \forall Q\subsetneq\{1,\ldots,n\},\ |Q|\geq2\]

for k in range(len(cities['name'])):
    for S in range(2, len(cities['name'])):
        for subset in itertools.combinations([i for i in range(len(cities['name'])) if i != k], S):
            prob += pulp.lpSum([x[(i, j)] for i in subset for j in subset if i != j]) <= len(subset) - 1
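The loop above iterates over an excluded city `k`, so the same subset can be generated more than once. For four cities it emits 16 subtour constraints, of which 10 are distinct. The duplicates are redundant but harmless to the solver. The sketch below enumerates each proper subset exactly once and compares the two counts. Both `proper_subsets` and `looped_subsets` are hypothetical helpers written for this illustration; they are not part of PuLP.

```python
import itertools


def proper_subsets(n):
    """Yield every subset Q of range(n) with 2 <= |Q| <= n - 1 exactly once."""
    for size in range(2, n):
        yield from itertools.combinations(range(n), size)


def looped_subsets(n):
    """Reproduce the subsets generated by the loop over the excluded city k."""
    for k in range(n):
        others = [i for i in range(n) if i != k]
        for size in range(2, n):
            yield from itertools.combinations(others, size)


n = 4
unique = list(proper_subsets(n))
looped = list(looped_subsets(n))
print(len(unique), len(looped), set(unique) == set(looped))  # prints: 10 16 True
```

Either way, the number of distinct subsets is $2^n-n-2$, which grows exponentially with the number of cities, so this formulation is only practical for small instances.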

Solver

Here, we use the CBC solver bundled with PuLP to obtain a solution to the formulated problem.

# Solve the problem using the CBC solver
prob.solve(pulp.PULP_CBC_CMD())

# Print the status of the solution
print("Status:", pulp.LpStatus[prob.status])

# Print the optimal objective value
print("Total distance traveled:", pulp.value(prob.objective))

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /home/docker.datascience/.pyenv/versions/3.11.1/lib/python3.11/site-packages/pulp/solverdir/cbc/linux/64/cbc /tmp/111feaee2b8746bebc1aed7c856d7248-pulp.mps timeMode elapsed branch printingOptions all solution /tmp/111feaee2b8746bebc1aed7c856d7248-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 29 COLUMNS
At line 138 RHS
At line 163 BOUNDS
At line 176 ENDATA
Problem MODEL has 24 rows, 12 columns and 72 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 1.59909 - 0.02 seconds
Cgl0004I processed model has 18 rows, 12 columns (12 integer (12 of which binary)) and 60 elements
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of 1.59909
Cbc0038I Before mini branch and bound, 12 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.06 seconds)
Cbc0038I After 0.06 seconds - Feasibility pump exiting with objective of 1.59909 - took 0.00 seconds
Cbc0012I Integer solution of 1.5990863 found by feasibility pump after 0 iterations and 0 nodes (0.06 seconds)
Cbc0001I Search completed - best objective 1.5990862673485, took 0 iterations and 0 nodes (0.06 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from 1.59909 to 1.59909
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                1.59908627
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.07
Time (Wallclock seconds):       0.07

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.08   (Wallclock seconds):       0.08

Status: Optimal
Total distance traveled: 1.5990862673485606
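With only four cities, the solver result can be cross-checked by brute force. The sketch below copies the distance matrix from the output shown earlier and tries every ordering that starts at city 0. The names `tour_length` and `brute_force_tsp` are illustrative, and this approach is only feasible for a handful of cities because the number of orderings grows factorially.

```python
import itertools

# Distance matrix copied from the output shown earlier in this post
DIST = [
    [0.0, 0.80122558, 0.50666924, 0.13568748],
    [0.80122558, 0.0, 0.64279003, 0.31393952],
    [0.50666924, 0.64279003, 0.0, 0.51592309],
    [0.13568748, 0.31393952, 0.51592309, 0.0],
]


def tour_length(tour, dist):
    """Length of the closed tour, including the leg back to the first city."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def brute_force_tsp(dist):
    """Try every ordering that starts at city 0 and return (length, tour)."""
    n = len(dist)
    candidates = ((0,) + perm for perm in itertools.permutations(range(1, n)))
    return min((tour_length(t, dist), t) for t in candidates)


best_length, best_tour = brute_force_tsp(DIST)
print(round(best_length, 8), best_tour)
```

The best length agrees with the objective value reported by CBC up to the rounding of the printed matrix entries. Because the matrix is symmetric, the tour and its reverse have the same length.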

Solution Analysis

Let’s understand what the solution is.

The length of the minimal tour is the shortest total distance needed to visit every node exactly once and return to the start.

print("minimal tour: ", prob.objective.value())

minimal tour:  1.5990862673485606

The following code extracts the route of the optimal solution.

# Extract the solution 
solution = []
start_city = 0
next_city = start_city
while True:
    for j in range(len(cities['name'])):
        # Compare against 0.5 rather than 1 to tolerate solver round-off
        if j != next_city and x[(next_city, j)].value() > 0.5:
            solution.append((next_city, j))
            next_city = j
            break
    if next_city == start_city:
        break

# Print the solution
print("Route:")
for i in range(len(solution)):
    print(str(solution[i][0]) + " -> " + str(solution[i][1]))

Route:
0 -> 2
2 -> 1
1 -> 3
3 -> 0
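A quick way to see what the subtour elimination constraints buy us is to check whether a set of arcs forms one closed tour. The sketch below uses a hypothetical helper, `is_single_tour`, and compares the route 0 -> 2 -> 1 -> 3 -> 0 (read off the solution) with a pair of two-city loops. The loops satisfy the arrive-once and leave-once constraints but do not form a valid tour.

```python
def is_single_tour(arcs, n_cities):
    """Return True if the arcs form one cycle that visits every city once."""
    successor = dict(arcs)  # maps each city to the next city on the route
    if len(arcs) != n_cities or len(successor) != n_cities:
        return False
    start = arcs[0][0]
    visited, city = set(), start
    while city not in visited:
        visited.add(city)
        if city not in successor:
            return False
        city = successor[city]
    # A valid tour covers every city and closes back at the starting city
    return len(visited) == n_cities and city == start


route = [(0, 2), (2, 1), (1, 3), (3, 0)]      # tour read off the solution
two_loops = [(0, 1), (1, 0), (2, 3), (3, 2)]  # degree constraints hold, but two subtours
print(is_single_tour(route, 4), is_single_tour(two_loops, 4))  # prints: True False
```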

Plot Solution

Here, we can plot the solution as a graph. The solution matrix stores the distance from city $i$ (row) to city $j$ (column) for each leg of the optimal tour, and zero elsewhere.

sol_matrix = np.zeros((len(solution), len(solution)))
for i in range(len(solution)):
    sol_matrix[solution[i]] = cities['distance'][solution[i]]
sol_matrix

array([[0.        , 0.        , 0.50666924, 0.        ],
       [0.        , 0.        , 0.        , 0.31393952],
       [0.        , 0.64279003, 0.        , 0.        ],
       [0.13568748, 0.        , 0.        , 0.        ]])

The following class allows us to draw this graph using the distances as weights on the edges.

class Draw_Graph():
    def __init__(self, G, pos=None, arc: float = 0.25, edge_attribute: str='weight'):
        self.G = G
        self.arc = arc
        self.edge_attribute = edge_attribute
        self.pos = pos
    
    def draw_edges(self):
        self.curved_edges = [edge for edge in self.G.edges() if reversed(edge) in self.G.edges()]
        self.straight_edges = list(set(self.G.edges()) - set(self.curved_edges))
        nx.draw_networkx_edges(self.G, self.pos, ax=ax, edgelist=self.straight_edges, width=2)
        nx.draw_networkx_edges(self.G, self.pos, ax=ax, edgelist=self.curved_edges, connectionstyle=f'arc3, rad = {self.arc}', width=2)
        
    
    def draw_labels(self):
        self.edge_weights = nx.get_edge_attributes(self.G, self.edge_attribute)
        self.curved_edge_labels = {edge: self.edge_weights[edge] for edge in self.curved_edges}
        self.straight_edge_labels = {edge: self.edge_weights[edge] for edge in self.straight_edges}
        self.draw_networkx_edge_labels(self.G, self.pos, ax=ax, edge_labels=self.curved_edge_labels, rotate=False,rad = self.arc)
        nx.draw_networkx_edge_labels(self.G, self.pos, ax=ax, edge_labels=self.straight_edge_labels, rotate=False)
        
    @staticmethod
    def draw_networkx_edge_labels(
        G,
        pos,
        edge_labels=None,
        label_pos=0.5,
        font_size=10,
        font_color="k",
        font_family="sans-serif",
        font_weight="normal",
        alpha=None,
        bbox=None,
        horizontalalignment="center",
        verticalalignment="center",
        ax=None,
        rotate=True,
        clip_on=True,
        rad=0
        ):
        """Draw edge labels.

        Parameters
        ----------
        G : graph
            A networkx graph

        pos : dictionary
            A dictionary with nodes as keys and positions as values.
            Positions should be sequences of length 2.

        edge_labels : dictionary (default={})
            Edge labels in a dictionary of labels keyed by edge two-tuple.
            Only labels for the keys in the dictionary are drawn.

        label_pos : float (default=0.5)
            Position of edge label along edge (0=head, 0.5=center, 1=tail)

        font_size : int (default=10)
            Font size for text labels

        font_color : string (default='k' black)
            Font color string

        font_weight : string (default='normal')
            Font weight

        font_family : string (default='sans-serif')
            Font family

        alpha : float or None (default=None)
            The text transparency

        bbox : Matplotlib bbox, optional
            Specify text box properties (e.g. shape, color etc.) for edge labels.
            Default is {boxstyle='round', ec=(1.0, 1.0, 1.0), fc=(1.0, 1.0, 1.0)}.

        horizontalalignment : string (default='center')
            Horizontal alignment {'center', 'right', 'left'}

        verticalalignment : string (default='center')
            Vertical alignment {'center', 'top', 'bottom', 'baseline', 'center_baseline'}

        ax : Matplotlib Axes object, optional
            Draw the graph in the specified Matplotlib axes.

        rotate : bool (default=True)
            Rotate edge labels to lie parallel to edges

        clip_on : bool (default=True)
            Turn on clipping of edge labels at axis boundaries

        Returns
        -------
        dict
            `dict` of labels keyed by edge

        Examples
        --------
        >>> G = nx.dodecahedral_graph()
        >>> edge_labels = nx.draw_networkx_edge_labels(G, pos=nx.spring_layout(G))

        Also see the NetworkX drawing examples at
        https://networkx.org/documentation/latest/auto_examples/index.html

        See Also
        --------
        draw
        draw_networkx
        draw_networkx_nodes
        draw_networkx_edges
        draw_networkx_labels
        """
        if ax is None:
            ax = plt.gca()
        if edge_labels is None:
            labels = {(u, v): d for u, v, d in G.edges(data=True)}
        else:
            labels = edge_labels
        text_items = {}
        for (n1, n2), label in labels.items():
            (x1, y1) = pos[n1]
            (x2, y2) = pos[n2]
            (x, y) = (
                x1 * label_pos + x2 * (1.0 - label_pos),
                y1 * label_pos + y2 * (1.0 - label_pos),
            )
            pos_1 = ax.transData.transform(np.array(pos[n1]))
            pos_2 = ax.transData.transform(np.array(pos[n2]))
            linear_mid = 0.5*pos_1 + 0.5*pos_2
            d_pos = pos_2 - pos_1
            rotation_matrix = np.array([(0,1), (-1,0)])
            ctrl_1 = linear_mid + rad*rotation_matrix@d_pos
            ctrl_mid_1 = 0.5*pos_1 + 0.5*ctrl_1
            ctrl_mid_2 = 0.5*pos_2 + 0.5*ctrl_1
            bezier_mid = 0.5*ctrl_mid_1 + 0.5*ctrl_mid_2
            (x, y) = ax.transData.inverted().transform(bezier_mid)

            if rotate:
                # in degrees
                angle = np.arctan2(y2 - y1, x2 - x1) / (2.0 * np.pi) * 360
                # make label orientation "right-side-up"
                if angle > 90:
                    angle -= 180
                if angle < -90:
                    angle += 180
                # transform data coordinate angle to screen coordinate angle
                xy = np.array((x, y))
                trans_angle = ax.transData.transform_angles(
                    np.array((angle,)), xy.reshape((1, 2))
                )[0]
            else:
                trans_angle = 0.0
            # use default box of white with white border
            if bbox is None:
                bbox = dict(boxstyle="round", ec=(1.0, 1.0, 1.0), fc=(1.0, 1.0, 1.0))
            if not isinstance(label, str):
                label = f"{label:0.2f}"  # this makes "1" and 1 labeled the same

            t = ax.text(
                x,
                y,
                label,
                size=font_size,
                color=font_color,
                family=font_family,
                weight=font_weight,
                alpha=alpha,
                horizontalalignment=horizontalalignment,
                verticalalignment=verticalalignment,
                rotation=trans_angle,
                transform=ax.transData,
                bbox=bbox,
                zorder=1,
                clip_on=clip_on,
            )
            text_items[(n1, n2)] = t

        ax.tick_params(
            axis="both",
            which="both",
            bottom=False,
            left=False,
            labelbottom=False,
            labelleft=False,
        )

        return text_items
    
    def plot(self):
        self.__call__()
    
    def __call__(self):
        self.draw_edges()
        self.draw_labels()

The graph is shown below. Note that Draw_Graph draws on the global ax, so the figure must be created before plot() is called.

G = nx.DiGraph(sol_matrix)
pos = nx.spring_layout(G)
fig, ax = plt.subplots(figsize=(20, 10))
p = Draw_Graph(G, pos)
p.plot()
nx.draw_networkx_nodes(G, pos, ax=ax);
nx.draw_networkx_labels(G, pos, ax=ax);


Are Model Performance Metrics Enough?

2023-07-05T00:00:00+10:00

My model has a “good enough” performance, is this sufficient for deployment? This post highlights the limitations of relying solely on performance metrics when assessing the readiness of a machine learning model for deployment. It emphasizes the importance of considering both correctness and performance as separate components in the evaluation process. By visualizing the relationship between correctness and performance in different regions, the blog post illustrates the need for critical evaluation and avoiding overconfidence in performance metrics. Furthermore, it emphasizes the impact of business assumptions on correctness and stresses the significance of scientifically-based decision-making.

Introduction

In general, we are inclined to assume that if a model’s performance metric is above a certain threshold, then the model is ready to be moved to the next stage of development. As I show in this blog post, we must be more critical about this criterion, because a performance metric measures only one part of a data science problem.

While working on a project in which the main deliverable is a machine learning model, the data scientist frequently needs to answer the question of whether the model performance is good enough. Implicitly, it is assumed that the metric chosen to measure the model performance also captures all the information about the process that generates the data and its characteristics, such as distribution and dependencies. This assumption can be understood as a causal relationship: if the performance metric is high, then the model is correct.

An illustration for the aforementioned implicit assumption is shown in the figure below

If such a causal relationship were true, then high performance would necessarily imply that the predictions obtained from the model correspond to reality. To see how strong this assumption is, note that a high performance metric is not a sufficient condition for a good prediction; otherwise, we would never face problems such as target leakage and overfitting.
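Target leakage makes the point concrete. The toy sketch below uses entirely synthetic data and a deliberately trivial “model” that reads a leaked column; the names `predict` and `leak` are illustrative assumptions. Offline, where the leaked column was derived from the target, the accuracy is perfect. In production, where that information is not available, it falls to the base rate.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def predict(rows):
    """A 'model' that has learned to read the leaked column directly."""
    return [row["leak"] for row in rows]


# Toy labels: half positive, half negative (purely synthetic)
labels = [i % 2 for i in range(100)]

# Offline data: the 'leak' column was derived from the target itself
offline_rows = [{"leak": y} for y in labels]

# In production the target is unknown, so the leaked column carries no signal
production_rows = [{"leak": 0} for _ in labels]

offline_accuracy = accuracy(predict(offline_rows), labels)
production_accuracy = accuracy(predict(production_rows), labels)
print(offline_accuracy, production_accuracy)  # prints: 1.0 0.5
```

The offline metric alone would clear any reasonable threshold, even though the methodology behind it is wrong.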

Performance x Correctness

Since we know that the model performance metric does not encapsulate all the information necessary to decide whether a model is good enough for deployment, we need to understand what we can and cannot decide based on the performance metric, and what information is missing to make that decision. I propose to decompose the question “Is the model performance metric good enough for deployment?” into two components:

Correctness: How correct is the scientific methodology used to build the model?
Performance: How can I improve the model performance?

We also need to keep in mind that the performance metric does not necessarily increase as the model methodology becomes more correct. For example, IT/coding issues can lead to poor model performance.

Based on the above, we can visualize the relationship between correctness and performance in the Cartesian plane below

The regions can be described as follows:

Start Region. This is where the data science work normally starts. With poor knowledge of the data or the modelling methodology, the data scientist will make incorrect decisions, and the model built tends to perform poorly.
Learning Region. In this region, the data scientist starts to learn about the problem, the data, and the model methodology. Here, one can see a large number of hypotheses being formulated and tested. Many of these will guide the work towards a more correct methodology. The improvements will come with the correct implementation of these methodologies.
Deployment Region. Here, the work has enough quality to be put into operation when evaluated in terms of both correctness and performance metrics.
Fool’s Region. In this quadrant, a lack of care with respect to the correctness of the methodology leads one to think that the model is ready for deployment, because the performance is better than the defined threshold.
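The four regions can be sketched as a simple quadrant lookup. This is only an illustration under strong assumptions: correctness is not something one can read off as a single number, the scores in [0, 1] and the 0.5 threshold are hypothetical, and placing the Learning Region at high correctness and low performance is my reading of the descriptions above.

```python
def region(correctness, performance, threshold=0.5):
    """Name the quadrant for scores in [0, 1]; the threshold is illustrative."""
    correct = correctness >= threshold
    performant = performance >= threshold
    if correct and performant:
        return "Deployment Region"
    if correct:
        return "Learning Region"
    if performant:
        return "Fool's Region"
    return "Start Region"


# A high metric with a weak methodology lands in the Fool's Region
print(region(correctness=0.2, performance=0.9))  # prints: Fool's Region
```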

The Dunning-Kruger Effect

The division into four regions allows us to see similarities with the Dunning-Kruger effect, a cognitive bias that explains the difference between one’s perceived knowledge and one’s actual knowledge of a subject.

In a nutshell, the Dunning-Kruger effect states that our perception of our own knowledge does not grow linearly with the knowledge actually acquired: when we have “very limited knowledge” of a subject, we are inclined to think that we are “highly skilled” in it. For example, the majority of drivers believe that their driving skills are above average [citation needed]. The following figure illustrates the relationship between perceived knowledge and actual knowledge.

In our case, the fool’s region coincides with the peak of the graph, where the “actual knowledge is low”.

The role of business assumptions

It is worth noting that many business assumptions and/or decisions tend to have a negative impact on model correctness, as they tend not to be scientifically based.

One example of a commonly overlooked business assumption is the dependency of the prediction on time. In a marketing campaign setting, a machine learning model is often used to identify the most suitable customers. Frequently, the model predicting customer behaviour does not take into account previous customers’ reactions to similar campaigns.

Conclusion

In conclusion, relying solely on performance metrics to determine model readiness for deployment is insufficient. Evaluating correctness and performance as separate components is essential. The relationship between correctness and performance, visualized in different regions, emphasizes the need for critical evaluation and avoiding the “Fool’s Region” of overconfidence. Additionally, considering business assumptions and decisions that impact correctness is crucial. A comprehensive assessment of both correctness and performance metrics is necessary to make informed deployment decisions and ensure the model’s validity.

Using a Cost Functional to Optimize Hyperparameters Using Cross Validation

2023-04-20T00:00:00+10:00

This blog post discusses the importance of cost functions in mathematical optimization and how they apply to machine learning problems. The author argues that formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function is better than optimizing a single metric because it provides a more comprehensive and flexible framework for optimization, can help to address the trade-off between model complexity and performance, and can lead to better performance and generalization. An example of this formulation is provided for a binary classifier.

What is a cost function?

In mathematical optimization, a cost function is a mathematical function that represents the cost or objective to be minimized or maximized in a given optimization problem. The cost function defines the relationship between the input variables and the output values of the problem. In optimization problems, the goal is to find the input values that minimize or maximize the cost function, subject to certain constraints. The cost function plays a crucial role in defining the optimization problem and in guiding the search for the optimal solution. The choice of the cost function depends on the nature of the problem and the desired optimization criteria. In many real-world problems, the cost function may be a complex, nonlinear function that requires advanced mathematical tools and techniques to be analyzed and optimized.

Advantages of Using a Cost Functional to Optimize Hyperparameters Using Cross Validation

Formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function is better than optimizing a single metric for several reasons.

Firstly, machine learning problems often involve multiple metrics that need to be optimized simultaneously, such as accuracy, precision, recall, F1-score, and others. However, optimizing a single metric in isolation may not necessarily lead to the best overall performance of the classifier. For example, optimizing only for accuracy may lead to a model that performs poorly on a specific class or in a specific context.

Secondly, a cost function can provide a more comprehensive and flexible framework for optimization. By defining a cost function that combines multiple metrics and incorporates domain-specific constraints and preferences, we can optimize the model for a specific task and context in a more principled and efficient way.

Thirdly, a cost function can also help to address the trade-off between model complexity and performance. By penalizing complex models that are prone to overfitting, we can ensure that the model is not only accurate but also robust and generalizable.

Overall, formulating the optimization of the performance metrics of a machine learning classifier in terms of a cost function provides a more principled, flexible, and effective approach to model optimization that can lead to better performance and generalization.
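
To make the idea concrete, the cost used in the example that follows can be read as a weighted distance between the vector of achieved metric values and the vector of their ideal values. The snippet is a minimal sketch in plain Python: the metric values are made up, `weighted_distance` is a hypothetical helper name, and the weighting matrix `M` is assumed to be positive definite.

```python
import math


def weighted_distance(values, optimal, M):
    """Return sqrt(d^T M d), where d = values - optimal (hypothetical helper)."""
    d = [v - o for v, o in zip(values, optimal)]
    # Quadratic form d^T M d written out with plain lists.
    quad = sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(quad)


# Made-up values for two metrics whose ideal value is 1 (e.g. F1 and ROC AUC).
values = [0.8, 0.9]
optimal = [1.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # all metrics weighted equally
f1_heavy = [[4.0, 0.0], [0.0, 1.0]]   # the first metric counts more

cost_equal = weighted_distance(values, optimal, identity)
cost_weighted = weighted_distance(values, optimal, f1_heavy)
print(cost_equal, cost_weighted)
```

With the identity matrix every metric contributes equally; a larger diagonal entry makes a shortfall in the corresponding metric more expensive.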

Example

An implementation of this formulation is shown below for a binary classifier.

Import the required libraries.

import sklearn.metrics as sm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from collections.abc import Iterable, Callable
import pandas as pd
import numpy as np

from abc import ABC, abstractmethod

Define an abstract cost function.

class CostFunction(ABC):
    """Abstract class for cost functions"""
    def __init__(self, metrics: Iterable[str], M: 'np.ndarray[float]') -> None:
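        """Store the metric names and the weighting matrix.

        Args:
            metrics (Iterable[str]): Iterable of strings of the form (metric_name).
            M (np.ndarray[float]): Positive definite matrix of size len(metrics).
                The identity matrix is used when M is None.

        Raises:
            ValueError: If M is not positive definite.
        """
        self.metrics = list(metrics)
        # `M or ...` raises for a NumPy array (ambiguous truth value),
        # so compare against None explicitly.
        self.M = np.identity(len(self.metrics)) if M is None else np.asarray(M, dtype=float)
        self._check_positive_definite(self.M)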
    
    @abstractmethod
    def functional(self, y_true: 'np.ndarray[float]', y_pred: 'np.ndarray[float]') -> float:
        """Compute the cost for the given true and predicted labels.

        Args:
            y_true (np.ndarray[float]): Array-like of true labels of length N.
            y_pred (np.ndarray[float]): Array-like of predicted labels of length N.
        """
        pass
    
    @staticmethod
    def _to_array(y: Iterable[float]) -> 'np.ndarray[float]':
        return np.fromiter(y, float)
    
    @staticmethod
    def _check_positive_definite(M: 'np.ndarray[float]') -> None:
        if not np.all(np.linalg.eigvals(M) > 0):
            raise ValueError(f'Matrix {M} is not positive definite')

    def make_scorer(self) -> Callable:
        return sm.make_scorer(self.functional, greater_is_better=False)

    def __call__(self, y_true: Iterable[float], y_pred: Iterable[float]) -> float:
        y_pred_array = self._to_array(y_pred)
        y_true_array = self._to_array(y_true)
            
        return self.functional(y_true_array, y_pred_array)

Define a specific cost function for a classifier. The default performance metrics to optimize for are accuracy, f1, precision, recall, log loss and rocauc.

class ClassificationCostFunction(CostFunction):
    def __init__(self, metrics: Iterable[str], M: 'np.ndarray[float]' = None, metric_class_opt_val_map: dict[str, tuple[str, float]]=None, proba_threshold: float = 0.5):
        """Defines cost functional for optimization of multiple metrics. 
        Since this is defined as a loss function, cross validation returns the negative of the score [1].

        Args:
            metrics (Iterable[str]): Iterable of strings of the form (metric_name).
            M (np.ndarray[float]): Positive definite matrix of size len(metrics).
            metric_class_opt_val_map (dict[str, tuple[str, float]], optional): Dictionary mapping each metric to its input type and optimal value, of the form {'metric': ('class' or 'proba', optimal_value)}. Defaults to the map defined in the constructor body.
            proba_threshold (float, optional): Probability threshold used to convert probabilities into classes. Defaults to 0.5.
            
        References:
            [1] https://github.com/scikit-learn/scikit-learn/issues/2439
            
        Example:
            >>> y_true = [0, 0, 0, 1, 1]
            >>> y_pred = [0.46, 0.6, 0.29, 0.25, 0.012]
            >>> threshold = 0.5
            >>> metrics = ["f1_score", "roc_auc_score"]
            >>> cf = ClassificationCostFunction(metrics)
            >>> np.isclose(cf(y_true, y_pred), 1.41, rtol=1e-01, atol=1e-01)
            True
            >>> X, y = make_classification()
            >>> model = LogisticRegression()
            >>> _ = model.fit(X, y)
            >>> y_proba = model.predict_proba(X)[:, 1]
            >>> cost = cf(y, y_proba)
            >>> f1 = getattr(sm, "f1_score")
            >>> roc_auc = getattr(sm, "roc_auc_score")
            >>> y_pred = np.where(y_proba > 0.5, 1, 0)
            >>> scorer_output = np.sqrt((f1(y, y_pred) - 1.0)**2 + (roc_auc(y, y_proba) - 1.0)**2)
            >>> np.isclose(cost, scorer_output)
            True
        """
        super().__init__(metrics, M)
        self.proba_threshold = proba_threshold
        self.metric_class_opt_val_map = metric_class_opt_val_map or {
            "accuracy_score": ("class", 1),
            "f1_score": ("class", 1),
            "log_loss": ("class", 0),
            "precision_score": ("class", 1),
            "recall_score": ("class", 1),
            "roc_auc_score": ("proba", 1),
        }
        
    def _to_class(self, array: 'np.ndarray[float]', metric: str) -> 'np.ndarray[float]':
        # sourcery skip: inline-immediately-returned-variable
        output = np.where(array > self.proba_threshold, 1, 0) if self.metric_class_opt_val_map[metric][0] == "class" else array
        
        return output
    
    
    def functional(self, y_true: 'np.ndarray[float]', y_pred: 'np.ndarray[float]') -> float:
        
        self._check_positive_definite(self.M)

        opt_values = np.array([self.metric_class_opt_val_map[metric][1] for metric in self.metrics])

        metric_values = np.array([getattr(sm, metric)(y_true, self._to_class(y_pred, metric)) for metric in self.metrics])

        return np.sqrt(np.dot(np.dot(metric_values - opt_values, self.M), metric_values - opt_values))
            

Run the code in a grid search strategy.

metrics = [
        "accuracy_score",
        "f1_score",
        "log_loss",
        "precision_score",
        "recall_score",
        "roc_auc_score"
]

param_grid = {"C": [0.5, 1]}

scorer = ClassificationCostFunction(metrics, proba_threshold=0.5)
cv = GridSearchCV(LogisticRegression(), param_grid, scoring=scorer.make_scorer())

X, y = make_classification()
cv.fit(X, y)
pd.DataFrame.from_dict(cv.cv_results_)

mean_fit_time	std_fit_time	mean_score_time	std_score_time	param_C	params	split0_test_score	split1_test_score	split2_test_score	split3_test_score	split4_test_score	mean_test_score	std_test_score	rank_test_score
0	0.009353	0.003661	0.008929	0.002612	0.5	{'C': 0.5}	-1.732076	-6.922296	-1.732076	-3.464335	-3.461615	-3.462480	1.895201	1
1	0.006416	0.000833	0.006427	0.000340	1	{'C': 1}	-1.732076	-8.654072	-1.732076	-3.464335	-3.461615	-3.808835	2.543282	2

Conclusion

In conclusion, cost functions play a critical role in mathematical optimization problems and are essential in guiding the search for the optimal solution. In machine learning problems, where multiple performance metrics need to be optimized simultaneously, using a cost function provides a more principled and efficient way to optimize the model for a specific task and context. Furthermore, it helps to address the trade-off between model complexity and performance, ensuring that the model is not only accurate but also robust and generalizable. By using a cost function to optimize performance metrics, machine learning practitioners can achieve better performance and generalization on their models, making it a valuable tool for model optimization.

Adding a Dark/Light Theme Switcher to Minimal Mistakes

2022-09-27T00:00:00+10:00

This post describes how I added a switcher to toggle between the light and dark modes of the Minimal Mistakes theme.

I followed the instructions posted by sohamsaha99 in this GitHub thread, copied here:

Edit _config.yml: There are going to be two themes. The first one is declared as usual. For the second one, we create a new entry called minimal_mistakes_skin2. So, add the following lines:

minimal_mistakes_skin: "default"
minimal_mistakes_skin2: "dark"

Create a file in your project directory in the location assets/css/theme2.scss and insert the following lines in the file:

---
# Only the main Sass file needs front matter (the dashes are enough)
---

@charset "utf-8";

@import "minimal-mistakes/skins/dark"; // skin
@import "minimal-mistakes"; // main partials

Modify the following line in file _includes/head.html from:

<link rel="stylesheet" href="/assets/css/main.css">

to:

<link rel="stylesheet" href="/assets/css/main.css" id="theme_source">

and just after that line, add the code:

  <link rel="stylesheet alternate" href="/assets/css/theme2.css" id="theme_source_2">
  <script>
    let theme = sessionStorage.getItem('theme');
    if(theme === "dark")
    {
      sessionStorage.setItem('theme', 'dark');
      node1 = document.getElementById('theme_source');
      node2 = document.getElementById('theme_source_2');
      node1.setAttribute('rel', 'stylesheet alternate'); 
      node2.setAttribute('rel', 'stylesheet');
    }
    else
    {
      sessionStorage.setItem('theme', 'light');
    }
  </script>

The names light and dark are generic labels for skin1 and skin2. These strings have nothing to do with the actual skin names.

Add an icon next to navigation. In _includes/masthead.html find { % if site.search == true % } and above that add:

  <i class="fas fa-fw fa-sun" aria-hidden="true" onclick="node1=document.getElementById('theme_source');node2=document.getElementById('theme_source_2');if(node1.getAttribute('rel')=='stylesheet'){node1.setAttribute('rel', 'stylesheet alternate'); node2.setAttribute('rel', 'stylesheet');sessionStorage.setItem('theme', 'dark');}else{node2.setAttribute('rel', 'stylesheet alternate'); node1.setAttribute('rel', 'stylesheet');sessionStorage.setItem('theme', 'light');} return false;"></i>

My RegEx Cheatsheet

2022-09-27T00:00:00+10:00

In this post, I compile a cheatsheet of the main regexes that I use in my projects.

Regular expressions are a staple of text processing. String-searching algorithms use them for “find” and “find and replace” operations on strings, and for input validation. Regular expression techniques were developed in theoretical computer science and formal language theory.

A regular expression (regex) is a sequence of characters that specifies a search pattern. Regexes are commonly used in text processing tasks, such as finding and replacing specific patterns of characters in a body of text.

Regexes can be used to search for patterns of characters in a string, or to match or replace strings based on specific patterns. They are often used in text editors, programming languages, and command-line utilities to perform these types of tasks.

Regexes are powerful because they allow you to define complex search patterns using a compact and concise syntax. For example, you can use a regex to search for all the email addresses in a document, or to find and replace all instances of a particular word in a piece of text.

There are many different flavors of regexes, with different syntax and capabilities. Some of the most commonly used regexes are based on the Perl programming language, and are known as “Perl-compatible regular expressions” or PCREs.

Regexes can be used for a wide range of text processing tasks, such as:

Searching for specific patterns of characters in a body of text
Validating that a string matches a specific pattern (e.g. to ensure that a password meets certain criteria)
Extracting specific substrings from a larger string (e.g. to extract all the email addresses from a document)
Finding and replacing strings based on specific patterns (e.g. to replace all instances of a particular word in a piece of text)

Regular expressions are commonly used together with text processing utilities such as sed and AWK, for search and replace in text editors and word processors, and in lexical analysis. Most general-purpose programming languages support regexes either natively or through libraries. Examples of such languages include Python, C, C++, Java, and JavaScript.

As an example, to locate a word spelled two different ways in a text editor, the regular expression seriali[sz]e matches both “serialise” and “serialize”.

Table of Contents
Character Classes
Python’s regex module
Cookbook
- Select everything between the keywords start and end
- Select email addresses
References:

Character Classes

All characters used in digital communication can be categorized into the classes shown in the table below.

Character Class	Same as	Meaning
`[[:alnum:]]`	`[0-9A-Za-z]`	Letters and digits
`[[:alpha:]]`	`[A-Za-z]`	Letters
`[[:ascii:]]`	`[\x00-\x7F]`	ASCII codes 0-127
`[[:blank:]]`	`[\t ]`	Space or tab only
`[[:cntrl:]]`	`[\x00-\x1F\x7F]`	Control characters
`[[:digit:]]`	`[0-9]`	Decimal digits
`[[:graph:]]`	`[[:alnum:][:punct:]]`	Visible characters (not space)
`[[:lower:]]`	`[a-z]`	Lowercase letters
`[[:print:]]`	`[ -~] == [ [:graph:]]`	Visible characters and space
`[[:punct:]]`	`` [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~] ``	Visible punctuation characters
`[[:space:]]`	`[\t\n\v\f\r ]`	Whitespace
`[[:upper:]]`	`[A-Z]`	Uppercase letters
`[[:word:]]`	`[0-9A-Za-z_]`	Word characters
`[[:xdigit:]]`	`[0-9A-Fa-f]`	Hexadecimal digits
`[[:<:]]`	`\b(?=\w)`	Start of word
`[[:>:]]`	`\b(?<=\w)`	End of word
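
Python’s built-in `re` module does not understand the bracketed POSIX names in the first column, so in Python the equivalents from the second column (or shorthands such as `\d`, `\w` and `\s`) are used instead. A short sketch, with a made-up sample string:

```python
import re

sample = "Order #42: 3 apples, 0x1F bytes"

# [[:digit:]] is written as [0-9]
digits = re.findall(r"[0-9]+", sample)
# [[:upper:]] is written as [A-Z]
uppercase = re.findall(r"[A-Z]", sample)
# [[:alpha:]] is written as [A-Za-z]
words = re.findall(r"[A-Za-z]+", sample)

print(digits)
print(uppercase)
print(words)
```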

Python’s regex module

The regular expressions module can be imported using the command

import re

The most commonly used functions are described below.

`re.findall`

Returns a list containing all matches:

>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']
>>> re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']

`re.finditer`

Returns an iterator yielding match objects (one for each match):

>>> m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']

`re.search`

Returns a Match object if there is a match anywhere in the string:

>>> sentence = 'This is a sample string'
>>> bool(re.search(r'this', sentence, flags=re.I))
True
>>> bool(re.search(r'xyz', sentence))
False
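
The returned match object also carries the matched text and its position. A small sketch reusing the sentence above (`group` and `span` are standard methods of match objects; the variable names are only illustrative):

```python
import re

sentence = 'This is a sample string'
match = re.search(r'sample', sentence)

if match is not None:
    matched_text = match.group()   # the text that matched
    start, end = match.span()      # half-open index range within `sentence`
    print(matched_text, start, end)
```

Checking for `None` first matters because `re.search` returns `None` when nothing matches, and calling `.group()` on it would raise an `AttributeError`.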

`re.split`

Returns a list where the string has been split at each match:

>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']

`re.sub`

Replaces one or many matches with a string:

>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', r'* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat

Tip: For simple literal replacements, string methods such as `str.replace` can be used instead.

`re.compile`

Compiles a regular expression pattern for later use:

>>> pet = re.compile(r'dog')
>>> type(pet)
<class 're.Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False

`re.escape`
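
Escapes the special characters in a string so that it can be matched literally. A minimal sketch with a made-up input string:

```python
import re

literal = "1+1=2 (really?)"
text = "They wrote: 1+1=2 (really?)"

# Used as-is, '+', '(', ')' and '?' are interpreted as regex syntax,
# so the raw string does not match itself.
unescaped = re.search(literal, text)

# After escaping, every character is matched literally.
escaped = re.search(re.escape(literal), text)

print(unescaped is None)
print(escaped.group())
```

This is mostly useful when part of a pattern comes from user input or from data rather than being written by hand.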

Flags

code (short)	code (long)	Description
`re.I`	`re.IGNORECASE`	Ignore case
`re.M`	`re.MULTILINE`	Multiline
`re.L`	`re.LOCALE`	Make `\w`, `\b`, `\s` locale dependent
`re.S`	`re.DOTALL`	Dot matches all (including newline)
`re.U`	`re.UNICODE`	Make `\w`, `\b`, `\d`, `\s` unicode dependent
`re.X`	`re.VERBOSE`	Readable style
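
Flags are passed through the `flags` argument and can be combined with the bitwise OR operator. The following sketch, using made-up input strings, shows `re.VERBOSE` for a commented pattern and `re.I | re.M` together:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows inline comments.
date_pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    flags=re.VERBOSE,
)
dates = date_pattern.findall("released 2022-09-27, updated 2023-04-20")

# re.I | re.M: case-insensitive match at the start of every line.
starts = re.findall(r"^cat", "Cat\ncatalog\ndog", flags=re.I | re.M)

print(dates)
print(starts)
```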

Cookbook

Suppose we have the following two paragraphs:

paragraph = """

Start: 
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Sodales ut eu sem integer vitae 
justo eget magna. 

Tincidunt praesent semper feugiat nibh sed pulvinar proin 
gravida. Praesent semper feugiat nibh sed. Mi proin sed libero enim sed faucibus 
turpis. Tortor pretium viverra suspendisse potenti nullam ac. end
"""

Select everything between the keywords `Start:` and `end`

>>> result = re.search(r"(?<=Start:)((.|\n)*)(?=end)", paragraph).group()
>>> print(result)
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Sodales ut eu sem integer vitae 
justo eget magna. 

Tincidunt praesent semper feugiat nibh sed pulvinar proin 
gravida. Praesent semper feugiat nibh sed. Mi proin sed libero enim sed faucibus 
turpis. Tortor pretium viverra suspendisse potenti nullam ac.
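
An alternative sketch of the same extraction uses the `re.S` flag, so that `.` also matches newlines, together with a non-greedy quantifier and word boundaries around `end`. The word boundaries matter because `end` also occurs inside longer words (for instance “suspendisse” in the paragraph above). The shortened sample text below is made up for the illustration:

```python
import re

text = "Start: first line\nsuspended second line end trailing text end"

# (.*?) is non-greedy, so it stops at the first standalone 'end';
# \b prevents a match on the 'end' inside 'suspended'.
match = re.search(r"Start:(.*?)\bend\b", text, flags=re.S)
between = match.group(1).strip() if match else None
print(between)
```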

Select email addresses

Suppose we want to extract the emails contained in the following paragraph:

paragraph = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Sodales ut eu sem integer vitae 
justo eget magna. John Silva Doe <john.silva_3.doe@email.com> 
Josh Tree Done 'jpsj_3@gmail.com'

Jane Doe <jane_doe4@email.com>

Malesuada fames ac turpis egestas integer eget. Cras semper auctor neque vitae 
tempus. Sed adipiscing diam donec adipiscing tristique risus nec. 
"""

>>> result = re.findall(r"[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,4}\b", paragraph)
>>> result
['john.silva_3.doe@email.com', 'jpsj_3@gmail.com', 'jane_doe4@email.com']

References:

[1] https://www.regexr.com
[2] https://quickref.me/regex
[3] https://www.regex101.com

Optimization References

2021-12-23T00:00:00+11:00

My list of reference materials for mathematical optimisation, based on Quora.

Lecture notes

Highly recommended: video lectures by Prof. S. Boyd at Stanford. This is a rare case where watching live lectures is better than reading a book.

EE263: Introduction to Linear Dynamical Systems (video): http://www.stanford.edu/~boyd/ee263/videos.html
EE363: Linear Dynamical Systems: http://www.stanford.edu/class/ee363/
EE364a: Convex Optimization I (video): http://www.stanford.edu/class/ee364a/videos.html
EE364b: Convex Optimization II (video): http://www.stanford.edu/class/ee364b/videos.html
6.253: Convex analysis and optimization: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-253-convex-analysis-and-optimization-spring-2010/lecture-notes/
Optimization courses at MIT: http://optimization.mit.edu/classes.php
CMU 10-725: Optimization, Fall 2012

Books

S. Bubeck, “Convex Optimization: Algorithms and Complexity”, arXiv:1405.4980, 2015
F. Clarke, “Functional Analysis, Calculus of Variations and Optimal Control”, Springer, 2013
Liberzon, D., “Calculus of Variations and Optimal Control Theory - A Concise Introduction”, Princeton University Press, 2012
S. Boyd and L. Vandenberghe, “Convex Optimization”, Cambridge University Press, 2004
G. Calafiore and L. El Ghaoui, “Optimization Models”, Cambridge University Press, 2014
R. T. Rockafellar and R. J. B. Wets, “Variational Analysis”, Springer, 1998
D. G. Luenberger and Y. Ye, “Linear and Nonlinear Programming”, 4th ed., Springer, 2016
J. Frédéric Bonnans, J. Charles Gilbert, C. Lemaréchal and C. A. Sagastizábal, “Numerical Optimization”, 2nd ed., Springer, 2006
Papadimitriou & Steiglitz, Combinatorial Optimization: Algorithms and Complexity: http://www.amazon.com/Combinatorial-Optimization-Algorithms-Christos-Papadimitriou/dp/0486402584
Lawson & Hanson, Solving Least Squares Problems: http://books.google.com/books/about/Solving_Least_Squares_Problems.html?id=ROw4hU85nz8C
Bellman, Dynamic Programming: http://www.amazon.com/Dynamic-Programming-Richard-Bellman/dp/0486428095/
Bellman, Applied Dynamic Programming: http://www.amazon.com/Applied-Dynamic-Programming-Richard-Bellman/dp/B0000CLNVK
Bellman, Adaptive Control Processes: http://www.amazon.com/Adaptive-Control-Processes-Bellman/dp/0691079013
Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning: http://www.amazon.com/Genetic-Algorithms-Optimization-Machine-Learning/dp/0201157675
Gill, Murray, Wright, Practical Optimization: http://www.amazon.com/Practical-Optimization-Philip-Gill/dp/0122839528
Ben-Tal & Nemirovski, Lectures on Modern Convex Optimization: http://www.amazon.com/Lectures-Modern-Convex-Optimization-Applications/dp/0898714915
Bertsimas & Tsitsiklis, Introduction to Linear Optimization: http://www.amazon.com/Introduction-Linear-Optimization-Scientific-Computation/dp/1886529191
Bertsekas, Convex Analysis and Optimization: http://www.amazon.com/Convex-Analysis-Optimization-Dimitri-Bertsekas/dp/1886529450
Bertsekas, Nonlinear programming: http://www.amazon.com/Nonlinear-Programming-Dimitri-P-Bertsekas/dp/1886529000/
Bertsekas, Dynamic Programming and Optimal Control: http://www.amazon.com/Dynamic-Programming-Optimal-Control-Vol/dp/1886529086
Rockafellar, Convex Analysis: http://www.amazon.com/Analysis-Princeton-Landmarks-Mathematics-Physics/dp/0691015864/
Nesterov, Introductory Lectures on Convex Optimization: A Basic Course: http://www.amazon.com/Introductory-Lectures-Convex-Optimization-Applied/dp/1402075537
Ruszczynski, Nonlinear Optimization: http://www.amazon.com/Nonlinear-Optimization-Andrzej-Ruszczynski/dp/0691119155/
Fletcher, Practical Methods of Optimization: http://www.amazon.com/Practical-Methods-Optimization-R-Fletcher/dp/0471494631
Nocedal & Wright, Numerical Optimization: http://www.amazon.com/Numerical-Optimization-Operations-Financial-Engineering/dp/0387303030/
Press et al., Numerical Recipes: http://www.amazon.com/Numerical-Recipes-3rd-Scientific-Computing/dp/0521880688
Dennis & Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations: http://www.amazon.com/Numerical-Unconstrained-Optimization-Nonlinear-Mathematics/dp/0898713641
Cornuejols & Tütüncü, Optimization Methods in Finance: http://www.amazon.com/Optimization-Methods-Finance-Mathematics-Risk/dp/0521861705/
Stengel, Optimal Control and Estimation: http://www.amazon.com/Optimal-Control-Estimation-Advanced-Mathematics/dp/0486682005/
Kirk, Optimal Control Theory: http://www.amazon.com/Optimal-Control-Theory-Donald-Kirk/dp/0486434842/
Spall, Introduction to Stochastic Search and Optimization: http://www.amazon.com/Introduction-Stochastic-Search-Optimization-James/dp/0471330523/
Lasdon, Optimization Theory for Large Systems: http://www.amazon.com/Optimization-Theory-Large-Systems-Lasdon/dp/0486419991
Deb, Multi-Objective Optimization Using Evolutionary Algorithms: http://www.amazon.com/Multi-Objective-Optimization-Using-Evolutionary-Algorithms/dp/047187339X
Minoux, Mathematical Programming: http://www.amazon.com/Mathematical-Programming-Wiley-Interscience-mathematics-optimization/dp/0471901709
Camacho & Alba, Model Predictive Control: http://www.amazon.com/Predictive-Control-Advanced-Textbooks-Processing/dp/1852336943
Hillier, Introduction to Operations Research: http://www.amazon.com/Introduction-Operations-Research-Student-Access/dp/0077298349/
Puterman, Markov Decision Processes: http://www.amazon.com/Markov-Decision-Processes-Programming-Probability/dp/0471727822
Powell, Approximate Dynamic Programming: http://www.amazon.com/Approximate-Dynamic-Programming-Dimensionality-Probability/dp/0470171553/

Other

Grešovnik, Optimization Links: http://www2.arnes.si/~ljc3m2/igor/links.html
Arsham, Intro to Modeling and Optimization: http://home.ubalt.edu/ntsbarsh/opre640a/partviii.htm
Matlab Optimization Toolbox resources: http://www.mathworks.com/help/toolbox/optim/
Bennett et al., The Interplay of Optimization and Machine Learning Research: http://jmlr.csail.mit.edu/papers/volume7/MLOPT-intro06a/MLOPT-intro06a.pdf
Evolutionary algorithms chapter in Jason Brownlee’s book : http://www.cleveralgorithms.com/nature-inspired/evolution.html
Brent, Algorithms for Minimization without Derivatives: http://maths-people.anu.edu.au/~brent/pub/pub011.html

Optimizing Marketing Campaigns Part 1: Clustering

2021-12-15T00:00:00+11:00

In this series of posts, we analyze how to maximize the profit of marketing campaigns using mathematical optimization techniques. In the first part, we optimize the profit of a campaign for clusters of customers. To do this, we model the profit and cost of the campaigns of two products. Furthermore, the constraints on the maximum number of offers, the budget and the return on investment are also modelled and considered to maximize the profit.

1. Modelling the Profit
2. Modelling the Constraints
3. Data
4. Python Implementation
5. In the Next Post
6. References

The estimated individual expected profit can be determined with machine learning models. For example, a response model such as an uplift model can be used to estimate the individual expected profit.

The key idea is to cluster the estimated individual expected profits and then consider the cluster centroids as representative of the data for all the individual customers within a single cluster. This aggregation enables the problem to be formulated as a linear programming problem so that rather than assigning offers to individual customers, the model identifies proportions within each cluster for each product offer that maximizes the marketing campaign return on investment while considering the business constraints.

From the technical viewpoint, the model is formulated as a mixed-integer linear programming problem.

1. Modelling the Profit

The objective is to maximize the total expected profit from the marketing campaign while heavily penalizing any correction to the budget. Let $K$ be the set of clusters and $J$ the set of products. We define the profit function as

$\max_{y,z} \sum_{k \in K} \sum_{j \in J} \pi_{k,j} \cdot y_{k,j} - M \cdot z\; \tag{Profit}$ where

$\pi_{k,j}$: is the expected profit to the bank from the offer of product $j \in J$ to an average customer of cluster $k \in K$.
$y_{k,j} \geq 0$: is the number of customers in cluster $k \in K$ that are offered product $j \in J$.
$M$: Big M penalty. This penalty is associated with corrections on the budget that are necessary to satisfy other business constraints.
$z \geq 0$: Increase in budget in order to have a feasible campaign.

2. Modelling the Constraints

2.1. Maximum Number of Offers for each Cluster

Maximum number of offers of products for each cluster is limited by the number of customers in the cluster.

$\sum_{j \in J} y_{k,j} \leq N_{k} \quad \forall k \in K\;, \tag{Max Number of Offers}$ where $N_k$ is the number of customers in cluster $k \in K$.

2.2. Maximum Budget

The marketing campaign budget constraint enforces that the total cost of the campaign does not exceed the campaign budget. The budget can be increased to keep the model feasible; for example, meeting the minimum number of offers for every product may require such an increase.

$\sum_{k \in K} \sum_{j \in J} \nu_{k,j} \cdot y_{k,j} \leq B + z\;, \tag{Max Budget}$ where $\nu_{k,j}$ is the average variable cost associated with the offer of product $j \in J$ to an average customer of cluster $k \in K$ and $B$ is the marketing campaign budget.

2.3. Minimum Number of Offers of each Product

Each product must be offered to at least a minimum number of customers.

$\sum_{k \in K} y_{k,j} \geq Q_{j} \quad \forall j \in J\;, \tag{Min Number of Offers}$ where $Q_j$ is the minimum number of offers of product $j \in J$ to be made.

2.4. Minimum ROI

The minimum ROI constraint ensures that the ratio of total profits over cost is at least one plus the corporate hurdle rate.

$\sum_{k \in K} \sum_{j \in J} \pi_{k,j} \cdot y_{k,j} \geq (1+R) \cdot \sum_{k \in K} \sum_{j \in J} \nu_{k,j} \cdot y_{k,j}\;, \tag{Minimum ROI}$ where $R$ is the corporate hurdle rate. This hurdle rate is used for the ROI calculation of the marketing campaign.

2.5. Recap of the Optimization Model

The optimization model is formulated as a mixed-integer linear programming problem. The objective function is defined as the maximum expected profit from the marketing campaign. The constraints are defined as the maximum number of offers of each product for each cluster, the budget, the minimum ROI, and the minimum number of offers of each product.

\[\begin{array}{rlr} \max_{y,z}&\sum_{k \in K} \sum_{j \in J} \pi_{k,j} \cdot y_{k,j} - M \cdot z&\text{(Profit)}\\ \text{s.t.}&\sum_{j \in J} y_{k,j} \leq N_{k} \quad \forall k \in K&\text{(Max Number of Offers)}\\ &\sum_{k \in K} \sum_{j \in J} \nu_{k,j} \cdot y_{k,j} \leq B + z&\text{(Max Budget)}\\ &\sum_{k \in K} y_{k,j} \geq Q_{j} \quad \forall j \in J&\text{(Min Number of Offers)}\\ &\sum_{k \in K} \sum_{j \in J} \pi_{k,j} \cdot y_{k,j} \geq (1+R) \cdot \sum_{k \in K} \sum_{j \in J} \nu_{k,j} \cdot y_{k,j}&\text{(Minimum ROI)} \end{array}\]

3. Data

We consider two products and ten customers split into two clusters. The corporate hurdle rate is twenty percent.

The following table defines the expected profit of an average customer in each cluster when offered a product.

	Product 1	Product 2
cluster 1	$2 000	$1 000
cluster 2	$3 000	$2 000

The expected cost of offering a product to an average customer in a cluster is determined by the following table.

	Product 1	Product 2
cluster 1	$200	$100
cluster 2	$300	$200

The budget available for the marketing campaign is $200.

The number of customers in each cluster is given by the following table.

	Num. Customers
cluster 1	5
cluster 2	5

The minimum number of offers of each product is provided in the following table.

	Min Offers
product 1	2
product 2	2

4. Python Implementation
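As a minimal sketch of the model above on the data of Section 3, the instance is small enough to solve by enumerating every integer offer plan, so no solver is needed. This is an illustration only, not a MILP solver: it assumes the $y_{k,j}$ are integers, takes $z = \max(0, \text{cost} - B)$ (the smallest budget increase that satisfies the budget constraint), and uses a penalty $M = 10\,000$, a value that is not given in the text. All variable and function names are hypothetical.

```python
from itertools import product

clusters = ["cluster 1", "cluster 2"]
products = ["product 1", "product 2"]
pairs = [(k, j) for k in clusters for j in products]

# Expected profit (pi) and cost (nu) per offer, from the tables in Section 3.
profit = {("cluster 1", "product 1"): 2000, ("cluster 1", "product 2"): 1000,
          ("cluster 2", "product 1"): 3000, ("cluster 2", "product 2"): 2000}
cost = {("cluster 1", "product 1"): 200, ("cluster 1", "product 2"): 100,
        ("cluster 2", "product 1"): 300, ("cluster 2", "product 2"): 200}

capacity = {"cluster 1": 5, "cluster 2": 5}      # N_k: customers per cluster
min_offers = {"product 1": 2, "product 2": 2}    # Q_j
budget = 200                                     # B
hurdle = 0.2                                     # R
big_m = 10_000                                   # M: assumed, not in the text


def solve():
    """Enumerate all integer offer plans and return (objective, y, z)."""
    best = None
    ranges = [range(capacity[k] + 1) for k, _ in pairs]
    for counts in product(*ranges):
        y = dict(zip(pairs, counts))
        # Max Number of Offers: one offer per customer in each cluster.
        if any(sum(y[k, j] for j in products) > capacity[k] for k in clusters):
            continue
        # Min Number of Offers of each product.
        if any(sum(y[k, j] for k in clusters) < min_offers[j] for j in products):
            continue
        total_profit = sum(profit[p] * y[p] for p in pairs)
        total_cost = sum(cost[p] * y[p] for p in pairs)
        # Minimum ROI.
        if total_profit < (1 + hurdle) * total_cost:
            continue
        # Max Budget: smallest increase z that makes the plan affordable.
        z = max(0, total_cost - budget)
        objective = total_profit - big_m * z
        if best is None or objective > best[0]:
            best = (objective, y, z)
    return best


objective, y, z = solve()
print("offers:", y)
print("budget increase:", z)
```

With these data the cheapest plan that meets the minimum offers costs $600 (two offers of each product, all to cluster 1), so the sketch reports a budget increase of 400 on top of the $200 budget.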

5. In the Next Post

We will learn how to optimize the campaigns at an individual customer level.

6. References

https://gurobi.github.io/modeling-examples/marketing_campaign_optimization/marketing_campaign_optimization.html
M.-D. Cohen, “Exploiting response models—optimizing cross-sell and up-sell opportunities in banking”, Information Systems 29 (2004) 327–341

Mixed Integer Programming

2021-12-08T00:00:00+11:00

Mixed Integer Programming (MIP) is a form of optimization in which some decision variables are continuous and others are restricted to integer values.

MIPs typically appear when one or more decision variables are boolean, i.e., take the value 0 or 1. This type of optimization problem is formulated as: find $x\in\mathbb{R}^n$ that solves

\[\begin{array}{rll} \min& x^T Q x + q^Tx\\ \text{subject to}& l \leq x \leq u & (\text{bound constraints})\\ &x^T Q x + q^T x \leq b & (\text{quadratic constraints})\\ &\exists i\in[1,n]\subset\mathbb{N}\text{ such that } x_i\in\mathbb{Z} &(\text{integrality constraints}), \end{array}\]

where $x=(x_1,\ldots,x_n)$ is the vector of decision variables, $Q\in\mathbb{R}^{n\times n}$ is the matrix of coefficients of the quadratic terms, $q\in\mathbb{R}^n$ is the vector of coefficients of the linear terms, $l\in\mathbb{R}^n$ is the vector of lower bounds, $u\in\mathbb{R}^n$ is the vector of upper bounds, and $b\in\mathbb{R}$ is the right-hand side of the quadratic constraint. For brevity the same $Q$ and $q$ appear in the objective and in the constraint; in general each may have its own coefficients.

An example of the implementation of the above formulation is shown in the notebook below.
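As a small stand-alone illustration (not the notebook itself), the sketch below solves a two-variable instance with one integer variable $x_1$ and one continuous variable $x_2$. It enumerates $x_1$ and, for each value, minimizes the remaining one-dimensional convex quadratic in $x_2$ in closed form. The quadratic constraint is omitted for simplicity, the data $Q$, $q$ and the bounds are made up, and real solvers use branch-and-bound rather than full enumeration.

```python
def objective(x1, x2, Q, q):
    """Evaluate x^T Q x + q^T x for a two-variable problem."""
    quad = Q[0][0] * x1 * x1 + (Q[0][1] + Q[1][0]) * x1 * x2 + Q[1][1] * x2 * x2
    return quad + q[0] * x1 + q[1] * x2


def solve_by_enumeration(Q, q, int_bounds, cont_bounds):
    """Minimize over integer x1 and continuous x2; requires Q[1][1] > 0."""
    lo, hi = cont_bounds
    best = None
    for x1 in range(int_bounds[0], int_bounds[1] + 1):
        # Stationary point of the 1-D quadratic in x2, clipped to [lo, hi].
        x2 = -(q[1] + (Q[0][1] + Q[1][0]) * x1) / (2 * Q[1][1])
        x2 = min(max(x2, lo), hi)
        value = objective(x1, x2, Q, q)
        if best is None or value < best[0]:
            best = (value, x1, x2)
    return best


# Hypothetical data: the continuous relaxation has x1 = 7/3, which is not integer.
Q = [[1.0, 0.5], [0.5, 1.0]]
q = [-5.5, -4.0]
best_value, best_x1, best_x2 = solve_by_enumeration(Q, q, (0, 4), (0.0, 4.0))
print(best_value, best_x1, best_x2)
```

Dropping the integrality requirement would give $x_1 = 7/3$; with it, the best integer choice is $x_1 = 2$ with $x_2 = 1$ and objective value $-8$.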
