Decisions and Loss Functions - A more clinical focus
In the previous post, loss functions where considered in the context of estimating measures of central tendency for distributions. In this post, I want to look at the computation of loss functions in situations that might arise in a clinical predictive model. This is all textbook stuff – see Further Reading – but I wanted to summarise it in a way I understood when in a year’s time, I wonder what the code does.
I realised that the notation in the last post was sloppy, so for this post, I’ll adopt the conventions in (Berger 1985).
The basic setup is:
- There is a finite set of available actions,
– in the examples that follow, we’ll restrict our attention to a choice between two actions of “do nothing” or “intervene/treat” respectively but there is no loss of generality in assuming this. - There are uncertain quantities representing “states of the world”
about which we can obtain data and that can affect our decision about which action to take. Here, these states will reflect something about a patient.
Then, our task will be to choose the best action from
Information about the states
1 Setup
To begin with, assume the simplest case of there being two states
For a single example patient
library(ggplot2)
library(gridExtra)
library(kableExtra)
library(latex2exp)
library(reshape2)
# globals for presentation
basictheme <- theme_minimal() +
theme(axis.title = element_text(size = 14),
axis.text = element_text(size = 12),
plot.title = element_text(size = rel(1.25), face = "bold", hjust = 0.5 ))
set.seed(3141)
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
samples <- range01( rgamma( 2000, shape = 2, scale = 2) )
df <- data.frame( y = samples )
ggplot( df, aes( y ) ) +
geom_density( fill = "#fa9fb5") +
ylab("Density\n") +
xlab("\nScore (Y)") + basictheme
We now require a mapping from the samples
Define the distribution over states
So, we basically histogram the samples into two bins either side of 0.5, representing the probability of a patient being negative (
pdfFromSamples <- function(a, b, delta, samples) {
H <- hist( samples, plot = FALSE, breaks = seq(a, b, by = delta) )
ret <- data.frame(
mids = H$mids,
freq = H$counts
)
ret$P <- ret$freq / sum(ret$freq)
return(ret)
}
pdf.Y <- pdfFromSamples(0,1,delta = 1/2, samples)
pdf.Y$theta <- factor( c(1,2) )
pdf.plot <- ggplot( pdf.Y, aes( x = theta, y = P ) ) +
geom_col( fill = "#fa9fb5" ) +
scale_x_discrete(labels = pdf.Y$theta ) +
xlab(TeX("$\\theta")) +
ylab(TeX('$\\pi(\\theta)')) +
basictheme
print( pdf.plot )
According to our blunt assignment of states to output from the predictive model, the probability the patient is negative is
With this setup, (two actions, two states) we can “tabulate” the combinations of actions and states (the Cartesian product:
In each “cell” or combination
2 Example Loss Matrix
Equipped with this toy example we assign losses:
A <- c("a1:<br>do nothing","a2:<br>intervene")
Theta <- c("<b>theta1:</b><br>negative", "<b>theta2:</b><br>positive")
loss.matrix <- matrix( c( 0.0, -0.5,
-1.0, 0 ),
nrow = 2, ncol = 2, byrow = TRUE)
rownames(loss.matrix) <- Theta
colnames(loss.matrix) <- A
knitr::kable( loss.matrix,
format = "html",
align = 'c',
full_width = FALSE,
position = 'c',
escape = FALSE ) %>%
kable_styling(position = "center")
a1: do nothing | a2: intervene | |
---|---|---|
theta1: negative | 0 | -0.5 |
theta2: positive | -1 | 0.0 |
As a use example, assume we decide
3 Bayesian Expected Loss
Following (Berger 1985), we define the Bayesian expected loss (BEL) for action
# Bayesian expected loss (BEL)
BEL <- function( a, p.pi, loss.matrix ) {
sum( loss.matrix[ , a ] * p.pi )
}
# compute BEL for each action a:
rho.A <- data.frame(
A = factor(c("a1","a2")),
rho = rep(NA,2)
)
# for each action
for( j in 1:2 ) {
rho.A$rho[j] <- BEL( j, pdf.Y$P, loss.matrix )
}
bel.plot <- ggplot( rho.A, aes(x = A, y = rho) ) +
geom_col( fill = "#d6604d" ) +
basictheme
grid.arrange( pdf.plot, bel.plot, nrow = 1, ncol = 2 )
Note that the upper bound of the BEL is zero.
4 Conditional Bayes Decision Principal
Having established the BEL for each action, the conditional bayes decision principle (CBD) for deciding on an action (Berger 1985) is:
- choose
such that minimises the BEL :
In code: the resulting vector for
knitr::kable( rho.A,
format = "html",
align = 'c',
full_width = FALSE,
position = 'c',
escape = FALSE ) %>%
kable_styling(position = "center")
A | rho |
---|---|
a1 | -0.0790 |
a2 | -0.4605 |
And the action that minimises the BEL:
print( min.bel.CBD <- which.max( rho.A$rho ) )
## [1] 1
In the example above, we find the action 1 (i.e.
5 Developing the Loss Function
Consider a different patient where the posterior distribution
samples <- range01( c(rnorm( 1000, mean = 0, sd = 2 ), rnorm( 1000, mean = 10, sd = 3) ) )
df <- data.frame( y = samples )
ggplot( df, aes( y ) ) +
geom_density( fill = "#fa9fb5") +
ylab("Density\n") +
xlab("\nScore (Y)") + basictheme
In this example, there’s uncertainty about the patient being negative or positive.
This time, we’ll quantise into three equal-sized intervals over the range
pdf.Y <- pdfFromSamples(0,1,delta=1/3,samples)
pdf.Y$theta <- factor( seq(1,3,by=1) )
pdf.plot <- ggplot( pdf.Y, aes( x = theta, y = P ) ) +
geom_col( fill = "#fa9fb5" ) +
scale_x_discrete(labels = pdf.Y$theta ) +
xlab(TeX("$\\theta")) +
ylab(TeX('$\\pi(\\theta)')) +
basictheme
print( pdf.plot )
The loss matrix will now be a
A <- c("a1:<br>do nothing","a2:<br>intervene")
Theta <- c("<b>theta1:</b><br>negative",
"<b>theta2:</b><br>equivocal",
"<b>theta3:</b><br>positive")
loss.matrix <- matrix( c( 0.0, -0.5,
-0.5, -0.2,
-1.0, 0 ),
nrow = 3, ncol = 2, byrow = TRUE)
rownames(loss.matrix) <- Theta
colnames(loss.matrix) <- A
knitr::kable( loss.matrix,
format = "html",
align = 'c',
full_width = FALSE,
position = 'c',
escape = FALSE ) %>%
kable_styling(position = "center")
a1: do nothing | a2: intervene | |
---|---|---|
theta1: negative | 0.0 | -0.5 |
theta2: equivocal | -0.5 | -0.2 |
theta3: positive | -1.0 | 0.0 |
Bearing in mind that states are uncertain, the logic behind this loss matrix is as follows:
For
(do nothing) : no cost is incurred if the patient is likely negative ( ). If the patient is most likely positive ( ) and we do nothing, this is evidently the wrong decision and we incur the maximum penalty of -1.0. If there is some equivocation ) – we penalise by half the maximum cost to discourage doing nothing (equating to loss aversion)For
(intervene) : for we incur a cost (-0.5) for intervening when unnecessary. Naturally, for , the correct thing to do is intervene so this has no penalty associated. For the equivocal case, , we should certainly not ignore these cases but simply intervening (i.e. with zero penalty) is inappropriate. So we incur a small penalty (-0.2) to nudge us away from intervening as the default.
Notice that in designing the loss matrix, we are trying to capture domain knowledge about the deployment of the model – for example, the loss attached to doing nothing (when there is equivocation about the negative/positive state of the patient) pushes us to be cautious and intervene.
Let’s look at the resulting BEL:
# compute BEL for each action a:
rho.A <- data.frame(
A = factor(c("a1","a2")),
rho = rep(NA,2)
)
# for each action
for( j in 1:2 ) {
rho.A$rho[j] <- BEL( j, pdf.Y$P, loss.matrix )
}
bel.plot <- ggplot( rho.A, aes(x = A, y = rho) ) +
geom_col( fill = "#d6604d" ) +
basictheme
grid.arrange( pdf.plot, bel.plot, nrow = 1, ncol = 2 )
knitr::kable( rho.A,
format = "html",
align = 'c',
full_width = FALSE,
position = 'c',
escape = FALSE ) %>%
kable_styling(position = "center")
A | rho |
---|---|
a1 | -0.35125 |
a2 | -0.30530 |
print( min.bel.CBD <- which.max( rho.A$rho ) )
## [1] 2
The action that minimises
We can continue introducing more and more granularity in quantising the posterior predictions
Sigmoid <- function( x, A, B, m, s ) {
# x = vector of values
# A = height of sigmoid
# B = translation on y axis
# m = value of x for which Sigmoid() = half max value
# s = steepness of linear component
exp.x <- exp( -(x-m)/s )
return(
( A + B * (1+exp.x) ) / (1+exp.x)
)
}
# plots to compare the quantised states to a more fine-grained version
pdf.Y1 <- pdfFromSamples(0, 1, delta= 1/50, samples)
pdf.plot.Y1 <- ggplot( pdf.Y1, aes( x = mids, y = P ) ) +
geom_col( fill = "#fa9fb5" ) +
xlab(TeX("$\\theta")) +
ylab(TeX('$\\pi(\\theta)')) +
basictheme
pdf.Y2 <- pdfFromSamples(0, 1, delta= 1/3, samples)
pdf.Y2$Theta <- factor(c("theta1","theta2","theta3"))
pdf.plot.Y2 <- ggplot( pdf.Y2, aes( x = Theta, y = P ) ) +
geom_col( fill = "#fa9fb5" ) +
scale_x_discrete(labels = pdf.Y2$theta ) +
xlab(TeX("$\\theta")) +
ylab(TeX('$\\pi(\\theta)')) +
basictheme
# loss functions for a1 and a2
loss.fun.a1 <- Sigmoid(pdf.Y1$mids, A = -1.0, B = 0, m = 0.5, s = 0.15 )
loss.fun.a2 <- Sigmoid(pdf.Y1$mids, A = 0.5, B = -0.5, m = 0.3, s = 0.08 )
# build a tabular version of loss function
loss.fun <- data.frame( Theta = pdf.Y1$mids,
L.a1 = loss.fun.a1,
L.a2 = loss.fun.a2
)
# show the loss function and 3 state quantised loss matrix
loss.fun.plot <- ggplot( loss.fun, aes( x = Theta ) ) +
geom_line( aes( y = L.a1 ), colour = "#fc8d59", size = 1) +
annotate( "label", x = 0.9, y = -0.75, label = "a1" ) +
geom_line( aes( y = L.a2 ), colour = "#91bfdb", size = 1 ) +
annotate( "label", x = 0.9, y = -0.15, label = "a2" ) +
ylab("Loss") +
xlab("\nTheta") +
basictheme
df.loss.matrix <- data.frame( Theta = factor( c("theta1","theta2","theta3") ),
L.a1 = loss.matrix[,1],
L.a2 = loss.matrix[,2]
)
loss.matrix.plot <- ggplot( df.loss.matrix ) +
geom_line( aes( x = Theta, y = L.a1, group = 1),
colour = "#fc8d59", size = 1) +
geom_point( aes( x = Theta, y = L.a1, group = 1),
colour = "#fc8d59", size = 4) +
annotate( "label",
x = 2.8, y = -0.7, label = "a1" ) +
geom_line( aes( x = Theta, y = L.a2, group= 1),
colour = "#91bfdb", size = 1.5 ) +
geom_point( aes( x = Theta, y = L.a2, group= 1),
colour = "#91bfdb", size = 4 ) +
annotate( "label",
x = 2.8, y = -0.2, label = "a2" ) +
ylab("Loss") +
xlab("\nTheta") +
basictheme
grid.arrange( pdf.plot.Y2, pdf.plot.Y1,
loss.matrix.plot, loss.fun.plot,
nrow = 2, ncol = 2 )
In the figure above, we show the three-state loss matrix underneath the distribution
# the loss function x probability
pdf.Y1$L.a1 <- pdf.Y1$P * loss.fun$L.a1
pdf.Y1$L.a2 <- pdf.Y1$P * loss.fun$L.a2
# product of the posterior and loss function
loss.fun.plot2 <- ggplot( pdf.Y1, aes( x = mids ) ) +
geom_line( aes( y = L.a1 ), colour = "#fc8d59", size = 1.5) +
#annotate( "label", x = 0.9, y = -0.75, label = "a1" ) +
geom_line( aes( y = L.a2 ), colour = "#91bfdb", size = 1.5 ) +
#annotate( "label", x = 0.9, y = -0.15, label = "a2" ) +
ylab("Loss") +
xlab("\nTheta") +
basictheme
## The actual BEL
# we need a matrix representation of the loss function
loss.fun.matrix <- as.matrix( loss.fun[,2:3] )
colnames( loss.fun.matrix ) <- c("a1","a2")
# compute BEL for each action a:
rho.A <- data.frame(
A = factor(c("a1","a2")),
rho = rep(NA,2)
)
# for each action
for( j in 1:2 ) {
rho.A$rho[j] <- BEL( j, pdf.Y1$P, loss.fun.matrix )
}
bel.plot <- ggplot( rho.A, aes(x = A, y = rho) ) +
geom_col( fill = "#d6604d" ) +
basictheme
grid.arrange( pdf.plot.Y1, loss.fun.plot2, bel.plot, ncol = 2, nrow = 2)
Above, the top-left panel shows a finer-grained distribution function
6 Further Reading
If I were to try this again (rather than trying to piece together an understanding from wikipedia), I would proceed in this order:
- Start with (Savage 1951) for foundations/first principles and tutorial approach.
- First four chapters of (Berger 1985) for a really clear exposition of the core ideas.
- For decision theory in point estimation from the perspective of sciences concerned with prediction and forecasting: (Gneiting 2011) provides a comprehensive review
- Risk/decision theory for classification Chapter 2 of (Duda, Hart, and Stork 2012) and Chapter 1.5 of (Bishop 2006).
- Foundations in Bayesian principles more generally: Chapter 2 and Appendix B of (Bernardo and Smith 2009)