I recently had to run an experiment with Bayesian logistic regression but couldn't find a single place that contained both a satisfying explanation of the motivation and a derivation of the gradient.
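We assume that the class label $y \in \{0, 1\}$ can be predicted from the input $x$ and some shared latent variable $w$ through the logistic function $\sigma$,

$$
p(y = 1 \mid x, w) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}.
$$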
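As a minimal sketch of this model in NumPy (the names `sigmoid` and `predict_proba`, and the example numbers, are illustrative assumptions rather than anything from the original experiment):

```python
import numpy as np

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a)), computed in a numerically stable way."""
    a = np.asarray(a, dtype=float)
    # 0.5 * (1 + tanh(a / 2)) is algebraically identical and never overflows.
    return 0.5 * (1.0 + np.tanh(0.5 * a))

def predict_proba(w, X):
    """p(y = 1 | x, w) for every row x of X."""
    return sigmoid(X @ w)

w = np.array([1.0, -2.0])            # hypothetical weights
X = np.array([[0.0, 0.0],
              [2.0, 1.0],
              [3.0, 0.0]])
print(predict_proba(w, X))
```

The first two inputs have $w^\top x = 0$ and therefore sit exactly on the decision boundary with probability one half.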
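This allows us to predict a new point $(x^*, y^*)$ by marginalizing over the inferred distribution of $w$,

$$
p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w,
$$

where $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$ denotes the observed data.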
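In practice this integral is usually approximated. A minimal sketch, assuming we already have samples from the posterior over the weights (here replaced by made-up Gaussian draws, and with the hypothetical helper name `predictive_proba`), is a plain Monte Carlo average:

```python
import numpy as np

def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(a, dtype=float)))

def predictive_proba(W_samples, x_new):
    """Monte Carlo estimate of p(y* = 1 | x*, D).

    W_samples: array of shape (S, d), assumed to be draws from p(w | D).
    x_new:     array of shape (d,).
    """
    # Average the per-sample predictions, not the weights: the sigmoid is
    # nonlinear, so sigmoid(mean(w) @ x) is in general a different number.
    return sigmoid(W_samples @ x_new).mean()

rng = np.random.default_rng(0)
# Stand-in "posterior" samples, purely for illustration.
W_samples = rng.normal(loc=[1.0, -1.0], scale=0.5, size=(1000, 2))
x_new = np.array([2.0, 0.5])

p_mc = predictive_proba(W_samples, x_new)
p_plugin = sigmoid(W_samples.mean(axis=0) @ x_new)
print(p_mc, p_plugin)
```

The averaged prediction is less extreme than the plug-in prediction at the mean weights, which is the practical effect of accounting for uncertainty in the weights.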
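We don't know the posterior $p(w \mid \mathcal{D})$ though. How can we learn this distribution? By Bayes' rule we obtain,

$$
p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}.
$$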
A common way to estimate $w$ is by a point estimate through maximum likelihood estimation (MLE). This simply uses the maximizer of the log-likelihood (effectively assuming a uniform prior on $w$),

$$
\hat{w}_{\text{MLE}} = \arg\max_w \log p(\mathcal{D} \mid w).
$$
For logistic regression there is no closed-form solution for the above, but the negative log-likelihood is convex (its Hessian is positive semi-definite), so standard optimization techniques apply to find $\hat{w}_{\text{MLE}}$. But what if we wanted the full distribution over $w$?
The posterior $p(w \mid \mathcal{D})$ has no nice parametric form that we know. Usually two approaches are considered:
For either approach the gradient is usually required.
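To illustrate why the gradient matters, here is a minimal sketch of one gradient-based sampler, the unadjusted Langevin algorithm (the kind of method analysed in the Durmus and Majewski reference at the end). Choosing this particular sampler is an assumption made for illustration, and the function name, step size, and toy Gaussian target are all hypothetical:

```python
import numpy as np

def langevin_samples(grad_log_p, w0, step, n_steps, rng):
    """Unadjusted Langevin algorithm:
        w <- w + step * grad log p(w) + sqrt(2 * step) * standard normal noise.
    Without a Metropolis correction the samples carry a bias that shrinks with `step`.
    """
    w = np.array(w0, dtype=float)
    out = np.empty((n_steps, w.size))
    for t in range(n_steps):
        noise = rng.standard_normal(w.size)
        w = w + step * grad_log_p(w) + np.sqrt(2.0 * step) * noise
        out[t] = w
    return out

# Toy target: a standard Gaussian, whose log-density has gradient -w.
rng = np.random.default_rng(0)
samples = langevin_samples(lambda w: -w, w0=np.zeros(2), step=0.1,
                           n_steps=50_000, rng=rng)
print(samples.mean(axis=0), samples.var(axis=0))
```

Only the gradient of the log-density enters the update, so the normalizing constant of the posterior is never needed. For Bayesian logistic regression one would pass the gradient of the log-likelihood plus the gradient of the log-prior in place of the toy `lambda`.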
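All of these approaches work with the log-likelihood. Specifically, we require its gradient. Assuming i.i.d. samples we can write the likelihood as,

$$
p(\mathcal{D} \mid w) = \prod_{i=1}^N \sigma(w^\top x_i)^{y_i} \left(1 - \sigma(w^\top x_i)\right)^{1 - y_i}.
$$

This correctly “activates” either $\sigma(w^\top x_i)$ or $1 - \sigma(w^\top x_i)$ depending on the value of $y_i$.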
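The log-likelihood is trivially,

$$
\log p(\mathcal{D} \mid w) = \sum_{i=1}^N y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\left(1 - \sigma(w^\top x_i)\right).
$$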
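The derivative of the log-likelihood (sometimes called the score) can then be shown to be,

$$
\nabla_w \log p(\mathcal{D} \mid w) = \sum_{i=1}^N \left(y_i - \sigma(w^\top x_i)\right) x_i.
$$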
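The step hidden behind "can be shown" is short. A sketch, writing $\sigma_i = \sigma(w^\top x_i)$ and using the identity $\sigma'(a) = \sigma(a)\left(1 - \sigma(a)\right)$:

$$
\begin{aligned}
\nabla_w \log \sigma_i &= (1 - \sigma_i)\, x_i, \\
\nabla_w \log (1 - \sigma_i) &= -\sigma_i\, x_i, \\
\nabla_w \log p(\mathcal{D} \mid w) &= \sum_{i=1}^N \left[ y_i (1 - \sigma_i) - (1 - y_i)\, \sigma_i \right] x_i = \sum_{i=1}^N (y_i - \sigma_i)\, x_i.
\end{aligned}
$$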
Note that, the way we have defined the log-likelihood, we need $y_i \in \{0, 1\}$ for things to work out nicely.
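A quick way to gain confidence in the gradient is to compare it against finite differences. The following is a minimal sketch on random data (the function names and the synthetic data are assumptions for illustration), with labels in $\{0, 1\}$:

```python
import numpy as np

def log_likelihood(w, X, y):
    """sum_i y_i log s_i + (1 - y_i) log(1 - s_i), with s_i = sigmoid(w @ x_i)."""
    a = X @ w
    # log sigmoid(a) = -log(1 + exp(-a)) and log(1 - sigmoid(a)) = -log(1 + exp(a));
    # logaddexp(0, .) evaluates log(1 + exp(.)) without overflow.
    return np.sum(-y * np.logaddexp(0.0, -a) - (1.0 - y) * np.logaddexp(0.0, a))

def grad_log_likelihood(w, X, y):
    """Score: sum_i (y_i - sigmoid(w @ x_i)) x_i."""
    s = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))
    return X.T @ (y - s)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)   # labels in {0, 1}
w = rng.normal(size=3)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = np.array([
    (log_likelihood(w + eps * e, X, y) - log_likelihood(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(numeric - grad_log_likelihood(w, X, y))))
```

The printed discrepancy should be tiny; a large value would typically point to a sign error or to labels encoded as something other than 0 and 1.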
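There is a nice connection with regularization if, instead of MLE, we use maximum a posteriori (MAP) estimation, which simply maximizes the unnormalized posterior,

$$
\hat{w}_{\text{MAP}} = \arg\max_w\; p(\mathcal{D} \mid w)\, p(w).
$$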
Working with the potential functions instead (i.e. the negative log-probabilities),

$$
\hat{w}_{\text{MAP}} = \arg\min_w\; -\log p(\mathcal{D} \mid w) - \log p(w).
$$
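If we choose a zero-mean isotropic Gaussian prior $p(w) = \mathcal{N}(w;\, 0,\, \lambda^{-1} I)$ this is,

$$
\hat{w}_{\text{MAP}} = \arg\min_w\; -\log p(\mathcal{D} \mid w) + \frac{\lambda}{2} \lVert w \rVert_2^2,
$$

where terms that do not depend on $w$ have been dropped.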
Which is simply the negative log-likelihood with $\ell_2$-regularization!
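The regularization effect is easy to see numerically. Below is a minimal sketch that fits the MAP estimate by plain gradient descent, assuming a zero-mean isotropic Gaussian prior with precision `lam` (the parametrisation, the helper `fit_map`, the step size, and the synthetic data are all assumptions for illustration):

```python
import numpy as np

def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * a))

def fit_map(X, y, lam, step=0.5, n_iters=2000):
    """Gradient descent on  -log p(D | w) + (lam / 2) * ||w||^2.

    lam is the assumed prior precision; lam = 0 drops the prior and
    targets the maximum likelihood estimate instead.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = -X.T @ (y - sigmoid(X @ w)) + lam * w
        w -= step * grad / n   # dividing by n keeps the step size scale-free
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])              # hypothetical ground truth
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

for lam in (1.0, 100.0):
    w_map = fit_map(X, y, lam)
    print(lam, np.linalg.norm(w_map))   # a stronger prior shrinks the weights
```

A larger `lam` corresponds to a tighter prior around zero and therefore to stronger shrinkage of the weights.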
The two assumptions we made in the beginning can be altered:
A resource I found useful was Roman Garnett's lecture notes.
A. Durmus and S. Majewski, “Analysis of Langevin Monte Carlo via convex optimization,” Journal of Machine Learning Research, vol. 20, no. 73, pp. 1–46, 2019.