Recently, on Cross Validated, I used the example of logistic regression coefficients to demonstrate biased maximum likelihood estimates. In fact, the bias of these estimators is undefined: under the logistic regression model, there is a strictly positive (though extremely small) probability of perfect separation of the data by a hyperplane in the covariate space, leading to infinite estimates of the regression parameters. For illustration, here's an example with a single covariate.
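To make this concrete, here is a minimal sketch in R (the data are made up purely for illustration) of how perfectly separated data send the slope estimate off toward infinity:

```r
# Perfectly separated data: y is 0 whenever x < 0 and 1 whenever x > 0,
# so any sufficiently steep logistic curve splitting at zero fits perfectly.
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- c( 0,  0,  0,   1,   1, 1)

# glm warns that fitted probabilities numerically 0 or 1 occurred; the
# slope estimate grows without bound, stopped only by the iteration cap.
fit <- glm(y ~ x, family = binomial)
coef(fit)
```

The reported slope is just wherever the iterations stopped; the likelihood keeps increasing as the slope grows.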
Because the regression coefficients have a positive probability of being either positive or negative infinity, the expected value of the regression parameters is undefined for any finite sample size!
Does that mean logistic regression is broken? Of course not. Maximum likelihood theory tells us that the estimates converge in distribution, not that the mean converges. Side note: I think this would make a great example for a first-year graduate course in probability.
A commenter asked whether logistic regression is biased if we ignore the case of perfect separation. I decided to examine this by simulation. Another question I had was whether the estimated probabilities are biased. Using Rmarkdown, I walked through the exploration. Here goes!
We start with some utility functions.
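The utility functions themselves aren't shown here, so this is a guess at their shape (the names `expit` and `simFit` and the default parameter values are my own):

```r
# Inverse logit; R's built-in plogis computes the same thing.
expit <- function(eta) 1 / (1 + exp(-eta))

# Simulate a data set of size n from a logistic model with a single
# standard-normal covariate and fit it; returns the glm object.
simFit <- function(n, beta0 = 0, beta1 = 1) {
  x <- rnorm(n)
  y <- rbinom(n, size = 1, prob = expit(beta0 + beta1 * x))
  glm(y ~ x, family = binomial)
}
```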
Question 1: Ignoring perfect separation, is there bias in the estimated logistic regression parameters?
To examine the small sample bias, we will simulate a large number of data sets, compute the logistic regression fit and extract the coefficients and estimated probabilities.
The probability of getting perfect separation in these simulations is so small that even in this large simulation (10,000 sample fits), we still don't observe a single perfect separation. So this essentially ignores the issue of perfect separation.
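One self-contained way to run this simulation (the sample size of 25 and the true slope of 1 are my assumptions; the text only specifies 10,000 fits):

```r
set.seed(42)
nSim <- 10000
n <- 25
betaHat <- replicate(nSim, {
  x <- rnorm(n)
  y <- rbinom(n, size = 1, prob = plogis(x))  # true intercept 0, slope 1
  coef(glm(y ~ x, family = binomial))["x"]    # extract the slope estimate
})
```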
It may be a little hard to see from the plot, but the estimator is right skewed! It's also worth noting that perfect separation was never observed. We can examine this a little more formally by looking at the mean, median, and corresponding standard errors.
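Concretely, something like the following (again a self-contained sketch, with my assumed sample size of 25 and true slope of 1):

```r
set.seed(1)
betaHat <- replicate(2000, {
  x <- rnorm(25)
  y <- rbinom(25, size = 1, prob = plogis(x))
  coef(glm(y ~ x, family = binomial))["x"]
})

# Compare the mean and median of the estimates to the true slope of 1.
c(mean   = mean(betaHat),
  median = median(betaHat),
  se     = sd(betaHat) / sqrt(length(betaHat)))  # SE of the mean
```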
We see a distinct upward bias in both the mean and median. What about the fitted probabilities?
Question 2: Are the fitted probabilities biased?
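A sketch of one way to check this: hold a grid of covariate values fixed, average the fitted probabilities at those points across simulated fits, and compare to the true curve (the grid and simulation settings here are my assumptions):

```r
set.seed(1)
xGrid <- seq(-2, 2, by = 0.5)
pHat <- replicate(2000, {
  x <- rnorm(25)
  y <- rbinom(25, size = 1, prob = plogis(x))
  fit <- glm(y ~ x, family = binomial)
  predict(fit, newdata = data.frame(x = xGrid), type = "response")
})

# Rows are grid points; compare the average fitted probability to the truth.
cbind(x = xGrid, true = plogis(xGrid), meanFit = rowMeans(pHat))
```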
There appears to be no conclusive bias in the estimated probabilities! If anything, there is a mild bias toward the middle in the tails. This is kind of interesting: recall that the regression parameters showed an upward bias. If your regression parameter is overestimated, then your fitted probabilities are pushed further from the middle. Yet despite the upward bias in the regression parameter, we see mild evidence of a bias toward the middle in the tails! Why? Jensen's inequality! In the tails the inverse logit curve is flattening out, so overestimating the regression parameter pulls the probabilities up by less than underestimating it pulls them down.
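A toy numeric illustration of this Jensen effect (the numbers are mine): perturb the slope symmetrically around its true value at a point in the tail, and the average of the two fitted probabilities lands below the probability at the true slope, because the inverse logit is concave out there.

```r
beta <- 1; x <- 2; eps <- 0.5   # a tail point and a symmetric perturbation

plogis(x * beta)                             # ~0.881, probability at the true slope
mean(plogis(x * c(beta - eps, beta + eps)))  # ~0.842, average over the perturbed slopes

# Overestimating the slope raises the probability by less than
# underestimating it lowers it, so the average is pulled toward the middle.
```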