Maximum Likelihood Estimation

The 10 data points and possible Gaussian distributions from which the data were drawn. f1 is normally distributed with mean 10 and variance 2.25 (the variance is the square of the standard deviation); this is also denoted f1 ∼ N(10, 2.25). The other candidates are f2 ∼ N(10, 9), f3 ∼ N(10, 0.25) and f4 ∼ N(8, 2.25). The goal of maximum likelihood is to find the parameter values of the distribution that maximises the probability of observing the data.

Maximum likelihood estimator

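A maximum likelihood estimator $\widehat{\theta}$ of $\theta_0$ is obtained as the solution of a maximisation problem:

$$\widehat{\theta} = \arg\max_{\theta \in \Theta} L(\theta; \xi)$$

where $L(\theta; \xi)$ denotes the likelihood of the sample $\xi$ and $\Theta$ is the set of admissible parameter values. To put it another way, $\widehat{\theta}$ is the parameter value that maximises the likelihood of the sample $\xi$. $\widehat{\theta}$ is called the maximum likelihood estimator of $\theta$.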
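To make the maximisation concrete, here is a minimal sketch that restricts the search to the four candidate distributions f1 to f4 listed at the top of this post and picks the one with the highest likelihood. It borrows the three data points (9, 9.5 and 11) used in the worked example later in the post. The function and variable names are my own, and a real estimator would search over all parameter values rather than just four candidates.

```python
import math

# The three data points from the worked example later in this post
data = [9.0, 9.5, 11.0]

# Candidate distributions from the opening figure, as (mean, variance)
candidates = {
    "f1": (10.0, 2.25),
    "f2": (10.0, 9.0),
    "f3": (10.0, 0.25),
    "f4": (8.0, 2.25),
}

def gaussian_log_likelihood(data, mean, variance):
    """Sum of the log Gaussian densities of each point (independent draws assumed)."""
    return sum(
        -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)
        for x in data
    )

# Score every candidate and keep the one with the largest log-likelihood
scores = {
    name: gaussian_log_likelihood(data, mean, variance)
    for name, (mean, variance) in candidates.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: log-likelihood = {score:.3f}")
print("Most likely candidate:", best)
```

Among these four candidates, f1 scores highest for this sample. The full maximum likelihood estimator does the same comparison over every possible value of the parameters.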

Calculating the Maximum Likelihood Estimates

Now that we have an intuitive idea of what maximum likelihood estimation is, we can move on to calculating the parameter values. The values we find are known as maximum likelihood estimates (MLE).
We'll use another example to show this. Suppose this time we have three data points, 9, 9.5 and 11, and we assume they were generated by a process that is adequately described by a Gaussian distribution. How do we calculate the maximum likelihood estimates of the Gaussian parameters μ and σ?

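  • The quantity to maximise is the likelihood P(X | θ): the probability (density) of the observed data X given the parameters θ.
  • The maximiser depends on the model. For a sample X1, ..., Xn drawn from a Uniform(0, θ) distribution, for instance, the MLE of θ is θ̂ = max(X1, ..., Xn).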
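For the Gaussian example with the data points 9, 9.5 and 11, setting the derivatives of the log-likelihood with respect to μ and σ to zero gives closed-form estimates: the sample mean for μ, and the square root of the average squared deviation (dividing by n, not n − 1) for σ. The sketch below applies those formulas; the function name `gaussian_mle` is my own choice for illustration.

```python
import math

def gaussian_mle(data):
    """Return the maximum likelihood estimates (mu_hat, sigma_hat) of a Gaussian."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n (the unbiased estimator divides by n - 1)
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, math.sqrt(var_hat)

data = [9.0, 9.5, 11.0]
mu_hat, sigma_hat = gaussian_mle(data)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
# prints: mu_hat = 9.833, sigma_hat = 0.850
```

Note that the MLE of σ is slightly smaller than the usual sample standard deviation, because it divides by n rather than n − 1.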

The log likelihood

The expression for the total probability is awkward to differentiate, so it is almost always simplified by taking its natural logarithm. This is perfectly valid because the natural logarithm is a monotonically increasing function: as the value on the x-axis rises, so does the value on the y-axis (see figure below). This matters because it guarantees that the log-likelihood reaches its maximum at the same parameter values as the original likelihood function. As a result, we can work with the simpler log-likelihood instead of the original likelihood.

Monotonic behaviour of the original function y = x (left) and the natural logarithm function y = ln(x) (right). Both functions are monotonic because the y value always increases as you go from left to right along the x-axis.
Example of a non-monotonic function: as you go from left to right on the graph, the value of f(x) goes up, then goes down, and then goes back up again.
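A quick numerical check of this claim, as a sketch: for the three data points 9, 9.5 and 11, evaluate both the likelihood and the log-likelihood over a grid of candidate means and confirm that they peak at the same grid point. Two simplifications here are my own assumptions: σ is held fixed at 1, and the search runs over a coarse grid from 8 to 12.

```python
import math

data = [9.0, 9.5, 11.0]
SIGMA = 1.0  # assumed fixed for this illustration; only mu is varied

def likelihood(mu):
    """Product of the Gaussian densities of each data point."""
    return math.prod(
        math.exp(-((x - mu) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))
        for x in data
    )

def log_likelihood(mu):
    """Sum of the log Gaussian densities of each data point."""
    return sum(
        -0.5 * math.log(2 * math.pi * SIGMA ** 2) - (x - mu) ** 2 / (2 * SIGMA ** 2)
        for x in data
    )

# Candidate means from 8.00 to 12.00 in steps of 0.01
grid = [8.0 + 0.01 * i for i in range(401)]

best_by_likelihood = max(grid, key=likelihood)
best_by_log_likelihood = max(grid, key=log_likelihood)

print(f"argmax of likelihood:     {best_by_likelihood:.2f}")
print(f"argmax of log-likelihood: {best_by_log_likelihood:.2f}")
```

Both searches land on the grid point closest to the sample mean (about 9.83), even though the two functions take very different values. The sum in the log-likelihood is also far easier to differentiate than the product in the likelihood.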

Example Applications of Maximum Likelihood Estimation

Maximum likelihood estimation is widely used in empirical work because of its flexibility. It can be applied to many kinds of models, from simple linear regression to complex choice models.

  • The linear regression model
  • The probit model
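As a sketch of the first application: for a simple linear regression y = a + b·x + noise, if the noise is assumed to be Gaussian with constant variance, the maximum likelihood estimates of a and b coincide with the ordinary least squares estimates, and the MLE of the noise standard deviation is the square root of the residual sum of squares divided by n. The toy data and function names below are made up purely for illustration.

```python
import math

# Made-up toy data, for illustration only (not from this post)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def fit_linear_mle(xs, ys):
    """MLE of (slope, intercept, sigma) assuming Gaussian noise with constant variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                  # same as the least squares slope
    intercept = y_bar - slope * x_bar    # same as the least squares intercept
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / n)           # MLE divides by n, not n - 2
    return slope, intercept, sigma

def log_likelihood(slope, intercept, sigma, xs, ys):
    """Gaussian log-likelihood of the residuals y - (intercept + slope * x)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (y - intercept - slope * x) ** 2 / (2 * sigma ** 2)
        for x, y in zip(xs, ys)
    )

slope, intercept, sigma = fit_linear_mle(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, sigma = {sigma:.3f}")

# Moving the slope away from the estimate should lower the log-likelihood
at_mle = log_likelihood(slope, intercept, sigma, xs, ys)
perturbed = log_likelihood(slope + 0.1, intercept, sigma, xs, ys)
print(f"log-likelihood at the MLE: {at_mle:.3f}, with slope + 0.1: {perturbed:.3f}")
```

The probit model has no closed-form solution of this kind, so in practice its log-likelihood is maximised numerically.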


Congratulations! After reading today's blog, you should have a better understanding of the foundations of maximum likelihood estimation. In particular, we've covered the concept of maximum likelihood estimation and how to calculate the estimates.


