Maximum likelihood parameters deviate from posterior distributions
I have a likelihood function $\mathcal{L}(d \mid \theta)$ for the probability of my data $d$ given some model parameters $\theta \in \mathbf{R}^N$, which I would like to estimate. Assuming flat priors on the parameters, the likelihood is proportional to the posterior probability, and I use an MCMC method to sample from this posterior.
Looking at the resulting converged chain, I find that the maximum likelihood (ML) parameters are not consistent with the posterior distributions. For example, the marginalized posterior distribution for one of the parameters might be $\theta_0 \sim N(\mu=0, \sigma^2=1)$, while the value of $\theta_0$ at the maximum likelihood point is $\theta_0^{\mathrm{ML}} \approx 4$, essentially the largest value of $\theta_0$ traversed by the MCMC sampler.
This is an illustrative example, not my actual results. The real distributions are far more complicated, but some of the ML parameters have similarly unlikely p-values in their respective marginalized posteriors. Note that some of my parameters are bounded (e.g. $0 \leq \theta_1 \leq 1$); within the bounds, the priors are always uniform.
My questions are:
1. Is such a deviation a problem per se? Obviously I do not expect the ML parameters to coincide exactly with the maxima of each of their marginalized posterior distributions, but intuitively it feels like they should also not be found deep in the tails. Does this deviation automatically invalidate my results?
2. Whether or not this is necessarily problematic, could it be symptomatic of specific pathologies at some stage of the data analysis? For example, is it possible to make any general statement about whether such a deviation could be induced by an improperly converged chain, an incorrect model, or excessively tight bounds on the parameters?
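For concreteness, here is a minimal sketch in R of this kind of comparison (an editorial illustration, not the OP's actual code); `chain` and `theta_ml` are hypothetical stand-ins for the matrix of posterior draws (one column per parameter) and the vector of ML estimates:

    # Toy stand-ins for the real objects (hypothetical, for illustration only)
    set.seed(1)
    chain    <- cbind(theta0 = rnorm(1e4, 0, 1), theta1 = rbeta(1e4, 2, 5))  # posterior draws
    theta_ml <- c(theta0 = 4, theta1 = 0.9)                                  # ML point
    # Fraction of posterior draws below each ML value; values very close to
    # 0 or 1 mean the ML estimate sits deep in a tail of that marginal.
    ml_tail_prob <- sapply(seq_len(ncol(chain)),
                           function(j) mean(chain[, j] < theta_ml[j]))
    round(ml_tail_prob, 3)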
Tags: bayesian, maximum-likelihood, optimization, inference, mcmc
asked yesterday by mgc70 (new contributor)
2 Answers
Some possible generic explanations for this perceived discrepancy, assuming of course that there is no issue with the code, the likelihood definition, the MCMC implementation, the number of MCMC iterations, or the convergence of the likelihood maximiser (thanks, Jacob Socolar):

1. In large dimensions $N$, the posterior does not concentrate at the maximum but at a distance of order $\sqrt{N}$ from the mode, meaning that the largest values of the likelihood function encountered by an MCMC sampler are often well below the value of the likelihood at its maximum. For instance, if the posterior is $\theta \mid \mathbf{x} \sim \mathcal{N}_N(0, I_N)$, then with high probability $\theta$ lies at a squared distance of at least $N - 2\sqrt{2N}$ from the mode, $0$.
2. While the MAP and the MLE indeed coincide under a flat prior, the marginal densities of the different parameters of the model may have (marginal) modes that are far away from the corresponding MLEs (i.e., MAPs).
3. The MAP is a position in the parameter space where the posterior density is highest, but this conveys no indication of the posterior weight or volume of neighbourhoods of the MAP: a very thin spike carries no posterior weight. This is also why MCMC exploration of a posterior may have difficulty identifying the posterior mode. (A small editorial sketch of this point follows the reference below.)
4. The fact that most parameters are bounded may lead to some components of the MAP = MLE occurring at a boundary.

See, e.g., Druilhet and Marin (2007) for arguments on the un-Bayesian nature of MAP estimators.
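As an editorial illustration of point 3 (not part of the original answer), here is a toy one-dimensional density whose mode sits in a very thin spike carrying almost no mass, so that draws from the distribution essentially never land near the mode:

    # Mixture of a very narrow spike at 5 (1% of the mass) and a broad N(0,1)
    # component (99% of the mass). The density peak (mode) is inside the spike,
    # yet draws from the distribution are almost never near it.
    set.seed(2)
    in_spike <- rbinom(1e5, 1, 0.01)
    draws    <- ifelse(in_spike == 1, rnorm(1e5, 5, 0.001), rnorm(1e5, 0, 1))
    mean(abs(draws - 5) < 0.1)   # about 0.01: the mode region is hardly ever visited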
As an example of point 1 above, here is a short R simulation (it requires the mvtnorm package for rmvnorm() and dmvnorm()):

    library(mvtnorm)                          # for rmvnorm() and dmvnorm()
    N <- 100                                  # dimension of the parameter
    T <- 1e4                                  # number of MCMC iterations
    lik <- dis <- rep(0, T)
    mu   <- rmvnorm(1, mean = rep(0, N))      # starting value of the parameter
    xobs <- rmvnorm(1, mean = rep(0, N))      # the observation, x ~ N_N(0, I_N)
    lik[1] <- dmvnorm(xobs, mu, log = TRUE)   # log-likelihood at the start
    dis[1] <- (xobs - mu) %*% t(xobs - mu)    # squared distance to the observation
    for (t in 2:T) {
      # random-walk proposal with covariance I_N / N
      prop   <- rmvnorm(1, mean = mu, sigma = diag(1 / N, N))
      proike <- dmvnorm(xobs, prop, log = TRUE)
      # Metropolis-Hastings acceptance (flat prior, symmetric proposal)
      if (log(runif(1)) < proike - lik[t - 1]) {
        mu <- prop
        lik[t] <- proike
      } else {
        lik[t] <- lik[t - 1]
      }
      dis[t] <- (xobs - mu) %*% t(xobs - mu)
    }
This mimics a random-walk Metropolis-Hastings sampler in dimension $N=100$. The log-likelihood at the MAP, i.e. at $\mu = x^{\mathrm{obs}}$, is $-\tfrac{N}{2}\log(2\pi) \approx -91.89$, but the log-likelihoods visited by the chain never come close:
    > range(lik)
    [1] -183.9515 -126.6924

which is explained by the fact that the sequence never comes near the observation:

    > range(dis)
    [1] 69.59714 184.11525
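As a follow-up check (an editorial addition, not part of the original answer): under the flat-prior posterior $\mu \mid x^{\mathrm{obs}} \sim \mathcal{N}_N(x^{\mathrm{obs}}, I_N)$, the squared distance stored in `dis` is $\chi^2_N$-distributed, so its typical values can be compared directly with the bound quoted in point 1:

    # Central 95% range of a chi-squared distribution with N = 100 df,
    # and the N - 2*sqrt(2N) bound from point 1 of the answer.
    qchisq(c(0.025, 0.975), df = 100)   # approximately 74.2 and 129.6
    100 - 2 * sqrt(2 * 100)             # approximately 71.7

The bulk of the `dis` values visited by the chain falls in this range; the larger values (up to about 184) most likely come from the early iterations, before the chain has moved away from its random starting point.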
answered yesterday by Xi'an (edited 18 hours ago)

I'd just add that in addition to worrying about the code or likelihood definition or MCMC implementation, the OP might also worry about whether the software used to obtain the ML estimate got trapped in a local optimum. stats.stackexchange.com/questions/384528/… – Jacob Socolar, yesterday
With flat priors, the posterior is identical to the likelihood up to a constant. Thus:

1. The MLE (estimated with an optimizer) should be identical to the MAP (the maximum a posteriori value, i.e. the multivariate mode of the posterior, estimated with MCMC). If you don't get the same value, you have a problem with your sampler or optimiser.
2. For complex models, it is very common that the marginal modes differ from the MAP. This happens, for example, when the correlations between parameters are nonlinear (see the sketch after this answer). This is perfectly fine, but marginal modes should therefore not be interpreted as points of highest posterior density, nor compared to the MLE.
3. In your specific case, however, I suspect that the posterior runs up against the prior boundary. In that case the posterior will be strongly asymmetric, and it doesn't make sense to summarise it by a mean and standard deviation. There is no problem in principle with this situation, but in practice it often hints at model misspecification or poorly chosen priors.
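As an editorial sketch of point 2 (not part of the original answer), here is a toy two-dimensional example, Neal's funnel, in which the marginal mode of one parameter is far from that parameter's value at the joint mode:

    # Neal's funnel: v ~ N(0, 3^2) and x | v ~ N(0, exp(v)).
    # The joint log-density is -v^2/18 - x^2/(2*exp(v)) - v/2 + const, so the
    # joint mode is at (v, x) = (-4.5, 0), while the marginal of v is N(0, 9),
    # whose mode is 0.
    set.seed(1)
    v  <- rnorm(1e5, 0, 3)
    x  <- rnorm(1e5, 0, exp(v / 2))
    dv <- density(v)
    c(marginal_mode_v = dv$x[which.max(dv$y)],   # close to 0
      v_at_joint_mode = -4.5)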
answered yesterday by Florian Hartig

Ah, sorry for the almost identical answer, typed in parallel! – Xi'an, yesterday