Why did using a subset versus a factor change the coefficients of a logistic regression in R?
$begingroup$
The coefficients changed a lot between a model that included all levels of a factor and a model fitted only to the subset of the data at one level of that factor.
I am fitting a logistic regression of disease on contact exposure. The data come from several sites, so in the first model (ml1) I included site as a factor.
I also wanted to analyze the association at one specific site, WB, so in the second model (ml2) I fitted the same model to the subset of the data from that site.
    ml1 <- glm(disease ~ x + factor(site) + factor(anycontact) + factor(comecat),
               data = gianalysis_bd, family = binomial)
    summary(ml1)

    Coefficients:
                         Estimate Std. Error z value Pr(>|z|)
    (Intercept)          -3.44400    0.25761 -13.369  < 2e-16 ***
    x                     0.24559    0.08309   2.956 0.003121 **
    factor(site)FB        0.03967    0.15177   0.261 0.793792
    factor(site)GB       -0.54896    0.16538  -3.319 0.000902 ***
    factor(site)HB        0.39635    0.14699   2.696 0.007010 **
    factor(site)SB       -0.13887    0.14347  -0.968 0.333069
    factor(site)WB       -0.06200    0.14647  -0.423 0.672067
    factor(site)WP       -0.03706    0.15388  -0.241 0.809677
    factor(anycontact)1   0.40856    0.06846   5.968 2.41e-09 ***
    factor(comecat)2      0.02260    0.07184   0.315 0.753037
    factor(comecat)3      0.11195    0.07574   1.478 0.139405

    ml2 <- glm(disease ~ x + factor(anycontact) + factor(comecat),
               data = gianalysis_bd, subset = site == "WB", family = binomial)
    summary(ml2)

    Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
    (Intercept)          -3.4016     0.4347  -7.825 5.06e-15 ***
    x                     0.1421     0.1454   0.977  0.32834
    factor(anycontact)1   0.7380     0.2590   2.850  0.00438 **
    factor(comecat)2     -0.4049     0.2042  -1.983  0.04738 *
    factor(comecat)3      0.1136     0.2182   0.520  0.60273

However, the coefficient of factor(anycontact)1 changed considerably, from 0.4086 in ml1 to 0.7380 in ml2. I cannot tell why that happened; I expected it to be the same in both models. Can someone explain the difference between the two models and the reason for it? Thank you very much.
r logistic
$endgroup$
migrated from stackoverflow.com 8 hours ago
This question came from our site for professional and enthusiast programmers.
$begingroup$
Can you please reword the question to make clear that you are in fact fitting two different models, one specifically for "WB" and another across all sites?
$endgroup$
– behold
8 hours ago
Question asked 9 hours ago by bb ww; edited 7 hours ago by EdM.
2 Answers
$begingroup$
Without knowing more about your data, it's hard to say precisely what's going on in your case, but here are two possibilities.
First, in any regression model, omitting predictors that are correlated with the included predictors can change the coefficients of the included predictors, even to the point of reversing their signs, as in Simpson's paradox.
Second, in models like logistic or Cox proportional hazards regression, omitting any predictor related to the outcome can bias the coefficient values, even if the omitted predictor is not correlated with the included predictors. Another answer (linked in the original post) provides an analytic demonstration for a similar approach, probit modeling.
In your example, restricting the analysis to the subset changed not only the coefficient for anycontact1 relative to the full model, but also the values and apparent significance of the coefficients for x and factor(comecat)2. I suspect that the reasons for these differences lie in some combination of the correlations among these predictors and how those correlations might change between the entire data set and the subset.
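The second possibility can be illustrated with a small simulation. This is only a sketch on made-up data, not the data from the question: the variables `x`, `z`, and `y`, the sample size, and the true coefficients are all assumptions chosen for the demonstration. Here `z` is generated independently of `x` but strongly affects the outcome, and the model that omits `z` should give a coefficient for `x` that is noticeably closer to zero.

```r
# Simulated illustration only; every name and value here is hypothetical.
set.seed(1)
n <- 50000
x <- rnorm(n)                 # predictor of interest
z <- rnorm(n)                 # second predictor, generated independently of x
eta <- -1 + 1 * x + 2 * z     # true log-odds; conditional coefficient of x is 1
y <- rbinom(n, size = 1, prob = plogis(eta))

full    <- glm(y ~ x + z, family = binomial)  # includes z
reduced <- glm(y ~ x,     family = binomial)  # omits z

b_full    <- unname(coef(full)["x"])
b_reduced <- unname(coef(reduced)["x"])

# Compare the two estimates for x, and confirm x and z are nearly uncorrelated
print(c(full = b_full, reduced = b_reduced, cor_xz = cor(x, z)))
```

The two models answer slightly different questions (the effect of `x` at a fixed `z` versus the effect averaged over `z`), so neither fit is wrong; the coefficients are simply not comparable on the same scale.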
$endgroup$
$begingroup$
I think it makes sense for the site-specific model for "WB" to differ from a model for all sites combined.
In terms of sites, there appear to be three groups: "HB", "GB", and "Not HB/GB".
Only HB and GB have significant coefficients, with low p-values.
I think that if you run the regression for "Not HB/GB", it should yield a model similar to the one you fitted only for "WB". Can you try that and post the results?
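The suggested check could be sketched as follows. Because the original data are not available, this example uses a simulated stand-in, `sim_bd`, that borrows the column names from the question; the data frame, its values, and the simulated effects are all hypothetical, and the unnamed reference site from the question's output is left out. The point is only to show the `subset` expressions for "WB only" and for "all sites except HB and GB".

```r
# Hypothetical stand-in for gianalysis_bd: simulated data, same column names.
set.seed(42)
n <- 6000
sim_bd <- data.frame(
  x          = rnorm(n),
  site       = sample(c("FB", "GB", "HB", "SB", "WB", "WP"), n, replace = TRUE),
  anycontact = rbinom(n, 1, 0.4),
  comecat    = sample(1:3, n, replace = TRUE)
)
# Simulated outcome: by construction, the anycontact effect is 0.4 at every site.
sim_bd$disease <- rbinom(n, 1, plogis(-2 + 0.25 * sim_bd$x + 0.4 * sim_bd$anycontact))

# WB only, mirroring ml2
ml_wb <- glm(disease ~ x + factor(anycontact) + factor(comecat),
             data = sim_bd, subset = site == "WB", family = binomial)

# All sites except HB and GB, pooled
ml_rest <- glm(disease ~ x + factor(anycontact) + factor(comecat),
               data = sim_bd, subset = !(site %in% c("HB", "GB")), family = binomial)

# Estimate and standard error of the anycontact term in each fit
cmp <- rbind(WB   = coef(summary(ml_wb))["factor(anycontact)1", 1:2],
             rest = coef(summary(ml_rest))["factor(anycontact)1", 1:2])
print(cmp)
```

With the real data, replacing `sim_bd` with `gianalysis_bd` would give the comparison asked for. The single-site fit uses far fewer rows, so its standard errors are larger, which is worth keeping in mind when judging how different the two estimates really are.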
$endgroup$
First answer above: answered 7 hours ago by EdM.
Second answer above: answered 8 hours ago by behold.