Why Normality assumption in linear regression
My question is very simple: why do we choose the normal distribution for the error term in the assumptions of linear regression? Why don't we choose another distribution, such as the uniform or the t?
regression mathematical-statistics normal-distribution error linear
We don't choose the normal assumption. It just happens to be the case that when the errors are normal, the coefficient estimates follow an exact normal distribution and an exact F-test can be used to test hypotheses about them.
– AdamO, 1 hour ago
Because the math works out easily enough that people could use it before modern computers.
– Nat, 1 hour ago
asked 2 hours ago
Master Shi
161
1 Answer
You can choose another error distribution; doing so basically just changes the loss function.
This is certainly done.
Laplace (double exponential) errors correspond to least absolute deviations regression, also called $L_1$ regression (which numerous posts on this site discuss). Regressions with t-distributed errors are occasionally used (in some cases because they are more robust to gross errors), though they can have a disadvantage: the likelihood (and therefore the negative of the loss) can have multiple modes.
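As a rough illustration of the Laplace/least-absolute-deviations correspondence, here is a minimal pure-Python sketch for simple (one-predictor) regression. It relies on the fact that some least absolute deviations fit passes through at least two of the data points, so it simply tries every pair. The function names `sum_abs_residuals` and `lad_line` and the toy data are made up for this example; a real analysis would more likely use a linear programming or quantile regression routine.

```python
from itertools import combinations


def sum_abs_residuals(xs, ys, intercept, slope):
    """Sum of absolute residuals, the loss implied by Laplace errors."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


def lad_line(xs, ys):
    """Brute-force least absolute deviations fit of y = a + b*x.

    Tries the line through every pair of points with distinct x values and
    keeps the one with the smallest sum of absolute residuals.
    Cost is O(n^3), so this is only meant for small illustrative data sets.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid regression fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum_abs_residuals(xs, ys, intercept, slope)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    loss, intercept, slope = best
    return intercept, slope, loss


# Toy data (hypothetical): points on y = 1 + 2x, plus one gross outlier at x = 4.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 40, 11]
intercept, slope, loss = lad_line(xs, ys)
print(intercept, slope, loss)
```

On this toy data the fit stays on the line $y = 1 + 2x$ through the five well-behaved points and simply absorbs the outlier as one large residual, whereas the least squares slope on the same data is roughly 4.66.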
Uniform errors correspond to an $L_\infty$ loss (minimize the maximum absolute deviation); such regression is sometimes called Chebyshev approximation (though beware, since there's another thing with essentially the same name). Again, this is sometimes done. Indeed, for simple regression and smallish data sets with bounded errors of constant spread, the fit is often easy enough to find by hand, directly on a plot, though in practice you can use linear programming methods or other algorithms. $L_\infty$ and $L_1$ regression problems are also duals of each other, which can sometimes lead to convenient shortcuts for some problems.
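To make the uniform-errors case concrete, here is a small pure-Python sketch of minimax ($L_\infty$) simple regression. It uses two facts: for a fixed slope $b$ the best intercept is the midrange of $y_i - b x_i$, and the resulting maximum deviation is a convex piecewise-linear function of $b$ whose kinks occur at slopes between pairs of data points, so checking those pairwise slopes is enough. The function name `minimax_line` and the data are invented for illustration; this is a brute-force stand-in for the linear programming approach mentioned above, not a replacement for it.

```python
from itertools import combinations


def minimax_line(xs, ys):
    """Brute-force Chebyshev (minimax) fit of y = a + b*x.

    For each candidate slope b (the slope between a pair of data points),
    the best intercept is the midrange of y - b*x, and the largest absolute
    residual is then half the range of y - b*x.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # no finite slope between these two points
        slope = (y2 - y1) / (x2 - x1)
        offsets = [y - slope * x for x, y in zip(xs, ys)]
        lo, hi = min(offsets), max(offsets)
        max_dev = (hi - lo) / 2
        if best is None or max_dev < best[0]:
            best = (max_dev, (lo + hi) / 2, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    max_dev, intercept, slope = best
    return intercept, slope, max_dev


# Toy data (hypothetical), small enough to check on a plot by hand.
xs = [0, 1, 2, 3]
ys = [0, 2, 1, 3]
intercept, slope, max_dev = minimax_line(xs, ys)
print(intercept, slope, max_dev)
```

On this toy data the fit is $y = 0.75 + 0.5x$, with residuals alternating between $-0.75$ and $+0.75$, which is the kind of fit you could find by eye on a plot.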
Many other choices are possible and quite a few have been used in practice.
[Note that if you have additive, independent, constant-spread errors with a density of the form $k\,\exp(-c\,g(\varepsilon))$, maximizing the likelihood will correspond to minimizing $\sum_i g(e_i)$, where $e_i$ is the $i$th residual.]
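The bracketed note can be spelled out in a couple of lines. This is a sketch under stated assumptions: the notation $y_i = x_i^\top\beta + \varepsilon_i$ for the model is introduced here for illustration, the constants $k$ and $c$ are taken not to depend on $\beta$, and $c > 0$. Writing $e_i = y_i - x_i^\top\beta$ for the $i$th residual, independence gives

$$L(\beta) = \prod_{i=1}^{n} k\,\exp\bigl(-c\,g(e_i)\bigr),$$

$$\log L(\beta) = n\log k - c\sum_{i=1}^{n} g(e_i).$$

Since $n\log k$ is constant in $\beta$ and $c > 0$, maximizing $\log L(\beta)$ is the same as minimizing $\sum_i g(e_i)$. Taking $g(\varepsilon) = \varepsilon^2$ corresponds to normal errors and least squares, and $g(\varepsilon) = |\varepsilon|$ corresponds to Laplace errors and least absolute deviations.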
edited 20 mins ago
answered 2 hours ago
Glen_b♦
212k