How will Occam's Razor principle work in machine learning?
The following question, shown in the image, was asked during a recent exam. I am not sure whether I have understood the Occam's Razor principle correctly. Given the distributions and decision boundaries in the question, and following Occam's Razor, decision boundary B should be the answer in both cases, because Occam's Razor says to choose the simpler classifier that does a decent job rather than the more complex one.

Can someone please confirm whether my understanding is correct and whether the chosen answer is appropriate? Please help, as I am just a beginner in machine learning.
machine-learning classification
asked yesterday by user1479198
3.328 "If a sign is not necessary then it is meaningless. That is the meaning of Occam's Razor." From the Tractatus Logico-Philosophicus by Wittgenstein
– Jorge Barrios, 18 hours ago
3 Answers
Occam's razor principle:

Given two hypotheses (here, decision boundaries) that have the same empirical risk (here, training error), the shorter explanation (here, the boundary with fewer parameters) tends to be more valid than the longer one.

In your example, both A and B have zero training error, so B (the shorter explanation) is preferred.

What if the training error is not the same?

If boundary A had a smaller training error than B, the selection becomes tricky. We need to quantify "explanation size" on the same footing as "empirical risk", combine the two into a single scoring function, and then compare A and B. An example is the Akaike Information Criterion (AIC), which combines empirical risk (measured by negative log-likelihood) and explanation size (measured by the number of parameters) into one score.

As a side note, AIC cannot be used for all models, and there are many alternatives to it.
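As an illustration (a sketch of my own, not part of the original answer), the trade-off can be made concrete by scoring a plain linear logistic regression against a higher-degree polynomial one with AIC = 2k − 2 ln L. The data, the chosen degree, and the aic helper below are assumptions for demonstration; note that scikit-learn's LogisticRegression applies L2 regularisation by default, so this only approximates a true maximum-likelihood AIC.

    # Hedged sketch: compare a "short" linear boundary and a "long" polynomial one by AIC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)        # roughly linearly separable toy labels

    def aic(model, X_feat, y):
        """AIC = 2k - 2*ln(L); ln(L) = -n * mean log-loss for a probabilistic classifier."""
        n = len(y)
        log_lik = -n * log_loss(y, model.predict_proba(X_feat))
        k = model.coef_.size + 1                   # weights plus intercept
        return 2 * k - 2 * log_lik

    linear = LogisticRegression(max_iter=5000).fit(X, y)           # boundary like B
    poly = PolynomialFeatures(degree=5, include_bias=False)
    X_poly = poly.fit_transform(X)
    curved = LogisticRegression(max_iter=5000).fit(X_poly, y)      # boundary like A

    print("AIC linear    :", aic(linear, X, y))
    print("AIC polynomial:", aic(curved, X_poly, y))   # typically larger: the extra parameters are penalised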
Relation to cross-validation

In many practical cases, when a model moves toward more complexity (a longer explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progression when the validation error (the error of the model on the validation set) starts to increase. This way, we strike a balance between low training error and a short explanation.
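For concreteness, here is a minimal sketch of that validation-based selection (the data, model family, and degree range are assumptions, not taken from the answer): grow the polynomial degree and keep the degree with the lowest validation error.

    # Hedged sketch: choose model complexity with a held-out validation set.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 2))
    y = (X[:, 0] ** 2 + X[:, 1] > 1).astype(int)   # hypothetical non-linear target

    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    best_degree, best_val_err = None, np.inf
    for degree in range(1, 8):                     # growing "explanation size"
        poly = PolynomialFeatures(degree=degree, include_bias=False)
        model = LogisticRegression(max_iter=5000).fit(poly.fit_transform(X_tr), y_tr)
        val_err = 1 - model.score(poly.transform(X_val), y_val)
        if val_err < best_val_err:
            best_degree, best_val_err = degree, val_err

    print("chosen degree:", best_degree, "validation error:", round(best_val_err, 3))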
– Esmailian, answered 21 hours ago, edited 20 hours ago
Occam's Razor is just a synonym for the parsimony principle (KISS: keep it simple, stupid), and most algorithms work on this principle.

In the question above, one has to think about designing the simplest separating boundary. In the first picture (D1), the answer is B, since it is the line that best separates the two samples, whereas A is a polynomial boundary and may end up overfitting. (Had I used an SVM, that straight line is what would have come out.) Similarly, in the second figure (D2), the answer is B.
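To make the SVM aside concrete, here is a small sketch (an illustration of my own, not part of the original answer): a linear-kernel SVM fitted to two separable clusters returns a single straight maximum-margin boundary.

    # Hedged sketch: a linear SVM on two hypothetical separable clusters yields a straight boundary.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(loc=[-2, -2], size=(50, 2)),
                   rng.normal(loc=[2, 2], size=(50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    svm = SVC(kernel="linear").fit(X, y)
    w, b = svm.coef_[0], svm.intercept_[0]
    print(f"boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")   # one straight line, like B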
– Gaurav Dogra, answered 21 hours ago
Occam's razor in data-fitting tasks:

1. First try a linear equation.
2. If (1) doesn't help much, choose a non-linear one with fewer terms and/or lower degrees of the variables.
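As a rough illustration of this two-step recipe (the data and the error threshold below are assumptions of mine, not from the original answer), one can fit a line first and only move to a low-degree non-linear fit when the linear residuals stay large:

    # Hedged sketch: linear fit first, then the smallest non-linear extension that helps.
    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(-3, 3, 60)
    y = 0.5 * x ** 2 + rng.normal(scale=0.2, size=x.size)   # hypothetical parabola-like data

    def fit_mse(degree):
        coeffs = np.polyfit(x, y, degree)
        return coeffs, float(np.mean((y - np.polyval(coeffs, x)) ** 2))

    _, linear_mse = fit_mse(1)                               # step 1: linear
    if linear_mse > 0.1:                                     # "doesn't help much" (assumed threshold)
        _, quad_mse = fit_mse(2)                             # step 2: fewest extra terms first
        print("linear MSE:", round(linear_mse, 3), "-> quadratic MSE:", round(quad_mse, 3))
    else:
        print("linear fit suffices, MSE:", round(linear_mse, 3))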
D2: B clearly wins, because it is a linear boundary that nicely separates the data. (What "nicely" means I can't define precisely at the moment; you have to develop this feeling with experience.) The A boundary is highly non-linear and looks like a jittered sine wave.

D1: However, I am not sure about this one. The A boundary is like a circle and B is strictly linear. IMHO, the boundary is neither a circle segment nor a line segment; it is a parabola-like curve, so I opt for a C :-)
– Agnius Vasiliauskas, answered 15 hours ago
I'm still unsure why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggested more of a curve in B's data set, I could see your argument, but requesting C goes against your point (1), since B is a linear boundary that works.
– Delioth, 9 hours ago