


Is it OK to use the testing sample to compare algorithms?



I'm working on a little project where my dataset has 6k rows and around 300 features, with a simple binary outcome.



Since I'm still learning ML, I want to try all the algorithms I can manage to find and compare the results.



As I've read in tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then trained my algorithms on the training sample with cross-validation (5 folds).



My plan is to train all my models this way, and then measure their performance on the testing sample to choose the best algorithm.



Could this cause overfitting? If so, since I cannot compare several models inside model_selection.GridSearchCV, how can I prevent it?







machine-learning scikit-learn sampling
















asked 8 hours ago









Dan Chaltiel









  • Possible duplicate of Can I use the test dataset to select a model?
    – Ben Reiniger
    6 hours ago










  • @BenReiniger You are right, this is quite the same question, but I like Simon's answer better.
    – Dan Chaltiel
    5 hours ago
























2 Answers
Basically, every time you use the results of a train/test split to make decisions about a model, whether that's tuning the hyperparameters of a single model or choosing the most effective of a number of different models, you cannot infer anything about the performance of the model after making those decisions until you have "frozen" your model and evaluated it on a portion of data that has not been touched.



The general concept addressing this issue is called nested cross-validation. If you use a train/test split to choose the best parameters for a model, that's fine. But if you want to estimate the performance of the tuned model, you then need to evaluate it on a second held-out set.
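As a rough illustration of nested cross-validation, here is a minimal scikit-learn sketch. It is only an example under stated assumptions: the data is synthetic (`make_classification`), and the two candidate algorithms, their grids, and the fold counts are arbitrary choices for illustration, not something taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data (assumption: binary outcome).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Candidate algorithms with small, purely illustrative hyperparameter grids.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

nested_scores = {}
for name, (estimator, grid) in candidates.items():
    # Inner loop: tune hyperparameters using only the outer-training folds.
    search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="roc_auc")
    # Outer loop: score the whole tuning procedure on folds it never saw.
    scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
    nested_scores[name] = scores.mean()

for name, score in nested_scores.items():
    print(f"{name}: nested CV AUC = {score:.3f}")
```

The outer scores estimate how well "tune this algorithm, then fit it" performs, rather than how well one particular fitted model performs. Picking the highest of these outer scores is itself a selection step, so the winning number should still be read as somewhat optimistic.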



If you then repeat this process for multiple models and choose the best-performing one, again, that's fine, but by choosing the best result, the value of your performance metric is inherently biased, and you need to validate the entire procedure on yet another held-out set to get an unbiased estimate of how your model will perform on unseen data.
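To make the selection bias concrete, here is a small simulation using only the Python standard library. It is a toy setup, not the asker's data: every "algorithm" is assumed to be a coin-flip classifier with a true accuracy of 0.5, and the sample size, number of candidates, and number of repeats are arbitrary.

```python
import random

random.seed(0)

N_TEST = 200      # size of the held-out test sample
N_MODELS = 50     # number of candidate "algorithms", all equally useless
N_REPEATS = 200   # how many times the whole experiment is repeated


def accuracy_of_random_guesser(n):
    """Accuracy of a coin-flip classifier on n balanced binary labels."""
    return sum(random.random() < 0.5 for _ in range(n)) / n


best_on_test = []
fresh_for_winner = []
for _ in range(N_REPEATS):
    # Every candidate has true accuracy 0.5; scores differ only by chance.
    test_scores = [accuracy_of_random_guesser(N_TEST) for _ in range(N_MODELS)]
    # Score reported if the test set is used to pick the winner.
    best_on_test.append(max(test_scores))
    # The winner re-evaluated on untouched data is again a coin flip.
    fresh_for_winner.append(accuracy_of_random_guesser(N_TEST))

mean_best = sum(best_on_test) / N_REPEATS
mean_fresh = sum(fresh_for_winner) / N_REPEATS
print(f"selected model, score on the reused test set: {mean_best:.3f}")
print(f"selected model, score on fresh data:          {mean_fresh:.3f}")
```

The score of the selected model on the reused test set comes out noticeably above 0.5 even though no candidate is better than chance, while its score on fresh data stays close to 0.5. The gap is the optimism introduced purely by choosing the maximum.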







answered 5 hours ago









Cameron King













  • Great answer too! So since this seems like an order 2 cross-validation (a cross-validation of cross-validations), should I pool my samples (70+15 according to Simon's answer) before I evaluate my final algorithm on the test sample?
    – Dan Chaltiel
    5 hours ago










  • With so few samples and features (relatively speaking), I would personally use multiple rounds of stratified k-fold cross-validation rather than holding out a fixed set or sets. The theoretical guarantees of evaluating on held-out data only hold in the limit of a large number of samples. I'm working under the assumption that training and testing your model is not a huge deal in terms of time. I would do something like shuffling the samples within each class and setting aside each fold in turn, and for the inner loop, combine it all, split again, do your model selection, then yes, pool.
    – Cameron King
    4 hours ago












  • I forgot to mention here: this means that you are setting aside, say, 1/k of the points to test on, and dividing the remaining 1-1/k into an inner cross-validation run, but repeating this so that you are training, validating, and testing on every single data point at least once. With larger datasets and more time-consuming models, splitting the data once into 3 partitions makes sense, but it will always be less robust than k-fold cross-validation. Developing with that method to save yourself time is a good idea, but when it comes to making decisions, don't cut corners.
    – Cameron King
    4 hours ago


















No, that is not the purpose of the test set. Test set is only for final evaluation when your model is done. The problem is that if you include the test set in your decisions your evaluation will no longer be reliable.



To compare algorithms you instead set aside another chunk of your data called the validation set.



Here is some info about good splits depending on data size:



Train / Dev / Test sets from Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization by Prof. Andrew Ng.



(Andrew uses the word dev set instead of validation set)







share|improve this answer














share|improve this answer



share|improve this answer








edited 8 hours ago

























answered 8 hours ago









Simon Larsson






























