CURL to download a directory


32

I am trying to download a full website directory using cURL. The following command does not work:

curl -LO http://example.com/

It returns an error: curl: Remote file name has no length!

But when I do this: curl -LO http://example.com/someFile.type, it works. Any idea how to download all the files in the specified directory?










curl

asked Oct 17 '10 at 17:55 by Foo, edited Oct 14 '14 at 6:59 by Der Hochstapler


7 Answers

          30














          HTTP doesn't really have a notion of directories. The slashes other than the first three (http://example.com/) do not have any special meaning except with respect to .. in relative URLs. So unless the server follows a particular format, there's no way to “download all files in the specified directory”.



          If you want to download the whole site, your best bet is to traverse all the links in the main page recursively. Curl can't do it, but wget can. This will work if the website is not too dynamic (in particular, wget won't see links that are constructed by Javascript code). Start with wget -r http://example.com/, and look under “Recursive Retrieval Options” and “Recursive Accept/Reject Options” in the wget manual for more relevant options (recursion depth, exclusion lists, etc).



          If the website tries to block automated downloads, you may need to change the user agent string (-U Mozilla), and to ignore robots.txt (create an empty file example.com/robots.txt and use the -nc option so that wget doesn't try to download it from the server).
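
For example, combining the flags from this answer with the -e robots=off approach mentioned in the comments below might look roughly like this (-l and --no-parent are standard wget options for limiting recursion depth and not ascending past the start directory; the depth value and URL are only placeholders to adjust):

 # recurse up to 5 levels, don't ascend above the start directory,
 # present a browser-like User-Agent, and ignore robots.txt
 wget -r -l 5 --no-parent -U Mozilla -e robots=off http://example.com/some/directory/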






answered Oct 17 '10 at 19:59 by Gilles

• How is wget able to do it? – Srikan, Oct 6 '16 at 16:29

• @Srikan wget parses the HTML to find the links that it contains and recursively downloads (a selection of) those links. – Gilles, Oct 6 '16 at 21:05

• If the files don't have any internal links, does recursive download fail to get all the files? Let's say there is an HTTP folder of some txt files. Will wget succeed in getting all the files? Let me try it after this comment. – Srikan, Oct 15 '16 at 2:28

• @Srikan HTTP has no concept of directory. Recursive download means following links in web pages (including web pages generated by the server to show a directory listing, if the web server does this). – Gilles, Oct 15 '16 at 11:58

• wget supports ignoring robots.txt with the flag -e robots=off. Alternatively, you can avoid downloading it by rejecting it with -R "robots.txt". – Ryan Krage, Nov 13 '18 at 13:39

          22














Always works for me; I included --no-parent and recursive (-r) to only get the desired directory.



           wget --no-parent -r http://WEBSITE.com/DIRECTORY





answered Jan 31 '14 at 16:44 by stanzheng

            12














            In this case, curl is NOT the best tool. You can use wget with the -r argument, like this:



            wget -r http://example.com/ 


This is the most basic form, and you can use additional arguments as well. For more information, see the manpage (man wget).
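
For example (the directory URL and the .txt extension are only illustrative), --no-parent keeps wget from ascending above the starting directory and -A restricts which file types are kept:

 # recursive, stay below the start directory, keep only .txt files
 wget -r --no-parent -A txt http://example.com/somedir/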






answered Jan 23 '14 at 11:50 by moroccan, edited Jun 20 '14 at 15:35 by Canadian Luke

              5














              This isn't possible. There is no standard, generally implemented, way for a web server to return the contents of a directory to you. Most servers do generate an HTML index of a directory, if configured to do so, but this output isn't standard, nor guaranteed by any means. You could parse this HTML, but keep in mind that the format will change from server to server, and won't always be enabled.
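
If the server does generate such an index and you want to stay with curl, a rough sketch of that parsing approach is shown below. It assumes the listing uses plain relative href links ending in .txt; the URL and the pattern are placeholders and will differ from server to server:

 # fetch the auto-generated index page, pull out the .txt links,
 # then download each linked file under its remote name
 base=http://example.com/somedir/
 curl -s "$base" |
   grep -o 'href="[^"]*\.txt"' |
   sed 's/^href="//; s/"$//' |
   while read -r f; do
     curl -O "$base$f"
   done

In practice this is fragile, which is why the recursive wget answers elsewhere on this page are usually the easier route.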






answered Oct 17 '10 at 17:59 by Brad

• Look at this app called Site Sucker (sitesucker.us). How do they do it? – Foo, Oct 17 '10 at 18:09

• They parse the HTML file and download every link in it. – Brad, Oct 17 '10 at 18:14

• Using wget or curl? – Foo, Oct 17 '10 at 18:17

• @Brad: curl doesn't parse the HTML, but wget does precisely this (it's called recursive retrieval). – Gilles, Oct 17 '10 at 20:00

• Ah, well I stand corrected! gnu.org/software/wget/manual/html_node/… The OP should be aware that this still doesn't get what he is looking for... it only follows links that are available on the pages returned. – Brad, Oct 17 '10 at 20:13

              2














              You can use the Firefox extension DownThemAll!
              It will let you download all the files in a directory in one click. It is also customizable and you can specify what file types to download. This is the easiest way I have found.






answered Jan 20 '13 at 0:08 by Asdf

                0














You might find a use for a website ripper here; it will download everything and modify the contents/internal links for local use. A good one can be found here: http://www.httrack.com






answered Jan 23 '14 at 12:44 by Gaurav Joseph

                  0














                  I used httrack on Mac:




                  brew install httrack
                  httrack http://...





answered by Sungryeul (new contributor)