美思 Tech Notes: Fast Parallel Download with lftp

My professor asked me to download some RNA sequence data to my own computer as a backup, because the original data server was going to purge it. The data is dozens of gigabytes in size, and my home network connection is not fast. I therefore chose lftp to speed up the download by fetching in parallel.

lftp is a sophisticated file transfer program that supports several network protocols, e.g. FTP, FTPS, HTTP, HTTPS, HFTP, FISH, SFTP, and BitTorrent. lftp can be run interactively or in batch mode, and you can put simple or complex command sequences in a script file.

A basic lftp command looks like this:


```console
$ lftp -e "get -c http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso"
```

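`get` downloads a file. `-c` continues a previously interrupted download. Note that `-e` runs the given commands and then leaves you at the lftp prompt; append `; exit` to the command string (or use `lftp -c` instead) if you want lftp to quit when the download finishes.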

If you need to download several files, use `mget`, which supports wildcard expansion.

```console
$ lftp -e "mget -c http://releases.ubuntu.com/14.04.1/*.iso"
```

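If you need to download a file in parallel, replace `get` with `pget` and set the number of connections with `-n (number)`. The `mirror:use-pget-n (number)` setting only takes effect when the `mirror` command transfers files, so it does not change how a plain `pget` behaves.

```console
$ lftp -e "pget -n 3 -c http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso"
```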
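If you run parallel downloads often, a small wrapper can keep the connection count explicit and reject typos before lftp starts. The following is a minimal sketch, not something lftp ships: `pget_command` is a hypothetical helper name, and the default of 3 connections is an assumption chosen to stay gentle on the server.

```shell
#!/bin/sh
# pget_command is a hypothetical helper, not part of lftp.
# It prints the lftp commands for a resumable, segmented download.
#   $1 = URL to fetch
#   $2 = number of parallel connections (assumed default: 3)
pget_command() {
    url=$1
    conns=${2:-3}
    # Reject anything that is not a plain positive integer.
    case $conns in
        ''|*[!0-9]*)
            echo "connections must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$conns" -lt 1 ]; then
        echo "connections must be a positive integer" >&2
        return 1
    fi
    # "exit" makes lftp quit once the transfer is done.
    printf 'pget -n %s -c %s; exit\n' "$conns" "$url"
}

# Usage (needs lftp and network access):
#   lftp -e "$(pget_command http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso 3)"
```

Keeping the number small matters more than the wrapper itself; a handful of connections is usually enough to saturate a slow home link.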

To run more than one command, separate them with `;` (semicolon):

```console
$ lftp -e "get -c http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso; get -c http://releases.ubuntu.com/14.04.1/ubuntu-14.04-server-amd64.iso"
```
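When the list of files grows, a long semicolon-separated string gets hard to read. Since lftp can also read its commands from a script file (`lftp -f`), one option is to generate that file with one command per line. This is a rough sketch under stated assumptions: `write_lftp_script` and the file name `downloads.lftp` are made up for illustration.

```shell
#!/bin/sh
# write_lftp_script is a hypothetical helper, not part of lftp.
# It writes one resumable "get" per URL into a script file that
# can later be run in batch mode with `lftp -f`.
#   $1      = path of the script file to create
#   $2...   = URLs to download
write_lftp_script() {
    script=$1
    shift
    {
        for url in "$@"; do
            printf 'get -c %s\n' "$url"
        done
    } > "$script"
}

# Usage (needs lftp and network access):
#   write_lftp_script downloads.lftp \
#       http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso \
#       http://releases.ubuntu.com/14.04.1/ubuntu-14.04-server-amd64.iso
#   lftp -f downloads.lftp
```

Because every line uses `get -c`, rerunning the same script after an interruption should pick up where the previous run stopped.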

Be aware of the risk of being banned when downloading in parallel, and use the feature with caution. For that reason, I downloaded the data to an intermediate server first and later fetched it to my own computer. The intermediate server was a DigitalOcean droplet. The download speed on the droplet is in the tens of megabytes per second, so downloading to the droplet does not take long. You can then download the data again in parallel, this time from the droplet, without overloading the original data server.
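The two-hop workflow described above can be sketched roughly as follows. Everything specific here is an assumption for illustration: the origin URL, the `user@relay.example.com` login, the `/srv/backup` directory, and the helper names `run` and `relay_fetch` are placeholders, and the sketch assumes lftp is installed on both machines and that the intermediate server is reachable over SSH/SFTP. By default it only prints the commands it would run; how lftp maps the URL path onto the SFTP server is worth checking on your own setup.

```shell
#!/bin/sh
# Sketch of the two-hop transfer. Hosts, paths, and helper names are
# placeholders for illustration, not values from the article.
ORIGIN_URL=${ORIGIN_URL:-http://data.example.org/rna/sample.fastq.gz}
RELAY=${RELAY:-user@relay.example.com}   # SSH login on the intermediate server
RELAY_DIR=${RELAY_DIR:-/srv/backup}      # absolute directory on that server

# run prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

relay_fetch() {
    file=${ORIGIN_URL##*/}   # file name = last path component of the URL
    # Hop 1: on the relay, a single connection to the original server.
    run ssh "$RELAY" "lftp -e 'get -c $ORIGIN_URL -o $RELAY_DIR/$file; exit'"
    # Hop 2: at home, three parallel connections to the relay only.
    run lftp -e "pget -n 3 -c sftp://$RELAY$RELAY_DIR/$file; exit"
}

# Usage:
#   relay_fetch               # dry run, prints both commands
#   DRY_RUN=0; relay_fetch    # actually run them
```

The point of the split is that the original server only ever sees one connection, while the parallel load lands on a machine you control.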

DigitalOcean droplets are billed by the hour, so renting a temporary server will not cost you much. You may try DigitalOcean or a similar virtual server hosting provider such as Linode. If you need a DigitalOcean promo code, use [this link](https://www.digitalocean.com/?refcode=bb01e632c755).
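
About the Author

Holding a master's degree in an information technology field, 美思 believes that the purpose of developing applications is to bring value to society. If the software can also become a sustainable project along the way, that is a win-win for developers and users.

美思 likes to solve all kinds of problems with open-source technology but does not rule out proprietary technology when necessary. In their spare time, 美思 writes up what they have learned and shares it on this site.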