Quite recently I faced a small challenge when I wanted to download some video files from a web page.
Normally, when there are many videos to download, I use Firefox with the DownThemAll! extension.
Generally speaking, it works fine on Windows or Linux with a GUI. This time, however, there were more than 800 videos to download (don't ask for the details 😉),
so I decided to run the download from my Linux server in the cloud (it really doesn't matter where). My first attempt with plain wget was rejected with 403 Forbidden:
root@ubuntu:~ wget https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
--2018-09-07 23:44:36-- https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 18.104.22.168
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|22.214.171.124|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-09-07 23:44:37 ERROR 403: Forbidden.
The solution was to pass a few extra flags to wget so that the request looks like it comes from a browser:
wget -O "New filename.mp4" \
  --referer="http://www.google.com" \
  --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:126.96.36.199) Gecko/20070725 Firefox/188.8.131.52" \
  --header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header="Accept-Encoding: gzip,deflate" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Keep-Alive: 300" \
  -dnv "https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4"
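Let me explain the new wget switches:
-O "New filename.mp4" - saves the download under the given name instead of the default video.mp4; a name that contains spaces must be quoted
--referer=http://www.google.com - sets the Referer header. Useful for retrieving documents with server-side processing that assumes they are always being retrieved by interactive web browsers
--user-agent=agent-string - we pose as a browser, in our case Firefox; the value contains spaces and parentheses, so it must be quoted as well
--header - adds an extra HTTP request header; here we declare which content types, languages, encodings and charsets we accept
-dnv - short for -d -nv: -d turns on debug output, which includes the full request and response headers, and -nv turns off wget's regular verbose output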
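Since the original goal was to fetch several hundred files, the switches above can be wrapped in a small script. The following is only a minimal sketch under a few assumptions: the list file `urls.txt` (one URL per line), the function names `fetch` and `download_list`, and the shortened header set and example User-Agent string are all illustrative and not taken from my actual setup.

```shell
#!/bin/sh
# Sketch: download every URL listed in a text file with browser-like headers.
# fetch, download_list and the User-Agent value are illustrative assumptions.

# Call wget for a single URL. Every value containing spaces is quoted
# so that the shell passes it to wget as one argument.
fetch() {
  wget --referer="http://www.google.com" \
       --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" \
       --header="Accept-Language: en-us,en;q=0.5" \
       -nv "$1"
}

# Read URLs from the file given as $1, one per line.
# Blank lines and lines starting with '#' are skipped.
# Returns 0 only if every download succeeded.
download_list() {
  list_file=$1
  failed=0
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      ''|'#'*) continue ;;
    esac
    fetch "$url" || { echo "failed: $url" >&2; failed=$((failed + 1)); }
  done < "$list_file"
  [ "$failed" -eq 0 ]
}

# Usage (not executed here): download_list urls.txt
```

The loop is used here so that each failed URL is reported on its own line; wget can also read a URL list directly with its `-i` option if per-file reporting is not needed.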
With these flags the server finally answers with HTTP 200 OK, which means the download works:
setting --no (verbose) to 0
DEBUG output created by Wget 1.17.1 on linux-gnu.
Reading HSTS entries from /root/.wget-hsts
URI encoding = 'ANSI_X3.4-1968'
converted 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (ANSI_X3.4-1968) -> 'https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4' (UTF-8)
Caching s3-us-west-1.amazonaws.com => 184.108.40.206
Created socket 4.
Releasing 0x0000559b1541fd60 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 4 to SSL handle 0x0000559b154200f0
certificate:
  subject: CN=*.s3-us-west-1.amazonaws.com,O=Amazon.com Inc.,L=Seattle,ST=Washington,C=US
  issuer: CN=DigiCert Baltimore CA-2 G2,OU=www.digicert.com,O=DigiCert Inc,C=US
X509 certificate successfully verified and matches host s3-us-west-1.amazonaws.com
---request begin---
GET /bucket-name/video.mp4 HTTP/1.1
Referer: http://www.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:220.127.116.11) Gecko/20070725 Firefox/18.104.22.168
Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate
Host: s3-us-west-1.amazonaws.com
Connection: Keep-Alive
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
---request end---
---response begin---
HTTP/1.1 200 OK
x-amz-id-2: dGFs1NEVkw2xOzfkbIm1PWAR9zAYsXGOAUuqn5xz/LzgsKpGaNaTfv6HcKy4sfRDO8BSn0vcwt4=
x-amz-request-id: AA528E7DF42B458C
Date: Fri, 07 Sep 2018 15:58:36 GMT
Last-Modified: Thu, 30 Aug 2018 23:37:06 GMT
ETag: e18629b490b0253b379f8ddae566a438
Accept-Ranges: bytes
Content-Type: video/mp4
Content-Length: 942460456
Server: AmazonS3
---response end---
Registered socket 4 for persistent reuse.
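When running this for many files it may be handy to read the status code in a script instead of scanning the debug output by eye. The helper below is a rough sketch; the name `last_http_status` is my own invention, and it simply assumes that wget's `-d` or `-S` output contains status lines of the form `HTTP/1.1 200 OK`, as in the log above.

```shell
#!/bin/sh
# Sketch: print the status code of the last HTTP response found in wget's
# -d (debug) or -S (server response) output, read from standard input.
# The function name last_http_status is an illustrative assumption.
last_http_status() {
  # A status line starts with "HTTP/<version>" followed by the code.
  # The request line ("GET /path HTTP/1.1") does not match because its
  # first field is the method; awk ignores leading spaces when splitting.
  awk '$1 ~ /^HTTP\// { code = $2 } END { if (code != "") print code }'
}

# Usage (not executed here; wget prints these headers on stderr):
#   wget -S --spider "https://s3-us-west-1.amazonaws.com/bucket-name/video.mp4" 2>&1 | last_http_status
```

With `--spider` wget checks the URL without saving the file, so this can serve as a quick test before starting a large download; a server may answer such a check differently from a real download, so treat the result as a hint only.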