How to use Nginx to proxy your S3 files

Chi Thuc Nguyen on 2020-10-18

There are many benefits to using Nginx to proxy your S3 (and other) files.

This tutorial walks you through the detailed steps to configure Nginx to proxy your files in S3 and leverage browser caching.

Inspect AWS S3 response headers

First, let's inspect the response headers of a file in AWS S3:

$ curl -I https://ucodevn.s3-ap-southeast-1.amazonaws.com/image_restored/1867/courses/42/17773/id365442.png

You will get a result similar to this:

HTTP/1.1 200 OK
x-amz-id-2: C8uQhUZqP8hZ3hACa3fFbG6l29kXYq/VS+i5otl9DNRHAHM4wnM+28Yk4LzDt9L/Q8OuMbLGeX4=
x-amz-request-id: 4A0191D9383DE48B
Date: Sun, 18 Oct 2020 01:47:31 GMT
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Accept-Ranges: bytes
Content-Type: image/png
Content-Length: 7481
Server: AmazonS3

The Cache-Control header is missing, but the headers needed for Conditional GET (ETag and Last-Modified) are already present. When we send back the ETag or Last-Modified value (this is how a browser's client-side cache works), we get HTTP 304 with no Content-Length and no body. In other words, the client (curl in our case) requests the resource and says that no data transfer is required unless the file has been modified on the server:

$ curl -I  https://ucodevn.s3-ap-southeast-1.amazonaws.com/image_restored/1867/courses/42/17773/id365442.png --header "If-None-Match: 356b5d3ba55b3f00bd30b071e298c5fe"

And we will get HTTP 304 Not Modified with no data re-sent:

HTTP/1.1 304 Not Modified
x-amz-id-2: 3HoshGpkZODJGCObVCv37r95mX2JI01A/CI9S7Kqj8uquI8GuNELi+a/1D0G7Sfu/GsTfiulT1U=
x-amz-request-id: 9C4B0A7CFF6107BB
Date: Sun, 18 Oct 2020 01:49:22 GMT
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Server: AmazonS3

Or use the If-Modified-Since header:

$ curl -I  https://ucodevn.s3-ap-southeast-1.amazonaws.com/image_restored/1867/courses/42/17773/id365442.png --header "If-Modified-Since: Mon, 27 Jul 2020 07:48:22 GMT"

The result is similar:

HTTP/1.1 304 Not Modified
x-amz-id-2: 2JuMRfdvkgnL8O5J07H4H5J4U7D7h7KOMHfzkDuqGiUIREduB39gFc6fIxgWm+419M564IsR2pY=
x-amz-request-id: 80E743CCBC58B5DC
Date: Sun, 18 Oct 2020 01:51:45 GMT
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Server: AmazonS3
Connection: close
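
The ETag check above can also be scripted so that the value does not have to be copied by hand. The following is a minimal sketch, not a definitive implementation: the function names extract_etag and revalidate are hypothetical helpers introduced here for illustration, and the sketch assumes curl and awk are installed.

```shell
# Read raw response headers on stdin and print the ETag value (quotes included).
# Header names are matched case-insensitively; carriage returns are stripped.
extract_etag() {
  tr -d '\r' | awk 'tolower($1) == "etag:" { print $2 }'
}

# Hypothetical helper: fetch the headers of URL $1, then repeat the request
# with If-None-Match set to the ETag from the first response and print only
# the status line of the second response.
revalidate() {
  etag=$(curl -sI "$1" | extract_etag)
  curl -sI "$1" --header "If-None-Match: $etag" | head -n 1
}
```

Calling revalidate with the URL of an unchanged object should print a 304 Not Modified status line.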

Proxy using Nginx

Use the configuration below to proxy your S3 files with Nginx. Note that there is no trailing slash (/) at the end of the proxy_pass URL.

server {
  listen 80;
  listen 443 ssl;
  server_name  statics.yourside.com;
  access_log  /var/log/nginx/statics.access.log  combined;
  error_log   /var/log/nginx/statics.error.log;
  set $bucket "ucodevn.s3-ap-southeast-1.amazonaws.com";
  sendfile on;
  location / {
    resolver 8.8.8.8;
    proxy_http_version     1.1;
    proxy_redirect off;
    proxy_set_header       Connection "";
    proxy_set_header       Authorization '';
    proxy_set_header       Host $bucket;
    proxy_set_header       X-Real-IP $remote_addr;
    proxy_set_header       X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_hide_header      x-amz-meta-server-side-encryption;
    proxy_hide_header      x-amz-server-side-encryption;
    proxy_hide_header      Set-Cookie;
    proxy_ignore_headers   Set-Cookie;
    proxy_intercept_errors on;
    add_header             Cache-Control max-age=31536000;
    proxy_pass             https://$bucket; # without trailing slash
  }
}

Restart Nginx:

$ sudo nginx -t
$ sudo service nginx restart

And try to access your file via your own URL (use -k in case you don’t have an SSL certificate yet):

$ curl -I -k https://statics.yourside.com/{path_to_your_file}

For instance:

$ curl -I -k https://cdn.ucode.vn/image_restored/1867/courses/42/17773/id365442.png

You will now see the Cache-Control header, as follows. Note that the Server is now nginx, not AmazonS3.

HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Sun, 18 Oct 2020 02:11:57 GMT
Content-Type: image/png
Content-Length: 7481
Connection: keep-alive
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Accept-Ranges: bytes
Cache-Control: max-age=31536000
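
The max-age value in the Cache-Control header is expressed in seconds, and 31536000 corresponds to 365 days. As a small sketch of the arithmetic (max_age_for_days is a hypothetical helper name, not part of Nginx), you can compute a value for a different lifetime like this:

```shell
# Convert a cache lifetime in days to the number of seconds used by max-age.
max_age_for_days() {
  echo $(( $1 * 24 * 60 * 60 ))
}

max_age_for_days 365   # prints 31536000
```

A shorter lifetime may be a better fit for files that can change under the same URL, since browsers will not re-request the file until max-age expires.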

Now we can try requesting the file through the Nginx proxy with a Conditional GET:

$ curl -I -k https://cdn.ucode.vn/image_restored/1867/courses/42/17773/id365442.png --header "If-None-Match: 356b5d3ba55b3f00bd30b071e298c5fe"

We get the same HTTP 304 status that AmazonS3 returned, but this time the response is served via the Nginx server:

HTTP/1.1 304 Not Modified
Server: nginx/1.18.0 (Ubuntu)
Date: Sun, 18 Oct 2020 02:24:46 GMT
Connection: keep-alive
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Cache-Control: max-age=31536000

Proxy using Nginx with proxy cache

Add another sub-path to serve files via the proxy cache. We also add an X-Cache-Status header; its value is MISS until the cache is warmed up by the first request.

Nginx configuration:

proxy_cache_path   /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m inactive=60m use_temp_path=off;
server {
  listen 80;
  listen 443 ssl;
  server_name  cdn.ucode.vn;
  access_log   /var/log/nginx/ucode-cdn.access.log  combined;
  error_log   /var/log/nginx/ucode-cdn.error.log;
  set $bucket "ucodevn.s3-ap-southeast-1.amazonaws.com";
  sendfile        on;
  # This configuration uses a 60 minute cache for files requested:
  location ^~ /cached/ {
    rewrite           /cached(.*) $1 break;
    resolver 8.8.8.8;
    proxy_cache            s3_cache;
    proxy_http_version     1.1;
    proxy_redirect off;
    proxy_set_header       Connection "";
    proxy_set_header       Authorization '';
    proxy_set_header       Host $bucket;
    proxy_set_header       X-Real-IP $remote_addr;
    proxy_set_header       X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_hide_header      x-amz-meta-server-side-encryption;
    proxy_hide_header      x-amz-server-side-encryption;
    proxy_hide_header      Set-Cookie;
    proxy_ignore_headers   Set-Cookie;
    proxy_cache_revalidate on;
    proxy_intercept_errors on;
    proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
    proxy_cache_lock       on;
    proxy_cache_valid      200 304 60m;
    add_header             Cache-Control max-age=31536000;
    add_header             X-Cache-Status $upstream_cache_status;
    proxy_pass             https://$bucket; # without trailing slash
  }
  # This configuration provides direct access to the Object Storage bucket:
  location / {
    resolver 8.8.8.8;
    proxy_http_version     1.1;
    proxy_redirect off;
    proxy_set_header       Connection "";
    proxy_set_header       Authorization '';
    proxy_set_header       Host $bucket;
    proxy_set_header       X-Real-IP $remote_addr;
    proxy_set_header       X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_hide_header      x-amz-meta-server-side-encryption;
    proxy_hide_header      x-amz-server-side-encryption;
    proxy_hide_header      Set-Cookie;
    proxy_ignore_headers   Set-Cookie;
    proxy_intercept_errors on;
    add_header             Cache-Control max-age=31536000;
    proxy_pass             https://$bucket;
  }
}

On the first request, X-Cache-Status should be MISS:

$ curl -I -k https://cdn.ucode.vn/cached/image_restored/1867/courses/42/17773/id365442.png --header "If-None-Match: 356b5d3ba55b3f00bd30b071e298c5fe"

After that, a new cached file is created somewhere inside the /tmp/ folder, as configured above.

HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Sun, 18 Oct 2020 03:11:17 GMT
Content-Type: image/png
Content-Length: 7481
Connection: keep-alive
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Cache-Control: max-age=31536000
X-Cache-Status: MISS
Accept-Ranges: bytes
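
To find the cached file inside /tmp/, it helps to know how Nginx lays out the cache: the file name is the MD5 hash of the cache key, and with levels=1:2 the first directory is the last character of that hash and the second directory is the two characters before it. The sketch below rests on assumptions: cache_path_for_key is a hypothetical helper, the cache key is assumed to be the default ($scheme$proxy_host$request_uri, since the configuration above does not set proxy_cache_key), $proxy_host is assumed to evaluate to the bucket host name, and md5sum from GNU coreutils is assumed to be available.

```shell
# Print the expected cache file path for a levels=1:2 cache.
#   $1 = cache root directory (e.g. /tmp/)
#   $2 = cache key string
cache_path_for_key() {
  hash=$(printf '%s' "$2" | md5sum | cut -d' ' -f1)
  level1=$(printf '%s' "$hash" | cut -c32)      # last hex character
  level2=$(printf '%s' "$hash" | cut -c30-31)   # the two characters before it
  printf '%s/%s/%s/%s\n' "${1%/}" "$level1" "$level2" "$hash"
}

# Assumed default key: scheme + proxied host + original request URI.
cache_path_for_key /tmp/ "httpsucodevn.s3-ap-southeast-1.amazonaws.com/cached/image_restored/1867/courses/42/17773/id365442.png"
```

If the printed path does not exist after a MISS, the key assumption is probably wrong for your setup; listing the /tmp/ subdirectories directly is then the simpler check.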

Once the cache has been warmed up by the first request, subsequent requests are served directly from our Nginx without re-fetching from the origin (AmazonS3):

$ curl -I -k https://cdn.ucode.vn/cached/image_restored/1867/courses/42/17773/id365442.png --header "If-None-Match: 356b5d3ba55b3f00bd30b071e298c5fe"

The headers will now look like this (with a much shorter response time):

HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Sun, 18 Oct 2020 03:14:19 GMT
Content-Type: image/png
Content-Length: 7481
Connection: keep-alive
Last-Modified: Mon, 27 Jul 2020 07:48:22 GMT
ETag: "356b5d3ba55b3f00bd30b071e298c5fe"
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes
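
The MISS-then-HIT sequence can be checked with a short script instead of reading the headers by eye. This is a sketch under assumptions: cache_status and probe_cache are hypothetical helper names, and the X-Cache-Status header is the one added by the configuration above.

```shell
# Read raw response headers on stdin and print the X-Cache-Status value.
cache_status() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-cache-status" { print $2 }'
}

# Hypothetical helper: request URL $1 twice and print the cache status of
# each response; "none" is printed when the header is absent.
probe_cache() {
  for attempt in 1 2; do
    status=$(curl -sI -k "$1" | cache_status)
    printf 'request %s: %s\n' "$attempt" "${status:-none}"
  done
}
```

Running probe_cache against a /cached/ URL that has not been requested before should report MISS for the first request and HIT for the second.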

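Some of the optimized settings used above are based on the official Nginx documentation. The Nginx S3 configuration above uses caching settings that include the following options: proxy_cache_revalidate (expired cache items are revalidated with conditional requests), proxy_cache_use_stale (stale content may be served when the origin returns an error, times out, or while an item is being updated), and proxy_cache_lock (only one request at a time populates a missing cache item).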

Happy Ops-ing!!