Hosting with Amazon S3 and CloudFront
A post on using Amazon S3 and CloudFront for my website.
When I first started looking at this, I thought it would be a sizeable read, but in reality it wasn’t. I will take you through all the steps needed to get up and running with Amazon S3 and CloudFront to host and secure your website, and then walk through some typical costs and why I elected to go this route instead of self-hosting.
I spent some time looking at how to do this and was surprised how hard it was to find what ended up being the really simple steps needed to set this up. Amazon is always changing and improving what it does, so in the future this may become even easier.
The ‘aws’ command line
Firstly, I like using a command line if I am going to be doing the same thing over and over again, so I recommend installing the AWS command line utility. I am deeply embedded in the Apple ecosystem, so below are the steps I followed to install the utility on my Mac.
- Download and install the utility from https://awscli.amazonaws.com/AWSCLIV2.pkg
- Configure the ‘aws’ utility with your region, access key etc. by running aws configure:
AWS Access Key ID [None]: QERQEQEQEQEEXAMPLE
AWS Secret Access Key [None]: DFGDGDGDGDGBFGHEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json
Once you have completed this you should be able to use the command line utility to interact directly with the awesome power that is Amazon Web Services.
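A quick sanity check I find useful is to ask AWS who you are. If your keys are good, this returns your account and user details:

aws sts get-caller-identity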
Using S3 for Storage
First you need to log into the Console and create an S3 bucket. There are lots of posts out there about using S3 to host an unsecured (HTTP) website, and they mention that the bucket must have the same name as your website. That does not apply here: we are going to front (no pun intended) our site with CloudFront, so Amazon S3 is simply acting as a data store. I am using a single bucket to host this site, www.data-smith.ca, but in theory you could create different “folders” in your S3 bucket and just point CloudFront at one of them. For example, you might have different websites for blogs, testing, development etc. Once you have created your bucket with any name that makes sense for what you are doing, that is basically it. There is no need to enable the Static Website Hosting option, as S3 will not be hosting the site.
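If you would rather stay on the command line, creating the bucket is a one-liner (substitute your own bucket name and region):

aws s3 mb s3://www.data-smith.ca --region us-east-1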
Next, upload the contents of your static website to the S3 bucket. From my earlier posts you will know that I use Hugo, which conveniently places all the files and directory structure for my site in a single “public” folder. So in my case I simply used the following command to upload the entire static website to my S3 bucket:
aws s3 cp public s3://www.data-smith.ca/ --recursive
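For subsequent updates I find sync more convenient than cp, as it only uploads files that have changed (the --delete flag also removes files from the bucket that no longer exist locally):

aws s3 sync public s3://www.data-smith.ca/ --delete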
Note that at this point I am not worrying about policy documents to enable access to my S3 bucket; the bucket is private and only accessible by me.
Content Delivery with CloudFront
One of the main reasons for moving to AWS, along with simplicity, was my desire to serve my website over HTTPS. This is now expected by modern browsers, and to be honest I think anything not using HTTPS should be turned off and blocked. However, setting up and managing certs can be a pain, and one of the many benefits of AWS is the ability to use AWS Certificate Manager (ACM) to obtain free certificates that renew automatically. CloudFront is a fully fledged Content Delivery Network (CDN) with all the complexity and options that come with it. The documentation is pretty good with lots of examples, which I will not reproduce here. The basic high level steps (with a quick CLI sketch after the list) are:
- Create a Web Distribution - you will see your S3 bucket in the drop down to select as the Origin.
- Create an OAI (Origin Access Identity) - you will use this identity to access your S3 bucket.
- Select the Options to allow CloudFront to change your S3 Policies - this is the awesome bit :)
- Serve your traffic over HTTPS (you can redirect HTTP traffic to HTTPS as well)
- Create a Certificate (I use a wildcard certificate, but I may change that later, because you can!!)
- Add a Default Root Object (index.html)
- Allow your data to be cached in CloudFront (it will be faster and reduce S3 retrieval costs)
- Optionally Whitelist or Blacklist specific Countries from accessing your site.
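If you want to script the distribution itself, the CLI does have a shorthand for the simple case. Note this is only a sketch: it does not cover the OAI, certificate or geo-restriction options, which I found much easier to set up in the console.

aws cloudfront create-distribution \
    --origin-domain-name www.data-smith.ca.s3.amazonaws.com \
    --default-root-object index.html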
Once you have gone through this, save the distribution; it will take a few minutes to propagate. Don’t worry if you get something wrong, it is easy to update, fix or modify later.
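You can also watch for the distribution to finish deploying from the command line; the status will flip from “InProgress” to “Deployed” (substitute your own distribution ID):

aws cloudfront get-distribution --id YOURDISTRIBUTIONID --query 'Distribution.Status'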
Next, go back to your S3 bucket and you will see it has an updated policy, generated by CloudFront, which will look something like this:
{
    "Version": "2012-10-17",
    "Id": "PolicyForCloudFrontContent",
    "Statement": [
        {
            "Sid": "2",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity SOMEIDENTITY"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::www.data-smith.ca/*"
        }
    ]
}
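If you prefer the command line, you can pull the same policy down to check it:

aws s3api get-bucket-policy --bucket www.data-smith.ca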
The final step is to update your CNAME or aliases with your domain registrar to point at your CloudFront URL, which will look something like “d345df345.cloudfront.net”. I am using https://name.com, which provides the ability to specify something they call an ANAME, and that works well for me. Give everything a few minutes to propagate (sometimes longer) and you should be able to access your website.
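If you are impatient, you can check whether DNS has caught up by looking at what your hostname resolves to:

dig +short www.data-smith.ca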
An added bonus from CloudFront is the ability to block certain regions, which is remarkably granular and extensive. CloudFront will also provide pretty detailed information on how your site is being used and accessed, in addition to how many requests it is having to make to your S3 bucket (Origin) to fetch webpages to cache. As this is a small static site, the site will typically end up cached entirely in CloudFront, which makes it very snappy to navigate.
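One consequence of that caching: when you push new content to S3, CloudFront may keep serving the old copy until it expires. You can force a refresh with an invalidation (again, substitute your own distribution ID):

aws cloudfront create-invalidation --distribution-id YOURDISTRIBUTIONID --paths "/*"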
Lambda to the rescue
As previously mentioned, I am a fan of Hugo. One of the things I like about Hugo is its use of aesthetically pleasing URLs that do not include index.html but rather just end with the directory name. Hugo leverages a standard webserver behaviour: the server will implicitly look for and serve index.html if you just specify a directory. This works really well, but with CloudFront we have a problem: CloudFront is not a typical webserver, it is really a CDN solution. So where do we put our redirect rules?
With Lambda@Edge, what we really want to do is modify the request sent to S3 from our CloudFront distribution. If the request to our S3 bucket ends in a /, we want to change it to /index.html. Below is a snippet of Node.js that will do this for us:
'use strict';

exports.handler = (event, context, callback) => {
    // Extract the request from the CloudFront event that is sent to Lambda@Edge
    const request = event.Records[0].cf.request;

    // Extract the URI from the request
    const olduri = request.uri;

    // Match any '/' that occurs at the end of a URI and replace it with a default index
    const newuri = olduri.replace(/\/$/, '/index.html');

    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);

    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;

    // Return control to CloudFront with the modified request
    return callback(null, request);
};
To use this code, simply “author from scratch” a brand new function (I called mine index-request-redirect) and paste in the code above. You will need to create a new IAM role with some basic Lambda execution permissions to assign to the function, and then you can deploy it to Lambda@Edge. You will get asked a couple of questions about your CloudFront distribution and which event should trigger the function (for this rewrite, it needs to run on the request CloudFront sends to the origin) and then you are basically up and running. Just like CloudFront, Lambda has its own Dashboard, which will give you information on your invocations and how long they executed for.
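Before wiring it into CloudFront, you can sanity-check the function by invoking it with a minimal hand-built CloudFront event. This assumes the function name I used above; the --cli-binary-format flag is needed for raw JSON payloads in version 2 of the CLI:

aws lambda invoke --function-name index-request-redirect \
    --cli-binary-format raw-in-base64-out \
    --payload '{"Records":[{"cf":{"request":{"uri":"/posts/"}}}]}' \
    out.json
cat out.json

The output should show the URI rewritten to “/posts/index.html”.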
But what does it cost?
One last thing: the costs are just jaw-droppingly low. If you have a site getting 10,000 hits per month and serving a few GB of data per month (and I expect most of us would fall into this category) then your CloudFront and Lambda costs are tens of cents per month. Add in a couple more cents for your S3 storage and occasional access, and for a couple of dollars per year you do not have to worry about keeping your site up and patched, with the ability to Geo Block and also see who is using it. If you are self-hosting or even thinking about it, then try looking to the Clouds.
