Three Pitfalls when running an AWS S3 Static Website
Yet another one? 🥱
“Yet another blog on how to create a static S3 website?” you might think. 🥱 Yes, exactly! Nothing new here, except that this one prepares for real-life production usage.
Many of the coding examples available do not go beyond a simple proof of concept. This one is for running and operating a static serverless website for professional scenarios like your personal blog. Do not get me wrong: I do not think that the available examples out there are of bad quality! On the contrary, there are many great solutions and great sources of inspiration. But I simply could not find one sample project that checks all the boxes for go-live requirements in production.
There are some pitfalls in the form of security features that are easy to miss and that, from my perspective, are mandatory. Everything should be deployable using infrastructure-as-code (IAC) from a source code repo, making use of all the benefits AWS offers around security, availability, edge performance and what have you.
“What about the costs?” you might think now. We know that S3 storage and Cloudfront are fairly cheap for websites with a small footprint and low access rates, but what if we enable all the features mentioned above?
We will look at the bill at the end.
As the IAC tool we use Pulumi, to be able to deploy everything with one command.
The Pulumi Example Repo on GitHub served as the starting point. Some of the constructs used in there were deprecated back then, so I shamelessly copied the index.ts file and modified it for my own purpose.
The Setup and the Static Content (images, html pages, css, favicon, etc.)
The static page is built with HUGO, one of multiple JAMStack static website building frameworks, combined with a HUGO Blog Theme. In general, any other JAMStack framework (e.g. Next.js or Gatsby) should work equally well to build static websites hosted on S3. So we do not worry too much about HUGO at the moment. Just pick whichever you prefer most!
As cloud object store I chose AWS S3 Buckets (of course there is also Azure Blob Storage or Google Cloud Storage), as CDN I used Cloudfront, and Route53 serves as my hosted zone to host my DNS entries and validate the SSL certs. So far nothing special: you deploy the bucket, upload your files, hook it up with Cloudfront, create a domain-validated SSL certificate in ACM along with the corresponding DNS records; ready you are.
However, to make this stack suitable for daily life on the WWW, there are some pitfalls that one might not detect at first glance, and it is better to go around them. They are explained in the following.
🕳 Pitfall #0: Blocking Access to Cloudfront .net URLs
Usually you do not want to see anyone accessing your website through this weird-looking cloudfront.net URL containing the AWS-autogenerated ID as domain name; other than yourself of course, during development or in a QA environment if you have one for your infrastructure.
So you want to block direct access to cloudfront.net URLs and limit access to your own domain. Stephan Keable’s Medium Post describes and illustrates the concept of using the AWS WAF WebACL functionality very well, so for conceptual details please refer to his blog post. I simply copied his idea of limiting the access by introducing an allow-list with a regular expression that only lets requests pass whose Host header starts with your domain name. I transposed his guide into infrastructure code. Here is the Pulumi TypeScript for it:
The rule containing the regexString to allow access only to our target domain:
...
const blockDirectAccessToCloudfront: Input<IWebAclRule> = {
    action: {
        allow: {},
    },
    name: ruleName,
    priority: 100,
    statement: {
        regexMatchStatement: {
            fieldToMatch: {
                singleHeader: {
                    name: "host",
                },
            },
            regexString: `^${config.targetDomain}|^www.${config.targetDomain}`,
            textTransformations: [{
                priority: 100,
                type: 'NONE',
            }]
        },
    },
    ...
};
const rules = [
    blockDirectAccessToCloudfront,
    // Add more Rules here if needed
];
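One detail worth a second look is the regexString itself: the dots in the domain are not escaped and the pattern is only anchored at the start, so a Host value such as example.com.evil.net would also match. The following is a minimal sketch of a stricter variant, not the code from the repository. The helper names escapeRegex and buildHostRegex are made up for illustration, and the sketch assumes that the WAF regex engine treats anchors and escaped dots the same way JavaScript does.

```typescript
// Hypothetical helpers, not part of the original project.

// Escape characters that have a special meaning in a regular expression.
function escapeRegex(literal: string): string {
    return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches exactly the apex domain or its www subdomain.
// It keeps the alternation style of the original rule, but escapes the dots
// and anchors both alternatives at the end as well.
function buildHostRegex(targetDomain: string): string {
    const escaped = escapeRegex(targetDomain);
    return "^" + escaped + "$|^www\\." + escaped + "$";
}

// Quick local check with the JavaScript regex engine.
const hostPattern = new RegExp(buildHostRegex("example.com"));
console.log(hostPattern.test("example.com"));                   // true
console.log(hostPattern.test("www.example.com"));               // true
console.log(hostPattern.test("d111111abcdef8.cloudfront.net")); // false
console.log(hostPattern.test("example.com.evil.net"));          // false
```

If you adopt something like this, the rule would use regexString: buildHostRegex(config.targetDomain) instead of the inline template string.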
The WebACL itself. Note the array of rules from above going into the construct:
const webAcl = new aws.wafv2.WebAcl("block-direct-access-to-cloudfront", {
    defaultAction: {
        block: {}, // Block everything except what's allowed by the rules in the rule array
    },
    description: "Web ACL to block direct access to the cfnid.cloudfront.net address",
    name: "webAclBlockDirectAccessToCloudfront",
    rules: rules,
    scope: "CLOUDFRONT",
    tags: tags,
    ...
}, { provider: usEastRegion });
Finally, you can pass the ARN of the WebACL to the Cloudfront DistributionArgs (for a WAFv2 WebACL, webAclId expects the ARN rather than the ID) and integrate the WAF to sit in front of all requests to your content, similar to a fence around your property.
const distributionArgs: aws.cloudfront.DistributionArgs = {
    ...
    comment: "Cloudfront Settings for a simple static S3 website",
    webAclId: webAcl.arn,
    aliases: distributionAliases,
    enabled: true,
    ...
🕳 Pitfall #1: Cloudfront functions
It turns out that Cloudfront cannot deal with index.html pages in subfolders by default, so we must first teach it how to. For a long time the way to do this on AWS was via Lambda@Edge functions that rewrite the URL (see details in the discussion here). However, building Lambdas seems a bit overpowered and cumbersome at the same time for this comparably simple modification. I guess that was the reason that AWS came up with something simpler, called “Cloudfront Functions”. Here is the AWS blog post announcing the feature.
In short, it helps you to “be closer to your users” similar to Lambda@Edge, but it is much simpler to implement, configure and integrate into Cloudfront.
Here is the code for it, which is more or less taken directly from this official AWS Sample collection and integrated as follows:
export function createCloudfrontUrlRewriterFunction(): string {
    return `
    function handler(event) {
        var request = event.request;
        var uri = request.uri;
        // Check whether the URI is missing a file name.
        if (uri.endsWith('/')) {
            request.uri += 'index.html';
        }
        // Check whether the URI is missing a file extension.
        else if (!uri.includes('.')) {
            request.uri += '/index.html';
        }
        return request;
    }
    `;
}
/**
 * Create a Cloudfront function for rewriting the URLs
 * */
export function provisionCloudfrontFunction(): aws.cloudfront.Function {
    return new aws.cloudfront.Function("rewrite-url-cf-function", {
        name: "rewrite-url",
        comment: "Handle cloudfront's inability to handle index.htmls in subfolders",
        runtime: "cloudfront-js-1.0",
        publish: true,
        code: createCloudfrontUrlRewriterFunction(),
    });
}
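To get a feeling for what the handler does before deploying it, the rewrite rule can be tried out locally. This is a small sketch that mirrors the two checks from the handler above in a plain TypeScript function. The name rewriteUri and the sample paths are made up for illustration, and indexOf/charAt are used only so the snippet compiles with older TypeScript targets as well.

```typescript
// Hypothetical local mirror of the rewrite logic in the Cloudfront function.
function rewriteUri(uri: string): string {
    // Same check as uri.endsWith('/'): the URI is missing a file name.
    if (uri.charAt(uri.length - 1) === "/") {
        return uri + "index.html";
    }
    // Same check as !uri.includes('.'): the URI is missing a file extension.
    if (uri.indexOf(".") === -1) {
        return uri + "/index.html";
    }
    // Anything that looks like a file is passed through untouched.
    return uri;
}

const samplePaths = ["/", "/blog/", "/blog/my-post", "/css/main.css"];
for (const path of samplePaths) {
    console.log(path + " -> " + rewriteUri(path));
}
// / -> /index.html
// /blog/ -> /blog/index.html
// /blog/my-post -> /blog/my-post/index.html
// /css/main.css -> /css/main.css
```

Note that the extension check is a plain "contains a dot" test, so a page path such as /posts/v1.2 would be passed through unchanged. For a typical HUGO site with slug-style URLs this should rarely matter.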
And again supply the Cloudfront DistributionArgs with the CF function:
const cfFunctions: pulumi.Input<pulumi.Input<aws.types.input.cloudfront.DistributionDefaultCacheBehaviorFunctionAssociation>[]> = [];
const cfBuildHeader = provisionCloudfrontFunction();
cfFunctions.push({
    eventType: "viewer-request",
    functionArn: cfBuildHeader.arn,
});
...
const distributionArgs: aws.cloudfront.DistributionArgs = {
    ...
    comment: "Cloudfront Settings for a simple static S3 website",
    webAclId: webAcl.arn,
    aliases: distributionAliases,
    enabled: true,
    ...
        // functionAssociations is part of the (elided) defaultCacheBehavior block
        functionAssociations: cfFunctions
    },
🕳 Pitfall #2: OAC
This is actually not a pitfall, but a big improvement to the Cloudfront service called “origin access control” (OAC) that lets Cloudfront access your bucket without exposing it to the public. So do not sacrifice security for a shortcut that allows public access to the S3 bucket. Use OAC instead to connect your distribution with the private bucket!
At the time of writing this post, there were not many coding examples available for this. This is likely to change or to have changed already. Here is the detailed post from AWS announcing the feature, and here are the relevant Pulumi code snippets:
Create the originAccessControl
const originAccessControl = new aws.cloudfront.OriginAccessControl("origin-access-control", {
    description: "Origin Access Control",
    originAccessControlOriginType: "s3",
    signingBehavior: "always",
    signingProtocol: "sigv4",
    name: "oac"
});
Finally, supply the Cloudfront DistributionArgs with the originAccessControlId from the originAccessControl created above:
const distributionArgs: aws.cloudfront.DistributionArgs = {
    ...
    origins: [
        {
            originId: contentBucket.arn,
            originAccessControlId: originAccessControl.id,
            domainName: contentBucket.bucketRegionalDomainName,
        },
    ],
    ...
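OAC only signs the requests. The private bucket still needs a bucket policy that allows the Cloudfront service principal to read objects on behalf of this one distribution. The snippets above do not show that part, so here is a minimal sketch of what such a policy document could look like, following the pattern from the AWS OAC documentation. The function name buildOacBucketPolicy and the example ARNs are made up for illustration. Wiring the result into the bucket (for example via an aws.s3.BucketPolicy resource) is left out here, and the project repository may solve this differently.

```typescript
// Hypothetical helper, not taken from the project repository.
// Builds a bucket policy that lets Cloudfront (and only the given
// distribution) read objects from the private content bucket.
function buildOacBucketPolicy(bucketArn: string, distributionArn: string): string {
    return JSON.stringify({
        Version: "2012-10-17",
        Statement: [
            {
                Sid: "AllowCloudFrontServicePrincipalReadOnly",
                Effect: "Allow",
                Principal: { Service: "cloudfront.amazonaws.com" },
                Action: "s3:GetObject",
                // All objects inside the bucket
                Resource: bucketArn + "/*",
                // Restrict access to requests coming from this distribution
                Condition: {
                    StringEquals: { "AWS:SourceArn": distributionArn },
                },
            },
        ],
    });
}

// Example with placeholder ARNs (assumptions for illustration only).
const examplePolicy = buildOacBucketPolicy(
    "arn:aws:s3:::my-content-bucket",
    "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE",
);
console.log(examplePolicy);
```

In a Pulumi program both ARNs are outputs, so the helper would typically be called inside pulumi.all([...]).apply(...) once the bucket and the distribution exist.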
Congrats! 🥳 You enabled a lot of real-life usage features for your static S3 website.
To view the full source code check out the project repository on GitHub.
Costs
[Update from 2021-10-13] To get the cost details, please have a look at this post that got published some time after.
Summary
(Never not afraid of showing unfinished work)
Coming soon… 🤞
Additional References
I mostly researched and looked for the right snippets that smart people created, and tried to make them fit into my solution. Here are all resources in addition to the ones already referenced directly in the text.
- Special thanks to vaga’s HUGO Blog Theme that served as a starting point for building static websites and provides the underlying theme for this blog website too.
- My Starting Example on Pulumi infrastructure: GitHub - Pulumi Examples
- Pulumi Docs - Starting Sample
- GitHub - Starting sample with outdated code
- WebACL to block direct access to cloudfront.net URL
- AWS Docs - Cloudfront Origin access control
- Cloudfront Functions
- AWS Docs - Troubleshooting Cloudfront
- Discussion on CF Function vs. Lambda@Edge
- AWS Cloudfront Sample Functions
- AWS Docs - Example Function Add Index
- GitHub - CF Functions for Pulumi