Taking over taxi020.se, yet another S3 rookie mistake

I was browsing the websites of the different taxi companies operating in Stockholm this morning when I ended up on taxi020.se, which responded with an HTTP 404 error. As I read the error code and message, my search for the terms and conditions on waiting time at the airport suddenly got more interesting.

Taxi 020 merged with Sverigetaxi in 2016 and is now part of the Cabonline Group.

404 Not Found

For anyone familiar with the Amazon Web Services (AWS) ecosystem, the following screenshot of the HTTP 404 error message gives away a few hints about the website’s underlying infrastructure.

Taxi020.se - 404 (NoSuchBucket)

  1. taxi020.se is served out of an S3 bucket (and therefore is hosted on AWS)
  2. The S3 bucket does not exist (and might be up for grabs)
  3. The missing S3 bucket should be named taxi020.se
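
The same reading of the error page can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the assumption is that the error body contains the `NoSuchBucket` code and the bucket name either as HTML list items (website endpoint) or as XML elements (REST endpoint). The sample body below is illustrative, not the exact response I received.

```python
import re
from typing import Optional

# Assumed markers: "Code: NoSuchBucket" (HTML error page) or
# "<Code>NoSuchBucket</Code>" (XML error document).
_CODE = re.compile(r"(?:<Code>|Code:\s*)NoSuchBucket")
_NAME = re.compile(r"(?:<BucketName>|BucketName:\s*)([A-Za-z0-9.\-]+)")


def missing_bucket_name(status: int, body: str) -> Optional[str]:
    """Return the bucket name if the response looks like an S3 NoSuchBucket error."""
    if status != 404 or not _CODE.search(body):
        return None
    match = _NAME.search(body)
    return match.group(1) if match else None


# Illustrative body, loosely modelled on an S3 website endpoint error page.
sample = (
    "<h1>404 Not Found</h1><ul>"
    "<li>Code: NoSuchBucket</li>"
    "<li>Message: The specified bucket does not exist</li>"
    "<li>BucketName: taxi020.se</li></ul>"
)
print(missing_bucket_name(404, sample))
```

A `None` result only means the body did not match these assumed markers; it does not prove that the site is safe.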

I obviously needed to try something out and signed in to the AWS console.

Let’s take over

Taking over (sort of) taxi020.se was just a few clicks away and soon I had:

  1. created an S3 bucket named taxi020.se in the eu-west-1 region (the most probable choice)
  2. enabled static website hosting and configured it to redirect all requests to this very blog
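
I did all of this in the console, but the two steps above roughly correspond to the following sketch. It assumes a boto3 S3 client is passed in (`create_bucket` and `put_bucket_website` are boto3 client methods); the helper names and the `blog.example.com` redirect target are placeholders of mine, not the real values.

```python
def redirect_all_config(target_host: str, protocol: str = "https") -> dict:
    """Build a website configuration that redirects every request to another host."""
    return {"RedirectAllRequestsTo": {"HostName": target_host, "Protocol": protocol}}


def claim_bucket(s3, bucket: str, region: str, target_host: str) -> None:
    """Create the bucket and turn it into a redirect-only static website.

    `s3` is expected to behave like boto3.client("s3", region_name=region).
    Note: for us-east-1 the CreateBucketConfiguration argument must be omitted.
    """
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=redirect_all_config(target_host),
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   claim_bucket(boto3.client("s3", region_name="eu-west-1"),
#                "taxi020.se", "eu-west-1", "blog.example.com")
```

Only run something like this against a domain you own or are authorised to test.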

Taxi020.se - S3 bucket configuration

I could now just point my web browser to https://taxi020.se and follow the network requests to see all this in action.

Taxi020.se - Network requests

What went wrong?

Whether the missing S3 bucket was the result of a manual mistake by a user with excessive privileges, of an automated deployment gone wrong (a CloudFormation template lacking a DeletionPolicy: Retain on the S3 bucket resource, for example) or of something completely different is for the engineers at Cabonline Technologies to figure out. I’m just throwing out a few educated guesses here, along with some ideas on how the infrastructure should have been provisioned.
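
To make the DeletionPolicy guess concrete, here is a minimal sketch that generates a CloudFormation template in which the bucket survives the deletion of its stack. The logical ID `WebsiteBucket` and the function name are arbitrary choices of mine, and I have no idea what Cabonline’s actual templates look like.

```python
import json


def bucket_template(logical_id: str = "WebsiteBucket") -> dict:
    """Return a CloudFormation template with a bucket that is kept on stack deletion."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                # Without this attribute, deleting the stack also deletes
                # the bucket (CloudFormation refuses if it is not empty).
                "DeletionPolicy": "Retain",
            }
        },
    }


template = bucket_template()
print(json.dumps(template, indent=2))
```

With `Retain`, the bucket (and therefore its globally unique name) stays in your account even if the stack is torn down by mistake.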

Do NOT point a DNS record directly to an S3 static website endpoint
Bucket names are globally unique.

According to the Setting up a Static Website Using a Custom Domain walkthrough, “the bucket name must match the name of the website that you are hosting.” That is a problem if a bucket named after the domain you want to host a static website for already exists in AWS.
Because bucket names are globally unique, someone else may already have created a bucket with the name you need (your domain name) and… there is nothing you can do about it. The reverse also holds: if your bucket is deleted while the DNS record still points to the S3 website endpoint, anyone can recreate it in their own account and serve content on your domain, which is exactly what happened here.

Use a Content Delivery Network (CDN) like Amazon CloudFront
Serving content directly out of S3 is not optimal.

Serving content directly out of an S3 bucket located in a single AWS region is not optimal from a user experience point of view. Every user requesting content from taxi020.se (most of them probably located in Sweden) has to put up with a request/response round trip to Ireland (the eu-west-1 AWS region), which means extra latency and slower load times.

It is not optimal from a cost perspective either: without a cache in front of the bucket, every page view fetches all of the page’s resources (images, CSS files, …) directly from S3, which translates to many GET operations against the S3 service (priced per 1,000 requests) and unnecessary data transfer (priced per GB) out of S3 to the Internet.

Improved infrastructure

S3, CloudFront & Origin Access Identity
Tighten security, improve performance and cut cost.

Part 1
One can serve a static website directly out of S3 and have such a solution up and running in a matter of minutes (provided that the bucket can be created), but putting CloudFront in front of it does not require much extra work and works regardless of the S3 bucket’s name:

  1. Create an S3 bucket (or keep the existing one and disable Static website hosting) with a name that makes sense but does not have to match the domain name
  2. Create a CloudFront distribution for the domain…
  3. … with an origin pointing to the S3 bucket endpoint

Part 2
To further improve the solution, requests to the S3 bucket should only be allowed via the CloudFront distribution so that users cannot retrieve objects directly from S3 and bypass the content delivery network:

  1. Create a CloudFront Origin Access Identity
  2. Enable the Restrict Bucket Access option on the CloudFront distribution S3 origin using the Origin Access Identity created
  3. Create/Replace the S3 bucket policy to only allow the CloudFront distribution to get objects from the bucket
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity my_origin_access_identity_id"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my_bucket_name/*"
        }
    ]
}
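
If you would rather generate that policy than edit it by hand, a small helper along these lines could do it. It is a sketch: the function name is mine, and the bucket name and identity ID in the example are placeholders. Note that the ID in the principal ARN is that of the Origin Access Identity, not of the distribution.

```python
import json


def oai_read_policy(bucket_name: str, oai_id: str) -> dict:
    """Build a bucket policy that lets one CloudFront Origin Access Identity read objects."""
    return {
        "Version": "2008-10-17",  # same policy version as the document above
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": (
                        "arn:aws:iam::cloudfront:user/"
                        f"CloudFront Origin Access Identity {oai_id}"
                    )
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Placeholder values; substitute your own bucket name and identity ID.
policy = oai_read_policy("my_bucket_name", "my_origin_access_identity_id")
print(json.dumps(policy, indent=4))
```

The resulting JSON can be pasted into the bucket policy editor or, assuming boto3, passed as a string to the S3 client’s `put_bucket_policy` call.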

Restricting access to the S3 bucket to the CloudFront distribution with an Origin Access Identity comes at a price, though: CloudFront must then use the S3 REST endpoint instead of the website endpoint, and the REST endpoint does not support redirection to default index pages (see part 2 of this blog series).
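
To give an idea of what working around that limitation can look like, here is a sketch of a commonly used approach: a small function attached to the distribution as a Lambda@Edge origin-request handler that appends the index document to directory-style URIs. Treat it as an assumption-laden illustration; the function names and the `index.html` default are mine, and it is not necessarily what part 2 ends up doing.

```python
def rewrite_index_uri(uri: str, index_document: str = "index.html") -> str:
    """Append the index document to URIs that point at a 'directory'."""
    if uri.endswith("/"):
        return uri + index_document
    return uri


def handler(event, context):
    """Origin-request handler: rewrite the URI before CloudFront calls S3."""
    request = event["Records"][0]["cf"]["request"]
    request["uri"] = rewrite_index_uri(request["uri"])
    return request


# Local check with a minimal, hand-written event (not a full CloudFront event).
fake_event = {"Records": [{"cf": {"request": {"uri": "/about/"}}}]}
print(handler(fake_event, None)["uri"])
```

URIs without a trailing slash (such as `/about`) are left untouched in this sketch; handling them would need an explicit redirect or a naming convention.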

Part 3
Last but not least, point the DNS record to the CloudFront distribution.

Template your infrastructure and forget about names
Infrastructure as code to the rescue.

Even though this more secure and cost-effective infrastructure is only a few more clicks away from the straight-out-of-an-S3-bucket kind of static website hosting, it is a bit more cumbersome to create and maintain.

There is, of course, a way to create, update and delete the needed resources and make sure they work together nicely: infrastructure as code, which in AWS means CloudFormation templates and stacks (see part 2 of this blog series).

Taxi020.se is now www.sverigetaxi.se

While this was not the hack of the year, it once again underlines the importance of securing the resources needed to host static assets on Amazon Simple Storage Service (S3), and I hope that some of the improvements described in this blog entry, if not all of them, make sense.

The guys at Taxi 020 solved the issue after a couple of hours and before I could publish this blog post. https://taxi020.se is now redirected to https://www.sverigetaxi.se/ which points to… a CloudFront distribution.

Part 2: Static website hosting on AWS with S3, CloudFront & Lambda@Edge