AWS CloudFront Architecture

Crishantha Nanayakkara
Sep 23, 2021

The AWS Content Delivery Network (CDN)

The Background

AWS CloudFront (CF) is AWS's fast, programmable and secure Content Delivery Network (CDN).

This article will summarize multiple key concepts related to AWS CF such as CF architecture, Edge Locations, Caching Process, Behaviors, Time To Live (TTL), Cache Hit Ratio, Cache maximization strategies, Cache invalidations, Origin security and Lambda@Edge.

Let's dive into them one by one!

Content Delivery Network (CDN)

A Content Delivery Network (CDN) is a global cache that stores copies of your data in Edge Location caches, which are positioned as close to your customers as possible.

A CDN is a good way to reduce web latency when the servers holding the content are far away from the users requesting it.

AWS CloudFront (CF)

CloudFront is the AWS CDN, and it is a Global (not Regional) service. It consists of a network of Edge Locations that are distributed across the globe. It is compliant with both PCI DSS (except for storing credit card information in the cache) and HIPAA standards.

To deliver content to end users with lower latency, Amazon CloudFront uses a global network of 225+ Points of Presence (215+ Edge locations and 13 regional mid-tier caches) in 90 cities across 47 countries. (Updated — May 2021, Source: AWS Documentation) (See Figure 01).

Figure 01 — AWS Edge Locations (Source: AWS Documentation)

CloudFront Components

There are multiple components in AWS CloudFront:

  1. CF Origin: The source location of your content
  2. CF Distribution: The configurable unit of CloudFront
  3. Edge Locations: The local caches of your data
  4. Regional Edge Caches: A larger version of an Edge location, which sits between the origin and a typical Edge location, primarily to improve performance [2].

CloudFront Caching Process

There are multiple steps involved in the CloudFront caching process (See Figure 02).

In the diagram (Figure 02), two users in the same region try to access a single file from the Origin (S3). Two Edge locations are connected to a single Regional Edge cache in that region, and each user is routed to a different Edge location.

Figure 02 — The CloudFront Architecture

Step 1: The user request lands on the closest Edge location, which checks whether the requested resource (an image) is available in its cache.

Step 2: If the content is available, the Edge location returns a successful response with the requested image. This is a “Cache Hit” scenario.

Step 3: If it is not available at the Edge location, this is a “Cache Miss” scenario and the Edge location requests the image from the Regional Edge cache. If the Regional Edge cache has it, the image is sent back to the requester.

Step 4: If not, the Regional Edge cache requests it from the origin.

Step 5 and 6: The image is returned to the requester and cached along the way.

Step 7: Another user tries to retrieve the same image that the first user requested. The second user is served by a different Edge location, the one closest to that user.

Step 8: Since the second Edge location does not have the image file (it was copied only to the first Edge location), it tries to get it from the Regional Edge cache, which the first user's request also went through. (Remember that multiple Edge locations can share the same Regional Edge cache.)

Step 9 and 10: Since the Regional Edge cache already has it, the image file is returned to the second user without contacting the origin.
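
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the lookup order (Edge location, then Regional Edge cache, then origin); the class names, the `/cat.jpg` key and the trail strings are assumptions made for this example and are not part of any AWS API.

```python
from typing import Dict, List


class Origin:
    """Stands in for the origin (for example an S3 bucket)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects

    def get(self, key: str, trail: List[str]) -> bytes:
        trail.append("origin:fetch")
        return self.objects[key]


class CacheTier:
    """An Edge location or Regional Edge cache that falls back to its parent."""

    def __init__(self, name: str, parent) -> None:
        self.name = name
        self.parent = parent
        self.store: Dict[str, bytes] = {}

    def get(self, key: str, trail: List[str]) -> bytes:
        if key in self.store:
            trail.append(f"{self.name}:hit")
            return self.store[key]
        trail.append(f"{self.name}:miss")
        body = self.parent.get(key, trail)  # ask the next tier
        self.store[key] = body              # keep a copy on the way back
        return body


def request(edge: CacheTier, key: str) -> List[str]:
    """Return the list of tiers visited while serving one request."""
    trail: List[str] = []
    edge.get(key, trail)
    return trail


origin = Origin({"/cat.jpg": b"image-bytes"})
regional = CacheTier("regional", origin)
edge_a = CacheTier("edge-a", regional)
edge_b = CacheTier("edge-b", regional)

first_user = request(edge_a, "/cat.jpg")   # Steps 1 to 6
second_user = request(edge_b, "/cat.jpg")  # Steps 7 to 10
```

In this sketch the first request travels all the way to the origin, while the second one stops at the shared regional tier.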

CloudFront Behaviors

A Behavior is a configuration within an AWS CF Distribution (See Figure 03).

A distribution can have many behaviors, each configured with a path pattern. If a request matches a pattern, that particular behavior is used. If no pattern matches, the default behavior is used.

CloudFront Behaviors control settings such as which Origin is used, the TTLs, the protocol policies and the privacy settings for matching requests.

Figure 03 — CloudFront Behaviors
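
As a rough sketch of how a path pattern selects a behavior, the snippet below walks an ordered list of patterns and falls back to the default. The patterns and behavior names are hypothetical, and Python's `fnmatchcase` is used only as an approximation of CloudFront's `*` and `?` wildcards (it also accepts `[...]` character sets).

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# (path pattern, behavior name) in precedence order. All names are hypothetical.
BEHAVIORS: List[Tuple[str, str]] = [
    ("/images/*.jpg", "long-ttl-images"),
    ("/api/*", "no-cache-api"),
]
DEFAULT_BEHAVIOR = "default"


def select_behavior(path: str) -> str:
    """Return the first behavior whose pattern matches, else the default."""
    for pattern, name in BEHAVIORS:
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return name
    return DEFAULT_BEHAVIOR
```

For example, `select_behavior("/images/cat.jpg")` picks the image behavior, while `select_behavior("/index.html")` falls through to the default.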

The TTL

TTL is the time an object stays active (is treated as fresh) at an Edge location. By default the TTL value is 24 hours.

Even if there is a newer copy of the requested file at the Origin, if the request reaches the Edge location within the TTL, the Edge location returns its own cached copy to the client.

Once the file expires (the TTL is exceeded), the Edge location checks with the origin on the next request to see whether the file has changed compared to the cached copy.

If it has not changed → The origin returns a 304 Not Modified response, and the Edge location keeps serving its cached copy.

If it has changed → The origin returns a 200 OK response with the new file. The Edge location stores the new copy and returns it to the client.

Minimum, Maximum and Default TTL values are specified on the cache behavior. The lifetime of an individual object can be set at the origin with the Cache-Control max-age or Expires headers, which CloudFront honors within the Minimum and Maximum TTL limits.
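
The expiry and revalidation flow can be illustrated with a minimal sketch. It assumes the origin identifies each version of a file with an ETag-like string and that the edge uses one fixed TTL. The class names and the explicit `now` argument are inventions for this example, not CloudFront internals.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DEFAULT_TTL = 24 * 60 * 60  # seconds; the 24-hour default TTL


@dataclass
class Entry:
    body: bytes
    etag: str
    expires_at: float


class Origin:
    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[bytes, str]] = {}

    def put(self, key: str, body: bytes, etag: str) -> None:
        self.objects[key] = (body, etag)

    def conditional_get(self, key: str, etag: Optional[str]) -> Tuple[int, bytes, str]:
        body, current = self.objects[key]
        if etag == current:
            return 304, b"", current   # Not Modified: no body is sent
        return 200, body, current      # OK: send the new version


class Edge:
    def __init__(self, origin: Origin, ttl: int = DEFAULT_TTL) -> None:
        self.origin = origin
        self.ttl = ttl
        self.cache: Dict[str, Entry] = {}

    def get(self, key: str, now: float) -> Tuple[str, bytes]:
        entry = self.cache.get(key)
        if entry is not None and now < entry.expires_at:
            return "hit", entry.body   # still within the TTL
        known_etag = entry.etag if entry is not None else None
        status, body, etag = self.origin.conditional_get(key, known_etag)
        if status == 304 and entry is not None:
            entry.expires_at = now + self.ttl  # unchanged: refresh the TTL
            return "revalidated", entry.body
        self.cache[key] = Entry(body, etag, now + self.ttl)
        return "fetched", body
```

Note how a request that arrives inside the TTL gets the cached copy even if the origin already holds a newer version, which is exactly the behavior described above.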

Cache Hit Ratio

The proportion of all requests that are served from edge locations (rather than from the origin) is known as the cache hit ratio.

The more requests are served from edge locations, the better the performance. In other words, the higher the ratio, the better.
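
As a small worked example, the ratio is simply edge hits divided by total requests. The function name and the handling of zero requests are choices made for this sketch.

```python
def cache_hit_ratio(edge_hits: int, total_requests: int) -> float:
    """Fraction of requests answered from the edge cache."""
    if total_requests == 0:
        return 0.0  # no traffic yet; avoid dividing by zero
    return edge_hits / total_requests
```

With 900 edge hits out of 1,000 requests, `cache_hit_ratio(900, 1000)` gives 0.9, meaning only 10% of requests reached the origin.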

Maximizing the Cache Hit Ratio

The following strategies can be used to maximize the cache hit ratio:

  1. Specifying the cache duration: Use the Cache-Control max-age directive. The shorter the duration, the more frequently CF forwards another request to your origin to determine whether the object has changed and, if so, to get the latest version.
  2. Caching based on the query string parameters: Forward only the parameters your origin actually uses, and keep the parameter order and letter case consistent, so that equivalent requests are not cached (and fetched from the origin) multiple times.
  3. Caching based on the Cookie values: Create separate cache behaviors for static content (e.g. .css files) and dynamic content (e.g. .js files), and configure CF to forward cookies only for the dynamic content.
  4. Caching based on Request Headers: Cache on specific headers only, rather than on all headers.
  5. Remove the Accept-Encoding header when compression is not needed.
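
To illustrate the query string strategy, here is a minimal sketch of a URL normalizer an application could use when generating links, so that equivalent requests map to one cached object. Lowercasing the parameter names assumes your origin treats them case-insensitively. The function name and example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Sort query parameters and lowercase their names."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Assumption: parameter names are case-insensitive at the origin.
    params = sorted((name.lower(), value) for name, value in params)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )
```

Both `?Size=L&color=red` and `?color=red&size=L` normalize to the same query string, so they would share a single cache entry.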

Content Invalidations

You can remove files from your origin that you no longer want to be included in your CloudFront distribution. However, CloudFront will continue to show viewers content from the edge cache until the files expire.

Cache invalidations are performed at the distribution level and apply to all the edge locations involved. If you want to remove a file right away, you must do one of the following:

  1. Invalidate the file (See Figure 04)
  2. Use file versioning: When you use versioning, different versions of a file have different names that you can use in your CloudFront distribution, to change which file is returned to viewers.

Figure 04 — Invalidating a File
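
Besides the console, an invalidation can also be requested programmatically. The sketch below assumes the boto3 SDK is installed and credentials are configured. The helper names are my own, and the distribution ID you pass in is yours to supply. The batch builder is kept separate so it can be inspected without calling AWS.

```python
import time
from typing import Dict, List


def build_invalidation_batch(paths: List[str], caller_reference: str) -> Dict:
    """Build the InvalidationBatch structure passed to create_invalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # A unique string per request, so that retries are not duplicated.
        "CallerReference": caller_reference,
    }


def invalidate(distribution_id: str, paths: List[str]) -> str:
    """Submit an invalidation and return its ID (calls AWS)."""
    import boto3  # imported lazily so the helper above works without the SDK

    client = boto3.client("cloudfront")
    response = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths, str(time.time())),
    )
    return response["Invalidation"]["Id"]
```

A call such as `invalidate("<your distribution ID>", ["/images/logo.png"])` would then request removal of that file from the edge caches.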

Alternate Domains and SSL

Once a CloudFront Distribution is created for an Origin, CloudFront generates a public domain name for it (https://xxxxxxx.cloudfront.net). This is the “Default Domain” it creates for you (See Figure 05).

Figure 05 — The Default Domain

If we are planning to use a CloudFront distribution with one of our production domains (as Alternate Domains), then we need to use AWS Certificate Manager (ACM) or another SSL certificate provider to obtain a valid SSL certificate.

When you use ACM, make sure to create the certificate in us-east-1 (N. Virginia). This is a restriction when you use ACM with a global service such as CF. If you use ACM with regional services such as ALB, however, you are required to generate the certificate in the same region as the service.

You cannot use self-signed certificates with CloudFront. Only certificates issued by a trusted Certification Authority (CA) such as Verisign, Comodo, Digicert, Symantec or AWS ACM (Certificate Manager) are allowed.

A secure request to an origin via CloudFront involves two separate connections:

  1. Client → CloudFront
  2. CloudFront → Origin (Native or Custom)

See Figure 06 for the above explanation.

Figure 06 — Invoking multiple origin types via CloudFront Edges

Securing the Origin via CloudFront

As you know, CloudFront Edge locations sit between the origin and the client. By default, this architecture allows any user to access both the origin and the CF edge locations via public URLs. This is not secure and is not a good practice.

Therefore, CloudFront offers multiple ways to prevent direct access to the origin:

  1. Restricting S3 origins using Origin Access Identities (OAIs)
  2. Restricting custom origins using custom headers
  3. Restricting custom origins using firewalls
  4. Geo-restrictions

1.0 Restricting S3 Access with Origin Access Identity (OAI)

To restrict access to content that you serve from Amazon S3 buckets, follow these steps.

  1. Create a special CloudFront user called an Origin Access Identity (OAI) and associate it with your distribution.
  2. Configure your S3 bucket permission so that CloudFront can use the OAI to access the files in your bucket and serve them to your users. Make sure that users can’t use a direct URL to the S3 bucket to access a file there.

An OAI is a type of identity that can be associated with CF distributions. In this scenario, CF uses the OAI as its identity when it requests files from the S3 bucket.

Figure 07 — Securing S3 via OAI

After you take these steps, users can only access your files through CloudFront and not directly from the S3 bucket. An AWS Account can have up to 100 CloudFront OAIs.
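
For step 2 above, the bucket policy that grants read access to the OAI could look like the sketch below, generated here with Python. The bucket name, the OAI ID and the statement ID are placeholders. You would still need to make sure the bucket does not allow public reads through any other policy or ACL.

```python
import json
from typing import Dict


def oai_bucket_policy(bucket: str, oai_id: str) -> Dict:
    """Bucket policy letting one CloudFront OAI read objects in the bucket."""
    principal = (
        "arn:aws:iam::cloudfront:user/"
        f"CloudFront Origin Access Identity {oai_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder values for illustration only.
policy_json = json.dumps(oai_bucket_policy("example-bucket", "E2EXAMPLEOAIID"), indent=2)
```

The resulting JSON can be pasted into the bucket's permissions, or CloudFront can be asked to update the bucket policy for you when the OAI is attached to the origin.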

You may refer to my article on this topic [3].

2.0 Restricting custom origins using custom headers

A custom header is injected into the request at the edge location, and the origin serves the request only if that custom header is present (See Figure 08).

Figure 08 — Securing the custom origin with a custom header
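
On the origin side, the check could be as simple as the following sketch. The header name and the secret value are hypothetical; in practice the value would be a long random string configured both on the CloudFront origin and on the origin server.

```python
import hmac
from typing import Mapping

HEADER_NAME = "x-origin-verify"                        # hypothetical header name
EXPECTED_VALUE = "replace-with-a-long-random-secret"   # hypothetical secret


def is_from_cloudfront(headers: Mapping[str, str]) -> bool:
    """True only if the request carries the expected custom header value."""
    supplied = ""
    for name, value in headers.items():
        if name.lower() == HEADER_NAME:  # header names are case-insensitive
            supplied = value
            break
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode(), EXPECTED_VALUE.encode())


def handle(headers: Mapping[str, str]) -> int:
    """Return the HTTP status the origin would answer with."""
    return 200 if is_from_cloudfront(headers) else 403
```

Requests that bypass CloudFront arrive without the header and are rejected with a 403.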

3.0 Restricting custom origins using firewalls

If neither OAI nor custom headers are applied, you can use a firewall to secure the origins from malicious attacks. This is done by restricting access to the origin to the IP ranges of the edge locations only (See Figure 09).

Figure 09 — Securing the custom origin with a Web Application Firewall
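
AWS publishes its IP ranges as a JSON document (ip-ranges.json) in which CloudFront entries carry the service name CLOUDFRONT. The sketch below filters such a document and checks a client address against the result. The sample prefixes are documentation-only addresses chosen for this example, not real CloudFront ranges, and only IPv4 prefixes are handled.

```python
import ipaddress
from typing import Dict, List

# Shaped like ip-ranges.json, but with made-up documentation prefixes.
SAMPLE_RANGES: Dict = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "service": "CLOUDFRONT"},
        {"ip_prefix": "198.51.100.0/24", "service": "EC2"},
    ]
}


def cloudfront_networks(ip_ranges: Dict) -> List[ipaddress.IPv4Network]:
    """Keep only the IPv4 prefixes that belong to CloudFront."""
    return [
        ipaddress.IPv4Network(prefix["ip_prefix"])
        for prefix in ip_ranges.get("prefixes", [])
        if prefix.get("service") == "CLOUDFRONT"
    ]


def is_allowed(client_ip: str, networks: List[ipaddress.IPv4Network]) -> bool:
    """True if the address falls inside any allowed network."""
    address = ipaddress.IPv4Address(client_ip)
    return any(address in network for network in networks)


ALLOWED = cloudfront_networks(SAMPLE_RANGES)
```

In a real setup, the same list would typically be fed into a security group or firewall rule rather than checked in application code.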

4.0 Geo Restrictions

By default, CF serves your content from all of its edge locations, unless you limit it to a set of regions (North America, Europe, Asia, Middle East or Africa) using the list offered (the price class setting) when you create the CF distribution (See Figure 10).

Figure 10 — Selecting your Edge location regions

If you need to allow or block access from particular locations, you can do so by enabling Geo-restriction. CF Geo-restriction works at the “Country” level only, and CF uses a Geo-IP database to determine the user's location.

However, if you need to allow or block access based on any attribute other than the “country”, you need to use a 3rd party Geo-location mechanism. Such a mechanism is typically a compute function backed by a Geo-location database and other sources, which can provide more information about the user.
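
The country-level decision itself is easy to picture. The sketch below mirrors the three restriction types the CloudFront API uses (none, whitelist, blacklist). The function name and the country codes in the usage are examples only.

```python
from typing import Set


def is_country_allowed(country: str, restriction_type: str, countries: Set[str]) -> bool:
    """Decide whether a viewer from `country` (ISO 3166-1 alpha-2) is served."""
    code = country.upper()
    if restriction_type == "none":
        return True
    if restriction_type == "whitelist":   # only the listed countries are allowed
        return code in countries
    if restriction_type == "blacklist":   # the listed countries are blocked
        return code not in countries
    raise ValueError(f"unknown restriction type: {restriction_type}")
```

For instance, with a whitelist of `{"LK", "US"}`, a viewer located in `"lk"` is served while one located in `"FR"` would be blocked.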

Lambda@Edge

Lambda@Edge is a feature of CF that allows you to run lightweight Lambda functions at CF Edge locations. These Lambda functions can adjust the traffic between the viewer and the origin. However, they do not have the full feature set of regular Lambda functions. For the moment, only NodeJS and Python are supported as runtime languages. Lambda Layers are not supported with Lambda@Edge, and the functions run only in the AWS public space (not within VPCs).

The Lambda@Edge architecture consists of four Lambda functions that can be attached to a typical viewer request (See Figure 11).

  1. Viewer Request Lambda: Executed after CF receives a request from the client
  2. Origin Request Lambda: Executed before CF forwards the request to the origin
  3. Origin Response Lambda: Executed after CF receives a response from the origin
  4. Viewer Response Lambda: Executed before a response is forwarded to the client

Figure 11 : Lambda@Edge architecture

At each of these four steps, you can modify the normal request or response traffic to suit your application's needs.

For example, if you need to choose the S3 bucket origin based on a value in a request header, you can check that value in the Origin Request Lambda function (the last point before CF contacts the origin) and route the traffic to the desired S3 bucket. This can be very useful for scenarios such as A/B testing during application or product migrations.

A few more examples related to Lambda@Edge can be found here.
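
A minimal sketch of such a function in Python is shown below, written for the origin request trigger, since that is the point where the origin can still be changed. The header name and the bucket domain are hypothetical. The header would also have to be forwarded by the cache behavior for the function to see it.

```python
# Hypothetical names used only for this illustration.
EXPERIMENT_HEADER = "x-experiment-group"
BUCKET_B_DOMAIN = "example-bucket-b.s3.amazonaws.com"


def lambda_handler(event, context):
    """Origin request handler: send group "b" viewers to a second S3 bucket."""
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    # CloudFront passes headers as {lowercase-name: [{"key": ..., "value": ...}]}.
    group = headers.get(EXPERIMENT_HEADER, [{}])[0].get("value", "")

    if group == "b" and "s3" in request.get("origin", {}):
        request["origin"]["s3"]["domainName"] = BUCKET_B_DOMAIN
        # The Host header must match the new origin domain.
        headers["host"] = [{"key": "Host", "value": BUCKET_B_DOMAIN}]

    # Returning the request tells CloudFront to continue to the (new) origin.
    return request
```

Requests without the header, or with any other value, pass through unchanged and are served from the distribution's configured origin.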

Hosting a Secure CF Distribution via AWS Route53

For a step-by-step approach to this subject, you may follow one of my previously published articles [1].

Thank you, and I hope you now have a decent idea of what AWS CF can offer you!

References

  1. Hosting a secure CF Distribution via AWS Route53: https://crishantha.medium.com/hosting-a-secure-aws-cloudfront-endpoint-via-aws-route-53-be65d42191b7
  2. Announcing Regional Edge Caches for Amazon CloudFront: https://aws.amazon.com/about-aws/whats-new/2016/11/announcing-regional-edge-caches-for-amazon-cloudfront/
  3. Securing S3 with Origin Access Identity (OAI) via CloudFront: https://crishantha.medium.com/securing-s3-with-origin-access-identity-oai-via-cloudfront-147467eae8aa

Written by Crishantha Nanayakkara

Enterprise Architect | Consultant @ FAO (UN) | Former CTO, ICTA Sri Lanka
