AWS Aurora — Why is it better?
1.0 AWS Aurora — Introduction
Amazon Aurora is part of the Amazon RDS family. It is a fully managed relational database engine, compatible with MySQL and PostgreSQL, that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
According to AWS, Aurora delivers up to five times the throughput of standard MySQL and up to three times the throughput of standard PostgreSQL, without requiring changes to most existing applications.
2.0 The RDS Multi-AZ Architecture
In a typical RDS deployment (MySQL, PostgreSQL, etc.), you will usually find the following Multi-AZ architecture. The primary, standby and read replicas can run on different instances, and each instance keeps its data on the EBS volumes attached to the underlying EC2 instance that RDS runs on.
3.0 The Aurora Multi-AZ Architecture
A typical Aurora cluster spans a minimum of three Availability Zones (note: Aurora is only supported in regions with at least three Availability Zones). Storage and compute (EC2) are separate and independent of each other in an Aurora cluster (see the cluster volume in the diagram, which is where the storage lives).
Each Availability Zone holds two copies of the cluster's data, giving at least six copies across the three Availability Zones. The primary (master) compute instance writes to the copies in the storage cluster, while the read replicas running on other instances read from the same storage cluster. The SSD-based (high IOPS, low latency) storage cluster is presented as a single logical volume to the primary instance and to the Aurora Replicas in the Aurora DB cluster.
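The copy layout described above can be pictured with a small sketch. The Availability Zone names and the helper functions are illustrative assumptions, not part of any AWS API. The model only counts copies and does not try to reproduce how Aurora actually places or repairs data.

```python
# Toy model of the Aurora cluster volume: two storage copies in each of three AZs.
# The AZ names are hypothetical placeholders.
COPIES_PER_AZ = 2
AVAILABILITY_ZONES = ["az-a", "az-b", "az-c"]


def build_layout(zones=AVAILABILITY_ZONES, copies_per_az=COPIES_PER_AZ):
    """Return a mapping of AZ name -> number of storage copies held there."""
    return {zone: copies_per_az for zone in zones}


def surviving_copies(layout, failed_zones):
    """Count the copies that remain when the given AZs are unavailable."""
    return sum(count for zone, count in layout.items() if zone not in failed_zones)


layout = build_layout()
total_copies = sum(layout.values())
after_one_az_loss = surviving_copies(layout, failed_zones={"az-a"})
print(total_copies, after_one_az_loss)  # prints: 6 4
```

Under these assumptions, losing a whole Availability Zone still leaves four of the six copies in place.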
An Aurora cluster can have only one primary (master) instance. There are no dedicated standby instances in the Aurora cluster; instead, any read replica can take over as the primary when needed.
Because the Aurora compute instances (EC2) and the Aurora storage cluster are independent, scaling storage is much easier (up to 64 tebibytes [1] at the time of writing) than in a traditional RDS Multi-AZ architecture, where storage has to be scaled on the volumes attached to each instance.
An Aurora cluster can have up to 15 read replicas (compared to 5 in a typical RDS Multi-AZ architecture). Unlike in a typical RDS deployment, where changes are copied to each replica asynchronously, Aurora Replicas read from the same shared storage as the primary, so they see new data with very little lag.
Cluster storage is billed based on usage. The billed amount is the maximum storage the cluster has ever used, also known as the high water mark. Even if the cluster later uses less storage, you are still billed for the high water mark. The space below the high water mark that is no longer in use is not wasted, though: the cluster can reuse it for new data.
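The high water mark rule can be illustrated with a short calculation. This is a minimal sketch of the billing model as described in this article; the usage figures and the function name are made up for illustration, and no real prices are involved.

```python
def billed_storage_gib(usage_history_gib):
    """Return the billed storage for each period under the high water mark model.

    The billed amount for a period is the largest usage seen so far,
    even if current usage has dropped below it.
    """
    billed = []
    high_water_mark = 0
    for used in usage_history_gib:
        high_water_mark = max(high_water_mark, used)
        billed.append(high_water_mark)
    return billed


# Hypothetical storage usage per billing period, in GiB.
usage = [100, 250, 400, 300, 150]
billed = billed_storage_gib(usage)
print(billed)  # prints: [100, 250, 400, 400, 400]
```

Usage drops after the third period, but the billed amount stays at 400 GiB, which is the high water mark.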
4.0 High Availability / Fail-Over
Aurora maintains high availability (HA) by placing Aurora Replicas in other Availability Zones. If the primary DB instance becomes unavailable, Aurora automatically fails over to an Aurora Replica.
4.1 Aurora Read Replicas
Up to 15 read replicas can be distributed across multiple Availability Zones. Because the storage cluster is presented as a single logical volume, Aurora Replicas return the same data as the primary with minimal replica lag. This is quite different from RDS, where data is copied to the RDS Read Replicas asynchronously. In Aurora there is no need to copy data to each replica, because all instances (primary and read replicas) share the data in the storage cluster.
4.2 Primary Instance Fail-Over
If the primary instance fails, Aurora automatically fails over to a new primary instance, either by promoting a read replica or by creating a new primary instance. The cluster sees a brief interruption during this process. Promoting a read replica takes less time than creating a new primary instance, but if the cluster has no read replicas, creating a new primary instance is the only option. AWS therefore recommends keeping at least one read replica (with the same specification as the primary instance) to minimize the downtime of the cluster.
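The two failover paths above can be sketched as a small decision function. This is a simplified illustration, not Aurora's actual selection logic. The `Instance` class, the instance names, and the rule of preferring a replica with the same instance class as the primary are assumptions, loosely based on the recommendation above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    instance_class: str
    healthy: bool = True


def choose_failover_action(primary: Instance, replicas: List[Instance]) -> str:
    """Describe what happens when the primary fails, following the two cases above."""
    candidates = [r for r in replicas if r.healthy]
    if candidates:
        # Assumption: prefer a replica with the same specification as the failed primary.
        same_spec = [r for r in candidates if r.instance_class == primary.instance_class]
        target = (same_spec or candidates)[0]
        return f"promote {target.name}"
    # No usable replica: the slower path of creating a new primary instance.
    return "create new primary"


primary = Instance("writer-1", "db.r5.large")
replicas = [Instance("reader-1", "db.r5.xlarge"), Instance("reader-2", "db.r5.large")]
print(choose_failover_action(primary, replicas))  # prints: promote reader-2
print(choose_failover_action(primary, []))        # prints: create new primary
```

The second call shows why a cluster without read replicas has the longest recovery path.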
5.0 Migration from other Database Engines
5.1 From RDS (MySQL/PostgreSQL) to Aurora
Data can be migrated from Amazon RDS for MySQL and Amazon RDS for PostgreSQL into Aurora. (At the moment, this kind of migration is limited to these two database engines.) Migration can be done in both directions.
To migrate, create a snapshot of the RDS MySQL or PostgreSQL database and restore it to Aurora. (This is supported directly in the RDS console of the AWS Management Console.)
In addition, data can be migrated from a standalone MySQL database (see the diagram).
5.2 From other Database Engines
If you wish to migrate from database engines other than MySQL or PostgreSQL, you can use AWS Database Migration Service (AWS DMS).
6.0 Connection Endpoints
When your application connects to an Aurora cluster, it goes through a connection endpoint. A connection endpoint is an Aurora-specific URL made up of a host address and a port. In a nutshell, connection endpoints hide the underlying database instances from the application, so the application does not need to know which instance it is actually talking to.
When you create an Aurora MySQL or PostgreSQL cluster, AWS creates endpoints at three levels by default.
- Cluster level endpoints (Cluster endpoints)
- Read Replica level endpoints (Reader endpoints)
- Instance level endpoints (Instance endpoints)
Cluster Level Endpoints
This connects to the current primary DB instance in the DB cluster.
This is the endpoint to use for write operations. It is the first endpoint created when you set up a cluster with a single DB instance.
It provides failover support for read/write connections to the DB cluster. If the current primary DB instance fails, Aurora automatically promotes a read replica to be the new primary DB instance and points the cluster endpoint at it. The endpoint handles this without any change on the client side, so the client experiences very little downtime during the failover.
Reader Level Endpoints
These are built-in endpoints for read replicas. If you have multiple read replicas, the reader endpoint balances connections across all of them. If the cluster has no read replicas, the traffic is sent to the primary instance.
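On the application side, the split between the cluster endpoint and the reader endpoint could be used roughly as follows. This is a minimal sketch: the host names are placeholders, the function name is hypothetical, and deciding by the first keyword of a statement is a deliberate simplification (for example, it would wrongly send `SELECT ... FOR UPDATE` to a reader).

```python
# Placeholder host names; a real cluster has its own generated endpoints.
CLUSTER_ENDPOINT = "mycluster.cluster-abc123example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123example.us-east-1.rds.amazonaws.com"

# Statements treated as read-only in this sketch.
READ_ONLY_STATEMENTS = {"select", "show", "describe", "explain"}


def endpoint_for(statement: str) -> str:
    """Pick the endpoint for a SQL statement: reads go to the reader endpoint,
    everything else (including empty input) goes to the cluster endpoint."""
    words = statement.strip().split()
    if words and words[0].lower() in READ_ONLY_STATEMENTS:
        return READER_ENDPOINT
    return CLUSTER_ENDPOINT


print(endpoint_for("SELECT * FROM orders"))
print(endpoint_for("INSERT INTO orders VALUES (1)"))
```

Sending anything that is not clearly a read to the cluster endpoint is the safer default, because only the primary instance accepts writes.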
Instance Level Endpoints
At the instance level, one endpoint is created per instance. With an instance endpoint, you connect directly to a specific instance, just like a traditional database connection. Using only instance endpoints (without the cluster endpoint) is discouraged unless you have a strong justification. You can use the cluster endpoint together with instance endpoints to load balance read queries manually.
Custom Level Endpoints
In addition to the three endpoint levels above, you can create your own endpoints for your own requirements. These are called custom endpoints. Unlike the other three levels, they are not created automatically; you have to create them yourself.
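The four endpoint types can usually be told apart by their host names. The sketch below assumes the naming pattern shown in AWS documentation examples (`cluster-`, `cluster-ro-` and `cluster-custom-` in the second label of the host name). Treat that pattern, the sample host names and the function names as assumptions for illustration.

```python
def parse_endpoint(endpoint: str):
    """Split an endpoint written as 'host:port' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def classify_endpoint(host: str) -> str:
    """Guess the endpoint type from the second label of the host name."""
    labels = host.split(".")
    if len(labels) < 2:
        return "unknown"
    second = labels[1]
    # Check the more specific prefixes before the plain 'cluster-' prefix.
    if second.startswith("cluster-custom-"):
        return "custom"
    if second.startswith("cluster-ro-"):
        return "reader"
    if second.startswith("cluster-"):
        return "cluster"
    return "instance"


# Placeholder endpoints; the numeric part stands in for a generated identifier.
samples = [
    "mydbcluster.cluster-123456789012.us-east-1.rds.amazonaws.com:3306",
    "mydbcluster.cluster-ro-123456789012.us-east-1.rds.amazonaws.com:3306",
    "mydbinstance.123456789012.us-east-1.rds.amazonaws.com:3306",
    "myendpoint.cluster-custom-123456789012.us-east-1.rds.amazonaws.com:3306",
]
kinds = [classify_endpoint(parse_endpoint(s)[0]) for s in samples]
print(kinds)  # prints: ['cluster', 'reader', 'instance', 'custom']
```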
7.0 Aurora — Security
AWS Aurora security is managed at many levels.
- IAM: you can use IAM to control who can access the Aurora DB cluster.
- VPC: the Aurora cluster must be created within a VPC.
- Encryption: data is encrypted both in transit and at rest. When creating the cluster, you tick a check box to enable cluster encryption. You cannot encrypt an existing unencrypted cluster in place, but you can restore a snapshot of an unencrypted cluster as an encrypted one. Once a cluster is encrypted, you cannot remove the encryption either.
8.0 Aurora — Global Databases
Aurora Global Database consists of
- One primary region — Performs both read and write.
- One secondary region — where read-only replicas reside; can be scaled out to have more replicas within the region itself.
Global databases are useful when an application is accessed worldwide. The secondary region can then be chosen based on where the demand is. They are also useful for regional failover: if the cluster in the primary region goes down, the secondary cluster can be promoted to primary and gain read/write capability in under a minute.
Data is replicated between regions with very low latency (typically under a second), using dedicated infrastructure.
9.0 Aurora — BackTrack
Backtracking is somewhat similar to the backup-and-restore procedure, but it is less versatile. It is, however, a quick and easy way to rewind a database to an earlier state after a mistake, and it takes less time than restoring from a backup. On the other hand, backup-and-restore lets you work with snapshots and restore into different clusters, whereas Backtrack confines you to a single cluster.
10.0 Aurora MySQL Cross Region Replication
This is different from Backtrack, and it does not use global databases. It is plain Aurora MySQL replication from one region to another.
You can have up to five cross-region DB clusters that are read replicas.
Here, Aurora takes a snapshot of the source cluster and transfers it to the region of the read replica. Compared to global databases, creating a cross-region read replica therefore takes more time. A cross-region read replica can be promoted to a standalone cluster at any point.
11.0 Aurora Serverless
Aurora Serverless is the only serverless relational database offering from AWS; its other serverless database offering, DynamoDB, is NoSQL. Aurora Serverless is compatible with both the MySQL and PostgreSQL database engines.
Aurora Serverless has the following benefits:
1. Simple: removes the complexity of managing database instances and capacity. The database automatically starts up, shuts down, and scales to match your application's needs. It is a simple, cost-effective option for infrequent, intermittent or unpredictable workloads.
2. Scalable: seamlessly scales compute and memory capacity as needed, with no disruption to client connections.
3. Cost effective: you pay only for the database resources you consume, on a per-second basis. You don't pay for the database instance unless it is actually running.
4. Highly available: built on distributed, fault-tolerant, self-healing Aurora storage with 6-way replication to protect against data loss.
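The per-second billing model can be illustrated with a rough estimate. This is only a sketch: the price per capacity unit is a made-up number, not a real AWS rate, and the function name and the usage figures are hypothetical.

```python
# Assumed rate for illustration only; this is not a real AWS price.
HYPOTHETICAL_PRICE_PER_UNIT_HOUR = 0.06


def estimate_cost(usage, price_per_unit_hour=HYPOTHETICAL_PRICE_PER_UNIT_HOUR):
    """Estimate cost from (capacity_units, seconds) pairs billed per second.

    Periods where the database is paused are entered with zero capacity
    units, so they add nothing to the compute cost.
    """
    total = 0.0
    for capacity_units, seconds in usage:
        total += capacity_units * (seconds / 3600) * price_per_unit_hour
    return total


# Hypothetical day: 30 minutes at 2 units, 10 minutes at 8 units, paused otherwise.
usage = [(2, 1800), (8, 600), (0, 84000)]
cost = estimate_cost(usage)
print(round(cost, 4))
```

In this model the paused period contributes nothing, which is the point of paying only while the database is running.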
References
- Tebibyte (Wikipedia): https://en.wikipedia.org/wiki/Tebibyte
- Failover with Amazon Aurora PostgreSQL: https://aws.amazon.com/blogs/database/failover-with-amazon-aurora-postgresql/
- Deep Dive into Aurora (YouTube): https://www.youtube.com/watch?v=U42mC_iKSBg
- Deep Dive into Aurora (SlideShare): https://www.slideshare.net/AmazonWebServices/srv308-deep-dive-on-amazon-aurora