I was playing with a multi-AZ RDS the other day and noticed that I was burning money more quickly than I had anticipated or wanted, so I decided to move to a smaller, less expensive instance size (until the environment was under production load, performance wasn’t that important).
Maybe I should take a step back. AWS offers a service called RDS (that stands for Relational Database Service). The best description of this offering is straight from the AWS website: “Amazon RDS is a managed relational database service that provides you six familiar database engines to choose from, including Amazon Aurora, MySQL, MariaDB, Oracle, Microsoft SQL Server, and PostgreSQL. This means that the code, applications, and tools you already use today with your existing databases can be used with Amazon RDS. Amazon RDS handles routine database tasks such as provisioning, patching, backup, recovery, failure detection, and repair.”
Please note that your RDS environment must be multi-AZ for this to work. Obviously, if you only have one server and turn it off, you are going to have downtime 🙂 Here’s the process to change the size of a multi-AZ RDS stack with zero downtime:
- Log in to the AWS console
- Go to Services -> Relational Database Service
- Select Instances
- Select your “reader” instance
- Choose Instance Action -> Modify
- Select the size you want to change it to
- Click Continue
- Select “Apply Immediately”, then click “Modify DB Instance”
- In the RDS Instances window, wait for the instance Status to go from “modifying” to “available”
- Now fail over your current writer to reader status. Select your “writer” instance
- Choose Instance Action -> Failover
- Once the stack has failed over, perform the same modification steps on the second RDS instance. Once that is complete, fail back the stack.
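If you would rather script the console steps above, here is a minimal Python sketch of the same sequence. It is not something I ran as part of this post, and it makes a few assumptions: that the “reader” and “writer” instances belong to an Aurora-style DB cluster, that `rds` is a boto3 RDS client (or anything exposing the same methods), and that the helper names (`current_writer`, `resize_instance`, `fail_over_to`, `resize_cluster`) and the polling defaults are mine, purely for illustration.

```python
import time


def current_writer(rds, cluster_id):
    """Return the identifier of the instance currently acting as writer."""
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    for member in cluster["DBClusterMembers"]:
        if member["IsClusterWriter"]:
            return member["DBInstanceIdentifier"]
    return None


def resize_instance(rds, instance_id, new_class):
    """Request the new size with "Apply Immediately", then wait for "available"."""
    rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=new_class,
        ApplyImmediately=True,
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=instance_id)


def fail_over_to(rds, cluster_id, target_id, poll_seconds=15, max_polls=40):
    """Ask the cluster to promote target_id, then poll until it is the writer."""
    rds.failover_db_cluster(
        DBClusterIdentifier=cluster_id,
        TargetDBInstanceIdentifier=target_id,
    )
    for _ in range(max_polls):
        if current_writer(rds, cluster_id) == target_id:
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"{target_id} did not become the writer in time")


def resize_cluster(rds, cluster_id, reader_id, writer_id, new_class, poll_seconds=15):
    """Resize the reader, fail over to it, resize the old writer, fail back."""
    resize_instance(rds, reader_id, new_class)
    fail_over_to(rds, cluster_id, reader_id, poll_seconds)
    resize_instance(rds, writer_id, new_class)
    fail_over_to(rds, cluster_id, writer_id, poll_seconds)
```

With boto3 installed and credentials configured, a call might look like `resize_cluster(boto3.client("rds"), "my-cluster", "my-reader", "my-writer", "db.t3.medium")`, where the cluster and instance identifiers are placeholders. One thing to watch for: the `db_instance_available` waiter can return early if the instance has not yet left the “available” state when polling starts, so a more careful version would first confirm the status has changed to “modifying”.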
All of the standard caveats apply: If you are in production, you still need a maintenance window and you still need to notify your users/monitors/app teams 🙂
That’s it! Let me know if you have any questions!
Nice post, Chris!
It is a good idea to use Aurora to minimize downtime on RDS. However, as you mentioned, if you do it in production you may still incur some minimal downtime.
To address this RDS limitation, we use a workaround: we briefly launch a master-to-master MySQL RDS configuration, which allows us to avoid downtime entirely during the maintenance.
Since we can write to both masters, we can switch one app server at a time over to the new master, which is already configured with, for example, the new storage.
It works well for us in avoiding downtime for all AWS maintenances as well as for our own scheduled maintenance.
The solution is limited to MySQL, however, because it relies on MySQL’s support for master-to-master (M/M) replication.
Here are more details on how we do it:
https://workmarket.tech/zero-downtime-maintenances-on-mysql-rds-ba13b51103c2
How do you do this when running Oracle under RDS, I wonder?
Hi Ahmed,
You actually cannot do this process for Oracle RDS 🙁
That’s not zero downtime. Existing connections to the writer will fail when they try to perform reads and the failover is not instant.
So no, there will still be downtime.
Hi Henrik,
In the screenshots you will see that there are scheduling options to perform the resize at a time that is amenable to the workload. You obviously would want to schedule it when the writer isn’t hitting the back end 😉
I agree with Henrik. Even though you can schedule it, it’s not a zero-downtime approach. If I reboot a simple EC2 web server when nobody is accessing it, there is still downtime anyway. For me, the title should be something like “best approach”.
Dear guys, I am trying to do the above method programmatically in a Python Lambda. My SNS topic/metric is on the writer instance: it checks whether the writer’s CPU usage is >80% and sends an SNS notification, which triggers a Lambda that should increase the size of the reader instance. But it seems there is no metric that sends both the writer and reader endpoints. If you guys have any ideas, please let me know.
Lovely post!
Thanks for simplifying this process. 😀
“Select your ‘reader’ instance”: in the screenshot you are selecting the writer, which is misleading. But who needs those screenshots, right...
Waiting for the instance to become active takes 10 minutes on my RDS, and those minutes are downtime 🙁