Google Cloud Platform (GCP) provides Cloud SQL, a fully managed, highly available relational database management system (RDBMS) service. Google's documentation describes it as highly available, but good software design always prepares for failure.
Consider a scenario in which the zone hosting your server fails. To handle this, when high availability is enabled on a Cloud SQL instance, it provides data redundancy across more than one zone within a single region. Does this mean you are safe now? If your primary Cloud SQL instance fails, it automatically fails over to the standby instance in a different zone of the same region without any intervention, and your client application continues as expected.
But what happens if an entire region fails and takes all your Cloud SQL instances with it (primary as well as standby)? Regional failures are rare, but an organization should always be prepared for the worst-case scenario. A regional failure can be a service failure: a situation in which a specific service has an issue that causes all the services in the region to go down.
How can we protect data against a regional failure? A very high level of redundancy is required. Instead of relying on a single region, we need to define cross-region high availability.
To make Cloud SQL’s regional deployment highly available, the primary instance exists in one GCP zone and a standby instance exists in another zone. Through synchronous replication to each zone’s persistent disk, all writes made to the primary are also made to the standby instance. This configuration allows a graceful failover to the standby instance when a zone fails. In this case data stays available to the client application, with only a brief interruption during the failover.
Cloud SQL’s regional high availability can be upgraded to cross-region high availability simply by provisioning a read replica in another region.
In order to have the ability to fail over to a different GCP Region in a Disaster Recovery scenario, at least one Cloud SQL read replica must be provisioned in a separate GCP Region prior to an incident.
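As a minimal sketch of that provisioning step, the Python helper below assembles a `gcloud sql instances create` command that creates a read replica of an existing primary in a second region. The instance names and the region are hypothetical placeholders, and the helper only builds the argument list; it does not call GCP, so review the command before running it in your own project.

```python
from typing import List


def build_replica_command(primary: str, replica: str, replica_region: str) -> List[str]:
    """Build the gcloud argument list for a cross-region read replica.

    All three values are supplied by the caller; nothing here is looked up
    from GCP, and the command is returned rather than executed.
    """
    if not (primary and replica and replica_region):
        raise ValueError("primary, replica and replica_region are all required")
    return [
        "gcloud", "sql", "instances", "create", replica,
        f"--master-instance-name={primary}",  # the instance to replicate from
        f"--region={replica_region}",         # should differ from the primary's region
    ]


if __name__ == "__main__":
    # Hypothetical names: primary "orders-db", replica placed in us-east1.
    print(" ".join(build_replica_command("orders-db", "orders-db-replica", "us-east1")))
```

Returning the argument list instead of running it keeps the sketch safe to test; in a real script it could be passed to `subprocess.run` once the values have been checked.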
Don’t forget to configure Cloud Monitoring alerts on the instances. These will notify you as soon as an instance goes offline.
Even after configuring Cloud Monitoring, how will you decide when to conduct a cross-region failover?
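One way to make that decision less ad hoc is to write the rule down as code. The sketch below is an illustration only, not a rule taken from Google's guidance: it assumes you collect periodic health checks for the primary and the standby, and it recommends a cross-region failover only when both have been down for a number of consecutive checks. The `HealthCheck` type, the `should_fail_over` function and the default threshold of 5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HealthCheck:
    """One monitoring sample (hypothetical shape)."""
    primary_up: bool
    standby_up: bool


def should_fail_over(checks: Sequence[HealthCheck], required_failures: int = 5) -> bool:
    """Return True when the most recent `required_failures` checks all show
    both the primary and the standby down.

    A single instance being down is left to the in-region failover; only a
    sustained outage of both suggests the region itself is unavailable.
    """
    if required_failures <= 0:
        raise ValueError("required_failures must be positive")
    if len(checks) < required_failures:
        return False  # not enough evidence yet
    recent = checks[-required_failures:]
    return all(not c.primary_up and not c.standby_up for c in recent)
```

Requiring consecutive failures avoids promoting a replica because of one missed probe; the right threshold depends on how often you sample and how much downtime the application can tolerate.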
Even though GCP offers a fully managed, highly available Cloud SQL service, this feature is limited to the GCP region in which the instance resides. For critical applications, a higher level of resiliency is recommended. In this situation, an organization can decide to move to the global database option (GCP Cloud Spanner) or use an automation script that helps it recover from a Cloud SQL regional failure in less than 30 minutes.
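The central step of such an automation script is promoting the cross-region read replica to a standalone primary. The sketch below builds the `gcloud sql instances promote-replica` command for that step; the replica name is a hypothetical placeholder, and the function returns the command instead of running it. A complete script would also need to repoint the application at the promoted instance, which is not shown here.

```python
from typing import List


def build_promote_command(replica: str) -> List[str]:
    """Build the gcloud argument list that promotes a read replica.

    Promotion is one-way: the replica becomes an independent instance and
    stops replicating from the old primary, so run it only after deciding
    that the primary region is lost.
    """
    if not replica:
        raise ValueError("replica name is required")
    return ["gcloud", "sql", "instances", "promote-replica", replica]


if __name__ == "__main__":
    # Hypothetical replica name used for illustration.
    print(" ".join(build_promote_command("orders-db-replica")))
```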