Planning HADR in SQL Server

To successfully implement high availability for on-premises implementations of SQL Server, you need to understand the difference between high availability and disaster recovery, and how to plan solutions that can successfully protect data and services.
High availability and disaster recovery are related, though distinct, concepts. Both involve protecting and maintaining access to data and services. However, the purpose of a high availability solution is to prevent service outages from occurring in the first place. The purpose of a disaster recovery solution is to enable you to make services available again after an outage has occurred.

One type of disaster recovery solution that most organizations use in some form is data backups. The problems with relying solely on database backups for disaster recovery include:

  • Backups only preserve data
  • Recovery by using backups can be relatively slow

By using SQL Server AlwaysOn FCIs and AlwaysOn Availability Groups, you can create disaster recovery solutions that can rapidly bring services back online after an outage. However, these technologies do not eliminate the need for backups entirely. Even if you implement an AlwaysOn disaster recovery solution, you should still carry out a backup schedule to enable recovery from scenarios such as:

  • Recovery from logical data corruption
  • Widespread disaster

The process of planning high availability and disaster recovery typically includes calculating RTO and RPO. The RTO is the period of time within which the service needs to be restored, and the RPO is the period during which data loss is deemed acceptable in the event of a failure. RPO and RTO are targets dictated by the needs of the business, and planners of high availability and disaster recovery planning solutions can use these targets to guide them in their decision-making.The following table provides general guidelines that you can use to incorporate RTO and RPO into high availability and disaster recovery plans:

[table id=38/]

Considerations for Implementing High Availability and Disaster Recovery

In addition to RPO and RTO, there are various additional factors to consider when planning a high availability and disaster recovery solution.

  • The number of sites
  • The cost of maintaining idle servers for failover
  • Quorum considerations
  • Store tempdb locally to reduce I/O on shared storage
  • Use AlwaysOn technologies to implement high availability and disaster recovery solutions:
    • Multi-site failover cluster instance
    • AlwaysOn availability group
    • Failover cluster instance and availability group in combination

You can use the following combinations of SQL Server technologies to create high availability and disaster recovery solutions:

[table id=39/]

  • Degraded availability
  • Test high availability and disaster recovery plans
  • Redundancy solutions for network components such as network cards, switches, and routers.
  • Standardization of permissions and file paths across sites where appropriate.
  • The availability of security certificates in all sites.
  • Implement identical SQL Server logins on each SQL Server instance. Using Active Directory domain logins reduces complexity in this area because you do not need to ensure that passwords are the same on all instances.
  • A mechanism for ensuring that SQL Agent jobs can run on all instances.

There are several common architectures that database architects working with SQL Server use to implement high availability and disaster recovery. Multi-Site Failover Cluster Instance SolutionYou can use a multi-site AlwaysOn FCI to provide local high availability and a remote disaster recovery site. A multi-site FCI protects the entire SQL Server instance, both in terms of high availability and disaster recovery.Consider the following points for high availability and disaster recovery solutions that use a multi-site FCI:

  • To ensure that data remains available after a disaster, a multi-site FCI requires a storage solution in each site, and relies on hardware-based replication to copy data from the shared storage in the primary site to the shared storage in the secondary site. For this reason, the recommended shared storage solution for a multi-site FCI is two storage area networks (SANs). When you create a multi-site FCI, you should skip the storage validation step during configuration, and when prompted confirm that you do not require support from Microsoft for the cluster. In fact, selecting this option does not disqualify you from obtaining support from Microsoft because a multi-site FCI is a supported configuration.
  • You can use a multi-site FCI regardless of the recovery model that your databases use, unlike AlwaysOn Availability Groups, which require databases to use the FULL recovery model.
  • When you configure a multi-site FCI, the IP address resource dependency is automatically set to use an OR dependency.

AlwaysOn Availability Group Solution You can use an AlwaysOn Availability Group to create a high availability and disaster recovery solution.You should consider the following points when planning to use an AlwaysOn Availability Group for high availability and disaster recovery:

  • All servers in the Availability Group need to be in the same Active Directory domain.
  • Like the AlwaysOn FCI solution, using an AlwaysOn Availability Group for high availability and disaster recovery requires a WSFC to support it. However, a key difference is that an AlwaysOn Availability Group does not require shared storage, which means that the hardware and implementation costs are usually lower.
  • An AlwaysOn Availability Group delivers high availability and disaster recovery for a single database or group of databases, not for the entire SQL Server instance.
  • You can use the Availability Group secondary replicas, including the secondary replica at the disaster recovery site, as active secondaries. This enables organizations to make more efficient use of server resources. 
  • All databases that participate in an AlwaysOn Availability Group must use the FULL recovery model.
  • You can use up to eight secondaries if required, enabling you to scale out read-only workloads, improve response times by adding active secondaries to local sites such as branch offices, and add resiliency by placing replicas in more sites.
  • Removing the quorum vote from the secondary replica in the disaster recovery site is recommended practice. If the primary site does not contain an odd number of WSFC nodes, you can use the Node and Fileshare Majority quorum model and add a file share witness to ensure that the cluster has an odd number of quorum votes.
  • Failover in the primary site is automatic, but if a disaster occurs and you need to fail over to the disaster recovery site, you must perform a manual failover operation. To do this, you need to carry out the following actions in the disaster recovery site:
    • Force quorum on the cluster node that hosts the secondary replica in the disaster recovery site and start the Cluster Service. 
    • Force failover of the availability group. You can use the Transact-SQL ALTER AVAILABILIITY GROUP statement with the FORCE_FAILOVER_ALLOW_DATA_LOSS option to do this. You can also use SQL Server Management Studio (SSMS) or PowerShell.
    • Change the quorum voting configuration by removing votes from the nodes in the primary site and giving a vote to the node in the disaster recovery site.
    • Because an Availability Group only protects databases, you will need to transfer logins and SQL Server Agent jobs separately. 
    • After failover to the disaster recovery site, the service is not highly available. To make it highly available, you could add a secondary replica in the disaster recovery site.

Combined AlwaysOn Failover Cluster Instance and AlwaysOn Availability Group Solution You can use a combined AlwaysOn FCI and AlwaysOn Availability Group high availability and disaster recovery solution. The FCI provides local high availability for the SQL Server instances, and the AlwaysOn Availability Group enables disaster recovery for the database availability set.When planning a combined AlwaysOn FCI and AlwaysOn Availability Group solution, you should consider the following points:

  • The solution requires a single WSFC and a SQL Server AlwaysOn FCI at the primary site. You can use either a second SQL Server AlwaysOn FCI or a stand-alone SQL Server instance installed on a WSFC node in the disaster recovery site. The advantage to using a SQL Server AlwaysOn FCI at the disaster recovery site is that you can ensure continued high availability at the instance level after a disaster. 
  • Each site has its own storage, which is not visible to the nodes in the other site. When shared cluster storage is only shared between some of the nodes in a cluster, this is referred to as asymmetric storage. Windows Server 2012 supports asymmetric storage, as does Windows Server 2008 R2 with Service Pack one. 
  • You can use the secondary replica in the disaster recovery site as an active secondary if required. 
  • All databases that participate in the Availability Group availability set must use the FULL recovery model.
  • As with the AlwaysOn Availability Group solution described in the previous topic, you should remove quorum votes from the cluster nodes in the disaster recovery site. If there is an even number of nodes in the primary site, you can use one of these two configurations:
    • Node and fileshare majority quorum model with a fileshare witness.
    • Node and disk majority, using the asymmetric storage as a disk witness. 
  • Failover between FCI nodes is automatic, but when you install an AlwaysOn Availability Group on an FCI, failover between the primary replica and the secondary replica in the Availability Group is a manual operation. To fail over to the disaster recovery site, perform the following steps:
    • Force quorum on the cluster node that hosts the secondary replica in the disaster recovery site and start the Cluster Service. 
    • Force failover of the availability group. You can use the Transact-SQL ALTER AVAILABILIITY GROUP statement with the FORCE_FAILOVER_ALLOW_DATA_LOSS option to do this. You can also use SSMS or PowerShell.
    • Change the quorum voting configuration by removing votes from the nodes in the primary site and giving a vote to the node in the disaster recovery site.
    • Because an Availability Group only protects databases, you will need to transfer logins and SQL Server Agent jobs separately.

 

Share This Story

Share on facebook
Share on twitter
Share on linkedin
Share on whatsapp

Share Your Comments

About The Author

Search Articles

Categories

Follow Fard Solutions

Share Your Comments

Fard Solutions Sdn. Bhd. Company was established in 2012 in Kuala Lumpur. We are engaged in development, implementation, integration and maintenance of the most advanced database solutions and IT research & development (innovation).

(+60) 03 8994 6788

info@fard-solutions.com

Fard Technology Corp.
700-1199 West Hastings Street,
Vancouver, British Colombia,
V6E 3T5 Canada.

Fard Solutions Sdn. Bhd.
1-1C & 1-2C Incubator 1,
Technology Park Malaysia,
Bukit Jalil, 57000
Kuala Lumpur, Malaysia.

  • Copyrights © 2020 Fard Solutions Sdn. Bhd. , All rights reserved.
  • Privacy Policy
  • Terms of Service
  • A Project By Website Artisan