Applicable: SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012 and SQL Server 2014.
In the era of data explosion and highly transactional systems, the performance and stability of the database systems are one of the major concerns. In some environment due to data size and frequent transactions the latest hardware technology is not able to provide data availability and best performance compare to the cost of hardware and software. Basically, the best approach is to scale-out the database environment to address this issue, database scale-out can be done by different solutions which this article provides basic information regarding different type of data involved, and factors can affect scale-out solution. Many books, articles or even white papers have been written about SQL Server Scale-Out technologies, so this article focuses on the implemented Scale-Out solution by Fard Solutions Sdn Bhd.
What is Scale-Out?
Scalability is the application ability to effectively and efficiently use more resources in order to accomplish more useful work tasks. For example, an application can service four (4) users on a single-processor system which might be able to service fifteen (15) users on a four-processor system. In this case, the application scalable, otherwise it is not scalable.
There are two (2) different type of scalability as following:-
Scale-Up: Means scaling the application to a bigger and more powerful hardware, for example, going from a four (4) processor hardware to a sixty-four (64) processor hardware. This is the most common way to scale databases, usually when database runs out of resources on the current hardware, companies purchase more powerful hardware to increase the performance and stability. This approach has advantage of not requiring significant changes to the database. In some cases, Scale-Up solution does not work even the latest and most powerful hardware is provided, that is the time to implement Scale-Out solution.
Scale-Out: Unlike Scale-Up solution, it needs expanding to multiple hardware. It is usually has some initial hardware cost advantages, for example, eight (8) hardware with four (4) processors are generally cost less than one thirty-two (32) processor hardware, but this advantage is often cancelled due to maintenance costs. The data redundancy offered by a Scale-Out solution is also useful from an availability perspective.
Type of Data Involved in Scale-Out Solution
Major challenge to implement successful Scale-Out solution is recognizing types of data used by applications, each type of data imposes different requirements on a Scale-Out solution architecture. When there is only one database, it is easier to just assume all data are same, but when it starts splitting up and replicating data for Scale-Out, it become more important to understand how data is used. Commonly in many applications multiple Scale-Out approaches are required due to multiple types of data are involved. This section explains three different types of data and how they influence Scale-Out solutions.
As its name implies, is data that is used by an application but might not be maintained by the same application. Reference data is relatively stable and it is valid for specific period. For example, catalogues used by order entry systems, schedules for train or airlines, and charts of accounts used in financial systems. Reference data might frequently change in some other systems which lead to confusion. For example, if a price on a price list changed many times a day, customers would be confused and dissatisfied, reference data is often changed at fixed intervals – For example, prices may change nightly or even weekly and account numbers might change once a month. Reference data is easy to Scale-Out to provide improved performance with a minimal investment.
Activity data is the data associated with a particular activity or business transaction. For example, a purchase order or stock sale will generate some data that is associated with the transaction. This data is relevant within the scope of a particular business activity and except for historical reasons. In general activity data becomes reference data once the activity is completed and same Scale-Our considerations apply. Activity data usually has low concurrency and update rates requirements. Activity data Scale-Out solution is easy to implement by replicating data into different hardware with provided data partitioning for better performance.
Resource data is the core data that every business depends on – such as, inventory, account data, customer master data and so on. If resource data is lost, basically the company is out of business; therefore, resource databases use a lot of data integrity and high availability features to ensure that critical data is always available. Resource data usually needs high concurrency and update rates requirements. The resource data is generally only the currently active data. Inactive accounts, discounted parts and so on, are generally maintained in relatively static history tables and then become reference data. Due to this transformation from resource data to reference data reduces the size and need for Scale-Out of resource data.
Scale-Out Effective Factors
The characteristics of data usage can impact the choice of Scale-Out solution. SQL Server currently supports several Scale-Out technologies, and the choice of which one to use depends largely on the type of data and application involved.
Some data is updated very frequently. For example, log data from web servers, and order entry systems (notice that these are both INSERT intensive rather than UPDATE intensive, but for our purposes, they can be considered same thing). Data that is updated very frequently is often difficult to replicate very efficiently, because the overhead of replicating updates limits the scalability of the replicated copies. There should be an understanding regarding the update rate of data when there is consideration of any replication options for Scale-Out solution.
A database with very low update rates is easy to replicate, and therefore replication can be a good Scale-Out solution, so that database can be marked as Read-Only most of the times.
Anticipation of Change in Application
The ability of change in an application can have a large impact on which Scale-Out solution are viable for you. Some Scale-Out solutions work with no changes to the application at all, some may require fairly minor changes to the data access layer, queries and stored procedures, and others may require re-designing the way the application works. For sure, the maximum flexibility is available if the new application design takes place. It is better to design the application for Scale-Out solution even it is not required for the initial implementation. Packaged applications and some legacy applications may be impossible to change, and therefore any Scale-Out solution needs to be transparent to the application code.
Ability to Partition
Partitioning data among several databases is the one of the most effective solution to Scale-Out data so data each database server can handle a portion of the data. Be aware not all data can be effectively partitioned, and if it can be partitioned, the way it is partitioned has a major effect on performance. For example, order entry system’s database might be partitioned based on what was ordered – book orders in one database, clothing orders in another database and so on. Aside from the obvious issues, such as what happens to an order that includes both books and clothes, this scheme would not work well due to query joins and the order categories.
Another way to partition order entry system’s database would be to split the orders by order-number range or order-date range. This might make sense if orders are most often accessed by order number instead of by location or order category. Not all application database can be effectively partitioned, and choosing the right partitioning scheme is essential for effective Scale-Out with partitioned data.
To address the order entry system’s database partitioning issue, it is recommended to make partition based on customer number and keep each relevant order data in the same server to reduce the distributed joins.
Data Coupling and Interdependence
Another way to distribute data is to split it based on usage. If some parts of the database are used by different applications, it might makes sense to split the database along application boundaries, then each application processes dedicated data which belongs to it. For example, if the order entry system’s data is split to all order lines are in one database and all the order headers are in another database, virtually every query that accesses orders will have to do a distributed join between two databases, at this point, the network traffic generated by distributed joins will cancel our any benefits from distributing the data.
Perhaps the best tactic to determine how a database should be split is through a careful analysis of the data mode to find out the relationships between entities.
Enterprise Scale-Out Solution
Now that we understand the factors and the type of data involved, affect the Scale-Out solution. Currently SQL Server provides few Scale-Out solution called ‘Scalable Shard Database’, ‘Peer to Peer Replication’, ‘Distributed Partitioned Views’ and ‘Linked Servers and Distributed Queries’. All of mentioned solutions have Advantage and Disadvantage as following:-
In this section, we are going to design a new Scale-Out solution based on available features in SQL Server. For starting point, the ‘AdventureWorks’ database will be our sample. We would like to make a Scale-Out approach which is ‘Less Transparent to Application’, ‘Low Network Traffic’ and ‘High Performance’.
The early stage to implement such Scale-Out is database design analysis, therefore we are able to design a robust and long term solution. In this stage we are in need to analyze and find out the group of related entities in the database and place them into different hardware server. As figure 1 shows, the ‘AdventureWork’ database has a lot of relationship between entities. In this case we are going to design and implement Scale-Out solution based on ‘Linked Server and Distributed Queries’ and ‘Data-Dependent Routing’ features and solutions. After analyzing the ERD, we need to find close entities to each other and make few ‘Entity Island’, the number of ‘Entity Island’ is highly depends on the number of hardware servers that you are willing to use. Figure 2 shows three (3) ‘Entity Island’ that each sits into individual hardware to provide better performance and Scalability.
Each ‘Entity Island’ will be placed into an individual hardware, and will be connected to each other through ‘Linked Server and Distributed Queries’ feature of SQL Server. At this stage database is already Scaled-Up, the major problem is the internal network traffic between each server node and application connectivity. To address these two (2) major issues, in this Scale-Out solution the application is in need to have an ‘Orchestrator’. The ‘Orchestrator’ needs to be an intelligent enough to track entities and Entity Island to provide better connectivity and reduce the internal network traffic. The Orchestrator is responsible to parse and find the source tables in the query and advice the most appropriate Entity Island to reduce the network traffic and distributed queries between Entity Islands. Figure 3 shows the deployment of Scale-Out solution.
Above figure illustrates the server-side deployment of databases and Entity Islands and each Entity Island is connected to other Entity Islands via Linked Server and Distributed Queries. Bear in mind that Orchestrator is needed to be placed either in-side application or as a web-service to re-route the application request to more efficient and relevant Entity Island.
This Scale-Out solution provides major performance and scalability improvements. The following table provides the advantage and disadvantage of the mentioned solution:-
The decision to implement a Scale-Out solution is influenced by several factors. The following table summarizes the importance of these factors for each solution.
With a good understanding of the data, requirements, and constraints for an application, an effective SQL Server solution can be designed to meet almost any level of Scale-Out.
Author: Hamid Jabarpour Fard