In the previous blog post regarding SQL Server In-Memory Internals, I wrote an overview of how In-Memory OLTP engine controls the concurrency without acquiring Lock and Latches, and how it handles the transactions. In this blog post I would like to write more in depth regarding concurrency control in this new SQL Server engine.
Basically the structure of the In-Memory Optimized rows reflects the engine behavior regarding concurrency control, lets take a look into the row structure.
As above figure, the in-memory row structure is totally different from disk-base table row structure, it even does not use 8KB pages to store the data, instead it is allocating and storing data once it is needed unlike disk-based tables to prepare the data pages in buffer pool then read/write the data rows.
The row header consists of the following fields:-
- Begin_Ts: It reflects the timestamp of the ‘Insert’ once the transaction is committed.
- End_Ts: It reflects the timestamp of the ‘Delete’ once the transaction is committed.
- StmtId: It is to avoid Halloween attack by keeping the unique ‘StmtId’, which is created the row.
- IdxLinkCount: Represents the number of indexes.
- Padding: Not yet used, extra bytes for future features.
- Index Pointer: It is C language pointer, and it points to the index and next row.
The payload section of the row contains all the data columns and index keys, the payload format is vary and depends on the table schema and In-Memory Native Compiler generates DLLs (C Language) to describe the payload format for transactional operations.
As I mentioned above, SQL Server In-Memory OLTP does not need to acquire Lock or Latch to provide concurrency, instead it does row versioning by creating a new version of the row during modifications. In-Memory storage engine uses Begin_Ts and End_Ts fields to determine which row needs to be visible for the specific transaction.
[info]The In-Memory storage engine totally is not aware of ‘Versioning’ concept and it only sees multiple rows that are connected to each other by index linkage.[/info]
Every database that supports Memory-Optimized tables has the following internal counters in order to support Multi-Version Concurrency Control (MVCC):-
- Transaction-ID: This is incremental unique global value for In-Memory OLTP transactions and restarts once the SQL Server instance restarts.
- Global Transaction Timestamp: This holds the unique ‘Commit’ timestamp and SQL Server increment it for every new transaction. SQL Server instance restart does not affect this counter.
SQL Server keeps track of each active transaction in an internal, global transaction table. When a transaction starts, SQL Server increments the Transaction-ID counter, and assigns a unique transaction ID to the transaction. When the transaction issues a
COMMIT, SQL Server generates a commit timestamp, which it stores in the internal table initially, and then writes the value to the header of affected rows, once it validates the transaction and hardens the changes to the log.
SQL Server also stores in the internal table a system timestamp that indicates the transaction’s start time; it is actually the timestamp for the last committed transaction at the time this transaction started and indicates when the transaction began, relative to the serialization order of the database. More than one transaction can have the same start timestamp. This start timestamp is never used anywhere else, and is only in the internal table while the transaction is active.
Now, I would like to write in more detail how SQL Server In-Memory OLTP storage engine does row versioning and handles the transactions. It is worth taking a preview here of a simple data modification example. Imagine we have two rows as the following in our memory-optimized table:-
Now, we assume a transaction with
Tx1 transaction Id and the timestamp of
90 is going to do the following operations:-
Lets see how the row version evolve during regular-processing, validation and post-processing or commit phases and how SQL Server controls which row version needs to be visible for the active concurrent transactions.
During regular-processing, SQL Server places the transaction id (Tx1) for
End_Ts for deleted rows and for
Begin_Ts for Inserted rows. At same time SQL Server uses an extra bit flag to represent that is transaction id instead of timestamp value.
For the delete operation, SQL Server locates the row via using an index and places the transaction id into the
End_Ts field to mark it as deleted, and the update operation occurs as an atomic operation consists of two separate operations as insert followed by delete. At the end, our transaction
COMMIT and SQL server generates commit timestamp at 120 (For example) and stores this value into the Global Transactions Table, this timestamp represents that transaction logically completed but not yet write into the row headers, because SQL Server not yet validates the transaction and also not yet written the transaction into the log file and SQL Server will not acknowledge the commit to the user until validation completes.
[info] SQL Server will optimistically assume that the transaction will actually commit, and makes the row available to other transactions as soon it receives the
Lets make it more interesting, imagine there is another transaction with transaction id
Tx2 and wants to read the data before SQL Server validates the
In Disk-based tables, SQL Server does pessimistic concurrency and holds Lock on data to avoid reader to read the data that is updating by writer. In contrast, thanks to the MVCC model, the same reader of a memory-optimized table simply continues processing. let’s assume
Tx2 starts at timestamp 100, after
Tx1 started but before
Tx1 issues a
Tx2 will read the
End-Ts contains a Transaction-ID,
Tx1, indicating that the row may have been deleted.
Tx1 in the Global Transaction Table and checks its status to determine if the deletion of
Tx1 is still active and so
Tx2 can access the
When it accesses the
Begin-Ts, check the Global Transaction Table, find
Tx1 is still active, follow the index pointer in the header back to the previous row version, and instead return the row version
Tx2 will not return the row
Now different scenario, lets assume that
Tx2 is started at timestamp 121 after
Tx1 Committed but SQL Server not yet validates
Tx1. Base on our assumption,
Tx2 can read data row that has
Begin_Ts less than equal 121 and
End_Ts more than 121, or infinite.
Since SQL Server has yet to validate transaction
Tx1, it registers a
Commit Dependency between
Tx1. This means that SQL Server will not complete validation of
Tx2, nor acknowledge the commit of Tx2 to the user, until it completes validation of
[info] Any blocking waits that arise from the resolution of commit dependencies will be minimal. Of course, a long-running transaction in an OLTP system is still going to cause some blocking, although never lock-related blocking.[/info]
Tx2, starting at timestamp
100, instead of just reading the rows, wants to update
End-Ts value for the
Tx1 will commit, immediately indicates a write-write conflict, and
Tx2 will abort immediately.
Once the transaction is committed, SQL Server generates the commit timestamp in Global Transaction Table and lead the transaction enters the validation phase. In this phase SQL Server detects the potential transaction isolation level violations, for example:
Tx1 was accessing the memory-optimized table in
REPEATABLE READ isolation. It reads a row value and then
Tx2 updates that row value, which it can do because SQL Server acquires no locks in the MVCC model, and issues a commit before
Tx1 commits. When
Tx1 enters the validation phase, it will fail the validation check and SQL Server will abort the transaction. If there are no violations, SQL Server proceeds with other actions that will culminate in guaranteeing the durability of the transaction.
The following list is the summarized processes that SQL Server does during validation phase:-
- Validate the changes made by a transaction
- Wait for any commit dependencies to resolve
- Write the transaction to disk
- Mark the transaction commit as validated
- Clear dependencies
Post-Processing or Commit Phase
At this phase, SQL Server writes the commit timestamp into the row header of involved rows in the transaction, based on our example, the rows will be look like the following:-
The SQL Server In-Memory OLTP engine supports true optimistic concurrency by using MVCC model and Bw-Tree row structure, based on in-memory row versioning. MVCC means that SQL Server can avoid read-write conflicts without the need for any locking or latching, and will raise write-write conflicts immediately, rather than after a delay.