In the realm of database management systems (DBMS), concurrency control is a critical aspect that ensures the consistency, integrity, and isolation of data when multiple transactions occur simultaneously. As modern applications increasingly demand concurrent access to databases, effective concurrency control mechanisms become essential to prevent conflicts and anomalies and to ensure that transactions are executed in a manner that maintains the database’s correctness. This article explores the concept of concurrency control in DBMS, discussing its importance and methodologies, and addressing frequently asked questions related to its implementation and challenges.
Having prerequisite knowledge of serializability and transaction schedules can be an added bonus while reading this article on concurrency control in DBMS.
What is Concurrency Control?
Concurrency control is required in the first place because transactions that take place without any rules or protocols governing them put the integrity of the data at risk.
A transaction is an atomic unit of work performed on a database, and each transaction requires some waiting time before it can be executed. The problem arises when numerous transactions are lined up and a single slow transaction delays the processing of all the others. To curb this issue, concurrency was introduced: transactions are made to occur concurrently under the hood, which reduces the cumulative waiting time and increases throughput.
When multiple transactions take place simultaneously, there is always a risk of the data being corrupted, resulting in unnecessary errors. Hence, concurrency control in DBMS ensures that the principles of atomicity, consistency, isolation, and serializability are always obeyed.
Concurrency Control in Depth
To understand concurrency control in DBMS in more depth, consider a real-world example: suppose a courier delivery service has to export five units and one ship can carry only one unit at a time. To avoid delaying the export, other ships are arranged to carry the remaining four units concurrently.
If the service were to wait for one ship to arrive before the others could depart, it would be costly in time. Likewise, concurrency is useful when multiple transactions execute together, giving better throughput.
Now that we have some clarity on the topic, we will move further to study concurrency control problems and concurrency control techniques in DBMS.
Concurrency Control Problems
The problems that arise when multiple transactions run concurrently without proper concurrency control in DBMS are as follows:
Dirty Read Problem:
This is a prevalent issue that occurs when a transaction reads data that has already been updated by another transaction but is still uncommitted at that instant, causing a concurrency problem.
With the help of the following table, it might be easy for you to understand the working of the dirty read problem.
Time | Transaction 1 | Transaction 2 |
---|---|---|
1 | READ(G) | |
2 | G = G + 500 | |
3 | WRITE(G) | |
4 | | READ(G) |
5 | | COMMIT |
6 | ERROR, ROLLBACK | |
As the table above shows, the value of G was updated by Transaction 1 at time 3 and read by Transaction 2 at time 4, after which Transaction 2 committed. At time 6, Transaction 1 encountered an error and rolled back, restoring G to its initial value, so Transaction 2 committed a value that was never actually made permanent. This is known as the Dirty Read Problem.
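To make the schedule above concrete, here is a minimal Python sketch of the same interleaving. It is a toy simulation, not a real DBMS: the helper names (`read`, `write`, `rollback`) are hypothetical, and the initial value of G is assumed to be 500.

```python
committed = {"G": 500}   # durable, committed state (initial value assumed)
uncommitted = {}         # in-flight writes, visible to readers in this toy model

def read(item):
    # A "dirty" read: it sees uncommitted data when such data exists.
    return uncommitted.get(item, committed[item])

def write(item, value):
    uncommitted[item] = value

def rollback():
    uncommitted.clear()  # discard all in-flight writes

# Time 1-3: Transaction 1 reads G, adds 500, and writes it (uncommitted).
g = read("G")
write("G", g + 500)

# Time 4-5: Transaction 2 reads the dirty value and commits based on it.
print("Transaction 2 committed using G =", read("G"))  # 1000

# Time 6: Transaction 1 hits an error and rolls back.
rollback()
print("After rollback, G =", read("G"))                # back to 500
```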
Unrepeatable Read Problem:
In this problem, the same data item, read more than once within a single transaction, returns different values each time. Let us have a look at the table to understand this problem in concurrency control in DBMS.
Time | Transaction A | Transaction B |
---|---|---|
1 | READ(G) (500) | |
2 | | READ(G) |
3 | G = G + 100 | |
4 | WRITE(G) | |
5 | | READ(G) |
The value of G (500) was read in Transaction B at time 2, then overwritten in Transaction A to 500 + 100, i.e., 600, and reread in Transaction B at time 5, where it turned out to be 600. The same read within a single transaction thus produced two different values.
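A toy replay of this schedule in Python, with a plain dictionary standing in for the database:

```python
# Replaying the unrepeatable-read schedule above with a dict as the "database".
db = {"G": 500}

first = db["G"]          # time 2: Transaction B reads G -> 500
db["G"] = db["G"] + 100  # times 3-4: Transaction A updates and writes G -> 600
second = db["G"]         # time 5: Transaction B rereads G -> 600

print(first, second)     # the same item yields two different values in one transaction
```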
Phantom Read Problem:
Here, data is read successfully within a transaction, but a later read of the same data encounters an error because the data has been deleted in the meantime by another transaction (Transaction A in the table below).
Time | Transaction A | Transaction B |
---|---|---|
1 | READ(G) | |
2 | | READ(G) |
3 | DELETE(G) | |
4 | | READ(G) |
Since the data was deleted by Transaction A at time 3, Transaction B can no longer access it at time 4, and the read throws an error. This is what is famously known as the Phantom Read Problem.
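The same schedule replayed as a toy Python sketch, where the failed read surfaces as a `KeyError`:

```python
# Replaying the phantom-read schedule above with a dict as the "database".
db = {"G": 500}

_ = db["G"]       # time 1: Transaction A reads G
_ = db["G"]       # time 2: Transaction B reads G successfully
del db["G"]       # time 3: Transaction A deletes G

try:
    _ = db["G"]   # time 4: Transaction B reads G again
except KeyError:
    print("G no longer exists: phantom read")
```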
Lost Update Problem:
This problem is similar to the ones discussed above; the difference is that here two different transactions write their updates over each other, so one update is lost, creating a problem for concurrency control in DBMS.
Time | Transaction A | Transaction B |
---|---|---|
1 | READ(G) (500) | |
2 | G = G + 100 | |
3 | WRITE(G) | |
4 | | G = G + 200 |
5 | | WRITE(G) |
6 | READ(G) | |
In Transaction A, G was read, updated, and written, and the same steps were then performed in Transaction B on the same item. When Transaction A read the value again at time 6, it found that the value it had written had been overwritten by Transaction B, so Transaction A's update was lost.
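The race becomes visible with two real Python threads performing the same unprotected read-modify-write. This is a sketch, not production code: the `time.sleep` is only there to widen the race window so the lost update shows up reliably.

```python
import threading
import time

G = 500  # shared data item, initial value as in the table

def transaction(amount):
    global G
    local = G            # READ(G)
    time.sleep(0.1)      # widen the race window so both reads see 500
    G = local + amount   # G = G + amount, then WRITE(G)

# Transaction A adds 100 and Transaction B adds 200, concurrently.
a = threading.Thread(target=transaction, args=(100,))
b = threading.Thread(target=transaction, args=(200,))
a.start(); b.start()
a.join(); b.join()

# A serial execution would give 800; with the race, one update is lost
# and G ends up as 600 or 700, depending on which write lands last.
print("Final G =", G)
```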
Incorrect Summary Problem:
As the name suggests, this problem occurs when a summary (an aggregation) is computed over data items while another transaction is updating one of those items, so the aggregation comes out wrong. The following table explains:
Time | Transaction A | Transaction B |
---|---|---|
1 | READ(G) (500) | |
2 | SUM = 0 | |
3 | SUM = SUM + G | |
4 | | READ(F) (200) |
5 | | F = F + 100 |
6 | READ(F) | |
7 | SUM = SUM + F | |
In the given table, Transaction A sums up the values of G and F while Transaction B overwrites F from 200 to 300 in between. When F is read and added after time 5, SUM becomes 800 instead of 700, showing that Transaction B is affecting the result of Transaction A.
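Replayed as a toy Python sketch:

```python
# Replaying the incorrect-summary schedule above with a dict as the "database".
db = {"G": 500, "F": 200}

SUM = 0
SUM += db["G"]           # times 1-3: Transaction A reads G and adds it -> 500

db["F"] = db["F"] + 100  # times 4-5: Transaction B bumps F from 200 to 300

SUM += db["F"]           # times 6-7: Transaction A reads the new F and adds it
print("SUM =", SUM)      # 800, not the 700 that running A fully
                         # before B would have produced
```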
Concurrency Control Techniques in DBMS:
The concurrency control techniques in DBMS, also known as protocols, that are used to avoid the problems above and maintain consistency and serializability are as follows:
Lock-Based Protocol:-
In order to avoid issues related to inconsistency, one of the foremost requirements is isolation between all the transactions. To achieve it, a transaction must acquire a lock on a data item before performing any read/write operation on it.
The two variants of locks used in lock-based protocols are:
- Shared Lock
- Exclusive Lock
Shared Lock:
It blocks write operations but allows read operations to take place; hence shared locks also go by the name read-only locks. They are denoted by “S” in a transaction.
Exclusive Lock:
For a given data item, it blocks both read and write operations by every other transaction; only the lock holder may read and write the item. It is denoted by “E”.
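A minimal sketch of how these two lock modes conflict, written with a Python condition variable. This is an illustration only: real DBMS lock managers also handle lock queues, lock upgrades, and deadlock detection.

```python
import threading

class SharedExclusiveLock:
    """Toy S/E lock: many holders may share S; E requires sole access."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0     # current holders of the shared (S) lock
        self._writer = False  # whether an exclusive (E) lock is held

    def acquire_shared(self):
        with self._cond:
            while self._writer:                   # S conflicts only with E
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # E conflicts with S and E
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()      # any number of transactions may read together
lock.release_shared()
lock.acquire_exclusive()   # only one transaction may write at a time
lock.release_exclusive()
```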
Four types of lock-based protocols are:-
1. Simplistic Lock Protocol:-
It locks the data item the moment it is about to be updated and unlocks it as soon as the operation is done.
2. Pre-claiming lock protocol:-
Before the transaction begins, all of its operations are analyzed and locks are requested on every data item it will touch. If all the locks are available, they are acquired up front and the transaction is performed; otherwise, a rollback is performed.
3. Two-Phase Locking:-
This technique is performed in two phases:
- Growing phase: the transaction asks for and acquires the locks it needs; no lock is released in this phase.
- Shrinking phase: once all locks have been acquired, the transaction releases its first lock, and the rest of the locks are released one by one after each operation; no new lock may be acquired in this phase.
4. Strict Two-Phase Locking:-
A slight modification of two-phase locking in which locks are not released after each operation. Instead, they are held until all the operations are done executing and the commit is triggered, at which point all the locks are released together, as in the sketch below.
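Here is a minimal sketch of strict two-phase locking under simplifying assumptions: every lock is exclusive, the lock table and `Transaction` class are hypothetical, and deadlock handling is omitted.

```python
import threading

# One exclusive lock per data item (the "lock table" of this toy example).
item_locks = {"G": threading.Lock(), "F": threading.Lock()}
db = {"G": 500, "F": 200}

class Transaction:
    def __init__(self):
        self.held = []  # locks acquired so far (growing phase)

    def lock(self, item):
        lk = item_locks[item]
        lk.acquire()            # blocks until the item is free
        self.held.append(lk)

    def commit(self):
        # Strict 2PL: the shrinking phase happens all at once at commit,
        # so no other transaction ever observes this transaction's
        # partial, uncommitted state.
        for lk in reversed(self.held):
            lk.release()
        self.held.clear()

t = Transaction()
t.lock("G"); db["G"] += 100   # every write happens under its lock
t.lock("F"); db["F"] += 100
t.commit()                    # all locks released together
print(db)
```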
Time-Based Protocol:-
Every transaction is tagged with a timestamp denoting when it entered the system, and the Timestamp Ordering Protocol uses these timestamps to ensure serializability. In addition to the transaction's timestamp, every data item G carries two timestamps of its own: R(G), the time when it was last read, and W(G), the time when it was last written.
Let’s suppose A is our transaction and G is the data item on which we want to perform a write operation.
If Timestamp(A) < R(G), which indicates that the transaction's timestamp is less than the time when the data was last read, a rollback will be performed, because a younger transaction has already read the data.
Likewise, if Timestamp(A) < W(G), which indicates that the transaction's timestamp is less than the time when the data was last written, a rollback will be performed, because a younger transaction has already written the data.
On the contrary, when we want to perform a read operation on the data:
If Timestamp(A) ≥ W(G), the operation will be executed, because the transaction's timestamp is not older than the last write of the data, which means that serializability is maintained; if Timestamp(A) < W(G), the read is rejected and the transaction is rolled back.
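These rules translate almost directly into code. Below is a minimal sketch of basic timestamp ordering; the `Item` class and the rejected operation raising a `Rollback` exception are illustrative choices, not a fixed API.

```python
class Rollback(Exception):
    """Raised when an operation arrives too late and must be rejected."""

class Item:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0  # R(G): timestamp of the latest read
        self.w_ts = 0  # W(G): timestamp of the latest write

def ts_write(ts, item, value):
    # Reject the write if a younger transaction already read or wrote G.
    if ts < item.r_ts or ts < item.w_ts:
        raise Rollback(f"write by TS={ts} rejected")
    item.value, item.w_ts = value, ts

def ts_read(ts, item):
    # Reject the read if a younger transaction already wrote G.
    if ts < item.w_ts:
        raise Rollback(f"read by TS={ts} rejected")
    item.r_ts = max(item.r_ts, ts)
    return item.value

G = Item(500)
ts_write(1, G, 600)       # transaction with TS=1 writes G
print(ts_read(2, G))      # TS=2 >= W(G)=1, so the read succeeds -> 600
try:
    ts_write(1, G, 700)   # TS=1 < R(G)=2: too late, must roll back
except Rollback as err:
    print(err)
```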
Validation-Based Protocol
This protocol is divided into three phases which are as follows:-
1. Read Phase:- All the data values are read and all the modifications made by the transaction are stored in a local buffer, to be reused when needed in later operations; the actual database is not touched in this phase.
2. Validation Phase:- Validation is performed to ensure that the values held in the local buffer can replace the pre-existing values in the actual database without violating serializability.
- Validation Test: Transactions are tested on the basis of their execution times. If End(A) < Start(B), i.e., Transaction B starts only after Transaction A is done executing, the two transactions cannot conflict and serializability is not affected.
- Validate(B) > End(A) > Start(B): The validation of Transaction B is performed after the end of Transaction A, where A finishes executing after the onset of Transaction B; serializability is still preserved provided the items written by A do not overlap with the items read by B, making this one of the effective concurrency control techniques in DBMS.
3. Write Phase:- If the validation phase succeeds, the buffered values are copied to the actual database; otherwise, a rollback is performed.
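A compact sketch of the three phases follows. It is an assumption-laden illustration: instead of the timestamp tests above, it validates by comparing per-item version counters (a simpler way to detect that something a transaction read has since changed), and all class and function names are hypothetical.

```python
# Toy optimistic (validation-based) execution: writes go to a local buffer
# during the read phase; commit validates and then applies the buffer.
db = {"G": (500, 0)}  # item -> (value, version counter)

class OptimisticTxn:
    def __init__(self):
        self.reads = {}   # item -> version observed during the read phase
        self.buffer = {}  # local, not-yet-visible writes

    def read(self, item):
        if item in self.buffer:         # read your own buffered write
            return self.buffer[item]
        value, version = db[item]
        self.reads[item] = version      # remember what we saw
        return value

    def write(self, item, value):
        self.buffer[item] = value       # read phase: buffer only

    def commit(self):
        # Validation phase: every item we read must be unchanged.
        for item, seen in self.reads.items():
            if db[item][1] != seen:
                self.buffer.clear()     # conflict: cheap rollback
                return False
        # Write phase: apply the buffer and bump version counters.
        for item, value in self.buffer.items():
            _, version = db.get(item, (None, -1))
            db[item] = (value, version + 1)
        return True

t = OptimisticTxn()
t.write("G", t.read("G") + 100)
print("committed:", t.commit(), "->", db["G"])  # committed: True -> (600, 1)
```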
Conclusion
Concurrency control in DBMS is fundamental for maintaining the consistency and integrity of databases in environments where multiple transactions occur concurrently. By employing various techniques such as locking, timestamp ordering, and multiversion concurrency control, databases can efficiently manage concurrent access and prevent issues like deadlocks, lost updates, and inconsistent reads. Understanding concurrency control mechanisms is crucial for database administrators and developers to ensure robust and reliable database systems. This knowledge enables them to design systems that can handle high levels of concurrency while maintaining data accuracy and reliability.
FAQs Related to Concurrency Control in DBMS:
1. What is concurrency control in DBMS?
Answer: Concurrency control in DBMS refers to the management of simultaneous execution of transactions in a way that ensures data consistency and integrity. It aims to coordinate concurrent accesses to a database to prevent conflicts and anomalies.
2. Why is concurrency control important in DBMS?
Answer: Concurrency control is important to prevent issues such as data inconsistency, lost updates, and other anomalies that can occur when multiple transactions access and modify the database concurrently. It ensures the correctness of transactions and the overall reliability of the database.
3. What is the difference between pessimistic and optimistic concurrency control?
Answer: Pessimistic concurrency control assumes conflicts will occur and uses locks to prevent them, while optimistic concurrency control assumes conflicts are rare and allows transactions to proceed without locks, checking for conflicts only at the end of the transaction.
4. What is a lock in concurrency control?
Answer: A lock is a mechanism used to control access to a data item. Locks can be shared (read locks) or exclusive (write locks), and they ensure that transactions do not interfere with each other inappropriately.
5. What is the purpose of a timestamp in concurrency control?
Answer: Timestamps are used in concurrency control to order transactions chronologically. They help enforce serializability by ensuring that transactions are executed in a sequence that respects their timestamps, preventing conflicts.