Database Transactions and Concurrency Control: MVCC vs Locking

Understanding ACID properties and concurrency control mechanisms for data integrity




Overview

Today, we’ll explore the concepts of database transactions and concurrency control.

A transaction is a unit of work that a database executes atomically. When multiple users access the same data simultaneously, concurrency control techniques are needed to prevent conflicts.

This article will examine the concept of transactions and their execution methods, and compare representative concurrency control techniques: MVCC (Multi-Version Concurrency Control) and Locking mechanisms.



What is a Transaction?

A transaction represents a logical unit of work in a database. All operations within a transaction must either complete successfully, or if even one fails, the entire transaction must be rolled back.


ACID Properties of Transactions

Transactions must follow ACID principles:

| Property | Description |
|---|---|
| Atomicity | All operations within a transaction either fully execute or, if any operation fails, are fully rolled back |
| Consistency | The database must remain in a consistent state after a transaction executes |
| Isolation | Concurrent transactions must not interfere with each other |
| Durability | Once a transaction commits, its changes must be permanent |



Transaction Execution Process

Transactions execute through the following steps:

  1. BEGIN TRANSACTION: Start the transaction
  2. SQL Execution: Perform operations like INSERT, UPDATE, DELETE
  3. COMMIT: Reflect changes if all operations succeed
  4. ROLLBACK: Cancel all changes if an error occurs


Transaction Example (MySQL)

-- Start transaction
START TRANSACTION;

-- Withdraw 1,000,000 from Account A
UPDATE accounts SET balance = balance - 1000000 WHERE account_id = 1;

-- Deposit 1,000,000 to Account B
UPDATE accounts SET balance = balance + 1000000 WHERE account_id = 2;

-- Commit if all operations complete successfully
COMMIT;


If the second UPDATE encounters an error, you can ROLLBACK to cancel the first UPDATE as well:

-- Transaction rollback example
ROLLBACK;
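
The same all-or-nothing behavior can be sketched with Python's built-in sqlite3 module. This is an illustrative toy (the table and amounts are invented here to mirror the example above), not production transfer code:

```python
import sqlite3

# Toy atomicity demo: a CHECK constraint makes the second UPDATE fail,
# and ROLLBACK undoes the already-applied first UPDATE as well.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "account_id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, 1_000_000), (2, 0)])
conn.commit()

try:
    # Credit account 2 first, then over-debit account 1 so the CHECK fails
    conn.execute("UPDATE accounts SET balance = balance + 2000000 WHERE account_id = 2")
    conn.execute("UPDATE accounts SET balance = balance - 2000000 WHERE account_id = 1")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # the credit to account 2 is undone too

balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
# balances == {1: 1000000, 2: 0}: no partial transfer survived
```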



What is Concurrency Control?

Concurrency control is a technique to ensure data integrity when multiple users attempt to modify the same data simultaneously.

The two main concurrency control techniques are MVCC (Multi-Version Concurrency Control) and Locking mechanisms.
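
Why is this necessary? The classic failure mode is the lost update. The toy Python sketch below mimics two "transactions" doing a read-modify-write on the same balance: without coordination one withdrawal silently vanishes, with a lock both apply. This is an in-memory illustration only, not a real database engine:

```python
import threading

balance = {"value": 1000}
lock = threading.Lock()

def run_unsafe():
    """Two withdrawals read the same balance, then both write back."""
    balance["value"] = 1000
    barrier = threading.Barrier(2)  # force both reads to happen first

    def withdraw(amount):
        read = balance["value"]           # both threads read 1000
        barrier.wait()
        balance["value"] = read - amount  # both write 900: one update lost

    threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance["value"]

def run_safe():
    """The same withdrawals, serialized by a lock."""
    balance["value"] = 1000

    def withdraw(amount):
        with lock:  # the read-modify-write is now atomic
            balance["value"] -= amount

    threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance["value"]

lost = run_unsafe()  # 900: one of the two 100-unit withdrawals was lost
safe = run_safe()    # 800: both withdrawals applied
```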


MVCC vs Locking Comparison

| Comparison | MVCC | Locking |
|---|---|---|
| Concept | Maintains multiple versions of data so transactions execute independently | Locks specific data and restricts access by other transactions |
| Read Method | Reads a consistent snapshot (possibly older versions), so reads never block | Reads the latest data but may conflict with other transactions |
| Write Method | Creates a new version for changed data | Holds the lock until the transaction completes |
| Performance | Excellent read performance, some write overhead | Good write performance, but reads can be blocked by writers |
| Examples | PostgreSQL, MySQL (InnoDB), Oracle | MySQL (MyISAM), MSSQL |
| Disadvantages | Storage grows as old versions accumulate | Performance degrades as lock wait time increases |



Deep Dive: MVCC (Multi-Version Concurrency Control)

MVCC maintains multiple versions of data to prevent conflicts between transactions.


Key MVCC Concepts

1. Snapshot Isolation: each transaction reads from a consistent snapshot of the database taken when it started, so changes committed later by other transactions stay invisible to it.

2. Undo Log Utilization: old row versions are retained (InnoDB keeps them in undo logs; PostgreSQL keeps old tuples in the table until VACUUM removes them) so a reader can reconstruct the version its snapshot should see.

Advantages of MVCC

  - Readers never block writers, and writers never block readers, so read concurrency is excellent
  - Every transaction gets a consistent snapshot without taking read locks


Disadvantages of MVCC

  - Old versions accumulate and consume storage until they are cleaned up (e.g., by VACUUM)
  - Writes carry extra overhead for creating new versions and for eventual cleanup

MVCC Implementation Example

-- PostgreSQL MVCC in action

-- Transaction 1: Start at time T1
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- Sees version at T1
-- Reads: balance = 1000

-- Transaction 2: Starts at time T2 (after T1)
BEGIN;
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT; -- Creates new version at T2

-- Back to Transaction 1
SELECT balance FROM accounts WHERE id = 1; -- Still sees T1 version
-- Still reads: balance = 1000 (snapshot isolation)
COMMIT;

-- New Transaction 3: Starts at time T3 (after T2)
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- Sees T2 version
-- Reads: balance = 500
COMMIT;



Deep Dive: Locking Mechanisms

Locking controls concurrent access by requiring a transaction to acquire a lock on data before using it; transactions requesting a conflicting lock must wait until it is released.


Key Locking Concepts

1. Shared Lock (S Lock): acquired for reading; any number of transactions can hold shared locks on the same data at the same time.

2. Exclusive Lock (X Lock): acquired for writing; only one transaction can hold it, and it blocks both shared and exclusive requests from other transactions.


Lock Compatibility Matrix

| Lock Type | Shared Lock (S) | Exclusive Lock (X) |
|---|---|---|
| Shared Lock (S) | ✅ Compatible | ❌ Incompatible |
| Exclusive Lock (X) | ❌ Incompatible | ❌ Incompatible |
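
The matrix is small enough to express as a lookup table. Here is a minimal sketch (the function and variable names are invented here) of how a lock manager might consult it:

```python
# (held, requested) -> may the requested lock be granted alongside the held one?
COMPATIBLE = {
    ("S", "S"): True,   # shared locks coexist
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # exclusive locks never coexist
}

def can_grant(requested, held_locks):
    """Grant only if the request is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

reads_share = can_grant("S", ["S", "S"])  # True: many readers at once
write_waits = can_grant("X", ["S"])       # False: a writer waits for readers
```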


Advantages of Locking

  - Transactions always read and modify the latest committed data
  - No extra storage is needed for old row versions


Disadvantages of Locking

  - Throughput degrades as lock wait time grows under contention
  - Deadlocks can occur when transactions acquire locks in different orders


Locking Example

-- MySQL InnoDB Locking

-- Transaction 1: Acquire exclusive lock
START TRANSACTION;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- X lock acquired
-- Other transactions trying to access id = 1 will wait

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Lock still held...

-- Transaction 2: Attempts to read
START TRANSACTION;
SELECT * FROM accounts WHERE id = 1; -- WAITS for Transaction 1
-- This query blocks until Transaction 1 commits or rolls back

-- Transaction 1 completes
COMMIT; -- Lock released

-- Transaction 2 can now proceed
-- Query executes immediately after lock release
COMMIT;



Transaction Isolation Levels

Different isolation levels provide varying degrees of transaction isolation:


Isolation Levels Comparison

| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | ✅ Possible | ✅ Possible | ✅ Possible | ⭐⭐⭐⭐⭐ Highest |
| READ COMMITTED | ❌ Not Possible | ✅ Possible | ✅ Possible | ⭐⭐⭐⭐ High |
| REPEATABLE READ | ❌ Not Possible | ❌ Not Possible | ✅ Possible | ⭐⭐⭐ Medium |
| SERIALIZABLE | ❌ Not Possible | ❌ Not Possible | ❌ Not Possible | ⭐⭐ Low |


Setting Isolation Levels

-- MySQL/PostgreSQL
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Check current isolation level
SELECT @@transaction_isolation; -- MySQL
SHOW TRANSACTION ISOLATION LEVEL; -- PostgreSQL



Deadlock Detection and Resolution


What is a Deadlock?

A deadlock occurs when two or more transactions are waiting for each other to release locks, creating a circular dependency.

Transaction A: Holds Lock on Row 1, Waits for Lock on Row 2
Transaction B: Holds Lock on Row 2, Waits for Lock on Row 1
    ↓
Deadlock! Neither can proceed.
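
Databases typically detect this by maintaining a waits-for graph (an edge from each blocked transaction to the transaction holding the lock it wants) and searching it for a cycle. A minimal DFS-based sketch, for illustration only:

```python
def has_deadlock(waits_for):
    """waits_for maps a transaction to the set of transactions it waits on."""
    visiting, done = set(), set()

    def dfs(txn):
        if txn in done:
            return False
        if txn in visiting:  # a back edge: a cycle, i.e. a deadlock
            return True
        visiting.add(txn)
        for blocker in waits_for.get(txn, ()):
            if dfs(blocker):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(dfs(txn) for txn in waits_for)

# The scenario above: A waits for B and B waits for A
deadlocked = has_deadlock({"A": {"B"}, "B": {"A"}})  # True
blocked_only = has_deadlock({"A": {"B"}})            # False: B isn't waiting
```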


Deadlock Example

-- Transaction A
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- Lock Row 1
-- ... some processing ...
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- Wait for Row 2

-- Transaction B (concurrent)
START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- Lock Row 2
-- ... some processing ...
UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- Wait for Row 1

-- DEADLOCK DETECTED!
-- One transaction will be rolled back automatically


Deadlock Prevention Strategies

1. Consistent Lock Ordering

-- Always acquire locks in the same order, e.g., by ascending id.
-- Plain UPDATE offers no portable way to control lock order, so take
-- the row locks first with an ordered SELECT ... FOR UPDATE:
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id IN (1, 2);


2. Use Timeouts

-- MySQL
SET innodb_lock_wait_timeout = 50; -- 50 seconds

-- PostgreSQL
SET lock_timeout = '5s'; -- 5 seconds


3. Minimize Transaction Duration

-- Bad: Long-running transaction
START TRANSACTION;
SELECT * FROM accounts; -- Lots of processing
-- ... application logic, network calls, etc ...
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Good: Short transaction
-- Do processing outside transaction
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;



Monitoring Transactions and Locks


PostgreSQL Monitoring

-- View active transactions
SELECT * FROM pg_stat_activity 
WHERE state = 'active';

-- View locks
SELECT 
    l.relation::regclass,
    l.locktype,
    l.mode,
    l.granted,
    a.usename,
    a.query,
    a.query_start
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

-- Kill blocking transaction
SELECT pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE pid = <blocking_pid>;


MySQL Monitoring

-- View active transactions
SELECT * FROM information_schema.INNODB_TRX;

-- View locks
SELECT * FROM performance_schema.data_locks;

-- View lock waits (MySQL 5.7; in MySQL 8.0 use sys.innodb_lock_waits
-- or performance_schema.data_lock_waits instead)
SELECT 
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_thread,
    r.trx_query waiting_query,
    b.trx_id blocking_trx_id,
    b.trx_mysql_thread_id blocking_thread,
    b.trx_query blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;

-- Kill blocking connection
KILL <thread_id>;



Best Practices for Transaction Management


1. Keep Transactions Short

-- Bad: Long transaction
BEGIN;
SELECT COUNT(*) FROM large_table; -- Takes 10 seconds
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- More operations...
COMMIT; -- Locks held for entire duration

-- Good: Optimize queries first
SELECT COUNT(*) FROM large_table; -- Run outside transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT; -- Locks held briefly


2. Access Data in Consistent Order

-- Establish a convention, e.g., always lock rows by ascending id
SELECT * FROM accounts
WHERE id IN (account1_id, account2_id)
ORDER BY id FOR UPDATE; -- consistent lock order prevents deadlocks
-- ... then apply the balance updates


3. Use Appropriate Isolation Levels

-- For most applications, READ COMMITTED is sufficient
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Only use SERIALIZABLE when absolutely necessary
-- (bank transfers, inventory management)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;


4. Handle Retries Gracefully

import time

from psycopg2 import OperationalError

def execute_with_retry(conn, query, max_retries=3):
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cursor:
                cursor.execute(query)
            conn.commit()
            return True
        except OperationalError as e:
            conn.rollback()  # release locks before retrying or re-raising
            if 'deadlock detected' in str(e) and attempt < max_retries - 1:
                time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
                continue
            raise



Performance Optimization Techniques


1. MVCC Optimization (PostgreSQL)

-- Regular VACUUM to clean up old row versions
VACUUM ANALYZE accounts;

-- Aggressive VACUUM for heavily updated tables
VACUUM FULL accounts; -- Reclaims space but locks table

-- Autovacuum tuning
ALTER TABLE accounts SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.1
);


2. Index Optimization

-- Index the columns used in WHERE clauses so statements lock only the
-- matching rows (omit this if id is already the primary key)
CREATE INDEX idx_accounts_id ON accounts(id);

-- Use covering indexes for read-heavy workloads
CREATE INDEX idx_accounts_balance ON accounts(id, balance);


3. Partitioning for Reduced Lock Contention

-- Partition large tables to reduce lock scope
CREATE TABLE accounts (
    id BIGSERIAL,
    balance NUMERIC,
    created_at TIMESTAMP,
    region VARCHAR(10)
) PARTITION BY LIST (region);

CREATE TABLE accounts_us PARTITION OF accounts FOR VALUES IN ('US');
CREATE TABLE accounts_eu PARTITION OF accounts FOR VALUES IN ('EU');
CREATE TABLE accounts_asia PARTITION OF accounts FOR VALUES IN ('ASIA');



Real-World Scenarios


Scenario 1: E-Commerce Order Processing

-- High-concurrency order placement (the IF/ELSE below is stored-procedure-style
-- pseudocode; in plain SQL, use a conditional UPDATE and check the affected row count)
BEGIN TRANSACTION;

-- Check inventory (with lock)
SELECT quantity FROM products 
WHERE id = 123 FOR UPDATE;

-- Verify sufficient stock
IF quantity >= order_quantity THEN
    -- Decrement inventory
    UPDATE products 
    SET quantity = quantity - order_quantity 
    WHERE id = 123;
    
    -- Create order
    INSERT INTO orders (user_id, product_id, quantity) 
    VALUES (456, 123, order_quantity);
    
    COMMIT;
ELSE
    ROLLBACK;
END IF;


Scenario 2: Banking Transfer

-- Prevent negative balance and ensure atomicity
-- (the IF/ELSE below is stored-procedure-style pseudocode)
BEGIN TRANSACTION;

-- Lock both accounts
SELECT balance FROM accounts WHERE id IN (from_id, to_id) 
ORDER BY id FOR UPDATE;

-- Validate sufficient funds
IF (SELECT balance FROM accounts WHERE id = from_id) >= amount THEN
    UPDATE accounts SET balance = balance - amount WHERE id = from_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_id;
    
    -- Log transaction
    INSERT INTO transfers (from_account, to_account, amount, timestamp)
    VALUES (from_id, to_id, amount, NOW());
    
    COMMIT;
ELSE
    ROLLBACK;
END IF;


Scenario 3: Distributed Transaction (2PC)

-- Two-Phase Commit for distributed systems (PostgreSQL syntax;
-- requires max_prepared_transactions > 0)

-- Phase 1: Prepare - each participant does its work, then prepares
BEGIN;
-- ... transaction work (UPDATEs, INSERTs) ...
PREPARE TRANSACTION 'transfer_123';

-- All participants prepare...

-- Phase 2: Commit once every participant has prepared
COMMIT PREPARED 'transfer_123';

-- Or roll back if any participant fails
ROLLBACK PREPARED 'transfer_123';
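
The coordinator's decision logic can be sketched in Python. This is a toy in-memory model (the class and method names are invented here); a real implementation must also durably log decisions to survive crashes:

```python
class Participant:
    """Toy 2PC participant: votes in phase 1, obeys the decision in phase 2."""

    def __init__(self, can_prepare=True):
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare, collecting votes
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:  # Phase 2: unanimous yes -> commit everywhere
            p.commit()
        return "committed"
    for p in participants:      # any "no" vote -> roll back everywhere
        p.rollback()
    return "rolled back"

ok = two_phase_commit([Participant(), Participant()])          # "committed"
failed = two_phase_commit([Participant(), Participant(False)]) # "rolled back"
```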



Troubleshooting Common Issues


Issue 1: Transaction Timeout

Symptom: Lock wait timeout exceeded

Solution:

-- Increase timeout
SET innodb_lock_wait_timeout = 120; -- MySQL

-- Or identify blocking transactions
SELECT * FROM information_schema.INNODB_TRX
WHERE trx_state = 'LOCK WAIT';


Issue 2: High Lock Contention

Symptom: Many transactions waiting for locks

Solution:

-- Identify hot spots
SELECT 
    object_name,
    COUNT(*) as lock_count
FROM performance_schema.data_locks
GROUP BY object_name
ORDER BY lock_count DESC;

-- Consider partitioning or sharding hot tables


Issue 3: MVCC Bloat (PostgreSQL)

Symptom: Database size growing despite deletions

Solution:

-- Check table bloat
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Run VACUUM FULL during a maintenance window
VACUUM FULL accounts;



Conclusion

Today we explored transactions and concurrency control in depth.


Key Takeaways:

Transactions follow ACID principles and require concurrency control for concurrent execution

MVCC excels in read performance but consumes more storage space

Locking has good write performance but risks deadlocks

PostgreSQL, MySQL (InnoDB), and Oracle use MVCC, while MySQL (MyISAM) and MSSQL (in its default configuration) rely on locking

Choose appropriate concurrency control based on your database and transaction characteristics


💡 Understanding the trade-offs between MVCC and Locking mechanisms is crucial for maintaining database performance and consistency!


Best Practices Summary:

  1. Keep transactions as short as possible
  2. Access resources in consistent order to prevent deadlocks
  3. Use appropriate isolation levels for your use case
  4. Implement retry logic for transient failures
  5. Monitor lock contention and transaction duration
  6. Regular maintenance (VACUUM, statistics updates)
  7. Partition large tables to reduce lock scope


Database performance and consistency depend on proper transaction management and concurrency control. Master these concepts to build reliable, scalable systems!


