Types of Database Replication

Last Updated : 11 Dec, 2024

Database replication is similar to making duplicates of important documents so that you still have a copy if something happens to the original. There are several ways to make these copies. In one approach, a single main copy (the master) receives all updates and passes them on to copies (the slaves). In another, several main copies (masters) all accept updates and share them with each other. In this article, we will look at the main types of database replication.



Let's understand the different types of database replication:

1. Master-Slave Replication

The process of copying and synchronizing data from a primary database (the master) to one or more secondary databases (the slaves) is known as master-slave replication.

In this configuration, all write operations, including inserts, updates, and deletes, must be sent to the master database.
After changes are made on the master, they are copied to the slave databases, which keep up-to-date copies of the data and typically serve read queries.

Example: Imagine a library with two branches:

Master branch: This is the main library with the original and constantly updated collection of books.
Slave branch: This is a smaller branch that receives copies of new books from the master branch at regular intervals. It does not acquire books on its own, so students can only borrow books that have already been delivered from the master branch.


How does Master-Slave Replication work?

Step 1: Write Operations: Whenever a write operation is carried out on the master database, the master records the change in its transaction log.
Step 2: Replication Process: A replication process or thread reads the master's transaction log and passes the changes on to the slave databases (depending on the system, the master pushes them or the slaves pull them).
Step 3: Network Communication: The changes are transmitted over the network from the master to the slave(s).
Step 4: Applying Changes: Upon receiving the changes, the slave database applies them to its own copy of the data. The slave may also have a replication process or thread that manages this process.
Step 5: Acknowledgment: The slave reports back to the master that the changes have been received (and, depending on the system, applied). In synchronous or semi-synchronous setups, the master waits for this acknowledgment before confirming the write to the client; in asynchronous setups, it does not wait.
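To make the steps above concrete, here is a minimal in-memory sketch. It is not how any particular database implements replication: the `Master`, `Slave` and `replicate` names are hypothetical, the transaction log is a plain list of `(key, value)` entries (with `None` meaning delete), and the network hop is just a function call. It shows the core idea that the slave replays the master's log in order, starting from the last position it applied, and that the returned position plays the role of the acknowledgment.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of log-based master-slave replication.


def apply_entry(data, key, value):
    """Apply one log entry to a key-value store; value None means delete."""
    if value is None:
        data.pop(key, None)
    else:
        data[key] = value


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # append-only transaction log

    def write(self, key, value):
        """Step 1: apply the write locally and record it in the log."""
        apply_entry(self.data, key, value)
        self.log.append((key, value))
        return len(self.log)  # log position after this write

    def changes_since(self, position):
        """Steps 2-3: the log entries a slave at `position` has not seen yet."""
        return self.log[position:]


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries this slave has applied

    def apply(self, entries):
        """Step 4: replay the master's changes in log order."""
        for key, value in entries:
            apply_entry(self.data, key, value)
            self.applied += 1
        return self.applied  # Step 5: position acknowledged to the master


def replicate(master, slave):
    """One replication round: ship unseen log entries and apply them."""
    return slave.apply(master.changes_since(slave.applied))


master, slave = Master(), Slave()
master.write("sku-1", 10)
master.write("sku-2", 5)
print(slave.data.get("sku-1"))  # None: not replicated yet (replication lag)
print(replicate(master, slave), slave.data)  # 2 {'sku-1': 10, 'sku-2': 5}
master.write("sku-1", None)  # delete on the master
replicate(master, slave)
print(slave.data)  # {'sku-2': 5}
```

Until `replicate` runs, the slave still returns the old state, which is the replication lag described under the challenges of this approach.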

Applications of Master-Slave Replication

E-commerce Websites: Using slave servers to handle read-heavy operations such as product listings, while the master server handles write operations like order processing.
Content Management Systems: Distributing read operations for viewing content across multiple slave servers, while the master server manages content updates and changes.
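Both applications rely on splitting reads from writes, which is typically done in the application or in a proxy placed in front of the databases. The sketch below is a hypothetical, deliberately simplified router: the server names are placeholders, and it classifies a statement only by its first SQL keyword, so it would, for example, send `SELECT ... FOR UPDATE` or reads inside a write transaction to a slave, cases a real router must handle.

```python
import itertools

# Hypothetical read/write-splitting router for a master-slave setup.


class ReplicationRouter:
    """Send writes to the master and spread reads across the slaves."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; with no slaves, everything goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master
        return next(self._slaves)


router = ReplicationRouter("master-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM products"))         # replica-1
print(router.route("SELECT * FROM products"))         # replica-2
print(router.route("INSERT INTO orders VALUES (1)"))  # master-db
```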

Benefits of Master-Slave Replication

High Availability: A slave database can be promoted to become the new master in the event of a master database failure, guaranteeing that the system will continue to function and be accessible.
Scalability: Shifting read operations to the slave databases reduces the load on the master, so the system can serve more users and more read traffic; additional slaves can be added as read demand grows.
Data Consistency: Every change made on the master is replicated to the slave databases, so all copies converge on the same data. With asynchronous replication this consistency is eventual rather than immediate (see Replication Lag below).
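Promoting a slave (the High Availability benefit above) raises a practical question: which slave? A common heuristic is to promote the one that has applied the most of the master's log, since it loses the fewest writes. The sketch below is hypothetical: `plan_failover` is an illustrative name, positions are plain integers standing in for a real log coordinate (such as a MySQL binary log position or a PostgreSQL WAL location), and it assumes the failed master's last position is known, which in practice may not be the case.

```python
# Hypothetical failover planner: pick the most up-to-date slave to promote.


def plan_failover(master_position, replica_positions):
    """Choose which slave to promote after the master fails.

    master_position: last log position the failed master had written.
    replica_positions: slave name -> last log position it applied.
    Returns (slave to promote, number of log entries that slave is missing).
    """
    if not replica_positions:
        raise ValueError("no slave available to promote")
    # Prefer the highest applied position; break ties by name so the choice is deterministic.
    new_master = min(replica_positions, key=lambda name: (-replica_positions[name], name))
    missing = master_position - replica_positions[new_master]
    return new_master, missing


print(plan_failover(120, {"replica-a": 118, "replica-b": 120, "replica-c": 97}))
# ('replica-b', 0)
print(plan_failover(120, {"replica-a": 118, "replica-c": 97}))
# ('replica-a', 2)
```

In the second call, two writes never reached any surviving slave. With asynchronous replication, such writes can be lost when a slave is promoted.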

Challenges of Master-Slave Replication

Replication Lag: There is a delay (replication lag) between a change being made on the master and that change reaching the slaves, so reads from a slave may return stale data.
Single Point of Failure: The master is a single point of failure for writes: if it breaks down, the slaves can keep serving reads, but no writes can be accepted until a slave is promoted to master.
Limited Write Scalability: Since write operations are limited to the master database, it can become a bottleneck for write-heavy applications.

2. Master-Master Replication

Master-master replication, sometimes referred to as bidirectional or multi-master replication, is a configuration where two or more databases are set up as masters, each of which can accept write operations. Any modification made to one master is replicated to all the other masters in the setup.

A write action on one master database replicates the update to all other master databases.
Conflict resolution procedures are required to guarantee data consistency in the event that conflicting writes take place on various master databases.

Example: Imagine two highly trained air traffic controllers managing air traffic in a busy airspace:

Each controller has a designated sector and full authority to direct planes within their zone.
They constantly communicate and share information to ensure flight paths don't conflict, maintaining overall airspace safety.
If one controller becomes unavailable, the other can seamlessly take over responsibility for both sectors, guaranteeing uninterrupted traffic flow.


How does Master-Master Replication work?

Step 1: Write Operations: When a write operation (such as an insert, update, or delete) is performed on one master node, that node records the change in its transaction log.
Step 2: Replication Process: The master node has a replication process or thread that reads the transaction log and sends the changes (or updates) to the other master nodes.
Step 3: Network Communication: The changes are transmitted over the network from one master node to the other master nodes. This communication can be synchronous or asynchronous, depending on the configuration.
Step 4: Applying Changes: Upon receiving the changes, each master node applies them to its own copy of the data. The nodes may also have replication processes or threads that manage this process.
Step 5: Conflict Resolution: When conflicting writes occur (i.e., the same data is modified on different master nodes before either change has been replicated to the other), conflict resolution mechanisms are needed to ensure data consistency.
Step 6: Acknowledgment: Once the changes are applied, each master node acknowledges them to the originating node. With synchronous replication, the originating node waits for these acknowledgments before confirming the write; with asynchronous replication, it does not.
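The flow above can be sketched with a few masters that exchange changes, assuming last-writer-wins (LWW) as the conflict-resolution rule. LWW is only one of several possible policies, and it silently discards the losing write. The `MasterNode` class and `sync` function are hypothetical, timestamps are passed in explicitly (real systems usually rely on clocks, which can drift), and acknowledgments (Step 6) are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical master-master model with last-writer-wins conflict resolution.


@dataclass
class MasterNode:
    name: str
    data: dict = field(default_factory=dict)    # key -> (timestamp, origin node, value)
    outbox: list = field(default_factory=list)  # local changes not yet sent to peers

    def write(self, key, value, ts):
        """Step 1: accept a local write and queue it for the other masters."""
        change = (key, ts, self.name, value)
        self.apply_change(change)
        self.outbox.append(change)

    def apply_change(self, change):
        """Steps 4-5: apply a change, resolving conflicts by last-writer-wins."""
        key, ts, origin, value = change
        current = self.data.get(key)
        # The higher (timestamp, origin) pair wins. Comparing the origin name breaks
        # timestamp ties, so every node makes the same choice in any arrival order.
        if current is None or (ts, origin) > current[:2]:
            self.data[key] = (ts, origin, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]


def sync(nodes):
    """Steps 2-3: deliver every queued change to every other master."""
    pending = [(node, change) for node in nodes for change in node.outbox]
    for node in nodes:
        node.outbox.clear()
    for sender, change in pending:
        for peer in nodes:
            if peer is not sender:
                peer.apply_change(change)


east, west = MasterNode("dc-east"), MasterNode("dc-west")
east.write("user:42:email", "old@example.com", ts=1)
west.write("user:42:email", "new@example.com", ts=2)  # conflicting write
sync([east, west])
print(east.read("user:42:email"), west.read("user:42:email"))
# new@example.com new@example.com
```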

Applications of Master-Master Replication

Multi-Datacenter Applications: Utilizing master-master replication for active-active configurations across different data centers, providing low-latency access to data.
Collaborative Editing Platforms: Allowing users to concurrently edit documents by syncing changes between multiple master servers.

Benefits of Master-Master Replication

Improved Write Scalability: Since write operations can be distributed among multiple master databases, the overall write performance of the system can be improved, especially in write-heavy applications.
High Availability: If one master database fails, the other master databases can continue to accept write operations, ensuring that the system remains available.

Challenges of Master-Master Replication

Complexity: Setting up and managing master-master replication can be complex, especially when dealing with issues such as conflict resolution, data consistency, and network configuration.
Conflict Resolution: Conflicts can arise if the same data is modified on different master nodes simultaneously. Implementing conflict resolution mechanisms can be challenging and may require manual intervention in some cases.

3. Snapshot Replication

Creating a copy of the whole database at a certain moment in time and then replicating that snapshot to one or more destination servers is known as snapshot replication. This is typically done for reporting, backup, or distributed database purposes.

Example: Imagine taking a photo of a messy room (database) at a specific time:

The snapshot captures the state of the room (database) at that exact moment.
You can use the snapshot to restore the room (database) to its previous state if needed.


How does Snapshot Replication work?

Step 1: Initial Snapshot: A full copy of the database is taken at the publisher (source database server). This snapshot includes all the replicated tables, data, and schema at a specific point in time.
Step 2: Distribution: The snapshot is stored at the distributor, which acts as a repository from which subscribers retrieve it.
Step 3: Applying the Snapshot: Each subscriber (destination server) loads the snapshot, replacing its existing copy of the replicated data.
Step 4: Periodic Refresh: Changes made at the publisher after the snapshot are not tracked individually. Instead, a new full snapshot is generated on a schedule and reapplied at the subscribers, so they reflect the publisher only as of the most recent snapshot.
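The sketch below illustrates the point-in-time nature of snapshot replication. It is a hypothetical, in-memory model (`Publisher`, `Distributor` and `Subscriber` are made-up names, and tables are nested dictionaries), assuming that each replication cycle ships a complete new snapshot which the subscriber loads in place of its old copy, as in the "Size of Data Transfer" row of the comparison table later in this article. `copy.deepcopy` stands in for the export, so later changes at the publisher cannot leak into a snapshot that has already been taken.

```python
import copy
from datetime import datetime, timezone

# Hypothetical in-memory model of snapshot replication.


class Publisher:
    def __init__(self):
        self.tables = {}  # table name -> {primary key: row}

    def take_snapshot(self):
        """Capture a full, independent copy of every table right now."""
        return {"taken_at": datetime.now(timezone.utc),
                "tables": copy.deepcopy(self.tables)}


class Distributor:
    """Staging area that holds the most recent snapshot."""

    def __init__(self):
        self.latest = None

    def store(self, snapshot):
        self.latest = snapshot


class Subscriber:
    def __init__(self):
        self.tables = {}
        self.as_of = None  # when the data this subscriber holds was captured

    def load(self, snapshot):
        """Replace the local copy wholesale; individual changes are not tracked."""
        self.tables = copy.deepcopy(snapshot["tables"])
        self.as_of = snapshot["taken_at"]


pub, dist, sub = Publisher(), Distributor(), Subscriber()
pub.tables["products"] = {1: {"name": "pen", "price": 2}}

dist.store(pub.take_snapshot())
sub.load(dist.latest)
pub.tables["products"][1]["price"] = 3     # change made after the snapshot
print(sub.tables["products"][1]["price"])  # 2: stale until the next cycle

dist.store(pub.take_snapshot())            # next scheduled cycle
sub.load(dist.latest)
print(sub.tables["products"][1]["price"])  # 3
```

Keeping older snapshots instead of overwriting `latest` is what gives the backup and auditing uses mentioned below: any stored snapshot can be loaded again to restore that point in time.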

Applications of Snapshot Replication

Data Warehousing: Creating regular snapshots of the production database for analysis and reporting without affecting the live database.
Auditing and Compliance: Maintaining snapshots of data for auditing purposes to ensure compliance with regulations.

Benefits of Snapshot Replication

Easy Implementation: Snapshot replication is relatively easy to set up and manage compared to other forms of replication.
Offline Access: Snapshots can be used to provide offline access to data for reporting or analysis purposes.
Data Protection: It can serve as a backup mechanism, providing a point-in-time copy of the database that can be restored if needed.

Challenges of Snapshot Replication

Data Consistency: Subscribers are only as current as the most recent snapshot, so in environments with frequent updates their data can quickly become stale.
Storage and Network Requirements: Each cycle copies the entire dataset, which can require significant storage capacity and network bandwidth for large databases.

4. Transactional Replication

Transactional replication is one way to keep several copies of a database synchronized in near real time. Changes made to a particular table (or group of tables) in one database, called the publisher, are copied shortly afterward to other databases, called subscribers. Because transactions are delivered in the order in which they were committed, the copies stay consistent across locations, although subscribers may briefly lag behind the publisher.

Example: Picture a live stock market with constantly changing prices:

Every price change (transaction) is broadcast immediately to all connected screens (replicas).
Everyone sees the same price updates in real-time.


How does Transactional Replication work?

Step 1: Publisher and Subscriber: You define a table or set of tables in the publisher database that you want to replicate. Each subscriber database receives updates for these specific tables.
Step 2: Changes are Tracked: The publisher continuously monitors the selected tables for any changes, such as inserts, updates, or deletes.
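Step 3: Transactions Captured: Changes are captured together with their original transaction boundaries and commit order, so related changes are applied together at the subscribers, preserving data integrity and consistency.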
Step 4: Distributor Sends Updates: A central server called the distributor receives the transactions from the publisher and prepares them for distribution to the subscribers.
Step 5: Subscribers Apply Updates: The subscribers receive the transactions from the distributor and apply them to their local copies of the tables, maintaining data consistency.
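The following sketch models the publisher, distributor and subscribers from the steps above in plain Python. The class names, the `(table, key, row)` change format and the in-memory log are assumptions made for illustration; real systems (for example SQL Server, whose publisher/distributor/subscriber terminology this section uses) read changes from the database's transaction log with dedicated agents. What the sketch does show is that only the published tables are replicated, transactions are kept whole, and subscribers receive them in commit order.

```python
from collections import deque

# Hypothetical in-memory model of transactional replication.


def apply_change(tables, table, key, row):
    """Apply one row change; row None means the row was deleted."""
    target = tables.setdefault(table, {})
    if row is None:
        target.pop(key, None)
    else:
        target[key] = row


class Publisher:
    def __init__(self, published_tables):
        self.published = set(published_tables)  # Step 1: tables chosen for replication
        self.tables = {}
        self.log = []  # committed transactions, in commit order

    def commit(self, changes):
        """Steps 2-3: apply a group of row changes and log them as one transaction."""
        for table, key, row in changes:
            apply_change(self.tables, table, key, row)
        self.log.append(list(changes))


class Distributor:
    def __init__(self):
        self.queue = deque()
        self.read_pos = 0  # how much of the publisher's log has been read

    def collect(self, publisher):
        """Step 4: pick up new transactions, keeping only changes to published tables."""
        for txn in publisher.log[self.read_pos:]:
            published = [c for c in txn if c[0] in publisher.published]
            if published:
                self.queue.append(published)
        self.read_pos = len(publisher.log)

    def deliver(self, subscribers):
        """Step 5: send each transaction, in commit order, to every subscriber."""
        while self.queue:
            txn = self.queue.popleft()
            for sub in subscribers:
                sub.apply(txn)


class Subscriber:
    def __init__(self):
        self.tables = {}

    def apply(self, txn):
        # Apply the whole transaction before moving on to the next one.
        for table, key, row in txn:
            apply_change(self.tables, table, key, row)


pub = Publisher(published_tables=["accounts"])
pub.commit([("accounts", "A", 100), ("accounts", "B", 50)])
# Transfer 30 from A to B, plus an audit row in an unpublished table.
pub.commit([("accounts", "A", 70), ("accounts", "B", 80),
            ("audit", 1, "A->B 30")])

dist, s1, s2 = Distributor(), Subscriber(), Subscriber()
dist.collect(pub)
dist.deliver([s1, s2])
print(s1.tables)  # {'accounts': {'A': 70, 'B': 80}}
```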

Applications of Transactional Replication

Financial Services: Ensuring near real-time replication of financial transactions across multiple databases for auditing and compliance.
Online Gaming: Synchronizing player actions and game state in real-time across game servers to maintain a consistent player experience.

Benefits of Transactional Replication

Real-time Updates: Data changes are reflected across all replicas within a short delay, providing high availability and keeping the copies consistent.
Disaster Recovery: Replicated copies serve as backups for disaster recovery in case of failures at the primary database.
Data Distribution: Enables geographically dispersed locations to read nearly up-to-date data locally instead of sending every query to the publisher.

Challenges of Transactional Replication

Configuration: Setting up and maintaining transactional replication requires technical expertise and careful configuration. Understanding replication agents, distributors, and subscriber configurations can be complex.
Overhead: Replicating transactions adds additional processing load to the publisher database, potentially impacting its performance.

5. Merge Replication

Merge replication is a database synchronization method allowing both the central server (publisher) and its connected devices (subscribers) to make changes to the data, resolving conflicts when necessary.

Two-way synchronization: Unlike transactional replication, where updates flow primarily from the publisher to subscribers, merge replication allows bidirectional data flow. This means both the central server and devices can modify the data, even when offline.
Conflict resolution: With multiple parties editing the same data, conflicts are bound to occur. Merge replication employs pre-defined rules or user interventions to resolve conflicting changes.


Example: Imagine a team working on a shared document (database) in Google Docs:

Team members can edit the document offline (locally) and their changes are saved temporarily.
When they connect online, their changes are merged with the main document, resolving any conflicts.

How does Merge Replication work?

Step 1: Publisher and Subscribers: Similar to other methods, you define tables in the publisher database for replication. Subscribers can also have read/write access to these tables.
Step 2: Changes are Tracked: Both the publisher and subscribers track changes made to the tables.
Step 3: Conflicts are Possible: Since both sides can modify data, conflicts can occur when different changes are made to the same data item.
Step 4: Synchronization and Conflict Resolution: When a subscriber connects to the network, it sends its changes to the publisher. The publisher merges these changes with its own and other subscribers' changes. If conflicts arise, pre-defined rules determine which change takes precedence.
Step 5: Updates Distributed: The resolved updates are then distributed back to all subscribers, ensuring everyone has the latest data.
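Here is a hypothetical, in-memory sketch of the synchronization cycle above for a field-service scenario. The names (`Node`, `Subscriber`, `resolve`, `synchronize`) are illustrative, timestamps are passed in explicitly rather than read from a clock, and deletes and conflict logging are left out. The conflict rule is an assumption chosen for illustration (the later timestamp wins, and ties go to the publisher); `resolve` is where a different pre-defined rule would be plugged in.

```python
# Hypothetical in-memory model of merge replication.


class Node:
    """A copy of the data: key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def edit(self, key, value, ts):
        self.data[key] = (ts, value)


class Subscriber(Node):
    def __init__(self):
        super().__init__()
        self.pending = {}  # Step 2: edits made since the last synchronization

    def edit(self, key, value, ts):
        """Works offline: the edit is applied locally and remembered for later."""
        super().edit(key, value, ts)
        self.pending[key] = (ts, value)


def resolve(publisher_entry, subscriber_entry):
    """Hypothetical rule: the later timestamp wins; ties go to the publisher."""
    if publisher_entry is None or subscriber_entry[0] > publisher_entry[0]:
        return subscriber_entry
    return publisher_entry


def synchronize(publisher, subscriber):
    """Steps 3-5: upload offline edits, resolve conflicts, download merged data."""
    for key, entry in subscriber.pending.items():
        publisher.data[key] = resolve(publisher.data.get(key), entry)
    subscriber.pending.clear()
    subscriber.data = dict(publisher.data)


publisher = Node()
publisher.edit("job:17:status", "open", ts=1)
tech_a, tech_b = Subscriber(), Subscriber()
synchronize(publisher, tech_a)  # initial download
synchronize(publisher, tech_b)

tech_a.edit("job:17:status", "completed", ts=5)  # both edit while offline
tech_b.edit("job:17:status", "in progress", ts=3)
synchronize(publisher, tech_b)
synchronize(publisher, tech_a)  # conflict: ts=5 beats ts=3
synchronize(publisher, tech_b)  # tech_b picks up the merged result
print(publisher.data["job:17:status"], tech_b.data["job:17:status"])
# (5, 'completed') (5, 'completed')
```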

Applications of Merge Replication

Field Service Applications: Giving field workers the option to operate offline and synchronize their updates with a central server whenever they are able to reconnect.
Healthcare Systems: Enabling medical professionals to access and update patient records offline, with changes syncing back to the central database when online.

Benefits of Merge Replication

Offline Updates: Devices can work with data even when disconnected, making updates later when reconnected.
Two-way Synchronization: Allows bidirectional data flow between publisher and subscribers, ideal for distributed environments.
Flexibility: Offers various conflict resolution options to suit different data handling needs.

Challenges of Merge Replication

Complexity: Managing conflict resolution, data synchronization, and troubleshooting requires significant technical expertise and can be error-prone.
Performance: Merging and resolving conflicts adds processing overhead to both publisher and subscribers, potentially impacting performance and network bandwidth.
Data consistency: Potential errors in conflict resolution or synchronization can lead to data inconsistencies across different copies, requiring careful measures to ensure data integrity.

Differences between Master-Slave Replication and Master-Master Replication

Aspect	Master-Slave Replication	Master-Master Replication
Data Flow	One-way: from master to slave	Bi-directional: between masters
Write Operations	Only master allows writes; slaves are read-only	Both masters allow writes
Read Operations	Slaves can handle read operations	Both masters can handle read operations
Data Consistency	Often asynchronous, so slaves may lag behind the master	Can be synchronous for immediate consistency, at the cost of higher write latency
Conflict Resolution	Not needed for writes, since only the master accepts them	More complex: conflicting writes may occur and must be resolved

Differences between Snapshot Replication and Transactional Replication

Aspect	Snapshot Replication	Transactional Replication
Data Capture	Takes a point-in-time snapshot of the entire database	Captures and replicates individual transactions in near real time
Frequency of Updates	Typically used for less frequent updates	Used for more frequent updates, providing near real-time replication
Size of Data Transfer	Transfers the entire dataset during each replication cycle	Transfers only the changes made since the last replication cycle, reducing data transfer
Consistency	Provides a consistent snapshot of the database at a specific point in time	Maintains near real-time consistency between the publisher and subscribers
Use Cases	Suitable for reporting, backup, or distributing read-only copies of the database	Used for scenarios where near real-time data synchronization is required
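The "Size of Data Transfer" row can be made concrete with a rough back-of-the-envelope calculation. The cost model and every number below are hypothetical: it ignores compression, metadata and per-transaction overhead, and assumes every change ships one full row.

```python
# Hypothetical cost model comparing bytes shipped per day by each approach.


def transfer_per_day(total_rows, changed_rows_per_hour, row_bytes, snapshots_per_day):
    """Return (snapshot bytes, transactional bytes) shipped per day."""
    snapshot = total_rows * row_bytes * snapshots_per_day       # whole table each cycle
    transactional = changed_rows_per_hour * 24 * row_bytes      # only the changes
    return snapshot, transactional


snap, txn = transfer_per_day(1_000_000, 5_000, 200, 4)
print(snap // 10**6, "MB vs", txn // 10**6, "MB")  # 800 MB vs 24 MB
```

The comparison can flip when most rows change between snapshots, or when the same rows are updated many times, because transactional replication ships every change while a snapshot ships only the final state.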

Conclusion

Database replication is a fundamental concept in system design that plays a crucial role in ensuring data availability, scalability, and fault tolerance. By understanding the types of replication described above and their trade-offs, system designers can choose the approach that best meets their application's requirements for data integrity, availability, and performance.


sanketsay9qs
