MongoDB is a NoSQL database that uses JSON-like documents with dynamic schemas. When working with databases, it's always good to have a contingency plan in case one of your database servers fails. As a side note, you can reduce the chances of that happening by leveraging a nifty management tool for your WordPress site.

This is why it's useful to have multiple copies of your data: it reduces read latencies and can improve the database's scalability and availability. This is where replication comes in. It's defined as the practice of synchronizing data across multiple databases.

In this article, we'll dive into the salient aspects of MongoDB replication, including its features and mechanism.

What Is Replication in MongoDB?

In MongoDB, replica sets perform replication. This is a group of servers maintaining the same data set through replication. You can even use MongoDB replication as a part of load balancing. Here, you can distribute the write and read operations across all the instances, based on the use case.

What Is a MongoDB Replica Set?

Every instance of MongoDB that’s part of a given replica set is a member. Every replica set needs to have a primary member and at least one secondary member.

The primary member is the main access point for transactions with the replica set and the only member that can accept write operations. Replication first copies the primary's oplog (operations log), then repeats the logged changes on the secondaries' respective data sets. Hence, every replica set can only have one primary member at a time; multiple primaries accepting write operations would cause data conflicts.

Usually, the applications only query the primary member for write and read operations, though you can design your setup to read from one or more of the secondary members. Because data transfer is asynchronous, reads from secondary nodes can serve old data, so such an arrangement isn't ideal for every use case.
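To see the mechanism in action, here's a quick mongosh sketch that peeks at the most recent entry in the primary's oplog (the local.oplog.rs collection is internal to MongoDB, so treat this as read-only exploration):

use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()  // newest replicated operation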

Replica Set Features

The automatic failover mechanism sets MongoDB’s replica sets apart from its competition. In the absence of a primary, an automated election among the secondary nodes picks a new primary.

MongoDB Replica Set vs MongoDB Cluster

A MongoDB replica set will create various copies of the same data set across the replica set nodes. The primary aim of a replica set is to:

  • Offer a built-in backup solution
  • Increase data availability

A MongoDB cluster is a different ball game altogether. It distributes the data across many nodes through a shard key. This process fragments the data into many pieces called shards and distributes each shard to a different node. A cluster aims to support large data sets and high-throughput operations, achieving this by horizontally scaling the workload.

Here’s the difference between a replica set and a cluster, in layman’s terms:

  • A cluster distributes the workload and stores fragments of data (shards) across many servers.
  • A replica set duplicates the data set completely.

MongoDB allows you to combine these functionalities by making a sharded cluster. Here, you can replicate every shard to a secondary server. This allows a shard to offer high redundancy and data availability.
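As a rough sketch of what that looks like in practice, here's how you might enable sharding for a hypothetical database and collection from mongosh, assuming you're connected to a running sharded cluster (the database, collection, and shard key below are all illustrative):

sh.enableSharding("mydb")
sh.shardCollection("mydb.orders", { customerId: "hashed" })  // a hashed shard key spreads writes evenly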

Maintaining and setting up a replica set can be technically taxing and time-consuming. And finding the right hosting service? That’s a whole other headache. With so many options out there, it’s easy to waste hours researching, instead of building your business.

Let me give you a brief about a tool that does all of this and so much more so that you can go back to crushing it with your service/product.

With Kinsta's Application Hosting solution, which is trusted by over 55,000 developers, you can get up and running in just 3 simple steps. If that sounds too good to be true, here are some more benefits of using Kinsta:

  • Enjoy better performance with Kinsta’s internal connections: Forget your struggles with shared databases. Switch to dedicated databases with internal connections that have no query count or row count limits. Kinsta is faster, more secure, and won’t bill you for internal bandwidth/traffic.
  • A feature set tailored for developers: Scale your application on the robust platform that supports Gmail, YouTube, and Google Search. Rest assured, you’re in the safest hands here.
  • Enjoy unparalleled speeds with a data center of your choice: Pick the region that works best for you and your customers. With over 25 data centers to choose from, Kinsta’s 260+ PoPs ensure maximum speed and a global presence for your website.

Try out Kinsta’s application hosting solution for free today!

How Does Replication Work in MongoDB?

In MongoDB, you send write operations to the primary server (node). The secondary servers then replicate the logged operations from the primary, keeping the data in sync.

This is a flow chart of how replication works in MongoDB for three nodes (one primary, two secondaries):
MongoDB Replication Process Illustration (Image Source: MongoDB)

Three Types of MongoDB Nodes

Of the three types of MongoDB nodes, two have come up before: primary and secondary nodes. The third type of MongoDB node that comes in handy during replication is an arbiter. The arbiter node doesn’t have a copy of the data set and can’t become a primary. Having said that, the arbiter does take part in elections for the primary.

We've previously mentioned what happens when the primary node goes down, but what if the secondary nodes bite the dust? If so many secondaries fail that the primary can't reach a majority of the replica set, the primary steps down to become a secondary, and the set stops accepting writes.

Member Election

The elections can occur in the following scenarios:

  • Initializing a replica set
  • Loss of connectivity to the primary node (detected by heartbeats)
  • Maintenance of a replica set using the rs.reconfig() or rs.stepDown() methods
  • Adding a new node to an existing replica set

A replica set can possess up to 50 members, but only 7 or fewer can vote in any election.

The average time before a cluster elects a new primary shouldn't exceed 12 seconds. The election algorithm tries to elect the available secondary with the highest priority. Members with a priority value of 0 cannot become primaries, although they can still vote in elections.
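For instance, here's a minimal mongosh sketch of assigning a member priority 0 so it can never be elected primary (the member index is illustrative):

cfg = rs.conf()
cfg.members[2].priority = 0  // this member can still vote, but can never become primary
rs.reconfig(cfg)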

This is a diagram depicting a secondary node becoming a primary in MongoDB after an election:
Secondary node becoming a primary (Image Source: Medium)

The Write Concern

For durability, write operations have a framework for copying the data to a specified number of nodes and offering feedback to the client. This framework is known as the "write concern." It specifies how many data-bearing members need to acknowledge a write before the operation returns as successful. Generally, replica sets use a write concern value of 1, so only the primary has to acknowledge the write before returning the write concern acknowledgment.

You can increase the number of members needed to acknowledge the write operation. There's no ceiling on this number, but high values mean higher latency, because the client needs to wait for acknowledgment from all those members. You can also set the write concern to "majority," which waits for acknowledgment from more than half of the data-bearing members.
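For illustration, here's a minimal mongosh sketch of an insert with an explicit write concern (the collection and document are placeholders):

db.products.insertOne(
  { item: "envelope", qty: 100 },                      // example document
  { writeConcern: { w: "majority", wtimeout: 5000 } }  // wait for a majority, up to 5 seconds
)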

Read Preference

For read operations, you can specify a read preference that describes how the database directs queries to members of the replica set. Generally, the primary node receives the read operations, but the client can specify a read preference to send read operations to secondary nodes instead. Here are the options for the read preference (a short usage sketch follows the list):

  • primaryPreferred: Usually, the read operations come from the primary node but if this isn’t available the data is pulled from the secondary nodes.
  • primary: All the read operations come from the primary node.
  • secondary: All the read operations are executed by the secondary nodes.
  • nearest: Here, read requests are routed to the nearest reachable node, based on measured network latency. The results can come from any member of the replica set, irrespective of whether it's the primary or a secondary.
  • secondaryPreferred: Here, most of the read operations come from the secondary nodes, but if none of them is available, the data is taken from the primary node.
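Here's a brief mongosh sketch of applying a read preference to the current connection (the collection name is illustrative):

db.getMongo().setReadPref("secondaryPreferred")  // route subsequent reads to secondaries when possible
db.orders.find({ status: "shipped" })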

Replication Set Data Synchronization

To maintain up-to-date copies of the shared data set, secondary members of a replica set replicate or sync data from other members.

MongoDB leverages two forms of data synchronization: initial sync, which populates new members with the full data set, and replication, which applies ongoing changes to that data set.

Initial Sync

During the initial synchronization, a secondary node syncs all data from a source node that contains the latest data, such as the primary or another up-to-date secondary. Afterwards, the secondary node continuously leverages the tailable cursor feature to query the latest oplog entries within the source's local.oplog.rs collection and applies the operations contained in those entries.

From MongoDB 5.2, initial syncs can be file copy based or logical.

Logical Sync

When you execute a logical sync, MongoDB:

  1. Clones all databases except for the local database. mongod scans every collection in all the source databases and inserts all data into its own copies of these collections.
  2. Builds all collection indexes as the documents are copied for each collection.
  3. Pulls newly added oplog records during the data copy. Make sure that the target member has enough disk space within the local database to temporarily store these oplog records for the duration of this data copy stage.
  4. Applies all changes to the data set. By leveraging the oplog from the source, the mongod updates its data set to reflect the current state of the replica set.

When the initial sync is completed, the member transitions from STARTUP2 to SECONDARY.

File Copy-Based Initial Sync

Right off the bat, you can only execute this method if you use MongoDB Enterprise. It performs the initial sync by copying and moving files on the file system, which can be faster than a logical initial sync in some cases. Keep in mind that file copy-based initial sync might lead to inaccurate counts if you run the count() method without a query predicate.
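As a quick illustration (the collection name is assumed), an aggregation-based count avoids the metadata-based shortcut that can become inaccurate:

db.orders.count()             // metadata-based; may be inaccurate after a file copy-based initial sync
db.orders.countDocuments({})  // scans matching documents, so the count stays accurate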

But, this method has its fair share of limitations as well:

  • During a file copy-based initial sync, you cannot write to the local database of the member that is being synced. You also cannot run a backup on the member that is being synced to or the member that is being synced from.
  • When leveraging the encrypted storage engine, MongoDB uses the source key to encrypt the destination.
  • You can only run an initial sync from one given member at a time.

Replication

Secondary members replicate data continuously after the initial sync. They copy the oplog from their sync source and apply these operations in an asynchronous process.

Secondaries can automatically change their sync source as needed, based on changes in the ping time and the state of other members' replication.

Streaming Replication

From MongoDB 4.4, sync sources send a continuous stream of oplog entries to their syncing secondaries. Streaming replication reduces the replication lag in high-load and high-latency networks. It can also:

  • Diminish the risk of losing write operations with w:1 due to primary failover.
  • Decrease staleness for reads from secondaries.
  • Reduce the latency on write operations with w: "majority" and w greater than 1; in short, any write concern that requires waiting for replication.

Multithreaded Replication

MongoDB applies write operations in batches through multiple threads to improve concurrency. It groups the batches by document ID and applies each group of operations with a different thread.

MongoDB always applies write operations to a given document in their original write order. However, the way secondaries serve reads while applying these batches changed in MongoDB 4.0.

From MongoDB 4.0, read operations that target secondaries and are configured with a read concern level of "majority" or "local" read from a WiredTiger snapshot of the data if the read occurs on a secondary where replication batches are being applied. Reading from a snapshot guarantees a consistent view of the data and lets the read occur simultaneously with the ongoing replication, without needing a lock.

Therefore, secondary reads needing these read concern levels no longer need to wait for replication batches to be applied and can be handled as they are received.
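As a small mongosh sketch (the collection name is assumed), you can request a specific read concern on a query like so:

db.orders.find({ status: "shipped" }).readConcern("majority")  // read only majority-committed data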

How To Create a MongoDB Replica Set

As mentioned previously, MongoDB handles replication through replica sets. Over the next few sections, we’ll highlight a few methods that you can use to create replica sets for your use case.

Method 1: Creating a New MongoDB Replica Set on Ubuntu

Before we get started, you’ll need to ensure that you’ve got at least three servers running Ubuntu 20.04, with MongoDB installed on each server.

To set up a replica set, it's essential to provide an address where each replica set member can be reached by the others in the set. In this case, we'll keep three members in the set. While we could use IP addresses, it's not recommended, as the addresses could change unexpectedly. A better alternative is to use logical DNS hostnames when configuring replica sets.

We can do this by configuring a subdomain for each replication member. While that's ideal for a production environment, this section will outline how to configure DNS resolution by editing each server's respective hosts file. This file allows us to assign readable hostnames to numerical IP addresses. Thus, if your IP address ever changes, all you have to do is update the hosts files on the three servers rather than reconfigure the replica set from scratch!

The hosts file is usually stored in the /etc/ directory. Repeat the below command on each of your three servers:

sudo nano /etc/hosts

In the above command, we're using nano as our text editor; however, you can use any text editor that you prefer. After the first few lines, which configure the localhost, add an entry for each member of the replica set. These entries take the form of an IP address followed by a human-readable name of your choice. While you can name them whatever you'd like, be sure to be descriptive so you can differentiate between the members. For this tutorial, we'll be using the below hostnames:

  • mongo0.replset.member
  • mongo1.replset.member
  • mongo2.replset.member

Using these hostnames, your /etc/hosts files would look similar to the following highlighted lines:

A snapshot of the /etc/hosts file containing the hostnames along with the IP addresses (Hostnames Illustration).
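For reference, the added entries would look something like the following sketch. The IP addresses below are placeholders from the documentation range; substitute your servers' real addresses:

203.0.113.10 mongo0.replset.member
203.0.113.11 mongo1.replset.member
203.0.113.12 mongo2.replset.member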

Save and close the file.

After configuring the DNS resolution for the replica set, we need to update the firewall rules to allow them to communicate with each other. Run the following ufw command on mongo0 to provide mongo1 access to port 27017 on mongo0:

sudo ufw allow from mongo1_server_ip to any port 27017

In place of the mongo1_server_ip parameter, enter your mongo1 server’s actual IP address. Also, if you’ve updated the Mongo instance on this server to use a non-default port, be sure to change 27017 to reflect the port that your MongoDB instance is using.

Now add another firewall rule to give mongo2 access to the same port:

sudo ufw allow from mongo2_server_ip to any port 27017

In place of the mongo2_server_ip parameter, enter your mongo2 server’s actual IP address. Then, update the firewall rules for your other two servers. Run the following commands on the mongo1 server, making sure to change the IP addresses in place of the server_ip parameter to reflect those of mongo0 and mongo2, respectively:

sudo ufw allow from mongo0_server_ip to any port 27017
sudo ufw allow from mongo2_server_ip to any port 27017

Lastly, run these two commands on mongo2. Again, be sure that you enter the correct IP addresses for each server:

sudo ufw allow from mongo0_server_ip to any port 27017
sudo ufw allow from mongo1_server_ip to any port 27017

Your next step is to update each MongoDB instance’s configuration file to allow external connections. To allow this, you need to modify the config file in each server to reflect the IP address and indicate the replica set. While you can use any preferred text editor, we are using the nano text editor once again. Let’s make the following modifications in each mongod.conf file.

On mongo0:

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo0.replset.member

# replica set
replication:
  replSetName: "rs0"

On mongo1:

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo1.replset.member

# replica set
replication:
  replSetName: "rs0"

On mongo2:

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo2.replset.member

# replica set
replication:
  replSetName: "rs0"

After updating each configuration file, restart the mongod service on all three servers:

sudo systemctl restart mongod

With this, you’ve enabled replication for each server’s MongoDB instance.

You may now initialize the replica set by using the rs.initiate() method. You only need to run it on a single MongoDB instance in the replica set. Make sure that the replica set name and members match the configurations you made in each config file previously.

rs.initiate(
  {
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo0.replset.member" },
      { _id: 1, host: "mongo1.replset.member" },
      { _id: 2, host: "mongo2.replset.member" }
    ]
  }
)

If the method returns “ok”: 1 in the output, it means that the replica set was started correctly. Below is an example of what the output should look like:

 "ok": 1,
  "$clusterTime": {
    "clusterTime": Timestamp(1612389071, 1),
    "signature": {
      "hash": BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId": NumberLong(0)
    }
  },
  "operationTime": Timestamp(1612389071, 1)
}

Shut Down MongoDB Server

You can shut down a MongoDB server by using the db.shutdownServer() method. Below is the syntax for it; both force and timeoutSecs are optional parameters.

db.shutdownServer({
  force: <boolean>,
  timeoutSecs: <int>
})

This method may fail if the mongod replica set member is running certain operations, such as index builds. To interrupt the operations and force the member to shut down, set the boolean force parameter to true.
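For instance, a forced shutdown from mongosh might look like this sketch (run it against the admin database):

use admin
db.shutdownServer({ force: true, timeoutSecs: 10 })  // interrupt running operations and stop within 10 seconds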

Restart MongoDB With --replSet

To reset the configuration, make sure that every node in your replica set is stopped. Then delete the local database for every node. Start each node again using the --replSet flag and run rs.initiate() on only one mongod instance for the replica set.

mongod --replSet "rs0"

rs.initiate() can take an optional replica set configuration document, namely:

  • The replication.replSetName setting or the --replSet option to specify the replica set name in the _id field.
  • The members array, which contains one document for each replica set member.

The rs.initiate() method triggers an election and elects one of the members to be the primary.

Add Members to Replica Set

To add members to the set, start mongod instances on various machines. Next, start a mongo client and use the rs.add() command.

The rs.add() command has the following basic syntax:

rs.add("HOST_NAME:PORT")

For example,

Assume mongo1 is your mongod instance, and it’s listening on port 27017. Use the Mongo client command rs.add() to add this instance to the replica set.

rs.add("mongo1:27017")

You can only add a mongod instance to the replica set when you're connected to the primary node. To verify that you're connected to the primary, use the command db.isMaster().
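A minimal check from mongosh might look like this (on newer deployments, db.hello() and its isWritablePrimary field serve the same purpose):

db.isMaster().ismaster  // returns true when the current connection points at the primary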

Remove Members

To remove a member, we can use the rs.remove() method.

To do so, first shut down the mongod instance you wish to remove by using the db.shutdownServer() method we discussed above.

Next, connect to the replica set’s current primary. To determine the current primary, use db.hello() while connected to any member of the replica set. Once you’ve determined the primary, run either of the following commands:

rs.remove("mongodb-node-04:27017")
rs.remove("mongodb-node-04")
A snapshot of the output after running the rs.remove() command, showing that the node was successfully removed from the replica set. (Image Source: BMC)

If the replica set needs to elect a new primary, MongoDB might briefly disconnect the shell. In this scenario, it'll automatically reconnect. It may also display a DBClientCursor::init call() failed error even though the command succeeds.

Method 2: Configuring a MongoDB Replica Set for Deployment and Testing

In general, you can set up replica sets for testing either with RBAC (role-based access control) enabled or disabled. In this method, we'll be setting up a replica set with access control disabled, for deployment in a testing environment.

First, create directories for all the instances that are a part of the replica set using the following command:

mkdir -p /srv/mongodb/replicaset0-0  /srv/mongodb/replicaset0-1 /srv/mongodb/replicaset0-2

This command will create directories for three MongoDB instances replicaset0-0, replicaset0-1, and replicaset0-2. Now, start the MongoDB instances for each of them using the following set of commands:

For Server 1:

mongod --replSet replicaset --port 27017 --bind_ip localhost,<hostname(s)|ip address(es)> --dbpath /srv/mongodb/replicaset0-0  --oplogSize 128

For Server 2:

mongod --replSet replicaset --port 27018 --bind_ip localhost,<hostname(s)|ip address(es)> --dbpath /srv/mongodb/replicaset0-1 --oplogSize 128

For Server 3:

mongod --replSet replicaset --port 27019 --bind_ip localhost,<hostname(s)|ip address(es)> --dbpath /srv/mongodb/replicaset0-2 --oplogSize 128

The --oplogSize parameter prevents the machine from getting overloaded during the test phase by reducing the amount of disk space each mongod instance consumes.

Now, connect to one of the instances using the Mongo shell by specifying its port number:

mongo --port 27017

We can use the rs.initiate() command to start the replication process. Define a configuration object, replacing the <hostname> parameter with your system's name:

rsconf = {
  _id: "replicaset0",
  members: [
    { _id: 0, host: "<hostname>:27017" },
    { _id: 1, host: "<hostname>:27018" },
    { _id: 2, host: "<hostname>:27019" }
  ]
}

You may now pass the configuration object as the parameter for the initiate command and use it as follows:

rs.initiate(rsconf)

And there you have it! You’ve successfully created a MongoDB replica set for development and testing purposes.

Method 3: Transforming a Standalone Instance to a MongoDB Replica Set

MongoDB allows its users to transform their standalone instances into replica sets. While standalone instances are mostly used for the testing and development phase, replica sets are part of the production environment.

To get started, let’s shut down our mongod instance using the following command:

db.adminCommand({ shutdown: 1 })

Restart your instance using the --replSet parameter to specify the replica set you're going to use:

mongod --port 27017 --dbpath /var/lib/mongodb --replSet replicaSet1 --bind_ip localhost,<hostname(s)|ip address(es)>

You must specify the name of your server along with the unique address in the command.

Connect the shell with your MongoDB instance and use the initiate command to start the replication process and successfully convert the instance to a replica set. You can perform all the basic operations like adding or removing an instance using the following commands:

rs.add("<host_name:port>")
rs.remove("host-name")

Additionally, you can check the status of your MongoDB replica set using the rs.status() and rs.conf() commands.

Method 4: MongoDB Atlas — A Simpler Alternative

Replication and sharding can work together to form something called a sharded cluster. While the setup and configuration are straightforward, they can be quite time-consuming, which makes MongoDB Atlas a better alternative to the methods mentioned before.

It automates your replica sets, making the process easy to implement. It can deploy globally sharded replica sets with a few clicks, enabling disaster recovery, easier management, data locality, and multi-region deployments.

In MongoDB Atlas, we need to create clusters, which can be either replica sets or sharded clusters. For a particular project, the total number of nodes across regions is limited to 40.

This excludes free or shared clusters and the Google Cloud regions communicating with each other. The total number of nodes between any two regions must meet this constraint. For example, consider a project in which:

  • Region A has 15 nodes.
  • Region B has 25 nodes
  • Region C has 10 nodes

We can only allocate 5 more nodes to Region C because:

  1. Region A + Region B = 15 + 25 = 40, which meets the constraint of 40 being the maximum number of nodes allowed.
  2. Region B + Region C = 25 + 10 + 5 (additional nodes allocated to C) = 40, which meets the constraint.
  3. Region A + Region C = 15 + 10 + 5 (additional nodes allocated to C) = 30, which meets the constraint.

If we allocated 10 more nodes to region C, making region C have 20 nodes, then Region B + Region C = 45 nodes. This would exceed the given constraint, so you may not be able to create a multi-region cluster.

When you create a cluster, Atlas creates a network container in the project for the cloud provider if it wasn’t there previously. To create a replica set cluster in MongoDB Atlas, run the following command in Atlas CLI:

atlas clusters create [name] [options]

Make sure that you give a descriptive cluster name, as it cannot be changed after the cluster is created. The argument can contain ASCII letters, numbers, and hyphens.

There are several options available for cluster creation in MongoDB based on your requirements. For example, if you want continuous cloud backup for your cluster, set --backup to true.
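As an illustrative sketch, a minimal create command could look like the following; the cluster name, provider, region, and tier here are assumptions, so check atlas clusters create --help for the full option list:

atlas clusters create myReplicaSet --provider AWS --region US_EAST_1 --tier M10 --backup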

Dealing With Replication Delay

Replication delay can be quite off-putting. It's the delay between an operation on the primary and the application of that operation from the oplog to the secondary. If your business deals with large data sets, some delay within a certain threshold is expected. However, external factors can sometimes increase the delay. To benefit from up-to-date replication, make sure that:

  1. You route your network traffic in a stable and sufficient bandwidth. Network latency plays a huge role in affecting your replication, and if the network is insufficient to cater to the needs of the replication process, there will be delays in replicating data throughout the replica set.
  2. You have a sufficient disk throughput. If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping up. Hence, the secondary nodes process the write queries slower than the primary node. This is a common issue in most multi-tenant systems, including virtualized instances and large-scale deployments.
  3. You request write acknowledgment (a write concern) at intervals to give the secondaries the opportunity to catch up with the primary, especially when you perform a bulk load operation or data ingestion that sends a large number of writes to the primary. Otherwise, particularly with unacknowledged write concerns, the secondaries won't be able to read the oplog fast enough to keep up with the changes.
  4. You identify the running background tasks. Certain tasks like cron jobs, server updates, and security check-ups might have unexpected effects on the network or disk usage, causing delays in the replication process.

If you’re unsure if there’s a replication lag in your application, fret not – the next section discusses troubleshooting strategies!

Troubleshooting MongoDB Replica Sets

You’ve successfully set up your replica sets, but you notice your data is inconsistent across servers. This is heavily alarming for large-scale businesses, however, with quick troubleshooting methods, you may find the cause or even correct the issue! Given below are some common strategies for troubleshooting replica set deployments that could come in handy:

Check Replica Status

We can check the current status of the replica set and the status of each member by running the following command in a mongosh session connected to the replica set's primary:

rs.status()

Check the Replication Lag

As discussed earlier, replication lag can be a serious problem, as it makes "lagged" members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent. We can check how far each secondary lags behind by using the following command:

rs.printSecondaryReplicationInfo()

This returns the syncedTo value, which is the time when the last oplog entry was written to each secondary. Here's an example to demonstrate:

source: m1.example.net:27017
    syncedTo: Mon Oct 10 2022 10:19:35 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary
source: m2.example.net:27017
    syncedTo: Mon Oct 10 2022 10:19:35 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary

A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the members[n].secondaryDelaySecs value.
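For context, a deliberately delayed member is configured with secondaryDelaySecs alongside priority 0 and hidden, as in this mongosh sketch (the member index and delay are illustrative):

cfg = rs.conf()
cfg.members[1].priority = 0               // a delayed member must never become primary
cfg.members[1].hidden = true              // hide it from application reads
cfg.members[1].secondaryDelaySecs = 3600  // stay one hour behind the primary
rs.reconfig(cfg)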

Test Connections Between All Members

Each member of a replica set must be able to connect with every other member. Always make sure to verify the connections in both directions. Often, firewall configurations or network topologies prevent this normal and required connectivity, which can block replication.

For example, let's assume that the mongod instance binds to both localhost and the hostname ExampleHostname, which is associated with the IP address 198.41.110.1:

mongod --bind_ip localhost,ExampleHostname

To connect to this instance, remote clients must specify the hostname or the IP Address:

mongosh --host ExampleHostname
mongosh --host 198.41.110.1

If a replica set consists of three members, m1, m2, and m3, using the default port 27017, you should test the connection as below:

On m1:

mongosh --host m2 --port 27017
mongosh --host m3 --port 27017

On m2:

mongosh --host m1 --port 27017
mongosh --host m3 --port 27017

On m3:

mongosh --host m1 --port 27017
mongosh --host m2 --port 27017

If any connection in any direction fails, you’d have to check your firewall configuration and reconfigure it to allow the connections.

Ensuring Secure Communications With Keyfile Authentication

By default, keyfile authentication in MongoDB relies on the Salted Challenge Response Authentication Mechanism (SCRAM). To use it, MongoDB must read and validate the user's provided credentials, which include a combination of the username, password, and authentication database that the specific MongoDB instance is aware of. This is the exact mechanism used to authenticate users who supply a password when connecting to the database.

When you enable authentication in MongoDB, role-based access control (RBAC) is automatically enabled for the replica set, and each user is granted one or more roles that determine their access to database resources. With RBAC enabled, only valid, authenticated Mongo users with the appropriate privileges can access the resources on the system.

The keyfile acts like a shared password for each member in the cluster. This enables each mongod instance in the replica set to use the contents of the keyfile as the shared password for authenticating other members in the deployment.

Only those mongod instances with the correct keyfile can join the replica set. A key’s length must be between 6 and 1024 characters and may only contain characters in the base64 set. Please note that MongoDB strips the whitespace characters when reading keys.

You can generate a keyfile using various methods. In this tutorial, we use openssl to generate a complex 1024-character random string to use as a shared password, then use chmod to change the file permissions so that only the file owner can read it. Avoid storing the keyfile on storage media that can be easily disconnected from the hardware hosting the mongod instances, such as a USB drive or a network-attached storage device. Below are the commands to generate a keyfile:

openssl rand -base64 756 > <path-to-keyfile>
chmod 400 <path-to-keyfile>

Next, copy the keyfile to each replica set member. Make sure that the user running the mongod instances owns the file and can access the keyfile. After that, shut down all members of the replica set, starting with the secondaries. Once all the secondaries are offline, you may shut down the primary. Following this order prevents potential rollbacks. Shut down each mongod instance by running the following commands:

use admin
db.shutdownServer()

After the command is run, all members of the replica set will be offline. Now, restart each member of the replica set with access control enabled.

For each member of the replica set, start the mongod instance with either the security.keyFile configuration file setting or the --keyFile command-line option.

If you're using a configuration file, set:

  • security.keyFile to the keyfile's path, and
  • replication.replSetName to the replica set name.

security:
  keyFile: <path-to-keyfile>
replication:
  replSetName: <replicaSetName>
net:
  bindIp: localhost,<hostname(s)|ip address(es)>

Start the mongod instance using the configuration file:

mongod --config <path-to-config-file>

If you're using command-line options, start the mongod instance with:

  • --keyFile set to the keyfile's path, and
  • --replSet set to the replica set name.

mongod --keyFile <path-to-keyfile> --replSet <replicaSetName> --bind_ip localhost,<hostname(s)|ip address(es)>

You can include additional options as required for your configuration. For instance, if you want remote clients to connect to your deployment, or if your deployment members run on different hosts, specify the --bind_ip option. For more information, see Localhost Binding Compatibility Changes.

Next, connect to a member of the replica set over the localhost interface. You must run mongosh on the same physical machine as the mongod instance. This interface is only available when no users have been created for the deployment and automatically closes after the creation of the first user.

We then initiate the replica set. From mongosh, run the rs.initiate() method:

rs.initiate(
  {
    _id: "myReplSet",
    members: [
      { _id: 0, host: "mongo1:27017" },
      { _id: 1, host: "mongo2:27017" },
      { _id: 2, host: "mongo3:27017" }
    ]
  }
)

As discussed before, this method elects one of the members to be the primary member of the replica set. To locate the primary member, use rs.status(). Connect to the primary before continuing.

Now, create the user administrator. You can add a user using the db.createUser() method. Make sure the user has at least the userAdminAnyDatabase role on the admin database.

The following example creates the user ‘batman’ with the userAdminAnyDatabase role on the admin database:

admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "batman",
    pwd: passwordPrompt(), // or cleartext password
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)

Enter a password for the user when prompted.

Next, you must authenticate as the user administrator. To do so, use db.auth() to authenticate. For example:

db.getSiblingDB("admin").auth("batman", passwordPrompt()) // or cleartext password

Alternatively, you can connect a new mongosh instance to the primary replica set member using the -u <username>, -p <password>, and the --authenticationDatabase parameters.

mongosh -u "batman" -p  --authenticationDatabase "admin"

Even if you do not specify the password in the -p command-line field, mongosh prompts for the password.

Lastly, create the cluster administrator. The clusterAdmin role grants access to replication operations, such as configuring the replica set.

Let’s create a cluster administrator user and assign the clusterAdmin role in the admin database:

db.getSiblingDB("admin").createUser(
  {
    "user": "robin",
    "pwd": passwordPrompt(),     // or cleartext password
    roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
  }
)

Enter the password when prompted.

If you wish, you may create additional users to allow clients to interact with the replica set.

And voila! You have successfully enabled keyfile authentication!

Summary

Replication has been an essential requirement when it comes to databases, especially as more businesses scale up. It widely improves the performance, data security, and availability of the system. Speaking of performance, it's pivotal to monitor your WordPress database for performance issues and rectify them in the nick of time, for instance with tools like Kinsta APM, Jetpack, and Freshping.

Replication helps ensure data protection across multiple servers and prevents your servers from suffering heavy downtime (or even worse, losing your data entirely). In this article, we covered the importance of replication, the creation of a replica set, and some troubleshooting tips. Do you use MongoDB replication for your business, and has it proven useful to you? Let us know in the comment section below!

 
