High Availability, Erasure Coding, and Virtualization

September 5th, 2018 | Wojciech

This article describes a few typical MooseFS cluster configurations. Every edition of MooseFS keeps data safe by spreading it across many Chunkservers and storing it redundantly. The main difference between configurations is the level of cluster data availability. Below you can read more about High Availability, Erasure Coding, and Virtualization.

High Availability and No Single Point of Failure

Redundancy across many Chunkservers keeps data safe in every edition of MooseFS. However, there is also a higher level of cluster data availability, which allows uninterrupted access to data. High Availability means that data is not just safe but also continuously accessible to the storage clients.

To achieve an HA configuration, the user has to:

Install at least 3 Chunkservers (if not using Erasure Coding) or at least 10 Chunkservers (if using Erasure Coding)
Install the Leader Master Server and at least 1 Follower Master Server
Set a replication goal greater than 1 for all data in the cluster, or use EC with at least 1 parity sum (EC “@1”)
Install the HA-aware (client side) mfsmount daemon
Set up LACP or another means of network HA
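
The checklist above can be expressed as a small planning check. This is only a sketch: the ClusterPlan fields and the ha_problems function are hypothetical names invented for illustration, not part of any MooseFS tool, and the thresholds are taken directly from the list above.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (not a MooseFS structure)."""
    chunkservers: int
    follower_masters: int           # Follower Master Servers besides the Leader
    replication_goal: int = 0       # copies per chunk; 0 means goals are not used
    ec_parity: int = 0              # parity sums (EC "@n"); 0 means EC is not used
    ha_aware_mount: bool = False    # HA-aware mfsmount installed on clients
    network_redundant: bool = False # LACP or another means of network HA


def ha_problems(plan: ClusterPlan) -> list[str]:
    """Return the unmet HA requirements from the checklist (empty list = OK)."""
    problems = []
    uses_ec = plan.ec_parity >= 1
    min_chunkservers = 10 if uses_ec else 3
    if plan.chunkservers < min_chunkservers:
        problems.append(f"need at least {min_chunkservers} Chunkservers")
    if plan.follower_masters < 1:
        problems.append("need at least 1 Follower Master Server")
    if not uses_ec and plan.replication_goal <= 1:
        problems.append("need replication goal > 1 or EC with at least 1 parity sum")
    if not plan.ha_aware_mount:
        problems.append("need the HA-aware mfsmount daemon")
    if not plan.network_redundant:
        problems.append("need LACP or another means of network HA")
    return problems


plan = ClusterPlan(chunkservers=3, follower_masters=1, replication_goal=2,
                   ha_aware_mount=True, network_redundant=True)
print(ha_problems(plan))  # an empty list means every requirement is met
```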

Hardware

At least 3 Chunkservers are required because of the automated Leader Master Server election mechanism. The election process is designed to prevent a cluster split-brain scenario. Redundancy requires at least 2 Chunkservers, and the smallest odd number above that is 3, so 3 is the minimum number of Chunkservers.
Another reason for using at least 3 Chunkservers is to keep the replication goal at a safe level (at least 2) even if one Chunkserver fails. With 3 Chunkservers, when one of them goes down, data is still accessible and can be replicated to the second available Chunkserver. When only 1 Chunkserver is available, the MooseFS cluster waits for another one (in order to elect a new Leader Master Server) and cannot perform any operations: data may be safe but is inaccessible.
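
The role of the odd number can be illustrated with a strict-majority rule. This is an assumption made for illustration: the article does not spell out the exact election rule MooseFS uses, and has_quorum and tolerated_failures are hypothetical helper names. The sketch only shows why 1 reachable Chunkserver out of 3 must wait, and why an even split can never produce two leaders.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Strict majority: more than half of all voters must be reachable."""
    return reachable > total // 2


def tolerated_failures(total: int) -> int:
    """How many voters may fail while a strict majority still remains."""
    return (total - 1) // 2


# With 3 Chunkservers, 2 reachable ones can elect; a single one has to wait.
print(has_quorum(2, 3), has_quorum(1, 3))

# With 4 voters split 2/2, neither half has a majority, so there is no
# split-brain, but 4 voters tolerate no more failures than 3 do.
print(has_quorum(2, 4), tolerated_failures(3), tolerated_failures(4))
```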
One may use Erasure Coding with an HA configuration. This requires more Chunkservers, as each chunk is divided into many “parts”. Please refer to the Erasure Coding configuration described below.
Installing Metaloggers for HA configuration is not necessary as Follower Master Servers take care of metadata backups. A user may install more than one Follower Master Server to get a higher degree of cluster availability.

Client side

The HA-aware (client side) mfsmount daemon ensures constant file system access for client applications, even during automatic Leader Master Server election. It is important to note that with an HA configuration pending client-side I/O operations are not interrupted; they may only be paused for a short period of time (usually less than a few seconds).
Using external gateways (SMB, NFS, etc.) in front of the MooseFS HA cluster requires configuring HA for these gateways independently. However, this may not be possible due to protocol or implementation limitations. The MooseFS protocol supports HA for all client-cluster communication.

LACP should be used for network redundancy (as mentioned above); otherwise the network becomes a single point of failure.

HA configurations are available for MooseFS 3.x Pro and MooseFS 4.x versions. They are unavailable for MooseFS 1.x, 2.x and 3.x Community Edition.

Cluster with Erasure Coding

Erasure Coding is another way of protecting data in a cluster. Instead of keeping several copies of each file, which uses disk space inefficiently, each chunk of data is divided into parts. There are special, additional parts called “parity stripes” or “erasure codes”, which are calculated with a special algorithm from the original data. The extra parity stripes allow the cluster to recover missing parts of the original data when necessary.
One may define up to 9 parity stripes to be calculated for every 8 stripes of original data. It is an 8+n type of erasure coding algorithm. All stripes (both original and parity) are always of the same size.
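
To make the disk space argument concrete, the following sketch compares the raw space consumed per unit of user data under plain replication and under 8+n erasure coding. The function names are hypothetical; the only inputs taken from the text are the 8 data stripes, the limit of 9 parity stripes, and the fact that all stripes have the same size.

```python
DATA_STRIPES = 8        # stripes of original data per chunk (from the text)
MAX_PARITY_STRIPES = 9  # upper limit on parity stripes (from the text)


def ec_overhead(parity: int) -> float:
    """Raw bytes stored per byte of user data with 8+parity erasure coding."""
    if not 1 <= parity <= MAX_PARITY_STRIPES:
        raise ValueError("parity must be between 1 and 9")
    # All stripes have equal size, so the ratio is simply a stripe count ratio.
    return (DATA_STRIPES + parity) / DATA_STRIPES


def replication_overhead(goal: int) -> float:
    """Raw bytes stored per byte of user data when keeping `goal` full copies."""
    if goal < 1:
        raise ValueError("goal must be at least 1")
    return float(goal)


# Both settings below survive the loss of one copy or one part:
print(replication_overhead(2))  # two full copies
print(ec_overhead(1))           # 8 data stripes + 1 parity stripe
```

Under these assumptions, goal 2 stores every byte twice, while 8+1 erasure coding stores 9 stripes for every 8 stripes of data.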

Hardware configuration

The minimum number of Chunkservers for an EC configuration is 8+2n, where n is the desired number of parity stripes to be calculated by the cluster.
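
As a quick worked example of the 8+2n rule, the sketch below tabulates the minimum cluster size for a few values of n. The function name min_chunkservers is a hypothetical helper; the allowed range of n (1 to 9) comes from the description of the algorithm above.

```python
def min_chunkservers(parity: int) -> int:
    """Minimum Chunkservers for an EC configuration with `parity` stripes (8+2n)."""
    if not 1 <= parity <= 9:
        raise ValueError("parity must be between 1 and 9")
    return 8 + 2 * parity


for n in (1, 2, 3, 9):
    print(f"n={n}: at least {min_chunkservers(n)} Chunkservers")
```

For n = 1 this gives 10, which matches the minimum of 10 Chunkservers listed in the HA requirements for clusters using Erasure Coding.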
Keep in mind that in configurations where erasure coding is used, parity sums are always calculated by Chunkservers. Data written by cluster users is always kept in several copies at first, which is the ordinary, non-EC mechanism of keeping data redundant. Calculating erasure codes is a separate process, independent of client write operations. Once parity codes are calculated, keeping copies is no longer necessary and the obsolete copies are deleted.
Therefore, there may be an increased demand for CPU power during the calculation of parity codes (writing) and during data recovery (restoration of damaged data). MooseFS’s erasure code calculation algorithm is very efficient, with a throughput of up to 5 GiB/s per Chunkserver, thanks to the implementation of XOR operations on large blocks of memory.
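
The idea of recovering data from XOR-based parity can be shown with the simplest case of a single parity stripe. This is a generic illustration, not MooseFS’s actual algorithm: it handles only one parity stripe and one missing stripe, and configurations with more parity stripes need a more elaborate code that is not shown here. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("stripes must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(stripes: list[bytes]) -> bytes:
    """Single parity stripe: the XOR of all data stripes."""
    return reduce(xor_blocks, stripes)


def recover_missing(stripes: list, parity: bytes) -> bytes:
    """Rebuild the one missing stripe (marked as None) from the rest plus parity."""
    if sum(s is None for s in stripes) != 1:
        raise ValueError("exactly one stripe must be missing")
    present = [s for s in stripes if s is not None]
    # XOR-ing the parity with every surviving stripe cancels them out,
    # leaving only the contents of the missing stripe.
    return reduce(xor_blocks, present, parity)


data = [bytes([i] * 4) for i in range(1, 9)]  # 8 tiny data stripes
parity = xor_parity(data)

damaged = list(data)
damaged[3] = None                             # simulate a lost stripe
assert recover_missing(damaged, parity) == data[3]
```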

Virtualization

MooseFS cluster on virtual machines

It’s not recommended (although possible) to run the Leader Master Server as a virtual machine. Virtual machines are known for their periodic slowdowns and lags and they may slow down the entire storage cluster.
It’s also not recommended to run Chunkservers as virtual machines, because virtual machines usually don’t have direct access to physical disks, which slows down data transfer.
Additionally, virtualization adds an extra layer (host operating system) which slows down most I/O and memory operations and adds extra latency overhead on network operations.

MooseFS cluster in a container (LXC)

Although LXC containers are considered to be more efficient than virtual machines, the above arguments also apply to containers. It may be useful, however, to use containers or VMs for testing or evaluation. There are Docker setup scripts available.

You can check results of MooseFS performance tests on Docker here.

If you want to know more, download the MooseFS Hardware Guide.