How to Quickly Set Up a MooseFS Cluster in Google Cloud
Google Cloud Compute Engine and MooseFS work well together. This guide walks you through creating a master VM, configuring a firewall rule for the CGI monitor, and creating three chunkservers (two of them cloned from an instance template), all on Debian 9.
MooseFS Cluster in Google Cloud Compute Engine
We will use the GCP console to create a MooseFS cluster in Google Cloud with one master instance and three chunkserver instances. This guide uses Debian GNU/Linux 9, but you can use Ubuntu 16.04 instead.
Create Master Instance
For the master server, we will use an instance with 2 vCPUs, 7.5 GB RAM, and a 10 GB HDD with Debian 9 Stretch preinstalled. See the Master Server hardware requirements.
- First of all, go to Compute Engine / VM instances.
- Create an instance.
- Set instance name to mfsmaster and choose a region for your machine. Use the same region for all instances in this tutorial to minimize latency.
- Choose 2 vCPUs.
- Expand the Management, security, disks, networking, sole tenancy section and click on the networking tab.
- Add the network tag moosefs-cgi. This will allow us to add a firewall rule to use MooseFS CGI from the browser.
- Finally, click create.
- After a moment you should see a similar result in Compute Engine / VM instances.
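If you prefer the command line, the console steps above can be approximated with the gcloud CLI. This is a minimal sketch rather than part of the original walkthrough: the function name create_mfsmaster, the default zone us-central1-a, and the debian-9 image family are assumptions (the Debian 9 family may no longer be offered), and n1-standard-2 is picked because it matches 2 vCPUs and 7.5 GB RAM.

```shell
# Sketch: create the master VM from the command line instead of the console.
# The default zone and the image family are assumptions; adjust them for your project.
create_mfsmaster() {
  zone="${1:-us-central1-a}"
  gcloud compute instances create mfsmaster \
    --zone="$zone" \
    --machine-type=n1-standard-2 \
    --image-family=debian-9 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-standard \
    --tags=moosefs-cgi
}
```

Nothing runs until you call the function, for example create_mfsmaster europe-west1-b.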
Create Firewall Rule for MooseFS CGI
Now we will add a firewall rule to allow connecting to MooseFS CGI on mfsmaster:9425.
- First, go to VPC Network > Firewall rules tab.
- Then click Create Firewall Rule.
- Set name and tag to moosefs-cgi.
- Set the source IP range to your own address. This guide uses 0.0.0.0/0, which is not recommended because it gives everyone access to the CGI.
- Finally set protocol and port: tcp:9425 and create the rule.
- You should be able to see the rule in the list.
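The same rule could also be created with the gcloud CLI while restricting access to a single source range. The sketch below assumes the default VPC network; the function name and the example range 203.0.113.7/32 (a documentation address) are hypothetical placeholders.

```shell
# Sketch: open tcp:9425 for one source range only, instead of 0.0.0.0/0.
# Assumes the VM is attached to the "default" network.
create_cgi_firewall_rule() {
  source_range="${1:-}"
  if [ -z "$source_range" ]; then
    echo "usage: create_cgi_firewall_rule <CIDR>" >&2
    return 1
  fi
  gcloud compute firewall-rules create moosefs-cgi \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:9425 \
    --source-ranges="$source_range" \
    --target-tags=moosefs-cgi
}
```

For example, create_cgi_firewall_rule 203.0.113.7/32 would admit only that one address. The rule applies to instances carrying the moosefs-cgi network tag.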
Install MooseFS repository
First of all, connect to your machine using SSH.
Update your system:
sudo su
apt update
apt upgrade -y
Download and add repository key:
wget -O - https://ppa.moosefs.com/moosefs.key | apt-key add -
Add an appropriate repository entry in /etc/apt/sources.list.d/moosefs.list:
echo "deb http://ppa.moosefs.com/moosefs-3/apt/$(awk -F= '$1=="ID" { print $2 ;}' /etc/os-release)/$(lsb_release -sc) $(lsb_release -sc) main" > /etc/apt/sources.list.d/moosefs.list
And run:
apt update
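To check what the repository command wrote, it helps to know what the line should look like. The sketch below rebuilds the entry from a distribution ID and a codename; the function name moosefs_repo_line is hypothetical, and the values debian and stretch are what Debian 9 reports.

```shell
# Sketch: build the APT source line from a distribution ID and codename.
moosefs_repo_line() {
  distro_id="$1"   # the ID field of /etc/os-release, e.g. debian
  codename="$2"    # the output of `lsb_release -sc`, e.g. stretch
  echo "deb http://ppa.moosefs.com/moosefs-3/apt/${distro_id}/${codename} ${codename} main"
}

moosefs_repo_line debian stretch
# prints: deb http://ppa.moosefs.com/moosefs-3/apt/debian/stretch stretch main
```

Comparing this with the output of cat /etc/apt/sources.list.d/moosefs.list shows quickly whether the awk and lsb_release substitutions expanded as intended.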
Install MooseFS Master
Install the Master with CGI using the following command:
apt install moosefs-master moosefs-cgi moosefs-cgiserv moosefs-cli
To start the master server simply type:
mfsmaster
We don’t recommend automatically starting MooseFS Master, so you will need to run it manually on every restart.
To enable autostart of MooseFS CGI run these commands:
systemctl enable moosefs-cgiserv
systemctl start moosefs-cgiserv
Look at MooseFS CGI
If you open http://MASTER-EXTERNAL-IP:9425 in a browser (replace MASTER-EXTERNAL-IP with the external IP address of your MooseFS master machine), you should see the MooseFS CGI monitor.
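If the page does not load, it can help to find out whether the port is reachable at all. The following is a minimal sketch with a hypothetical helper named port_open; it relies on bash's /dev/tcp redirection and the coreutils timeout command, both assumed to be installed.

```shell
# Sketch: succeed if a TCP connection to host:port can be opened within 3 seconds.
# Requires bash (for /dev/tcp) and the coreutils `timeout` command.
port_open() {
  host="$1"
  port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (MASTER-EXTERNAL-IP is the same placeholder as in the text):
# port_open MASTER-EXTERNAL-IP 9425 && echo "CGI reachable" || echo "CGI not reachable"
```

If the check succeeds on the master itself (host 127.0.0.1) but fails from your workstation, the firewall rule or the network tag is the more likely culprit than moosefs-cgiserv.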
The grid with the goals in the Info tab will be empty at first; after you create some files, you will see the chunk status there. The default goal is 2, so each chunk of a file should be stored on two servers. If all chunks are in cell 2/2, all files are synced and the cluster is balanced. Later, the Servers tab will show the status of your three chunkservers and the available disk space.
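To get a feel for the numbers in that grid, the arithmetic can be sketched as follows. It assumes the 64 MiB maximum chunk size used by MooseFS and ignores per-chunk overhead; the function name chunk_estimate is hypothetical.

```shell
# Sketch: estimate chunk count and raw space for a file stored in MooseFS.
# Assumes a 64 MiB maximum chunk size and ignores per-chunk overhead.
chunk_estimate() {
  file_bytes="$1"
  goal="$2"
  chunk_bytes=$((64 * 1024 * 1024))
  chunks=$(( ($file_bytes + $chunk_bytes - 1) / $chunk_bytes ))  # round up
  echo "chunks=${chunks} copies=$(($chunks * $goal)) raw_bytes=$(($file_bytes * $goal))"
}

chunk_estimate $((1024 * 1024 * 1024)) 2
# prints: chunks=16 copies=32 raw_bytes=2147483648
```

So a 1 GiB file with goal 2 would show up as 16 chunks in cell 2/2 and occupy roughly 2 GiB across the chunkservers.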
Create Chunkserver Instance
In contrast to the master server, for the chunkserver machines we will use n1-standard-1 (1 vCPU, 3.75 GB memory) with 100 GB SSD disks.
- Go to Compute Engine / VM instances.
- Create an instance.
- Set instance name to chunkserver and choose the very same region as for the master server.
- Change boot disk to 100 GB SSD.
- Click create.
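A gcloud equivalent of these steps might look like the sketch below. The function name, the default zone, and the debian-9 image family are assumptions; pd-ssd is the disk type corresponding to an SSD persistent disk.

```shell
# Sketch: create the first chunkserver VM with a 100 GB SSD boot disk.
# Pass the same zone you used for the master; the default here is only a placeholder.
create_chunkserver() {
  zone="${1:-us-central1-a}"
  gcloud compute instances create chunkserver \
    --zone="$zone" \
    --machine-type=n1-standard-1 \
    --image-family=debian-9 \
    --image-project=debian-cloud \
    --boot-disk-size=100GB \
    --boot-disk-type=pd-ssd
}
```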
Install MooseFS repository
Connect to your machine using SSH. Update your system:
sudo su
apt update
apt upgrade -y
Download and add repository key:
wget -O - https://ppa.moosefs.com/moosefs.key | apt-key add -
Add an appropriate repository entry in /etc/apt/sources.list.d/moosefs.list:
echo "deb http://ppa.moosefs.com/moosefs-3/apt/$(awk -F= '$1=="ID" { print $2 ;}' /etc/os-release)/$(lsb_release -sc) $(lsb_release -sc) main" > /etc/apt/sources.list.d/moosefs.list
And run:
apt update
Install MooseFS Chunkserver
apt install moosefs-chunkserver
Configure MooseFS Chunkserver
To run chunkserver we will need to define disks for MooseFS. Chunkserver reads disk configuration from the /etc/mfs/mfshdd.cfg file. We will configure it to use all of the space on the disk, leaving 5 GB free:
mkdir -p /mnt/hd1
chown -R mfs:mfs /mnt/hd1
echo "/mnt/hd1 -5GiB" >> /etc/mfs/mfshdd.cfg
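Because >> appends every time, running the last command twice leaves a duplicate line in the file. A minimal sketch of a guard against that, using a hypothetical helper named add_mfs_disk:

```shell
# Sketch: add a disk entry to an mfshdd.cfg-style file only if it is missing.
add_mfs_disk() {
  entry="$1"                        # e.g. "/mnt/hd1 -5GiB"
  cfg="${2:-/etc/mfs/mfshdd.cfg}"   # config path; defaults to the file used above
  if grep -qxF -- "$entry" "$cfg" 2>/dev/null; then
    echo "already present: $entry"
  else
    echo "$entry" >> "$cfg"
    echo "added: $entry"
  fi
}
```

Calling add_mfs_disk "/mnt/hd1 -5GiB" as the superuser would then be safe to repeat.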
Autostart vs instance template
Warning!
If you want to create a chunkserver instance template, don't start the MooseFS Chunkserver process! Once started, it registers with the Master server and won't connect to any other master server. If you want to create a snapshot/image with an autostarting Chunkserver process, turn off the MooseFS Master instance, run the next two lines, and then create the image.
To enable autostart of the MooseFS Chunkserver, run these commands:
systemctl enable moosefs-chunkserver
systemctl start moosefs-chunkserver
Create Chunkserver Instance Template
Because we need several chunkserver instances, we will clone the one we just configured. To clone the instance together with its disk, we first create an image of the disk and then an instance template, from which the next 2 chunkserver instances can be created easily.
- To create a disk image, first stop the chunkserver instance in Compute Engine / VM instances. Start it again after the image has been created.
- Go to Compute Engine / Images.
- Create an image.
- Go to Compute Engine / Instance templates.
- Create an instance template.
- Change the boot disk and choose the image you just created.
- Name the template chunkserver-template and create it.
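The image and template steps above can be sketched with the gcloud CLI as follows. The image name chunkserver-image, the function name, and the default zone are assumptions, as is the boot disk being named chunkserver (the default when a disk is created together with an instance of that name).

```shell
# Sketch: stop the VM, image its boot disk, create the template, restart the VM.
# Each step runs only if the previous one succeeded.
create_chunkserver_template() {
  zone="${1:-us-central1-a}"
  gcloud compute instances stop chunkserver --zone="$zone" &&
  gcloud compute images create chunkserver-image \
    --source-disk=chunkserver \
    --source-disk-zone="$zone" &&
  gcloud compute instance-templates create chunkserver-template \
    --machine-type=n1-standard-1 \
    --image=chunkserver-image \
    --boot-disk-size=100GB \
    --boot-disk-type=pd-ssd &&
  gcloud compute instances start chunkserver --zone="$zone"
}
```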
MooseFS Chunkservers from the Template
We will create two chunkservers using the instance template.
- Create VM from chunkserver-template.
- Create chunkserver-1 instance. Choose the same zone as before for low latency.
- Create another instance chunkserver-2 in the same zone. As a result, you should be able to see 3 chunkservers and the master machine.
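Both instances could also be created in one gcloud call. This sketch assumes the template is named chunkserver-template as above; the function name and default zone are placeholders.

```shell
# Sketch: create chunkserver-1 and chunkserver-2 from the instance template.
# Pass the zone used for the master and the first chunkserver.
create_chunkservers_from_template() {
  zone="${1:-us-central1-a}"
  gcloud compute instances create chunkserver-1 chunkserver-2 \
    --zone="$zone" \
    --source-instance-template=chunkserver-template
}
```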
Connect to the MooseFS storage
To mount the storage, you will need a client machine. You can create a new VM instance and add the MooseFS repository as described above, or connect to one of the chunkservers. In both cases, install the MooseFS Client with the following command as the superuser:
apt install moosefs-client
You should now be able to mount MooseFS. Create the /mnt/moosefs directory and mount storage there:
mkdir /mnt/moosefs
mfsmount /mnt/moosefs/
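The two commands above fail or misbehave if the directory is already mounted. A slightly more defensive variant might look like this sketch; the function name mount_moosefs is hypothetical, and it passes the master host explicitly with -H, assuming the master is reachable under the name mfsmaster.

```shell
# Sketch: mount MooseFS only if the directory is not already a mount point.
# Assumes the master resolves as "mfsmaster" unless another host is given.
mount_moosefs() {
  mount_dir="${1:-/mnt/moosefs}"
  master="${2:-mfsmaster}"
  if mountpoint -q "$mount_dir"; then
    echo "$mount_dir is already mounted"
    return 0
  fi
  mkdir -p "$mount_dir"
  mfsmount "$mount_dir" -H "$master"
}
```

After a successful call, df -h /mnt/moosefs should report the combined space of the chunkservers.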
Create a file
Now we will create a file in MooseFS and check where the file is written. To create a 1 GB file use the dd command:
dd if=/dev/zero of=/mnt/moosefs/test-1-gigabyte-file bs=4k iflag=fullblock,count_bytes count=1G
ls -lh /mnt/moosefs
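ls only shows that the file exists. To see where its chunks actually landed, the MooseFS client tools mfsfileinfo and mfscheckfile can be used; the wrapper below is a sketch with a hypothetical function name, and the exact output format depends on your MooseFS version.

```shell
# Sketch: inspect chunk placement for a file on a mounted MooseFS volume.
show_chunk_placement() {
  file="${1:-/mnt/moosefs/test-1-gigabyte-file}"
  mfsfileinfo "$file"    # lists each chunk and the chunkservers holding its copies
  mfscheckfile "$file"   # summarizes how many chunks have how many copies
}
```

Running it again after a chunkserver goes offline should show some chunks with fewer copies than the goal.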
Shut down one of the chunkservers
You can shut down one of the chunkservers to see how MooseFS reacts. You should see that some of the chunks are now stored on only one server. These chunks are labeled as undergoal because they have fewer copies than the goal requires. The server will be temporarily flagged as in maintenance; you can remove it completely in the Servers tab to start the data replication process.
Summary
You just learned how to easily set up a MooseFS cluster in Google Cloud! Now you can check the best practices for maximum performance or how to train neural networks using TensorFlow and MooseFS.