Blog · 2017-11-24 · MooseFS Team

MooseFS Performance Scores High on InfiniBand Network

IOzone benchmarks on a 56 Gbit/s IPoIB cluster – run jointly by Core Technology and ICM University of Warsaw – show MooseFS hitting over 18 GB/s sequential read throughput in a 32-thread distributed setup. The full results tables, optimal block-size and thread-count findings, and hardware configuration details are all here.


Core Technology, in cooperation with the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, has successfully tested MooseFS performance over an IPoIB configuration, measuring throughput in both single-client and distributed setups. The tests were run on MooseFS 4.0, but the same results are achievable with MooseFS 3.0.92 and later.

The tests show that the MooseFS distributed file system achieves very good performance over the IPoIB protocol. The best performance was reached with at least 4 threads and a block size of at least 64k. Block size is a very important aspect of TCP/IP network communication, especially for random operations.

The gathered data shows that increasing the number of threads does not always increase MooseFS client performance. For block sizes greater than 128k, sequential and random read/write performance does not increase any further. However, adding threads very quickly leads to maximum throughput for sequential read and write. Random read performance increases up to 12 threads for 2048k blocks and scales linearly for 16k blocks across the whole test range from 1 to 16 threads.

All results were achieved with an IPoIB configuration. Native IB throughput achieved in such a setup is unparalleled. All tests showed that storage based on MooseFS with an InfiniBand network provides exceptional performance. MooseFS network-defined storage is well suited to HPC environments, and its strength is most noticeable with parallel operations across many distributed MooseFS clients. This is very good news for all MooseFS users!

About ICM UW

ICM UW (Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw) is a leading data science facility in Central Europe. High-performance computers used for processing, analysis, visualization and advanced computing tasks are ICM’s specialty. ICM’s goal is to understand data and provide innovative solutions to organizations and institutions, taking advantage of its data science expertise.

For more information please visit: http://icm.edu.pl

The tests were conducted in single-client and distributed-client setups. The two sections below analyze each setup in detail.

Single Client Test

This section describes the single client test and its configuration. In a single client test, only one server in the whole MooseFS cluster is dedicated as a MooseFS client. The benchmark was executed inside the MooseFS client mount point. The benchmark tool was IOzone, version 3.465.

The MooseFS client tests were performed to show the effect of different block sizes and thread counts. In data transmission and data storage, a block, sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records, with a maximum length. In the IOzone benchmark, the number of threads is the number of parallel processes executed during a measurement. Each thread operates on one file. In the single client test, the maximum number of threads was 16, which means that up to 16 files were created in the MooseFS cluster.

To measure the performance differences between block sizes and thread counts reliably, the test was executed five times for each set of parameters. The maximum and minimum results were excluded from the average.
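
The averaging rule described above (five runs, drop the highest and lowest, average the rest) can be sketched in shell as follows. This is a minimal illustration, not the script used in the tests; the function name `trimmed_mean` and the sample readings are made up for the example.

```shell
#!/usr/bin/env bash
# Average a set of measurements after discarding one minimum and one maximum.
trimmed_mean() {
  # At least three values are needed so that something is left to average.
  if [ "$#" -lt 3 ]; then
    echo "trimmed_mean: need at least 3 values" >&2
    return 1
  fi
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      # Skip the first (smallest) and last (largest) sorted value.
      for (i = 2; i < NR; i++) sum += v[i]
      printf "%.1f\n", sum / (NR - 2)
    }'
}

# Hypothetical throughput readings in MB/s from five runs (not measured data).
trimmed_mean 3310 3390 3275 3402 3388
```

With the sample values, 3275 and 3402 are dropped and the remaining three readings are averaged.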

IOzone command used in tests:

$ iozone -eI -r {blocksize} -s1g -i0 -i1 -i2 -t {threads}

IOzone benchmark options:

  • -e – Include flush (fsync, flush) in the timing calculations.
  • -I – Direct I/O for all file operations. Tells the file system that all operations are to bypass the buffer cache and go directly to disk.
  • -r – Record/block size.
  • -s – File size 1 GB.
  • -i – 0 = write operations, 1 = read operations, 2 = random read and random write operations.
  • -t – Allows the user to specify how many threads or processes to be active during the measurement.
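
As a rough sketch of how the parameter sweep could be driven, the loop below prints one IOzone command per block size and thread count pair instead of running it. The value lists are taken from the results table in the appendix; the function name `print_sweep` is an assumption for illustration, and repeating each combination five times is left out for brevity.

```shell
#!/usr/bin/env bash
# Print (not run) one IOzone command per block size / thread count pair.
block_sizes="4k 8k 16k 32k 64k 128k 256k 512k 1024k 2048k"
thread_counts="1 2 4 6 8 10 12 14 16"

print_sweep() {
  local bs t
  for bs in $block_sizes; do
    for t in $thread_counts; do
      # Same options as the command shown above, with the placeholders filled in.
      echo "iozone -eI -r $bs -s1g -i0 -i1 -i2 -t $t"
    done
  done
}

print_sweep
```

Piping the output to a shell from inside the MooseFS mount point would execute the sweep; printing first makes it easy to review the 90 combinations.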

Topology

The single client test cluster consisted of two master servers (leader and follower), seven chunk servers and one client server. MooseFS client software was installed on only one physical server. All servers were connected through a Mellanox FDR switch with a port-to-port latency of 0.02 ms, as specified by the manufacturer. Each server used a Mellanox ConnectX-3 InfiniBand adapter with a maximum throughput of 56 Gbit/s. All connections were made with QSFP+ fiber optic cables.

Configuration

To eliminate the hard disk bottleneck, 100 GB RAM disks were created on each chunk server. Network transport used the IPoIB protocol. No kernel modifications or additional components were required. The MooseFS replication goal was set to 1. The measured average ping between the client server and the other servers in the cluster was 0.022 ms. The operating system was CentOS 7.3 with kernel 3.10.0-514.6.1.el7.x86_64.

Hardware configuration of all machines:

  • CPU – 2 × Intel Xeon CPU E5-2680 v3 2.5 GHz (12 cores, 24 threads)
  • RAM – 128 GB DDR4 2133 MHz
  • NIC – ConnectX-3 Mellanox MT27500 Family (56 Gbit/s)
  • Mellanox FDR switch

Results

This subsection presents plots of the test results for sequential and random read/write operations. Figures 2 and 3 show how performance changes with block size and with the number of processing threads. We chose 4 and 8 threads for the block size plot (Figure 2), and 16k and 2048k blocks for the threads plot (Figure 3). Figures 4 and 5 show performance during random access read/write operations. The last plot (Figure 6) shows sequential and random access read/write IOPS with 16k blocks and 1 to 16 threads.

Distributed Client Test

This section describes the distributed test and its configuration. In this test, all eight MooseFS servers worked as chunk server and client simultaneously. The IOzone benchmark was executed in cluster testing mode. Each MooseFS client handled 4 separate IOzone processes, and each IOzone process operated on four files. In total, the test had 32 threads distributed over eight servers. To measure the performance differences between block sizes reliably, the test was executed five times. The maximum and minimum results were excluded from the average.

IOzone command line:

$ iozone -ceIT -i0 -i1 -i2 -+n -r {blocksize} -s1g -+H moosefs -m1 -+m hosts.cfg -t32

IOzone benchmark options:

  • -c – Include close() in the timing calculations.
  • -e – Include flush (fsync, flush) in the timing calculations.
  • -I – Direct I/O for all file operations. Tells the file system that all operations are to bypass the buffer cache and go directly to disk.
  • -T – Use POSIX threads for throughput tests.
  • -i – 0 = write, 1 = read, 2 = random read and random write operations.
  • -+n – No retests selected.
  • -r – Record/block size.
  • -s1g – File size 1 GB.
  • -+H – Hostname of the PIT server.
  • -+m – hosts.cfg file contains the configuration information of the clients for cluster testing.
  • -t – Allows the user to specify how many threads or processes to have active during the measurement.
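
For illustration, a client list for the -+m option could be generated as sketched below. IOzone expects one line per client process with three space-separated fields: the client name, the working directory on that client, and the path to the iozone executable there. The host names (`node1` to `node8`), the mount point and the binary path are placeholders, not values from the test cluster, and `make_hosts_cfg` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Emit an IOzone client list: one line per process, three fields per line
# (client name, working directory, iozone path). All values are placeholders.
make_hosts_cfg() {
  local hosts_count=$1 procs_per_host=$2 h p
  for h in $(seq 1 "$hosts_count"); do
    for p in $(seq 1 "$procs_per_host"); do
      echo "node$h /mnt/mfs/iozone /usr/bin/iozone"
    done
  done
}

# Eight clients with four processes each gives the 32 entries matching -t32.
make_hosts_cfg 8 4
```

Redirecting the output to `hosts.cfg` produces a file in the form the command line above refers to.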

Distributed Client Test Topology

The distributed client test cluster consisted of two master servers and eight servers acting as both chunk servers and clients. All hardware components were the same as in the single client test. One additional chunk server was set up on the client machine from the previous test. All eight chunk servers used the MooseFS client to run the IOzone tests.

Distributed Test Results

The graph shows the throughput of read, write, random read and random write operations at different block sizes for the 32-thread distributed test. The X-axis shows the block size and the Y-axis shows the throughput in gigabytes per second.

Appendix

This section provides detailed results gathered during single and distributed IOzone benchmark tests.

Table 1: MooseFS Single Client IOzone Test Results

Block size  Threads  Seq. Read (MB/s)  Seq. Read (IOPS)  Seq. Write (MB/s)  Seq. Write (IOPS)  Rnd Read (MB/s)  Rnd Read (IOPS)  Rnd Write (MB/s)  Rnd Write (IOPS)
4k          1        213               54654             113                28803              25               6296             137               35151
4k          2        403               103114            205                52590              46               11879            242               61839
4k          4        396               101352            200                51247              90               23146            360               92234
4k          6        355               90860             176                44933              132              33697            353               90347
4k          8        366               93679             190                48594              178              45668            343               87759
4k          10       379               97018             207                52893              229              58661            294               75150
4k          12       408               104362            236                60301              278              71150            314               80390
4k          14       433               110837            256                65526              330              84432            328               83970
4k          16       429               109837            260                66547              378              96716            300               76921
8k          1        379               48528             224                28657              47               5960             224               28619
8k          2        695               88976             376                48101              89               11395            448               57324
8k          4        953               121935            386                49373              167              21408            663               84917
8k          6        688               88101             344                44030              246              31479            685               87694
8k          8        678               86799             361                46152              330              42183            627               80299
8k          10       683               87478             392                50125              426              54494            557               71307
8k          12       693               88661             449                57461              512              65597            567               72601
8k          14       727               93104             478                61248              601              76927            569               72781
8k          16       761               97395             489                62582              679              86893            547               70006
16k         1        565               36159             376                24085              85               5430             395               25302
16k         2        1059              67756             662                42382              162              10354            774               49535
16k         4        1718              109982            685                43850              299              19128            1116              71438
16k         6        1493              95564             643                41177              457              29273            1178              75389
16k         8        1196              76520             663                42414              626              40063            1158              74084
16k         10       1142              73102             723                46271              800              51219            955               61111
16k         12       1130              72338             815                52133              970              62097            949               60734
16k         14       1125              72021             845                54087              1122             71829            945               60450
16k         16       1147              73416             853                54591              1288             82424            904               57849
32k         1        806               25781             578                18499              148              4727             599               19166
32k         2        1384              44303             1107               35414              281              8998             1204              38540
32k         4        2400              76799             1279               40927              531              17000            1650              52794
32k         6        2594              82992             1127               36068              800              25588            1606              51403
32k         8        1936              61944             1163               37230              1095             35055            1839              58860
32k         10       1797              57509             1235               39517              1410             45129            1382              44226
32k         12       1713              54822             1352               43270              1706             54596            1452              46452
32k         14       1688              54031             1367               43747              1986             63548            1432              45812
32k         16       1707              54627             1380               44166              2252             72064            1423              45540
64k         1        943               15084             715                11446              229              3659             926               14821
64k         2        1563              25004             1412               22594              428              6848             1666              26658
64k         4        2691              43059             2184               34944              838              13408            2709              43345
64k         6        3009              48147             1771               28339              1267             20266            2694              43102
64k         8        3244              51909             1803               28846              1737             27798            2579              41265
64k         10       3347              53550             1810               28967              2187             34986            1956              31299
64k         12       2511              40173             1949               31185              2603             41654            1998              31973
64k         14       2557              40918             1970               31518              3017             48264            1981              31695
64k         16       2658              42525             1965               31444              3311             52970            1971              31543
128k        1        1026              8206              804                6432               331              2644             903               7221
128k        2        1779              14231             1613               12901              621              4969             1921              15365
128k        4        2810              22484             3036               24287              1262             10093            3549              28394
128k        6        3223              25784             2461               19691              1835             14682            3303              26423
128k        8        3324              26590             2439               19512              2469             19749            3155              25241
128k        10       3404              27230             2350               18798              3017             24137            2438              19505
128k        12       3386              27091             2472               19773              3556             28451            2515              20119
128k        14       3484              27874             2478               19822              3915             31317            2491              19925
128k        16       3303              26425             2492               19938              3843             30742            2487              19899
256k        1        1036              4143              823                3291               334              1338             906               3625
256k        2        1713              6852              1678               6710               629              2514             1959              7837
256k        4        2727              10909             3264               13057              1242             4968             3715              14860
256k        6        3032              12129             2691               10763              1861             7444             3091              12362
256k        8        3287              13150             2840               11361              2488             9953             3618              14473
256k        10       3378              13512             2567               10268              3059             12235            2629              10518
256k        12       3411              13645             2640               10560              3640             14560            2659              10637
256k        14       3354              13417             2655               10621              3942             15768            2635              10542
256k        16       3267              13069             2646               10584              3886             15544            2646              10585
512k        1        1058              2116              829                1659               334              669              960               1919
512k        2        1689              3377              1636               3272               622              1245             2054              4108
512k        4        2785              5571              3313               6626               1269             2539             3897              7794
512k        6        3177              6355              2844               5689               1841             3682             3478              6956
512k        8        3380              6760              2826               5652               2546             5091             4384              8769
512k        10       3406              6813              2661               5323               3078             6156             2733              5465
512k        12       3437              6874              2742               5483               3623             7245             2738              5477
512k        14       3424              6849              2729               5459               3969             7939             2733              5465
512k        16       3277              6554              2742               5484               3844             7688             2730              5461
1024k       1        1031              1031              841                841                335              335              969               969
1024k       2        1648              1648              1607               1607               628              628              2080              2080
1024k       4        2774              2774              3330               3330               1258             1258             3966              3966
1024k       6        3176              3176              3087               3087               1792             1792             3103              3103
1024k       8        3274              3274              2721               2721               2480             2480             3767              3767
1024k       10       3442              3442              2698               2698               3118             3118             2777              2777
1024k       12       3373              3373              2777               2777               3602             3602             2767              2767
1024k       14       3389              3389              2795               2795               3917             3917             2768              2768
1024k       16       3353              3353              2805               2805               3838             3838             2797              2797
2048k       1        1020              510               815                407                337              169              958               479
2048k       2        1714              857               1581               790                629              315              2090              1045
2048k       4        2734              1367              3283               1642               1255             627              4009              2005
2048k       6        3075              1538              3103               1551               1838             919              3298              1649
2048k       8        3248              1624              2686               1343               2507             1253             3837              1919
2048k       10       3381              1691              2777               1388               3054             1527             2822              1411
2048k       12       3388              1694              2819               1409               3602             1801             2826              1413
2048k       14       3397              1699              2864               1432               3869             1934             2823              1411
2048k       16       3380              1690              2824               1412               3899             1950             2803              1401

Table 2: MooseFS Distributed IOzone Test with 32 Threads

Block size  Threads  Seq. Read (MB/s)  Seq. Read (IOPS)  Seq. Write (MB/s)  Seq. Write (IOPS)  Rnd Read (MB/s)  Rnd Read (IOPS)  Rnd Write (MB/s)  Rnd Write (IOPS)
4k          32       5575              1427207           5008               1281999            734              187937           537               137516
8k          32       10570             1352946           7602               973014             1262             161583           939               120175
16k         32       15724             1006352           7947               508592             2309             147771           1586              101510
32k         32       17581             562595            7711               246761             4143             132588           2408              77062
64k         32       18623             297962            7853               125656             6805             108881           3585              57363
128k        32       18552             148417            7839               62712              10144            81151            4000              32001
256k        32       18590             74362             7833               31332              10218            40871            3872              15489
512k        32       18704             37409             7878               15757              10323            20646            3964              7928
1024k       32       18700             18700             7802               7802               10371            10371            3828              3828
2048k       32       18247             9123              7565               3783               10424            5212             4950              2475
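
The IOPS columns in both tables appear consistent with the throughput columns under the assumption that 1 MB = 1024 KB, that is, IOPS ≈ MB/s × 1024 / block size in KB. The sketch below applies that relation; `estimate_iops` is a hypothetical helper, and because the published MB/s values are rounded, the estimate can differ slightly from the reported IOPS (for example 297968 estimated against 297962 reported for 64k blocks in Table 2).

```shell
#!/usr/bin/env bash
# Estimate IOPS from throughput in MB/s and block size in KB,
# assuming 1 MB = 1024 KB. Integer arithmetic, rounded down.
estimate_iops() {
  local mbps=$1 block_kb=$2
  echo $(( mbps * 1024 / block_kb ))
}

estimate_iops 1031 1024   # single client, 1024k blocks, 1 thread
estimate_iops 18623 64    # distributed test, 64k blocks
```

The same relation explains why the MB/s and IOPS columns are identical in the 1024k rows.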
