Same Goal, Different Paths: Why Method Matters

August 28th, 2025 | MooseFS Team

In the world of system administration, you quickly learn that there’s rarely only one way to get something done. But what’s easy to forget is that two methods that look like they do the same thing can have wildly different performance, resource usage, and side effects.

We recently ran a set of stress tests in our MooseFS lab that made this point crystal clear. The goal was simple: delete a huge number of files – about 34 million per directory – but the way we did it changed the outcome dramatically.

The Test Setup

We built a test directory containing 34 million empty files split into two subdirectories. Using MooseFS’s snapshot mechanics, we replicated that directory into multiple copies so we could run tests in parallel.

We then tried two different deletion methods:

  1. The traditional POSIX way: rm -rf
  2. A custom Go script that removes files using Go’s built-in os.RemoveAll() function

Test 1: rm -rf

For this test, we mounted the filesystem on 8 different machines and deleted 8 separate copies of the directory – each containing 34 million files.

Results:

  • Master CPU: ~99.9% usage, but still responsive to other commands.
  • Network load: Around 15 Mbit/s in and 0.5 Gbit/s out at two brief peaks; negligible otherwise.
  • Completion time: Just over 4 hours to delete 272 million files.

rm -rf works by listing the directory once, then performing a lookup and unlink for each file. It’s CPU-heavy, but relatively network-efficient.

Test 2: The Go Script

We modified the provided Go script so it would accept a path and directly remove files. The key difference: it deletes files in batches of 1024, then closes the directory and re-lists it in full before deleting the next batch. This behavior comes from Go’s built-in os.RemoveAll() function.

Results:

  • Master CPU: ~55% usage
  • Network load: A steady 2.5 Gbit/s outbound
  • Estimated completion time: ~20 days (we stopped early)

Why so slow?
Because after each batch, the Go script:

  1. Closes the directory (releasedir)
  2. Opens it again (opendir)
  3. Re-reads millions of filenames (over the network!)

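This repeated readdir process is hugely expensive in network terms: every batch of 1024 deletions pays for a fresh listing of everything that is still left. As a result, the deletion as a whole becomes too slow to be practical for large directories.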

Why Virtualization Might Make It Worse

Our lab runs on bare metal, and even under load, the master node stayed responsive. In virtualized environments, especially if network performance is impacted by hypervisor overhead, these differences might be magnified. That could explain the extreme CPU usage and unresponsiveness some people see when using the Go script.

A Third Way: Snapshot Trick

MooseFS snapshots can offer a much faster way to delete massive directories – if you understand the trade-offs.

Snapshots in MooseFS are atomic lazy copies: only metadata is copied immediately, and data chunks are only duplicated when modified. If you set the snapshot flag on a directory and then remove it as a snapshot, the whole directory disappears almost instantly.

Example:

mfsseteattr -f snapshot -r my_directory_to_delete

mfsrmsnapshot my_directory_to_delete

On our test directory with 34 million files:

  • Setting the snapshot flag: 0.7 seconds
  • Removing the snapshot: 12 seconds (empty files)
  • Estimated with real data: ~24 seconds

This method does block the master briefly, but the pause is predictable and far shorter than an hours-long traditional deletion.

Takeaways

  1. The “same” operation can behave very differently depending on how it’s done.
    rm -rf and the Go script both “delete files,” but their performance, CPU load, and network footprint couldn’t be more different.
  2. Understand your filesystem’s mechanics.
    In MooseFS, readdir operations over huge directories are expensive, especially when repeated unnecessarily.
  3. Think about your environment.
    Virtualization can amplify inefficiencies – especially network-heavy ones.
  4. Sometimes unconventional is best.
    If you can handle a brief master-blocking event, the snapshot method can be a lifesaver.

Bottom line

Before you hit “Enter” on a seemingly routine command, remember: the how matters just as much as the what. The wrong method for the right job can turn a minutes-long task into a multi-day ordeal.

Looking Forward

After investigating this phenomenon, we made changes in MooseFS to improve how readdir works. It is now more resistant to the kind of repeated directory scans triggered by functions like Go’s os.RemoveAll(), and programs that rely on them run noticeably faster.

That said, it’s important to remember that in some filesystems, deleting a file only marks it as removed. In those cases, repeatedly re-reading large directories can still lead to serious performance issues.

This improvement to readdir will be available in the next release of MooseFS.