When Replication Isn’t Atomic: Lessons from MooseFS Chunk Migration

September 24th, 2025 | MooseFS Team


A real-world MooseFS migration story recently shed light on the dangers of certain configuration decisions in distributed file systems. What seemed like a replication bug turned out to be a reminder of why best practices matter, and why relying on single-copy storage is a recipe for trouble.

The Problem: Missing Chunks During Replication

During a migration of millions of chunks to a more reliable Chunkserver, a user noticed something worrying: after rebooting the destination server, several chunks went missing.

Key observations:

  • The replication goal was reduced from two copies to one before the migration was fully complete.
  • Three chunks were lost, appearing as “Invalid copies” in mfsfileinfo (see the inspection commands after this list).
  • The errors traced back to read problems on the ageing source HDDs.
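
For readers who want to run the same checks, the standard MooseFS client tools report a file’s replication state directly on a mounted filesystem; the path below is only an example:

    # Inspect the replication state of a file on a mounted MooseFS filesystem
    mfsgetgoal   /mnt/mfs/archive/vm-disk-004.img   # current goal (target number of copies)
    mfscheckfile /mnt/mfs/archive/vm-disk-004.img   # summary of how many chunks have how many copies
    mfsfileinfo  /mnt/mfs/archive/vm-disk-004.img   # per-chunk copy locations; damaged chunks show up as invalid copies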

At first glance, this looked like a flaw in MooseFS replication – as if redundant copies were deleted before the system ensured safe placement on the new server.

Why the Chunks Went Missing

In reality, replication in MooseFS is safe: a Chunkserver only reports success once an entire chunk is written and synced (HDD_FSYNC_BEFORE_CLOSE = 1 ensures data is flushed to disk). Only after that confirmation does the Master instruct deletion of the old copy.
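
For reference, the fsync-before-close behaviour is controlled by a single Chunkserver option; a minimal excerpt, assuming the default configuration path on the Chunkserver:

    # /etc/mfs/mfschunkserver.cfg (excerpt)
    # Report a chunk as written only after its data has been fsynced to disk
    HDD_FSYNC_BEFORE_CLOSE = 1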

So what went wrong?

The source copy itself was already deteriorating. With the replica goal reduced to one, MooseFS had no safety net. When it tried to replicate the bad copy, the corruption was discovered, leaving the file invalid.

Priorities in Chunk Management

MooseFS doesn’t try to evaluate which replica is “better” before deleting. Instead, it resolves chunk “issues” in a fixed priority order:

  1. Endangered (only one valid copy left)
  2. Undergoal (fewer copies than the goal)
  3. Overgoal (more copies than the goal)
  4. Wrong label (stored on the wrong class of server)

In this case, overgoal took precedence over migrating data to the correct label. That meant the system dropped extra copies before ensuring new ones were in place.
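
Put together, the chain of events probably looked something like the sketch below. It is a simplified reconstruction based on the observations above; the path is illustrative, and the goal may have been lowered with mfssetgoal or with a storage-class change:

    # 1. The goal is lowered from 2 to 1, e.g.:
    mfssetgoal -r 1 /mnt/mfs/archive      # existing chunks now have more copies than the goal ("overgoal")
    # 2. The Master resolves "overgoal" first and deletes the extra copies.
    # 3. The "wrong label" step then tries to replicate the single remaining copy
    #    to the new Chunkserver.
    # 4. Read errors on the ageing source HDD surface during that replication,
    #    and the only remaining copy turns out to be invalid.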

This priority order wasn’t chosen at random. Historically, “wrong label” had higher priority, but that caused disk space problems: deletions were blocked or delayed while MooseFS tried to replicate first. Changing the order solved capacity issues for most users, but it also means that reducing a dataset to a single replica is fragile – especially if the only remaining copy is on unreliable storage.

The Real Issue

The missing chunks were the result of:

  • Reducing replica count to one copy for data that wasn’t disposable.
  • Migrating from unreliable source storage where corruption had already set in.
  • Expecting MooseFS to make “smart” decisions about which replica to keep, even though the system doesn’t evaluate replica quality.

In other words, this was a configuration problem, not a software flaw.

Best Practices to Avoid Data Loss

  1. Always keep at least two replicas for any data you care about. One copy is never safe (see the example commands after this list).
  2. Treat single-copy goals as disposable only – use them for caches or temporary calculations, never for production data.
  3. Monitor disk health – don’t wait for replication to reveal corruption.
  4. Understand MooseFS priorities – deleting overgoal copies takes precedence over wrong-label replication, so plan migrations and replica reductions accordingly.
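
As a concrete starting point, the commands below raise a directory back to two copies, verify the result, and check the source disks before a migration; the path and device name are examples:

    # Keep at least two copies of anything important (set and verify recursively)
    mfssetgoal -r 2 /mnt/mfs/production
    mfsgetgoal -r /mnt/mfs/production

    # Check the health of the source disks before trusting them with the only copy
    smartctl -H /dev/sdb                                              # overall SMART health verdict
    smartctl -A /dev/sdb | grep -iE 'reallocated|pending|uncorrect'   # early-warning attributes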

Final Thought

Distributed file systems like MooseFS make trade-offs to balance efficiency, capacity, and safety. But those trade-offs assume sane configuration. Relying on a single copy of data – especially on unreliable hardware – is a gamble that MooseFS cannot protect you from.