We are pleased to announce that the MooseFS Team will be attending SC25, the International Conference for High Performance Computing, Networking, Storage, and Analysis, taking place in St. Louis, Missouri, on 16–21 November 2025. We invite you to visit us at Booth #6731 to see live demonstrations of MooseFS Pro, learn how MooseFS can simplify and scale your storage infrastructure, and talk directly with our team about your use cases and technical challenges.
SC25 is one of the world’s leading events for the HPC and data storage community, and we’re excited to connect with users, partners, and enthusiasts to share insights, exchange ideas, and showcase what’s next for MooseFS. Whether you’re already using MooseFS or exploring new storage solutions, we’d love to meet you in person and discuss how MooseFS can power your workloads.
Thank you for being a valued part of the MooseFS Community. We look forward to meeting you at upcoming events or connecting with you online.
We’re pleased to announce the release of MooseFS 4.58.2, a maintenance update that improves system reliability, optimizes Client performance, and refines internal consistency across the codebase.
This release is especially important for users operating Master instances that use large amounts of memory (around 500 GiB RAM or more). It introduces significant enhancements to the way MooseFS handles timeouts between Master Leader and Follower – improving resilience, stability, and upgrade safety.
Enhanced Timeout Management in the Master (Pro Edition)
In large deployments, metadata operations in the Master process can be memory-intensive. When the Master occupies several hundred gigabytes of RAM, forking a process (for saving metadata or replicating it to a Follower) can take longer than expected — sometimes exceeding the previous hardcoded 10-second timeout. This could lead to synchronization issues or metadata transfer failures between Master Leader and Follower nodes.
Version 4.58.2 addresses this by:
Unifying variable naming: MATOMA_TIMEOUT now replaces MASTER_TIMEOUT, consistent with other timeout variables.
Making the metadata download module fully respect configured timeout values.
Enabling the leader to automatically measure fork duration and instruct Followers to increase their timeouts dynamically if needed.
These changes ensure that synchronization between the Master Leader and Followers remains reliable, even under heavy memory loads or longer fork operations.
Important Recommendation for Large-Memory Deployments
We strongly recommend that all users running large-memory Masters (approximately 500 GiB of RAM or more) upgrade. Older versions pose a small but real risk: if a Follower becomes desynchronized for any reason, it may fail to resynchronize due to repeated timeouts. In that scenario, recovery could require an emergency upgrade rather than a planned one.
To avoid this, perform the upgrade following the standard procedure – starting with the Followers – and temporarily increase MATOMA_TIMEOUT on those Followers to a value above 10 seconds (30 seconds is very safe). After upgrading the Leader, you may reduce the timeout again if desired. Monitoring logs after the upgrade will help you confirm that fork times remain within safe limits.
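For illustration, a temporary override on a Follower might look like the snippet below. This is a sketch only – it assumes the Follower reads its settings from the standard /etc/mfs/mfsmaster.cfg location; adjust the path and value to your environment and reload the Follower afterwards.

```
# /etc/mfs/mfsmaster.cfg on a Follower (illustrative)
# Allow up to 30 seconds for Leader <-> Follower metadata operations
# during the upgrade; can be lowered again once the Leader runs 4.58.2.
MATOMA_TIMEOUT = 30
```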
Client-Side Optimization: Read-Ahead Cache Fix
On the client side, a subtle issue in the read-ahead algorithm was identified and fixed, thanks to a contribution from Chuandew. A small typo in a variable caused overlapping reads within the read-ahead cache, leading to redundant I/O and slightly inefficient behavior. The fix restores the intended logic, improving cache efficiency without changing external behaviour.
Minor Fixes and Community Contributions
This release also includes a series of minor corrections and typo fixes contributed by Community members – in particular, onlyjob and tianon.
We greatly appreciate the continued feedback and attention to detail from our users, which helps keep MooseFS stable, consistent, and user-friendly.
Why Upgrade?
MooseFS 4.58.2 strengthens the platform’s resilience under heavy workloads, refines internal timeout handling, and includes valuable community-driven improvements. This update is a safe and recommended upgrade for all users – especially those running large-scale deployments.
We’re excited to announce that MooseFS will be exhibiting at GITEX GLOBAL 2025, the world’s largest technology and startup event, taking place in Dubai, October 13–17.
This year’s GITEX brings together the brightest minds and most innovative companies shaping the digital future – and MooseFS is proud to be part of that conversation. MooseFS continues to empower organizations across industries with scalable, fault-tolerant, and high-performance storage solutions.
At our booth, visitors will have the chance to:
Explore how MooseFS Pro ensures data reliability and efficiency at scale.
Discover real-world use cases across enterprise, cloud, and research environments.
Meet our team and discuss how MooseFS can help solve modern data challenges.
We’re also thrilled to join the Polish national delegation, highlighting innovation and technology excellence from Poland.
Visit us at Stand H2-C22 at GITEX GLOBAL 2025 in the Dubai World Trade Centre. Let’s talk about the future of distributed storage – and how open-source innovation is shaping it.
We’re pleased to announce the release of MooseFS 4.58.1, a maintenance update that enhances stability and improves the usability of several tools. If you are currently running any of the 4.57.x or 4.58.x versions, we strongly recommend upgrading to benefit from these fixes.
Master Improvements
On the Master Server side, this release completes the fix for file lock handling in flocklocks.c. The earlier attempt in version 4.57.7 left some cases unresolved, but with this update the issue is now fully addressed. We have also refined the way archive flags are applied: files that are already in the trash will no longer have their archive flags set. This ensures consistent behaviour and avoids unnecessary chunk replication for data that is about to be deleted.
CLI Enhancements
The command-line interface (CLI) has also received attention. When all Chunkservers were disconnected, the CLI previously produced confusing error messages rather than showing the actual state. This problem has now been corrected, making monitoring more reliable. In addition, we made some internal improvements to the way mfscli is built. While these changes are technical in nature and not visible to end users, they help us keep the codebase cleaner and easier to maintain.
Tool Updates
In the tools area, mfssetfacl has been improved in two ways. First, its error reporting is now clearer and more informative. When an invalid ACL expression is provided, the tool highlights the exact point in the expression where parsing failed, making it easier for users to diagnose and correct mistakes. Second, numerical group identifiers are now properly supported again. Previously, only named groups were accepted, but with this fix administrators can use both names and numeric IDs as intended.
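As an illustration – assuming mfssetfacl follows the familiar setfacl-style -m syntax (please check mfssetfacl(1) in your installation for the exact options) – both a group name and a numeric GID should now be accepted:

```
# Illustrative only: option syntax assumed to mirror setfacl(1)
mfssetfacl -m g:developers:rwx /mnt/mfs/projects/shared
mfssetfacl -m g:1001:rwx /mnt/mfs/projects/shared
```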
Why Upgrade?
MooseFS 4.58.1 does not require any configuration changes, so upgrading is straightforward. This release focuses on polishing the system to make it more predictable, easier to manage, and less prone to errors in daily operations. We encourage all users to update and take advantage of these improvements.
A real-world MooseFS migration story recently shed light on the dangers of certain configuration decisions in distributed file systems. What seemed like a replication bug turned out to be a reminder of why best practices matter, and why relying on single-copy storage is a recipe for trouble.
The Problem: Missing Chunks During Replication
During a migration of millions of chunks to a more reliable Chunkserver, a user noticed something worrying: after rebooting the destination server, several chunks went missing.
Key observations:
The replication goal was reduced from two copies to one before the migration was fully complete.
Three chunks were lost, appearing as “Invalid copies” in mfsfileinfo.
The errors originated from read issues on the ageing source HDDs.
At first glance, this looked like a flaw in MooseFS replication – as if redundant copies were deleted before the system ensured safe placement on the new server.
Why the Chunks Went Missing
In reality, replication in MooseFS is safe: a Chunkserver only reports success once an entire chunk is written and synced (HDD_FSYNC_BEFORE_CLOSE = 1 ensures data is flushed to disk). Only after that confirmation does the Master instruct deletion of the old copy.
So what went wrong?
The source copy itself was already deteriorating. With the replica goal reduced to one, MooseFS had no safety net. When it tried to replicate the bad copy, the corruption was discovered, leaving the file invalid.
Priorities in Chunk Management
MooseFS doesn’t try to evaluate which replica is “better” before deleting. Instead, it resolves chunk “issues” by priority:
Endangered
Undergoal
Wrong label (stored on the wrong class of server)
Overgoal (too many copies)
In this case, overgoal took precedence over migrating data to the correct label. That meant the system dropped extra copies before ensuring new ones were in place.
This priority order wasn’t chosen at random. Historically, “wrong label” had higher priority, but that caused disk space problems: deletions were blocked or delayed while MooseFS tried to replicate first. Changing the order solved capacity issues for most users, but it also means that reducing a dataset to a single replica is fragile – especially if the only remaining copy is on unreliable storage.
The Real Issue
The missing chunks were the result of:
Reducing replica count to one copy for data that wasn’t disposable.
Migrating from unreliable source storage where corruption had already set in.
Expecting MooseFS to make “smart” decisions about which replica to keep, even though the system doesn’t evaluate replica quality.
In other words, this was a configuration problem, not a software flaw.
Best Practices to Avoid Data Loss
Always keep at least two replicas for any data you care about. One copy is never safe (see the example after this list).
Treat single-copy goals as disposable only – use them for caches or temporary calculations, never for production data.
Monitor disk health – don’t wait for replication to reveal corruption.
Understand MooseFS priorities – overgoal always comes first, so plan migrations and replica reductions accordingly.
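As a concrete illustration of the first rule, the number of requested copies can be checked and raised from any client mount. The path below is a placeholder, and on clusters that use named storage classes the equivalent mfsgetsclass/mfssetsclass tools apply instead:

```
# Check how many copies are currently requested (recursively)
mfsgetgoal -r /mnt/mfs/important-data

# Request at least two copies for everything under the directory
mfssetgoal -r 2 /mnt/mfs/important-data
```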
Final Thought
Distributed file systems like MooseFS make trade-offs to balance efficiency, capacity, and safety. But those trade-offs assume sane configuration. Relying on a single copy of data – especially on unreliable hardware – is a gamble that MooseFS cannot protect you from.
We’re excited to announce the release of MooseFS 4.58.0. This version introduces important fixes, optimizations, and usability improvements designed to make the system faster, more reliable, and easier to work with at scale.
Smarter Directory Handling
One of the areas of focus in this release is directory handling. Previous support for partial directory reads caused unnecessary strain when working with directories containing millions of files. In MooseFS 4.58.0, this functionality has been enhanced. The Master Server now supports efficient partial reads, while the client has been updated to read directories piece by piece. This means that even extremely large directories can be browsed without overwhelming the system. To complement this, directory caching has been completely redesigned, and a new mfsreaddirplusminto option was introduced to optimize commands like ls -al when working with massive file sets. Together, these changes make directory browsing significantly faster and more responsive.
Performance Optimizations
Another area of improvement is performance optimization. Extended attribute listings now benefit from a listxattr cache, which delivers smoother performance in scenarios such as Samba mounts. On the hardware compatibility side, a long-standing CRC calculation issue affecting CPUs with unaligned memory access restrictions – such as certain ARM architectures – has been fixed, allowing MooseFS to compile and run reliably on a broader range of systems. Chunkservers also gained an important efficiency update: unnecessary checksum reads before scrub decisions were eliminated, allowing disks with very few chunks to spin down properly when idle. This reduces both power consumption and wear on hardware.
Packaging, Tools, and Service Management
There are also several improvements aimed at packaging, tools, and service management. Debian users will appreciate that postinstall scripts have been corrected to avoid permission conflicts when reinstalling MooseFS on systems that already have the mfs user configured. The mfscli tool now correctly displays chart data in interactive mode, fixing a bug that previously limited visibility even though the JSON output was unaffected. For environments running under systemd, the Chunkserver service definitions now include proper start and stop timeouts. This ensures cleaner shutdowns, particularly on systems with many disks where writing out .chunkdb files can take longer.
Network Block Device Fix
Finally, this release addresses a community-reported issue with Network Block Device (NBD). Users had experienced problems restoring NBD devices after sleep or hibernate, but MooseFS 4.58.0 resolves this issue and restores expected behavior.
Conclusion
Altogether, MooseFS 4.58.0 brings meaningful improvements to scalability, performance, and day-to-day usability. Faster directory operations, smarter caching, extended hardware support, and smoother service management make this an upgrade well worth applying. We strongly encourage all users to update to this version and benefit from these enhancements.
As always, we want to thank the MooseFS Community for the valuable feedback, bug reports, and contributions that help us improve the system with every release.
You can find more details about this release – and previous releases – in our changelog.
In the world of system administration, you quickly learn that there’s rarely only one way to get something done. But what’s easy to forget is that two methods that look like they do the same thing can have wildly different performance, resource usage, and side effects.
We recently ran a set of stress tests in our MooseFS lab that made this point crystal clear. The goal was simple: delete a huge number of files – about 34 million per directory – but the way we did it changed the outcome dramatically.
The Test Setup
We built a test directory containing 34 million empty files split into two subdirectories. Using MooseFS’s snapshot mechanics, we replicated that directory into multiple copies so we could run tests in parallel.
We then tried two different deletion methods:
The traditional POSIX way: rm -rf
A custom Go script that removes files using Go’s built-in os.RemoveAll() function
Test 1: rm -rf
For this test, we mounted the filesystem on 8 different machines and deleted 8 separate copies of the directory – each containing 34 million files.
Results:
Master CPU: ~99.9% usage, but still responsive to other commands.
Network load: Around 15 Mbit/s in / 0.5 Gbit/s out during two brief peaks; negligible otherwise.
Completion time: Just over 4 hours to delete 272 million files.
rm -rf works by listing the directory once, then performing a lookup and unlink for each file. It’s CPU-heavy, but relatively network-efficient.
Test 2: The Go Script
We modified the provided Go script so it would accept a path and directly remove files. The key difference: it deletes files in batches of 1024, then re-lists the entire directory before deleting the next batch. This behavior comes from Go’s built-in os.RemoveAll() function.
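For reference, the calling side of such a script is only a few lines; here is a minimal sketch of how os.RemoveAll() is typically invoked (the batching and directory re-listing described above happen inside the Go standard library, not in the caller's code):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rmtree <path>")
		os.Exit(1)
	}
	// os.RemoveAll walks the tree itself: it reads directory entries in
	// batches, unlinks them, and re-reads the directory until it is empty.
	if err := os.RemoveAll(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "remove failed:", err)
		os.Exit(1)
	}
}
```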
Results:
Master CPU: ~55% usage
Network load: A steady 2.5 Gbit/s outbound
Estimated completion time: ~20 days (we stopped early)
Why so slow? Because after each batch, the Go script:
Closes the directory (releasedir)
Opens it again (opendir)
Re-reads millions of filenames (over the network!)
This repeated readdir process is hugely expensive in network terms, and it makes the overall deletion impractically slow for large directories.
Why Virtualization Might Make It Worse
Our lab runs on bare metal, and even under load, the master node stayed responsive. In virtualized environments, especially if network performance is impacted by hypervisor overhead, these differences might be magnified. That could explain the extreme CPU usage and unresponsiveness some people see when using the Go script.
A Third Way: Snapshot Trick
MooseFS snapshots can offer a much faster way to delete massive directories – if you understand the trade-offs.
Snapshots in MooseFS are atomic lazy copies: only metadata is copied immediately, and data chunks are only duplicated when modified. If you set the snapshot flag on a directory and then remove it as a snapshot, the whole directory disappears almost instantly.
This method does block the master briefly, but it’s predictable and far shorter than hours-long traditional deletion.
Takeaways
The “same” operation can behave very differently depending on how it’s done. rm -rf and the Go script both “delete files,” but their performance, CPU load, and network footprint couldn’t be more different.
Understand your filesystem’s mechanics. In MooseFS, readdir operations over huge directories are expensive, especially when repeated unnecessarily.
Think about your environment. Virtualization can amplify inefficiencies – especially network-heavy ones.
Sometimes unconventional is best. If you can handle a brief master-blocking event, the snapshot method can be a lifesaver.
Bottom line
Before you hit “Enter” on a seemingly routine command, remember: the how matters just as much as the what. The wrong method for the right job can turn a minutes-long task into a multi-day ordeal.
Looking Forward
After investigating this phenomenon, we made changes in MooseFS to improve how readdir works. It is now more resistant to the kind of repeated directory scans triggered by scripts like Go’s os.RemoveAll(), and the execution time of such scripts has improved noticeably.
That said, it’s important to remember that in some filesystems, deleting a file only marks it as removed. In those cases, repeatedly re-reading large directories can still lead to serious performance issues.
This improvement to readdir will be available in the next release of MooseFS.
When you choose MooseFS Pro, you’re not just getting a powerful, high-performance distributed storage system – you’re also gaining access to our dedicated technical support team. We know that every organization operates differently, so we offer two levels of technical support: Basic and Premium. Both plans are designed to keep your MooseFS Pro deployment running smoothly, but they differ in the depth, speed, and personalization of the service you receive.
Basic Support – Solid Assistance for Confident Teams
The Basic Support plan is ideal for teams that have their own technical expertise but want direct access to official MooseFS guidance and updates. During the purchased support period, you’ll have access to the MooseFS Pro packages repository for your cluster, ensuring you can always install the latest software updates. Our team will help you resolve installation issues, clarify any points in the documentation, and explain MooseFS Pro features and processes.
We also assist in interpreting warning or error messages and provide basic guidance for importing or converting data from older MooseFS versions, competing products, or other sources. If you encounter reproducible bugs, our engineers will diagnose them promptly. You can reach us by email during business hours in English, and your issues will be prioritized based on urgency. For more complex situations, we can arrange an online meeting – such as a Google Meet session – to walk you through the solution.
Premium Support – Priority Care and Expert Partnership
The Premium Support plan includes everything in Basic Support but goes much further for organizations that require faster responses, more direct communication, and deeper engagement from our experts. Premium customers enjoy higher prioritization in our service queue, ensuring their requests are addressed without delay.
For urgent cases, you can call our on-duty Support Engineer directly via a dedicated phone line, bypassing the usual channels. When issues are too complex to resolve via email or chat, we can provide remote assistance, working hands-on with you to implement solutions quickly. Additionally, Premium Support gives you access to a dedicated Technical Advisor – a MooseFS Pro expert who knows your environment and can offer tailored recommendations as well as proactive guidance to keep your system running optimally.
Choosing the Right Plan for Your Needs
Selecting the right support level depends on the criticality of your MooseFS Pro deployment. If your environment is important but not mission-critical, and your internal team can manage most daily tasks, Basic Support offers dependable assistance when you need it. However, if your business depends on rapid problem resolution, minimal downtime, and personalized technical advice, Premium Support delivers the peace of mind that comes with priority service and a dedicated expert on your side.
Get in Touch
Whether you are ready to subscribe to a support plan or still considering your options, we are here to help you make the best choice for your environment. To learn more or to sign up for MooseFS Pro Technical Support, please contact us at contact@moosefs.com.
If you’re running MooseFS in production, you know how important it is to have full visibility into its performance and health. Since the latest release, MooseFS offers native support for exporting system metrics in a format compatible with Prometheus, making it easy to build robust monitoring and alerting pipelines.
In this post, we’ll walk you through how to integrate MooseFS with Prometheus, configure your metric exports efficiently, and avoid overloading your monitoring infrastructure – all while getting the critical insights you need.
Why Prometheus?
Prometheus is a popular open-source toolkit for monitoring and alerting. It’s designed for reliability and real-time performance analysis – making it a perfect match for distributed systems like MooseFS.
The good news: everything visible in the MooseFS GUI is also available as Prometheus metrics, meaning you can plug into familiar visualizations and metrics tooling with ease. These metrics are also annotated with descriptions that appear as tooltips in the MooseFS GUI, Prometheus, and Grafana.
Exporting Metrics from MooseFS
MooseFS exposes two primary data sources for metrics:
Command-Line Data Sets – Retrieved via mfscli, and aligned with GUI data panels.
Chart Metrics – Real-time stats for Master and Chunk Servers, available through the MooseFS GUI.
By default, all data sets and chart metrics are exported – but this can lead to performance issues if you’re running a large cluster. Fortunately, you can filter what you collect.
Recommended CLI Scopes
Here are the most useful and permitted mfscli data sets:
SIM: Master states
SLI: License info
SIG: General master info
SMU: Master memory usage
SIC: Chunk health status
SIL: Loop stats
SCS: Chunkserver connections
SMB: Metadata backups
SHD: HDD stats
SSC: Storage classes
SMO: Operation counters
SQU: Quotas
Some scopes (e.g., SMF, SOF, SMS) are intentionally not allowed, due to the risk of exporting excessive data.
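These scope names correspond to mfscli switches of the same name, so the same data can be inspected manually. For example (assuming a standard mfscli installation; -j requests the JSON output that the exported metrics are derived from):

```
# Chunkserver connections (scope SCS) as JSON
mfscli -SCS -j

# HDD statistics (scope SHD) as JSON
mfscli -SHD -j
```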
Setting Up Prometheus to Scrape MooseFS Metrics
Prometheus scrapes MooseFS metrics from the same host and port as the GUI (default: 9425). Here’s a simple scrape config:
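The exact file layout depends on your Prometheus setup; a minimal prometheus.yml fragment could look like the sketch below, with the job name, target host, and scrape interval as placeholders to adapt:

```yaml
scrape_configs:
  - job_name: "moosefs"
    scrape_interval: 60s
    static_configs:
      - targets: ["mfsmaster.example.net:9425"]
```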
Keep in mind that MooseFS can expose hundreds or thousands of metrics, especially when exporting charts per Chunkserver. If you don’t need all this data in Prometheus, consider using:
The MooseFS GUI for occasional deep dives
Prefix whitelisting/blacklisting to fine-tune what’s collected
Reduced scrape intervals for lightweight metrics
How MooseFS Metrics Are Structured
All metric names start with the mfs_ prefix and mirror the nested structure of the MooseFS JSON output (via mfscli -j).
Examples:
mfs_disks_total
mfs_chunkservers_hdd_used
mfs_info_chunks_summary_regularchunks
Two meta-metrics are also always included:
mfs_cli_execution_time: how long it took to collect metrics
mfs_cgi_info: version and status info of the CGI endpoint
Wrapping Up
Integrating MooseFS with Prometheus gives you full visibility into your storage infrastructure. Whether you want high-level trends or deep-dive operational metrics, you can shape the integration to suit your performance and visibility needs.
We are pleased to announce the release of MooseFS 4.57.7, a version that introduces major architectural improvements with a focus on performance, observability, and maintainability. This release brings a significant evolution in how MooseFS is managed and monitored – particularly through a complete overhaul of the Web GUI and a new foundation for metrics collection.
While much of the work in this version is technical in nature, administrators and users will benefit from a smoother experience, better diagnostics, and more efficient resource usage. The changes also prepare MooseFS for future extensibility and integrations.
Reengineered GUI: New Server, Unified Package, and Faster Response
One of the most visible – and impactful – changes in MooseFS 4.57.7 is the complete replacement of the legacy Web GUI server.
The old GUI architecture consisted of two components: mfscgi and mfscgiserv, both based on Python scripts. These have now been replaced by a new, dedicated GUI server written in C, introduced as part of the new mfsgui package.
This new GUI server brings several key improvements:
Improved performance: Users will notice much faster loading times, especially in charts and tables.
Simplified configuration: Settings are now handled through a proper configuration file (/etc/mfsgui.cfg), replacing the previous method of editing inline HTML content.
Visual consistency: The overall appearance of the GUI remains familiar, but various refinements have been made for improved usability. Tabs have been reorganized, and several tables have been cleaned up and restyled for clarity.
The GUI has also been refactored internally to support these changes and to make future improvements easier to implement.
Prometheus Metrics Integration
Another major enhancement in this release is native support for Prometheus-compatible metrics. Metrics are now directly available from the new GUI server without impacting the performance or availability of the interface.
This change allows administrators to seamlessly integrate MooseFS into their existing monitoring and observability pipelines – no third-party exporters or workarounds required.
Internal Refactors and Technical Improvements
In addition to visible changes, MooseFS 4.57.7 includes several internal updates aimed at improving code quality, maintainability, and stability across core components.
Client Code Improvement
We replaced the use of a potentially unsafe sprintf() function in client code. While the original use was not exploitable, the change was made to eliminate warnings on macOS and to follow modern safe coding practices.
CLI Refactor
The command-line interface code was reorganised into multiple Python modules to improve maintainability. There are no functional changes for end users – this refactor is internal and backward-compatible.
Daemon and Master Enhancements
A rare and long-standing bug in lock handling within the MooseFS Master has been fixed. This issue only manifested under complex operation sequences and was reported in a GitHub issue.
A defensive check was added in the daemons to handle an edge case where poll() might return zero. This scenario is highly unlikely but now safely accounted for.
An issue with disabling the Linux Out-of-Memory (OOM) killer has been resolved. The error-handling logic was previously misinterpreting return values, and the disabling operation was being performed incorrectly. This fix was prompted by an insightful community question in GitHub Discussions.
Upgrade Notes
Users upgrading to MooseFS 4.57.7 should take note of the following:
Uninstall mfscgi and mfscgiserv if present, and install mfsgui in their place.
Move GUI-related configuration to the new /etc/mfsgui.cfg file. Legacy configuration embedded in HTML is no longer used.
If you are using Prometheus or planning to, configure your monitoring system to collect metrics directly from the GUI server.
No changes are required for CLI usage or client interactions beyond standard package upgrades.
Conclusion
MooseFS 4.57.7 brings meaningful improvements to system transparency, configuration management, and long-term stability. By replacing the legacy Python GUI with a purpose-built C implementation, we’ve significantly improved speed and flexibility – while laying the groundwork for future enhancements.
We recommend all users upgrade to this version to benefit from its improvements and to ensure compatibility with upcoming features.
If you have feedback or questions, feel free to join the discussion on GitHub or contact us directly.