Data deduplication

The Boomla Filesystem automatically deduplicates data on disk. If you have multiple copies of the same data, it takes up disk space only once. You can also think of this as filesystem-level compression.

Deduplication applies across all branches of a single website, but not between websites. If you have multiple websites, their storage requirements are calculated independently. This is necessary because they may be stored on different servers.
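
The idea can be illustrated with a minimal sketch. It is not Boomla's actual implementation: the DedupStore class, the use of SHA-256 content hashes, and the one-store-per-website layout are all assumptions made for illustration only.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): identical data is kept once."""

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        # Identical contents hash to the same key, so a second copy adds nothing.
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def bytes_on_disk(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


# Assumption: one store per website, shared by all branches of that website.
site_a = DedupStore()
site_b = DedupStore()

payload = b"x" * 1000
site_a.put(payload)  # a file on one branch
site_a.put(payload)  # the same file on another branch of the same website
site_b.put(payload)  # the same file on a different website

print(site_a.bytes_on_disk())  # 1000: stored once despite two copies
print(site_b.bytes_on_disk())  # 1000: a separate website stores its own copy
```

In this sketch the two copies within one website cost 1000 bytes in total, while the copy on the other website is counted separately.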

Subtree size vs storage used

Calculating the storage space actually used by a website is a long process, and the result cannot be easily cached. For this reason, wherever you see the file size, subtree size or children size of a file, it is a simple mathematical sum over the given subtree. It does not take data deduplication into account.

Because of this, and because of the way Boomla handles dependencies, you can easily see websites that appear to be several GB in size while in reality they use only a few MB. Most often this is caused by the packages installed under /sys/packages.

So how are file sizes useful at all? First, if you ignore the /sys file, the rest is most often pretty accurate. Second, file sizes give you an upper bound on the storage used. If you see one file that is 1 KB and another that is 10 GB, the sizes do provide useful information.
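
To make the difference concrete, here is a small sketch comparing a plain size sum with a deduplicated total. The file tree, the helper names subtree_size and deduplicated_size, and hashing whole file contents are hypothetical; Boomla's real accounting may differ.

```python
import hashlib

# Hypothetical website: path -> file contents. Three packages each carry
# an identical copy of the same dependency.
dependency = b"p" * 5000
files = {
    "/index.html": b"h" * 300,
    "/sys/packages/a/lib": dependency,
    "/sys/packages/b/lib": dependency,
    "/sys/packages/c/lib": dependency,
}


def subtree_size(files, prefix="/"):
    """Plain sum of file sizes under prefix, ignoring deduplication."""
    return sum(len(data) for path, data in files.items() if path.startswith(prefix))


def deduplicated_size(files, prefix="/"):
    """Bytes needed if identical contents are stored only once (assumed model)."""
    unique = {}
    for path, data in files.items():
        if path.startswith(prefix):
            unique[hashlib.sha256(data).digest()] = len(data)
    return sum(unique.values())


print(subtree_size(files))                   # 15300
print(deduplicated_size(files))              # 5300
print(subtree_size(files, "/sys/packages"))  # 15000
```

Here the reported sum (15300 bytes) is well above the deduplicated total (5300 bytes), and nearly all of the gap comes from /sys/packages. The sum is never smaller than the deduplicated total, which is why it works as an upper bound.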
