Version Control

Motivation

Boomla is a platform with a new kind of filesystem. While it is possible to use existing tools like Git for version controlling Boomla websites, they require one to step out of the platform, export the filesystem and do versioning there. This is not acceptable as one of the key promises of Boomla is to provide a platform one can use without knowing anything about its surrounding environment. Managing versions MUST be fully supported from within Boomla to live up to its promise.

Scope

The Boomla filesystem is already stored deduplicated in Merkle trees. This document is not concerned with the internals of the filesystem or how it is stored. The filesystem hash of the website is readily accessible and can be used for versioning.

Goals

  • Allow reclaiming space. Boomla is designed to store documents, not source code. The repository size may grow quickly, so it must be possible to shrink it.
  • Undo/redo support.
  • Support developers - manual version management.
  • Support casual users. Automatically do versioning in the background.
  • Support commit groups, which allow storing commits in hierarchical order.

ToDo

  • Allow reclaiming space.
  • Casual workflow.
  • Commit groups.
  • Tags.

The Story

In Git, the so-called “history” is often rewritten, either to reduce repository size or to have a “nicer history”. Let’s clarify something.

If you can rewrite the history, whatever you are rewriting is not the history.

Let’s focus on making the “history” nicer. The “history” is art. It is the sum of all the changes that caused the website to become what it is. In other words, the so-called Git “history” is really a story. If we declare it to be a story, it becomes easier to discuss. Changing the history is weird. Changing a story is not.

In Boomla, the changes that caused a website to reach its current form are collectively called its story.

Why rewriting the Git history is bad

The Git history is the main clue for synchronizing instances of the same repository. It is theoretically append-only: comparing the Git history chains determines whether a push or pull is a valid action. That is powerful for synchronization, yet it prevents us from rewriting the Git history without breaking synchronization.

The Boomla Story and History

To have ease of synchronization and also have the ability to rewrite a website’s story, we need to separate the data structures for the two purposes. The Boomla Story is meant to reference intermediate states of a website along with messages to explain how it evolved. The Boomla History is a chain of hashes of the website’s story to aid synchronization across distributed nodes.

Two key requirements follow:

  • A state of a website must be identified purely by its own content, independent of its previous states, so that editing the Story becomes cheap (no need to recalculate hash chains).
  • Old states of the Story are considered garbage. Although the History chain references them, they are not to be followed and not considered part of the repository. That allows us to reclaim space while also having an append-only History.

This subtle complication of the data structure introduces great flexibility.
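
The separation can be sketched as follows. This is only an illustration under stated assumptions: SHA-256, JSON serialization, and every name in the code (digest, Repository, rewrite_story) are hypothetical and not defined by this document.

```python
import hashlib
import json


def digest(obj) -> str:
    """Hash any JSON-serializable value (SHA-256 is an assumption)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class Repository:
    def __init__(self):
        # Rewritable: list of {"state": hash, "message": text}.
        self.story = []
        # Append-only: list of {"story": hash, "prev": hash or None}.
        self.history = []

    def _record(self):
        # Every change to the Story appends one entry to the History.
        prev = digest(self.history[-1]) if self.history else None
        self.history.append({"story": digest(self.story), "prev": prev})

    def commit(self, state_hash: str, message: str):
        self.story.append({"state": state_hash, "message": message})
        self._record()

    def rewrite_story(self, new_story):
        # Rewriting replaces the Story; the History only grows.
        self.story = list(new_story)
        self._record()


repo = Repository()
repo.commit(digest("site v1"), "Initial version")
repo.commit(digest("site v2"), "Add contact page")
repo.commit(digest("site v3"), "Fix typo")

# Drop the middle commit. The remaining commits keep their state hashes,
# because a state hash does not depend on earlier states.
repo.rewrite_story([repo.story[0], repo.story[2]])
```

Only the History entries are chained to each other. The Story entries reference states by content alone, which is why removing a commit does not force any other hash to be recalculated.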

Rolling back

Doing a rollback changes the Story of a website but it doesn’t roll back its History. From the point of view of its History, the changes were first added then removed. Pushing to remotes has no bad side-effects.

Rewriting the Story is also an append-only change to the History.
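
A rollback seen from the History side could look like the sketch below. It simplifies the History to a plain list of Story hashes, and the helper names (story_hash, update_story) are assumptions made for illustration.

```python
import hashlib


def story_hash(story) -> str:
    """Hash a Story given as a list of commit names (simplified)."""
    return hashlib.sha256("\n".join(story).encode("utf-8")).hexdigest()


story = []
history = []  # append-only list of Story hashes


def update_story(new_story):
    """Replace the Story and append its hash to the History."""
    global story
    story = list(new_story)
    history.append(story_hash(story))


update_story(["A"])
update_story(["A", "B"])
update_story(["A"])  # roll back commit B
```

After the rollback the Story is back to a single commit, while the History has three entries: the first and the last carry the same Story hash, and nothing was removed from the chain.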

Transactions

Boomla is transactional: a new snapshot is recorded every time a website is modified.

Commits

When you are happy with your changes, you can make them permanent by explaining them in a commit. A commit records the active state of your website with an explanatory message in the website’s Story. After committing, you will start off from a clean state - an empty undo history.

Commits are much like Git commits. Alternatively, if you are on a casual workflow, the system could generate commits for you automatically. The precise method is not defined here.

Commit grouping

A commit group is a commit with child commits. Child commits provide a more granular story.

Let’s say we have a Story of commits A, B. The feature F is developed and committed as 3 commits, resulting in a Story of A, B, F1, F2, F3. To make the Story cleaner, one can merge them into a single commit group F, which contains child commits B, F1, F2, F3. The high-level Story would then simply be A, B, F. Note that the child commits of F include B, which is the commit the feature development started from. This makes it possible to see the changes made in F1.
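
The grouping described above can be sketched like this. The Commit type and the group function are hypothetical names used only for illustration; the document does not define how commit groups are stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    name: str
    children: List["Commit"] = field(default_factory=list)


def group(story: List[Commit], first: int, last: int, name: str) -> List[Commit]:
    """Replace story[first..last] with one commit group.

    The group's children start with the commit just before the range
    (the base the work started from), followed by the grouped commits.
    """
    if first < 1 or last < first or last >= len(story):
        raise ValueError("range must lie inside the story and have a base commit")
    children = story[first - 1:last + 1]
    return story[:first] + [Commit(name, children)] + story[last + 1:]


story = [Commit(n) for n in ["A", "B", "F1", "F2", "F3"]]
grouped = group(story, 2, 4, "F")

print([c.name for c in grouped])              # ['A', 'B', 'F']
print([c.name for c in grouped[2].children])  # ['B', 'F1', 'F2', 'F3']
```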

Transactions & reclaiming space

Boomla is designed for websites. Websites store documents, not source code. Documents are often big, and they take up a lot of space if you change them a lot. Imagine you are editing a high resolution image that is 20MB and you save it 100 times. Whoops, that’s 2GB of space! It would be nice to throw away your temporary versions.

To aid that, working snapshots are stored in a temporary area and thrown away as soon as you commit your final version. This way you are in control of what goes in your website’s story, while at the same time the space requirements are minimized.

Continuing the above example, assuming you only commit the final version, the first 99 edits are thrown away and you have reclaimed 1980MB of space.
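
The arithmetic of this example can be checked with a small sketch. It assumes, as the example does, that every save stores a full 20MB copy (deduplication is ignored), and the Workspace class is a hypothetical stand-in for the temporary area.

```python
class Workspace:
    """Working snapshots live in a temporary area until a commit."""

    def __init__(self):
        self.snapshots = []  # sizes in MB of uncommitted snapshots
        self.committed = []  # sizes in MB of committed states

    def save(self, size_mb: int):
        self.snapshots.append(size_mb)

    def commit(self) -> int:
        """Keep the latest snapshot, discard the rest, return reclaimed MB."""
        if not self.snapshots:
            raise ValueError("nothing to commit")
        final = self.snapshots[-1]
        reclaimed = sum(self.snapshots[:-1])
        self.committed.append(final)
        self.snapshots = []
        return reclaimed


ws = Workspace()
for _ in range(100):
    ws.save(20)

used_before = sum(ws.snapshots)  # 2000 MB held in the temporary area
reclaimed = ws.commit()          # 1980 MB freed, 20 MB kept in the Story
```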

Reclaiming more space

You can always open the history view of your website and remove any intermediate commits if you wish. Even better, you may create commit groups for longer periods and remove their more granular inner commits.

For casual use cases, we may even support automatic garbage collection, as you probably don’t need hour-level commits of casual content after years have passed.

Branching

A branch is a hash pointer to a Story. Creating new branches is thus cheap. Branches are referenced by a domain. For example, if your main website is example.com you may have a feature-branch website feature-foo.example.com.

Merging

Merging creates a new commit group. It contains as children all the commits of the other branch, starting from the latest shared commit.

To clarify, let’s assume the following Story branches:

  A
  | \
  |   C
  |   D
  B 

After merging the branch of commit D onto the branch of commit B, we will end up having:

  A
  B
+ E

Where E contains child commits:

  A
  B
- E
    A
    C
    D
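
The merge above can be sketched as follows. The sketch covers only the shape of the resulting Story, not how the website contents of E are combined. Commits are compared by name for brevity, where a real implementation would presumably compare hashes, and the Commit type and merge function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    name: str
    children: List["Commit"] = field(default_factory=list)


def merge(ours: List[Commit], theirs: List[Commit], name: str) -> List[Commit]:
    """Append a commit group holding the other branch's commits,
    starting from the latest commit both branches share."""
    shared = -1
    for i, (a, b) in enumerate(zip(ours, theirs)):
        if a.name != b.name:
            break
        shared = i
    if shared < 0:
        raise ValueError("branches share no commit")
    return ours + [Commit(name, theirs[shared:])]


ours = [Commit("A"), Commit("B")]
theirs = [Commit("A"), Commit("C"), Commit("D")]
merged = merge(ours, theirs, "E")

print([c.name for c in merged])               # ['A', 'B', 'E']
print([c.name for c in merged[2].children])   # ['A', 'C', 'D']
```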

Tags

A tag is a hash pointer to a website’s state. Tags are part of the website’s Story, which makes them easier to manage.

Push/pull

Only fast-forward push/pulls are allowed. Any conflicts must be resolved locally.
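
A fast-forward check might look like the sketch below, assuming the History is available as an ordered list of hashes on both sides. The function name and the list representation are assumptions, not part of this document.

```python
from typing import List


def is_fast_forward(target: List[str], source: List[str]) -> bool:
    """True if `source` extends `target` without diverging from it,
    that is, if `target` is a prefix of `source`."""
    n = len(target)
    return len(source) >= n and source[:n] == target


remote = ["h1", "h2"]
local = ["h1", "h2", "h3"]
diverged = ["h1", "hx", "h3"]

can_push = is_fast_forward(remote, local)         # local only adds h3
cannot_push = is_fast_forward(remote, diverged)   # hx conflicts with h2
```

Because rollbacks and Story rewrites only append to the History, they never turn a fast-forward into a conflict on their own.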

Casual workflow

Just edit your website as if there were no versioning at all. When shit hits the fan and you want to restore something, your version history will be readily available.

The way the casual workflow makes commits is not defined in this document. To give you an idea, one possible solution is to generate a commit after a given period of user inactivity, say 1 hour.
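
The inactivity-based idea could be expressed as a simple predicate. This is one possible reading only: the function name, its parameters, and the 1 hour default are illustrative assumptions.

```python
from datetime import datetime, timedelta

INACTIVITY = timedelta(hours=1)  # example threshold, not a defined value


def should_autocommit(last_edit: datetime, now: datetime,
                      has_uncommitted: bool,
                      threshold: timedelta = INACTIVITY) -> bool:
    """Commit automatically once the user has been idle long enough
    and there are uncommitted snapshots to record."""
    return has_uncommitted and (now - last_edit) >= threshold


last_edit = datetime(2024, 1, 1, 12, 0)
idle_30_min = should_autocommit(last_edit, datetime(2024, 1, 1, 12, 30), True)
idle_90_min = should_autocommit(last_edit, datetime(2024, 1, 1, 13, 30), True)
```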

Developer workflow

Let’s say you have a website example.com running on the master branch. You would start by creating a new branch to work on, feature.example.com. This creates a new branch of the entire story of the website.

As you make changes, new snapshots are recorded. You need to explain them and thus create commits in your website’s Story. If you created multiple commits, consider grouping them into a single one. Once you are done, merge back into master.

Archiving

Boomla defines that old versions of the Story (referenced by its History) are to be ignored, yet it does not define that they must also be deleted. In environments where it is unacceptable to lose history, one can set up an archive node, which keeps all historic versions of the Story.

Undo, redo

Boomla offers undo/redo capabilities for uncommitted snapshots. The length of the undo history is not defined in this document.
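
A minimal sketch of undo/redo over uncommitted snapshots is shown below. The UndoHistory class is hypothetical, and discarding the redo states when a new snapshot is recorded is a conventional choice assumed here rather than something this document specifies.

```python
class UndoHistory:
    """Linear undo/redo over snapshots taken since the last commit."""

    def __init__(self, initial):
        self._states = [initial]
        self._pos = 0

    @property
    def current(self):
        return self._states[self._pos]

    def record(self, state):
        # A new snapshot discards any states that could have been redone.
        del self._states[self._pos + 1:]
        self._states.append(state)
        self._pos += 1

    def undo(self):
        if self._pos > 0:
            self._pos -= 1
        return self.current

    def redo(self):
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self.current

    def commit(self):
        # Committing keeps only the active state: the undo history is empty.
        self._states = [self.current]
        self._pos = 0
        return self.current


h = UndoHistory("s0")
h.record("s1")
h.record("s2")
after_undo = h.undo()    # back to "s1"
after_redo = h.redo()    # forward to "s2"
h.undo()
committed = h.commit()   # commits "s1" and clears the undo history
after_commit_undo = h.undo()  # nothing left to undo, stays at "s1"
```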