umpf - Git on a New Level

Holger Assmann, Michael Olbrich | 2023-08-29 | kernel, linux, git, opensource, umpf

Modern software development is commonly accompanied by a version control system like Git in order to have changes to the source code being documented in a traceable manner. Furthermore, any version state should be easily reproducible at any time. However, for work on complex projects such as the BSP ("Board Support Package") of an embedded system with its multiple development aspects, the simple stacking of the individual changes on top of each other does not scale.

Oftentimes, different functionalities for a BSP are developed in parallel based on a common version state. Then there might be back ports to be integrated, and specific work intended for upstream has to be managed separately. This separation is commonly handled by introducing individual Git branches.

However, this model becomes complicated once the subsequent merging of more than two branches is due - e.g. in order to be able to test the entire project, or to publish a release. Manual merging of the individual branches is as time-consuming as it is error-prone and it collides with the short test cycles of iterative development models.

Merging Branches Reasonably

To simplify this last mentioned step and to be able to generate a reproducible patch stack from a selection of topic branches, we have developed the Universal Magic Patch Functionator (umpf) as an open source extension for the local Git installation.

With umpf, a linear patch series can be generated within the regular Git infrastructure and across multiple Git branches ("topic branches"), with their individual dependencies being preserved at the same time.

The Idea Behind umpf — umpf combines several parallel topic branches into a serialized patch stack. By referencing the respective HEADs, the underlying branches can always be restored to their state at the time of tagging.

This series of commits is described by a regular Git tag, which is supplemented by umpf with references to the contained topic branches. Such a utag thus combines several properties:

The marker includes a timestamp, which makes it uniquely referable.
A linear patch stack can be exported as a series.
The state of the underlying branches is preserved at the time of the umpf. Later changes of a branch (e.g. force pushes after bug fixes) have no influence on the resulting utag. Consequently, the actual development state at the time of tagging can be easily retraced and restored.
By archiving the contained branches, their dependencies among each other can be traced with normal Git tooling.
The project type is recognized heuristically and a suitable timestamp gets integrated - for example as EXTRAVERSION in a Makefile. This allows for a clear identification later, even of the already compiled and delivered software.

The generating of an utag and its linear patch series is relatively straight-forward and only requires an existing Git project. umpf itself is a bash script and uses the regular tools that Git provides. Therefore, tags and merges generated by umpf can later be processed by any normal Git installation. An utag is therefore also regularly added to the existing repository via git push and can be re-called again via git checkout.

In order to create an utag, umpf needs three parameters, which can either be provided by using a useries text file, or by calling umpf init:

base: The starting point on which the subsequent patch series is to be built. The umpf-base is usually an upstream commit, such as a release tag (e.g. for Linux: "v6.3").
name: The actual name of the umpf, which later also becomes part of the utag. It is recommended to derive it from the base, e.g. "6.3/release-name".
topic: The actual Git branches (topic branches) are each preceded by # umpf-topic:. They should themselves be built on the base commit to avoid complications. It is therefore recommended to re-base branches with a different starting point than the umpf-base first.

Backport Integration

Especially when working with upstream projects, it happens frequently that bug fixes or functional enhancements provided by third parties have to be integrated. The strength of umpf is that such changes can be maintained in a separate branch and thus do not interfere with your own work on the project.

However, this branch integration by umpf only works reliably when the underlying basis is comparable: Topic branches for v5.15 and v5.16, for example, can usually be processed together by umpf without any problems. On the other hand, a further branch based e.g. on v6.2, will lead to complications during umpfing due to the extensive differences in the overall code.

In order to integrate bugfixes and other backports, it is thus recommended to re-base those within their own topic branch on the umpf-base first.

Handling Merge Conflicts

umpf cannot resolve merge conflicts on its own. Manual conflict resolution during umpfing is possible and sometimes unavoidable, e.g. when working with device trees, but it should be kept to a minimum.

However, the solutions to individual conflicts can be recorded by using git-rerere, which automatically resolves known conflicts in case of recurrence. The branches to be integrated should nevertheless have already been restructured and cleaned up appropriately prior to an umpf.

Generating a Linear Series

A useries created as described above can now be used to derive an utag and thus unify all registered development branches into a common release tag:

~/epic-project $  cat ./useries

# umpf-base: v6.3
# umpf-name: 6.3/special-customer-release
# umpf-topic: v6.3/topic/bugfix-branch
# umpf-topic: v6.3/topic/more-fixes
# umpf-end

~/epic-project $  umpf tag ./useries

umpf will now stack all branches in the order they are mentioned in the useries onto the specified umpf-base. An autosquash is also performed so that fixup commits are resolved and do not end up in the final patchstack.

An example of how such an utag is represented in Git can be seen in the figure below. This is a so-called qualified umpf, since it contains all the necessary information for complete reconstruction:

Complete umpf — A "qualified" *umpf* tag (*utag*) as seen by Git.

umpf-base: The base commit, commonly an upstream tag.
Patch Stack: Commits of the topic branches stacked on the umpf-base. The order of the branches is the same as they were passed to umpf.
References: The state of the integrated branches at the time of the umpf. This state is referenced in the utag's commit via the branches' HEADs and thus preserved. Consequently, even a subsequent change to a branch has no effect if the utag is checked out again at a later time. The referencing also causes the Git Garbage Collection to leave the supposedly orphaned branch states in the repository. This way, that very constellation represented by the utag can be re-visited at any point in the future.
utag: A regular Git tag containing references to the underlying branches and their latest commit IDs (HEADs) at the time of umpfing.
Release Tag: Another regular Git tag. This has to be referenced for generating a patch series. The commit may perform a modification to the project code in order to include a timestamp.

The final export of the patch stack can be done with the umpf format-patch command. The export can be varied with parameters: An appended -bb creates a patch series compatible to Yocto projects, -p specifies the target path, and with -u the overwriting of an already existing series can be initiated. Further hints are provided by umpf --help.

umerge

Instead of an utag, we can also build an umerge without an existing useries step by step by calling umpf merge and manually inserting the individual topic branches. To do this, first use to use git checkout in order to switch to the desired umpf-base and then use umpf merge to load the desired branches:

~/epic-project $  git checkout v6.3
~/epic-project $  umpf merge v6.3/topic/bugfix-branch

umpf: merging 'v6.3/topic/bugfix-branch'...
[...]

~/epic-project $  umpf merge v6.3/topic/more-fixes

umpf: merging 'v6.3/topic/more-fixes'...
[...]

This merging of multiple branches is similar to an Octopus Merge, which can be used in Git for the simultaneous merging of more than two branches. Unlike Git, but like with an utag however, the origin of the added commits is preserved by an umpf merge: The underlying branches are referenced in their state at the time of the umerge by an additional entry in the commit message of the merge. This allows the origin of the individual commits - i.e. the original names of their branches as well as their history - to be traced at any time.

A umerge can also be created directly from a useries file by calling umpf build ./useries

An umerge is particularly suitable for iterative development, in which all topic branches of a project are required - e.g. for hardware-related kernel development. Additional changes to the source code can then be simply put on top of the umerge and be tested. Afterwards, by calling umpf distribute, the newly created patches can be interactively assigned to one of the original branches and will become integrated there by umpf directly.

umpf distribute can also be used if the commits were not stacked on an umerge, but on an utag instead.

*umpf* Stacking Feature — Dependencies between the topic branches of a single *useries* can be realized with *umerge*.

Furthermore, umerges can also be used directly in projects if there are dependencies between the individual topic branches. Through the Stacking Feature of umpf, the branch based on an umerge will be regularly integrated into the linearized sequence of commits during subsequent processing. It should be noted that the entry of the branch in question in the useries file must not be inserted before the branches on which it depends; the order of naming is decisive here.

Since the umerge preserves the state of the base branches the new branch relies on, subsequent work on those bases will logically not be carried over to an already existing umerge. This can lead to conflicts when a new utag shall be generated with an older umerge included. In such a case, changes done to the base branches must not affect direct dependencies of the dependent branch.

Why We Are umpfing

At Pengutronix, we have specialized in Embedded Linux, and are thus mainly working on the Linux kernel and associated projects. As part of our "mainline first" strategy, we bring the functionalities developed in our projects into the official upstream channels as early as possible. While this comes with many long-term benefits - such as ensuring the high quality standard of our code through community review - it also means that the resulting patches must not only work when directly integrated into our BSPs, but must also meet the fluctuating upstream situation. This requires a clean separation of the respective topic branch from the rest of the project.

The impetus for the development of umpf ultimately came from our work on driver development for embedded systems: Today, modern SoCs cover an incredibly broad spectrum of implementations and functionalities; at the same time, SoC components of identical design (so-called "intellectual property cores" or "IP cores") are often used in a modular manner - even by different manufacturers. Consequently, we have built up specific driver branches that can be used across several of our integration projects without further adaptation needed.

Working directly on the Linux kernel with its hundreds of branches and its continuous upstream development is of course a prime example of how extensive a project managed with Git can become. But even much smaller projects can benefit from automation aids like umpf once more than one branch is used.

Although an extension like our Universal Magic Patch Functionator might at some point have limitations when it comes to the actual possible complexity a Git repository can get to, umpf has nevertheless successfully enabled us for years to manage a whole range of different patch stacks for a large number of projects and to keep the general versioning effort relatively low in this regard.

We have now decided to make this tool available under MIT license to the open source community - and as always, we look forward to you participation!

Texts and photos are licensed under CC-BY-SA, except explicitly stated otherwise

umpf - Git on a New Level

Merging Branches Reasonably

Backport Integration

Handling Merge Conflicts

Generating a Linear Series

umerge

Why We Are umpfing

Further Readings

Pengutronix Recent Open Source Contributions - Linux 7.1 Edition

Pengutronix Recent Open Source Contributions - Linux 7.0 Edition

Pengutronix Recent Open Source Contributions

Managing Complexity with Open Source

Pengutronix at the Linux Plumbers Conference

Pengutronix at FrOSCon 2024

Pulse Width Modulation (PWM) is easy, isn't it? - Turning it off and on again

Yes we CAN... add new features

Pengutronix at Embedded World 2022

Optimising tig's bash completion by a factor of 1000