The Transactional Update Guide

Thorsten Kukuk

<kukuk@thkukuk.de>

Ignaz Forster

<iforster@suse.com>

Version 0.3, 12. September 2019

Abstract

This is the documentation for transactional-update and is intended for users,
administrators and packagers.

It describes how transactional-update with Btrfs works by giving an overview of
the design, what an administrator needs to know about setting up and operating
such a system and what a packager needs to know for creating compatible
packages.

For specific usage see the transactional-update man page or the list of
Kubic-related commands.

-------------------------------------------------------------------------------

1. Introduction

    1.1. Description
    1.2. Definition
    1.3. Motivation

2. Concept

    2.1. Filesystem
    2.2. Updating the correct snapshot
    2.3. Workflow
    2.4. Simplified workflow

3. System setup

    3.1. Read-only file system
    3.2. /var
    3.3. /etc

4. Files
5. Porting to other systems
6. Author/acknowledgments
7. Copyright information for this document

Chapter 1. Introduction

1.1. Description

transactional-update is an application that allows updating a Linux system and
its applications in an atomic way: the update is performed in the background
without influencing the currently running system, and it is activated by a
reboot, similar to rpm-ostree or CoreOS' former Container Linux. However,
transactional-update is not another package manager; it reuses the existing
system tools, such as RPM as the packaging format and zypper as the package
manager. It depends on Btrfs because of its snapshotting and copy-on-write
features.

The reason for building upon existing tools is the ability to continue using
existing packages and tool chains for delivering and applying updates. While
currently only implemented for (open)SUSE environments, the concept is vendor
independent and may also be implemented for other package managers and package
formats.

Conceptually, transactional-update creates a new Btrfs snapshot before
performing any update and uses that snapshot for modifications. Since Btrfs
snapshots only store the differences between two versions and are therefore
usually very small, updates done with transactional-update are very space
efficient. This also means that several snapshots can exist side by side
without a problem.

1.2. Definition

A transactional update (also known as atomic upgrade) is an update that

  * is atomic:

      + The update does not influence the running system.

      + The machine can be powered off at any time. When powered on again
        either the unmodified old state or the new state is active, but no
        state in between.

  * can be rolled back:

      + If the upgrade fails or if a newer software version turns out to not be
        compatible with your infrastructure, the system can quickly be restored
        to a previous state.

1.3. Motivation

Linux distributions have had working update mechanisms for many, many years,
so why do we need something new? Distributions evolved, introducing new
concepts such as rolling releases, containers or long-term support releases.
While the classical update mechanisms are probably perfectly fine for a regular
desktop user on a distribution with regular releases, other scenarios may
require a different approach.

Distributions with rolling updates face a problem: how should intrusive
updates be applied to a running system without breaking the update mechanism
itself? Examples such as the migration from SysV init to systemd, a major
version update of a desktop environment while the desktop is still running, or
even just a small update to D-Bus give a good idea of the problem: the desktop
environment may simply terminate, killing the update process and leaving the
system in a broken, undefined state. If any update breaks such a system, there
needs to be a quick way to roll back the system to the last working state.

On mission-critical systems you want to make sure that no service or user
behaviour interferes with the update of the system. Conversely, the update
should not modify the running system, e.g. by uncontrolled restarts of services
or unexpected modifications to the system in post scripts. Potential
interruptions are deferred to a defined maintenance window instead. For really
critical systems the update can be verified (e.g. using snapper diff) or
discarded before actually booting into the new system. If an update encounters
an error, the new snapshot is discarded automatically.

For cluster nodes it is important that the system is always in a consistent
state, requires no manual interaction and is able to recover from error
conditions by itself. For these systems transactional-update provides automatic
updates; snapshots with failed updates are removed automatically. Automatic
reboots can be triggered using a variety of reboot methods (e.g. rebootmgr,
kured or systemd), making the application of updates cluster aware.

Sometimes new kernel versions or software updates are incompatible with your
hardware or other software. In this case there should be a quick and easy way
to roll back to the state before the update was applied.

There are other solutions for the above problems, such as downloading all RPMs
upfront and applying them during the boot phase. This, however, blocks the
system for an unknown period of time while the update is running, delaying the
availability of the system.

Chapter 2. Concept

2.1. Filesystem

This chapter describes the handling of the root file system, i.e. the core
functionality of transactional-update. Of course not all data (such as /var or
/home) should be stored on the root volume; see Chapter 3, System setup, for a
real-world setup.

transactional-update is based on several concepts of the Btrfs file system, a
general-purpose copy-on-write (CoW) filesystem with snapshot and subvolume
support. Subvolumes look like directories, but behave like mount points: they
can be accessed from the parent subvolume like a directory, or they can be
mounted on other directories of the same filesystem. Snapshots are created from
existing subvolumes, excluding any subvolumes nested inside them, and can be
made read-only; snapper, which transactional-update uses to create them, makes
snapshots read-only by default.

Implementation note: transactional-update may also be implemented for any other
file system, as long as it provides snapshot functionality and the ability to
boot from snapshots. See Chapter 5, Porting to other systems, for requirements
and porting information.

2.2. Updating the correct snapshot

transactional-update uses zypper with the --root option pointing to the new
snapshot for package management. Other commands (such as those creating the
initrd) are run inside the snapshot using chroot.

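The split between zypper --root and chroot can be sketched in POSIX shell as
follows. This is a minimal illustration, not transactional-update's own code:
the helper names run, snap_zypper and snap_chroot are hypothetical, DRY_RUN=1
(the default) only prints the commands, and bind-mounting /proc, /sys and /dev
is an assumption about what tools such as dracut need inside the chroot.

```shell
# Hypothetical helpers illustrating section 2.2; not transactional-update's code.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Package management: zypper operates on the snapshot from the outside.
snap_zypper() {
    snapshot_dir=$1
    shift
    run zypper --root "$snapshot_dir" "$@"
}

# Any other command (e.g. rebuilding the initrd) runs inside the snapshot.
# Assumption: /proc, /sys and /dev have to be bind-mounted for such tools.
snap_chroot() {
    snapshot_dir=$1
    shift
    for fs in proc sys dev; do
        run mount --bind "/$fs" "$snapshot_dir/$fs"
    done
    rc=0
    run chroot "$snapshot_dir" "$@" || rc=$?
    # Unmount in reverse order, even if the command failed.
    for fs in dev sys proc; do
        run umount "$snapshot_dir/$fs"
    done
    return "$rc"
}
```

For example, snap_chroot /.snapshots/42/snapshot dracut --force (42 being a
placeholder snapshot number) prints the three bind mounts, the chroot call and
the three unmounts.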
2.3. Workflow

[Figure: List of snapshots]

At the beginning, there is a list of old snapshots, each one based on the
previous one; the newest one is the current root filesystem.

[Figure: List of snapshots with a new read-only clone of the current root
filesystem]

In the first step, a new read-only snapshot of the current root filesystem is
created.

[Figure: List of snapshots with a read-write clone of the current root
filesystem]

In the second step the snapshot is switched from read-only to read-write, so
that it can be updated.

[Figure: List of snapshots with a read-write clone of the current root
filesystem, which is updated with zypper]

In the third step the snapshot is updated. This can be zypper up or zypper dup,
the installation or removal of a package, or any other modification to the
root file system.

[Figure: List of snapshots with the clone read-only again]

In the fourth step the snapshot is changed back to read-only, so that the data
cannot be modified anymore.

[Figure: List of snapshots with the read-only clone as the new default]

The last step is to mark the updated snapshot as the new root filesystem. This
is the atomic step: had the power been cut before this point, the unchanged old
system would have booted. From now on, the new, updated system will boot.

[Figure: List of snapshots with the current root filesystem as the newest at
the end]

After the reboot, the newly prepared snapshot is the new root filesystem. In
case anything goes wrong, a rollback to any of the older snapshots can be
performed.

[Figure: List of snapshots with a read-write clone of the current root
filesystem, which is updated with zypper]

If the system is not rebooted and transactional-update is called again, a new
snapshot is created and updated. This new snapshot is based on the currently
running root filesystem again, not on the new default snapshot! For stacking
changes (i.e. if several commands are supposed to be combined in one single
snapshot), the shell command can be used to perform any number of operations;
alternatively, the --continue option bases the next snapshot on the current
default snapshot instead of the running system.

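Whether this situation applies, i.e. whether an updated default snapshot is
still waiting for a reboot, can be checked by comparing the default subvolume
with the booted one. The sketch below assumes the usual snapper layout
(/.snapshots/<number>/snapshot) and the output formats of btrfs subvolume
get-default and findmnt described in its comments; the function names are
hypothetical.

```shell
# Hypothetical helpers; assumes snapper's /.snapshots/<number>/snapshot layout.

# Print the snapshot number contained in a path or command output,
# e.g. "@/.snapshots/82/snapshot" -> 82 (prints nothing if there is none).
snapshot_num() {
    printf '%s\n' "$1" | sed -n 's|.*/\.snapshots/\([0-9][0-9]*\)/snapshot.*|\1|p'
}

# Succeed if the default snapshot differs from the booted one, i.e. an
# update has been prepared but the system has not been rebooted yet.
#   $1: output of "btrfs subvolume get-default /"
#       (assumed form: "ID 268 gen 9000 top level 267 path @/.snapshots/82/snapshot")
#   $2: output of "findmnt -n -o SOURCE /"
#       (assumed form: "/dev/vda2[/@/.snapshots/81/snapshot]")
reboot_pending() {
    default=$(snapshot_num "$1")
    running=$(snapshot_num "$2")
    [ -n "$default" ] && [ "$default" != "$running" ]
}
```

On a live system the two arguments would come from the commands themselves:

    reboot_pending "$(btrfs subvolume get-default /)" "$(findmnt -n -o SOURCE /)" &&
        echo "reboot pending: the next run starts from the running system again"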
2.4. Simplified workflow

In essence the logic of transactional-update can be summarized as follows:

  * SNAPSHOT_ID=$(snapper create -p -d "Snapshot Update")

  * SNAPSHOT_DIR=/.snapshots/${SNAPSHOT_ID}/snapshot

  * btrfs property set ${SNAPSHOT_DIR} ro false

  * zypper -R ${SNAPSHOT_DIR} up|patch|dup|...

  * btrfs property set ${SNAPSHOT_DIR} ro true

  * btrfs subvolume set-default ${SNAPSHOT_DIR}

  * systemctl reboot

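The steps above can be combined into a small shell function. This is a minimal
sketch, not the real transactional-update script: tu_update and run are
hypothetical names, DRY_RUN=1 (the default) only prints the commands, and it
ignores details such as the /etc overlay or bind mounts. Failed updates discard
the snapshot, mirroring the behaviour described in Chapter 1.

```shell
# Minimal sketch of the simplified workflow; NOT the real transactional-update
# script. Helper names are hypothetical; DRY_RUN=1 only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: tu_update ZYPPER_COMMAND [ARGS...], e.g. "tu_update up"
tu_update() {
    if [ "$DRY_RUN" = 1 ]; then
        snapshot_id=42  # placeholder, no snapshot is created in a dry run
    else
        # snapper creates a read-only snapshot; -p prints its number
        snapshot_id=$(snapper create -p -d "Snapshot Update") || return 1
    fi
    snapshot_dir=/.snapshots/$snapshot_id/snapshot

    # Simplification: zypper also uses some non-zero exit codes (e.g. 102,
    # reboot needed) after a successful run; here any non-zero code fails.
    run btrfs property set "$snapshot_dir" ro false &&
        run zypper --non-interactive --root "$snapshot_dir" "$@" &&
        run btrfs property set "$snapshot_dir" ro true || {
        # Failed update: discard the snapshot, the running system is untouched
        echo "update failed, discarding snapshot $snapshot_id" >&2
        run snapper delete "$snapshot_id"
        return 1
    }
    # The atomic step: only now will the next boot use the new snapshot
    run btrfs subvolume set-default "$snapshot_dir" || return 1
    echo "snapshot $snapshot_id is the new default, reboot to activate it"
}
```

Running DRY_RUN=0 tu_update up as root would perform the update for real. The
final systemctl reboot is deliberately left out, so that the reboot can be
handled by the administrator or one of the reboot methods mentioned earlier.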

Chapter 3. System setup

3.1. Read-only file system

transactional-update is typically used on a read-only root file system, even
though it also supports regular read-write systems.

3.2. /var

On a system with snapshot support /var should not be part of the root file
system, otherwise a rollback to a previous state would also roll back the /var
contents. On a read-only system this directory has to be mounted read-write
anyway, as variable data is written to it.

Due to the volatile nature of /var, the directory is not mounted into the new
snapshot during the transactional-update run, as this would break atomicity:
the currently running system depends on the old state of the data (imagine a
database migration being triggered by a package). Any modifications to /var
therefore have to happen in the new system after it has been booted, i.e.
modifying the contents of /var as part of the packaging scripts is not allowed.

The only exception to this rule are directories: those are recreated during
the first boot into the updated system by the create-dirs-from-rpmdb.service
helper service. For all other cases, use one of the options described in the
sections "Packaging for transactional-updates" and "Migration / Upgrade" of the
Packaging guidelines. If a package breaks this rule, a warning message
indicating the affected file is printed at the end of the transactional-update
run.

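Recreating package-owned directories from the RPM database can be sketched as
follows. This is not the code of create-dirs-from-rpmdb.service: the function
name is hypothetical, it assumes input lines of the form "PERMS PATH" as
printed by rpm -qa --qf '[%{FILEMODES:perms} %{FILENAMES}\n]', and unlike a real
implementation it does not restore owners or permissions.

```shell
# Hypothetical sketch in the spirit of create-dirs-from-rpmdb.service.
# Reads "PERMS PATH" lines on stdin and creates every missing directory
# below /var, relative to the optional root directory given as $1.
# Owners and permissions are NOT restored.
create_var_dirs() {
    root=${1:-}
    while read -r perms path; do
        case $perms in d*) ;; *) continue ;; esac    # directories only
        case $path in /var/*) ;; *) continue ;; esac # below /var only
        [ -d "$root$path" ] || mkdir -p "$root$path"
    done
}
```

On a booted system it could be fed directly from the RPM database:

    rpm -qa --qf '[%{FILEMODES:perms} %{FILENAMES}\n]' | create_var_dirs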
3.3. /etc

transactional-update also supports writing to /etc on an otherwise read-only
file system. To do so, an overlayfs layer is put on top of the system's /etc
directory. All modified configuration files end up in the current snapshot's
overlay in /var/lib/overlay/<snapshotnum>/etc.

Each snapshot has one associated overlay directory. When a new snapshot is
created, the previous snapshot's /etc state is synchronized into the new
snapshot and used as a base. The overlay directories of the current and the new
snapshot are then mounted using overlay stacking, i.e. the new snapshot's
overlay is mounted as the upperdir and the current snapshot's overlay as a
lowerdir. This way changes applied to /etc after the snapshot was taken, but
before the reboot takes place, are still visible in the new snapshot
(exception: if a file has been modified both in the current and the new
snapshot, the new snapshot's version is visible).

If the --continue option is used multiple times to extend a new snapshot while
the system has not been rebooted, and if that snapshot is based on the
currently active system, the synchronization only runs for the first snapshot;
the additional snapshot layers are added to lowerdir. Again, this makes sure
that changes to the running system are still visible after booting into the
new system.

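The stacking rules above can be expressed as a small shell function that
assembles the overlay options. It is a sketch only, assuming the
/sysroot/var/lib/overlay layout and the work-etc directory used in the example
fstab entry; etc_overlay_opts is a hypothetical name, not part of
transactional-update.

```shell
# Hypothetical helper: assemble the overlay mount options for /etc.
#   $1      number of the new snapshot (its overlay becomes upperdir)
#   $2...   numbers of the older overlay layers, newest first
# overlayfs gives the leftmost lowerdir entry the highest priority; the
# snapshot's own /etc (/sysroot/etc in the initrd) is always the bottom layer.
etc_overlay_opts() {
    base=/sysroot/var/lib/overlay
    upper=$1
    shift
    lower=
    for snap in "$@"; do
        lower="$lower$base/$snap/etc:"
    done
    lower="$lower/sysroot/etc"
    printf 'upperdir=%s,lowerdir=%s,workdir=%s\n' \
        "$base/$upper/etc" "$lower" "$base/work-etc"
}
```

Calling etc_overlay_opts 82 81 76 yields the upperdir, lowerdir and workdir
options of the example entry; without --continue only one older layer would be
passed, e.g. etc_overlay_opts 77 76.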
Let's have a look at an example fstab entry:

overlay  /etc  overlay  defaults,upperdir=/sysroot/var/lib/overlay/82/etc,lowerdir=/sysroot/var/lib/overlay/81/etc:/sysroot/var/lib/overlay/76/etc:/sysroot/etc,workdir=/sysroot/var/lib/overlay/work-etc,x-systemd.requires-mounts-for=/var,x-systemd.requires-mounts-for=/var/lib/overlay,x-systemd.requires-mounts-for=/sysroot/var,x-systemd.requires-mounts-for=/sysroot/var/lib/overlay,x-initrd.mount  0  0

  * We are currently in snapshot 82, as indicated by the upperdir directory.
    This can be confirmed by running snapper list or btrfs subvolume
    get-default /. All changes to /etc end up in this directory.

  * lowerdir contains two numbered overlay directories. The last one, number
    76, indicates the snapshot which was used as a base. This snapshot's /etc
    state was also synchronized into the read-only root file system of
    snapshot 82. The additional entry with number 81 means that --continue has
    been used before the system was rebooted. Gaps in the numbering, as seen
    here, may indicate that the snapshots in between were discarded or that a
    rollback to snapshot 76 was performed. The lowest layer is always
    /sysroot/etc, containing the root file system's contents.

  * As /etc is mounted by dracut during early boot, the paths in the options
    have to be prefixed with /sysroot. The x-systemd.requires-mounts-for
    options set up the mount's systemd dependencies correctly.

Overlays no longer referenced by any snapshot are deleted during the
transactional-update cleanup-overlays run.

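The bookkeeping behind cleaning up overlays can be sketched as follows: collect
every overlay number referenced by the /etc overlay entries of all snapshots,
then report the numbered overlay directories that none of them refers to. This
is an assumption-based illustration, not the actual cleanup-overlays code; the
function names are hypothetical, and the sketch only prints candidates instead
of deleting them.

```shell
# Hypothetical helpers, not transactional-update's cleanup-overlays code.

# Print the overlay numbers referenced by one /etc overlay fstab line,
# upperdir first, then the lowerdir entries in stacking order.
overlay_layers() {
    printf '%s\n' "$1" | tr ',:' '\n\n' |
        sed -n 's|.*/var/lib/overlay/\([0-9][0-9]*\)/etc$|\1|p'
}

# Read overlay fstab lines from stdin and print every numbered directory
# below $1 (e.g. /var/lib/overlay) that none of those lines references.
unreferenced_overlays() {
    keep=$(while IFS= read -r line; do overlay_layers "$line"; done)
    for dir in "$1"/[0-9]*; do
        [ -d "$dir" ] || continue
        num=${dir##*/}
        printf '%s\n' "$keep" | grep -qx "$num" || printf '%s\n' "$dir"
    done
}
```

For the example entry, overlay_layers prints 82, 81 and 76. Assuming each
snapshot's own /etc/fstab carries its overlay entry, the candidates could be
listed with:

    grep -h '^overlay[[:space:]]*/etc[[:space:]]' /.snapshots/*/snapshot/etc/fstab |
        unreferenced_overlays /var/lib/overlay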
Chapter 4. Files

/usr/etc/transactional-update.conf

    This is the reference configuration file for transactional-update,
    containing distribution default values. This file should not be changed by
    the administrator.

/etc/transactional-update.conf

    To change the default configuration for transactional-update, copy or
    create this file and change the options accordingly. See
    transactional-update.conf(5) for a description of the configuration
    options. Values from this file override the distribution default values.

/var/lib/overlay/

    See Section 3.3, "/etc", for an explanation of this directory's contents.

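The override order of the two configuration files can be illustrated with a
short shell sketch. It assumes both files contain shell-style KEY=value
assignments and simply sources them in order, so later assignments win; the
function load_tu_config and its optional root argument are hypothetical, and
REBOOT_METHOD is used only as an example key (see transactional-update.conf(5)
for the actual options).

```shell
# Hypothetical sketch: distribution defaults first, then the administrator's
# file, so values from /etc override those from /usr/etc.
# $1 is an optional root prefix (useful for testing); assumes KEY=value syntax.
# Note: sourcing executes the files, so they must be trusted.
load_tu_config() {
    for conf in "${1:-}/usr/etc/transactional-update.conf" \
                "${1:-}/etc/transactional-update.conf"; do
        if [ -r "$conf" ]; then
            . "$conf"
        fi
    done
}
```

If /usr/etc/transactional-update.conf sets REBOOT_METHOD=auto and
/etc/transactional-update.conf sets REBOOT_METHOD=rebootmgr, the value
rebootmgr is in effect after load_tu_config; without the /etc file the
distribution default auto remains.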
Chapter 5. Porting to other systems

The only requirement is a CoW filesystem (or anything else providing snapshots,
rollback and the ability to boot from snapshots); apart from that, the concept
should work with every package manager.

Chapter 6. Author/acknowledgments

This document was written by Thorsten Kukuk <kukuk@suse.com> with many
contributions from Ignaz Forster <iforster@suse.com>.

Chapter 7. Copyright information for this document

