Unattended updates using OSTree

The basic OSTree upgrade operation is atomic: it applies either the entire update or none of it, so OSTree upgrades are never partial. However, upgrades can fail in other ways. A system can fail to boot, or it can boot but not work correctly. When an upgrade fails on a VM or a desktop computer, a user can reboot the system and select the last working version of the OS image from the boot menu. Embedded systems in automotive use cases have no user who can interact with the boot menu. Instead, the system must detect failures automatically and fall back to the last working image. An update that relies on this automatic detection and fallback is called an unattended update.

Watchdogs and boot-once mechanisms

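The basic mechanism that enables unattended updates is an external watchdog: a timer that runs outside the OS and resets the system unless it receives a stop command before the timer expires.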

At a high level, a watchdog workflow follows these steps:

Configure your system with an external watchdog.
The watchdog starts a timer.
An update begins.
If the system boots the update successfully within the time limit, the watchdog receives a stop command and stops.
If the boot succeeds but some other failure occurs, the system automatically rolls back and reboots into the previous version of the OS image.
If the boot fails, the watchdog does not receive a stop command. When the timer expires, the watchdog resets the CPU, which forces a reboot into the previous version of the OS image.
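The timer logic in these steps can be sketched in bash. This is a hypothetical illustration, not the watchdog that automotive-image-runner uses: the function name watchdog_wait and the convention of reading a `stop` line from standard input are assumptions, and the function prints a message instead of resetting a CPU.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog sketch: wait for a "stop" line on stdin and
# "trigger" (reset the system) if none arrives before the timeout.
# The messages mimic the runner output shown later on this page;
# the real watchdog may work differently.

# watchdog_wait TIMEOUT_SECONDS
watchdog_wait() {
  local timeout=$1 cmd remaining
  local deadline=$((SECONDS + timeout))
  echo "Starting watchdog for ${timeout} sec" >&2
  while remaining=$((deadline - SECONDS)); [ "$remaining" -gt 0 ]; do
    # read -t fails on timeout or when the input is closed
    read -r -t "$remaining" cmd || break
    if [ "$cmd" = "stop" ]; then
      echo "Stopped watchdog"
      return 0
    fi
  done
  # No stop command in time: a real watchdog would reset the CPU here
  echo "Triggering watchdog"
  return 1
}
```

For example, `echo stop | watchdog_wait 30` prints `Stopped watchdog` immediately, while `true | watchdog_wait 30` prints `Triggering watchdog` because the input closes without a stop command.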

A watchdog reset can cause a rollback only if the system also supports boot-once functionality. A boot-once mechanism boots a new version of the OS a single time; unless that boot is marked successful, the next reboot returns to the original version of the OS.

Watchdog in a QEMU VM

How you use watchdogs and boot-once mechanisms depends largely on your specific hardware. A single set of instructions that applies to all hardware types does not currently exist. However, you can configure and implement watchdogs and boot-once mechanisms in a QEMU VM for experimentation purposes.

QEMU can emulate some hardware watchdogs, but the emulated watchdogs are reset when the system reboots, which makes them unsuitable for unattended updates. Instead, you can add a simple external watchdog by passing the --watchdog option to the automotive-image-runner script. The runner exposes the watchdog to the VM as the virtio serial port /dev/virtio-ports/watchdog.0. Add the --verbose option to show messages from the watchdog.

Boot-once mechanisms in `grub2`

OSTree images use grub2 to boot the system. grub2 reads boot loader specification (BLS) files that describe the possible boot targets, and it supports a boot counter mechanism that triggers the fallback. After an update, OSTree creates BLS files for the new and the old targets. The new target is first, which makes it the default boot, and the old target is second.

Each time grub boots, it loads the grubenv file, which stores key/value state between boots. The relevant keys are boot_counter and boot_success; boot_success records whether the last boot was marked successful. If boot_counter is set, grub decrements it on each boot and saves the new value back to grubenv. If boot_counter reaches zero without the boot being marked successful, grub makes the second BLS entry the default boot, and the system rolls back to the old target.
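The following bash sketch simulates the boot counter decision on a plain key=value file. It follows the description above rather than grub's actual configuration script, and the helper names get_var, set_var, and pick_entry are hypothetical. A real grubenv is a fixed-size block that you edit with grub2-editenv, not with sed.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of the grub2 boot counter described above.
# It edits a plain key=value text file, not a real grubenv block.

# get_var FILE KEY: print the last value of KEY (empty if unset)
get_var() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# set_var FILE KEY VALUE: replace KEY=... or append it
set_var() {
  if grep -q "^$2=" "$1"; then
    sed -i "s/^$2=.*/$2=$3/" "$1"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# pick_entry FILE: print which BLS entry boots (0 = new target, 1 = old target)
pick_entry() {
  local env_file=$1 counter success
  counter=$(get_var "$env_file" boot_counter)
  success=$(get_var "$env_file" boot_success)
  # Counting applies only while an update is pending (counter set, success not marked)
  if [ -n "$counter" ] && [ "$success" = "0" ]; then
    if [ "$counter" -le 0 ]; then
      # Counter exhausted: the new target never booted successfully, fall back
      set_var "$env_file" boot_counter -1
      echo 1
      return
    fi
    # Decrement and save back before booting the new target
    set_var "$env_file" boot_counter "$((counter - 1))"
  fi
  echo 0
}

# Example: the state that rpm-ostree upgrade leaves behind in the procedure below
state=$(mktemp)
printf 'boot_success=0\nboot_counter=1\n' > "$state"
pick_entry "$state"   # first boot after the update: prints 0, counter is now 0
pick_entry "$state"   # boot never marked successful: prints 1 (fallback)
rm -f "$state"
```

Marking the boot successful (boot_success=1) before the second call stops the countdown, so pick_entry keeps returning 0 and the new target stays the default.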

Greenboot, health checks, and watchdog services

Greenboot integrates with OSTree and systemd to add health checks that complement the watchdog and boot-once mechanisms during updates.

With Greenboot, a typical update follows these steps:

rpm-ostree upgrade stages an update. It writes the new OS in place for the next boot, but it doesn't yet merge the system /etc into the new deployment or configure grub to boot it.
Staging the update activates ostree-finalize-staged.service, which finalizes the deployment when the system shuts down for the reboot.
greenboot-grub2-set-counter.service sets boot_counter in grubenv, which enables the boot-once mechanism and health checks for the new boot.
The system reboots.
Before systemd reaches boot-complete.target, greenboot-healthcheck.service runs checks on the system and determines whether it works (green) or fails (red).
If the checks fail, greenboot logs the failure and reboots the system. Because the boot was not marked successful, the boot_counter mechanism falls back to the old OSTree deployment. During that boot, greenboot-rpm-ostree-grub2-check-fallback.service detects the fallback and makes the old deployment the permanent default.
If the checks succeed, greenboot-grub2-set-success.service removes the boot_counter key and sets boot_success=1 in grubenv. Subsequent reboots use the new OS version.
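The two grubenv edits that Greenboot makes around an update can be sketched the same way. This is a hypothetical illustration on a plain key=value file, and the function names arm_boot_counter and mark_boot_success are assumptions. On a real system the services edit /boot/grub2/grubenv with grub2-editenv; the comments show roughly equivalent commands, not the exact contents of the Greenboot services.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the grubenv edits in the Greenboot workflow,
# applied to a plain key=value file. Roughly equivalent real commands:
#   grub2-editenv - set boot_counter=1     (update pending)
#   grub2-editenv - set boot_success=1     (green boot)
#   grub2-editenv - unset boot_counter

# arm_boot_counter FILE ATTEMPTS: mark an update as pending
arm_boot_counter() {
  # Drop any previous state, keep unrelated keys such as kernelopts
  sed -i '/^boot_counter=/d; /^boot_success=/d' "$1"
  printf 'boot_success=0\nboot_counter=%s\n' "$2" >> "$1"
}

# mark_boot_success FILE: the new deployment passed its health checks
mark_boot_success() {
  # Removing boot_counter stops the countdown; boot_success=1 records the result
  sed -i '/^boot_counter=/d; /^boot_success=/d' "$1"
  echo 'boot_success=1' >> "$1"
}
```

After arm_boot_counter, the file matches the grubenv state shown after rpm-ostree upgrade in the procedure below (boot_success=0, boot_counter=1); after mark_boot_success, only boot_success=1 remains.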

Two watchdog services integrate with this workflow:

watchdog-ostree-start.service starts the watchdog before ostree-finalize-staged.service finalizes the new deployment.
watchdog-ostree-stop.service runs after boot-complete.target, which indicates that the upgrade succeeded, and stops the watchdog.

Falling back after a failed update

In this procedure, you update your image with an RPM that slows down booting so much that the watchdog timer expires. The failed boot triggers the system to fall back to the last known working image.

Prerequisites

An OSTree-based image, such as the image that you created in Building OSTree-based image

Note

For demonstration purposes, the sample manifest file upgrade-demo.mpp.yml works with this procedure because it preconfigures Greenboot and installs and enables the watchdog tools and services.

Procedure

Edit the manifest. Keep in mind that upgrade-demo.mpp.yml uses the old manifest format.
- In the mpp-vars section at the top, add the entry `version: 1.3`.
- In the stage that starts with `type: org.osbuild.rpm`, add the entry `- autosig-sample-slow-startup` to the list under `packages`.
The autosig-sample-slow-startup RPM makes the boot take longer than the 30-second watchdog timer.
Build the resulting manifest:
```
$ automotive-image-builder build --target qemu --mode image --ostree-repo ostree-repo --export qcow2 \
    upgrade-demo.mpp.yml my-image.repo
```
Using the .repo extension instead of .qcow2 indicates to OSTree that you are updating or iterating on an image rather than creating a new image. The updated image is added to the OSTree repo as a new ref with a unique commit ID.

Run the image:

```
$ automotive-image-runner --verbose --watchdog --publish-dir=<ostree-repo-name> <image-name>.qcow2
```

For example:

```
$ automotive-image-runner --verbose --watchdog --publish-dir=ostree-repo my-image.qcow2
publishing ostree-repo on http://10.0.2.100/
port: 2222 → 22
MAC: FE:7a:05:f1:94:85
Image: my-image.qcow2
Running: /usr/bin/qemu-system-x86_64 -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
-drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,snapshot=on,readonly=off -smp 8 -enable-kvm -m 2G -machine q35
-cpu host -device virtio-net-pci,netdev=n0,mac=FE:7a:05:f1:94:85
-netdev user,id=n0,net=10.0.2.0/24,guestfwd=tcp:10.0.2.100:80-cmd:netcat 127.0.0.1 46937,hostfwd=tcp::2222-:22 -qmp
unix:/tmp/runvm-ba27770687aa6dd1tklvorxb/qmp-socket,server=on,wait=off -device virtio-serial -chardev
socket,path=/tmp/runvm-ba27770687aa6dd1tklvorxb/watch-socket,server=on,wait=off,id=watchdog
-device virtserialport,chardev=watchdog,name=watchdog.0
-drive file=my-image.qcow2,index=0,media=disk,format=qcow2,if=virtio,id=rootdisk,snapshot=off
Stopped watchdog
```

Note

Watchdog status messages appear in the terminal where you ran automotive-image-runner, not in the VM console. To observe watchdog messages, position the VM console so that you can also see that terminal.

After the image boots, log in as root with the password `password`.

From the VM console, verify the state of the system:

```
# rpm-ostree status
State: idle
Deployments:
● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
              Version: 1.2 (2024-11-11T22:21:43Z)
               Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
# cat /boot/grub2/grubenv
kernelopts=root=LABEL=root
boot_success=1
# GRUB Environment Block
boot_success=1
...
```

Run rpm-ostree upgrade and verify the state of the system:

```
# rpm-ostree upgrade
Staging deployment... done
Added:
  autosig-sample-slow-startup-0.1-1.el9.x86_64
Run "systemctl reboot" to start a reboot
# rpm-ostree status
State: idle
Deployments:
  auto-sig:cs9/x86_64/<target>-<manifest-name>
              Version: 1.3 (2024-12-05T16:46:21Z)
               Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb

● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
              Version: 1.2 (2024-11-11T22:21:43Z)
               Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
# cat /boot/grub2/grubenv
# GRUB Environment Block
boot_success=0
boot_counter=1
...
```
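To interpret grubenv output like this, you can use a small helper such as the following. The function boot_state and its messages are hypothetical; it reads only the boot_counter and boot_success keys described earlier and ignores comment and padding lines, following the simplified boot counter model from the grub2 section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the boot counting state stored in a
# grubenv-style file. Comment and padding lines never match the keys.

# boot_state FILE
boot_state() {
  local counter success
  # Keep the last value of each key
  counter=$(sed -n 's/^boot_counter=//p' "$1" | tail -n 1)
  success=$(sed -n 's/^boot_success=//p' "$1" | tail -n 1)
  if [ "$success" = "1" ]; then
    echo "last boot marked successful"
  elif [ -z "$counter" ]; then
    echo "no boot counting state"
  elif [ "$counter" -le 0 ]; then
    # In the simplified model, grub boots the second BLS entry next
    echo "boot attempts exhausted: falling back"
  else
    echo "update pending: $counter boot attempt(s) left"
  fi
}
```

On the VM you might run `boot_state /boot/grub2/grubenv`. With the state shown after rpm-ostree upgrade, it prints `update pending: 1 boot attempt(s) left`.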

Reboot the system to deploy the new version of your image:
```
# systemctl reboot
```
In the terminal where automotive-image-runner runs, notice that the watchdog timer starts as the system reboots:
```
Starting watchdog for 30 sec
```
After the image boots, log in as root with the password `password`.

Verify the state of the system quickly. The slow startup keeps the boot from completing before the 30-second watchdog timer expires, so the VM soon reboots and rolls back to the older image version:

```
# rpm-ostree status
State: idle
Deployments:
● auto-sig:cs9/x86_64/<target>-<manifest-name>
              Version: 1.3 (2024-12-05T16:46:21Z)
               Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb

  auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
              Version: 1.2 (2024-11-11T22:21:43Z)
               Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
```

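In the same terminal, notice that the watchdog triggers because the boot did not complete in time. The watchdog resets the VM, which then boots the previous version of the image and stops the watchdog: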
```
Triggering watchdog
Stopped watchdog
```
After the image reboots, log in again.

Run journalctl to review the journald log messages. Notice that after the fallback boot succeeds, greenboot reports the fallback and the watchdog is stopped:

```
greenboot-rpm-ostree-grub2-check-fallback[561]: FALLBACK BOOT DETECTED! Default rpm-ostree deployment has been rolled back.
Reached target Boot Completion Check.
Starting Mark boot as successful in grubenv...
Starting greenboot Success Scripts Runner...
greenboot[670]: Boot Status is GREEN - Health Check SUCCESS
Starting Stop watchdog after update on successful boot...
Finished greenboot Success Scripts Runner.
watchdog-ostree-stop.service: Deactivated successfully.
Finished Stop watchdog after update on successful boot.
Finished Mark boot as successful in grubenv.
```
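To check the outcome without reading the whole journal, you can use a filter like the following. The function boot_report is hypothetical. It matches the `FALLBACK BOOT DETECTED` and `Boot Status is GREEN` messages shown above; the `Boot Status is RED` pattern is an assumption about how a failed health check is worded.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read journal text on stdin and report whether a
# fallback occurred and how the greenboot health check ended.

boot_report() {
  local log
  log=$(cat)
  if grep -q 'FALLBACK BOOT DETECTED' <<< "$log"; then
    echo "fallback: yes"
  else
    echo "fallback: no"
  fi
  if grep -q 'Boot Status is GREEN' <<< "$log"; then
    echo "health: green"
  elif grep -q 'Boot Status is RED' <<< "$log"; then
    # Assumed wording for a failed health check
    echo "health: red"
  else
    echo "health: unknown"
  fi
}
```

On the VM, pipe the current boot's journal into it with `journalctl -b | boot_report`. For the log above, it reports a fallback and a green health check.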

Verify the state of the system. Notice that the VM rolled back to the previous OS version:

```
# rpm-ostree status
State: idle
Deployments:
  auto-sig:cs9/x86_64/<target>-<manifest-name>
              Version: 1.3 (2024-12-05T16:46:21Z)
               Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb

● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
              Version: 1.2 (2024-11-11T22:21:43Z)
               Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
# cat /boot/grub2/grubenv
# GRUB Environment Block
boot_success=1
...
```