Unattended updates using OSTree
The basic OSTree upgrade operation is atomic, meaning the operation either applies an entire update or none at all. OSTree upgrades are never partial. However, upgrades can fail in other ways. A system can fail to boot, or it can boot but not work correctly. When an upgrade fails on a VM or a desktop computer, users can interact with the boot menu to reboot the system and fall back to the last working version of the OS image. However, users cannot interactively reboot the embedded systems used for automotive use cases. Instead, the system must automatically detect failures and fall back to the last working image. This process is known as an unattended update.
Watchdogs and boot-once mechanisms
The basic mechanism to facilitate unattended updates is an external watchdog.
At a high level, a watchdog workflow follows these steps:
- Configure your system with an external watchdog.
- The watchdog starts a timer.
- An update begins.
- If the update succeeds and the system boots within a predefined time limit without failures, the watchdog receives the stop command and stops.
- If the boot succeeds but some other failure occurs, the system automatically rolls back and reboots into the previous version of the OS image.
- If the boot fails, the watchdog never receives a stop command. When the timer runs out, the watchdog resets the CPU and forces a reboot into the previous version of the OS image.
To use a watchdog, the system must support boot-once functionality. A boot-once mechanism configures the system to boot into the new version of the OS one time. Unless that boot succeeds, the next reboot rolls back to the original version of the OS.
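The workflow above can be modeled in a few lines of Python. This is only a sketch for reasoning about the timing, not how any real watchdog is implemented: the `Watchdog` class and the `boot_after_update` function are hypothetical names, and the 30-second default simply mirrors the timer used later in this procedure.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Toy model of an external watchdog (hypothetical, for illustration)."""

    timeout_s: int        # time limit before the watchdog forces a reset
    elapsed_s: int = 0
    running: bool = False

    def start(self) -> None:
        """Start (or restart) the timer."""
        self.running = True
        self.elapsed_s = 0

    def stop(self) -> None:
        """Handle the stop command sent after a successful boot."""
        self.running = False

    def tick(self, seconds: int) -> bool:
        """Advance time; return True if the watchdog fires and resets the CPU."""
        if not self.running:
            return False
        self.elapsed_s += seconds
        if self.elapsed_s >= self.timeout_s:
            self.running = False
            return True
        return False


def boot_after_update(boot_time_s: int, timeout_s: int = 30) -> str:
    """Return which image ends up running: "new" or "previous"."""
    watchdog = Watchdog(timeout_s=timeout_s)
    watchdog.start()                  # the timer starts before the update boots
    if watchdog.tick(boot_time_s):    # boot too slow or hung: no stop command
        return "previous"             # boot-once makes the forced reboot roll back
    watchdog.stop()                   # boot finished in time: stop command sent
    return "new"


if __name__ == "__main__":
    print(boot_after_update(10))   # fast boot keeps the new image
    print(boot_after_update(45))   # slow boot falls back to the previous image
```

The model makes the dependency explicit: the watchdog only forces a reset, and it is the boot-once mechanism that turns that reset into a rollback.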
Watchdog in a QEMU VM
How you use watchdogs and boot-once mechanisms depends largely on your specific hardware. A single set of instructions that applies to all hardware types does not currently exist. However, you can configure and implement watchdogs and boot-once mechanisms in a QEMU VM for experimentation purposes.
QEMU supports some emulated hardware watchdogs, but they reset the watchdog upon system reboot and are therefore incompatible with unattended updates. However, you can add a simple external watchdog script, /dev/virtio-ports/watchdog.0, by adding the --watchdog option when you run the automotive-image-runner script. Adding the --verbose option enables messages from the watchdog.
Boot-once mechanisms in grub2
OSTree images use grub2 to boot the system. grub2 uses boot loader specification (BLS) files to describe the possible boot targets and supports a boot counter mechanism to trigger the fallback. After an update, OSTree creates BLS files for new and old targets, where the new target is first (the default boot) and the old target is second.
Each time grub boots, it loads the grubenv file, which stores key/value state between boots. In particular, it supports the boot_counter and boot_success keys. If boot_counter is set, grub decrements it and saves it back to grubenv with each boot. If boot_counter reaches zero, grub treats the boot as failed, and the second BLS entry becomes the default boot. In this scenario, the update rolls back to the old target.
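To make the counter behavior concrete, the following Python sketch simulates the decision described above. It is a simplified model and not the actual grub2 configuration: `grub_select_entry` is a hypothetical function, the dictionary stands in for grubenv, and the model assumes that a counter already at zero selects the second entry without further changes to the environment.

```python
from __future__ import annotations


def grub_select_entry(grubenv: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (index of the BLS entry to boot, updated grubenv).

    Index 0 is the new target (default boot); index 1 is the old target.
    """
    env = dict(grubenv)  # work on a copy, like loading grubenv at boot
    if "boot_counter" not in env:
        # No counter set: boot the default entry as usual.
        return 0, env

    counter = int(env["boot_counter"])
    if counter <= 0:
        # Counter exhausted: the second entry becomes the default boot.
        # Assumption: the environment is left unchanged in this model.
        return 1, env

    # Counter still positive: decrement it, save it, and try the new target.
    env["boot_counter"] = str(counter - 1)
    return 0, env


if __name__ == "__main__":
    env = {"boot_success": "0", "boot_counter": "1"}
    entry, env = grub_select_entry(env)
    print(entry, env)   # first boot tries the new target
    entry, env = grub_select_entry(env)
    print(entry, env)   # an unconfirmed boot leads to the old target
```

With boot_counter=1, the new target gets exactly one attempt. If nothing clears the counter before the next boot, the model selects the old target.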
Updating your system
Greenboot integrates with OSTree and systemd to add various forms of health checks that optimize the watchdog and boot-once mechanisms during updates.
Using Greenboot, the workflow of a typical update follows these steps:
- rpm-ostree upgrade stages an update, which writes the basic OS in place for the next boot, but it doesn't merge the system /etc into the new deployment or configure grub to boot it.
- rpm-ostree triggers ostree-finalize-staged.service, which completes the update when the system reboots.
- greenboot-grub2-set-counter.service modifies grubenv to set boot_counter, enabling the boot-once mechanism and health checks for the new boot.
- The system reboots.
- Before triggering boot-complete.target in systemd, greenboot-healthcheck.service runs various checks on the system and detects whether the system functions (green) or fails (red).
- If the system fails, the system logs the failure information and reboots. The failure triggers the boot_counter mechanism, and the system falls back to the old OSTree deployment. During the next boot, the greenboot-rpm-ostree-grub2-check-fallback.service service detects the fallback and makes the old default system permanent.
- If the system succeeds, the greenboot-grub2-set-success.service removes the boot_counter key and sets boot_success=1 in grubenv. Subsequent reboots use the new OS version.
The watchdog service files integrate with this workflow in two ways:
- watchdog-ostree-start.service starts the watchdog before the ostree-finalize-staged.service completes the migration.
- watchdog-ostree-stop.service starts after boot-complete.target, which indicates that the upgrade was successful, and stops the watchdog.
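The green and red outcomes of this workflow can be summarized as a small decision function. The Python sketch below is an illustration under stated assumptions, not Greenboot's implementation: `run_health_checks` and `complete_boot` are hypothetical names, the health checks are plain callables, and the returned action strings only label what the services described above do.

```python
from __future__ import annotations

from typing import Callable


def run_health_checks(checks: list[Callable[[], bool]]) -> str:
    """Return "green" if every check passes, otherwise "red"."""
    return "green" if all(check() for check in checks) else "red"


def complete_boot(
    grubenv: dict[str, str],
    checks: list[Callable[[], bool]],
) -> tuple[str, dict[str, str], list[str]]:
    """Model what happens to grubenv once the health checks have run."""
    env = dict(grubenv)
    status = run_health_checks(checks)
    if status == "green":
        # Mirrors greenboot-grub2-set-success.service in the text above.
        env.pop("boot_counter", None)
        env["boot_success"] = "1"
        # boot-complete.target is reached, so the watchdog can be stopped.
        actions = ["reach boot-complete.target", "stop watchdog"]
    else:
        # grubenv stays untouched, so the boot counter drives the fallback.
        actions = ["log failure", "reboot"]
    return status, env, actions


if __name__ == "__main__":
    pending = {"boot_success": "0", "boot_counter": "0"}
    print(complete_boot(pending, [lambda: True]))
    print(complete_boot(pending, [lambda: True, lambda: False]))
```

In the red case the function deliberately changes nothing in grubenv: the rollback comes from the boot loader seeing an exhausted counter on the next boot, not from the health check itself.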
Prerequisites
- An OSTree-based image, such as the image that you created in Creating an OSTree-based image

Note
For demonstration purposes, the sample manifest file upgrade-demo.mpp.yml is compatible with this procedure because it preconfigures Greenboot and installs and enables watchdog tools and services.
Procedure
- Update your image:

    $ automotive-image-builder build --target qemu --mode image --ostree-repo <ostree-repo-name> \
        --define 'version="<X.X>"' --define 'extra_rpms=["<add_extra_rpms>"]' --export qcow2 \
        <path>/<manifest-name>.mpp.yml <image-name>.repo

  Note
  The use of --define to modify a build from the command line is an acceptable method in a test environment. In a production environment, however, make changes directly to your manifest file.

  The example command adds autosig-sample-slow-startup as an extra RPM that makes the boot time slower than the 30-second watchdog timer, and it updates the version:

    $ automotive-image-builder build --target qemu --mode image --ostree-repo ostree-repo \
        --define 'version="1.3"' --define 'extra_rpms=["autosig-sample-slow-startup"]' --export qcow2 \
        images/upgrade-demo.mpp.yml my-image.repo

  Using the .repo extension instead of .qcow2 indicates to OSTree that you are updating or iterating on an image rather than creating a new image. The updated image is added to the OSTree repo as a new ref with a unique commit ID.
- Run the image. For example:

    $ automotive-image-runner --verbose --watchdog --publish-dir=ostree-repo my-image.qcow2
    publishing ostree-repo on http://10.0.2.100/
    port: 2222 → 22
    MAC: FE:7a:05:f1:94:85
    Image: my-image.qcow2
    Running: /usr/bin/qemu-system-x86_64 -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,snapshot=on,readonly=off -smp 8 -enable-kvm -m 2G -machine q35 -cpu host -device virtio-net-pci,netdev=n0,mac=FE:7a:05:f1:94:85 -netdev user,id=n0,net=10.0.2.0/24,guestfwd=tcp:10.0.2.100:80-cmd:netcat 127.0.0.1 46937,hostfwd=tcp::2222-:22 -qmp unix:/tmp/runvm-ba27770687aa6dd1tklvorxb/qmp-socket,server=on,wait=off -device virtio-serial -chardev socket,path=/tmp/runvm-ba27770687aa6dd1tklvorxb/watch-socket,server=on,wait=off,id=watchdog -device virtserialport,chardev=watchdog,name=watchdog.0 -drive file=my-image.qcow2,index=0,media=disk,format=qcow2,if=virtio,id=rootdisk,snapshot=off
    Stopped watchdog

  Note
  Watchdog status messages appear on the terminal command line. To observe watchdog messages, position the VM console so you can see the terminal command line.
- After the image boots, log in as root using the password password.
- From the VM console, verify the state of the system:

    # rpm-ostree status
    State: idle
    Deployments:
    ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
        Version: 1.2 (2024-11-11T22:21:43Z)
        Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed

    # cat /boot/grub2/grubenv
    kernelopts=root=LABEL=root
    boot_success=1
    # GRUB Environment Block
    boot_success=1
    ...
- Run rpm-ostree upgrade and verify the state of the system:

    # rpm-ostree upgrade
    Staging deployment... done
    Added:
      autosig-sample-slow-startup-0.1-1.el9.x86_64
    Run "systemctl reboot" to start a reboot

    # rpm-ostree status
    State: idle
    Deployments:
      auto-sig:cs9/x86_64/<target>-<manifest-name>
        Version: 1.3 (2024-12-05T16:46:21Z)
        Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
    ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
        Version: 1.2 (2024-11-11T22:21:43Z)
        Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed

    # cat /boot/grub2/grubenv
    # GRUB Environment Block
    boot_success=0
    boot_counter=1
    ...
- Reboot the system to deploy the new version of your image, for example by running systemctl reboot as the rpm-ostree upgrade output suggests.
- On the terminal command line, notice that the watchdog timer starts, which coincides with the system reboot.
- After the image boots, log in as root using the password password.
- Quickly verify the state of the system because the VM rapidly reboots and rolls back to the older image version:

    # rpm-ostree status
    State: idle
    Deployments:
    ● auto-sig:cs9/x86_64/<target>-<manifest-name>
        Version: 1.3 (2024-12-05T16:46:21Z)
        Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
      auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
        Version: 1.2 (2024-11-11T22:21:43Z)
        Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
- On the terminal command line, notice that the rollback process triggers the watchdog.
- After the image reboots, log in again.
- Run journalctl to review journald log messages. Notice the Stop watchdog message after the successful reboot occurs:

    greenboot-rpm-ostree-grub2-check-fallback[561]: FALLBACK BOOT DETECTED! Default rpm-ostree deployment has been rolled back.
    Reached target Boot Completion Check.
    Starting Mark boot as successful in grubenv...
    Starting greenboot Success Scripts Runner...
    greenboot[670]: Boot Status is GREEN - Health Check SUCCESS
    Starting Stop watchdog after update on successful boot...
    Finished greenboot Success Scripts Runner.
    watchdog-ostree-stop.service: Deactivated successfully.
    Finished Stop watchdog after update on successful boot.
    Finished Mark boot as successful in grubenv.
- Verify the state of the system. Notice that the VM rolled back to the previous OS version:

    # rpm-ostree status
    State: idle
    Deployments:
      auto-sig:cs9/x86_64/<target>-<manifest-name>
        Version: 1.3 (2024-12-05T16:46:21Z)
        Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
    ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
        Version: 1.2 (2024-11-11T22:21:43Z)
        Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed

    # cat /boot/grub2/grubenv
    # GRUB Environment Block
    boot_success=1
    ...
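If you want to check the grubenv state from a script rather than by eye, a small parser is enough. The Python sketch below is a convenience example, not part of the tooling described here: `parse_grubenv` and `describe_boot_state` are hypothetical helpers, and the sample strings are copied from the grubenv output shown in this procedure.

```python
from __future__ import annotations


def parse_grubenv(text: str) -> dict[str, str]:
    """Parse `cat /boot/grub2/grubenv` output into a key/value dictionary."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment or padding lines, and lines without a key.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)   # values may themselves contain "="
        env[key] = value
    return env


def describe_boot_state(env: dict[str, str]) -> str:
    """Summarize the boot-once state recorded in grubenv."""
    if "boot_counter" in env:
        return "update staged: boot-once mechanism armed"
    if env.get("boot_success") == "1":
        return "last boot marked successful"
    return "unknown"


if __name__ == "__main__":
    after_upgrade = "# GRUB Environment Block\nboot_success=0\nboot_counter=1\n"
    after_rollback = "# GRUB Environment Block\nboot_success=1\n"
    print(describe_boot_state(parse_grubenv(after_upgrade)))
    print(describe_boot_state(parse_grubenv(after_rollback)))
```

The parser splits on the first equals sign only, so a value such as the kernelopts entry, which contains further equals signs, stays intact.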