unbreak booting within the flash-via-USB image

Our rootfs has grown, and as a result, the default GZ-compressed image
(in the version that's suitable for U-Boot's consumption) no longer fits
into whatever RAM limits are in effect on this platform. Here's how it
looks during boot:

[    3.431410] sd 4:0:0:0: [sda] 2053120 512-byte logical blocks: (1.05 GB/1003 MiB)
[    3.439181] sd 4:0:0:0: [sda] Write Protect is off
[    3.444221] sd 4:0:0:0: [sda] No Caching mode page found
[    3.449562] sd 4:0:0:0: [sda] Assuming drive cache: write through
[    3.457877]  sda: sda1
[    3.460482] sd 4:0:0:0: [sda] Attached SCSI removable disk
[    4.439737] Initramfs unpacking failed: write error
[    4.517106] Freeing initrd memory: 147888K
[    4.522280] Freeing unused kernel image (initmem) memory: 1024K
[    4.528428] Run /init as init process
/init: [    4.540735] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
exec: line 15: /[    4.548977] CPU0: stopping
[    4.548982] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.54-10-g79692ad15f4e1 #1
[    4.548987] Hardware name: Marvell Armada 380/385 (Device Tree)
[    4.548994]  unwind_backtrace from show_stack+0x10/0x14
[    4.549010]  show_stack from dump_stack_lvl+0x40/0x4c
[    4.549021]  dump_stack_lvl from do_handle_IPI+0x18c/0x1c0
[    4.549030]  do_handle_IPI from ipi_handler+0x18/0x20
[    4.549037]  ipi_handler from handle_percpu_devid_irq+0x8c/0x1d8
[    4.549046]  handle_percpu_devid_irq from generic_handle_domain_irq+0x28/0x38
[    4.549057]  generic_handle_domain_irq from gic_handle_irq+0x74/0x88
[    4.549067]  gic_handle_irq from generic_handle_arch_irq+0x34/0x44
[    4.549076]  generic_handle_arch_irq from __irq_svc+0x88/0xb0
[    4.549083] Exception stack(0xc0f01f28 to 0xc0f01f70)
[    4.549089] 1f20:                   00008324 00000001 00000000 00000000 c0f09040 c104f6d0
[    4.549093] 1f40: 00000000 c0f04fb0 00000000 00000000 c0e45a6c 00000000 c104efc8 c0f01f78
[    4.549097] 1f60: c0a05ed8 c0a069d0 60000013 ffffffff
[    4.549100]  __irq_svc from default_idle_call+0x1c/0xb0
[    4.549109]  default_idle_call from do_idle+0x1ac/0x200
[    4.549123]  do_idle from cpu_startup_entry+0x28/0x2c
[    4.549136]  cpu_startup_entry from rest_init+0xac/0xb0
[    4.549148]  rest_init from arch_post_acpi_subsys_init+0x0/0x8
[    4.672002] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00 ]---

This is with a 145MB rootfs.cpio.uboot file (using the default GZ
compression). When we switch to the XZ compression algorithm, that file
is "only" 86MB, and the boot proceeds further:

[    3.259624] sd 4:0:0:0: [sda] 2053120 512-byte logical blocks: (1.05 GB/1003 MiB)
[    3.267368] sd 4:0:0:0: [sda] Write Protect is off
[    3.272358] sd 4:0:0:0: [sda] No Caching mode page found
[    3.277699] sd 4:0:0:0: [sda] Assuming drive cache: write through
[    3.285972]  sda: sda1
[    3.288528] sd 4:0:0:0: [sda] Attached SCSI removable disk
[   20.201047] Freeing initrd memory: 87576K
[   20.206235] Freeing unused kernel image (initmem) memory: 1024K
[   20.212365] Run /init as init process
[   20.238622] systemd[1]: System time is further ahead than 15y after build time, resetting clock to build time.
[   20.252325] systemd[1]: systemd 254 running in system mode (+PAM +AUDIT -SELINUX -APPARMOR -IMA -SMACK -SECCOMP -GCRYPT -GNUTLS +OPENSSL -ACL +BLKID +CURL +ELFUTILS -FIDO2 -IDN2 -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE -TPM2 -BZIP2 -LZ4 -XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON -UTMP -SYSVINIT default-hierarchy=unified)
[   20.284175] systemd[1]: Detected architecture arm.

Welcome to Czech Light v2024.10.01-12-g07b54d8-dirty!

...so, yeah, "it works". Decompressing that XZ-compressed rootfs image
is significantly slower, but this is only relevant for flashing via USB,
so I can live with that.
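
As a sanity check on the sizes quoted above, the "Freeing initrd memory" values from the two logs can be converted from KiB to MiB. This is only a sketch; the helper name kib_to_mib_ceil is made up for illustration, and rounding up is an assumption that happens to reproduce the 145MB and 86MB figures:

```shell
# Convert a size in KiB (as printed by "Freeing initrd memory") to whole MiB,
# rounding up. The function name is hypothetical.
kib_to_mib_ceil() {
    echo $(( ($1 + 1023) / 1024 ))
}

kib_to_mib_ceil 147888   # initrd size from the first (GZ) log
kib_to_mib_ceil 87576    # initrd size from the second (XZ) log
```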

The process ends up crashing with an unrelated issue later on, but hey,
one thing at a time:

[   20.332189] systemd[1]: Hostname set to <czechlight>.
[   20.337399] systemd[1]: Initializing machine ID from random generator.
[   20.634765] systemd[1]: /usr/lib/systemd/system/netopeer2.service:10: PIDFile= references a path below legacy directory /var/run/, updating /var/run/netopeer2-server.pid → /run/netopeer2-server.pid; please update the unit file accordingly.
[   20.712336] systemd[1]: Expecting device /dev/sda1...
         Expecting device /dev/sda1...
[   20.742488] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[   20.792356] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[   20.842539] systemd[1]: systemd-udevd.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
[   20.855231] systemd[1]: systemd-udevd.service: (This warning is only shown for the first unit using IP firewalling.)
[   20.902274] (md-udevd)[96]: systemd-udevd.service: Failed to connect stdout to the journal socket, ignoring: No such file or directory
[   20.902418] systemd[1]: Starting Rule-based Manager for Device Events and Files...
         Starting Rule-based Manager for Device Events and Files...
[   20.983333] systemd[1]: Started Rule-based Manager for Device Events and Files.
[  OK  ] Started Rule-based Manager for Device Events and Files.
[    **] A start job is running for /dev/sda1 (1min 29s / 1min 30s)
[  110.890862] systemd[1]: dev-sda1.device: Job dev-sda1.device/start timed out.
[ TIME ] Timed out waiting for device /dev/sda1.
[  110.932064] systemd[1]: Dependency failed for Mount USB flash at /mnt.
[DEPEND] Dependency failed for Mount USB flash at /mnt.
[  110.972052] systemd[1]: Dependency failed for Reflash from USB.
[DEPEND] Dependency failed for Reflash from USB.
[  111.022057] systemd[1]: usb-flash.service: Job usb-flash.service/start failed with result 'dependency'.
[  111.062065] systemd[1]: Startup finished in 20.234s (kernel) + 1min 30.825s (userspace) = 1min 51.060s.
[  111.071850] systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'.
[  111.079927] systemd[1]: dev-sda1.device: Job dev-sda1.device/start failed with result 'timeout'.

Change-Id: I800ae288cfbf6da88218465b92cecc9bd84d6616
Reported-by: Michal Altmann <m.altmann@trillinis.com>
1 file changed
README.md

How to use this

This repository contains CzechLight-specific bits for Buildroot. Buildroot is a tool which produces system images for flashing to embedded devices. It has nice documentation which explains everything that one might need.

The system architecture is described in another document. This is a quick build HOWTO.

Quick Start

Everything is in Gerrit. One should not need to clone anything from anywhere else. The build will download source tarballs of various open source components, though.

By default, each change of this repo uploaded to Gerrit causes the CI system to produce a firmware update. On Gerrit, the change will get a comment from Zuul with a link to the CI log server. Next to the logs, a file named artifacts/update.raucb can be used for updating devices.

Behind the scenes, the system uses Zuul with a configuration tracked in git.

Developer Workflow

Here's how to reproduce the build on a developer's workstation:

git clone ssh://$YOUR_LOGIN@cesnet.cz@gerrit.cesnet.cz:29418/CzechLight/br2-external czechlight
pushd czechlight
git submodule update --init --recursive
popd
mkdir build-clearfog
cd build-clearfog
../czechlight/dev-setup-git.sh
make czechlight_clearfog_defconfig
make -j8

A full rebuild takes about half an hour on a modern laptop.

WARNING: Buildroot is fragile. It is not safe to perform incremental builds after changing an "important" setting. Please check their manual for details.
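
After a build finishes, it can be handy to confirm that the image files mentioned in this README were actually produced. The following is a minimal sketch, not part of the repository; the function name missing_images is hypothetical, and the list of file names to check is up to the caller:

```shell
# Print the names of expected image files that are missing from
# <outdir>/images. The function name is hypothetical; pass whatever
# file names your board is supposed to produce.
missing_images() {
    outdir=$1
    shift
    for name in "$@"; do
        [ -f "$outdir/images/$name" ] || echo "$name"
    done
}

# Example, run from the build directory (e.g. build-clearfog):
# missing_images . update.raucb usb-flash.img
```

An empty output means that all listed files exist.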

Installing

Updates via RAUC

Apart from the traditional way of re-flashing the SD card or the eMMC from scratch, it's also possible to use RAUC to update. This method preserves the U-Boot version and the U-Boot's environment. Apart from that, everything starting with the kernel and the DTB file and including the root FS is updated. Configuration stored in /cfg is brought along and preserved as well.

To install an update:

# build node
make
rsync -avP images/update.raucb somewhere.example.org:path/to/web/root

# target, perhaps via a USB console or over SSH
rauc install http://somewhere.example.org/update.raucb
reboot

Once the updated FW slot boots, the configuration in /cfg will be automatically upgraded ("migrated") to the newest layout. A downgrade to an incompatible OS version might therefore fail during the next reboot. Completely removing all data in the newly updated slot's cfg partition will restore functionality, but it is effectively a factory reset.

Initial installation

Clearfog

On a regular Clearfog Base with an eMMC, one has to bootstrap the device first. To recover a totally bricked board (or one that is fresh from the factory), upload an initial, sufficiently recent U-Boot over the serial console. Ensure that the jumpers are set to 0 1 0 0 1 (default for eMMC boot is 0 0 1 1 1), and then use U-Boot's kwboot tool:

./host/bin/kwboot -b ./u-boot-spl.kwb -t -p /dev/ttyUSB0

Prepare a USB flash disk with a raw bootable image, images/usb-flash.img. Use a tool such as dd to overwrite the raw block device; do not merely copy the image file onto the disk. Once in U-Boot, plug in the USB flash disk and execute:

usb start; fatload usb 0:1 00800000 boot.scr; source 00800000
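
For the earlier step of writing images/usb-flash.img to the USB flash disk on the workstation, here is a minimal sketch of a dd wrapper. The function name write_usb_image and the /dev/sdX placeholder are assumptions for illustration, and bs=4M, conv=fsync and status=progress assume GNU dd. Double-check the target device, because dd overwrites it without asking:

```shell
# Write a raw image to a block device, refusing obviously wrong arguments.
# The function name is hypothetical; conv=fsync and status=progress
# are GNU dd options.
write_usb_image() {
    img=$1
    dev=$2
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    dd if="$img" of="$dev" bs=4M conv=fsync status=progress
}

# Example (replace /dev/sdX with the whole USB disk, not a partition):
# write_usb_image images/usb-flash.img /dev/sdX
```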

The system will boot and flash the eMMC from the USB drive. Once the status LED starts blinking in yellow, data are being transferred to the eMMC. The light changes to solid yellow in later phases of the flashing process. Once everything is done, the status LED shows a solid white light and the system reboots automatically.

Turn off power, remove the USB flash disk, re-jumper the board (0 0 1 1 1), power-cycle, and configure MAC addresses at the U-Boot prompt. The MAC addresses are found on the label on the front panel.

=> setenv eth1addr 00:11:17:01:XX:XX
=> setenv eth2addr 00:11:17:01:XX:YY
=> setenv eth3addr 00:11:17:01:XX:ZZ

Also set up the system type:

Model                                   czechlight variable value
ROADM Line Degree                       sdn-roadm-line-g2
ROADM Flex Add/Drop                     sdn-roadm-add-drop-g2
ROADM Hi-Res Add/Drop                   sdn-roadm-hires-add-drop-g2
ROADM Coherent Add/Drop                 sdn-roadm-coherent-a-d-g2
Inline Amplifier                        sdn-inline-g2
BiDi Amplifier C-Band + 1572nm          sdn-bidi-cplus1572
BiDi Amplifier C-Band + 1572nm with OCM sdn-bidi-cplus1572-ocm

=> setenv czechlight sdn-roadm-line-g2
=> saveenv
Saving Environment to MMC... Writing to redundant MMC(0)... OK
=> boot
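
Because a typo in the czechlight variable is easy to make at the U-Boot prompt, the commands can be generated on the workstation and pasted into the console. This is only a sketch; the function name czechlight_setenv is hypothetical, and the accepted values are copied from the system type list above:

```shell
# Print the U-Boot commands for a given system type, refusing values
# that are not in the system type list of this README.
# The function name is hypothetical.
czechlight_setenv() {
    for known in sdn-roadm-line-g2 sdn-roadm-add-drop-g2 \
                 sdn-roadm-hires-add-drop-g2 sdn-roadm-coherent-a-d-g2 \
                 sdn-inline-g2 sdn-bidi-cplus1572 sdn-bidi-cplus1572-ocm; do
        if [ "$1" = "$known" ]; then
            printf 'setenv czechlight %s\nsaveenv\n' "$1"
            return 0
        fi
    done
    echo "unknown czechlight value: $1" >&2
    return 1
}

# Example:
# czechlight_setenv sdn-inline-g2
```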

Once the system boots (which currently requires a reboot for some unknown reason -- fsck, perhaps?), configure the hostname, plug in the network cable, and update the SW:

# hostnamectl set-hostname line-XYZSERIALNO
# cp /etc/hostname /cfg/etc/
# rauc install http://somewhere.example.org/update.raucb
# reboot

Beaglebone Black

Obtain a reasonable Linux distro image for BBB and flash it to a µSD card. Unlock the eMMC boot partitions (echo 0 > /sys/class/block/mmcblk1boot0/force_ro; echo 0 > /sys/class/block/mmcblk1boot1/force_ro). Clean the eMMC data (blkdiscard /dev/mmcblk1). Flash the content of images/emmc.img to the device's /dev/mmcblk1. Flash what fits into /dev/mmcblk1boot0 and /dev/mmcblk1boot1. Fetching the image over the web (python3 -m http.server and wget http://...:8000/emmc.img -O - | dd of=/dev/mmcblk1 conv=sparse) works well.
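
If a local copy of emmc.img is available on the BBB, the result of flashing can be checked by comparing the start of the device with the image. This is a minimal sketch under several assumptions: the function name verify_flash is hypothetical, cmp -n is the GNU cmp option for limiting the comparison length, and since dd was run with conv=sparse, the check only passes if the discarded regions of the eMMC read back as zeros, which is not guaranteed on every device:

```shell
# Compare the first <image size> bytes of the target with the image file.
# Returns non-zero when they differ. The function name is hypothetical;
# "cmp -n" assumes GNU cmp.
verify_flash() {
    img=$1
    dev=$2
    # The arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$img") ))
    cmp -n "$size" "$img" "$dev"
}

# Example:
# verify_flash emmc.img /dev/mmcblk1
```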