commit | 69ecf7e7527c8b392f6334c7a507b6acc77fc965 | |
---|---|---|
author | Tomáš Pecka <tomas.pecka@cesnet.cz> | Tue Dec 12 12:32:03 2023 +0100 |
committer | Jan Kundrát <jan.kundrat@cesnet.cz> | Mon Dec 18 12:38:49 2023 +0100 |
tree | 38fe2581604a66fb811daef9e47db4841ad02b86 | |
parent | 7b4af876dd812a689eb9078b3679371e1fbd5595 | |
Sync velia

It seems one of our boxes started behaving crazy. It reports 2.3kW input power in one of the PSUs while usually the values are around 21W. The value is way beyond what ietf-hardware YANG model can represent and libyang throws a validation error which kills velia and our whole NETCONF stack. Logs attached below (thanks to Jan Kundrát for those).

This brings in the patch that tries to address the issue when trying to write invalid value into sysrepo. It *does not* fix the HW issue but rather it logs an error and signalizes an overflow/underflow in the sensor data.

Some logs:

* sysfs reads of the power_input values:

  Note the values at 14:11:29:

    add-drop-DQ000VOT ~ # while true; do date; cat /sys/class/hwmon/hwmon8/power*_input ; sleep 1; done
    ...
    Mon Dec 11 14:11:27 UTC 2023
    25250000
    21000000
    Mon Dec 11 14:11:28 UTC 2023
    25500000
    21000000
    Mon Dec 11 14:11:29 UTC 2023
    2316000000
    20000000
    Mon Dec 11 14:11:30 UTC 2023
    25250000
    21000000
    ...
    Mon Dec 11 14:12:01 UTC 2023
    25250000
    20000000
    Mon Dec 11 14:12:02 UTC 2023
    0
    20000000
    Mon Dec 11 14:12:03 UTC 2023
    25000000
    20000000
    ...

* and the original velia crash

    Dec 11 13:59:46 add-drop-DQ000VOT veliad-hardware[7997]: terminate called after throwing an instance of 'libyang::ErrorWithCode'
    Dec 11 13:59:46 add-drop-DQ000VOT veliad-hardware[7997]: what(): Couldn't create a node with path '/ietf-hardware:hardware/component[name='ne:psu2:power-in']/sensor-data/value': LY_EVALID
    ...
    Dec 11 13:59:46 add-drop-DQ000VOT main[7997]: Processing node update /ietf-hardware:hardware/component[name='ne:psu2:power-in']/sensor-data/value -> 2316000000
    Dec 11 13:59:47 add-drop-DQ000VOT systemd[1]: velia-hardware-g2.service: Main process exited, code=dumped, status=6/ABRT
    Dec 11 13:59:47 add-drop-DQ000VOT systemd[1]: velia-hardware-g2.service: Failed with result 'core-dump'.

Depends-on: https://gerrit.cesnet.cz/c/CzechLight/velia/+/6703
Change-Id: I07ef810a69842e3e37910ae3a427eaf432e1c00e
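To illustrate why the 2316000000 reading is rejected: the hwmon `power*_input` files report microwatts, and the `sensor-value` type of the ietf-hardware model (RFC 8348) only accepts integers from -1000000000 to 1000000000. The sketch below is not velia's code; it is a hypothetical shell helper (the names `classify_reading` and `check_hwmon` are made up here) showing the kind of range check that tells a representable reading from an overflow or an underflow.

```shell
#!/bin/sh
# Bounds of the ietf-hardware "sensor-value" type (RFC 8348).
SENSOR_MIN=-1000000000
SENSOR_MAX=1000000000

# classify_reading VALUE
# Prints "ok", "overflow" or "underflow" for an integer sensor reading.
# Hypothetical helper for illustration; velia's actual handling may differ.
classify_reading() {
    if [ "$1" -gt "$SENSOR_MAX" ]; then
        echo overflow
    elif [ "$1" -lt "$SENSOR_MIN" ]; then
        echo underflow
    else
        echo ok
    fi
}

# check_hwmon DIR
# Classifies every power*_input file found in one hwmon directory.
check_hwmon() {
    for f in "$1"/power*_input; do
        [ -r "$f" ] || continue
        printf '%s %s\n' "$f" "$(classify_reading "$(cat "$f")")"
    done
}
```

On the affected box, `check_hwmon /sys/class/hwmon/hwmon8` would flag the 14:11:29 sample as an overflow, while the usual readings of about 25000000 and 21000000 classify as ok.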
This repository contains CzechLight-specific bits for Buildroot. Buildroot is a tool which produces system images for flashing to embedded devices. Its documentation is good and explains everything that one might need.
The system architecture is described in another document. This is a quick build HOWTO.
Everything is in Gerrit. One should not need to clone anything from anywhere else. The build will download source tarballs of various open source components, though.
By default, each change of this repo uploaded to Gerrit causes the CI system to produce a firmware update. On Gerrit, the change will get a comment from Zuul with a link to the CI log server. Next to the logs, a file named `artifacts/update.raucb` can be used for updating devices.
Behind the scenes, the system uses Zuul with a configuration tracked in git.
Here's how to reproduce the build on a developer's workstation:
    git clone ssh://$YOUR_LOGIN@cesnet.cz@gerrit.cesnet.cz:29418/CzechLight/br2-external czechlight
    pushd czechlight
    git submodule update --init --recursive
    popd
    mkdir build-clearfog
    cd build-clearfog
    ../czechlight/dev-setup-git.sh
    make czechlight_clearfog_defconfig
    make -j8
A full rebuild takes about half an hour on a modern laptop.
WARNING: Buildroot is fragile. It is not safe to perform incremental builds after changing an "important" setting. Please check their manual for details.
Apart from the traditional way of re-flashing the SD card or the eMMC from scratch, it's also possible to update via RAUC. This method preserves the U-Boot version and U-Boot's environment. Everything else, starting with the kernel and the DTB file and including the root FS, is updated. Configuration stored in `/cfg` is brought along and preserved as well.
To install an update:
    # build node
    make
    rsync -avP images/update.raucb somewhere.example.org:path/to/web/root

    # target, perhaps via a USB console or over SSH
    rauc install http://somewhere.example.org/update.raucb
    reboot
Once the updated FW slot boots, the configuration in `/cfg` will be automatically upgraded ("migrated") to the newest layout. A downgrade to an incompatible OS version might therefore fail during the next reboot. Completely removing all data in the newly updated slot's `cfg` partition will restore functionality, but it is effectively a factory reset.
On a regular Clearfog Base with an eMMC, one has to bootstrap the device first. When recovering a totally bricked board (or one that is fresh from the factory), the initial, sufficiently recent U-Boot has to be uploaded via the console. Ensure that the jumpers are set to `0 1 0 0 1` (the default for eMMC boot is `0 0 1 1 1`), and then use U-Boot's `kwboot` tool:
./host/bin/kwboot -b ./u-boot-spl.kwb -t -p /dev/ttyUSB0
Prepare a USB flash disk with a raw bootable image, `images/usb-flash.img`. Use a tool such as `dd` to overwrite the raw block device; do not copy the image file onto the disk as a regular file. Once in U-Boot, plug in the USB flash disk and execute:
usb start; fatload usb 0:1 00800000 boot.scr; source 00800000
The system will boot and flash the eMMC from the USB drive. Once the status LED starts blinking in yellow, data are being transferred to the eMMC. The light changes to solid yellow in later phases of the flashing process. Once everything is done, the status LED shows a solid white light and the system reboots automatically.
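For the USB preparation step described above, a minimal sketch of writing the raw image with `dd` might look as follows. The function name `write_raw_image` is hypothetical, and the target device (for example `/dev/sdX`) is an assumption that must be checked on the workstation, because the target is overwritten without further questions.

```shell
#!/bin/sh
# write_raw_image IMAGE DEVICE
# Overwrites the whole block device with the raw image.
# Hypothetical helper; double-check DEVICE before running it.
write_raw_image() {
    image=$1
    device=$2
    # Refuse to continue unless the image exists and the target is a block device.
    [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
    [ -b "$device" ] || { echo "not a block device: $device" >&2; return 1; }
    # conv=fsync flushes the data to the device before dd exits.
    dd if="$image" of="$device" bs=4M conv=fsync
}
```

A typical call from the build directory would be `write_raw_image images/usb-flash.img /dev/sdX`, run as root, with `/dev/sdX` replaced by the actual USB flash disk.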
Turn off power, remove the USB flash disk, re-jumper the board (`0 0 1 1 1`), power-cycle, and configure MAC addresses at the U-Boot prompt. The MAC addresses are found on the label at the front panel.
    => setenv eth1addr 00:11:17:01:XX:XX
    => setenv eth2addr 00:11:17:01:XX:YY
    => setenv eth3addr 00:11:17:01:XX:ZZ
Also set up the system type:
Model | czechlight variable value |
---|---|
ROADM Line Degree | sdn-roadm-line-g2 |
WSS Add/Drop | sdn-roadm-add-drop-g2 |
Hi-resolution Add/Drop | sdn-roadm-hires-add-drop-g2 |
Coherent Add/Drop | sdn-roadm-coherent-a-d-g2 |
Inline EDFA Amplifier | sdn-inline-g2 |
Some prototypes have deprecated PCBs (blue). On these, skip the `-g2` suffix. All red PCBs are `-g2`.
    => setenv czechlight sdn-roadm-line-g2
    => saveenv
    Saving Environment to MMC... Writing to redundant MMC(0)... OK
    => boot
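The table and the suffix rule above can be summarized in a small lookup helper. This is only a sketch: the function name `czechlight_value` and the short model keys (`line`, `add-drop`, and so on) are made up for illustration, and the sketch simply applies the "blue PCB means no `-g2` suffix" rule to every model without claiming that a blue variant of each model exists.

```shell
#!/bin/sh
# czechlight_value MODEL PCB_COLOR
# Prints the value for the U-Boot "czechlight" variable.
# MODEL keys are hypothetical shorthands for the rows of the table above.
czechlight_value() {
    case "$1" in
        line)              base=sdn-roadm-line ;;
        add-drop)          base=sdn-roadm-add-drop ;;
        hires-add-drop)    base=sdn-roadm-hires-add-drop ;;
        coherent-add-drop) base=sdn-roadm-coherent-a-d ;;
        inline)            base=sdn-inline ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
    case "$2" in
        red)  echo "${base}-g2" ;;   # all red PCBs are -g2
        blue) echo "$base" ;;        # deprecated prototypes: no suffix
        *) echo "unknown PCB color: $2" >&2; return 1 ;;
    esac
}
```

For example, `czechlight_value line red` prints `sdn-roadm-line-g2`, which is the value used in the `setenv` command above.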
Once the system boots (which currently requires a reboot for some unknown reason -- fsck, perhaps?), configure hostname, plug in the network cable, and update SW:
    # hostnamectl set-hostname line-XYZSERIALNO
    # cp /etc/hostname /cfg/etc/
    # rauc install http://somewhere.example.org/update.raucb
    # reboot
Obtain a reasonable Linux distro image for BBB and flash it to a µSD card. Unlock the eMMC boot partitions (`echo 0 > /sys/class/block/mmcblk1boot0/force_ro; echo 0 > /sys/class/block/mmcblk1boot1/force_ro`). Clean the eMMC data (`blkdiscard /dev/mmcblk1`). Flash the content of `images/emmc.img` to the device's `/dev/mmcblk1`. Flash what fits into `/dev/mmcblk1boot0` and `/dev/mmcblk1boot1`. Fetching the image over the web (`python3 -m http.server` and `wget http://...:8000/emmc.img -O - | dd of=/dev/mmcblk1 conv=sparse`) works well.
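The explicitly listed commands can be collected into one script, sketched below under a few assumptions. The `run` wrapper and the `flash_bbb_emmc` function are hypothetical names; with `DRY_RUN=1` the commands are only printed, which allows reviewing them before anything is erased. The step for the two boot partitions is left out on purpose, because the text above does not say exactly what gets written there.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Executes the command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# flash_bbb_emmc IMAGE_URL
# Hypothetical wrapper around the commands from the text; it wipes /dev/mmcblk1.
flash_bbb_emmc() {
    url=$1
    # Unlock the eMMC boot partitions.
    run sh -c 'echo 0 > /sys/class/block/mmcblk1boot0/force_ro'
    run sh -c 'echo 0 > /sys/class/block/mmcblk1boot1/force_ro'
    # Discard all existing data on the eMMC.
    run blkdiscard /dev/mmcblk1
    # Stream the image from the build machine straight onto the eMMC.
    run sh -c "wget '$url' -O - | dd of=/dev/mmcblk1 conv=sparse"
}
```

On the build machine, `python3 -m http.server` in the `images` directory serves the file; on the target, `DRY_RUN=1 flash_bbb_emmc http://BUILD_HOST:8000/emmc.img` prints the plan first, where `BUILD_HOST` is a placeholder for the build machine's address.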