commit | 09fc70181da95c0c78fa55908d4cfd06cae76a31 | [log] [tgz] |
---|---|---|
author | Jan Kundrát <jan.kundrat@cesnet.cz> | Tue Jan 17 00:25:55 2023 +0100 |
committer | Jan Kundrát <jan.kundrat@cesnet.cz> | Tue Jan 17 00:25:55 2023 +0100 |
tree | 5ae18adf251fc77094e68cc5abb635655ff43ab2 | |
parent | 7c16208e52a7f25f691867c71aacd53fef8fb78b [diff] |
tests: wait longer for interface changes This has been failing in the CI every now and then with a message like this: /home/ci/src/cesnet-gerrit-public/CzechLight/velia/tests/sysrepo_system-ietfinterfaces-sudo.cpp:202: FATAL ERROR: REQUIRE( dataFromSysrepoNoStatistics(client, "/ietf-interfaces:interfaces/interface[name='" + IFACE + "']", sysrepo::Datastore::Operational) == expectedIface ) is NOT correct! values: REQUIRE( { "/ietf-ip:ipv4": "", "/ietf-ip:ipv6": "", "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']": "", "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']/ip": "fe80::2:2ff:fe02:202", "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']/prefix-length": "64", "/ietf-ip:ipv6/autoconf": "", "/name": "czechlight0", "/oper-status": "unknown", "/phys-address": "02:02:02:02:02:02", "/statistics": "", "/type": "iana-if-type:ethernetCsmacd", } == { "/ietf-ip:ipv4": "", "/ietf-ip:ipv6": "", "/ietf-ip:ipv6/autoconf": "", "/name": "czechlight0", "/oper-status": "unknown", "/phys-address": "02:02:02:02:02:02", "/statistics": "", "/type": "iana-if-type:ethernetCsmacd", } ) It turns out that this is indeed the main test thread being "too fast" (or rather the one handling the rtnetlink cache acting too slow) because the main test thread asks for the "updated data" before the netlink thread has gotten around to processing everything. There are three threads: - 18643 is the test's main one - 18782 is the one created by Rtnetlink's nlCacheMngrWatcher - 18783 is the one created by sysrepo for on-demand (pull) retrieval of interface statistics and neighbors The main test thread (18643) calls `ip addr flush`, and sleeps for 250ms right after the `ip` exits (which is not directly visible in the logs, but it will affect the next output from thread 18643). The netlink handler thread 18782 starts processing an update about removed addresses immediately: 03.816 [18643 main T] exec: /usr/bin/sudo /sbin/ip -l 0 addr flush dev czechlight0 03.851 [18782 system T] Netlink update on address of link 'czechlight0', action DEL 03.851 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv4/address[ip='192.0.2.1'] 03.854 [18643 main T] /usr/bin/sudo /sbin/ip exited The address flushing thing will result in three address deletions (IPv4, then the IPv6-mapped IPv4, and then the IPv6 link-local based on MAC), but their timestamps indicate that this processing is rather slow: 03.851 [18782 system T] Netlink update on address of link 'czechlight0', action DEL 03.851 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv4/address[ip='192.0.2.1'] ... 04.055 [18782 system T] Netlink update on address of link 'czechlight0', action DEL 04.055 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='::ffff:192.0.2.1'] ... 04.147 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202'] That's unfortunately happening concurrently with the request for data from the test's main thread: 04.105 [18643 main E] dataFrom /ietf-interfaces:interfaces/interface[name='czechlight0'] Handling of this request also involves the statistics&neighbors extra thread (created by sysrepo for the on-demand, pull subscription): 04.119 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/statistics" with ID 5. 04.121 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 5. 04.124 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='czechlight0']/statistics/in-discards -> 0 ... I don't know if this is relevant or not (if this thread locks some libnl mutex, then it has a negative impact on the test, OTOH if it only "takes time", it actually might get the libnl cache thread 18782 more time to process the pending events). The point is, it's still too slow: 04.137 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='lo']/statistics/out-octets -> 0 04.138 [18783 sysrepo I] Successful processing of "operational get" event with ID 5. 04.138 [18643 sysrepo I] Event "operational get" with ID 5 succeeded. 04.143 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/ietf-ip:ipv4/neighbor" with ID 3. 04.144 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 3. 04.145 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth1']/ietf-ip:ipv4/neighbor[ip='172.16.200.1']/link-layer-address -> fa:16:3e:72:a8:30 04.146 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth1']/ietf-ip:ipv4/neighbor[ip='172.16.200.2']/link-layer-address -> fa:16:3e:6f:50:1b 04.147 [18783 sysrepo I] Successful processing of "operational get" event with ID 3. 04.147 [18782 system T] Netlink update on address of link 'czechlight0', action DEL 04.147 [18643 sysrepo I] Event "operational get" with ID 3 succeeded. 04.147 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202'] 04.151 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor" with ID 3. 04.152 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 3. 04.153 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::8ad9:8f00:3275:1e60']/link-layer-address -> 88:d9:8f:75:1e:60 04.154 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::f21c:2d00:32f0:6e00']/link-layer-address -> f0:1c:2d:f0:6e:00 04.154 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::f816:3eff:fe21:6d46']/link-layer-address -> fa:16:3e:21:6d:46 04.155 [18783 sysrepo I] Successful processing of "operational get" event with ID 3. 04.155 [18643 sysrepo I] Event "operational get" with ID 3 succeeded. 04.162 [18782 main T] Processing node update /ietf-routing:routing/ribs/rib[name='ipv4-master']/routes/route[1]/ietf-ipv4-unicast-routing:destination-prefix -> 0.0.0.0/0 So TL;DR: this test failure is indeed due to a too short of a delay between flushing the addresses and asking for the updated state. Change-Id: Ic920ac16288d2dc10c18660e2b0388995673f188
Together with sysrepo, this software provides "general system management" of embedded devices. The target platform is anything that runs Linux with systemd. This runs in production on CzechLight SDN DWDM devices.
This component tracks the overal health state of the system, including various sensors, or the state of systemd
units. As an operator-friendly LED at the front panel of the appliance shows the aggregated health state.
Firmware can be updated via RAUC, and various aspects of the system's configuration can be adjusted. This includes a firewall, basic network settings, and authentication management.
For a full list, consult the yang/
directory in this repository.
ietf-access-control-list
, RFC 8519 (with deviations)ietf-hardware
, RFC 8348ietf-system
, RFC 7317 (partial support)ietf-interfaces
, RFC 8343 (generating config for systemd-networkd
, with extensions)ietf-routing
, RFC 8349 (see above)czechlight-system