commit	09fc70181da95c0c78fa55908d4cfd06cae76a31	[log] [tgz]
author	Jan Kundrát <jan.kundrat@cesnet.cz>	Tue Jan 17 00:25:55 2023 +0100
committer	Jan Kundrát <jan.kundrat@cesnet.cz>	Tue Jan 17 00:25:55 2023 +0100
tree	5ae18adf251fc77094e68cc5abb635655ff43ab2
parent	7c16208e52a7f25f691867c71aacd53fef8fb78b [diff]

tests: wait longer for interface changes

This has been failing in the CI every now and then with a message like
this:

 /home/ci/src/cesnet-gerrit-public/CzechLight/velia/tests/sysrepo_system-ietfinterfaces-sudo.cpp:202: FATAL ERROR: REQUIRE( dataFromSysrepoNoStatistics(client, "/ietf-interfaces:interfaces/interface[name='" + IFACE + "']", sysrepo::Datastore::Operational) == expectedIface ) is NOT correct!
   values: REQUIRE( {
   "/ietf-ip:ipv4": "",
   "/ietf-ip:ipv6": "",
   "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']": "",
   "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']/ip": "fe80::2:2ff:fe02:202",
   "/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']/prefix-length": "64",
   "/ietf-ip:ipv6/autoconf": "",
   "/name": "czechlight0",
   "/oper-status": "unknown",
   "/phys-address": "02:02:02:02:02:02",
   "/statistics": "",
   "/type": "iana-if-type:ethernetCsmacd",
 } == {
   "/ietf-ip:ipv4": "",
   "/ietf-ip:ipv6": "",
   "/ietf-ip:ipv6/autoconf": "",
   "/name": "czechlight0",
   "/oper-status": "unknown",
   "/phys-address": "02:02:02:02:02:02",
   "/statistics": "",
   "/type": "iana-if-type:ethernetCsmacd",
 } )

It turns out that this is indeed the main test thread being "too fast"
(or rather the one handling the rtnetlink cache acting too slow) because
the main test thread asks for the "updated data" before the netlink
thread has gotten around to processing everything. There are three threads:

- 18643 is the test's main one
- 18782 is the one created by Rtnetlink's nlCacheMngrWatcher
- 18783 is the one created by sysrepo for on-demand (pull) retrieval of interface statistics and neighbors

The main test thread (18643) calls `ip addr flush`, and sleeps for 250ms
right after the `ip` exits (which is not directly visible in the logs,
but it will affect the next output from thread 18643). The netlink
handler thread 18782 starts processing an update about removed addresses
immediately:

 03.816 [18643 main T] exec: /usr/bin/sudo /sbin/ip -l 0 addr flush dev czechlight0
 03.851 [18782 system T] Netlink update on address of link 'czechlight0', action DEL
 03.851 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv4/address[ip='192.0.2.1']
 03.854 [18643 main T] /usr/bin/sudo /sbin/ip exited

The address flushing thing will result in three address deletions (IPv4,
then the IPv6-mapped IPv4, and then the IPv6 link-local based on MAC),
but their timestamps indicate that this processing is rather slow:

 03.851 [18782 system T] Netlink update on address of link 'czechlight0', action DEL
 03.851 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv4/address[ip='192.0.2.1']
 ...
 04.055 [18782 system T] Netlink update on address of link 'czechlight0', action DEL
 04.055 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='::ffff:192.0.2.1']
 ...
 04.147 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']

That's unfortunately happening concurrently with the request for data
from the test's main thread:

 04.105 [18643 main E] dataFrom /ietf-interfaces:interfaces/interface[name='czechlight0']

Handling of this request also involves the statistics&neighbors extra
thread (created by sysrepo for the on-demand, pull subscription):

 04.119 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/statistics" with ID 5.
 04.121 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 5.
 04.124 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='czechlight0']/statistics/in-discards -> 0
 ...

I don't know if this is relevant or not (if this thread locks some libnl
mutex, then it has a negative impact on the test, OTOH if it only "takes
time", it actually might get the libnl cache thread 18782 more time to
process the pending events). The point is, it's still too slow:

 04.137 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='lo']/statistics/out-octets -> 0
 04.138 [18783 sysrepo I] Successful processing of "operational get" event with ID 5.
 04.138 [18643 sysrepo I] Event "operational get" with ID 5 succeeded.
 04.143 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/ietf-ip:ipv4/neighbor" with ID 3.
 04.144 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 3.
 04.145 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth1']/ietf-ip:ipv4/neighbor[ip='172.16.200.1']/link-layer-address -> fa:16:3e:72:a8:30
 04.146 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth1']/ietf-ip:ipv4/neighbor[ip='172.16.200.2']/link-layer-address -> fa:16:3e:6f:50:1b
 04.147 [18783 sysrepo I] Successful processing of "operational get" event with ID 3.
 04.147 [18782 system T] Netlink update on address of link 'czechlight0', action DEL
 04.147 [18643 sysrepo I] Event "operational get" with ID 3 succeeded.
 04.147 [18782 main T] Processing node deletion /ietf-interfaces:interfaces/interface[name='czechlight0']/ietf-ip:ipv6/address[ip='fe80::2:2ff:fe02:202']
 04.151 [18643 sysrepo I] Published event "operational get" "/ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor" with ID 3.
 04.152 [18783 sysrepo I] Processing "ietf-interfaces" "operational get" event with ID 3.
 04.153 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::8ad9:8f00:3275:1e60']/link-layer-address -> 88:d9:8f:75:1e:60
 04.154 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::f21c:2d00:32f0:6e00']/link-layer-address -> f0:1c:2d:f0:6e:00
 04.154 [18783 main T] Processing node update /ietf-interfaces:interfaces/interface[name='eth0']/ietf-ip:ipv6/neighbor[ip='fe80::f816:3eff:fe21:6d46']/link-layer-address -> fa:16:3e:21:6d:46
 04.155 [18783 sysrepo I] Successful processing of "operational get" event with ID 3.
 04.155 [18643 sysrepo I] Event "operational get" with ID 3 succeeded.
 04.162 [18782 main T] Processing node update /ietf-routing:routing/ribs/rib[name='ipv4-master']/routes/route[1]/ietf-ipv4-unicast-routing:destination-prefix -> 0.0.0.0/0

So TL;DR: this test failure is indeed due to a too short of a delay
between flushing the addresses and asking for the updated state.

Change-Id: Ic920ac16288d2dc10c18660e2b0388995673f188

tests/sysrepo_system-ietfinterfaces-sudo.cpp[diff]

1 file changed

tree: 5ae18adf251fc77094e68cc5abb635655ff43ab2

README.md

YANG System management for embedded devices running Linux

Together with sysrepo, this software provides "general system management" of embedded devices. The target platform is anything that runs Linux with systemd. This runs in production on CzechLight SDN DWDM devices.

Health tracking

This component tracks the overal health state of the system, including various sensors, or the state of systemd units. As an operator-friendly LED at the front panel of the appliance shows the aggregated health state.

System management

Firmware can be updated via RAUC, and various aspects of the system's configuration can be adjusted. This includes a firewall, basic network settings, and authentication management.

Supported YANG models

For a full list, consult the yang/ directory in this repository.

ietf-access-control-list, RFC 8519 (with deviations)
ietf-hardware, RFC 8348
ietf-system, RFC 7317 (partial support)
ietf-interfaces, RFC 8343 (generating config for systemd-networkd, with extensions)
ietf-routing, RFC 8349 (see above)
czechlight-system