Leap/SLES 15.5 – strange compatibility problem between tcpdump, libpcap and arping from iputils

My readers know that I am presently working with virtual networks again. Part of my studies concerns ARP and routes on veth devices with VLAN interfaces. I follow the packet transfer across VLANs with tcpdump, which itself depends on libpcap. On Leap 15 the relevant package is libpcap1. The ARP requests were generated with the arping command, which becomes available after installing the package “iputils”.

This worked perfectly on a laptop. For a presentation, however, I had to use another system with the same kernel version and virtual networking support. There I got strange messages for ARP packets passing a veth endpoint’s main device (in my example: veth2V) on their way to a VLAN interface (“veth2V.30”) on the same veth endpoint:

netns2:~ # tcpdump -n -e -i any -v
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

15:15:36.334692 veth2V B   ifindex 2 46:b9:81:00:00:1e ethertype ARP (0x0806), length 52: Unknown Hardware (36461) (len 0), Unknown Protocol (0x0000) (len 1), Unknown (2048) 
        0x0000:  8e6d 0000 0001 0800 0604 0001 46b9 8b4b  .m..........F..K
        0x0010:  8e6d c0a8 0501 ffff ffff ffff c0a8 0502  .m..............
15:15:36.334692 veth2V.30 B   ifindex 4 46:b9:8b:4b:8e:6d ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has (ff:ff:ff:ff:ff:ff) tell, length 28
15:15:36.334720 veth2V.30 Out ifindex 4 d2:85:39:c8:43:fc ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply is-at d2:85:39:c8:43:fc, length 28
15:15:36.334721 veth2V Out ifindex 2 d2:85:81:00:00:1e ethertype ARP (0x0806), length 52: Unknown Hardware (17404) (len 0), Unknown Protocol (0x0000) (len 1), Unknown (2048) 
        0x0000:  43fc 0000 0001 0800 0604 0002 d285 39c8  C.............9.
        0x0010:  43fc c0a8 0502 46b9 8b4b 8e6d c0a8 0501  C.....F..K.m....

I had never seen such messages in comparable experiments with veths before, in particular not the ones about “Unknown Hardware”. In addition, the reported length of the packets on veth2V was wrong (52 instead of 48 bytes). I did not get such errors on my laptop, where I had prepared the setups of the virtual VLANs.
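The garbled values can be reproduced by hand from the hex dump. The following is a minimal sketch, not code from tcpdump or libpcap: it parses the 32 bytes printed for the first garbled packet as an ARP packet (field layout as in RFC 826), once from offset 0 and once after skipping 4 bytes. The 4-byte offset is an assumption, chosen because the reported length is 52 instead of 48 bytes. The names `parse_arp` and `payload` are made up for this illustration.

```python
import struct

# Bytes of the first garbled packet, copied from the tcpdump hex dump above.
HEX_DUMP = (
    "8e6d 0000 0001 0800 0604 0001 46b9 8b4b "
    "8e6d c0a8 0501 ffff ffff ffff c0a8 0502"
)
payload = bytes.fromhex(HEX_DUMP)


def parse_arp(data):
    """Parse the start of `data` as an ARP packet (layout as in RFC 826)."""
    # Fixed 8-byte ARP header: hardware type, protocol type,
    # hardware address length, protocol address length, operation.
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", data[:8])
    fields = {"htype": htype, "ptype": ptype,
              "hlen": hlen, "plen": plen, "oper": oper}
    # Decode the addresses only for the Ethernet/IPv4 case (6-byte MAC, 4-byte IP).
    if (hlen, plen) == (6, 4) and len(data) >= 28:
        fields["sender_mac"] = ":".join(f"{b:02x}" for b in data[8:14])
        fields["sender_ip"] = ".".join(str(b) for b in data[14:18])
        fields["target_mac"] = ":".join(f"{b:02x}" for b in data[18:24])
        fields["target_ip"] = ".".join(str(b) for b in data[24:28])
    return fields


# Offset 0 is how the bytes were interpreted in the garbled output.
print("offset 0:", parse_arp(payload))
# Offset 4 is the assumed correct start of the ARP packet.
print("offset 4:", parse_arp(payload[4:]))
```

Parsed from offset 0, the fields come out as hardware type 36461, protocol type 0x0000, address lengths 0 and 1, and operation 2048, which are exactly the “Unknown” values in the tcpdump line. Parsed from offset 4, the same bytes form a regular ARP request from 46:b9:8b:4b:8e:6d (192.168.5.1) for 192.168.5.2. Note also that the link-layer addresses printed for veth2V (46:b9:81:00:00:1e and d2:85:81:00:00:1e) contain the bytes 81 00 00 1e, which correspond to the 802.1Q tag type 0x8100 followed by VLAN ID 30. This would fit a VLAN tag ending up at the wrong position in the cooked header, but that interpretation is a guess and has not been checked against the libpcap sources.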

It took me some time to find the difference between the systems: iputils as well as tcpdump on both systems came from the Network:Utilities repository
(https://download.opensuse.org/repositories/network:/utilities/15.5/).

However, libpcap1 on the presentation system came from the main SLES 15.5 OSS repository in version 1.10.1-150400.1.7. On my laptop, in contrast, I had fetched the library from the Network:Utilities repository, too.
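Such a mismatch is easier to spot when version and vendor of the three packages are listed side by side on each system. The following is a minimal sketch under stated assumptions: it calls `rpm -q --queryformat` with the standard tags NAME, VERSION, RELEASE and VENDOR, and it assumes that packages from the two repositories carry different Vendor tags, which I have not verified for these particular packages. The function names are hypothetical helpers for this illustration.

```python
import shutil
import subprocess

PACKAGES = ["libpcap1", "tcpdump", "iputils"]
# One line per package: name|version-release|vendor
QUERY_FORMAT = "%{NAME}|%{VERSION}-%{RELEASE}|%{VENDOR}\n"


def parse_rpm_lines(text):
    """Turn 'name|version-release|vendor' lines into a dict keyed by package name."""
    result = {}
    for line in text.splitlines():
        parts = line.strip().split("|", 2)
        # Lines without three fields (e.g. "package X is not installed") are skipped.
        if len(parts) == 3:
            name, version, vendor = parts
            result[name] = (version, vendor)
    return result


def query_installed(packages=PACKAGES):
    """Ask rpm for version and vendor; returns an empty dict if rpm is not available."""
    if shutil.which("rpm") is None:
        return {}
    proc = subprocess.run(
        ["rpm", "-q", "--queryformat", QUERY_FORMAT, *packages],
        capture_output=True, text=True, check=False,
    )
    return parse_rpm_lines(proc.stdout)


def vendors_differ(info):
    """True if the listed packages do not all share the same Vendor tag."""
    return len({vendor for _, vendor in info.values()}) > 1


if __name__ == "__main__":
    info = query_installed()
    for name, (version, vendor) in sorted(info.items()):
        print(f"{name:10s} {version:28s} {vendor}")
    if vendors_differ(info):
        print("Packages come from different vendors; check their repositories.")
```

Running this on both machines and comparing the output should make a differing libpcap1 version stand out. The Vendor tag is only a proxy for the repository, so zypper remains the authority on where a package was installed from.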

Changing libpcap1 to the current version 1.10.4-lp155.92.1 from the Network:Utilities repository led to correct tcpdump output:

netns2:~ # tcpdump -n -e -i any -v
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

15:38:44.899645 veth2V B   ifindex 2 f2:5b:23:ba:16:8a ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has (ff:ff:ff:ff:ff:ff) tell, length 28
15:38:44.899645 veth2V.30 B   ifindex 4 f2:5b:23:ba:16:8a ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Request who-has (ff:ff:ff:ff:ff:ff) tell, length 28
15:38:44.899667 veth2V.30 Out ifindex 4 9a:88:24:0c:f9:99 ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply is-at 9a:88:24:0c:f9:99, length 28
15:38:44.899669 veth2V Out ifindex 2 9a:88:24:0c:f9:99 ethertype ARP (0x0806), length 48: Ethernet (len 6), IPv4 (len 4), Reply is-at 9a:88:24:0c:f9:99, length 28

Interestingly, the problem also comes up if one switches all three packages (tcpdump, iputils and libpcap1) to the main Leap/SLES 15.5 repository.

So, there seems to be something severely wrong with libpcap1 from the main Leap/SLES 15.5 repositories.


Opensuse Leap 15.5 – installation of CUDA 12.3 for Machine Learning

Working with Machine Learning and Deep Neural Networks requires not only GPU drivers but, in the case of Nvidia GPUs, also an installation of CUDA and cuDNN. This process is always a bit tricky, as additional environment variables have to be set for IPython-based Jupyterlab or classic Jupyter Notebook. On an Opensuse system one must, in addition, take care of the right settings in /etc/alternatives.

I have described the necessary steps in a post at “machine-learning.anracom.com”.

I hope this helps people who want to use Leap 15.5 for Machine Learning with Nvidia GPUs, Keras/Tensorflow 2 and Jupyterlab.

Important addendum 01/27/2024:
Although the combination of CUDA 12.3, cuDNN 8.9.7, Tensorflow 2.15 and Nvidia driver 545.29.06 works for AI models, there is another major problem:
Nvidia’s driver 545.29.06 is buggy, at least on Leap 15.5 with KDE/Plasma and multiple screens. The bug affects Suspend-to-RAM. The suspend phase seems to work, and the system also comes up afterward with the KDE/Plasma interface in a seemingly proper state on your screens.

However, the problems begin when you want to switch to another virtual terminal via Ctrl-Alt-Fx. You wait and wait and wait … The same happens when you change the run-level or systemd target, or when you want to shut the system down. This makes Suspend-to-RAM with driver 545.29.06 impossible to use.

If you have a working older Nvidia driver (e.g. a stable 535 version), do not change to 545.29.06. Unfortunately, it is a mess to return to an older driver version on a multiscreen Leap 15.5 system. The Nvidia community repository does not offer you a choice. (Why, by the way?) Downloading an older proprietary driver from Nvidia and trying to install it afterward on a console terminal (after having stopped X11 or Wayland) did not work in my case: the screens displaying the terminal changed their resolution and froze afterward. So, you may have to uninstall the present driver 545 completely, go back to standard VGA and then try to install an older driver via Nvidia’s install mechanism. As I said: It is a mess …