My current posts on veth-devices and Linux network namespaces continue a series of posts I wrote in 2017 about virtual networking and veth-devices. The last two posts of the present series
More fun with veth, network namespaces, VLANs – IV – L2-segments, same IP-subnet, ARP and routes
were a kind of excursion:
We studied the effects of routes on ARP-requests in network namespaces where two or more LAN-segments terminated, with all IPs in both segments belonging to one and the same IP-subnet. So far, the segments considered in this series were standard L2-segments, not VLAN-segments.
Although instructive, scenarios with standard L2-segments are a bit academic. However, they gain practical meaning when you replace the virtual L2-segments by virtual VLAN-segments. The configuration of virtual VLAN-scenarios based on Linux veth- and bridge-devices is our eventual objective. In forthcoming posts we will consider virtual VLANs realized with the help of veth-subdevices. Such VLANs follow the IEEE 802.1Q standard for the Link layer.
In the present post I first want to remind you of basic VLAN settings for veth devices. Afterward we will connect two network namespaces by a veth-based VLAN-segment. Each of the namespaces will get exactly one unique IP-address. Both IP-addresses will belong to the same IP-subnet. Even such a very simple configuration comes with a potential ambiguity for packet emission. The Linux kernel therefore must make some decisions.
We will investigate the impact of route definitions in the namespaces on the emission of tagged and untagged ARP-requests and ARP-replies. All in all we will cover 36 different route configurations.
The result for our simple VLAN-configuration will be that routes determine which of the interfaces of a veth peer device is used to send an ARP-request – if the interface is not specified otherwise. Thereby the route decides whether the ARP-request gets a VLAN-tag or not. Route definitions in our setup also decide whether a reply to an arriving ARP-request is created or not. Furthermore, we will see that we may have to deactivate ARP on the veth trunk interface to prevent the exchange of untagged packets. These results must, however, be verified in other more complex configurations.
VLAN related sub-devices of a veth peer device
A veth connection has two endpoints, also called peer devices. I will use the terms “veth endpoint”, “veth peer device” or simply “veth-peer” as synonyms when I refer to one of the two virtual network devices making up a veth connection. When I speak of the “veth device” I refer to the whole connection arrangement, i.e. the connection line and both peer devices.
In an earlier post I already described how to create a VLAN-interface of a veth-endpoint. We take a veth peer device, create a sub-device of type “vlan” and assign a VID to it.
Let us name the two endpoints of an example veth-device veth1 and veth2. Then supplementing veth1 with a virtual VLAN-interface requires a command sequence like
netns1:~ # ip link add link veth1 name veth1.10 type vlan id 10
netns1:~ # ip link set veth1 up
netns1:~ # ip link set veth1.10 up
By repeating these commands for further VLAN IDs [VIDs], veth1 will get multiple distinct interfaces for different VLANs (see the drawing below for two sub-devices). Packets moving from VLAN-related veth sub-devices in the direction of the veth connection line will get a tag corresponding to the VID. The peer device is VLAN-aware in the sense that a VLAN-related sub-device only receives and handles packets which arrived at the peer device with a tag matching its defining VID. A packet with a tag that does not match any of the VLAN sub-devices is dropped.
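The repetition for several VIDs can be sketched as a small shell helper. This is only an illustration: the names `run`, `add_vlan_ifs` and the variable `DRY_RUN` are hypothetical and not part of the original setup, while the `ip link` commands are the ones from the sequence above. With `DRY_RUN=1` the commands are only printed, so the sketch can be tried without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_vlan_ifs PARENT VID...
# Brings the veth peer device up and adds one sub-device of type "vlan" per VID.
add_vlan_ifs() {
    parent=$1
    shift
    run ip link set "$parent" up
    for vid in "$@"; do
        run ip link add link "$parent" name "$parent.$vid" type vlan id "$vid"
        run ip link set "$parent.$vid" up
    done
}
```

Called as `add_vlan_ifs veth1 10 20` by root inside the namespace, this would create the sub-devices veth1.10 and veth1.20.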
Below I will often use the term “VLAN-interface” of a veth-endpoint when I refer to a peer’s sub-device of type “vlan”. The idea behind it is that I abstract from the complexity of the interaction between a sub-device and the main veth peer device. I regard the whole thing as a unit having multiple interfaces. Some interfaces (namely those related to sub-devices of type vlan) trigger a tagging process such that we get tagged packets of Ethertype 802.1Q (0x8100) passing through the veth-endpoint device into the connection line connecting the two veth peer devices.
Addendum, 22.03.2024:
Actually, one should not assume that packets entering a veth-peer’s VLAN-interface from a namespace or packets leaving via a VLAN-interface into a namespace have a tag. The interface just defines a certain “access port” to the veth peer device which leads to tagging at packet ingress. Tagging occurs somewhere during the transfer between sub- and main-device. We need not care about the details. Neither should you assume that a packet leaving a veth-peer’s VLAN-interface to a namespace or a bridge is tagged. We will actually see (with the help of tcpdump) that originally tagged packets which entered a peer device through the veth connection line have their tags stripped off when they pass the VLAN-interface into the surrounding namespace.
The following drawing shows schematically what is going on:
Why is this important? Because it will later make a substantial difference whether we add a veth VLAN-interface to a Linux bridge or whether we add a veth-peer device without any VLAN-interfaces to a Linux bridge.
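The stripping of tags mentioned in the addendum can be checked by capturing ARP traffic once on the trunk of a peer device and once on one of its VLAN-interfaces. The following is a minimal sketch, assuming tcpdump is installed in the namespace; the function names and the `DRY_RUN` variable are hypothetical. The option `-e` prints the link-level header, so an 802.1Q tag becomes visible if it is present, and the filter `arp or (vlan and arp)` matches both untagged and tagged ARP frames.

```shell
#!/bin/sh
# Hypothetical helper: print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# capture_arp IF
# Captures untagged and 802.1Q-tagged ARP frames on the given interface.
# -n: no name resolution, -e: print the link-level header.
capture_arp() {
    run tcpdump -n -e -i "$1" 'arp or (vlan and arp)'
}
```

Running `capture_arp veth1` and `capture_arp veth1.10` in two terminals of the same namespace allows a direct comparison of what arrives at the trunk and what is passed on by the VLAN-interface.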
“Trunk interface” of a veth-endpoint
Even if we have defined VLAN-interfaces of a veth-peer device we can always send a packet directly through the original interface of the peer device. veth1 remains a fully capable (virtual) Ethernet device independent of the fact that we have added some sub-devices. I will call the original interface of the peer device, e.g. veth1, the trunk interface of a veth-endpoint.
The trunk interface will transport tagged packets, which arrived via subdevices, to the other peer device of the veth-connection. But, as we will see, the trunk interface itself is also able to emit and receive untagged packets. This happens when either an arping option or a route specifies this particular interface (see below).
We refer to a general interface of a veth-endpoint (including the trunk) below by the abbreviation IF.
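Selecting the emitting IF explicitly can be sketched with the `-I` option of arping. This assumes the iputils variant of arping; the function names, the `DRY_RUN` variable and the target address 192.168.5.2 used in the usage note are hypothetical illustrations and are not taken from the setup described in this post.

```shell
#!/bin/sh
# Hypothetical helper: print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# arp_probe IF TARGET_IP
# Sends a single ARP request for TARGET_IP through the interface IF.
# -c 1: send one request, -I: interface to use for emission.
arp_probe() {
    run arping -c 1 -I "$1" "$2"
}
```

With `arp_probe veth1 192.168.5.2` the request would be emitted via the trunk interface, with `arp_probe veth1.10 192.168.5.2` via the VLAN-interface for VID 10.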
IP-assignment to veth-subdevices – the standard option and resulting ambiguities
When we have created two sub-devices (with different VIDs) of a veth-endpoint, we have two choices regarding the assignment of IP addresses. I first discuss the standard option, which assigns just one IP-address to all interfaces of a veth peer device:
Option 1: Standard option
This option should, in my opinion, be the preferred way to handle VLANs with veth. By using it we assign an IP-address to the veth peer device. A proper command would be
netns1:~ # ip addr add 192.168.5.1/24 dev veth1
Note that this leads to an automatically created general route to 192.168.5.0/24 via veth1. This route may have to be changed to enforce packet creation via a tagging sub-device.
The important point is: This IP-address will also be used when Ethernet packets with VLAN-tags are created, i.e. when a VLAN-specific sub-interface is selected to address certain destinations.
The veth endpoint supplies all required information to the headers of the Ethernet-packets, to ARP-sections or enclosed IP-packets – according to which of its interfaces was chosen.
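How the automatically created subnet route could be moved from the trunk to a tagging sub-device is sketched below. This is an assumption about one possible way to do it, not necessarily the route configuration used in the experiments of this series; the function names and the `DRY_RUN` variable are hypothetical. `ip route get` reports which interface the kernel would currently select for a given destination.

```shell
#!/bin/sh
# Hypothetical helper: print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_subnet_route SUBNET OLD_IF NEW_IF
# Removes the route to SUBNET via OLD_IF and adds one via NEW_IF instead.
move_subnet_route() {
    subnet=$1
    old_if=$2
    new_if=$3
    run ip route del "$subnet" dev "$old_if"
    run ip route add "$subnet" dev "$new_if"
}

# which_if DEST_IP
# Asks the kernel which route (and thus which interface) it would use.
which_if() {
    run ip route get "$1"
}
```

For example, `move_subnet_route 192.168.5.0/24 veth1 veth1.10` would direct packets for the subnet through the sub-device for VID 10 instead of the trunk interface.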
Ambiguities
Obviously, this option comes with an ambiguity regarding the choice of an interface for packets to remote destinations. The key question is:
Which interface of the veth peer device should be used to create and send packets to a destination characterized by an IP and/or a MAC? The options are: the trunk interface for untagged packets or a VLAN-interface.
This question has to be answered both for ARP-packets and higher level protocols. It has to be answered even in situations with deactivated forwarding in the namespace.
To understand why the Linux kernel has to resolve this ambiguity, the following fact may be helpful:
The MAC addresses of all interfaces of a veth peer device are identical.
The result of the “ip a” command for a veth peer device with three sub-interfaces would typically look like this:
netnsx:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: vethx@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8e:79:71:5e:92:a3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.1/24 brd 192.168.5.255 scope global veth1
       valid_lft forever preferred_lft forever
3: vethx.30@vethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8e:79:71:5e:92:a3 brd ff:ff:ff:ff:ff:ff
4: vethx.40@vethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8e:79:71:5e:92:a3 brd ff:ff:ff:ff:ff:ff
5: vethx.50@vethx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8e:79:71:5e:92:a3 brd ff:ff:ff:ff:ff:ff
netnsx:~ #
A veth-endpoint should therefore really be regarded as one unit with multiple interfaces [IFs] (including the trunk interface). The output of the ip-command above shows that the Linux kernel numbers the interfaces. While an IP/MAC/IF combination is unique, an IP/MAC combination in our configuration is not!
This underlines the necessity of giving the Linux kernel clear information about which interface to use in a communication with some network destination – including routers.
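The statement about identical MAC addresses can be checked with a few lines of shell. The following is a minimal sketch, assuming the one-line output format of `ip -o link show`; the function names `macs_by_if` and `distinct_macs` are hypothetical. Both functions read the command output from standard input and do not need any privileges themselves.

```shell
#!/bin/sh
# macs_by_if: reads `ip -o link show` output on stdin and prints
# "IFNAME MAC" for every interface that has a link/ether address.
macs_by_if() {
    awk '{
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*$/, "", name)    # drop the "@parent" suffix
        for (i = 3; i < NF; i++)
            if ($i == "link/ether") print name, $(i + 1)
    }'
}

# distinct_macs: prints the number of different MAC addresses found.
distinct_macs() {
    macs_by_if | awk '{ print $2 }' | sort -u | awk 'END { print NR }'
}
```

In a namespace that contains only one veth peer device and its sub-devices, `ip -o link show | distinct_macs` should print 1 if all interfaces indeed share the same MAC address.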