More fun with veth, network namespaces, VLANs – V – link two L2-segments of the same IP-subnet by a routing network namespace

Posted on 8. March 2024 by eremo

In the last two posts of this series

More fun with veth, network namespaces, VLANs – IV – L2-segments, same IP-subnet, ARP and routes

More fun with veth, network namespaces, VLANs – III – L2-segments of the same IP-subnet and routes in coupling network namespaces

we studied a Linux network namespace with two attached L2-segments. All IPs were members of one and the same IP-subnet. Forwarding and Proxy ARP were deactivated in this namespace.

So far, we have understood that routes have a decisive impact on the choice of the destination segment when ICMP- and ARP-requests are sent from a network namespace with multiple NICs – independent of forwarding being enabled or not. Insufficiently detailed routes can lead to problems and asymmetric arrival of replies from the segments – already on the ARP-level!

The strong impact of routes on ARP-requests in our special scenario surprised at least some readers, but I think the remaining open questions were answered in detail by the experiments discussed in the preceding post. We can now move on with sufficiently solid ground under our feet.

We have also seen that even with detailed routes ARP- and ICMP-traffic paths to and from the L2-segments remain separated in our scenario (see the graphics below). The reason, of course, was that we had deactivated forwarding in the coupling namespace.

In this post we will study what happens when we activate forwarding. We will look at the results of experiments on both the ICMP- and the ARP-level. Our objective is to link our otherwise separate L2-segments (with all their IPs in the same IP-subnet) seamlessly by a forwarding network namespace, and thus form a kind of larger segment. We will also test in what way Proxy ARP can help us to achieve this objective.

Not just fun …

Now, you could argue that no reasonable admin would link two virtual segments with IPs in the same IP-subnet by a routing namespace; one would use a virtual bridge instead. First answer: We perform virtual network experiments here for fun … Second answer: It's not just fun …

Our eventual objective is the setup of virtual VLANs and related security measures. Of particular interest are routing namespaces in which two tagged VLANs terminate and communicate with a third LAN-segment, the latter leading to an Internet connection. The present experiments with standard segments are only a first step in this direction.

When we imagine replacing the standard segments by tagged VLAN segments, we already get the impression that we could use a common namespace for the administration of VLANs without accidentally mixing or transferring ICMP- and ARP-traffic between the VLANs. But the results of the last two posts also gave us a clear warning to distinguish carefully between routing and forwarding in namespaces.

The modified scenario – linking two L2-segments by a forwarding namespace

Let us have a look at a sketch of our scenario first:

We see our segments S1 and S2 again. All IPs are members of 192.168.5.0/24. The segments are attached to a common network namespace netnsR. The difference from the previous scenarios in this post series lies in the activated forwarding and in the definition of detailed routes in netnsR for the NICs with IPs of the same /24 (former C-class) IP-subnet.

Our experiments below will look at the effect of default gateway definitions and at whether detailed routes are required in the L2-segments’ namespaces. In addition, we will test in what way enabling Proxy ARP in netnsR can help to achieve seamless segment coupling in an efficient, centralized way.
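The forwarding and Proxy ARP switches mentioned above are ordinary per-namespace sysctls: net.ipv4.ip_forward and net.ipv4.conf.<device>.proxy_arp. They therefore have to be set inside netnsR itself.

The following is a minimal sketch, not the setup used in this series. It assumes that the two NICs of netnsR facing S1 and S2 are called vethR1 and vethR2; these names and the function name print_netnsR_sysctls are hypothetical. The function only prints the commands, so you can review them before running them.

```shell
#!/usr/bin/env bash
# print_netnsR_sysctls PROXY_ARP [NIC ...]   (hypothetical helper)
# Prints (does not execute) the sysctl commands for a coupling namespace:
# forwarding is always enabled; PROXY_ARP=1 enables, 0 disables Proxy ARP per NIC.
# vethR1/vethR2 are hypothetical default interface names, not taken from the series.
print_netnsR_sysctls() {
  local proxy_arp=${1:-0}
  [ "$#" -gt 0 ] && shift
  local nics=("$@")
  [ ${#nics[@]} -eq 0 ] && nics=(vethR1 vethR2)

  echo "sysctl -w net.ipv4.ip_forward=1"
  local nic
  for nic in "${nics[@]}"; do
    echo "sysctl -w net.ipv4.conf.${nic}.proxy_arp=${proxy_arp}"
  done
}
```

Pipe the output into a root shell running inside netnsR. How you enter that shell depends on how the namespace was created.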


Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – VI

Posted on 28. November 2017 by eremo

I continue my excursion into virtual networking based on network namespaces, veth devices, Linux bridges and virtual VLANs.

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – I [Commands to create and enter (unnamed) network namespaces via shell processes]
Fun with … – II [Suggested experiments for virtual networking between network namespaces/containers]
Fun with … – III [Connecting network namespaces (or containers) by veth devices and virtual Linux bridges]
Fun with … – IV [Virtual VLANs for network namespaces (or containers) and VLAN tagging at Linux bridge ports based on veth (sub-) interfaces]
Fun with … – V [Creation of two virtual VLANs for 2 groups of network namespaces/containers by a Linux bridge]

Although we worked with Linux network namespaces only, the basic setups, commands and rules discussed so far also apply to the network connection of (LXC) containers. Reason: Each container establishes (at least) its own network namespace, and the latter is where the container’s network devices operate. So, at its core, a test of virtual networking between containers means a test of networking between different network namespaces with appropriate (virtual) devices. We do not always require full-fledged containers; often the creation of network namespaces with proper virtual Ethernet devices is sufficient to check the functionality of a virtual network and e.g. packet filter rules for its devices.

Virtual network connectivity (of containers) typically depends on veth devices and virtual bridges/switches. In this post we look at virtual VLANs spanning 2 bridges.

Our achievements so far

We already know the Linux commands required to create and enter simple (unnamed) network namespaces and give them individual hostnames. We connected these namespaces directly with veth devices and with the help of a virtual Linux bridge. But namespaces/containers can also be arranged in groups participating in a separate, isolated network environment: a VLAN. We saw that the core setup of virtual VLANs can be achieved just by configuring virtual Linux bridges appropriately: We define one or multiple VLANs by assigning VIDs/PVIDs to Linux bridge ports. The VLAN is established inside the bridge by controlling packet transport between ports. Packet tagging outside a bridge is not required for the creation of simple coexisting VLANs.

However, the rules governing the corresponding packet tagging at bridge ports depend on the port type. We therefore listed rules both for veth sub-interfaces and for trunk interfaces attached to bridges, and, of course, for incoming and outgoing packets. The tagging rules discussed in post IV allow for different setups of more complex VLANs; sometimes there are several solutions with different advantages and disadvantages.

Our first example in the last post consisted of two virtual VLANs defined by one Linux bridge. Can we extend this simple scenario such that the VLANs span several hosts and/or several bridges on the same host? Putting containers (and their network namespaces) into separate VLANs which integrate several hosts is no academic exercise: Even in small environments we may find situations where containers have to be placed on different hosts with independent HW resources.

Simulating the connection of two hosts

In reality, two hosts, each with its own Linux bridge for network namespaces (or containers), would be connected by real Ethernet cards (possibly with sub-interfaces) and a cable. Each Ethernet card (or its sub-interfaces) would be attached to the local bridge of its host. Veths give us the functionality of 2 Ethernet devices connected by a cable. In addition, one can split each veth interface into sub-interfaces (see the last post!). So we can simulate all kinds of host connections by bridge connections on one and the same host. In our growing virtual test environment (see article 2) we construct the area encircled with the blue dotted line:

Different setups for the connection of two bridges

Actually, there are two different ways to connect two virtual bridges: We can attach VLAN-sensitive sub-interfaces of Ethernet devices to the bridges OR we can use the standard interfaces and build “trunk ports”.

Both variants work; the tagging of the Ethernet packets, however, occurs differently. The different ways of tagging become important in upcoming experiments with hosts belonging to 2 VLANs. (The differences, of course, also affect packet filter rules for the ports.) So, it's instructive to cover both solutions.

Experiment 5.1 – Two virtual VLANs spanning two Linux bridges connected by (veth) Ethernet devices with sub-interfaces

We study the solution based on veth sub-interfaces first. Both virtual bridges shall establish two VLANs: “VLAN 1” (green) and “VLAN 2” (pink). Members of the green VLAN shall be able to communicate with each other, but not with members of the pink VLAN. And vice versa.

To enable such a solution, our veth cable must transport packets tagged differently, namely according to their VLAN origin/destination. The following graphic displays the scenario in more detail:

PVID assignments to ports are indicated by dotted squares, VID assignments by squares with a solid border. Packets are symbolized by diamonds. The border color of a diamond corresponds to the tag color (VLAN ID).

Note that we also indicated some results of our tests of “experiment 4” in the last post:

At Linux bridge ports, which are based on sub-interfaces and which got a PVID assigned, any outside packet tags are irrelevant for the tagging inside the bridge. Inside the bridge a packet gets a tag according to the PVID of the port through which the packet enters the bridge!

If we accept this rule, we should be able to give packets moving through the veth cable tags (VLAN IDs) that differ from the tags used inside the bridges. Actually, we should even be able to use altogether different VIDs/PVIDs inside the second bridge, as long as we separate the namespace groups correctly. But let us start simple …
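The rule can be made concrete with a small helper. It reads the output of “bridge vlan show” and prints the PVID of a given port. By the rule quoted above, that PVID is the tag a frame entering the bridge through this port gets inside the bridge.

This is a sketch under a few assumptions:
- The function name pvid_of is hypothetical.
- The parser assumes the layout printed later in this post: each port entry starts with the port name, and further VIDs of the same port follow on indented lines.
- Other iproute2 versions may format the output differently.

```shell
#!/usr/bin/env bash
# pvid_of PORT   (hypothetical helper)
# Reads `bridge vlan show` text on stdin and prints the PVID configured for PORT.
pvid_of() {
  awk -v port="$1" '
    NR == 1 && $1 == "port" { next }      # header line
    NF == 0 { next }                      # blank or whitespace-only line
    {
      c = substr($0, 1, 1)
      if (c == " " || c == "\t") { vid = $1; first = 2 }      # indented: further VID of same port
      else                       { cur = $1; vid = $2; first = 3 }  # line starting with a port name
      if (cur != port) next
      for (i = first; i <= NF; i++)
        if ($i == "PVID") { print vid; exit }
    }'
}
```

Example: “nsenter -t $pid_netns3 -n bridge vlan show | pvid_of vethx.60” should print 20 once the setup below is complete.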

Creating the network namespaces, Linux bridges and the veth sub-interfaces

The following command list sets up the environment, including two bridges brx (in netns3) and bry (in netns8). Copy the commands to a root shell prompt …

unshare --net --uts /bin/bash &
export pid_netns1=$!
unshare --net --uts /bin/bash &
export pid_netns2=$!
unshare --net --uts /bin/bash &
export pid_netns3=$!
unshare --net --uts /bin/bash &
export pid_netns4=$!
unshare --net --uts /bin/bash &
export pid_netns5=$!
unshare --net --uts /bin/bash &
export pid_netns6=$!
unshare --net --uts /bin/bash &
export pid_netns7=$!
unshare --net --uts /bin/bash &
export pid_netns8=$!

# assign different hostnames  
nsenter -t $pid_netns1 -u hostname netns1
nsenter -t $pid_netns2 -u hostname netns2
nsenter -t $pid_netns3 -u hostname netns3
nsenter -t $pid_netns4 -u hostname netns4
nsenter -t $pid_netns5 -u hostname netns5
nsenter -t $pid_netns6 -u hostname netns6
nsenter -t $pid_netns7 -u hostname netns7
nsenter -t $pid_netns8 -u hostname netns8

# set up veth devices in netns1, netns2, netns4 and netns5 with connection to netns3
ip link add veth11 netns $pid_netns1 type veth peer name veth13 netns $pid_netns3    
ip link add veth22 netns $pid_netns2 type veth peer name veth23 netns $pid_netns3
ip link add veth44 netns $pid_netns4 type veth peer name veth43 netns $pid_netns3
ip link add veth55 netns $pid_netns5 type veth peer name veth53 netns $pid_netns3

# set up veth devices in netns6 and netns7 with connection to netns8
ip link add veth66 netns $pid_netns6 type veth peer name veth68 netns $pid_netns8
ip link add veth77 netns $pid_netns7 type veth peer name veth78 netns $pid_netns8    

# Assign IP addresses and set the devices up 
nsenter -t $pid_netns1 -u -n /bin/bash
ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
ip link set veth11 up
ip link set lo up
exit
nsenter -t $pid_netns2 -u -n /bin/bash
ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
ip link set veth22 up
ip link set lo up
exit
nsenter -t $pid_netns4 -u -n /bin/bash
ip addr add 192.168.5.4/24 brd 192.168.5.255 dev veth44
ip link set veth44 up
ip link set lo up
exit
nsenter -t $pid_netns5 -u -n /bin/bash
ip addr add 192.168.5.5/24 brd 192.168.5.255 dev veth55
ip link set veth55 up
ip link set lo up
exit
nsenter -t $pid_netns6 -u -n /bin/bash
ip addr add 192.168.5.6/24 brd 192.168.5.255 dev veth66
ip link set veth66 up
ip link set lo up
exit
nsenter -t $pid_netns7 -u -n /bin/bash
ip addr add 192.168.5.7/24 brd 192.168.5.255 dev veth77
ip link set veth77 up
ip link set lo up
exit

# set up bridge brx and its ports 
nsenter -t $pid_netns3 -u -n /bin/bash
brctl addbr brx  
ip link set brx up
ip link set veth13 up
ip link set veth23 up
ip link set veth43 up
ip link set veth53 up
brctl addif brx veth13
brctl addif brx veth23
brctl addif brx veth43
brctl addif brx veth53
exit

# set up bridge bry and its ports 
nsenter -t $pid_netns8 -u -n /bin/bash
brctl addbr bry  
ip link set bry up
ip link set veth68 up
ip link set veth78 up
brctl addif bry veth68
brctl addif bry veth78
exit
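The interactive pattern above (enter a shell with nsenter, type the commands, exit) works well for copy and paste. In a script, the same endpoint configuration can be done non-interactively, because nsenter also accepts a command to run directly.

This minimal sketch assumes the exported pid_netnsN variables from above. The helper names run and setup_endpoint and the DRY_RUN switch are hypothetical additions. The sketch also relies on the naming scheme used above: namespace netnsN has the device vethNN with address 192.168.5.N.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1, execute it otherwise (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_endpoint N: configure netnsN like the interactive blocks above
# (vethNN gets 192.168.5.N/24; vethNN and lo are set up). Needs $pid_netnsN.
setup_endpoint() {
  local n=$1
  local pid_var="pid_netns$n"
  local pid=${!pid_var}          # indirect expansion: value of $pid_netnsN
  local dev="veth$n$n"
  run nsenter -t "$pid" -u -n ip addr add "192.168.5.$n/24" brd 192.168.5.255 dev "$dev"
  run nsenter -t "$pid" -u -n ip link set "$dev" up
  run nsenter -t "$pid" -u -n ip link set lo up
}
```

As root, “for n in 1 2 4 5 6 7; do setup_endpoint "$n"; done” would replace the six interactive blocks. Prefixing the call with DRY_RUN=1 lets you review the generated commands first.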

Set up the VLANs

The following commands configure the VLANs by assigning PVIDs/VIDs to the bridge ports (see the last 2 posts for more information):

# set up 2 VLANs on each bridge 
nsenter -t $pid_netns3 -u -n /bin/bash
ip link set dev brx type bridge vlan_filtering 1   
bridge vlan add vid 10 pvid untagged dev veth13
bridge vlan add vid 10 pvid untagged dev veth23
bridge vlan add vid 20 pvid untagged dev veth43
bridge vlan add vid 20 pvid untagged dev veth53
bridge vlan del vid 1 dev brx self
bridge vlan del vid 1 dev veth13
bridge vlan del vid 1 dev veth23
bridge vlan del vid 1 dev veth43
bridge vlan del vid 1 dev veth53
bridge vlan show
exit
nsenter -t $pid_netns8 -u -n /bin/bash
ip link set dev bry type bridge vlan_filtering 1   
bridge vlan add vid 10 pvid untagged dev veth68
bridge vlan add vid 20 pvid untagged dev veth78
bridge vlan del vid 1 dev bry self
bridge vlan del vid 1 dev veth68
bridge vlan del vid 1 dev veth78
bridge vlan show
exit

We have a whole bunch of network namespaces now. Use “lsns” to get an overview. See the first 2 articles of the series if you need an explanation of the commands used above and additional commands to get more information about the created namespaces and processes.

Note that we used VID 10, PVID 10 on the bridge ports to establish VLAN1 (green) and VID 20, PVID 20 to establish VLAN2 (pink). Note in addition that NO VLAN tagging is required outside the bridges; hence the flag “untagged”, which makes Ethernet packets leave the bridges untagged. Consistently, no sub-interfaces have been defined in the network namespaces netns1, 2, 4, 5, 6 and 7. Note also that we removed the default PVID/VID = 1 from the ports.

The bridges are not connected yet. Therefore, our next step is to create a connecting veth device with VLAN sub-interfaces and to attach the sub-interfaces to the bridges:

# Create a veth device to connect the two bridges 
ip link add vethx netns $pid_netns3 type veth peer name vethy netns $pid_netns8    
nsenter -t $pid_netns3 -u -n /bin/bash
ip link add link vethx name vethx.50 type vlan id 50   
ip link add link vethx name vethx.60 type vlan id 60
brctl addif brx vethx.50
brctl addif brx vethx.60
ip link set vethx up
ip link set vethx.50 up
ip link set vethx.60 up
bridge vlan add vid 10 pvid untagged dev vethx.50
bridge vlan add vid 20 pvid untagged dev vethx.60
bridge vlan del vid 1 dev vethx.50
bridge vlan del vid 1 dev vethx.60
bridge vlan show
exit

nsenter -t $pid_netns8 -u -n /bin/bash
ip link add link vethy name vethy.50 type vlan id 50
ip link add link vethy name vethy.60 type vlan id 60
brctl addif bry vethy.50
brctl addif bry vethy.60
ip link set vethy up
ip link set vethy.50 up
ip link set vethy.60 up
bridge vlan add vid 10 pvid untagged dev vethy.50
bridge vlan add vid 20 pvid untagged dev vethy.60
bridge vlan del vid 1 dev vethy.50
bridge vlan del vid 1 dev vethy.60
bridge vlan show
exit
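The steps for vethx and vethy follow the same pattern for each VLAN:
1. Create the VLAN sub-interface.
2. Attach it to the bridge and bring it up.
3. Assign the bridge-internal VID/PVID with “untagged”.
4. Remove the default VID 1.

A small generator can print this command list for review. This is a sketch only; the function print_subif_cmds and its “outer:inner” argument format are hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# print_subif_cmds PARENT BRIDGE OUTER_ID:INNER_VID ...   (hypothetical helper)
# Prints the commands attaching VLAN sub-interfaces PARENT.OUTER_ID to BRIDGE,
# each mapped to the bridge-internal VLAN INNER_VID (PVID, egress untagged).
print_subif_cmds() {
  local parent=$1 bridge=$2
  shift 2
  echo "ip link set $parent up"
  local pair outer inner sub
  for pair in "$@"; do
    outer=${pair%%:*}            # VLAN ID used on the veth "cable"
    inner=${pair##*:}            # VID/PVID used inside the bridge
    sub="$parent.$outer"
    echo "ip link add link $parent name $sub type vlan id $outer"
    echo "brctl addif $bridge $sub"
    echo "ip link set $sub up"
    echo "bridge vlan add vid $inner pvid untagged dev $sub"
    echo "bridge vlan del vid 1 dev $sub"
  done
}
```

For netns3, “print_subif_cmds vethx brx 50:10 60:20 | nsenter -t $pid_netns3 -u -n /bin/bash” should reproduce the listing above. For netns8, use “vethy bry” and $pid_netns8 instead.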

Note that we have used VLAN IDs 50 and 60 outside the bridges! Note also the VID/PVID settings and the flag “untagged” at our bridge ports vethx.50, vethx.60, vethy.50 and vethy.60: The bridge-internal tags of outgoing packets are removed first; afterwards the veth sub-interfaces automatically re-tag the outgoing packets with VLAN IDs 50 or 60, respectively.

However, we have kept the tagging histories consistent for packets propagating between the bridges along the vethx/vethy line:

“10=>50=>10”

and

“20=>60=>20”

So, Ethernet packets nowhere cross the borders of our separated VLANs – if our theory works correctly.
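The two tag histories can be modeled with a lookup table. This sketch only encodes the configuration chosen above and does not inspect real traffic; the array and function names are hypothetical. It prints the tag sequence a packet gets on its way from brx over the veth cable into bry. Because vethx.50/vethy.50 and vethx.60/vethy.60 carry identical PVIDs, the reverse direction looks the same.

```shell
#!/usr/bin/env bash
# Model of the tagging configured above (bash 4+ for associative arrays).
# Bridge-internal VID -> VLAN ID on the cable (sub-interfaces vethx.50 / vethx.60)
declare -A cable_id=( [10]=50 [20]=60 )
# Cable VLAN ID -> VID assigned inside bry by the PVID of vethy.50 / vethy.60
declare -A pvid_at_bry=( [50]=10 [60]=20 )

# tag_history VID: print the tag sequence brx -> cable -> bry
tag_history() {
  local inner=$1
  local outer=${cable_id[$inner]}
  [ -n "$outer" ] || { echo "no mapping for VID $inner" >&2; return 1; }
  echo "${inner}=>${outer}=>${pvid_at_bry[$outer]}"
}
```

“tag_history 10” prints 10=>50=>10 and “tag_history 20” prints 20=>60=>20. No entry maps a green VID to a pink one.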

Routing? 2 or 4 VLANs?

Routes for 192.168.5.0/24 were set up automatically in the network namespaces netns1, 2, 4, 5, 6 and 7 when the IP addresses were assigned. You may check this by entering the namespaces with a shell (nsenter command) and using the command “route” (or “ip route”).

Note that we have chosen all IP addresses to be in the same IP-subnet (192.168.5.0/24). All our virtual devices work on the link layer (layers 1 and 2 of the OSI model). Further IP routing across the bridges is therefore not required. Instead, the association of IP addresses and MAC addresses across the bridges is managed by the ARP protocol, with ARP broadcasts confined to the VLAN in which they are sent.
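Why no router is needed can be checked with a little arithmetic. Two hosts can talk directly on a segment when their addresses agree in all network bits. This minimal bash sketch converts dotted quads to integers and compares the masked values; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# same_subnet IP1 IP2 PREFIXLEN: exit status 0 if both IPs share the network part
same_subnet() {
  local prefix=$3
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local a b
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  (( (a & mask) == (b & mask) ))
}
```

With the /24 mask 255.255.255.0, 192.168.5.1 and 192.168.5.7 share the network part, so no router is involved. An address such as 192.168.6.7 would not share it.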

Our network namespaces should be able to get into contact – as long as they belong to the “same” VLAN.

Note: Each bridge sets up its own 2 VLANs; so, actually, we have built 4 VLANs! But the bridges are connected in such a way that packet transport works across these 4 VLANs as if they were only two VLANs spanning both bridges.
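Combining the PVIDs of the ports of brx and bry gives the effective membership:
- green VLAN (VID 10): netns1, netns2 and netns6
- pink VLAN (VID 20): netns4, netns5 and netns7

A small lookup can then predict which pings should succeed. It is a hypothetical helper derived from the configuration above, not a test tool.

```shell
#!/usr/bin/env bash
# Effective VLAN of each endpoint namespace, taken from the PVID of its bridge port
# (VID 10 = green, VID 20 = pink; bash 4+ for associative arrays).
declare -A vlan_of=( [netns1]=10 [netns2]=10 [netns6]=10
                     [netns4]=20 [netns5]=20 [netns7]=20 )

# expect_ping SRC DST: print the expected outcome of a ping from SRC to DST
expect_ping() {
  local src=${vlan_of[$1]} dst=${vlan_of[$2]}
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "unknown"
  elif [ "$src" = "$dst" ]; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

For example, “expect_ping netns7 netns5” prints reachable, while “expect_ping netns7 netns1” prints unreachable.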

Tests

We first test whether netns7 can communicate with e.g. netns5, which it should. On the other hand, netns7 should not be able to ping e.g. netns1. It is instructive to open several terminal windows from our original terminal (on KDE e.g. by “konsole &>/dev/null &”, so that the new windows inherit the exported pid_netns variables) and to enter different namespaces there to get an impression of what happens.

mytux:~ # nsenter -t $pid_netns7 -u -n /bin/bash
netns7:~ # ping 192.168.5.1 -c2
PING 192.168.5.1 (192.168.5.1) 56(84) bytes of data.
From 192.168.5.7 icmp_seq=1 Destination Host Unreachable
From 192.168.5.7 icmp_seq=2 Destination Host Unreachable

--- 192.168.5.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1008ms   
pipe 2
netns7:~ # ping 192.168.5.5 -c2
PING 192.168.5.5 (192.168.5.5) 56(84) bytes of data.
64 bytes from 192.168.5.5: icmp_seq=1 ttl=64 time=0.170 ms
64 bytes from 192.168.5.5: icmp_seq=2 ttl=64 time=0.087 ms

--- 192.168.5.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.087/0.128/0.170/0.043 ms
netns7:~ #

And at the same time inside bry in netns8:

mytux:~ # nsenter -t $pid_netns8 -u -n /bin/bash
netns8:~ # tcpdump -n -i bry  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bry, link-type EN10MB (Ethernet), capture size 262144 bytes
14:38:48.780367 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
14:38:49.778559 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28
14:38:50.778574 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28    
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
netns8:~ # tcpdump -n -i bry  host 192.168.5.5 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bry, link-type EN10MB (Ethernet), capture size 262144 bytes
14:39:30.045117 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.5 tell 192.168.5.7, length 28
14:39:30.045184 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Reply 192.168.5.5 is-at 2e:75:26:04:a9:70, length 28
14:39:30.045193 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.5.7 > 192.168.5.5: ICMP echo request, id 21633, seq 1, length 64    
14:39:30.045247 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.5.5 > 192.168.5.7: ICMP echo reply, id 21633, seq 1, length 64   
14:39:31.044106 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.5.7 > 192.168.5.5: ICMP echo request, id 21633, seq 2, length 64   
14:39:31.044165 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.5.5 > 192.168.5.7: ICMP echo reply, id 21633, seq 2, length 64  
14:39:35.058576 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.7 tell 192.168.5.5, length 28
14:39:35.058587 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Reply 192.168.5.7 is-at 8a:1e:62:e8:f3:c3, length 28
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
netns8:~ #

And in parallel at vethx in netns3:

mytux:~ # nsenter -t $pid_netns3 -u -n /bin/bash
netns3:~ # tcpdump -n -i vethx  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethx, link-type EN10MB (Ethernet), capture size 262144 bytes
14:38:48.780381 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28
14:38:49.778582 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28
14:38:50.778594 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
netns3:~ # tcpdump -n -i vethx  host 192.168.5.5 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethx, link-type EN10MB (Ethernet), capture size 262144 bytes
14:39:30.045131 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Request who-has 192.168.5.5 tell 192.168.5.7, length 28
14:39:30.045182 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Reply 192.168.5.5 is-at 2e:75:26:04:a9:70, length 28
14:39:30.045210 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 102: vlan 60, p 0, ethertype IPv4, 192.168.5.7 > 192.168.5.5: ICMP echo request, id 21633, seq 1, length 64   
14:39:30.045246 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 102: vlan 60, p 0, ethertype IPv4, 192.168.5.5 > 192.168.5.7: ICMP echo reply, id 21633, seq 1, length 64
14:39:31.044123 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 102: vlan 60, p 0, ethertype IPv4, 192.168.5.7 > 192.168.5.5: ICMP echo request, id 21633, seq 2, length 64    
14:39:31.044163 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 102: vlan 60, p 0, ethertype IPv4, 192.168.5.5 > 192.168.5.7: ICMP echo reply, id 21633, seq 2, length 64   
14:39:35.058573 2e:75:26:04:a9:70 > 8a:1e:62:e8:f3:c3, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Request who-has 192.168.5.7 tell 192.168.5.5, length 28
14:39:35.058589 8a:1e:62:e8:f3:c3 > 2e:75:26:04:a9:70, ethertype 802.1Q (0x8100), length 46: vlan 60, p 0, ethertype ARP, Reply 192.168.5.7 is-at 8a:1e:62:e8:f3:c3, length 28
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
netns3:~ #

How does netns7 see the world afterwards?

netns7:~ # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: veth77@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000      
    link/ether 8a:1e:62:e8:f3:c3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.7/24 brd 192.168.5.255 scope global veth77
       valid_lft forever preferred_lft forever
    inet6 fe80::881e:62ff:fee8:f3c3/64 scope link 
       valid_lft forever preferred_lft forever
netns7:~ # arp -a
? (192.168.5.1) at <incomplete> on veth77
? (192.168.5.5) at 2e:75:26:04:a9:70 [ether] on veth77    
netns7:~ #

We have a mirrored situation on netns6 with respect to netns1 and netns5. netns6 can reach netns1, but not netns5.

These results confirm what we have claimed:

We have a separation of the VLANs across the bridges.
Inside the bridges only the ports’ PVID-settings determine the VLAN tag (here 20) of incoming packets.
Along the veth “cable” we have a completely different tag (here 60 for packets which originally got tag 20 inside bry).
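The tags in the captures above can be counted with standard text tools. The helper name vlan_tags is hypothetical. It collects the “vlan N” fields that tcpdump prints with -e for 802.1Q frames and counts how often each VLAN ID occurs.

```shell
#!/usr/bin/env bash
# vlan_tags: read `tcpdump -e` output on stdin, print "count vlan-id" per VLAN ID
vlan_tags() {
  grep -oE 'vlan [0-9]+' \
    | awk '{ n[$2]++ } END { for (v in n) print n[v], v }' \
    | sort -k2,2n
}
```

A bounded capture avoids interrupting the pipeline with Ctrl-C, for example:

    tcpdump -n -e -l -c 8 -i vethx host 192.168.5.5 | vlan_tags

For the ping test above, the captures show only VLAN 60 on vethx and only VLAN 20 inside bry.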

Let us cross check for netns2:

mytux:~ # nsenter -t $pid_netns2 -u -n /bin/bash
netns2:~ # ping 192.168.5.7 -c2
PING 192.168.5.7 (192.168.5.7) 56(84) bytes of data.
From 192.168.5.2 icmp_seq=1 Destination Host Unreachable
From 192.168.5.2 icmp_seq=2 Destination Host Unreachable

--- 192.168.5.7 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms
pipe 2
netns2:~ # ping 192.168.5.6 -c2
PING 192.168.5.6 (192.168.5.6) 56(84) bytes of data.
64 bytes from 192.168.5.6: icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from 192.168.5.6: icmp_seq=2 ttl=64 time=0.092 ms

--- 192.168.5.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.092/0.123/0.154/0.031 ms
netns2:~ #

And how do the bridges see the world?

In netns8 and netns3 we have a closer look at the bridges:

netns8:~ # ip a s
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth68: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bry state UP group default qlen 1000
    link/ether 0a:5b:60:31:7a:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::85b:60ff:fe31:7abd/64 scope link 
       valid_lft forever preferred_lft forever
3: veth78@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bry state UP group default qlen 1000    
    link/ether 3e:f3:4b:26:02:46 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::3cf3:4bff:fe26:246/64 scope link 
       valid_lft forever preferred_lft forever
4: bry: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:5b:60:31:7a:bd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::30a5:8dff:fe54:987e/64 scope link 
       valid_lft forever preferred_lft forever
5: vethy@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7a:86:31:14:57:2a brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::7886:31ff:fe14:572a/64 scope link 
       valid_lft forever preferred_lft forever
6: vethy.50@vethy: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bry state UP group default qlen 1000   
    link/ether 7a:86:31:14:57:2a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7886:31ff:fe14:572a/64 scope link 
       valid_lft forever preferred_lft forever
7: vethy.60@vethy: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bry state UP group default qlen 1000  
    link/ether 7a:86:31:14:57:2a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7886:31ff:fe14:572a/64 scope link 
       valid_lft forever preferred_lft forever
netns8:~ # bridge vlan show
port    vlan ids
veth68   10 PVID Egress Untagged

veth78   20 PVID Egress Untagged

bry     None
vethy.50 10 PVID Egress Untagged

vethy.60 20 PVID Egress Untagged
netns8:~ # brctl showmacs bry 
port no mac addr                is local?       ageing timer   
  1     0a:5b:60:31:7a:bd       yes                0.00
  1     0a:5b:60:31:7a:bd       yes                0.00
  4     2e:75:26:04:a9:70       no                 3.62
  2     3e:f3:4b:26:02:46       yes                0.00
  2     3e:f3:4b:26:02:46       yes                0.00
  4     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  2     8a:1e:62:e8:f3:c3       no                 3.62

netns3:~ # ip a s
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000
    link/ether 52:9b:43:56:37:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::509b:43ff:fe56:37df/64 scope link 
       valid_lft forever preferred_lft forever
3: veth23@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether 06:81:88:12:5d:dc brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::481:88ff:fe12:5ddc/64 scope link 
       valid_lft forever preferred_lft forever
4: veth43@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether 56:d6:b2:80:9a:de brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::54d6:b2ff:fe80:9ade/64 scope link 
       valid_lft forever preferred_lft forever
5: veth53@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether 12:58:a6:73:6c:6e brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::1058:a6ff:fe73:6c6e/64 scope link 
       valid_lft forever preferred_lft forever
6: brx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:81:88:12:5d:dc brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8447:28ff:fe22:7a90/64 scope link 
       valid_lft forever preferred_lft forever
7: vethx@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b6:e9:ef:3d:1c:b7 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::b4e9:efff:fe3d:1cb7/64 scope link 
       valid_lft forever preferred_lft forever
8: vethx.50@vethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000
    link/ether b6:e9:ef:3d:1c:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b4e9:efff:fe3d:1cb7/64 scope link 
       valid_lft forever preferred_lft forever
9: vethx.60@vethx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether b6:e9:ef:3d:1c:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b4e9:efff:fe3d:1cb7/64 scope link 
       valid_lft forever preferred_lft forever
netns3:~ # bridge vlan show
port    vlan ids
veth13   10 PVID Egress Untagged

veth23   10 PVID Egress Untagged

veth43   20 PVID Egress Untagged

veth53   20 PVID Egress Untagged

brx     None
vethx.50 10 PVID Egress Untagged

vethx.60 20 PVID Egress Untagged
netns3:~ # brctl showmacs brx
port no mac addr                is local?       ageing timer  
  2     06:81:88:12:5d:dc       yes                0.00
  2     06:81:88:12:5d:dc       yes                0.00
  4     12:58:a6:73:6c:6e       yes                0.00
  4     12:58:a6:73:6c:6e       yes                0.00
  4     2e:75:26:04:a9:70       no                 3.49
  1     52:9b:43:56:37:df       yes                0.00
  1     52:9b:43:56:37:df       yes                0.00
  3     56:d6:b2:80:9a:de       yes                0.00
  3     56:d6:b2:80:9a:de       yes                0.00
  6     8a:1e:62:e8:f3:c3       no                 3.49
  5     b6:e9:ef:3d:1c:b7       yes                0.00
  6     b6:e9:ef:3d:1c:b7       yes                0.00
  5     b6:e9:ef:3d:1c:b7       yes                0.00
  5     b6:e9:ef:3d:1c:b7       yes                0.00

And:

netns8:~ # brctl showmacs bry
port no mac addr                is local?       ageing timer    
  1     0a:5b:60:31:7a:bd       yes                0.00
  1     0a:5b:60:31:7a:bd       yes                0.00
  4     2e:75:26:04:a9:70       no                 7.37
  2     3e:f3:4b:26:02:46       yes                0.00
  2     3e:f3:4b:26:02:46       yes                0.00
  4     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  3     7a:86:31:14:57:2a       yes                0.00
  2     8a:1e:62:e8:f3:c3       no                 7.37
  3     96:e8:d1:2c:b8:ad       no                 3.84
  1     ce:48:c6:8c:ee:1a       no                 3.84
netns8:~ #

netns3:~ # brctl showmacs brx
port no mac addr                is local?       ageing timer   
  2     06:81:88:12:5d:dc       yes                0.00
  2     06:81:88:12:5d:dc       yes                0.00
  4     12:58:a6:73:6c:6e       yes                0.00
  4     12:58:a6:73:6c:6e       yes                0.00
  4     2e:75:26:04:a9:70       no                12.48
  1     52:9b:43:56:37:df       yes                0.00
  1     52:9b:43:56:37:df       yes                0.00
  3     56:d6:b2:80:9a:de       yes                0.00
  3     56:d6:b2:80:9a:de       yes                0.00
  6     8a:1e:62:e8:f3:c3       no                12.48
  2     96:e8:d1:2c:b8:ad       no                 8.94
  5     b6:e9:ef:3d:1c:b7       yes                0.00
  6     b6:e9:ef:3d:1c:b7       yes                0.00
  5     b6:e9:ef:3d:1c:b7       yes                0.00
  5     b6:e9:ef:3d:1c:b7       yes                0.00
  5     ce:48:c6:8c:ee:1a       no                 8.94
netns3:~ #

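Obviously, our bridges learn during pings: compare the two outputs for brx, where the non-local entries 96:e8:d1:2c:b8:ad and ce:48:c6:8c:ee:1a have been added.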
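To see at a glance which entries a bridge has learned (as opposed to the local addresses of its own ports), one can filter the output of brctl showmacs. The following is a minimal sketch; the function name learned_macs is hypothetical, and it assumes the four-column layout shown above (port no, mac addr, is local?, ageing timer):

```shell
# learned_macs: hypothetical helper (not part of bridge-utils). It reads the
# output of "brctl showmacs <bridge>" on stdin and prints "port mac" for every
# entry the bridge has learned dynamically ("is local?" column = "no").
learned_macs() {
    # NR > 1 skips the header line; $1 = port no, $2 = mac, $3 = is local?
    awk 'NR > 1 && $3 == "no" { print $1, $2 }'
}
```

Inside netns3 you would call it as "brctl showmacs brx | learned_macs"; running it before and after a ping shows which MACs were added.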

Checking the independence of the VLAN definitions on bry

Just for fun: Let us change the PVID/VID setting on bry:

# Changing PVID/VID in bry 
nsenter -t $pid_netns8 -u -n /bin/bash
bridge vlan add vid 36 pvid untagged dev veth68
bridge vlan add vid 46 pvid untagged dev veth78
bridge vlan add vid 36 pvid untagged dev vethy.50   
bridge vlan add vid 46 pvid untagged dev vethy.60   
bridge vlan del vid 10 dev vethy.50
bridge vlan del vid 10 dev veth68
bridge vlan del vid 20 dev vethy.60
bridge vlan del vid 20 dev veth78
bridge vlan show
exit

This leads to:

netns8:~ # bridge vlan show
port    vlan ids
veth68   36 PVID Egress Untagged

veth78   46 PVID Egress Untagged

bry     None
vethy.50         36 PVID Egress Untagged    

vethy.60         46 Egress Untagged

But still:

netns2:~ # ping 192.168.5.7 -c2
PING 192.168.5.7 (192.168.5.7) 56(84) bytes of data.
From 192.168.5.2 icmp_seq=1 Destination Host Unreachable   
From 192.168.5.2 icmp_seq=2 Destination Host Unreachable

--- 192.168.5.7 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1009ms    
pipe 2
netns2:~ # ping 192.168.5.6 -c2
PING 192.168.5.6 (192.168.5.6) 56(84) bytes of data.
64 bytes from 192.168.5.6: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from 192.168.5.6: icmp_seq=2 ttl=64 time=0.094 ms

--- 192.168.5.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms   
rtt min/avg/max/mdev = 0.094/0.107/0.120/0.013 ms
netns2:~ #

Experiment 5.2 – Two virtual VLANs spanning two Linux bridges connected by a veth based trunk line between trunk ports

Now let us look at another way of connecting the bridges. This time we use a real trunk connection without sub-interfaces. We then have to attach vethx directly to brx and vethy directly to bry. The respective ports must not get any PVID, and their VLAN memberships must be tagged (i.e. the flag “untagged” must not be set). And compared to the last settings in bry, we have to go back to the PVID/VID values 10 and 20.

Our new connection model is displayed in the following graphics:

We need to change the present bridge and bridge port definitions accordingly. The commands, which you can enter at the prompt of your original terminal window, are given below:

# Change vethx to trunk like interface in brx   
nsenter -t $pid_netns3 -u -n /bin/bash
brctl delif brx vethx.50
brctl delif brx vethx.60
ip link del dev vethx.50
ip link del dev vethx.60
brctl addif brx vethx  
bridge vlan add vid 10 tagged dev vethx   
bridge vlan add vid 20 tagged dev vethx
bridge vlan del vid 1 dev vethx
bridge vlan show
exit

And

# Change vethy to trunk like interface in bry
nsenter -t $pid_netns8 -u -n /bin/bash
brctl delif bry vethy.50
brctl delif bry vethy.60
ip link del dev vethy.50
ip link del dev vethy.60
brctl addif bry vethy
bridge vlan add vid 10 tagged dev vethy
bridge vlan add vid 20 tagged dev vethy
bridge vlan del vid 1 dev vethy
bridge vlan add vid 10 pvid untagged dev veth68  
bridge vlan add vid 20 pvid untagged dev veth78  
bridge vlan del vid 36 dev veth68
bridge vlan del vid 46 dev veth78
bridge vlan show
exit

We get the following bridge/VLAN configurations:

netns8:~ # bridge vlan show
port    vlan ids
veth68   10 PVID Egress Untagged   

veth78   20 PVID Egress Untagged

bry     None
vethy    10
         20

and

netns3:~ # bridge vlan show
port    vlan ids
veth13   10 PVID Egress Untagged    

veth23   10 PVID Egress Untagged

veth43   20 PVID Egress Untagged

veth53   20 PVID Egress Untagged

brx     None
vethx    10
         20
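The two VLAN tables are easier to compare when each VLAN membership appears on its own line together with its port name. Here is a small sketch for this; vlan_table is a hypothetical helper, and it assumes the layout printed above, where indented continuation lines (like “20” below “vethy 10”) belong to the port named in the preceding line:

```shell
# vlan_table: hypothetical helper (not part of iproute2) that flattens the
# output of "bridge vlan show" into one "port vid [flags]" line per VLAN
# membership; continuation lines (indented, VID only) inherit the port name.
vlan_table() {
    awk '
        NR == 1 { next }                    # skip the column header
        NF == 0 { next }                    # skip blank separator lines
        /^[^ \t]/ { port = $1; $1 = "" }    # line starts with a port name
        {
            sub(/^[ \t]+/, "")              # strip leading blanks
            sub(/[ \t]+$/, "")              # strip trailing blanks
            if ($1 != "None") print port, $0  # "None": port has no VLAN
        }'
}
```

Called as "bridge vlan show | vlan_table" in netns3 and netns8, the trunk ports should appear as “vethx 10”, “vethx 20” (and “vethy 10”, “vethy 20”) without any PVID flag.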

Testing 2 VLANs spanning two bridges/Hosts with a trunk connection

We test by pinging from netns7:

netns7:~ # ping 192.168.5.1 -c2
PING 192.168.5.1 (192.168.5.1) 56(84) bytes of data.
From 192.168.5.7 icmp_seq=1 Destination Host Unreachable    
From 192.168.5.7 icmp_seq=2 Destination Host Unreachable

--- 192.168.5.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms     
pipe 2
netns7:~ #

At the bridge device bry in netns8 this gives:

netns8:~ # tcpdump -n -i bry  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bry, link-type EN10MB (Ethernet), capture size 262144 bytes
15:31:15.527528 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
15:31:16.526542 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
15:31:17.526576 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
netns8:~ #

At the outer side of vethx in netns3 we get:

netns3:~ # tcpdump -n -i vethx  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vethx, link-type EN10MB (Ethernet), capture size 262144 bytes
15:31:15.527543 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
15:31:16.526561 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
15:31:17.526605 8a:1e:62:e8:f3:c3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.7, length 28   
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
netns3:~ #

You see how the packet tags have changed now: because the ports for vethx and vethy have no PVIDs and their VLAN memberships are tagged, the packets on the vethx/vethy connection line carry the original tag 20 they had inside the bridges.
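If you want to check which tags actually travel on a line, you can count the VLAN IDs in the link-level output of tcpdump (option -e, as used above). A minimal sketch; the helper tagged_vlans is hypothetical and simply relies on tcpdump printing “vlan <ID>,” for 802.1Q frames, as in the captures above:

```shell
# tagged_vlans: hypothetical helper. It reads tcpdump output produced with -e
# on stdin and prints each 802.1Q VLAN ID found together with the number of
# frames carrying it. Untagged frames produce no output at all.
tagged_vlans() {
    # grep: pick "vlan 20," etc.; tr: drop the comma; sort | uniq -c: count
    # per ID ("      3 vlan 20"); awk: reorder the fields to "20 3"
    grep -o 'vlan [0-9][0-9]*,' | tr -d ',' | sort | uniq -c | awk '{ print $3, $1 }'
}
```

For example, "tcpdump -n -e -i vethx -c 3 host 192.168.5.1 | tagged_vlans" in netns3; the option -c makes tcpdump stop after 3 packets, so the pipeline can finish and print its counts.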

So:

netns7:~ # ping 192.168.5.5 -c2
PING 192.168.5.5 (192.168.5.5) 56(84) bytes of data.
64 bytes from 192.168.5.5: icmp_seq=1 ttl=64 time=0.042 ms    
64 bytes from 192.168.5.5: icmp_seq=2 ttl=64 time=0.092 ms

--- 192.168.5.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms   
rtt min/avg/max/mdev = 0.042/0.067/0.092/0.025 ms
netns7:~ #

Obviously, we can connect our bridges with a trunk line between trunk ports, too.

Exactly 2 VLANs spanning 2 bridges with a trunk connection

Note that we MUST provide identical PVID/VID values inside the bridges bry and brx when we use a trunk like connection! VLAN filtering at all bridge ports works in both directions – IN and OUT. As the Ethernet packets keep their VLAN tags when they leave or enter a bridge via the trunk ports, we cannot choose VID/PVID values in bry which differ from those in brx. So, in contrast to the connection model with the sub-interfaces, we have no choice in the PVID/VID assignments; we deal with exactly 2 and not 4 coupled VLANs.

Still, packets leave veth68, veth78 and veth13, veth23, veth43, veth53 untagged! The VLANs are established by the bridges and their connection line alone.

Which connection model is preferable?

The connection model based on trunk port configurations looks simpler than the model based on veth sub-interfaces. However, the connection model based on sub-interfaces allows for much more flexibility and freedom! In addition, it may make it easier to define port related iptables filtering rules.

So, you have the choice how to extend (virtual) VLANs over several bridges/hosts.
Unfortunately, I have not yet tested for any performance differences.

VLANs spanning hosts with Linux bridges

Our examples were tested on just one host. Is there any major difference when we instead look at 2 hosts, each with a virtual Linux bridge? Not really. Our devices vethx and vethy would then be replaced by two real Ethernet cards, e.g. ethx and ethy. But you could make these cards ports (slaves) of the bridges, too, and you could split them into sub-interfaces.

So, our VLANs based on Linux bridge configurations would also work, if the bridges were located on different hosts. For both connection models …

Conclusion

Network namespaces or containers can become members of virtual VLANs. The configuration of bridge ports determines the VLAN setup. We can easily extend such (virtual) VLANs from one bridge to other bridges – even if the bridges are located on different hosts. In addition, we have the choice whether we base the connection on ports based on sub-interfaces or pure trunk ports. This gives us a maximum of flexibility.

But: Our VLANs were strictly separated so far. In reality, however, we may find situations in which a host/container must be a member of two VLANs (VLAN1 and VLAN2). What do the veth connections from/to a network namespace look like if a user in this intermediate network namespace shall be able to talk to all containers/namespaces in VLAN1 and VLAN2?

This is the topic of the next post.

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – VII

Again, there will be 2 different solutions ….

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – V

Posted on 21. November 2017 by eremo

In the previous posts of this series

we laid the foundations for working with VLANs in virtual networks between different network namespaces – or containers, if you like.

In the last post (4) I provided rules and commands for establishing VLANs via the configuration of a virtual Linux bridge. We saw how we define VLANs and set VLAN IDs, e.g. with the help of sub-interfaces of veth pairs or at Linux bridge ports (VIDs, PVID).

We now apply this knowledge to build the network environment for experiment 4, which we already described in the second post:

The objective of experiment 4 is the setup of two separated virtual VLANs for 2 groups of two network namespaces (or containers) each, with the help of a Linux bridge in a separate fifth network namespace.

In VLANs packet transport is controlled on the link layer and not on the network layer of the TCP/IP protocol. An interesting question for all coming experiments will be, where and how the tagging of the Ethernet packets must occur. Experiment 4 will show that a virtual Linux bridge has a lot in common with real switches – and that in simple cases the bridge configuration alone can define the required VLANs.

Note that we will not use any firewall rules to achieve the separation of the network traffic! However, be aware of the fact that the prevention of ARP spoofing even in our simple scenario requires packet filtering (e.g. by netfilter iptables/ebtables rules).

Experiment 4

The experiment is illustrated in the upper left corner of the graphics below; we configure the area surrounded by the blue dotted line:

You recognize the drawing of our virtual test environment (discussed in article 2). We set up (unnamed) network namespaces netns1, netns2, netns4, netns5 and, of course, netns3 with the help of commands discussed in article 1. Remember: The “names” netnsX are actually hostnames! netns3 contains our bridge “brx“.

VLAN IDs and VLAN tags are numbers. But for visualization purposes you can imagine that we give Ethernet packets that shall be exchanged between netns1 and netns2 a green tag and packets which travel between netns4 and netns5 a pink tag. The small red line between the respective ports inside the bridge represents the separation of our two groups of network namespaces (or containers) via 2 VLANs. For the meaning of other colors around some plug symbols see the text below.

For connectivity tests we need to watch packets of the ARP (address resolution) protocol and the propagation of ICMP packets. tcpdump will help us to identify such packets at selected interfaces.

Connect 4 network namespaces with the help of a (virtual) Linux bridge in a fifth namespace

As in our previous experiments (see post 2) we enter the following list of commands at a shell prompt. (You may just copy/paste them). The list is a bit lengthy, so you may have to scroll:

# set up namespaces 
unshare --net --uts /bin/bash &
export pid_netns1=$!
unshare --net --uts /bin/bash &
export pid_netns2=$!
unshare --net --uts /bin/bash &
export pid_netns3=$!
unshare --net --uts /bin/bash &
export pid_netns4=$!
unshare --net --uts /bin/bash &
export pid_netns5=$!

# assign different hostnames  
nsenter -t $pid_netns1 -u hostname netns1
nsenter -t $pid_netns2 -u hostname netns2
nsenter -t $pid_netns3 -u hostname netns3
nsenter -t $pid_netns4 -u hostname netns4
nsenter -t $pid_netns5 -u hostname netns5

# set up veth devices 
ip link add veth11 netns $pid_netns1 type veth peer name veth13 netns $pid_netns3   
ip link add veth22 netns $pid_netns2 type veth peer name veth23 netns $pid_netns3
ip link add veth44 netns $pid_netns4 type veth peer name veth43 netns $pid_netns3
ip link add veth55 netns $pid_netns5 type veth peer name veth53 netns $pid_netns3

# Assign IP addresses and set the devices up 
nsenter -t $pid_netns1 -u -n /bin/bash
ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
ip link set veth11 up
ip link set lo up
exit
nsenter -t $pid_netns2 -u -n /bin/bash
ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
ip link set veth22 up
ip link set lo up
exit
nsenter -t $pid_netns4 -u -n /bin/bash
ip addr add 192.168.5.4/24 brd 192.168.5.255 dev veth44
ip link set veth44 up
ip link set lo up
exit
nsenter -t $pid_netns5 -u -n /bin/bash
ip addr add 192.168.5.5/24 brd 192.168.5.255 dev veth55
ip link set veth55 up
ip link set lo up
exit

# set up the bridge 
nsenter -t $pid_netns3 -u -n /bin/bash
brctl addbr brx  
ip link set brx up
ip link set veth13 up
ip link set veth23 up
ip link set veth43 up
ip link set veth53 up
brctl addif brx veth13
brctl addif brx veth23
brctl addif brx veth43
brctl addif brx veth53
exit

lsns -t net -t uts
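The listing above is intentionally explicit. Because its middle part follows a fixed naming scheme (vethNN inside netnsN, vethN3 inside netns3, IP 192.168.5.N), it can also be generated by a loop. The following is only a sketch under assumptions: print_setup_cmds is a hypothetical function, it uses “ip link set … master brx” instead of “brctl addif”, and it runs each command non-interactively via nsenter instead of entering an interactive shell. It just prints the commands, so you can review them first:

```shell
# print_setup_cmds: hypothetical generator for the repetitive part of the
# listing above. "$pid_netnsN" is printed literally, so that the shell which
# executes the output expands it from the exported variables.
print_setup_cmds() {
    local i
    # bridge brx in netns3 (ip-link equivalent of "brctl addbr brx")
    echo 'nsenter -t $pid_netns3 -n ip link add brx type bridge'
    echo 'nsenter -t $pid_netns3 -n ip link set brx up'
    for i in 1 2 4 5; do
        # veth pair: veth<i><i> inside netns<i>, veth<i>3 inside netns3
        echo "ip link add veth$i$i netns \$pid_netns$i type veth peer name veth${i}3 netns \$pid_netns3"
        # address 192.168.5.<i>/24 on the namespace side, then bring links up
        echo "nsenter -t \$pid_netns$i -n ip addr add 192.168.5.$i/24 brd 192.168.5.255 dev veth$i$i"
        echo "nsenter -t \$pid_netns$i -n ip link set veth$i$i up"
        echo "nsenter -t \$pid_netns$i -n ip link set lo up"
        # bridge side: bring veth<i>3 up and attach it to brx ("brctl addif")
        echo "nsenter -t \$pid_netns3 -n ip link set veth${i}3 up"
        echo "nsenter -t \$pid_netns3 -n ip link set veth${i}3 master brx"
    done
}
```

After the unshare and hostname steps (which export the pid_netnsN variables), "print_setup_cmds | bash" would execute the generated lines.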

We expect that we can ping from each namespace to all the others. We open a subshell window (see the third post of the series), enter namespace netns5 there and ping e.g. netns2:

mytux:~ # nsenter -t $pid_netns5 -u -n /bin/bash
netns5:~ # ping 192.168.5.2 -c2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.031 ms   
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.029 ms   

--- 192.168.5.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms                                        
rtt min/avg/max/mdev = 0.029/0.030/0.031/0.001 ms

So far so good.

Create and isolate two VLANs for two groups of network namespaces (or containers) via proper port configuration of a Linux bridge

We have not yet set up the ports of our bridge to handle different VLANs. A look at the rules discussed in the last post provides the necessary information, and we execute the following commands:

# set up 2 VLANs  
nsenter -t $pid_netns3 -u -n /bin/bash
ip link set dev brx type bridge vlan_filtering 1
bridge vlan add vid 10 pvid untagged dev veth13
bridge vlan add vid 10 pvid untagged dev veth23
bridge vlan add vid 20 pvid untagged dev veth43
bridge vlan add vid 20 pvid untagged dev veth53
bridge vlan del vid 1 dev brx self
bridge vlan del vid 1 dev veth13
bridge vlan del vid 1 dev veth23
bridge vlan del vid 1 dev veth43
bridge vlan del vid 1 dev veth53
bridge vlan show 
exit

Note:

For working on the bridge’s Ethernet interface itself we need the “self” string.

Question: Where must and will VLAN tags be attached to network packets – inside and/or outside the bridge?
Answer: In our present scenario inside the bridge, only.

This is consistent with using the option “untagged” at all ports: Outside the bridge there are only untagged Ethernet packets.
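The per-port commands in the listing above follow one simple pattern: add the VLAN's VID as PVID with the flag “untagged”, then remove the default VID 1. A sketch that prints these command pairs for given port:VID assignments (access_port_cmds is a hypothetical name; enabling vlan_filtering and removing VID 1 from brx itself via “self” are deliberately not covered):

```shell
# access_port_cmds: hypothetical helper; for each "port:vid" argument it
# prints the two commands that make the bridge port an untagged access port
# of VLAN vid (PVID + egress untagged, default VID 1 removed).
access_port_cmds() {
    local spec port vid
    for spec in "$@"; do
        port=${spec%%:*}   # text before the colon
        vid=${spec##*:}    # text after the colon
        echo "bridge vlan add vid $vid pvid untagged dev $port"
        echo "bridge vlan del vid 1 dev $port"
    done
}
```

In netns3, "access_port_cmds veth13:10 veth23:10 veth43:20 veth53:20 | bash" would reproduce the per-port part of the listing.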

The command “bridge vlan show” gives us an overview of our VLAN settings and the corresponding port configuration:

netns3:~ # bridge vlan show
port    vlan ids
veth13   10 PVID Egress Untagged   

veth23   10 PVID Egress Untagged

veth43   20 PVID Egress Untagged

veth53   20 PVID Egress Untagged

brx     None
netns3:~ #

In our setup VID 10 corresponds to the “green” VLAN and VID 20 to the “pink” one.

Please note that there is absolutely no requirement to give the bridge itself an IP address or to define VLAN sub-interfaces of the bridge’s own Ethernet interface. Treating and configuring the bridge itself as an Ethernet device may appear convenient, and it is a standard background operation of many applications which configure bridges, e.g. virt-manager. But in my opinion such an implicit configuration only leads to unclear and potentially dangerous situations for packet filtering. A bridge with an IP gets an additional, special, but fully operational interface to its environment (here to its network namespace), besides the “normal” ports to clients. It is easy to forget this special interface. Actually, it even gets a default PVID and VID (value 1) assigned. I almost always delete this VID/PVID to avoid any traffic at the bridge’s default interface. Personally, I very, very seldom use a bridge as an Ethernet device with an IP address. If I need a connection to the surrounding network namespace, I use a veth device instead. Then we have an explicitly defined port. In our experiment 4 such a connection is not required.

Testing the VLANs

Now we open 2 sub shell windows for entering our namespaces (in KDE e.g. by “konsole &>/dev/null &”).

First we watch traffic from 192.168.5.1 through veth43 in netns3 in one of our shells:

mytux:~ # nsenter -t $pid_netns3 -u -n /bin/bash
netns3:~ # tcpdump -n -i veth43  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode  
listening on veth43, link-type EN10MB (Ethernet), capture size 262144 bytes

Then we open another shell and try to ping netns4 from netns1:

mytux:~ # nsenter -t $pid_netns1 -u -n /bin/bash 
netns1:~ # ping 192.168.5.4
PING 192.168.5.4 (192.168.5.4) 56(84) bytes of data.
^C
--- 192.168.5.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms

Nothing happens at veth43 in netns3! This was to be expected as our VLAN for VID 10, of course, is isolated from VLAN with VID 20.

However, if we watch the traffic on veth23 in netns3 while pinging first netns2 and then netns4 from netns1, we get (inside netns1):

netns1:~ # ping 192.168.5.2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.090 ms  
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.064 ms
^C
--- 192.168.5.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms   
rtt min/avg/max/mdev = 0.064/0.077/0.090/0.013 ms
netns1:~ # ^C
netns1:~ # ping 192.168.5.4
PING 192.168.5.4 (192.168.5.4) 56(84) bytes of data.
From 192.168.5.1 icmp_seq=1 Destination Host Unreachable  
From 192.168.5.1 icmp_seq=2 Destination Host Unreachable
From 192.168.5.1 icmp_seq=3 Destination Host Unreachable
^C
--- 192.168.5.4 ping statistics ---
6 packets transmitted, 0 received, +3 errors, 100% packet loss, time 5031ms                          
pipe 3

At the same time in netns3:

netns3:~ # tcpdump -n -i veth23  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth23, link-type EN10MB (Ethernet), capture size 262144 bytes
16:13:59.748075 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype IPv4 (0x0800), length 98: 192.168.5.1 > 192.168.5.2: ICMP echo request, id 29195, seq 1, length 64    
16:13:59.748106 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype IPv4 (0x0800), length 98: 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 29195, seq 1, length 64
16:14:00.748326 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype IPv4 (0x0800), length 98: 192.168.5.1 > 192.168.5.2: ICMP echo request, id 29195, seq 2, length 64   
16:14:00.748337 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype IPv4 (0x0800), length 98: 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 29195, seq 2, length 64
16:16:48.630614 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
16:16:49.628213 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
16:16:50.628220 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
16:16:51.645477 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
16:16:52.644229 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
16:16:53.644171 f2:3d:63:de:a8:41 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.5.4 tell 192.168.5.1, length 28
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

You may test the other communication channels in the same way. Obviously, we have succeeded in isolating a “green” communication area from a “pink” one! On the link layer level, i.e. despite the fact that all members of both VLANs belong to the same IP subnet!
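For testing the other channels, it helps to write down the expectation first: with VLAN isolation alone, a ping should succeed exactly when source and destination ports share a VID. A sketch that prints this expectation for every ordered pair (expected_reachability is hypothetical; you supply the namespace:VID pairs yourself):

```shell
# expected_reachability: hypothetical helper. Arguments are "namespace:vid"
# pairs; for every ordered pair of distinct namespaces it prints whether a
# ping should succeed ("yes", same VID) or fail ("no", different VIDs).
expected_reachability() {
    local a b r
    for a in "$@"; do
        for b in "$@"; do
            [ "$a" = "$b" ] && continue
            # ${x##*:} = VID after the colon, ${x%%:*} = name before it
            if [ "${a##*:}" = "${b##*:}" ]; then r=yes; else r=no; fi
            echo "${a%%:*} -> ${b%%:*}: $r"
        done
    done
}
```

For experiment 4 you would call "expected_reachability netns1:10 netns2:10 netns4:20 netns5:20" and compare each line with a ping from the source namespace to the address 192.168.5.N of the target netnsN.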

Note that even a user on the host cannot see the traffic inside the two VLANs directly; he/she does not even see the network interfaces with “ip a s”, as they are all located in network namespaces different from his/her own …

VLAN tags on packets outside the bridge?

Just for fun (and for the preparation of coming experiments) we want to try and assign a “brown” tag to packets outside the bridge, namely those moving along the veth connection line to netns2.

To get VLAN tagging on real Ethernet devices, you need to define VLAN sub-interfaces. Actually, this works with veth interfaces, too! With the following command list we extend each of our interfaces veth22 and veth23 by a sub-interface. Afterwards, we assign the IP address 192.168.5.2 to the sub-interface veth22.50 of veth22 (instead of veth22 itself). And instead of veth23, we plug its new sub-interface into our virtual bridge to terminate the connection correctly.

# Replace veth22, veth23 with sub-interfaces 
nsenter -t $pid_netns3 -u -n /bin/bash
brctl delif brx veth23
ip link add link veth23 name veth23.50 type vlan id 50
ip link set veth23.50 up
brctl addif brx veth23.50
# VLAN settings of the new bridge port; veth23.50 exists in netns3 only
bridge vlan add vid 10 pvid untagged dev veth23.50
bridge vlan del vid 1 dev veth23.50
exit
nsenter -t $pid_netns2 -u -n /bin/bash
ip addr del 192.168.5.2/24 brd 192.168.5.255 dev veth22
ip link add link veth22 name veth22.50 type vlan id 50
ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22.50
ip link set veth22.50 up
exit

The PVID/VID-setting is done for the new sub-interface “veth23.50” on the bridge! Note that the “green” VID 10 inside the bridge is different from the VLAN ID 50, which is used outside the bridge (“brown” tags). According to the rules presented in the last article this should not have any impact on our VLANs:

Incoming packets arriving at veth23 lose their “brown” tag (50) in the sub-interface veth23.50; the bridge port veth23.50 then assigns the green tag (10) before forwarding occurs inside the bridge. Outgoing packets first get their green tag removed, because we have marked the port with the flag “untagged”. But on the outside of the bridge the veth sub-interface re-marks the packets with the “brown” tag.

We ping netns2:

netns1:~ # ping 192.168.5.2 -c3
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.099 ms  
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from 192.168.5.2: icmp_seq=3 ttl=64 time=0.094 ms

--- 192.168.5.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms   
rtt min/avg/max/mdev = 0.055/0.082/0.099/0.022 ms
netns1:~ #

and capture the respective packets at “veth23” with tcpdump:

netns3:~ # bridge vlan show
port    vlan ids
veth13   10 PVID Egress Untagged

veth43   20 PVID Egress Untagged

veth53   20 PVID Egress Untagged

brx     None
veth23.50        10 PVID Egress Untagged

netns3:~ # tcpdump -n -i veth23  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth23, link-type EN10MB (Ethernet), capture size 262144 bytes         
17:38:55.962118 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 1772, seq 1, length 64   
17:38:55.962155 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 1772, seq 1, length 64
17:38:56.961095 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 1772, seq 2, length 64
17:38:56.961116 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 1772, seq 2, length 64
17:38:57.960293 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 1772, seq 3, length 64   
17:38:57.960328 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 50, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 1772, seq 3, length 64
17:39:00.976243 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 46: vlan 50, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.2, length 28
17:39:00.976278 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 46: vlan 50, p 0, ethertype ARP, Reply 192.168.5.1 is-at f2:3d:63:de:a8:41, length 28

Note the information “ethertype 802.1Q (0x8100), length 46: vlan 50”, which proves the tagging with 50 outside the bridge.
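The length values in these captures are consistent with the 4-byte 802.1Q tag: the tagged ARP packets show “length 46”, while the untagged ARP requests captured earlier on veth23 showed “length 42”; likewise the ICMP frames grow from 98 to 102 bytes. A small arithmetic check, using the standard header sizes (Ethernet header 14 bytes, 802.1Q tag 4 bytes, IPv4 header 20 bytes, ICMP header 8 bytes) and the 56 data bytes sent by ping:

```shell
# Frame lengths as printed by "tcpdump -e" (link-layer frame without FCS), bytes
eth_header=14                 # destination MAC (6) + source MAC (6) + EtherType (2)
dot1q_tag=4                   # TPID 0x8100 (2) + tag control info with PCP/DEI/VID (2)
arp_payload=28                # ARP for IPv4 over Ethernet ("length 28" in the capture)
ip_packet=$((20 + 8 + 56))    # IPv4 header + ICMP header + ping data = 84

echo "ARP  untagged: $((eth_header + arp_payload))"               # 42
echo "ARP  tagged:   $((eth_header + dot1q_tag + arp_payload))"   # 46
echo "ICMP untagged: $((eth_header + ip_packet))"                 # 98
echo "ICMP tagged:   $((eth_header + dot1q_tag + ip_packet))"     # 102
```

The 84 bytes of the IP packet also appear in ping's own header line “56(84) bytes of data”.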

Note further that we needed to capture on device veth23 – on device veth23.50 we do not see the tagging:

netns3:~ # tcpdump -n -i veth23.50  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth23.50, link-type EN10MB (Ethernet), capture size 262144 bytes
17:45:29.015840 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype IPv4 (0x0800), length 98: 192.168.5.1 > 192.168.5.2: ICMP echo request, id 2222, seq 1, length 64   
17:45:29.015875 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype IPv4 (0x0800), length 98: 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 2222, seq 1, length 64

Can we see the tagging inside the bridge? Yes, we can:

netns3:~ # tcpdump -n -i brx  host 192.168.5.1 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on brx, link-type EN10MB (Ethernet), capture size 262144 bytes
17:51:41.563316 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 2535, seq 1, length 64   
17:51:41.563343 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 2535, seq 1, length 64
17:51:42.562333 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 2535, seq 2, length 64
17:51:42.562387 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 2535, seq 2, length 64
17:51:43.561327 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.1 > 192.168.5.2: ICMP echo request, id 2535, seq 3, length 64   
17:51:43.561367 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.168.5.2 > 192.168.5.1: ICMP echo reply, id 2535, seq 3, length 64
17:51:46.576259 6e:12:2e:cf:c1:25 > f2:3d:63:de:a8:41, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 192.168.5.1 tell 192.168.5.2, length 28
17:51:46.576276 f2:3d:63:de:a8:41 > 6e:12:2e:cf:c1:25, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Reply 192.168.5.1 is-at f2:3d:63:de:a8:41, length 28
^C

Note: “ethertype 802.1Q (0x8100), length 46: vlan 10”. Inside the bridge we have the tag 10 – as expected. In our setup the external VLAN tagging is irrelevant!

The separation of communication paths between different ports inside of the bridge can be controlled by the bridge setup alone – independent of any VLAN packet tagging, which may occur outside the bridge!

This enhances security: VLAN tags can be manipulated outside the bridge. But as such tags get stripped when packets enter the bridge via ports based on veth sub-interfaces, this won’t help an attacker so much …. :-).

For certain purposes we can (and will) use VLAN tagging also along certain connections outside the bridge – but the control and isolation of network paths between containers on one and the same virtualization host normally does not require VLAN tagging outside a bridge. The big exception is of course when routing to the outside world is required. But this is the topic of later blog posts.

If you like, you can now test that one cannot ping e.g. netns5 from netns2. This is not possible, because inside the bridge packets from netns2 get tags for the VLAN ID 10, as we have seen, and neither the port based on veth43 nor the port for veth53 allows any such packets to pass.

VLANs support security, but traffic separation alone is not sufficient. Some spoofing attack vectors try to flood the bridge with wrong information about MACs. The dynamic learning of a port-MAC relation then becomes a disadvantage. One may think that the bridge’s internal tagging would nevertheless block a misdirection of packets to the wrong VLAN. However, the real behavior may depend on details of the bridge’s handling of the protocol stacks and on the point when tagging occurs. I do not yet understand enough about this. So, better work proactively:
There are parameters by which you can make the port-MAC relations almost static. Use them and implement netfilter rules in addition! You need such rules anyway to avoid ARP spoofing within each VLAN.
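What “almost static” could look like with the iproute2 “bridge” tool, as a minimal sketch: switch off MAC learning on a bridge port and add a static forwarding database (FDB) entry for the one MAC address expected behind that port. The function name and its PID/port/MAC/VID arguments are my own assumptions; which MAC sits behind which port has to be taken from your own setup. Note that this only stops poisoning of the FDB via that port: frames with other source MACs are still forwarded, so netfilter rules (ebtables/nftables) against ARP spoofing remain necessary.

```shell
# Sketch: pin the MAC expected behind one port of bridge brx (run as root).
#   $1 = PID of a process in the namespace holding the bridge (e.g. netns3)
#   $2 = bridge port, $3 = MAC expected behind it, $4 = VLAN ID of that MAC
pin_port_mac() {
    local pid=$1 port=$2 mac=$3 vid=$4
    # No more dynamic learning of source MACs on this port
    nsenter -t "$pid" -n bridge link set dev "$port" learning off || return 1
    # Static FDB entry on the bridge ("master") for this MAC in VLAN $vid
    nsenter -t "$pid" -n bridge fdb add "$mac" dev "$port" master static vlan "$vid"
}
```

Hypothetical usage: pin_port_mac "$pid_netns3" veth13 f2:3d:63:de:a8:41 10 (the pairing of this MAC with veth13 is purely illustrative).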

Traffic between VLANs?

If, for some reason, you need to allow traffic between VLANs, you have to establish routing outside the bridge and limit the type of traffic allowed by packet filter rules. A typical scenario would be that some clients in one VLAN need access to services (special TCP ports) of a container in a network namespace attached to another VLAN. I do not follow this road here, yet, because right now I am more interested in isolation. But see the following links for examples of routing between VLANs:
https://serverfault.com/questions/779115/forward-traffic-between-vlans-with-iptables
https://www.riccardoriva.info/blog/?p=35

Conclusion

Obviously, we can use a virtual Linux bridge in a separate network namespace to isolate communication paths between groups of other network namespaces against each other. This can be achieved by making the bridge VLAN aware and by setting proper VIDs and PVIDs on the bridge ports of veth interfaces. Multiple VLANs can thus be established with just one bridge. We have shown that the separation works even if all members of both VLANs belong to the same IP network class.

We did not involve the bridge’s own Ethernet interface and we did not need any packet tagging outside the bridge to achieve our objective. In our case it was not necessary to define sub-interfaces on either side of our veth connections. But even if we had used sub-interfaces and tagging outside the bridge it would not have destroyed the operation of our VLANs. The bridge itself establishes the VLANs; thinking virtual VLANs means thinking virtual bridges/switches – at least since kernel 3.9!

If we associated the four namespaces with 4 LXC containers our experiment 4 would correspond to a typical scenario for virtual networking on a host, whose containers are arranged in groups. Only members of a group are allowed to communicate with each other. How about extending such a grouping of namespaces/containers to another host? We shall simulate such a situation in the next blog post …

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – VI

Stay tuned !

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – III

Posted on 14. November 2017 by eremo

In the first blog post
Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – I
of this series about virtual networking between network namespaces I had discussed some basic Linux commands to set up and enter network namespaces on a Linux system.

In a second post
Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – II
I suggested and described several networking experiments which can quickly be set up with these tools. As containers are based on namespaces, we can in principle study virtual networking between containers on a host just by connecting network namespaces. This makes, e.g., the planning of firewall rules and VLANs a bit easier …

The virtual environment we want to build up and explore step by step is displayed in the following graphics:

In this article we shall cover experiment 1 and experiment 2 discussed in the last article – i.e. we start with the upper left corner of the drawing.

Experiment 1: Connect two network namespaces directly

This experiment creates the dotted line between netns1 and netns2. Though simple, this experiment lays a foundation for all other experiments.

We place the two different Ethernet interfaces of a veth device in the two (unnamed) network namespaces (with hostnames) netns1 and netns2. We assign IP addresses (of the same network class) to the interfaces and check a basic communication between the network namespaces. The situation corresponds to the following simple picture:

What shell commands can be used for achieving this? You may put the following lines in a file for keeping them for further experiments or to create a shell script:

unshare --net --uts /bin/bash &
export pid_netns1=$!
nsenter -t $pid_netns1 -u hostname netns1
unshare --net --uts /bin/bash &
export pid_netns2=$!
nsenter -t $pid_netns2 -u hostname netns2
ip link add veth11 netns $pid_netns1 type veth peer name veth22 netns $pid_netns2   
nsenter -t $pid_netns1 -u -n /bin/bash
ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
ip link set veth11 up
ip link set lo up
ip a s
exit
nsenter -t $pid_netns2 -u -n /bin/bash
ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
ip link set veth22 up
ip a s
exit
lsns -t net -t uts

If you copy these lines to the prompt of a root shell of some host “mytux” you will get something like the following:

mytux:~ # unshare --net --uts /bin/bash &
[2] 32146
mytux:~ # export pid_netns1=$!

[2]+  Stopped                 unshare --net --uts /bin/bash
mytux:~ # nsenter -t $pid_netns1 -u hostname netns1
mytux:~ # unshare --net --uts /bin/bash &
[3] 32154
mytux:~ # export pid_netns2=$!

[3]+  Stopped                 unshare --net --uts /bin/bash
mytux:~ # nsenter -t $pid_netns2 -u hostname netns2
mytux:~ # ip link add veth11 netns $pid_netns1 type veth peer name veth22 netns $pid_netns2   
mytux:~ # nsenter -t $pid_netns1 -u -n /bin/bash
netns1:~ # ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
netns1:~ # ip link set veth11 up
netns1:~ # ip link set lo up
netns1:~ # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: veth11: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000     
    link/ether da:34:49:a6:18:ce brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.1/24 brd 192.168.5.255 scope global veth11
       valid_lft forever preferred_lft forever
netns1:~ # exit
exit
mytux:~ # nsenter -t $pid_netns2 -u -n /bin/bash
netns2:~ # ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
netns2:~ # ip link set veth22 up
netns2:~ # ip a s
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth22: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000   
    link/ether f2:ee:52:f9:92:40 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.2/24 brd 192.168.5.255 scope global veth22
       valid_lft forever preferred_lft forever
    inet6 fe80::f0ee:52ff:fef9:9240/64 scope link tentative 
       valid_lft forever preferred_lft forever
netns2:~ # exit
exit
mytux:~ # lsns -t net -t uts
        NS TYPE NPROCS   PID USER  COMMAND
4026531838 uts     387     1 root  /usr/lib/systemd/systemd --switched
4026531963 net     385     1 root  /usr/lib/systemd/systemd --switched
4026532178 net       1   581 root  /usr/sbin/haveged -w 1024 -v 0 -F
4026540861 net       1  4138 rtkit /usr/lib/rtkit/rtkit-daemon
4026540984 uts       1 32146 root  /bin/bash
4026540986 net       1 32146 root  /bin/bash
4026541078 uts       1 32154 root  /bin/bash
4026541080 net       1 32154 root  /bin/bash
mytux:~ #

Of course, you recognize some of the commands from my first blog post. Still, some details are worth a comment:

Unshare, background shells and shell variables

We create a separate network (and uts) namespace with the “unshare” command and background processes.

unshare --net --uts /bin/bash &

Note the options! We export shell variables with the PIDs of the started background processes [$!] to have these PIDs available in subshells later on. Note: From our original terminal window (in my case a KDE “konsole” window) we can always open a subshell window with:

mytux:~ # konsole &>/dev/null

You may use another terminal window command on your system. The output redirection is done only to avoid KDE message cluttering. In the subshell you may enter a previously created network namespace netnsX by

nsenter -t $pid_netnsX -u -n /bin/bash
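The command lines of experiment 1 are meant to be pasted into an interactive root shell. As a script they would not behave the same way: “nsenter … /bin/bash” opens an interactive shell, so the following lines would only run after it exits (then in the root namespace), and a backgrounded bash in a non-interactive script gets its stdin from /dev/null, reads EOF and exits at once, taking its namespaces with it. Below is a minimal sketch of a script variant, assuming util-linux unshare/nsenter and iproute2; “sleep infinity” as the namespace-holding process and the function names are my assumptions, not the article's method. The wait loop guards against a race: until unshare has created the new namespaces, nsenter would enter the host's own namespaces and “hostname netns1” would rename the host.

```shell
# Sketch only: experiment 1 as a non-interactive script (run as root).

# Wait until process $1 no longer shares our network namespace, i.e. until
# unshare has really created the new namespaces (at most about 5 seconds).
wait_for_netns() {
    local pid=$1 mine ns i=0
    mine=$(readlink /proc/self/ns/net)
    while [ "$i" -lt 50 ]; do
        ns=$(readlink "/proc/$pid/ns/net" 2>/dev/null) || return 1  # process gone
        [ "$ns" != "$mine" ] && return 0
        sleep 0.1
        i=$((i + 1))
    done
    return 1
}

# Start a holder process in new net+uts namespaces, set hostname $1 there
# and store the holder's PID in the shell variable named $2.
start_netns() {
    local name=$1 var=$2 pid
    unshare --net --uts sleep infinity </dev/null >/dev/null 2>&1 &
    pid=$!
    wait_for_netns "$pid" || return 1
    nsenter -t "$pid" -u hostname "$name" || return 1
    eval "$var=\$pid"
}

# Experiment 1: two namespaces linked directly by one veth pair.
experiment1_up() {
    start_netns netns1 pid_netns1 || return 1
    start_netns netns2 pid_netns2 || return 1
    ip link add veth11 netns "$pid_netns1" type veth peer name veth22 netns "$pid_netns2" || return 1
    # Single commands via nsenter instead of interactive subshells
    nsenter -t "$pid_netns1" -n ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
    nsenter -t "$pid_netns1" -n ip link set veth11 up
    nsenter -t "$pid_netns1" -n ip link set lo up
    nsenter -t "$pid_netns2" -n ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
    nsenter -t "$pid_netns2" -n ip link set veth22 up
    # As in the article, lo stays down in netns2 for now
}
```

After sourcing the functions in a root bash, experiment1_up followed by nsenter -t "$pid_netns1" -n ping -c2 192.168.5.2 should reproduce the connection test. Killing the two sleep processes (kill "$pid_netns1" "$pid_netns2") removes the namespaces and, with them, the veth pair.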

Hostnames to distinguish namespaces at the shell prompt

We assign hostnames to the background processes via commands like

nsenter -t $pid_netns1 -u hostname netns1

This works through the separation of the uts namespace. See the first post for an explanation.

Create veth devices with the “ip” command

The key command to create a veth device and to assign its two interfaces to 2 different network namespaces is:

ip link add veth11 netns $pid_netns1 type veth peer name veth22 netns $pid_netns2

Note that we can use PIDs to identify the target network namespaces! Explicit names of the network namespaces are not required!
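For comparison, a sketch of the two-step alternative, assuming the PIDs of the two holder processes as arguments: create the veth pair in the current (root) namespace first and then move each end with “ip link set … netns PID”. The result should be the same; the one-line form above just saves the intermediate step. Keep in mind that moving an interface into another namespace sets it down and drops its IP addresses, so addresses are assigned only afterwards. The function name is hypothetical.

```shell
# Sketch: create a veth pair here, then move each end into the network
# namespace of the given process (run as root).
#   $1, $2 = PIDs of processes living in the two target namespaces
move_veth_pair() {
    local pid_a=$1 pid_b=$2
    ip link add veth11 type veth peer name veth22 || return 1
    ip link set dev veth11 netns "$pid_a" || return 1
    ip link set dev veth22 netns "$pid_b"
}
```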

The importance of a running lo-device in each network namespace

We intentionally did not set the loopback device “lo” up in netns2. This leads to an interesting observation, which many admins are not aware of:

The lo device is required (in UP status) to be able to ping network interfaces (here e.g. veth22) in the local namespace!

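This is standard behavior: the kernel delivers every packet addressed to one of the namespace’s own IP addresses via the loopback device lo, no matter which interface carries the address. So the ping traffic runs through lo! Specifying a source address via the option “-I” does not change this, as the test with vethx and vethy below shows. Normally, we just do not notice this point, because lo almost always is UP on a standard system (in its root namespace).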
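To look at this routing decision directly, a small sketch assuming the PID of a namespace's holder process and one of its addresses as arguments (the function name is hypothetical): “ip route get” asks the kernel which route it would pick, and the “local” routing table lists the routes of type local that trigger local delivery. For an address of the namespace itself we would expect the answer of “ip route get” to name dev lo, even though the address lives on veth22.

```shell
# Sketch: ask the kernel how it routes packets to a namespace's own address.
#   $1 = PID of a process in the namespace, $2 = one of its IP addresses
show_local_route() {
    local pid=$1 addr=$2
    # Route decision for $addr; for a local address we expect "... dev lo ..."
    nsenter -t "$pid" -n ip route get "$addr" || return 1
    # Routes of type "local" for the namespace's own addresses
    nsenter -t "$pid" -n ip route show table local
}
```

Hypothetical usage: show_local_route "$pid_netns2" 192.168.5.2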

For testing the role of “lo” we now open a separate terminal window:

mytux:~ # konsole &>/dev/null

There:

mytux:~ # nsenter -t $pid_netns2 -u -n /bin/bash
netns2:~ # ping 192.168.5.2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
^C
--- 192.168.5.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1008ms   

netns2:~ # ip link set lo up
netns2:~ # ping 192.168.5.2 -c2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.033 ms

--- 192.168.5.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 998ms    
rtt min/avg/max/mdev = 0.017/0.034 ms

And: Within the same namespace and “lo” down you cannot even ping the second Ethernet interface of a veth device from the first interface – even if they belong to the same network class!

Open a new sub shell and enter e.g. netns1 there:

netns1:~ # ip link add vethx type veth peer name vethy 
netns1:~ # ip addr add 192.168.20.1/24 brd 192.168.20.255 dev vethx    
netns1:~ # ip addr add 192.168.20.2/24 brd 192.168.20.255 dev vethy    
netns1:~ # ip link set vethx up
netns1:~ # ip link set vethy up
netns1:~ # ping 192.168.20.2 -I 192.168.20.1
PING 192.168.20.2 (192.168.20.2) from 192.168.20.1 : 56(84) bytes of data.    
^C
--- 192.168.20.2 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3000ms
netns1:~ # ip link set lo up
netns1:~ # ping 192.168.20.2 -I 192.168.20.1       
PING 192.168.20.2 (192.168.20.2) from 192.168.20.1 : 56(84) bytes of data.   
64 bytes from 192.168.20.2: icmp_seq=1 ttl=64 time=0.019 ms     
64 bytes from 192.168.20.2: icmp_seq=2 ttl=64 time=0.052 ms                                
^C                                                              
--- 192.168.20.2 ping statistics ---                            
2 packets transmitted, 2 received, 0% packet loss, time 999ms   
rtt min/avg/max/mdev = 0.019/0.035/0.052/0.017 ms               
netns1:~ #

Connection test

Now back to our experiment. Let us now try to ping netns1 from netns2:

netns2:~ # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: veth22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000     
    link/ether f2:ee:52:f9:92:40 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.2/24 brd 192.168.5.255 scope global veth22
       valid_lft forever preferred_lft forever
    inet6 fe80::f0ee:52ff:fef9:9240/64 scope link 
       valid_lft forever preferred_lft forever
netns2:~ # ping 192.168.5.1
PING 192.168.5.1 (192.168.5.1) 56(84) bytes of data.
64 bytes from 192.168.5.1: icmp_seq=1 ttl=64 time=0.030 ms  
64 bytes from 192.168.5.1: icmp_seq=2 ttl=64 time=0.033 ms  
64 bytes from 192.168.5.1: icmp_seq=3 ttl=64 time=0.036 ms  
^C
--- 192.168.5.1 ping statistics ---                                                                  
3 packets transmitted, 3 received, 0% packet loss, time 1998ms                                       
rtt min/avg/max/mdev = 0.030/0.033/0.036/0.002 ms                                                    
netns2:~ #

OK! And vice versa:

mytux:~ #  nsenter -t $pid_netns1 -u -n /bin/bash
netns1:~ # ping 192.168.5.2 -c2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.023 ms   
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.023 ms

--- 192.168.5.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms   
rtt min/avg/max/mdev = 0.023/0.023/0.023/0.000 ms
netns1:~ #

Our direct communication via veth works as expected! Network packets are not stopped by network namespace borders – this would not make much sense.

Experiment 2: Connect two namespaces via a bridge in a third namespace

We now try a connection of netns1 and netns2 via a Linux bridge “brx“, which we place in a third namespace netns3:

Note:

This is a standard way to connect containers on a host!

LXC tools as well as libvirt/virt-manager would help you to establish such a bridge! However, the bridge would normally be placed inside the host’s root namespace. In my opinion this is not a good idea:

A separate 3rd namespace gets the bridge and related firewall and VLAN rules outside the control of the containers. But a separate namespace also helps to isolate the host against any communication (and possible attacks) coming from the containers!

So, let us close our sub terminals from the first experiment and kill the background shells:

mytux:~ # kill -9 32146
[2]-  Killed                  unshare --net --uts /bin/bash
mytux:~ # kill -9 32154
[3]+  Killed                  unshare --net --uts /bin/bash

We now adapt our setup commands to create netns3 and the bridge “brx” there by using “brctl addbr”. Furthermore, we add two different veth devices, each with one interface in netns3. We attach these interfaces to the bridge via “brctl addif”:

unshare --net --uts /bin/bash &
export pid_netns1=$!
nsenter -t $pid_netns1 -u hostname netns1
unshare --net --uts /bin/bash &
export pid_netns2=$!
nsenter -t $pid_netns2 -u hostname netns2
unshare --net --uts /bin/bash &
export pid_netns3=$!
nsenter -t $pid_netns3 -u hostname netns3
nsenter -t $pid_netns3 -u -n /bin/bash
brctl addbr brx  
ip link set brx up
exit 
ip link add veth11 netns $pid_netns1 type veth peer name veth13 netns $pid_netns3     
ip link add veth22 netns $pid_netns2 type veth peer name veth23 netns $pid_netns3    
nsenter -t $pid_netns1 -u -n /bin/bash
ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
ip link set veth11 up
ip link set lo up
ip a s
exit
nsenter -t $pid_netns2 -u -n /bin/bash
ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
ip link set veth22 up
ip a s
exit
nsenter -t $pid_netns3 -u -n /bin/bash
ip link set veth13 up
ip link set veth23 up
brctl addif brx veth13
brctl addif brx veth23
exit

It is not necessary to show the reaction of the shell to these commands. But note the following:

The bridge has to be set into an UP status.
The veth interfaces located in netns3 do not get an IP address. Actually, a veth interface plays a different role on a bridge than in normal surroundings.
The bridge itself does not get an IP address.

Bridge ports

By attaching the veth interfaces to the bridge we create a “port” on the bridge, which corresponds to some complicated structures (handled by the kernel) for dealing with Ethernet packets crossing the port. You can imagine the situation as if e.g. the veth interface veth13 corresponds to the RJ45 end of a cable which is plugged into the port. Ethernet packets are taken at the plug, get modified sometimes and then are transferred across the port to the inside of the bridge.

However, when we assign an IP address to the other interface, e.g. veth11 in netns1, then the veth “cable” ends in a full Ethernet device, which accepts network commands such as “ping” or “nc”.

No IP address for the bridge itself!
We do NOT assign an IP address to the bridge itself; this is a bit in contrast to what e.g. happens when you set up a bridge for networking with the tools of virt-manager. Or what e.g. Opensuse does, when you setup a KVM virtualization host with YaST. In all these cases something like

ip addr add 192.168.5.100/24 brd 192.168.5.255 dev brx

happens in the background. However, I do not like this kind of implicit policy, because it opens ways into the namespace surrounding the bridge! And it is easy to forget this bridge interface both in VLAN and firewall rules.

Almost always, there is no need to provide an IP address to the bridge itself. If we need an interface from a namespace, a container or the host to a Linux bridge, we can always use a veth device. This leads to a much clearer situation; you see the Ethernet interface and the port to the bridge explicitly – thus you have much better control, especially with respect to firewall rules.

Enter network namespace netns3

Now we open a terminal as a sub shell (as we did in the previous example) and enter netns3 to have a look at the interfaces and the bridge.

mytux:~ # nsenter -t $pid_netns3 -u -n /bin/bash
netns3:~ # brctl show brx
bridge name     bridge id               STP enabled     interfaces
brx             8000.000000000000       no
netns3:~ # ip a s
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: brx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:fa:74:92:b5:00 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1c08:76ff:fe0c:7dfe/64 scope link 
       valid_lft forever preferred_lft forever
3: veth13@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether ce:fa:74:92:b5:00 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ccfa:74ff:fe92:b500/64 scope link 
       valid_lft forever preferred_lft forever
4: veth23@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brx state UP group default qlen 1000   
    link/ether fe:5e:0b:d1:44:69 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::fc5e:bff:fed1:4469/64 scope link 
       valid_lft forever preferred_lft forever
netns3:~ # bridge link
3: veth13 state UP @brx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brx state forwarding priority 32 cost 2    
4: veth23 state UP @brx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master brx state forwarding priority 32 cost 2

Useful commands

Let us briefly discuss some useful commands:

Incomplete information of “brctl show”
Unfortunately, the standard command

brctl show brx

does not work properly inside network namespaces; it does not produce a complete output. E.g., the attached interfaces are not shown. However, the command

ip a s

shows all interfaces and their respective “master“. The same is true for the very useful “bridge” command :

bridge link

If you want to see even more details on interfaces use

ip -d a s

and grep the line for a specific interface.
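Instead of grepping, iproute2 can also do the filtering itself. A small sketch, assuming the PID of netns3 and a port name as arguments (the function name is hypothetical): “ip -br link show master brx” lists just the ports of brx in brief form, and “ip -d link show dev PORT” prints the detailed attributes of one port, including its bridge_slave settings.

```shell
# Sketch: inspect bridge brx and one of its ports inside netns3.
#   $1 = PID of a process in netns3, $2 = port name (e.g. veth13)
inspect_bridge_port() {
    local pid=$1 port=$2
    # Only the interfaces attached to brx, one line each
    nsenter -t "$pid" -n ip -br link show master brx || return 1
    # Detailed view of one port (bridge_slave attributes included)
    nsenter -t "$pid" -n ip -d link show dev "$port"
}
```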

Just for completeness: To create a bridge and attach the veth interfaces to it, we could also have used:

ip link add name brx type bridge
ip link set brx up
ip link set dev veth13 master brx   
ip link set dev veth23 master brx
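The same wiring can also be done without interactive subshells. A minimal sketch, assuming that three namespace-holding processes already exist and that their PIDs are passed as arguments; it uses the “ip” variants just listed instead of brctl and, as in the text, gives neither the bridge nor its ports an IP address. The helper and function names are hypothetical.

```shell
# Sketch: experiment 2 wiring with iproute2 instead of brctl (run as root).
#   $1 = PID in netns1, $2 = PID in netns2, $3 = PID in netns3 (bridge)

# Run a single command inside the network namespace of process $1
in_ns() { local pid=$1; shift; nsenter -t "$pid" -n "$@"; }

experiment2_wire() {
    local p1=$1 p2=$2 p3=$3 port
    # Bridge brx inside netns3, without an IP address of its own
    in_ns "$p3" ip link add name brx type bridge || return 1
    in_ns "$p3" ip link set brx up
    # Two veth pairs, one end of each in netns3
    ip link add veth11 netns "$p1" type veth peer name veth13 netns "$p3" || return 1
    ip link add veth22 netns "$p2" type veth peer name veth23 netns "$p3" || return 1
    # The netns3 ends become bridge ports (no IP addresses there)
    for port in veth13 veth23; do
        in_ns "$p3" ip link set dev "$port" master brx
        in_ns "$p3" ip link set dev "$port" up
    done
    # IP addresses only on the outer ends
    in_ns "$p1" ip addr add 192.168.5.1/24 brd 192.168.5.255 dev veth11
    in_ns "$p1" ip link set veth11 up
    in_ns "$p1" ip link set lo up
    in_ns "$p2" ip addr add 192.168.5.2/24 brd 192.168.5.255 dev veth22
    in_ns "$p2" ip link set veth22 up
}
```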

Connectivity test with ping
Now, let us turn to netns1 and test connectivity:

mytux:~ # nsenter -t $pid_netns1 -u -n /bin/bash
netns1:~ # ip a s 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: veth11@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000    
    link/ether 6a:4d:0c:30:12:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.1/24 brd 192.168.5.255 scope global veth11
       valid_lft forever preferred_lft forever                                                       
    inet6 fe80::684d:cff:fe30:1204/64 scope link                                                     
       valid_lft forever preferred_lft forever                                                       
netns1:~ # ping 192.168.5.2
PING 192.168.5.2 (192.168.5.2) 56(84) bytes of data.
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.5.2: icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from 192.168.5.2: icmp_seq=3 ttl=64 time=0.054 ms
^C
--- 192.168.5.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.039/0.046/0.054/0.006 ms
netns1:~ # nc -l 41234

Note that – as expected – we do not see anything of the bridge and its interfaces in netns1! Note that the bridge basically is a device on the data link layer, i.e. OSI layer 2. In the current configuration we did nothing to stop the propagation of Ethernet packets on this layer – this will change in further experiments.

Connectivity test with netcat
At the end of our test we used the netcat command “nc” to listen on TCP port 41234. At another (sub) terminal we can now start a TCP communication from netns2 to TCP port 41234 in netns1:

mytux:~ # nsenter -t $pid_netns2 -u -n /bin/bash
netns2:~ # nc 192.168.5.1 41234
alpha
beta

This leads to an output after the last command in netns1:

netns1:~ # nc -l 41234
alpha
beta

So, we have full connectivity – not only for ICMP packets, but also for TCP packets. In yet another terminal:

mytux:~ # nsenter -t $pid_netns1 -u -n /bin/bash
netns1:~ # netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:41234                 *:*                     LISTEN      
tcp        0      0 192.168.5.1:41234       192.168.5.2:45122       ESTABLISHED     
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path   
netns1:~ #

Conclusion

It is pretty easy to connect network namespaces with veth devices. The interfaces can be assigned to different network namespaces by using a variant of the “ip” command. The target network namespaces can be identified by PIDs of their basic processes. We can link two namespaces directly via the interfaces of one veth device.

An alternative is to use a Linux bridge (for Layer 2 transport) in yet another namespace. The third namespace provides better isolation; the bridge is out of the view and control of the other namespaces.

We have seen that the commands “ip a s” and “bridge link” are useful to get information about the association of bridges and their assigned interfaces/ports in network namespaces.

In the coming article
Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – IV
we extend our efforts to creating VLANs with the help of our Linux bridge. Stay tuned ….

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – II

Posted on 12. November 2017 by eremo

The topics of this blog post series are

the basic handling of network namespaces
and virtual networking between different network namespaces.

One objective is a better understanding of the mechanisms behind the setup for future (LXC) containers on a host; containers are based on namespaces (see the last post of this series for a mini introduction). The most important Linux namespace for networking is the so called “network namespace”.

As explained in the previous article
Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – I
it is interesting and worthwhile to perform network experiments without referring to explicit names for network namespaces. Especially, when you plan to administer LXC containers with libvirt/virt-manager. You then cannot use the standard LXC tools or “ip“-options for explicit network namespace names.

We, therefore, had a look at relevant options for the ip-command and other typical userspace tools. The basic trick was/is to refer to PIDs of the processes originally associated with network namespaces. I discussed commands for listing network namespaces and associated processes. In addition, I showed how one can use shells for entering new or existing unnamed network namespaces. We finalized the first post with the creation of a veth device inside a distinct network namespace.

Advanced experiments – communication scenarios between network namespaces and groups of namespaces (or containers)

Regarding networking a container is represented by a network namespace, associated network devices and rules. A network namespace provides an isolation of the network devices assigned to this namespace plus related packet filter and routing rules from/against other namespaces/containers.

But very often you may have to deal not only with one container on a host but a whole bunch of containers. Therefore, another objective of experiments with network namespaces is

to study the setup of network communication lines between different containers – i.e. between different network namespaces –
and to study mechanisms for isolating the network packet flow between certain containers/namespaces against the packets and communication lines of other containers/namespaces and/or the host.

The second point may appear strange at first sight: Didn’t we learn that the fundamental purpose of (network) namespaces already is isolation? Yes, the isolation of devices, but not the isolation of network packets crossing the network namespace borders. In realistic situations we, in addition, need to establish and at the same time isolate communication paths between different containers/namespaces and to their environment.

Typically, we have to address a grouping of containers/namespaces in this context:

Different containers on one or several hosts should be able to talk to each other and the Internet – but only if these containers are members of a defined group.
At the same time we may need to isolate the communication occurring within a group of containers/namespaces against the communication flow of containers/namespaces of another group and against communication lines of the host.
Still, we may need to allow namespaces/containers of different groups to use a common NIC to the Internet despite an otherwise isolated operation.

All this requires confining the flow of distinct network packets to certain paths between network namespaces. Thus, the question comes up how to achieve separated virtual communication circuits between network namespaces already on the L2 level and across possibly involved virtual devices.

Veth devices, VLAN aware Linux bridges (or other types of virtual Linux switches) and VLAN tagging play a key role in simple (virtual) infrastructure approaches to such challenges. Packet filter rules of Linux’ netfilter components additionally support the control of packet flow through such (virtual) infrastructure elements. Note:

The nice thing about network namespaces is that we can study all required basic networking principles easily without setting up LXC containers.

Test scenarios – an overview

I want to outline a collection of interesting scenarios for establishing and isolating communication paths between namespaces/containers. We start with a basic communication line between 2 different network namespaces. By creating more namespaces and veth devices, a VLAN aware bridge and VLAN rules we extend the test scenario’s complexity step by step to cover the questions posed above. See the graphics below.

Everything actually happens on one host. But the elements of the lower part (below the horizontal black line) could also be placed on a different host. The RJ45 symbols represent Ethernet interfaces of veth devices. These interfaces, therefore, appear mostly in pairs (as long as we do not define sub interfaces). The colors represent IDs (VIDs) of VLANs. Three standard Linux bridges are involved; on each of these bridges we shall activate VLAN filtering. We shall learn that we can, but do not need to, tag packets outside of VLAN filtering bridges – with a few interesting exceptions.

I suggest 10 experiments to perform within the drawn virtual network. We cannot discuss details of all experiments in one blog post; but in the coming posts we shall walk through this graphics in several steps from the top to the bottom and from the left to the right. Each step will be accompanied by experiments.

I use the abbreviation “netns” for “network namespace” below. Note in advance that the processes (shells) underlying the creation of network namespaces in our experiments always establish “uts” namespaces, too. Thus we can assign different hostnames to the basic shell processes – this helps us to distinguish in which network namespace shell we operate by just looking at the prompt of a shell. All the “names” as netns1, netns2, … appearing in the examples below actually are hostnames – and not real network namespace names in the sense of “ip” commands or LXC tools.

I should remark that I did the experiments below not just for fun, but because the use of VLAN tags in environments with Linux bridges is discussed in many Internet articles in a way which I find confusing and misleading. This is partially due to the fact that the extensions of the Linux kernel for VLAN definitions with the help of Linux bridges have reached a stable status only with kernel 3.9 (as far as I know). So many articles before 2014 present ideas which do not fit the present options. Still, even today, you stumble across discussions which claim that you either do VLANs or bridging – but not both – and if, then only with different bridges for different VLANs. I personally think that today the only reason for such approaches would be performance – but not a strict separation of technologies.

Experiments

I hope the following experiments will provide readers some learning effects and also some fun with veth devices and bridges:

Experiment 1: Connect two namespaces directly
First we shall place the two different Ethernet interfaces of a veth device in two different (unnamed) network namespaces (with hostnames) netns1 and netns2. We assign IP addresses (of the same network class) to the interfaces and check a basic communication between the network namespaces. Simple and effective!

Experiment 2: Connect two namespaces via a bridge in a third namespace
Afterwards we instead connect our two different network namespaces netns1 and netns2 via a Linux bridge “brx” in a third namespace netns3. Note: We would use a separate 3rd namespace also in a scenario with containers to get the bridge and related firewall and VLAN rules outside the control of the containers. In addition, such a separate namespace helps to isolate the host against any communication (and possible attacks) coming from the containers.

Experiment 3: Establish isolated groups of containers
We set up two additional network namespaces (netns4, netns5). We check communication between all four namespaces attached to brx. Then we put netns1 and netns2 into a group (“green”) – and netns4 and netns5 into another group (“rosa”). Communication between member namespaces of a group shall be allowed, but not between namespaces of different groups, even though all namespaces are part of the same IP address class! We achieve this on the L2 level by assigning VLAN IDs (VIDs) to the bridge ports to which we attach netns1, netns2, netns4 and netns5.

We shall see how “PVIDs” are assigned to a specific port for tagging packets that move into the bridge through this port and how we untag outgoing packets at the very same port. Conclusion: So far, no tagging is required outside the Linux bridge brx for building simple virtual VLANs!
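In terms of the `bridge` tool from iproute2, the group assignment could be sketched as follows. The VIDs 10 (“green”) and 20 (“rosa”) and the port names are hypothetical; the `pvid`/`untagged` flags correspond to the ingress tagging and egress untagging described above:

```python
from typing import Dict, List

# Hypothetical port names: veth3{i} is the bridge port for netns{i}.
GROUPS: Dict[int, List[str]] = {
    10: ["veth31", "veth32"],  # group "green"
    20: ["veth34", "veth35"],  # group "rosa"
}


def vlan_group_commands(pid3: int, groups: Dict[int, List[str]]) -> List[str]:
    ns3 = f"nsenter --target {pid3} --net"
    # VLAN filtering must be switched on, otherwise port VIDs are not enforced
    cmds = [f"{ns3} ip link set dev brx type bridge vlan_filtering 1"]
    for vid, ports in groups.items():
        for port in ports:
            # pvid: untagged frames entering the port get this VID;
            # untagged: frames of this VID leave the port without a tag
            cmds.append(f"{ns3} bridge vlan add dev {port} vid {vid} pvid untagged")
            # remove the default VID 1 that ports receive when joining the bridge
            cmds.append(f"{ns3} bridge vlan del dev {port} vid 1")
    cmds.append(f"{ns3} bridge vlan show")
    return cmds


if __name__ == "__main__":
    print("\n".join(vlan_group_commands(1003, GROUPS)))
```

No command touches the namespaces netns1 to netns5 themselves: all tagging and untagging happens at the bridge ports, which matches the conclusion above.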

Experiment 4: Tagging outside the bridge?
Although it is not required, we repeat the last experiment with subinterfaces defined on two veth devices (used for netns2 and netns5), just to check that packet tagging occurs correctly outside the bridge. This is done in preparation for other experiments. But for the isolation of VLAN communication paths inside the bridge, only the tagging of packets coming into the bridge through a port is relevant: A packet coming from outside is first untagged and then retagged when moving into the bridge. The reverse untagging and retagging for outgoing packets is done correctly, too, but the tag “color” outside the bridge actually plays no role for the filtered communication paths inside the bridge.
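A sketch for one of the two namespaces follows. It rests on an assumption that the text suggests but does not spell out: VLAN subinterfaces are created on both ends of the veth pair, and the subinterface in netns3 (not the plain veth end) serves as the bridge port. The subinterface then strips the outer tag, and the bridge applies its own PVID. The outer tag 50 and all names are hypothetical:

```python
from typing import List


def subinterface_commands(pid_ns: int, pid3: int, ns_dev: str, port_dev: str,
                          outer_tag: int, bridge_vid: int, addr: str) -> List[str]:
    ns = f"nsenter --target {pid_ns} --net"
    ns3 = f"nsenter --target {pid3} --net"
    sub, port_sub = f"{ns_dev}.{outer_tag}", f"{port_dev}.{outer_tag}"
    return [
        # netns side: the tagged subinterface carries the address instead of the parent
        f"{ns} ip addr del {addr} dev {ns_dev}",
        f"{ns} ip link add link {ns_dev} name {sub} type vlan id {outer_tag}",
        f"{ns} ip addr add {addr} dev {sub}",
        f"{ns} ip link set dev {sub} up",
        # netns3 side: the parent leaves the bridge, its subinterface joins it
        f"{ns3} ip link set dev {port_dev} nomaster",
        f"{ns3} ip link add link {port_dev} name {port_sub} type vlan id {outer_tag}",
        f"{ns3} ip link set dev {port_sub} master brx",
        f"{ns3} ip link set dev {port_sub} up",
        # inside the bridge, frames from this port are tagged with the group's VID
        f"{ns3} bridge vlan add dev {port_sub} vid {bridge_vid} pvid untagged",
        f"{ns3} bridge vlan del dev {port_sub} vid 1",
    ]


if __name__ == "__main__":
    print("\n".join(subinterface_commands(1002, 1003, "veth2", "veth32",
                                          50, 10, "192.168.5.2/24")))
```

Under these assumptions the outer tag (50) and the VID inside the bridge (10) are independent of each other, which illustrates why the “color” outside the bridge does not matter.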

Experiment 5: Connection to a second independent environment – with keeping up namespace grouping
In reality we may have situations in which some containers of a defined group are placed on different hosts. Can we extend the concept of separating container/namespace groups by VLAN tagging to different hosts via two bridges, i.e. bridge brx on the first host and a new bridge bry on the second (netns8)? Yes, we can!

In reality we would connect two hosts by Ethernet cards. We simulate this situation in our virtual environment again with a veth interface pair between “netns3” and “netns8”. But as we absolutely do not want to mix packets of our two groups, we now need to tag the packets on their way between the bridges. We shall see how to use subinterfaces of the (veth) Ethernet interfaces to achieve this. Note that the two resulting communication paths between the bridges may potentially lead to loops! We shall deal with this problem, too.
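Why two paths between brx and bry are only “potentially” a loop can be illustrated with a simple graph check, run once per VLAN. This is an abstraction, not kernel behaviour: each bridge-to-bridge link is described by the set of VIDs it carries (a hypothetical input format), and a union-find structure reports the VLANs in which the links close a cycle. Note that the Linux bridge does not break such loops by itself unless STP is switched on:

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str, Set[int]]  # (bridge_a, bridge_b, VIDs the link carries)


def _find(parent: Dict[str, str], x: str) -> str:
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def vlans_with_loops(links: Iterable[Link]) -> Set[int]:
    """Return the VIDs in which the bridge-to-bridge links form a cycle."""
    links = list(links)
    looped: Set[int] = set()
    for vid in set().union(*(vids for _, _, vids in links)):
        parent: Dict[str, str] = {}
        for a, b, vids in links:
            if vid not in vids:
                continue
            ra, rb = _find(parent, a), _find(parent, b)
            if ra == rb:
                looped.add(vid)  # a and b were already connected in this VLAN
            else:
                parent[ra] = rb
    return looped


if __name__ == "__main__":
    # two subinterface links, each restricted to one VID: no loop
    print(vlans_with_loops([("brx", "bry", {10}), ("brx", "bry", {20})]))  # set()
    # both links in one broadcast domain (e.g. default VID 1 everywhere): loop
    print(vlans_with_loops([("brx", "bry", {1}), ("brx", "bry", {1})]))  # {1}
```

The check shows the essential point: two links are harmless as long as VLAN filtering keeps each VID on exactly one of them.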

Experiment 6: Two tags on a bridge port? Members of two groups?
Now, we could have containers (namespaces) that should be able to communicate with both groups. Then we would need 2 VIDs on the bridge port for such a special container/namespace. We establish netns9 for this test. We shall see that it is no problem to assign two VIDs to a port to filter the differently tagged packets going from the bridge outwards. Nevertheless we run into problems, not because of the assignment of 2 VIDs, but due to the fact that we can only assign one PVID to each bridge port. This seems to limit our options for tagging incoming packets if we choose the PVID's value among the VIDs already defined on other ports: incoming packets from netns9 would then be tagged for one group only, so we cannot direct them to both groups via the existing VIDs.

We have to solve this by defining new additional paths inside the bridge for packets coming in through the port for netns9: We assign a PVID to this port which is different from all VIDs defined so far. Then we assign additional VIDs with the value of this new PVID to the ports of the members of our existing groups. Interesting questions then are: Are the groups still isolated? Is pinging interrupted? And how do we stop man-in-the-middle attacks by netns9?
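The first two questions can be explored with a small model of the bridge's VLAN filter. It is a deliberate simplification, assuming all namespaces send untagged frames and all ports untag on egress: an untagged frame entering a port gets the port's PVID, and it may leave a port only if that port is a member of the VID. The VIDs 10 (“green”), 20 (“rosa”) and the new PVID 30 for netns9 are hypothetical values:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Port:
    pvid: Optional[int]                           # VID given to untagged ingress frames
    vids: Set[int] = field(default_factory=set)   # VIDs that may pass this port (untagged)


def delivered(ports: Dict[str, Port], src: str, dst: str) -> bool:
    """True if an untagged frame from src's port can leave the bridge at dst's port."""
    vid = ports[src].pvid
    return vid is not None and vid in ports[src].vids and vid in ports[dst].vids


def can_ping(ports: Dict[str, Port], a: str, b: str) -> bool:
    # request and reply must both pass the bridge's VLAN filter
    return delivered(ports, a, b) and delivered(ports, b, a)


ports = {
    "netns1": Port(pvid=10, vids={10, 30}),
    "netns2": Port(pvid=10, vids={10, 30}),
    "netns4": Port(pvid=20, vids={20, 30}),
    "netns5": Port(pvid=20, vids={20, 30}),
    "netns9": Port(pvid=30, vids={10, 20, 30}),
}

if __name__ == "__main__":
    for a, b in [("netns1", "netns2"), ("netns1", "netns4"),
                 ("netns1", "netns9"), ("netns4", "netns9")]:
        print(a, b, can_ping(ports, a, b))
    # prints True, False, True, True (in this order)
```

In this model the groups stay isolated and intra-group pings are unaffected, while netns9 reaches both groups. With real commands, the new PVID would correspond to something like `bridge vlan add dev veth39 vid 30 pvid untagged` plus `bridge vlan add dev veth31 vid 30 untagged` on each group port (port names hypothetical). The model works on L2 only and therefore says nothing about the third question.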

The answer lies in some firewall rules which must be established on the bridge! In case we use iptables (instead of the better suited ebtables), these rules MUST refer to the ports of the bridge via physdev options and to IP addresses. However, ARP packets coming from netns9 should still pass to all interfaces of members of our groups.
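One possible reading of such rules is sketched below as a generator of iptables commands, to be run inside netns3. It assumes the br_netfilter module is loaded and bridged traffic is handed to iptables (`net.bridge.bridge-nf-call-iptables=1`). Port names and addresses are hypothetical. Each rule ties an IP address to a bridge port, so netns9 can neither relay traffic with foreign source addresses nor receive IP packets addressed to other namespaces. iptables only sees IPv4 packets, so ARP is not affected by these rules:

```python
from typing import Dict, List


def physdev_rules(port9: str, ip9: str, members: Dict[str, str]) -> List[str]:
    """members maps a bridge port to the IP address of the namespace behind it."""
    base = "iptables -A FORWARD -m physdev"
    rules = []
    for port, ip in members.items():
        # netns9 -> group member: only with netns9's own source address
        rules.append(f"{base} --physdev-in {port9} --physdev-out {port} "
                     f"-s {ip9} -d {ip} -j ACCEPT")
        # group member -> netns9: only packets addressed to netns9 itself
        rules.append(f"{base} --physdev-in {port} --physdev-out {port9} "
                     f"-s {ip} -d {ip9} -j ACCEPT")
    # any other IPv4 traffic entering or leaving through netns9's port is dropped
    rules.append(f"{base} --physdev-in {port9} -j DROP")
    rules.append(f"{base} --physdev-out {port9} --physdev-is-bridged -j DROP")
    return rules


if __name__ == "__main__":
    members = {"veth31": "192.168.5.1", "veth32": "192.168.5.2",
               "veth34": "192.168.5.4", "veth35": "192.168.5.5"}
    for rule in physdev_rules("veth39", "192.168.5.9", members):
        print(rule)
```

Traffic inside a group never touches netns9's port, so it is not affected by the final DROP rules.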

Experiment 7: Separate the network groups by different IP address class
If we wanted a total separation of two groups, we would also separate them on L3, i.e. we would assign IP addresses of separate IP networks to the members of the different groups. Will transport across our bridges still work correctly under this condition? It should … However, netns9 will then run into a problem. We shall see that it could still communicate with both groups if we used subinterfaces for its veth interface and defined two routes for it.
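How two routes let netns9 pick the right subinterface can be modelled with Python's `ipaddress` module. The subnets and subinterface names are hypothetical. On a real system, assigning e.g. 192.168.5.9/24 to veth9.10 creates the matching link route automatically, and a route can also be added explicitly with `ip route add 192.168.6.0/24 dev veth9.20`. The sketch applies only the basic longest-prefix rule and ignores metrics and policy routing:

```python
import ipaddress
from typing import Optional

# Hypothetical routing table of netns9: one route per group subnet, each via
# a VLAN subinterface of netns9's veth device.
ROUTES = [
    (ipaddress.ip_network("192.168.5.0/24"), "veth9.10"),  # group "green"
    (ipaddress.ip_network("192.168.6.0/24"), "veth9.20"),  # group "rosa"
]


def out_device(dest: str) -> Optional[str]:
    """Pick the outgoing device by longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    if not matches:
        return None  # no route: the packet cannot be sent
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    for dest in ("192.168.5.1", "192.168.6.4", "10.0.0.1"):
        print(dest, out_device(dest))
```

Without the second route, packets for the “rosa” subnet would have no matching entry at all, which is the problem netns9 runs into.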

Experiment 8: Connection of container groups and the host to the Internet
Our containers/namespaces of group “green”, which are directly or indirectly attached to bridge brx, shall be able to reach the Internet. The host itself, too. Normally, you would administer the host via an administration network, to which the host would connect via a specific network card separate from the card used to connect the containers/namespaces to the Internet. However, what can we do if we only have exactly one Ethernet card available?

Then some extra care is required. There are several possible solutions for an isolation of the host’s traffic to the Internet from the rest of the system. I present one which makes use of what we have learned so far about VLAN tagging. We set up a namespace netns10 with a third bridge “brz“. We apply VLAN tagging in this namespace – inside the bridge, but also outside. Communication to the outside requires routing, too. Still, we need some firewall rules – including the interfaces of the bridge. The bridge can be interpreted as an IN/OUT interface plane to the firewall; there is of course only one firewall although the drawing indicates two sets of rules.
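The routing part in netns10 could be sketched as follows. The use of a VLAN subinterface on top of brz as the gateway for one group is an assumption about how tagging “inside the bridge, but also outside” might be realized, and the addresses, VID and device names are hypothetical. The sketch also assumes that the uplink device already has an address in the next hop's subnet:

```python
from typing import List


def brz_gateway_commands(pid10: int, vid: int, gw_addr: str,
                         uplink: str, next_hop: str) -> List[str]:
    ns10 = f"nsenter --target {pid10} --net"
    sub = f"brz.{vid}"
    return [
        f"{ns10} ip link set dev brz type bridge vlan_filtering 1",
        # let frames of this VID reach the bridge device itself, still tagged
        f"{ns10} bridge vlan add dev brz vid {vid} self",
        # L3 interface for this VID on top of the bridge device
        f"{ns10} ip link add link brz name {sub} type vlan id {vid}",
        f"{ns10} ip addr add {gw_addr} dev {sub}",
        f"{ns10} ip link set dev {sub} up",
        # forwarding is a per-namespace setting; enable it explicitly
        f"{ns10} sysctl -w net.ipv4.ip_forward=1",
        # everything not local goes towards the (simulated) Internet
        f"{ns10} ip route add default via {next_hop} dev {uplink}",
    ]


if __name__ == "__main__":
    print("\n".join(brz_gateway_commands(1010, 10, "192.168.5.254/24",
                                         "veth10", "10.10.10.1")))
```

The firewall rules mentioned above are not part of this sketch.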

“netns11” just represents the Internet with some routing. We can replace the Ethernet card drawn in netns10 by a veth interface to achieve a connection to netns11; the second interface inside netns11 then represents some host on the Internet. It can be simulated by a tap device. We can check how packets move to and from this “host”.
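A sketch of this simulated Internet, again as generated commands: a veth pair replaces the Ethernet card of netns10, and a tap device in netns11 plays the remote host. All names and addresses are hypothetical (203.0.113.0/24 is a documentation range). The return route assumes that netns10 does no NAT:

```python
from typing import List


def internet_commands(pid10: int, pid11: int) -> List[str]:
    ns10 = f"nsenter --target {pid10} --net"
    ns11 = f"nsenter --target {pid11} --net"
    return [
        # veth pair replacing the Ethernet card of netns10
        "ip link add veth10 type veth peer name veth11",
        f"ip link set dev veth10 netns {pid10}",
        f"ip link set dev veth11 netns {pid11}",
        f"{ns10} ip addr add 10.10.10.2/24 dev veth10",
        f"{ns10} ip link set dev veth10 up",
        f"{ns11} ip addr add 10.10.10.1/24 dev veth11",
        f"{ns11} ip link set dev veth11 up",
        # tap device as stand-in for "some host on the Internet"
        f"{ns11} ip tuntap add dev tap11 mode tap",
        f"{ns11} ip addr add 203.0.113.10/24 dev tap11",
        f"{ns11} ip link set dev tap11 up",
        # return path to the container subnet (assuming no NAT in netns10)
        f"{ns11} ip route add 192.168.5.0/24 via 10.10.10.2 dev veth11",
        # watch the packets arriving in the simulated Internet
        f"{ns11} tcpdump -n -i veth11",
    ]


if __name__ == "__main__":
    print("\n".join(internet_commands(1010, 1011)))
```

With tcpdump running in netns11, one can follow how packets from the containers arrive at and return from the “host” 203.0.113.10.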

Purely academic?

The scenarios discussed above seem to be complicated. Actually, they are not, as soon as we get used to the involved elements and rules. But still, the whole setup may seem a bit academic … However, if you think a bit about it, you may find that on a development system for web services you may have

two containers for frontend Apache systems with load balancing,
two containers for web service servers,
two or three containers for MySQL systems with different types of replication,
one container representing a user system,
one container to simulate OWASP and other attacks on the servers and the user client.

If you want to simulate attacks on a web-service system with such a configuration on one host only, you are not far from the scenario presented above. Modern PC systems (with a lot of memory) do have the capacity to host many containers, provided the load is limited.

Anyway, that is enough material for the coming blog posts … In these posts I shall present the commands to set up the above network. These commands can be used in a script which gets longer with each post. But we start with a simple example – see:

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – III