Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – VII

In the last articles of our excursion on network namespaces, veth-devices and virtual networking we studied virtual VLANs a bit. We saw that virtual VLANs can be defined just by applying certain configuration options to Linux bridge ports. In addition, virtual VLANs can be extended over several Linux bridges via veth sub-interfaces OR pure veth trunk connections. These possibilities already support a large variety of options for the configuration of virtual networks (e.g. for a bunch of containers). We discussed some simple illustrative test cases, in which containers were represented by simple network namespaces.

However, so far, four properties characterized our test configurations:

  • All network namespaces (or container hosts) connected to a Linux bridge belonged to exactly one of the involved VLANs.
  • All network namespaces (or container hosts) belonging to the involved VLANs were connected to a Linux bridge via ports which sent out untagged packets on egress from the bridge to the target namespaces and received untagged packets from the namespaces (or container hosts).
  • The VLANs (e.g. VLAN1, VLAN2) were completely defined by PVID/VID definition at Linux bridge ports, only. We eliminated in addition default PVID/VID values. Thus, the VLANs were completely isolated from each other: No host/namespace of a VLAN1 could communicate with a host/namespace belonging to a different VLAN2.
  • Different Linux bridges (which could reside on different hosts) were connected by (virtual or real) cables between trunk ports or sub-interface ports; the cables connecting the bridges transferred packets with different tags. We used this to keep up the isolation of the VLANs against each other even when we extended the VLANs over multiple bridges.

The third point may be good in the sense of security in many applications - but it is also restrictive. The first deficit may be that at least some hosts in a VLAN2 should be able to reach a certain server in VLAN1. This problem can be solved by establishing routing, forwarding and packet filtering outside the bridge. But there may be other requirements ....

New challenges

More interesting may be configurations

  • where you need to set up some containers/namespaces as common members of two or more VLANs
  • or in which you need to establish network namespaces for gathering network packets from different VLANs and organize a common communication with further networks via specific interfaces.

In future posts of this series, we, therefore, introduce additional network namespaces (representing LXC or Docker containers) to test examples for such configurations. These new namespaces should at least be able to communicate with member namespaces/hosts of different VLANs and transfer packets from multiple VLANs to other network namespaces or routers.

In the present post I walk through some basic considerations of such configurations. For this purpose we restrict the number of involved VLANs to 2 (VLAN1: green tags / VLAN2: pink tags). Each VLAN shall be represented by one example member network namespace (VLAN1: netns1 / VLAN2: netns2). In addition, we introduce a third network namespace netns3, which shall be connected to the VLANs and which should fulfill the following requirements:

  • Requirement 1: netns3 shall be able to receive packets from members of both VLANs and send packets to destination targets in both VLANs. I.e., netns3 must be able to communicate with member systems of both VLANs.
  • Requirement 2: netns3 shall, however, not become a packet forwarder between the VLANs; the VLANs shall remain separated despite the fact that they have a common communication partner netns3.

After all we have learned in this article series, we would, of course, try to establish the connection between members of VLAN1 (represented by netns1) and members of VLAN2 (netns2) to netns3 with the help of an intermediate network namespace netnsX. If required we would equip netnsX with a Linux bridge. Thus, the requirements lead to a typical

"3 point connection problem":
The two VLANs are connected to netnsX by 2 separate "connectors" (NICs or ports of a Linux bridge inside netnsX). A third "connector" attaches netns3 somehow. Schematically this is shown in the following graphics:

We associate VLAN1 with VLAN packet tags depicted in green color, VLAN2 with packet tags in pink. From "requirement 2" we conclude that we have to be careful with forwarding inside of BOTH netns3 AND netnsX.

Note:
We are not talking about reaching a member of VLAN2 from certain members of VLAN1. We shall touch this VLAN subject, too, but only as a side aspect. In the center of our analysis are instead network namespaces which can talk freely to members of two VLANs and which can receive and work with packets from two VLANs without destroying the communication isolation of members in VLAN1 against members in VLAN2.

What are real world applications for scenarios with network namespaces connected to two or more VLANs?

Two basic application scenarios are the following:

  • A common administrative network namespace - or container host - for systems in both VLANs. This namespace/container shall operate without allowing for traffic between the VLANs.
  • A system which transfers packets from/to systems in both VLANs via a router to/from the external world or the Internet - without allowing for traffic between the VLANs.

The challenge is to find virtual network configurations for such scenarios. To make it a bit more challenging we assume that both VLANs are defined for systems of the same IP network class. (There is no requirement that limits different VLANs to different IP classes. A VLAN can cover several IP class networks; on the other hand, two different VLANs can each have members of the same IP class.)

There are of course more application scenarios - but the two elementary ones named above cover most of the basic principles. We shall see that - depending on the solution approach - routing, packet filters and even forwarding must be addressed to realize the objectives of a certain scenario.

Ambiguities: Two different classes of packet transfer solutions

In netns3 we need to work with packets arriving from both VLANs. We also need to send back packets to destinations in both VLANs. But, there is a basic ambiguity related to the third connector and the connection line between netnsX and netns3. It is expressed by the following question:

Do we want to or can we afford to exchange tagged packets between netnsX and netns3?

This is not so trivial a question as it may seem to be! The answer depends on whether the network devices or applications inside netns3 know how to deal with and how to direct or transfer tagged packets.

In case we keep up VLAN tags until the inside of netns3 we must either provide a proper termination for the connection interface(s) or be able to pass tagged packets onward. If, however, netns3 does not know how to deal with tagged packets or if it makes no sense to keep up tagging we would rather send untagged packets from netnsX to netns3. One good reason why it may not make sense to keep up tagging could be that the tags would not survive a subsequent routing to the outside world anyway.

Thus we arrive at two rather different classes of connectivity solutions:

Let us first concentrate on termination solutions for tagged packets inside netns3 as depicted on the left side of the upper drawing:

As we have already seen in previous posts it is no problem to keep up tagging on the way from netns1 or netns2 to netns3. We know how to transfer tagged and untagged packets in and out of Linux bridges and thus we can be confident to find a suitable transfer solution based on a bridge inside netnsX. By the help of 2 sub-interfaces of e.g. a virtual veth device we could terminate the network transport properly inside netns3. So, it seems to be easy to make netns3 a member of both VLANs in this first class of connection approach. But, as we shall understand in a minute, we need a little more than just a bridge in netnsX and veth sub-interfaces to get a working configuration ....
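The termination by sub-interfaces could look roughly like the following sketch. It is not the test setup of this series: the device name veth33 follows the naming used further below in this post, while the VLAN IDs 10 and 20 and the small "run" helper are assumptions made for illustration. By default the script only prints the commands; with APPLY=1 it would execute them (as root, inside netns3).

```shell
#!/bin/sh
# Sketch only: VLAN termination inside netns3 via two sub-interfaces.
# Assumptions: veth33 is the veth end located in netns3; VLAN1 uses
# VID 10 and VLAN2 uses VID 20.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# usage: terminate_vlan <parent-interface> <vid>
terminate_vlan() {
    run ip link add link "$1" name "$1.$2" type vlan id "$2"
    run ip link set "$1.$2" up
}

run ip link set veth33 up
terminate_vlan veth33 10   # veth33.10 terminates VLAN1
terminate_vlan veth33 20   # veth33.20 terminates VLAN2
```

IP addresses and routes for the two sub-interfaces are deliberately left out here; they are a separate step.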

A really different situation arises if we needed a configuration as presented on the right side of the graphics. The challenge there is not so much the creation of untagged packets going out of netnsX but the path of VLAN-ignorant packets coming in e.g. from the external world through netns3 and heading for members of either VLAN. Such packets must somehow then be directed to the right VLAN according to the IP address of the target. Such a targeting problem typically requires some kind of routing. So, on first sight a Linux bridge does not seem to be of much help in netnsX as there is no routing on a level 2 device! But, actually, we shall find that a Linux bridge in netnsX can lead to a working solution for untagged packets from/to netns3 - but such a solution comes with a price.

Approaches with terminated VLAN connections in a common network namespace fit very well to the scenario of a common container host for the administration of systems in multiple VLANs. Solutions which use untagged packets entering and leaving netns3 instead fit very well to scenarios where multiple VLANs want to use a common connection (Ethernet card) or a common router to external networks.

Solutions which use packet tags and terminate VLAN traffic inside a common member of multiple VLANs

Let us assume that netns3 shall represent a host for the administration of netns1 in VLAN 1 (green) and netns2 in VLAN 2 (pink). Let us decide to keep up tagging all along the way from netns1 or netns2 to netns3. From the previous examples in this blog post series the following approaches for a netnsX-bridge-configuration look very plausible:

However, if you only configured the bridge, its ports and the veth devices properly and eventually tried pinging from netns1 to netns3 you would fail. (There are articles and questions on the Internet describing problems with such situations...). So, what is missing? The answer is as simple as it is instructive:

VLANs define a closed broadcast environment on TCP/IP network level 2. Why are broadcasts so important? Because we need a working ARP protocol to connect level 2 to level 3, and ARP sends broadcast requests for the MAC address of a target which has a given IP address AND which, hopefully, is a member of the VLAN.

With a proper bridge port configuration such an ARP request packet would travel all along from netns1 to netns3. BUT:
The real challenge is the way back of ARP answering packets - such answering packets must reach their targets before any other communication on level 3 can start to work properly. As we only are in the middle of an initial ARP communication: How can netns3 know where to direct the ARP answering packets to if there are two possible paths back? Without help it cannot. So, the proper answer is:

We need to establish routes inside netns3 when we keep up the separation of the VLANs up to 2 different termination points inside netns3. These routes for outgoing packets must assign IP-targets located in each of the VLANs to one of the 2 network interfaces (termination points) inside netns3 in a unique way.

This is a trivial point, but often enough people forget this type of routing. Note in addition:
If the different VLANs have members with an IP of one and the same IP class, then you do not differentiate routes in the sense of "network class <=> interface" but in the sense "host IP <=> interface"; such routes must be defined for all members of each VLAN. I shall give examples for corresponding commands in my next blog post of this series.
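A minimal sketch of such "host IP <=> interface" routes, assuming the iproute2 "ip" command: the member list (addresses and interface names) is illustrative only, and the "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set (root inside netns3 would be required then).

```shell
#!/bin/sh
# Sketch only: one host route per VLAN member, bound to the sub-interface
# that terminates the member's VLAN inside netns3.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# "<member IP> <interface>" pairs; addresses and names are assumptions.
MEMBERS="192.168.5.1 veth33.10
192.168.5.4 veth33.20"

plan_host_routes() {
    printf '%s\n' "$MEMBERS" | while read -r ip dev; do
        [ -n "$ip" ] || continue
        # A /32 prefix makes this a route to exactly one host.
        run ip route add "$ip/32" dev "$dev"
    done
}

plan_host_routes
```

Every further member of a VLAN needs its own line in the list, which is why this approach suits small, statically addressed setups best.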

Forwarding?

As we talk of routing: Do we need forwarding, too? Answer: No, not as long as netns3 is the final target or the origin of packet transport in a given application scenario. Why is this important? Because routing between interfaces connected to bridge ports of different VLANs would establish a communication connection between otherwise separated VLANs.

To enable packets to cross VLAN borders we either have to destroy the separation already on a bridge port level OR we must allow for routing and forwarding between NICs which are located outside the bridge but which are connected to ports of the bridge. E.g., let us assume that the sub-interfaces in netns3 are named veth33.10 (VLAN1 termination) and veth33.20 (VLAN2 termination). If we had not just set up routes like

route add -host 192.168.5.1 dev veth33.10
route add -host 192.168.5.4 dev veth33.20

but in addition had enabled forwarding with

echo 1 > /proc/sys/net/ipv4/conf/all/forwarding

inside netns3 we would have established a communication line between our two VLANs. Fortunately, in many cases, forwarding is not required in a common member of two VLANs. Most often only route definitions are necessary. In particular, we can set up a host which must perform administrative tasks in both VLANs without creating an open communication line between the VLANs. However, we would have to trust the administrator of netns3 not to enable forwarding. Personally, I would not rely on this; it is more secure to establish port and IP related packet filtering on the bridge inside netnsX. Especially rules in the sense:

Only packets for a certain IP address are allowed to leave the Linux bridge (which establishes the VLANs) across a certain egress port to a certain VLAN member.

Such rules for bridge ports can be set up e.g. with special iptables commands for bridged packets.
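One way to express such a rule is the "physdev" match of iptables, which can refer to the bridge port a bridged packet leaves through. The following is only a sketch under several assumptions: the port names veth11 and veth22 and the member addresses are invented for illustration, the br_netfilter module must be available so that bridged IPv4 packets pass iptables at all, and the "run" helper merely prints the commands unless APPLY=1 is set (root inside netnsX required then).

```shell
#!/bin/sh
# Sketch only: let a bridge egress port pass IPv4 packets for one target IP.
# Assumptions: veth11 / veth22 are the bridge ports towards netns1 / netns2.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# usage: allow_only <bridge-port> <member-ip>
allow_only() {
    # Drop every bridged packet leaving via port $1 whose destination is not $2.
    run iptables -A FORWARD -m physdev --physdev-is-bridged \
        --physdev-out "$1" ! -d "$2" -j DROP
}

# Make bridged IPv4 traffic visible to iptables.
run modprobe br_netfilter
run sysctl -w net.bridge.bridge-nf-call-iptables=1

allow_only veth11 192.168.5.1
allow_only veth22 192.168.5.4
```

Note that iptables only sees IPv4 packets; ARP frames would have to be restricted by other means (e.g. ebtables or arptables).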

Intermediate conclusions for solutions with VLAN termination in a common network namespace

We summarize the results of our theoretical discussion for the first class of solutions:

  • VLAN termination inside a network namespace (or container host), which shall become a common member of several VLANs, can easily be achieved with sub-interfaces of a veth device. The other interface of the veth pair can be attached by sub-interfaces OR as a pure trunk port to a Linux bridge which is connected to the different VLANs or which establishes the VLANs itself by proper port configurations.
  • If we terminate VLANs inside a network namespace or container host, which shall become a member of two or more VLANs, then we need to define proper routes to IP targets behind each of the different VLAN related interfaces. However, we do NOT need to enable forwarding in this namespace or container host.

A three point netnsX solution without packet tagging, but with forwarding to a common target network namespace

Now, let us consider solutions of the second class indicated above. If you think about it a bit you may come up with the following basic and simple approach regarding netnsX and netns3:

This solution is solid in the sense that it works on network level 3 and that it makes use of standard routing and forwarding. The required VLAN tagging at each of the lower connection points in netnsX can be achieved by a properly configured sub-interface of a veth device interface. We do not employ any bridge services in netnsX in this approach; packet distribution to VLAN members must be handled in other network namespaces behind the VLAN connection points in netnsX. (We know already how to do this ...).

This simple solution, however, has its price:

We need to enable forwarding for the transfer of packets from the VLAN connection interfaces (attaching e.g. netns1 and netns2 to netnsX) to the interface attaching netns3 to netnsX. But, unfortunately, this creates a communication line between VLAN1 and VLAN2, too! To compensate for this we must set up a packet filter, with rules disallowing packets to travel between the VLAN connection points inside netnsX. Furthermore, packets coming via/from netns3 shall only be allowed to pass through exactly one of the lower VLAN interfaces in netnsX if and when the target IP fits to a membership in the VLAN behind the NIC.
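A rough sketch of such a filter with plain iptables rules in the FORWARD chain of netnsX follows. All interface names (veth1x.10 and veth2x.20 for the VLAN connection points, veth3x towards netns3) and the member addresses are assumptions for illustration; routes and ARP handling in netnsX are not covered. The "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: forwarding in netnsX without opening a path between the VLANs.
# Assumptions: veth1x.10 -> VLAN1, veth2x.20 -> VLAN2, veth3x -> netns3.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

plan_netnsx_filter() {
    run sysctl -w net.ipv4.ip_forward=1
    # No packets between the two VLAN connection points.
    run iptables -A FORWARD -i veth1x.10 -o veth2x.20 -j DROP
    run iptables -A FORWARD -i veth2x.20 -o veth1x.10 -j DROP
    # Packets from netns3 may leave via a VLAN interface only if the
    # target IP belongs to a member of the VLAN behind it.
    run iptables -A FORWARD -i veth3x -o veth1x.10 -d 192.168.5.1 -j ACCEPT
    run iptables -A FORWARD -i veth3x -o veth2x.20 -d 192.168.5.4 -j ACCEPT
    run iptables -A FORWARD -i veth3x -j DROP
}

plan_netnsx_filter
```

The rules leave traffic from the VLANs towards netns3 untouched; a stricter setup would also set the default policy of the FORWARD chain to DROP.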

There is, by the way, a second price we have to pay in such a router-like solution for the connection of VLANs to an outside world without tags:

Level 3 routing costs a bit more computational time than packet transport on level 2.

But, if you (for whatever reason) can only provide one working Ethernet interface to the outside world, it is a small price to pay!

Intermediate result:

An intermediate virtual network namespace (or virtual host) netnsX with conventional routing/forwarding AND appropriate packet filter rules on a firewall can be used to control the communication of members of two or more VLANs to the outside world via a third (common) interface attached to netnsX. We do not need to care for VLAN tags beyond this third interface as VLAN tags do not survive forwarding. Further routing, forwarding and required NAT configurations with respect to the Internet can afterward be done inside yet another virtual namespace "netns3" (with a bridge and an attached real Ethernet card) or even beyond netns3 in an external physical router.

A three point netnsX solution without packet tagging - but based on a Linux bridge

Now, let us consider how a Linux bridge in netnsX could transfer packets even if we do not tag packets on their way between the bridge and netns3, i.e. if we want to connect two VLANs to a VLAN-ignorant network namespace netns3 and a VLAN-indifferent world beyond netns3. What is the problem with a configuration as indicated on the right side of the picture on different solution classes?

A port to netns3 which shall emit untagged packets from a VLAN-aware Linux bridge must be configured such

  • that it accepts tagged packets from both VLAN1 and VLAN2 on egress; i.e. we must apply two VID settings (for green and pink tagged packets).
  • that it sends out packets on egress untagged; i.e. we must configure the port with the flag "untagged".

But VID settings also filter and drop incoming "ingress" packets at a port! E.g. untagged packets from netns3 are dropped on their way into the Linux bridge. See the post Fun with ... – IV for related rules on Linux bridge ports. This is a major problem:

Firstly, because we cannot send any ARP broadcast requests from netns3 to netns1 or netns2. And, equally bad, netns3 cannot answer to any ARP requests which it may receive from members of VLAN1 or VLAN2:

ARP broadcast requests from e.g. netns1 will pass the bridge port to netns3 and arrive there untagged. However, untagged ARP answer packets will not be allowed to enter the bridge at the port for netns3 because they do not fit to the VID settings at this port.

But, can't we use PVID settings? Hmm, remember: Only one PVID setting is allowed at a port! But in our case ARP broadcast and answering packets must be able to reach members of both VLANs! Are we stuck, then? No, a working solution is the following:

In the drawing above we have indicated PVID settings by squares with dotted, colored borders and VID settings by squares with solid borders. The configuration may look strange, but it eliminates the obstacles for ARP packet exchange! And it allows for packet transfer from netns3 to both VLANs.

Actually, the "blue" PVID/VID setting reflects the default PVID/VID settings (VID=1; PVID=1) which come up whenever we create a port in a VLAN-aware bridge! Up to now, we have always deleted these default values to guarantee a complete VLAN isolation; but you may already have wondered why this default setting takes place at all. Now you have a reason.
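One plausible reading of these PVID/VID settings, expressed with the "bridge vlan" command of iproute2, is sketched below. The port names (veth1b, veth2b, veth3b for the bridge ports towards netns1, netns2 and netns3) and the VLAN IDs 10 and 20 are assumptions; only VID 1 as the "blue" default stems from the text. The "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set (root inside netnsX required then).

```shell
#!/bin/sh
# Sketch only: bridge port settings for an untagged connection to netns3.
# Assumptions: VLAN1 = VID 10, VLAN2 = VID 20, "blue" default = VID 1.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

plan_untagged_port() {
    # Port to netns3: untagged ingress packets get PVID 1; packets of
    # VLAN 10 and VLAN 20 may leave here and are untagged on egress.
    run bridge vlan add dev veth3b vid 1 pvid untagged
    run bridge vlan add dev veth3b vid 10 untagged
    run bridge vlan add dev veth3b vid 20 untagged
    # Ports to the VLAN members: own PVID on ingress, and in addition
    # VID 1 on egress so that packets coming from netns3 may leave.
    run bridge vlan add dev veth1b vid 10 pvid untagged
    run bridge vlan add dev veth1b vid 1 untagged
    run bridge vlan add dev veth2b vid 20 pvid untagged
    run bridge vlan add dev veth2b vid 1 untagged
}

plan_untagged_port
```

With APPLY=1 the result could be inspected with "bridge vlan show" inside netnsX.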

If you, in addition, take into account that a Linux bridge learns about port-MAC relations and that it - under normal conditions - forwards or filters packets during bridge internal forwarding between ports

  • according to MAC addresses located behind a port
  • AND tags matching VID values at a port,

you may rightfully assume that packets cannot move from VLAN1 to VLAN2 or vice versa under normal operation conditions. We shall test this in an example scenario in one of the coming blog posts.

HOWEVER ....virtual networks with level 2 bridges are endangered areas. The PVID/VID settings of our present bridge based approach weaken the separation between the VLANs significantly.

Security aspects

For all configurations discussed above, we must be careful with netns3: netns3 is in an excellent position to potentially transfer packets between VLAN1 and VLAN2 - either by direct forwarding/routing in some of the above scenarios or by capturing, manipulating and re-directing packets. Secondly, netns3 is in an excellent position for man-in-the-middle-attacks

  • regarding traffic between members of either VLAN
  • or regarding traffic between the VLANs and the outside world beyond netns3.

netns3 can capture, manipulate and redirect any packets passing it. As administrators we should, therefore, have full control over netns3.

In addition: If you ever worked on defense measures against bridge related attack vectors you know

  • that a Linux bridge can be forced into a HUB mode if flooded with wrong or disagreeing MAC information.
  • that man-in-the-middle-attacks are possible by flooding hosts attached to bridges with wrong MAC-IP-information; this leads to manipulated ARP tables at the attacked targets.

These points lead to potential risks especially in the last bridge based solution to our three point problem. Reason: The "blue" PVID/VID settings there eliminate the previously strict separation of the two VLANs for packets which come from netns3 and enter the bridge at a related port. We rely completely on correct entries in the bridge's MAC/port relation table for a safe VLAN separation.

But the bridge could be manipulated from any of the attached container hosts into a HUB mode. This in turn would e.g. allow a member of VLAN1 to see (e.g. answering) packets, which arrive from netns3 (or an origin located beyond netns3) and which are targeted to a member of VLAN2. Such packets may carry enough information for opening other attack vectors.

So, a fundamental conclusion of our discussion is the following:

It is essential that you apply packet filter rules on bridge based solutions that prevent packets from reaching targets (containers) with the wrong IP/MAC-relation at egress ports! Such rules can be applied to bridge ports by the various means of Linux netfilter tools.

On a host level this may be a task which becomes relatively difficult if you apply flexible DHCP-based IP assignments to members of the VLANs. But, if you need to choose between flexibility and full control over which attached namespace/container gets which IP (and MAC), and your virtual networks are not too big: go for control - e.g. via setup scripts.

Summary and outlook

Theoretically, there are several possibilities to establish virtual communication lines from a network namespace or container to members of multiple virtual VLANs. Solutions with tagged packet transfer require a proper termination inside the common member namespace and the definition of routes. As long as we do not enable forwarding outside the VLAN establishing Linux bridge the VLANs remain separated. Solutions where packets are transferred untagged from the VLANs to a target network namespace require special PVID/VID settings at the bridge port to enable a bidirectional communication. These settings weaken the VLAN separation and underline the importance of packet filter rules on the Linux bridge and for the various bridge ports.

In the next post of this series we will look at commands for setting up a test environment for 2 VLANs with a common communication target. And we will test the considerations discussed above.

In the meantime : Happy New Year - and stay tuned for more adventures with Linux, Linux virtual bridges and network namespaces ...

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – II

The topics of this blog post series are

  • the basic handling of network namespaces
  • and virtual networking between different network namespaces.

One objective is a better understanding of the mechanisms behind the setup for future (LXC) containers on a host; containers are based on namespaces (see the last post of this series for a mini introduction). The most important Linux namespace for networking is the so called "network namespace".

As explained in the previous article
Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – I
it is interesting and worthwhile to perform network experiments without referring to explicit names for network namespaces. Especially, when you plan to administer LXC containers with libvirt/virt-manager. You then cannot use the standard LXC tools or "ip"-options for explicit network namespace names.

We, therefore, had a look at relevant options for the ip-command and other typical userspace tools. The basic trick was/is to refer to PIDs of the processes originally associated with network namespaces. I discussed commands for listing network namespaces and associated processes. In addition, I showed how one can use shells for entering new or existing unnamed network namespaces. We finalized the first post with the creation of a veth device inside a distinct network namespace.
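The PID-based handling of unnamed network namespaces can be sketched with the util-linux tools "unshare", "lsns" and "nsenter". This is a minimal illustration and not necessarily the exact command set of the first post; the PID 1001 is a placeholder, and the "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set (root required then).

```shell
#!/bin/sh
# Sketch only: create, list and enter unnamed network namespaces via PIDs.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# Start an interactive shell in new, unnamed network and UTS namespaces.
new_netns_shell() { run unshare --net --uts /bin/bash; }

# List network namespaces together with the PIDs of processes holding them.
list_netns() { run lsns -t net; }

# usage: in_netns <pid> <command...>
# Run a command inside the network and UTS namespaces of process <pid>.
in_netns() {
    pid=$1; shift
    run nsenter -t "$pid" -n -u "$@"
}

new_netns_shell
list_netns
in_netns 1001 ip link show   # 1001 is a placeholder PID
```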

Advanced experiments - communication scenarios between network namespaces and groups of namespaces (or containers)

Regarding networking a container is represented by a network namespace, associated network devices and rules. A network namespace provides an isolation of the network devices assigned to this namespace plus related packet filter and routing rules from/against other namespaces/containers.

But very often you may have to deal not only with one container on a host but a whole bunch of containers. Therefore, another objective of experiments with network namespaces is

  • to study the setup of network communication lines between different containers - i.e. between different network namespaces -
  • and to study mechanisms for the isolation of the network packet flow between certain containers/namespaces against packets and from communication lines of other containers/namespaces or/and the host.

The second point may appear strange at first sight: Didn't we learn that the fundamental purpose of (network) namespaces already is isolation? Yes, the isolation of devices, but not the isolation of network packets crossing the network namespace borders. In realistic situations we, in addition, need to establish and at the same time isolate communication paths in between different containers/namespaces and to their environment.

Typically, we have to address a grouping of containers/namespaces in this context:

  • Different containers on one or several hosts should be able to talk to each other and the Internet - but only if these containers are members of a defined group.
  • At the same time we may need to isolate the communication occurring within a group of containers/namespaces against the communication flow of containers/namespaces of another group and against communication lines of the host.
  • Still, we may need to allow namespaces/containers of different groups to use a common NIC to the Internet despite an otherwise isolated operation.

All this requires a confinement of the flow of distinguished network packets along certain paths between network namespaces. Thus, the question comes up how to achieve separated virtual communication circuits between network namespaces already on the L2 level and across possibly involved virtual devices.

Veth devices, VLAN aware Linux bridges (or other types of virtual Linux switches) and VLAN tagging play a key role in simple (virtual) infrastructure approaches to such challenges. Packet filter rules of Linux' netfilter components additionally support the control of packet flow through such (virtual) infrastructure elements. Note:

The nice thing about network namespaces is that we can study all required basic networking principles easily without setting up LXC containers.

Test scenarios - an overview

I want to outline a collection of interesting scenarios for establishing and isolating communication paths between namespaces/containers. We start with a basic communication line between 2 different network namespaces. By creating more namespaces and veth devices, a VLAN aware bridge and VLAN rules we extend the test scenario's complexity step by step to cover the questions posed above. See the graphics below.

Everything actually happens on one host. But the elements of the lower part (below the horizontal black line) also could be placed on a different host. The RJ45 symbols represent Ethernet interfaces of veth devices. These interfaces, therefore, appear mostly in pairs (as long as we do not define sub interfaces). The colors represent IDs (VIDs) of VLANs. Three standard Linux bridges are involved; on each of these bridges we shall activate VLAN filtering. We shall learn that we can, but do not need to tag packets outside of VLAN filtering bridges - with a few interesting exceptions.

I suggest 10 experiments to perform within the drawn virtual network. We cannot discuss details of all experiments in one blog post; but in the coming posts we shall walk through this graphics in several steps from the top to the bottom and from the left to the right. Each step will be accompanied by experiments.

I use the abbreviation "netns" for "network namespace" below. Note in advance that the processes (shells) underlying the creation of network namespaces in our experiments always establish "uts" namespaces, too. Thus we can assign different hostnames to the basic shell processes - this helps us to distinguish in which network namespace shell we operate by just looking at the prompt of a shell. All the "names" as netns1, netns2, ... appearing in the examples below actually are hostnames - and not real network namespace names in the sense of "ip" commands or LXC tools.

I should remark that I did the experiments below not just for fun, but because the use of VLAN tags in environments with Linux bridges is discussed in many Internet articles in a way which I find confusing and misleading. This is partially due to the fact that the extensions of the Linux kernel for VLAN definitions with the help of Linux bridges have reached a stable status only with kernel 3.9 (as far as I know). So many articles before 2014 present ideas which do not fit to the present options. Still, even today, you stumble across discussions which claim that you either do VLANs or bridging - but not both - and if, then only with different bridges for different VLANs. I personally think that today the only reason for such approaches would be performance - but not a strict separation of technologies.

Experiments

I hope the following experiments will provide readers some learning effects and also some fun with veth devices and bridges:

Experiment 1: Connect two namespaces directly
First we shall place the two different Ethernet interfaces of a veth device in two different (unnamed) network namespaces (with hostnames) netns1 and netns2. We assign IP addresses (of the same network class) to the interfaces and check a basic communication between the network namespaces. Simple and effective!
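A minimal sketch of this first experiment, assuming the shells of netns1 and netns2 are known by their PIDs: the interface names veth11/veth22 and the addresses are chosen for illustration only. The "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set (root required then).

```shell
#!/bin/sh
# Sketch only: a veth pair directly connecting two unnamed namespaces.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# usage: plan_exp1 <pid-of-netns1-shell> <pid-of-netns2-shell>
plan_exp1() {
    run ip link add veth11 type veth peer name veth22
    # Move each end into the namespace of the respective process.
    run ip link set veth11 netns "$1"
    run ip link set veth22 netns "$2"
    run nsenter -t "$1" -n ip addr add 192.168.5.1/24 dev veth11
    run nsenter -t "$1" -n ip link set veth11 up
    run nsenter -t "$2" -n ip addr add 192.168.5.2/24 dev veth22
    run nsenter -t "$2" -n ip link set veth22 up
    # Check the connection from netns1 to netns2.
    run nsenter -t "$1" -n ping -c 2 192.168.5.2
}

plan_exp1 1001 1002   # placeholder PIDs
```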

Experiment 2: Connect two namespaces via a bridge in a third namespace
Afterwards we instead connect our two different network namespaces netns1 and netns2 via a Linux bridge "brx" in a third namespace netns3. Note: We would use a separate 3rd namespace also in a scenario with containers to get the bridge and related firewall and VLAN rules outside the control of the containers. In addition such a separate namespace helps to isolate the host against any communication (and possible attacks) coming from the containers.
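The bridge part of this experiment might be sketched as follows. The veth names (veth11/veth13 and veth22/veth23) are assumptions; "brx" is the bridge name used in the text. Addresses and the "up" state of veth11 and veth22 inside netns1 and netns2 are left out. The "run" helper is a hypothetical dry-run wrapper which prints the commands unless APPLY=1 is set (root required then).

```shell
#!/bin/sh
# Sketch only: connect netns1 and netns2 via bridge brx located in netns3.

# Dry-run helper (hypothetical): print the command unless APPLY=1.
run() { if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# usage: plan_exp2 <pid-netns1> <pid-netns2> <pid-netns3>
plan_exp2() {
    ns3="nsenter -t $3 -n"
    # One veth pair per namespace: one end for the namespace itself,
    # the other end becomes a bridge port in netns3.
    run ip link add veth11 type veth peer name veth13
    run ip link add veth22 type veth peer name veth23
    run ip link set veth11 netns "$1"
    run ip link set veth22 netns "$2"
    run ip link set veth13 netns "$3"
    run ip link set veth23 netns "$3"
    run $ns3 ip link add brx type bridge
    for port in veth13 veth23; do
        run $ns3 ip link set "$port" master brx
        run $ns3 ip link set "$port" up
    done
    run $ns3 ip link set brx up
}

plan_exp2 1001 1002 1003   # placeholder PIDs
```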

Experiment 3: Establish isolated groups of containers
We set up two additional network namespaces (netns4, netns5). We check communication between all four namespaces attached to brx. Then we put netns1 and netns2 into a group ("green") - and netns4 and netns5 into another group ("rosa"). Communication between member namespaces of a group shall be allowed - but not between namespaces of different groups, despite the fact that all namespaces are part of the same IP address class! We achieve this on the L2 level by assigning VLAN IDs (VIDs) to the bridge ports to which we attach netns1, netns2, netns4 and netns5.

We shall see how "PVIDs" are assigned to a specific port for tagging packets that move into the bridge through this port and how we untag outgoing packets at the very same port. Conclusion: So far, no tagging is required outside the Linux bridge brx for building simple virtual VLANs!
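The port logic described above can be illustrated with a small toy model. It is only a sketch of the PVID/VID idea, not the kernel's bridge code; the VID values 10 and 20 are assumptions chosen for illustration.

```python
# Hypothetical VIDs: 10 = "green", 20 = "rosa". Each port: (PVID, set of egress VIDs).
PORTS = {
    "netns1": (10, {10}),
    "netns2": (10, {10}),
    "netns4": (20, {20}),
    "netns5": (20, {20}),
}

def egress_ports(ports, src):
    """Ports through which an untagged frame entering at `src` may leave the bridge."""
    tag, _ = ports[src]  # ingress: the frame is tagged with the PVID of its port
    # egress: only ports which carry this VID let the frame out (untagged again)
    return sorted(name for name, (_, vids) in ports.items()
                  if name != src and tag in vids)

print(egress_ports(PORTS, "netns1"))  # ['netns2']
print(egress_ports(PORTS, "netns4"))  # ['netns5']
```

In this model the tag exists only between the ingress and the egress port, which is why nothing outside the bridge needs to know about VLANs.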

Experiment 4: Tagging outside the bridge?
Although not required, we repeat the last experiment with defined subinterfaces of two veth devices (used for netns2 and netns5) - just to check that packet tagging occurs correctly outside the bridge. This is done in preparation for other experiments. But for the isolation of VLAN communication paths inside the bridge only the tagging of packets coming into the bridge through a port is relevant: A packet coming from outside is first untagged and then retagged when moving into the bridge. The reverse untagging and retagging for outgoing packets is done correctly, too - but the tag "color" outside the bridge actually plays no role for the filtered communication paths inside the bridge.

Experiment 5: Connection to a second independent environment - with keeping up namespace grouping
In reality we may have situations in which some containers of a defined group will be placed on different hosts. Can we extend the concept of separating container/namespace groups by VLAN tagging to different hosts via two bridges? Bridge brx on the first host and a new bridge bry on the second (netns8)? Yes, we can!

In reality we would connect two hosts by Ethernet cards. We simulate this situation in our virtual environment again with a veth interface pair between "netns3" and "netns8". But as we absolutely do not want to mix packets of our two groups we now need to tag the packets on their way between the bridges. We shall see how to use subinterfaces of the (veth) Ethernet interfaces to achieve this. Note that the two resulting communication paths between the bridges may potentially lead to loops! We shall deal with this problem, too.

Experiment 6: Two tags on a bridge port? Members of two groups?
Now, we could have containers (namespaces) that should be able to communicate with both groups. Then we would need 2 VIDs on a bridge port for this special container/namespace. We establish netns9 for this test. We shall see that it is no problem to assign two VIDs to a port to filter the differently tagged packets going from the bridge outwards. Nevertheless we run into problems - not because of the assignment of 2 VIDs, but due to the fact that we can only assign one PVID to each bridge port. This limits our possibilities to tag incoming packets: If we choose the PVID value from the VIDs already defined on other ports, incoming packets can reach only one of the existing groups, not both.

We have to solve this by defining new additional paths inside the bridge for packets coming in through the port for netns9: We assign a PVID to this port, which is different from all VIDs defined so far. Then we assign additional VIDs with the value of this new PVID to the ports of the members of our existing groups. An interesting question then is: Are the groups still isolated? Is pinging interrupted? And how to stop man-in-the-middle-attacks of netns9?
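A minimal sketch of this extra path, again as a toy model and not as real bridge code. The VID values (10, 20 and the new PVID 30 for netns9) are assumptions for illustration only.

```python
# Hypothetical VIDs: 10 = "green", 20 = "rosa", 30 = new PVID reserved for netns9.
PORTS = {
    "netns1": {"pvid": 10, "vids": {10, 30}},
    "netns2": {"pvid": 10, "vids": {10, 30}},
    "netns4": {"pvid": 20, "vids": {20, 30}},
    "netns5": {"pvid": 20, "vids": {20, 30}},
    "netns9": {"pvid": 30, "vids": {10, 20}},
}

def can_send(src, dst, ports=PORTS):
    """True if a frame tagged with src's PVID passes the egress filter of dst's port."""
    return ports[src]["pvid"] in ports[dst]["vids"]

members = ("netns1", "netns2", "netns4", "netns5")
# netns9 exchanges frames with both groups in both directions ...
print(all(can_send("netns9", n) and can_send(n, "netns9") for n in members))
# ... while green and rosa still cannot exchange frames directly.
print(can_send("netns1", "netns4"), can_send("netns5", "netns2"))
```

On the tag level the groups stay separated in this model. What the model cannot show is netns9 relaying traffic between the groups on higher layers, which is the reason for the firewall rules discussed next.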

The answer lies in some firewall rules which must be established on the bridge! In case we use iptables (instead of the better suited ebtables) these rules MUST refer to the ports of the bridge via physdev options and IP addresses. However, ARP packets coming from netns9 should pass to all interfaces of members of our groups.

Experiment 7: Separate the network groups by different IP networks
If we wanted a total separation of two groups we would also separate them on L3 - i.e. we would assign IP addresses of separate IP networks to the members of the different groups. Will transport across our bridges still work correctly under this condition? It should .... However, netns9 will get a problem then. We shall see that it could still communicate with both groups if we used subinterfaces for its veth interface - and defined two routes for it.

Experiment 8: Connection of container groups and the host to the Internet
Our containers/namespaces of group "green", which are directly or indirectly attached to bridge brx, shall be able to reach the Internet. The host itself, too. Normally, you would administer the host via an administration network, to which the host would connect via a specific network card separate from the card used to connect the containers/namespaces to the Internet. However, what can we do if we only have exactly one Ethernet card available?

Then some extra care is required. There are several possible solutions for an isolation of the host's traffic to the Internet from the rest of the system. I present one which makes use of what we have learned so far about VLAN tagging. We set up a namespace netns10 with a third bridge "brz". We apply VLAN tagging in this namespace - inside the bridge, but also outside. Communication to the outside requires routing, too. Still, we need some firewall rules - including the interfaces of the bridge. The bridge can be interpreted as an IN/OUT interface plane to the firewall; there is of course only one firewall although the drawing indicates two sets of rules.

"netns11" just represents the Internet with some routing. We can replace the Ethernet card drawn in netns10 by a veth interface to achieve a connection to netns11; the second interface inside netns11 then represents some host on the Internet. It can be simulated by a tap device. We can check, how signals move to and from this "host".

Purely academic?

The scenarios discussed above seem to be complicated. Actually, they are not as soon as we get used to the involved elements and rules. But, still the whole setup may seem a bit academic ... However, if you think a bit about it, you may find that on a development system for web services you may have

  • two containers for frontend apache systems with load balancing,
  • two containers for web service servers,
  • two or three containers for MySQL systems with different types of replication,
  • one container representing a user system,
  • one container to simulate OWASP and other attacks on the servers and the user client.

If you want to simulate attacks on a web-service system with such a configuration on one host only, you are not so far from the scenario presented. Modern PC-systems (with a lot of memory) do have the capacity to host a lot of containers - if the load is limited.

Anyway, enough stuff for the coming blog posts ... During the posts I shall present the commands to set up the above network. These commands can be used in a script which gets longer with each post. But we start with a simple example - see:

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – III

Linux bridges – can iptables be used against MiM attacks based on ARP spoofing ? – II

In the last article
Linux bridges – can iptables be used against MiM attacks based on ARP spoofing ? – I
we saw that iptables rules with options like

-m physdev --physdev-in/out device

may help in addition to other netfilter tools, which work on network level 2, to block redirected traffic to a "man in the middle system" on a bridge.

Tools like FWbuilder support the creation of such "physdev"-related rules as soon as bridge devices are marked as bridged in the interface definition process for the firewall host. However, we have also seen that we need to bind IP addresses to certain bridge ports. This in turn requires knowledge about a predictable IP-to-port-configuration. Such a requirement may be an obstacle for using iptables in scenarios with many virtual guests on one or several Linux bridges of a virtualization host as it reduces flexibility for automated IP address assignment.

Before we discuss administrative aspects in a further article, let us expand our iptables rules to a more complex situation:
In this article we discuss a scenario with 2 linked Linux bridges "virbr4" and "virbr6" plus the host attached to "virbr4". This provides us with a virtual infrastructure for which we need to construct a more complex, but more general set of rules in comparison to what we discussed in the last article. We will look at the required rules and their order. Testing of the rules will be done in a forthcoming article.

Two coupled bridges and the host attached via veth devices

You see my virtual bridge setup in the following drawing.

(Note for those who read the article before: I have exchanged the picture to make it consistent with a forthcoming article. The port for kali2 has been renamed to "vk42").

bridge3

The small blue rectangles inside the bridges symbolize standard Linux tap devices - whereas the RJ45 like rectangles symbolize veth devices. veth pairs provide a convenient way on a Linux system to link bridges and to attach the host to them in a controllable way. As a side effect one can avoid assigning an IP address to the bridge itself. See:
Fun with veth devices, Linux virtual bridges, KVM, VMware – attach the host and connect bridges via veth

In the drawing you recognize our bridge "virbr6" and its guests from the last article. The new bridge "virbr4" is only equipped with one guest (kali2); this is sufficient for our test case purposes. Of course, you could have many more guests there in realistic scenarios. Note that attaching certain groups of guests to distinct bridges also occurs in physical reality for a variety of reasons.

Two types of ports

For the rest of this article we call ports such as vethb2 on virbr6 as well as vethb1 and vmh1 on virbr4 "border ports" of their respective bridges. Such border ports

  • connect a bridge to another bridge,
  • connect a bridge to the virtualization host,
  • or connect the bridge to hosts on external real Ethernet segments.

We remind the reader that it always is the perspective of the bridge that decides about the INcoming or OUTgoing direction of an Ethernet packet via a specific port when we define IN/OUT iptables rules.

Therefore, packets crossing a border port in the IN direction always come from outside the bridge. Packets leaving the port OUTwards may however come from guests of the bridge itself AND from guests outside other border ports of the very same bridge.

In contrast to border ports we shall call a port of a bridge with just one defined guest behind it a "guest port". [In our test case the bridge connection of guests is realized with tap devices because this is required by KVM. In the case of LXC and docker containers we would rather see veth-pairs.]

Multiple bridges on one host - how are the iptables rules probed?

Just from looking at the sketch we see a logical conundrum, which has a significant impact on the setup of iptables rules on a host with multiple bridges in place:

A packet created at one of the ports may leave the bridge where it has been created and travel into and through a neighboring bridge via border ports. But when and how are port related rules tested as the packet travels - let us say e.g. from kali5 to the guest at "vnet0" or to the host at "vmh2"?

  • Bridge for bridge - IN-Port-rules, then OUT-port-rules on the same first bridge => afterwards IN-port-rules/OUT-port-rules again - but this time for the ports of the next entered bridge?
  • Or: iptables rules are checked only once, but globally and for all bridges - with some knowledge of port-MAC-relations of the different bridges included?

If the latter were true just one passed ACCEPT rule on one single bridge port would lead to an overall acceptance of a packet despite the fact that the packet possibly will cross further bridges afterward. Such a behavior would seem unreasonable - but who knows ...

So the basic question is:

After having been checked on a first bridge, having been accepted for leaving one border port of this first bridge and then having entered a second linked bridge via a corresponding border port - will the packet be checked again against all denial and acceptance rules of the second bridge? Will the packet with its transportation attributes be injected again into the whole set of iptables rules?

It is obvious that the answer has an impact on how we need to define our rules - especially with respect to port flooding, which we already observed in the tests described in our first article.

Tests of the order of iptables rules probing for ports of multiple bridges on a packet's path

As a first test we do something very simple: we define some iptables rules for ICMP pings formally in the following logical order: We first deny a passage through vethb1 on virbr4 before we allow the packet to pass vethb2 on virbr6:

bridge virbr4 rule 15:  src 192.168.50.14, dest 192.168.50.1 - ICMP IN vethb1 => DENY
bridge virbr6 rule 16:  src 192.168.50.14, dest 192.168.50.1 - ICMP OUT vethb2 => ALLOW

 
and then we test the order of how these rules are passed by logging them.
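For readers who do not use FWbuilder, here is a hedged sketch of how such logged rules might be written as plain iptables commands. The helper `logged_rule` is hypothetical, the mapping of DENY to the DROP target is my assumption, and the commands are only printed, not executed. The options `-m physdev`, `--physdev-in/out`, `--physdev-is-bridged` and `-j LOG --log-prefix` are standard iptables options.

```python
def logged_rule(num, direction, port, src, dst, verdict, proto="icmp"):
    """Return two iptables command lines: a LOG rule and the verdict for the same match."""
    target = {"ALLOW": "ACCEPT", "DENY": "DROP"}[verdict]  # assumption: DENY means DROP
    label = "ACCEPT" if verdict == "ALLOW" else "DENY"      # label used as log prefix
    match = (f"-m physdev --physdev-{direction.lower()} {port} --physdev-is-bridged "
             f"-s {src} -d {dst} -p {proto}")
    return [
        f'iptables -A FORWARD {match} -j LOG --log-prefix "RULE {num} -- {label} "',
        f"iptables -A FORWARD {match} -j {target}",
    ]

rules = (logged_rule(15, "IN", "vethb1", "192.168.50.14", "192.168.50.1", "DENY")
         + logged_rule(16, "OUT", "vethb2", "192.168.50.14", "192.168.50.1", "ALLOW"))
for line in rules:
    print(line)
```

Note that iptables sees bridged frames only if the bridge netfilter hooks are active (sysctl net.bridge.bridge-nf-call-iptables set to 1 on the host).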

To avoid any wrong or missing ARP information on the involved guest/host systems and missing MAC-port-relations in the "forwarding databases" [FDB] of the bridges we first clear any iptables rules and exchange some pings before the actual test. Then we activate the rules and get for ping packets sent from kali4 to the host:

2016-02-27T12:09:33.295145+01:00 mytux kernel: [ 5127.067043] RULE 16 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=22031 DF PROTO=ICMP TYPE=8 CODE=0 ID=1711 SEQ=1 
2016-02-27T12:09:33.295158+01:00 mytux kernel: [ 5127.067062] RULE 15 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=22031 DF PROTO=ICMP TYPE=8 CODE=0 ID=1711 SEQ=1 
2016-02-27T12:09:34.302140+01:00 mytux kernel: [ 5128.075040] RULE 16 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=22131 DF PROTO=ICMP TYPE=8 CODE=0 ID=1711 SEQ=2 
2016-02-27T12:09:34.302153+01:00 mytux kernel: [ 5128.075056] RULE 15 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=22131 DF PROTO=ICMP TYPE=8 CODE=0 ID=1711 SEQ=2

 
Now we do a reverse test: We allow the incoming direction over port vk64 of virbr6 before we deny the incoming packet over vethb1 on virbr4:

bridge virbr6 rule :  src 192.168.50.14, dest 192.168.50.1 - IN vk64 => ALLOW
bridge virbr4 rule :  src 192.168.50.14, dest 192.168.50.1 - IN vethb1 => DENY

 
We get

2016-02-27T14:02:32.821286+01:00 mytux kernel: [11913.962828] RULE 15 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21400 DF PROTO=ICMP TYPE=8 CODE=0 ID=2104 SEQ=1 
2016-02-27T14:02:32.821307+01:00 mytux kernel: [11913.962869] RULE 16 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21400 DF PROTO=ICMP TYPE=8 CODE=0 ID=2104 SEQ=1 
2016-02-27T14:02:33.820257+01:00 mytux kernel: [11914.962965] RULE 15 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21494 DF PROTO=ICMP TYPE=8 CODE=0 ID=2104 SEQ=2 
2016-02-27T14:02:33.820275+01:00 mytux kernel: [11914.962987] RULE 16 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=21494 DF PROTO=ICMP TYPE=8 CODE=0 ID=2104 SEQ=2 

 
So to our last test:

bridge virbr6 rule :  src 192.168.50.14, dest 192.168.50.1 - IN vk64 => ALLOW
bridge virbr6 rule :  src 192.168.50.14, dest 192.168.50.1 - IN vethb2 => DENY
bridge virbr4 rule :  src 192.168.50.14, dest 192.168.50.1 - IN vethb1 => DENY

 
We get:

2016-02-27T14:26:08.964616+01:00 mytux kernel: [13331.634200] RULE 15 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=27122 DF PROTO=ICMP TYPE=8 CODE=0 ID=2218 SEQ=1 
2016-02-27T14:26:08.964633+01:00 mytux kernel: [13331.634232] RULE 17 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=27122 DF PROTO=ICMP TYPE=8 CODE=0 ID=2218 SEQ=1 
2016-02-27T14:26:09.972621+01:00 mytux kernel: [13332.643587] RULE 15 -- ACCEPT IN=virbr6 OUT=virbr6 PHYSIN=vk64 PHYSOUT=vethb2 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=27347 DF PROTO=ICMP TYPE=8 CODE=0 ID=2218 SEQ=2 
2016-02-27T14:26:09.972637+01:00 mytux kernel: [13332.643605] RULE 17 -- DENY IN=virbr4 OUT=virbr4 PHYSIN=vethb1 PHYSOUT=vmh1 MAC=96:b0:a9:7c:73:7d:52:54:00:74:60:4a:08:00 SRC=192.168.50.14 DST=192.168.50.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=27347 DF PROTO=ICMP TYPE=8 CODE=0 ID=2218 SEQ=2 

 

Intermediate conclusions

We can conclude the following points:

  • A packet is probed per bridge - in the order in which the packet passes the multiple bridges of the host.
  • An ALLOW rule for a port on one bridge does not overrule a DENY rule for a port on a second bridge which the packet may pass on its way.
  • A packet is tested against both the IN and the OUT conditions of a FORWARD rule for each bridge it passes.
  • If we split IN and OUT rules on a bridge (as we need to do within some tools such as FWbuilder) then we must probe the OUT rules first to guarantee the prevention of illegal packet transport.
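The observed behavior can be summarized in a toy model: one global, ordered rule list which is traversed anew for every bridge the packet passes. This is only a sketch of what the logs of the first test suggest, not a description of the kernel's netfilter code; matching on addresses and protocol is omitted, and the `probe` helper is hypothetical.

```python
RULES = [  # one global, ordered list; the first matching rule decides per hop
    {"num": 15, "physin": "vethb1", "physout": None, "verdict": "DENY"},
    {"num": 16, "physin": None, "physout": "vethb2", "verdict": "ACCEPT"},
]

# Path of a ping from kali4 to the host: one (bridge, in-port, out-port) hop per bridge.
PATH = [("virbr6", "vk64", "vethb2"), ("virbr4", "vethb1", "vmh1")]

def probe(rules, path, policy="DENY"):
    """Run every hop through the whole rule list; the packet stops at the first DENY."""
    log = []
    for bridge, physin, physout in path:
        verdict, num = policy, None          # default policy if no rule matches
        for r in rules:
            if r["physin"] in (None, physin) and r["physout"] in (None, physout):
                verdict, num = r["verdict"], r["num"]
                break
        log.append((bridge, num, verdict))
        if verdict == "DENY":
            break
    return log

print(probe(RULES, PATH))
# [('virbr6', 16, 'ACCEPT'), ('virbr4', 15, 'DENY')]
```

The sequence matches the log of the first test: rule 16 accepts the packet on virbr6, and rule 15 still denies it on virbr4, although rule 15 stands first in the list.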

For the rest of the article we shall follow the same rule we already used as a guideline in the previous article: Our general iptables policy is that a packet will be denied if it is not explicitly accepted by one of the tested rules.

Blocking of border ports in port flooding situations

During our tests in the last article we have seen that port flooding situations may occur - depending among other things on the "setageing" parameter of the bridge and the resulting deletion of stale entries in the "forwarding database" [FDB] of a bridge. Flooding of veth based border ports may be critical for packet transmission and may have to be blocked in some cases:

E.g. it would be unreasonable to transfer packets logically meant for hosts beyond port vmh1 of virbr4 over vethb1/2 to virbr6. We would stop such packets already via OUT DENY rules for vethb1:

bridge virbr4 rule :  src "guest of virbr4", dest "no guest of virbr6" - OUT vethb1 => DENY

 

Rules regarding packets just crossing and passing a bridge

Think about a bridge "virbrx" linked on both sides to two other bridges "virbr_left" and "virbr_right". In such a scenario packets could arrive at virbrx from bridge virbr_right, enter the intermediate bridge virbrx and leave it at once again for the third bridge virbr_left - because they were never destined for any guest of bridge virbrx.

For such packets we need at least one ACCEPT rule on virbrx - either for the IN direction of the border port of virbrx towards virbr_right or for the OUT direction at the border port towards virbr_left.

Again, we cling to our policy of the last article:
We define DENY rules for outgoing packets at all ports - also for border ports - and put these DENY rules at the top of the iptables list. Then we define DENY rules for ports which are passed in IN direction. Only after that do we define ACCEPT rules for incoming packets for all ports of a bridge - including border ports - and place these rules below/after the DENY rules. This should provide us with a consistent handling also of packets crossing and passing bridges.

Grouping of guests/hosts

From looking at the drawing above we also understand the following point: In order to handle packets at border ports connecting two bridges we have the choice to block packets at either border port - i.e. before the OUTgoing port passage on the first bridge OR before the INcoming port passage on the second. We shall do the blocking at the port in the packet's OUTgoing direction. Actually, there would be no harm in setting up reasonable DENY rules for both ports. Then we would safely cover all types of situations.

Anyway - we also find that the rules for border ports require a certain grouping of the guests and hosts:

  • Group 1: Guests attached to the bridge of the border port.
  • Group 2: Guests on the IN side of the border port of a bridge - i.e. the internal side of the bridge. This group includes Group 1 plus external guests of further bridges beyond other border ports of the very same bridge.
  • Group 3: Guests on the outgoing side of a border port - i.e. the side to the next connected bridge. This group contains hosts of Group 1 for the next connected bridge and/or groups of external hosts on the OUT side of all other border ports of the connected bridge.
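How such groups follow from the topology can be sketched with a few lines of Python. The dictionaries below are my reading of the setup described in this article; in particular the placement of the web server "lamp" beyond vmh1 is an assumption, and the helper functions are hypothetical. The sketch assumes a loop-free arrangement of bridges.

```python
# Guests per bridge (Group 1) and border ports with what lies beyond them.
GUESTS = {"virbr6": {"kali3", "kali4", "kali5"}, "virbr4": {"kali2"}}
BORDER = {
    "vethb2": {"bridge": "virbr6", "peer_bridge": "virbr4", "external": set()},
    "vethb1": {"bridge": "virbr4", "peer_bridge": "virbr6", "external": set()},
    "vmh1":   {"bridge": "virbr4", "peer_bridge": None, "external": {"host", "lamp"}},
}

def beyond(port):
    """Group 3: all hosts on the outgoing side of a border port."""
    info = BORDER[port]
    hosts = set(info["external"])
    peer = info["peer_bridge"]
    if peer:
        hosts |= GUESTS[peer]
        for other, o in BORDER.items():
            # follow the peer's other border ports, but not the link back to us
            if o["bridge"] == peer and o["peer_bridge"] != info["bridge"]:
                hosts |= beyond(other)
    return hosts

def inside(port):
    """Group 2: guests of the port's own bridge plus hosts beyond its other border ports."""
    bridge = BORDER[port]["bridge"]
    hosts = set(GUESTS[bridge])
    for other, o in BORDER.items():
        if o["bridge"] == bridge and other != port:
            hosts |= beyond(other)
    return hosts

print(sorted(beyond("vethb2")))  # ['host', 'kali2', 'lamp']
print(sorted(inside("vmh1")))    # ['kali2', 'kali3', 'kali4', 'kali5']
```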

These groups can easily be formed per bridge by tools like FWbuilder. Without going into details: Note that FWbuilder handles the overall logical OR/AND switching during a negation of multiple groups of hosts correctly when compiling iptables rules.

Overall rules order in case of multiple and connected bridges

Taking into account the results of the first article in this series I suggest the following order of iptables rules:

  • We first define OUT DENY rules for all guest ports of all bridges - with ports grouped by bridges just to keep the overview. These rules are the most important ones to prevent ARP spoofing and a resulting packet redirection.
  • We then define all OUT DENY rules for border ports of all bridges - first grouped by bridges and then per bridge and ports grouped by hosts for the OUTgoing direction. These rules cover also port flooding situations with respect to neighboring linked bridges.
  • We then define IN DENY rules for incoming packets over border ports. These rules may in addition to the previous rules prevent implausible packet transport.
  • Now we apply OUT DENY and IN DENY rules for Ethernet devices on the virtualization host. Such rules must not be forgotten and can be placed here in the sequence of rules.
  • We then define IN ACCEPT rules on individual guest ports - ports again grouped by bridges.
  • We eventually define IN ACCEPT rules on bridge border ports - note that such rules are required for packets just passing an intermediate bridge without being destined to a guest of the bridge.
  • IN ACCEPT rules for the virtualization host's Ethernet interfaces must not be forgotten and can be placed at the end.
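The suggested order can be expressed as a simple sort key. The following is only a sketch: the rule dictionaries and the classification of ports into "guest", "border" and "host" are a hypothetical representation, not something FWbuilder or iptables provides.

```python
STAGE = {  # (direction, verdict, kind of port) -> position in the suggested sequence
    ("OUT", "DENY", "guest"): 1,
    ("OUT", "DENY", "border"): 2,
    ("IN", "DENY", "border"): 3,
    ("OUT", "DENY", "host"): 4,
    ("IN", "DENY", "host"): 4,
    ("IN", "ACCEPT", "guest"): 5,
    ("IN", "ACCEPT", "border"): 6,
    ("IN", "ACCEPT", "host"): 7,
}

def ordered(rules):
    """Sort rules by stage; within a stage keep the rules of one bridge together."""
    return sorted(rules, key=lambda r: (STAGE[(r["dir"], r["verdict"], r["kind"])],
                                        r["bridge"], r["port"]))

RULES = [
    {"bridge": "virbr4", "port": "vethb1", "kind": "border", "dir": "IN", "verdict": "ACCEPT"},
    {"bridge": "virbr6", "port": "vk64", "kind": "guest", "dir": "IN", "verdict": "ACCEPT"},
    {"bridge": "virbr4", "port": "vethb1", "kind": "border", "dir": "OUT", "verdict": "DENY"},
    {"bridge": "virbr6", "port": "vk64", "kind": "guest", "dir": "OUT", "verdict": "DENY"},
]

for r in ordered(RULES):
    print(r["bridge"], r["port"], r["dir"], r["verdict"])
```

A combination which is missing in STAGE (e.g. an OUT ACCEPT rule) raises a KeyError in this sketch. That is intended here, as such rules do not occur in the suggested scheme.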

What does that look like in FWbuilder?

Before looking at the pics note that we have defined the host groups

  • br6_grp to contain kali3, kali4, kali5,
  • br4_grp to contain only kali2,
  • ext_grp to contain the host and some external web server "lamp".

With this we get the following 7 groups of rules:

(Screenshots full_1 to full_7: the seven groups of rules as displayed in FWbuilder.)

Despite the host grouping, this makes quite a bunch of rules! But it is still controllable ...

Enough for today. I hope that the tests performed in a third article of this series will not prove me wrong. I am confident .... See
Linux bridges – can iptables be used against MiM attacks based on ARP spoofing ? – III