Nvidia’s kernel module nvidia_uvm kills normal operation of the lsns command

Posted on 3. November 2017 by Ralph Mönchmeyer

Whilst preparing a second blog post about network namespaces I stumbled across a really strange bug:

Whenever one starts LibreOffice (with the OpenCL-options activated) the command “lsns” stops producing output. Not funny, as one may consider namespaces and operations investigating the namespace settings for Linux processes security relevant. So, I looked a bit into it …

“strace” led me to processes which were Nvidia related. Actually, on my Linux PC, I use the proprietary Nvidia driver 384.98 for my graphics card. The cause behind the trouble with LibreOffice is that the OpenCL-options of LO trigger loading the kernel module “nvidia_uvm”. This module supports CUDA operations – which seem to be used by some recent versions Linux programs as LO and also GIMP.

However, as soon as “nvidia_uvm” is loaded the command “lsns” stops producing output. This should not happen as “lsns” uses standard kernel interfaces for investigating process properties …???…

I have no remedy for this – and am waiting for an answer from the Nvidia developer forum. But at least in case of LibreOffice you can avoid this irritating bug by unsetting the OpenCL options ….

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – I

Posted on 30. October 2017 by eremo

Recently, I started writing some blog posts about my first experiences with LXC-containers and libvirt/virt-manager. Whilst gathering knowledge about LXC basics I stumbled across four hurdles for dummies as me, who would like to experiment with network namespaces, veth devices and bridges on the command line and/or in the context of LXC-containers built with virt-manager:

When you use virt-manager/libvirt to set up LXC-containers you are no longer able to use the native LXC commands to deal with these containers. virt-manager/virsh/libvirt directly use the kernel API for cgroups/namespaces and provide their own and specific user interfaces (graphical, virsh, XML configuration files) for the setup of LXC containers and their networks. Not very helpful for quick basic experiments on virtual networking in network namespaces ….
LXC-containers created via virt-manager/virsh/libvirt use unnamed namespaces which are identified by unique inode numbers, but not by explicit names. However, almost all articles on the Internet which try to provide a basic understanding of network namespaces and veth devices explicitly use “ip” command options for named namespaces. This raises the question: How to deal with unnamed network namespaces?
As a beginner you normally do not know how to get a shell for exploring an existing unnamed namespace. Books offer certain options of the “ip”-command – but these again refer to named network namespaces. You may need such a shell – not only for basic experiments, but also as the administrator of the container’s host: there are many situations in which you would like to enter the (network) namespace of a LXC container directly.
When you experiment with complex network structures you may quickly loose the overview over which of the many veth interfaces on your machine is assigned to which (network) namespace.

Objectives and requirements

Unfortunately, even books as “Containerization with LXC” of K. Ivanov did not provide me with the few hints and commands that would have been helpful. I want to close this gap with some blog posts. The simple commands and experiments shown below and in a subsequent article may help others to quickly setup basic network structures for different namespaces – without being dependent on named namespaces, which will not be provided by virt-manager/libvirt. I concentrate on network namespaces here, but some of the things may work for other types of namespace, too.

After a look at some basics, we will create a shell associated with a new unnamed network namespace which will be different from the network namespace of other system processes. Afterwards we will learn how to enter an existing unnamed namespaces by a new shell. A third objective is the attachment of virtual network devices to a network namespace.

In further articles we will use our gathered knowledge to attach veth interfaces of 2 different namespaces to virtual bridges/switches in yet a third namespace, then link the host to the bridge/switch and test communications as well as routing. We shall the extend our virtual networking scenario to isolated groups of namespaces (or containers, if you like) via VLANs. As a side aspect we shall learn how to use a Linux bridge for defining VLANs.

All our experiments will lead to temporary namespaces which can quickly be cretated by scripts and destroyed by killing the basic shell processes associated with them.

Requirements: The kernel should have been compiled with option “CONFIG_NET_NS=y”. We make use of userspace tools that are provided as parts of a RPM or DEB packet named “util-linux” on most Linux distributions.

Namespaces

Some basics first. There are 6 different types of “namespaces” for the isolation of processes or process groups on a Linux system. The different namespace types separate

PID-trees,
the networks,
User-UIDs,
mounts,
inter process communication,
host/domain-names (uts) of process groups

against each each other. Every process on a host is attached to certain namespace (of each type), which it may or may not have in common with another process. Note that the uts-namespace type provides an option to give a certain process an uts-namespace which may get a different hostname than the original host of the process!

“Separation” means: Limitation of the view on the process’ own environment and on the environment of other processes on the system. “Separation” also means a limitation of the control a process can get on processes/environments associated with other namespaces.

Therefore, to isolate LXC containers from other containers and from the host, the container’s processes will typically be assigned to distinct namespaces of most of the 6 types. In addition: The root filesystem of a LXC containers typically resides in a chroot jail.

Three side remarks:

cgroups limit the ressource utilization of process groups on a host. We do not look at cgroups in this article.
Without certain measures the UID namespace of a LXC container will be the same as the namespace of the host. This is e.g. the case for a standard container created with virt-manager. Then root in the container is root on the host. When a container’s basic processes are run with root-privileges of the host we talk of a “privileged container”. Privileged containers pose a potential danger to the host if the container’s environment could be left. There are means to escape chroot jails – and under certain circumstances there are means to cross the borders of a container … and then root is root on the host.
You should be very clear about the fact that a secure isolation of processes and containers on a host depend on other more sophisticated isolation mechanisms beyond namespaces and chroot jails. Typically, SE Linux or Apparmor rules may be required to prevent crossing the line from a namespace attached process to the host environment.

In our network namespace experiments below we normally will not separate the UID namespaces. If you need to do it, you must map a non-privileged UID (> 1000) on UID 0 inside the namespace to be able to perform certain network operations. See the options in the man pages of the commands used below for this mapping.

Network namespaces

The relevant namespace type for the network environment (NICs, bridges etc.) to which a process has access to is the “network namespace”. Below I will sometimes use the abbreviation “net-ns” or simply “netns”.

When you think about it, you will find the above statements on network isolation a bit unclear:

In the real world network packets originate from electronic devices, are transported through cables and are then distributed and redirected by other devices and eventually terminate at yet other electronic devices. So, one may ask: Can a network packet created by a (virtual) network device within a certain namespace cross the namespace border (whatever this may be) at all? Yes, they can:

Network namespaces affect network devices (also virtual ones) and also routing rules coupled to device ports. However, network packets do NOT care about network namespaces on OSI level 2.

To be more precise: Network namespace separation affects network-devices (e.g. Ethernet devices, virtual Linux bridges/switches), IPv4/IPv6 protocol stacks, routing tables, ARP tables, firewalls, /proc/net, /sys/class/net/, QoS policies, ports, port numbers, sockets. But is does not stop an Ethernet packet to reach an Ethernet device in another namespace – as long as the packet can propagate through the virtual network environment at all.

So, now you may ask what virtual means we have available to represent something like cables and Ethernet transport between namespaces? This is one of the purposes veth devices have been invented for! So, we shall study how to bridge different namespaces by the using the 2 Ethernet interfaces of veth devices and by using ports of virtual Linux bridges/switches.

However, regarding container operation you would still want the following to be true for packet filtering:

A fundamental container process, its children and network devices should be confined to devices of a certain “network namespace” because they should not be able to have any direct influence on network devices of other containers or the host.
And: Even if packets move from one network namespace to another you probably want to be able to restrict this traffic in virtual networks as you do in real networks – e.g by packet filter rules (ebtables, iptables) or by VLAN definitions governing ports on virtual bridges/switches.

Many aspects of virtual bridges, filtering, VLANs can be tested already in a simple shell based namespace environment – i.e. without full-fletched containers. See the forthcoming posts for such experiments …

Listing network namespaces on a host

The first thing we need is an overview over active namespaces on a host. For listing namespaces we can use the command “lsns” on a modern Linux system. This command has several options which you may look up in the man pages. Below I show you an excerpt of the output of “lsns” for network namespaces (option “-t net”) on a system where a LXC container was previously started by virt-manager:

mytux:~ # lsns -t net -o NS,TYPE,PATH,NPROCS,PID,PPID,COMMAND,UID,USER 
        NS TYPE PATH              NPROCS   PID  PPID COMMAND                  UID USER  
4026531963 net  /proc/1/ns/net       389     1     0 /usr/lib/systemd/system    0 root   
4026540989 net  /proc/5284/ns/net     21  5284  5282 /sbin/init                 0 root

Actually, I have omitted some more processes with separate namespaces, which are not relevant in our context. So, do not be surprised if you should find more processes with distinct network namespaces on your system.

The “NS” numbers given in the output are so called “namespace identification numbers”. Actually they are unique inode numbers. (For the reader it may be instructive to let “lsns” run for all namespaces of the host – and compare the outputs.)

Obviously, in our case there is some process with PID “5282”, which has provided a special net-ns for the process with PID “5284”:

mytux:~ # ps aux | grep 5282
root      5282  0.0  0.0 161964  8484 ?        Sl   09:58   0:00 /usr/lib64/libvirt/libvirt_lxc --name lxc1 --console 23 --security=apparmor --handshake 26 --veth vnet1

This is the process which started the running LXC container from the virt-manager interface. The process with PID “5284” actually is the “init”-Process of this container – which is limited to the network namespace created for it.

Now let us filter or group namespace and process information in different ways:

Overview over all namespaces associated with a process

This is easy – just use the option “-p” :

mytux:~ # lsns -p 5284 -o NS,TYPE,PATH,NPROCS,PID,PPID,COMMAND,UID,USER 
        NS TYPE  PATH              NPROCS   PID  PPID COMMAND                                                            UID USER
4026531837 user  /proc/1/ns/user      416     1     0 /usr/lib/systemd/systemd --switched-root --system --deserialize 24   0 root
4026540984 mnt   /proc/5284/ns/mnt     20  5284  5282 /sbin/init                                                           0 root
4026540985 uts   /proc/5284/ns/uts     20  5284  5282 /sbin/init       
                                                    0 root
4026540986 ipc   /proc/5284/ns/ipc     20  5284  5282 /sbin/init                                                           0 root
4026540987 pid   /proc/5284/ns/pid     20  5284  5282 /sbin/init                                                           0 root
4026540989 net   /proc/5284/ns/net     21  5284  5282 /sbin/init                                                           0 root

Looking up namespaces for a process in the proc-directory

Another approach for looking up namespaces makes use of the “/proc” directory. E.g. on a different system “mylx“, where a process with PID 4634 is associated with a LXC-container:

mylx:/proc # ls -lai /proc/1/ns
total 0
344372 dr-x--x--x 2 root root 0 Oct  7 11:28 .
  1165 dr-xr-xr-x 9 root root 0 Oct  7 09:34 ..
341734 lrwxrwxrwx 1 root root 0 Oct  7 11:28 ipc -> ipc:[4026531839]
341737 lrwxrwxrwx 1 root root 0 Oct  7 11:28 mnt -> mnt:[4026531840]
344373 lrwxrwxrwx 1 root root 0 Oct  7 11:28 net -> net:[4026531963]
341735 lrwxrwxrwx 1 root root 0 Oct  7 11:28 pid -> pid:[4026531836]
341736 lrwxrwxrwx 1 root root 0 Oct  7 11:28 user -> user:[4026531837]
341733 lrwxrwxrwx 1 root root 0 Oct  7 11:28 uts -> uts:[4026531838]

mylx:/proc # ls -lai /proc/4634/ns
total 0
 38887 dr-x--x--x 2 root root 0 Oct  7 09:36 .
 40573 dr-xr-xr-x 9 root root 0 Oct  7 09:36 ..
341763 lrwxrwxrwx 1 root root 0 Oct  7 11:28 ipc -> ipc:[4026540980]
341765 lrwxrwxrwx 1 root root 0 Oct  7 11:28 mnt -> mnt:[4026540978]
345062 lrwxrwxrwx 1 root root 0 Oct  7 11:28 net -> net:[4026540983]
 38888 lrwxrwxrwx 1 root root 0 Oct  7 09:36 pid -> pid:[4026540981]
341764 lrwxrwxrwx 1 root root 0 Oct  7 11:28 user -> user:[4026531837]
341762 lrwxrwxrwx 1 root root 0 Oct  7 11:28 uts -> uts:[4026540979]

What does this output for 2 different processes tell us? Obviously, the host and the LXC container have different namespaces – with one remarkable exception: the “user namespace”! They are identical. Meaning: Root on the container is root on the host. A typical sign of a “privileged” LXC container and of potential security issues.

List all processes related to a given namespace?

“lsns” does not help us here. Note:

“lsns” only shows you the lowest PID associated with a certain (network) namespace.

So, you have to use the “ps” commands with appropriate filters. The following is from a system, where a LXC container is bound to the network namespace with identification number 4026540989:

mytux:~ # lsns -t net -o NS,TYPE,PATH,NPROCS,PID,PPID,COMMAND,UID,USER
        NS TYPE PATH              NPROCS   PID  PPID COMMAND                                               UID USER
4026531963 net  /proc/1/ns/net       401     1     0 /usr/lib/systemd/systemd --switched-root --system --d   0 root
4026540989 net  /proc/6866/ns/net     20  6866  6864 /sbin/init                                              0 root

mytux:~ #  ps -eo netns,pid,ppid,user,args --sort netns | grep 4026540989
4026531963 16077  4715 root     grep --color=auto 4026540989
4026540989  6866  6864 root     /sbin/init
4026540989  6899  6866 root     /usr/lib/systemd/systemd-journald
4026540989  6922  6866 root     /usr/sbin/ModemManager
4026540989  6925  6866 message+ /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation   
4026540989  6927  6866 tftp     /usr/sbin/nscd
4026540989  6943  6866 root     /usr/lib/wicked/bin/wickedd-dhcp6 --systemd --foreground
4026540989  6945  6866 root     /usr/lib/wicked/bin/wickedd-dhcp4 --systemd --foreground
4026540989  6947  6866 systemd+ avahi-daemon: running [linux.local]
4026540989  6949  6866 root     /usr/lib/wicked/bin/wickedd-auto4 --systemd --foreground
4026540989  6951  6866 avahi-a+ /usr/lib/polkit-1/polkitd --no-debug
n4026540989  6954  6866 root     /usr/lib/systemd/systemd-logind
4026540989  6955  6866 root     login -- root
4026540989  6967  6866 root     /usr/sbin/wickedd --systemd --foreground
4026540989  6975  6866 root     /usr/sbin/wickedd-nanny --systemd --foreground
4026540989  7032  6866 root     /usr/lib/accounts-daemon
4026540989  7353  6866 root     /usr/sbin/cupsd -f
4026540989  7444  6866 root     /usr/lib/postfix/master -w
4026540989  7445  7444 postfix  pickup -l -t fifo -u
4026540989  7446  7444 postfix  qmgr -l -t fifo -u
4026540989  7463  6866 root     /usr/sbin/cron -n
4026540989  7507  6866 root     /usr/lib/systemd/systemd --user
4026540989  7511  7507 root     (sd-pam)
4026540989  7514  6955 root     -bash

If you work a lot with LXC containers it my be worth writing some clever bash or python-script for analyzing the “/proc”-directory with adjustable filters to achieve a customizable overview over processes attached to certain namespaces or containers.

Hint regarding the NS values in the following examples:
The following examples have been performed on different systems or after different start situations of one and the same system. So it makes no sense to compare all NS values between different examples – but only within an example.

Create a shell inside a new network namespace with the “unshare” command …

For some simple experiments it would be helpful if we could create a process (as a shell) with its own network-namespace. For this purpose Linux provides us with the command “unshare” (again with a lot of options, which you should look up).

For starting a new bash with a separate net-ns we use the option “-n“:

mytux:~ # unshare -n /bin/bash 
mytux:~ # lsns -t net
        NS TYPE NPROCS   PID USER  COMMAND
4026531963 net     398     1 root  /usr/lib/systemd/systemd --switched-root --system --deserialize 24   
4026540989 net      21  5284 root  /sbin/init
4026541186 net       2 27970 root  /bin/bash

mytux:~ # ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

mytux:~ # exit
exit

mytux:~ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d7:58:88:ab:cd:ef brd ff:ff:ff:ff:ff:ff
....
....

Obviously, it is not possible to see from the prompt that we have entered a different (network) namespace with the creation of the new shell. We shall take care of this in a moment. For the time being, it may be a good idea to issue commands like

lsns -t net -p 1; lsns -t net -p $$

in the shell opened with “unshare”. However, also our look at the network interfaces proved that the started “bash” was directly associated with a different net-ns than the “parent” bash. In the “unshared” bash only a “lo”-device was provided. When we left the newly created “bash” we at once saw more network devices (namely the devices of the host).

Note: A namespace (of any type) is always associated with at least one process. Whenever we want to create a new namespace for an experiment we have to combine it with a (new) process. During the experiments in this post series we will create new network namespaces together with related simple bash-processes.

And: A namespace lives as long as the associated process (or processes). To keep a specific new network namespace alive for later experiments we put the associated basic bash-process into the background of the host-system.

In real world scenarios the processes related to namespaces are of course more complex than a shell. Examples are containers, browser-processes, etc. This leads us to the question whether we can “enter” an existing namespace somehow (e.g. with a shell) to gather information about it or to manipulate it. We will answer this question in a minute.

Information about host processes from a shell inside a specific network namespace?

You can get information about all processes on a host from any process with a specific network namespace – as long as the PID namespace for this process is not separated from the PID namespace of the host. And as long as we have not separated the UID namespaces: root in a network namespace then is root on the host with all the rights there!

Can a normal unprivileged user use “unshare”, too?

Yes, but his/her UID must be mapped to root inside the new network namespace. For this purpose we can use the option “-r” of the unshare command; see the man pages. Otherwise: Not without certain measures – e.g. on the sudo side. (And think about security when using sudo directives. The links at the end of the article may give you some ideas about some risks.)

You may try the following commands (here executed on a freshly started system):

myself@mytux:~> unshare -n -r /bin/bash 
mytux:~ # lsns -t net -t user
        NS TYPE  NPROCS   PID USER COMMAND
4026540842 user       2  6574 root /bin/bash
4026540846 net        2  6574 root /bin/bash
mytux:~ #

Note the change of the prompt as the shell starts inside the new network namespace! And “lsns” does not give us any information on the NS numbers for net and user namespaces of normal host processes!

However, on another host terminal the “real” root of the host gets:

mytux:~ # lsns -t net -t user 
        NS TYPE  NPROCS   PID USER   COMMAND
4026531837 user     382     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 24   
4026531963 net      380     1 root   /usr/lib/systemd/systemd --switched-root --system --deserialize 24   
4026540842 user       1  6574 myself /bin/bash
4026540846 net        1  6574 myself /bin/bash

There, we see that the user namespaces of the unshared shell and other host processes really are different.

Open a shell for a new named network namespace

The “unshare” command does not care about “named” network namespaces. So, for the sake of completeness: If you like to or must experiment with named network namespaces you may want to use the “ip” command with appropriate options, e.g.:

mytux:~ # ip netns add mynetns1 
mytux:~ # ip netns exec mynetns1 bash   
mytux:~ # lsns -o NS -t net -p $$
        NS
4026541079
mytux:~ # exit 
mytux:~ # lsns -o NS -t net -p $$
        NS
4026531963
mytux:~ #

“mynetns1” in the example is the name that I gave to my newly created named network namespace.

How to open a shell for an already existing network namespace? Use “nsenter” …

Regarding processes with their specific namespaces or LXC containers: How can we open a shell that is assigned to the same network namespace as a specific process? This is what the command “nsenter” is good for. For our purposes the options “-t” and “-n” are relevant (see the man pages). In the following example we first create a bash shell (PID 15150) with a new network namespace and move its process in the background. Then we open a new bash in the foreground (PID 15180) and attach this bash shell to the namespace of the process with PID 15150:

mylx:~ # unshare -n /bin/bash &
[1] 15150
mylx:~ # lsns -t net 
        NS TYPE NPROCS   PID USER  COMMAND
4026531963 net     379     1 root  /usr/lib/systemd/systemd --switched-root --system --deserialize 24   
4026540983 net      23  4634 root  /sbin/init
4026541170 net       1 15150 root  /bin/bash

[1]+  Stopped                 unshare -n /bin/bash
mylx:~ # nsenter -t 15150 -n /bin/bash
mylx:~ # ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
mylx:~ # echo $$
15180
mylx:~ # lsns -t net -p $$
        NS TYPE NPROCS   PID USER COMMAND
4026541170 net       3 15150 root /bin/bash
mylx:~ #

Note, again, that “lsns” only gives you the lowest process number that opened a namespace. Actually, we are in a different bash with PID “15180”. If you want to see all process using the same network namespace you may use :

mylx:~ # echo $$
15180
mylx:~ # ps -eo pid,user,netns,args --sort user | grep 4026541170  
15150 root     4026541170 /bin/bash
15180 root     4026541170 /bin/bash
16284 root     4026541170 ps -eo pid,user,netns,
args --sort user
16285 root     4026541170 grep --color=auto 4026541170

Note that the shell created by nsenter is different from the shell-process we created (with unshare) as the bearing process of our namespace.

In the same way you can create a shell with nsenter to explore the network namespace of a running LXC container. Let us try this for an existing LXC container on system “mylx” with PID 4634 (see above: 4026540983 net 23 4634 root /sbin/init).

mylx:~ # nsenter -t 4634 -n /bin/bash
mylx:~ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1   
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000   
    link/ether 00:16:3e:a3:22:b8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
mylx:~ # exit
exit

Obviously, an ethernet device eth0 exists in this container. Actually, it is an interface of a veth device with a peer interface “if14”; see below.

Change the hostname part of a shell’s prompt in a separate network namespace

We saw that the prompt of a shell in a separate network namespace normally does not indicate anything about the namespace environment. How can we change this? We need 2 steps to achieve this:

We open a shell in the background not only for a separate network namespace but also for a different uts namespace. Then any changes to the hostname inside the uts namespace for the running background process will have no impact on the host.
The “nsenter” command does not only work for shells but for any reasonable command. Therefore, we can also apply it for the command “hostname”.

Now, before we enter the separate namespaces of the process with yet another shell we can first change the hostname in the newly created uts namespace:

mytux:~ # unshare --net --uts /bin/bash &
[1] 25512
mytux:~ # nsenter -t 25512 -u hostname netns1

[1]+  Stopped                 unshare --net --uts /bin/bash   
mytux:~ # echo $$
20334
mytux:~ # nsenter -t 25512 -u -n /bin/bash 
netns1:~ #
netns1:~ # lsns -t net -t uts -p $$
        NS TYPE NPROCS   PID USER COMMAND
4026540975 uts       3 25512 root /bin/bash
4026540977 net       3 25512 root /bin/bash
netns1:~ # exit
mytux:~ # hostname
mytux

Note the “-u” in the command line where we set the hostname! Note further the change of the hostname in the prompt! In more complex scenarios, this little trick may help you to keep an overview over which namespace we are currently working in.

veth-devices

For container technology “veth” devices are of special importance. A veth device has two associated Ethernet interfaces – so called “peer” interfaces. One can imagine these interfaces like linked by a cable on OSI level 2 – a packet arriving at one interface gets available at the other interface, too. Even if one of the interfaces has no IP address assigned.

This feature is handy when we e.g. need to connect a host or a virtualized guest to an IP-less bridge. Or we can use veth-devices to uplink several bridges to one another. See a former blog post
Fun with veth devices, Linux virtual bridges, KVM, VMware – attach the host and connect bridges via veth
about these possibilities.

As a first trial we will assign the veth device and both its interfaces to one and the same network namespace. Most articles and books show you how to achieve this by the use of the “ip” command with an option for a “named” namespace. In most cases the “ip” command would have been used to create a named net-ns by something like

ip netns add NAME

where NAME is the name we explicitly give to the added network namespace. When such a named net-ns exists we can assign an Ethernet interface named “ethx” to the net-ns by:

ip link set ethx netns NAME

However, in all our previous statements no NAME for a network namespace has been used so far. So, how to achieve something similar for unnamed network namespaces? A look into the man pages helps: The “ip” command allows the introduction of a PID together with the option parameter “netns” at least for the variant “ip link set”. Does this work for veth devices and the command “ip link add”, too? And does it work for both Ethernet interfaces?

In the example discussed above we had a namespace 4026541170 of process with PID 15180. We open a bash shell on our host mylx, where PID 15150 still runs in the background, and :

mylx:~ # echo $$
27977
mylx:~ # lsns -t net
        NS TYPE NPROCS   PID USER  COMMAND   
4026531963 net     393     1 root  /usr/lib/systemd/systemd --switched-root --system --deserialize 24   
4026540983 net      23  4634 root  /sbin/init
4026541170 net       1 15150 root  /bin/bash
mylx:~ # ip link add veth1 netns 15150 type veth peer name veth2 netns 15150
mylx:~ # nsenter -t 15150 -n /bin/bash
mylx:~ # echo $$
28350
mylx:~ # ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: veth2@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000   
    link/ether 8e:a0:79:28:ae:12 brd ff:ff:ff:ff:ff:ff
3: veth1@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000   
    link/ether fa:1e:2c:e3:00:8f brd ff:ff:ff:ff:ff:ff
mylx:~ #

Success! Obviously, we have managed to create a veth device with both its 2 interfaces inside the network namespace associated with our background process of PID 15150.

The Ethernet interfaces are DOWN – but this was to be expected. So far, so good. Of course it would be more interesting to position the first veth interface in one network namespace and the second interface in another network namespace. This would allow network packets from a container to cross the border of the container’s namespace into an external one. Topics for the next articles …

Summary and outlook on further posts

Enough for today. We have seen how we can list (network) namespaces and associated processes. We are able to create shells together with and inside in a new network namespace. And we can open a shell that can be attached to an already existing network namespace. All without using a “NAME” of the network namespace! We have also shown how a veth device can be added to a specific network namespace. We have a set of tools now, which we can use in more complicated virtual network experiments.

In the next post

Fun with veth-devices, Linux bridges and VLANs in unnamed Linux network namespaces – II

I shall present a virtual network environment for several interesting experiments with network namespaces – or containers, if you like. Further articles will discuss such experiments step by setp.

Addendum, 25.03.2024: I have started a new series about virtual networking experiments concerning veths with VLAN-interfaces, namespaces, routes, ARP, ICMP and security aspects. If you are interested in these topics a look at the posts in the new series may give you some more information on specific topics.

Links

Introduction into network namespaces
http://www.linux-magazin.de/ Ausgaben/ 2016/06/ Network-Namespaces

Using unshare without root-privileges
https://unix.stackexchange.com/ questions/ 252714/ is-it-possible-to-run-unshare-n-program-as-an-unprivileged-user
https://bbs.archlinux.org/viewtopic.php?id=205240
https://blog.mister-muffin.de/ 2015/10/25/ unshare-without-superuser-privileges/

LXC-Container – Fehlermeldungen mit Bezug zum File memory.memsw.usage_in_bytes

Posted on 22. October 2017 by Ralph Mönchmeyer

Wenn man mit LXC-Containern experimentiert, kann es sein, dass man über Fehlermeldungen im System Log File stolpert, die etwa wie folgt aussehen:

2017-10-22T16:25:08.090990+02:00 mytux libvirtd[2795]: Failed to open file '/sys/fs/cgroup/memory/machine.slice/machine-lxc\x2d25976\x2dlxcx.scope/memory.memsw.usage_in_bytes': No such file or directory
2017-10-22T16:25:08.091288+02:00 mytux libvirtd[2795]: Unable to read from '/sys/fs/cgroup/memory/machine.slice/machine-lxc\x2d25976\x2dlxcx.scope/memory.memsw.usage_in_bytes': No such file or directory

In meinem Fall (Opensuse Leap 42.2) wurde eine solche Meldung alle 3 Sekunden für jeden laufenden LXC-Container generiert. Nicht sehr erfreulich …

Die Ursache dieser Meldungen liegt darin, dass die “memsw”-Dateien Information über die Nutzung des Swaps durch Prozesse sammeln, die über cgroup-Festlegungen limitiert sein mögen. Wenn aber der Kernel des Hosts ohne die Option CONFIG_MEMCG_SWAP_ENABLED kompiliert wurde, werden die obigen Fehler ausgegeben. Dies ist etwa für den Default-Kernel von Opensuse Leap 42.2 der “swapaccount=1” mitgibt. Das muss man halt in die Grub2-Konfiguration eintragen. Unter Opensuse mit YaST …

LXC-Betriebssystem-Container mittels virt-manager unter Opensuse Leap 42.2 – I

Posted on 1. October 2017 by Ralph Mönchmeyer

Das Thema “Container” ist wegen der guten Ressourcenauslastung auch für KMUs von Interesse. Will man sich dem Thema “Container” unter Linux annähern, ohne sich gleich mit Docker zu befassen, kann man alternativ zu LXC greifen. Dafür sprechen sogar ein paar gute Gründe:

LXC ist erprobt; viele Distributionen bringen LXC mit.
Ubuntu setzt mit der LXD-Umgebung explizit auf dieses Container-Format.
LXC (und LXD) setzen mehr auf Container für vollständige Betriebssysteme, Docker dagegen auf modulare Container für Einzelapplikationen. Wären da nicht diverse Sicherheitsthemen, so könnten performante, ressourcenschonende LXC-Container virtuelle (Linux-) Maschinen unter KVM ersetzen. Die potentielle Packungsdichte von LXC-Containern auf einem Host ist deutlich höher als die von KVM/qemu-VMs.
LXC erlaubt einen unpriviligierten Container-Betrieb – der Gewinn an Sicherheit erfordert allerdings Mühe im Container-Setup. Aber immerhin …
Auch in kleinen Umgebungen will man Container neben virtuellen KVM-Maschinen gerne grafisch verwalten und überwachen. Als handliche und vor allem erschwingliche Werkzeuge bleiben da eigentlich nur “virtlib/virt-manager” (rudimentäre Minimalausstattung) und “Proxmox” für den professionellen Einsatz übrig. Beide Tools unterstützen LXC-Container und erlauben zudem das Anlegen mehr oder weniger komplexer Storage-Konfigurationen.
Läuft der gewünschte Container erst mal, darf man sich über eine sehr gute Performance freuen (zumindest im kleinskaligen Betriebsumfeld).

Am letzten Wochenende wollte ich einen LXC-Container unter Opensuse Leap einfach mal ein wenig austesten. Für das Erstellen wollte ich das Gespann “libvirt/virt-manager” heranziehen. Die Anlage eines sog. “Betriebssystem-Containers” erwies sich damit unter Leap 42.2 aber als kleineres Abenteuer – selbst bei einer rein lokalen Storage-Konfiguration. Dieser und der nachfolgende Artikel sollen Einsteigern helfen, die ersten Hürden zu überwinden.

Begrifflichkeiten und eine Warnung

Ich gehe nachfolgend aus Platzgründen nicht weiter auf die grundlegenden Unterschiede zwischen Containern und voll- oder paravirtualisierten “Virtuellen Maschinen” [VMs] ein. Wer sich Container-Technologie im Schnelldurchgang reinziehen will, sei auf den Foliensatz https://de.slideshare.net/jpetazzo/anatomy-of-a-container-namespaces-cgroups-some-filesystem-magic-linuxcon verwiesen.

Für den vorliegenden Blog-Beitrag ist die Vorstellung hilfreich, ein Container sei so etwas wie eine aufgeblasene chroot-Umgebung, die Prozesse in einer vom restlichen Betriebssystem “isolierten” Umgebung bündelt.

Auch die Suse-Dokumentationen zur Anlage von Containern
https://www.suse.com/documentation/sles-12/singlehtml/book_virt/book_virt.html#sec.lxc.setup.container.app
https://doc.opensuse.org/documentation/leap/virtualization/book.virt_color_en.pdf
merken an:

Conceptually, containers can be seen as an improved chroot technique. The difference is that a chroot environment separates only the file system, whereas containers go further and provide resource management and control via cgroups.”

Die vom Container-Nutzer ausgelösten Prozesse laufen direkt unter dem Kernel des Hosts – und nicht wie in einer VM unter einem eigenem Kernel für die emulierte HW). Für die Container-
Isolation sorgen unter Linux cgroups und vor allem strong>namespaces (seit Kernelversion 3.8):

“cgroups” bündeln Prozesse in Bezug auf die limitierte Nutzung von System-Ressourcen auf dem Host.
“namespaces” sorgen hingegen dafür, dass Prozesse und ihre Kinder eine separate Sicht und einen abgegrenzten Zugriff auf ihre jeweilige Betriebssystem-Umgebung erhalten.

Vor allem “namespaces” sorgen also für eine (hoffentlich !) hinreichende Isolation der Container gegeneinander und gegenüber dem Host. Aber: Container lassen sich ohne zusätzliche Hilfsmittel bei weitem nicht so gut vom Host isolieren wie etwa virtuelle Maschinen unter KVM/qemu; resultierende Sicherheitsthemen werde ich in kommenden Blog-Beiträgen immer mal wieder aufgreifen. Ich zitiere einen der LXC-Vorkämpfer aus einem frühen Artikel (https://www.berrange.com/posts/2011/09/27/getting-started-with-lxc-using-libvirt/):

“Repeat after me “LXC is not yet secure. If I want real security I will use KVM”.

Das gilt mit Einschränkungen wohl auch heute noch; wer den professionellen Einsatz von LXC-Containern erwägt, muss sich mit SE-Linux- oder Apparmor-Profilen sowie ggf. Kernel-Capabilities auseinandersetzen. Ferner empfehle ich, einen Einsatz von LXC-Containern unter KVM-VMs in Betracht zu ziehen, wobei letztere dafür mit virtio-Treibern beschleunigt werden müssen.

Unterschiede zwischen nativen lxc-Tools und libvirt-lxc

Will man mit virt-manager und LXC-Containern arbeiten, begibt man sich auf ein nicht unumstrittenes Gleis. Es gibt nämlich zwei nicht kompatible Toolsets zum Aufsetzen und Verwalten von LXC-Containern: lxc und libvirt-xlc. “lxc” ist das native Toolset und besteht aus einer Reihe sehr flexibel einsetzbaren CLI-Kommandos. “lxc” liegt in den Version 1.1 bzw. 2.0 unter vielen aktuellen Linux-Distributionen vor (Leap 42.2 =>LXC V1.1, Debian Stretch => LXC V2.0). Container werden unter “lxc” über (vorgefertigte) “Configuration Files” definiert.

Libvirt greift dagegen auf XML-Files zur Definition von Containern zurück – also genau wie bei KVM/qemu-VMs. Libvirt nutzt für das LXC-Setup ferner direkte Schnittstellen zum Kernel und rankt darum eigene, unabhängige Tools, die mit den nativen LXC-Tools wenig gemein haben. Man kann native lxc-Kommandos meines Wissens deshalb auch nicht zur Verwaltung der Container verwenden, die mit libvirt-lxc erzeugt wurden. Dafür kann man nach der Einrichtung von libvirt-LXC-Containern aber auf das grafische “virt-manager”-Interface zum Starten, Stoppen und Manipulieren dieser Container zurückgreifen. Auf der Kommandozeile kann alternativ das “virsh”-Kommando mit lxc-bezogenen Optionen eingesetzt werden.

Beide Toolsets nehmen es einem ab, sich um die Konfiguration von cgroups (Ressourcenmanagement) und namespaces (Abschottung) im Detail und händisch kümmern zu müssen. Sie tun dies aber auf unterschiedliche Weise – und stellen zugehörige Informationen keineswegs in kompatibler Weise bereit. In diesem Artikel nutze ich die “libvirt-lxc”-Tools. Nicht zuletzt, weil ich selbst von KVM her komme. Aber – ich sage es vorweg:

Die libvirt-lxc-Tools lassen sich aus meiner Sicht nicht wirklich bruchfrei nutzen.

Der eine oder andere Nutzer wird daher mit wachsender Erfahrung zu den nativen Tools wechseln. Im Internet sind zudem sehr viel mehr Informationen zur Verwaltung von LXC-Containern mit den nativen Tools verfügbar als für die “libvirt-lxc”-Tools. Interessant ist ein LXC-Container-Setup mit libvirt/virt-manager aber allemal.

Ziel: LXC-Betriebssystem-Container

Unter “virt-manager” werden 2 LXC-Container-Typen unterschieden:

LXC Distribution
Container für eine Umgebung mit einer Root-Filesystem-Struktur für eine Distribution. Solche Container eigenen sich m.E. als Ersatz für KVM/qemu-VMs.
LXC Application Container für einzelne Anwendungen. Ein solcher Container bietet nur eine temporäre, isolierte Umgebung für einzelnen Anwendungen – und wird mit dem Schließen der Anwendung auch beendet. Solche Container sind eher mit Docker-Containern vergleichbar.

App-Containern fehlen bei ohne eigene Zusatzmaßnahmen wesentliche Elemente eines für die jeweilige Applikation funktionierenden Filesystems. Das ist ohne vorgefertigte Schablonen Bastelarbeit. Man muss sich je nach Applikation u.a. um Anteile von /etc, /sys, /proc/, /dev, /dev/shm und relevante Bibliotheks-Bereiche wie des Filesystems kümmern. Siehe die Links unten.

Das verführt dazu, am Anfang seiner Lernkurve einen “Betriebssystem-Container” zu erstellen. Für das Filesystem kann man dann nämlich eine Basisimplementation der jeweiligen Distribution heranziehen. Für Debian ist hier “debootstrap” zu nennen.

Bzgl. der praktischen Anlage notwendiger Verzeichnisinhalte erleichtern einem unter dem RPM-basierten Opensuse hingegen sog. Paket-Patterns das Leben. Für ein minimalistisch aufgesetztes Filesystem-Setup greift man hier auf das sog. “base”-Pattern zurück.

Opensuses Script für den Root-Filesystem-Unterbau

Bzgl. der Installation des “base”-Patterns für Container hat die OpenSuSE-Mannschaft es sich in letzter Zeit allerdings etwas zu einfach gemacht: Die aktuelle Doku verweist nämlich auf ein Skript “/usr/bin/virt-create-rootfs”.

Diese Script läuft ohne Zusatzwissen und ohne Modifikationen unter Leap 42.2/42.3 aber ins Leere. In Blick in in den Code zeigt: Standardmäßig wird nur Opensuse 13.1 unterstützt; die zugehörigen Repositories, die bemüht werden, sind aber längst obsolet. Ich zeige nachfolgend relevante Ausschnitte aus dem Script:

#!/bin/sh
set -e
....
....
function print_help
{
cat << EOF
virt-create-rootfs --root /path/to/rootfs [ARGS]

Create a new root file system to use for distribution containers.

ARGUMENTS

    -h, --help          print this help and exit
    -r, --root          path where to create the root FS
    -d, --distro        distribution to install
    -a, --arch          target architecture
    -u, --url           URL of the registration server
    -c, --regcode       registration code for the product
    -p, --root-pass     the root password to set in the root FS
    --dry-run           don't actually run it
EOF
}

ARCH=$(uname -i)
ROOT=
DISTRO=
URL=
REG_CODE=
ROOT_PASS=
DRY_RUN=
.....
.....
function call_zypper
{
    $RUN zypper --root "$ROOT" $*
}

function install_sle
{
    PRODUCT="$1"
    VERSION="$2"

    case "$VERSION" in
        12.0)
            # Transform into zypper internal version scheme
            VERSION="12"
            ;;
        *)
            fail "Unhandled SLE version: $VERSION"
            ;;
    esac
....
....

}

case "$DISTRO" in
    SLED-*)
        install_sle "SLED" "${DISTRO:5}"
        ;;
    SLED-* | SLES-*)
        install_sle "SLES" "${DISTRO:5}"
        ;;

    openSUSE-*)
        VERSION=${DISTRO:9}
        case "$VERSION" in
            13.1)
                REPO="http://download.opensuse.org/distribution/13.1/repo/oss/"
                UPDATE_REPO="http://download.opensuse.org/update/13.1/"
                ;;
            *)
                fail "Unhandled openSUSE version: $VERSION"
                ;;
        esac
        call_zypper ar "$REPO" "openSUSE"
        call_zypper ar "$UPDATE_REPO" "openSUSE udpate"
        call_
zypper in --no-recommends -t pattern base
        ;;
esac

if test "$DRY_RUN" != "yes"; then
    echo "pts/0" >> "$ROOT/etc/securetty"
    chroot "$ROOT" /usr/bin/passwd
fi

Man muss das betreffende Skript also anpassen oder die notwendigen Schritte manuell durchführen.

Modifikation des Scripts

Lässt man mal den Zugriff auf SLES oder SLED-Repositories außer Acht, so leistet das Script im Grunde nur Folgendes:

Ein Verzeichnis des Hosts als künftiges root-Verzeichnis des Containers aus einem Übergabeparameter auslesen.
Das grundlegende “oss”-Repository sowie das zugeh. Update-Repository für eine Distribution festlegen.
“zypper” in der Form “zypper –root PATH_TO_CONTAINER_ROOT in –no-recommends -t pattern” aufrufen.
Das root-Password für den chroot-Bereich ändern.

Entscheidend ist im Script-Ablauf die “–root” Option von zypper; die sorgt dafür, dass die Verzeichnisse/Files unterhalb des definierten chroot-Verzeichnisses (PATH_TO_CONTAINER_ROOT) gemäß des definierten Paket-Patterns gefüllt werden. Wir ergänzen das Script deshalb versuchsweise in der Abfrage für die Distribution und ändern die Statements in den Zeilen 205 – 207 ab:

....
.....
case "$DISTRO" in
    SLED-*)
        install_sle "SLED" "${DISTRO:5}"
        ;;
    SLED-* | SLES-*)
        install_sle "SLES" "${DISTRO:5}"
        ;;

    openSUSE-*)
        VERSION=${DISTRO:9}
        case "$VERSION" in
            13.1)
                REPO="http://download.opensuse.org/distribution/13.1/repo/oss/"
                UPDATE_REPO="http://download.opensuse.org/update/13.1/"
                ;;
            42.2)
                REPO="http://download.opensuse.org/distribution/leap/42.2/repo/oss/"
                UPDATE_REPO="http://download.opensuse.org/update/leap/42.2/"
                ;;
            *)
                fail "Unhandled openSUSE version: $VERSION"
                ;;
        esac
        echo "HERE0"
        call_zypper ar "$REPO" "openSUSE"
        echo "HERE1"
        call_zypper ar "$UPDATE_REPO" "openSUSE-udpate"
        echo "HERE2"
        call_zypper in --no-recommends -t pattern enhanced_base
        call_zypper in --no-recommends -t pattern  file_server
        #call_zypper in --no-recommends -t pattern base
        #call_zypper in --no-recommends -t pattern "patterns-openSUSE-enhanced_base"
        ;;
esac

Um wenigstens ein paar “Convenience”-Pakte zusätzlich zu erhalten, habe ich das “enhanced_base”-Pattern gewählt.

Anlegen des Filesystems

Für meinen Test nutze ich ein LVM-Volume auf einem SSD-Array; das Volume wird dabei auf das Verzeichnis “/kvm/” gemountet. Das Filesystem meines Containers soll unter “/kvm/lxc2/ zu liegen kommen. Also Verzeichnis angelegt und dann unser umgeschriebenes Script starten:

        
mytux:~ # virt-create-rootfs -r /kvm/lxc2/ -d openSUSE-42.2 
Adding repository 'Leap-42.2-Oss' .......................................................[done]
Repository 'Leap-42.2-Oss' successfully added

URI         : http://download.opensuse.org/distribution/leap/42.2/repo/oss/
Enabled     : Yes                                                          
GPG Check   : Yes                                                          
Autorefresh : No                                                           
Priority    : 99 (default priority)                                        

Repository priorities are without effect. All enabled repositories share the same priority.
r
Adding repository 'Leap-42.2-Update' ....................................................[done]
Repository 'Leap-42.2-Update' successfully added

URI         : http://download.opensuse.org/update/leap/42.2/oss/
Enabled     : Yes                                               
GPG Check   : Yes                                               
Autorefresh : No                                                
Priority    : 99 (default priority)                             

Repository priorities are without effect. All enabled repositories share the same priority.

New repository or package signing key received:

  Repository:       Leap-42.2-Oss                                       
  Key Name:         openSUSE Project Signing Key <opensuse@opensuse.org>
  Key Fingerprint:  22C07BA5 34178CD0 2EFE22AA B88B2FD4 3DBDC284        
  Key Created:      Mon May  5 10:37:40 2014                            
  Key Expires:      Thu May  2 10:37:40 2024                            
  Rpm Name:         gpg-pubkey-3dbdc284-53674dd4                        


Do you want to reject the key, trust temporarily, or trust always? [r/t/a/? shows all options] (r): a
Building repository 'Leap-42.2-Oss' cache ...............................................[done]
Building repository 'Leap-42.2-Update' cache ............................................[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 163 NEW packages are going to be installed:
  aaa_base bash binutils coreutils cpio cracklib cracklib-dict-full dbus-1 dbus-1-x11
  device-mapper diffutils dirmngr dracut elfutils expat file file-magic filesystem fillup
  findutils fipscheck gawk gio-branding-openSUSE glib2-tools glibc gpg2 grep gzip hardlink
  hwinfo info insserv-compat kbd kbd-legacy klogd kmod kmod-compat krb5 libX11-6 libX11-data
  libXau6 libacl1 libadns1 libaio1 libapparmor1 libasm1 libassuan0 libattr1 libaudit1 libblkid1
  libbz2-1 libcap-ng0 libcap2 libcom_err2 libcrack2 libcryptsetup4 libcurl4 libdbus-1-3 libdw1
  libedit0 libelf0 libelf1 libexpat1 libfdisk1 libffi4 libfipscheck1 libgcc_s1 libgcrypt20
  libgio-2_0-0 libglib-2_0-0 libgmodule-2_0-0 libgmp10 libgobject-2_0-0 libgpg-error0 libidn11
  libkeyutils1 libkmod2 libksba8 libldap-2_4-2 liblua5_1 liblzma5 libmagic1 libmount1
  libmozjs-17_0 libncurses5 libncurses6 libnl-config libnl3-200 libopenssl1_0_0 libpcre1
  libpolkit0 libpopt0 libprocps3 libpth20 libqrencode3 libreadline6 libsasl2-3 libseccomp2
  libselinux1 libsemanage1 libsepol1 libsmartcols1 libssh2-1 libstdc++6 libsystemd0
  libtirpc-netconfig libtirpc3 libudev1 libusb-0_1-4 libusb-1_0-0 libustr-1_0-1 libutempter0
  libuuid1 libverto1 libwicked-0-6 libwrap0 libx86emu1 libxcb1 libxml2-2 libz1 libzio1
  mozilla-nspr ncurses-utils netcfg openSUSE-build-key openSUSE-release openSUSE-release-ftp
  openssh openssl pam pam-config patterns-openSUSE-base patterns-openSUSE-enhanced_base
  perl-base permissions pigz pinentry pkg-config polkit polkit-default-privs procps rpcbind rpm
  sed shadow shared-mime-info suse-module-tools sysconfig sysconfig-netconfig systemd
  systemd-presets-branding-openSUSE systemd-sysvinit sysvinit-tools terminfo-base time udev
  update-alternatives util-linux util-linux-systemd which wicked wicked-service xz

The following 2 NEW patterns are going to be installed:
  base enhanced_base

The following NEW product is going to be installed:
  openSUSE

163 new packages to install.
Overall download size: 48.1 MiB. Already cached: 0 B. After the operation, additional 178.0 MiB
will be used.
Continue? [y/n/...? shows all options] (y): y
....
....
New password: 
Retype new password: 
passwd: password updated successfully

Die Installation nimmt kaum Zeit in Anspruch. Danach liegt ein ziemlich umfassendes chroot-Filesystem vor, für das wir auch ein root-
Passwort definiert haben:

mytux:~ # la /kvm/lxc2
total 84
drwxr-xr-x 21 root root 4096 Sep 25 18:27 .
drwxr-xr-x  6 root root 4096 Sep 25 18:17 ..
drwxr-xr-x  2 root root 4096 Sep 25 18:28 bin
drwxr-xr-x  2 root root 4096 Oct  7  2016 boot
drwxr-xr-x  2 root root 4096 Sep 25 18:28 dev
drwxr-xr-x 56 root root 4096 Sep 25 18:29 etc
drwxr-xr-x  2 root root 4096 Oct  7  2016 home
drwxr-xr-x  8 root root 4096 Sep 25 18:28 lib
drwxr-xr-x  5 root root 4096 Sep 25 18:28 lib64
drwxr-xr-x  2 root root 4096 Oct  7  2016 mnt
drwxr-xr-x  2 root root 4096 Oct  7  2016 opt
dr-xr-xr-x  2 root root 4096 Oct  7  2016 proc
drwx------  4 root root 4096 Sep 25 18:28 root
drwxr-xr-x  7 root root 4096 Sep 25 18:28 run
drwxr-xr-x  2 root root 4096 Sep 25 18:28 sbin
drwxr-xr-x  2 root root 4096 Oct  7  2016 selinux
drwxr-xr-x  4 root root 4096 Sep 25 18:27 srv
dr-xr-xr-x  2 root root 4096 Oct  7  2016 sys
drwxrwxrwt  7 root root 4096 Sep 25 18:28 tmp
drwxr-xr-x 13 root root 4096 Sep 25 18:27 usr
drwxr-xr-x 12 root root 4096 Sep 25 18:27 var
mytux:~ #

Definition des LXC-Containers mit virt-manager

Wir starten nun virt-manager. Zunächst muss mal LXC auf unserem Host laufen – und wir brauchen eine Verbindung zum lokalen Socket. Dazu bemüht man den Menüpunkt “File >> Add Connection”:

Wir drücken dann auf den Button “Connect”. Nun legen wir unseren Container an über “File >> New Virtual Machine”.

Der Button “Forward” führt zu einem Dialog-Window, in dem wir direkt unser Verzeichnis für das Root-FS eintippen (wir befassen uns nicht mit Storage-Verwaltung).

Wieder “Forward” klicken; wir werden nun aufgefordert, RAM-Größe und die Anzahl der zu verwendenden CPUs festzulegen:

Schließlich geben wir dem Container noch einen Namen:

Die Netzwerk-Konfiguration unterlassen wir im Moment. Der Grund hierfür ist, dass wir ohne Installation weiterer Pakete im Container mit einer Netzwerkverbindung eh’ nicht viel anfangen können. Ich komme darauf aber im nächsten Blog-Beitrag zurück. Wir starten die Container-Installation – und uns wird fast umgehend eine Shell angeboten :

Zum Einloggen geben wir das für den chroot-Bereich definierte Passwort ein. So weit, so gut. Ein wenig Herumspielen zeigt sehr schnell: Die Installation ist wirklich recht minimal.

Ausblick

Im nächsten Blog-Beitrag

LXC-Betriebssystem-Container mittels virt-manager unter Opensuse Leap 42.2 – II

ergänze ich zunächst einige nützliche Pakete aus den Opensuse-Repositories. Dann zeige ich, wie man zu einer funktionierenden Netzwerk-Karte in Form eines veth-Devices kommt.

Links

Unterschiede LXC, LXD, Docker
https://unix.stackexchange.com/questions/254956/what-is-the-difference-between-docker-lxd-and-lxc
https://bobcares.com/blog/lxc-vs-docker/
http://www.zdnet.com/article/ubuntu-lxd-not-a-docker-replacement-a-docker-enhancement/

Zeichensatzvorgaben für MySQL und PHP – SET NAMES – und leere Strings durch htmlentities()

Posted on 18. September 2017 by Ralph Mönchmeyer

Gestern ist mir ein dummer Fehler passiert, dessen Analyse mich Zeit gekostet hat. Dabei erwies sich die Sache am Ende als trivial. Es ging letztlich um konsistente Zeichensatzeinstellungen für PHP und MySQL – allerdings mit einem mir bislang unbekannten Nebeneffekt.

Zeichensätze im Kontext von PHP und MySQL sind ein Thema vieler Foren- und Q&A-Artikel im Internet – und nicht immer trifft man auf zufriedenstellende Antworten. Ich hoffe, dieser Artikel trägt anhand eines Beispiels zu etwas mehr Klarheit bei.

Voraussetzungen und aufgetretener Fehler

Unsere hausinternen Apache- und Datenbank-Server laufen normalerweise vollständig unter UTF-8. Inklusive der eigenen Datenbank- und Tabellen-Kollationen unter MySQL- oder MariaDB-Systemen. Aber für Tests müssen wir immer mal wieder gezielt eine Kompatibilität zu den festgelegten Kollationen für MySQL-Banken/Tabellen auf Kundenservern herstellen. Meist kommt dann Latin1 (iso-8859-1) ins Spiel.

Gestern mussten wir für einen solchen Test Datensätze eines von uns entwickelten, php-basierten CMS von einem gehosteten MySQL-Kundenserver in unsere lokale Test-Datenbank übernehmen. Diese Datensätze beinhalteten viele Text-Strings mit deutschen Umlauten. Da es sich nur um wenige Records handelte, haben wir die Daten in diesem Fall mit Copy/Paste und mit Hilfe von Eingabefeldern unserer CMS-Verwaltungsoberfläche übernommen. Das CMS lief dabei unter einer lokalen UTF8-Standarddomäne. Unsere aktuellen CMS-Programme zeigten die fraglichen Textstrings denn auch korrekt in der Web-Oberfläche des CMS an.

Es gab aber andere zu untersuchende Punkte im Layout. Um die Unstimmigkeiten zu testen, haben wir zusätzlich eine separate Testdomäne angelegt und diverse PHP-Klassen vom Kundenserver (auf dem dortigen Versionsstand) in bestimmte Verzeichnisse des zugehörigen Web-Spaces auf unserem lokalen Server geladen. Ergebnis: 2 Domainen mit z.T. unterschiedlichen Versionsständen von PHP-Klassen.

Und dann passierte es:

Bei einem Aufruf der Webseiten in der Testdomäne verschwanden plötzlich fast alle Text-Strings aus der Web-Oberfläche.

Bilder und grundsätzlicher Aufbau der Webseiten bleiben jedoch erhalten. Ich war zunächst völlig verblüfft über diesen Effekt. Zumal der Einsatz unserer lokalen Versionen der gleichen PHP-Klassen kein Verschwinden der Textstrings zeitigte.

Die Analyse war nicht ohne, da ich zunächst nicht wusste, wonach ich zu suchen hatte. Man denkt da zunächst natürlich an Unterschiede im Programmcode selbst. Am Schluss entpuppten sich aber eine im wesentlichen unmodifizierte Klasse, die die Datenbankverbindung steuert, sowie eine Klasse, die die Strings aus Sicherheitsgründen nach unerlaubten Sequenzen filtert, als Kerne des Übels. Ausschlaggebend waren allerdings nicht Programmunterschiede sondern bestimmte Parameter-Setzungen sowie Server- und Zeichensatzeinstellungen. Aber der Reihe nach.

Zeichensatzeinstellungen in der Kommunikationskette zwischen einem PHP-Server und einer MySQL-Datenbank

In der Datenaustausch-Kette zwischen einer MySQL-Datenbank und PHP-Modulen auf einem Web-Server (und natürlich auch bei der Übersendung eines HTML-/XML- oder JSON-Outputs an Web-Clients) spielen verschiedene Zeichensatzeinstellungen eine Rolle. Einige davon können vom Entwickler beeinflusst werden. Andere wiederum nicht immer.

Für unseren Fall waren vor allem die nachfolgenden Punkte relevant; sie betreffen den Web-Server (mit PHP-Modul) und die MySQL-Datenbank:

Zeichensätze (“Kollationen”) der MySQL-Datenbank und zugehöriger Datenbanktabellen.
Zeichensatz-Vorgaben zur MySQL-Verbindung und zum Datentransfer von und zu (PHP-)Programmen, die mittels der SQL-Direktive “SET NAMES” vorgenommen wurden.
Einstellungen für den Default-Character-Set des PHP-Apache-Moduls.
Zeichensatzeinstellungen für die PHP-Funktion “htmlentities()”.

Zeichensätze in der Datenbank und die Direktive “SET NAMES”

Für die Kollation der relevanten MySQL-Datenbank und ihrer Tabellen war auf dem Kundenserver “latin1_german2_ci” gewählt worden. Unsere Einstellungen im lokalen Testsystem waren dazu auf Tabellenebene kompatibel. Die Kollation einer Datenbanktabelle bestimmt letztlich aber nur die interne Ablage der Daten in der Tabelle und nicht den Zeichensatz, unter dem z.B. per SQL ermittelte Resultsets an weiterverarbeitende Programme übermittelt werden.

Für letzteres sind andere Parameter verantwortlich, die man als Entwickler für eine spezifische Verbindung zur MySQL-Datenbank einstellen kann. Unter MySQL und der MariaDB nutzt man dafür etwa die SQL-Direktive “SET NAMES” (oder aber die Funktion mysqli_set_charset(); s.u.).

“SET NAMES” führt zur gleichzeitigen Festlegung dreier RDBMS-Parameter für die Behandlung einer spezifischen Datenbankverbindung. Diese Parameter sind: character_set_client, character_set_connection, character_set_results.

Siehe hierzu etwa
https://dev.mysql.com/doc/refman/5.7/en/set-names.html
und
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_character_set_results.

Die Datenbankverbindung wird unter dem PHP/mysqli-Interface dabei über das Connection-Objekt

$dbi = mysqli_connect(….)

identifiziert. Im Wesentlichen legen die genannten Parameter Folgendes fest:

character_set_client: Zeichensatz, unter dem Daten, die vom Datenbank-Client zum RDBMS-Server transferiert werden, interpretiert werden.
character_set_connection: Handhabung bestimmterErsetzungen und Konversionen, u.a. von Numbers zu Strings.
character_set_results: Zeichensatz, unter dem Daten, die vom RDBMS zu Client-Programmen übermittelt werden interpretiert werden.

Der “Client” ist in unserem Fall natürlich ein PHP-Programm auf einem Apache-Server. Die Anwendung des “SET NAMES”-Statements durch ein PHP-Programm, z.B.

$sql_unames = “SET NAMES ‘utf8′”;
$this->dbi->query($sql_unames);
//dbi->Datenbank-Connection Object
//$this->dbi = mysqli_connect(….)

ist somit von zentraler Bedeutung für die Kommunikation zwischen einem PHP-Programm und einer MySQL/MariaDB! Solange eine hinreichende Konvertierung verwendeter Zeichen gewährleistet ist, muss der Zeichensatz, unter dem Resultsets zum PHP-Programm übermittelt werden, nicht zwingend mit dem der Datenbanktabellen selbst identisch sein.

Es sei darauf hingewiesen, dass es zu “SET NAMES” auch Alternativen gibt, die man im Rahmen des mysqli-Interfaces einsetzen kann und sollte (s.u.). Dass wir in einigen Programmen noch SET NAMES verwenden, hat lediglich historische Gründe. Die hier beschriebene Problematik gilt aber unabhängig vom genauen Werkzeug zur Einstellung der Zeichensatzparameter.

Konsistenz zu Zeichensatzvorgaben für PHP – Einstellungen in der php.ini

Das PHP-Programm muss in jedem Fall mit dem gewählten Zeichensatz für Resultsets adäquat umgehen können; s.u.. Der rechte Bereich der folgenden Skizze, die ich aus einem anderen Artikel dieses Blogs entliehen habe, verdeutlicht das für den Fall “SET NAMES ‘latin1′” :

Nun könnte man in seinen PHP-Programmen natürlich spezifische Umwandlungsfunktionen (u.a. iconv(), mb_convert_encoding(), utf8-encode(), utf8-decode) für die Zeichensatzkonvertierung aus- und eingehender Strings bemühen. Die linke Seite der Skizze liefert hierfür Beispiele (s. den dazu gehörigen Abschnitt weiter unten).

Wenn möglich, kann man aber auch einen Standardzeichensatz für die PHP-Verarbeitung von Strings vorgeben und sich darauf bzgl. der Datenbankinteraktion verlassen.

Entsprechende Einstellungen nimmt man in der Konfigurationsdatei
/etc/php7/apache2/php.ini
vor. Die relevanten Parameter sind dort:

; PHP's default character set is set to UTF-8.
; http://php.net/default-charset
default_charset = "UTF-8"

; PHP internal character encoding is set to empty.
; If empty, default_charset is used.
; http://php.net/internal-encoding
;internal_encoding =

; PHP input character encoding is set to empty.
; If empty, default_charset is used.
; http://php.net/input-encoding
;input_encoding =

; PHP output character encoding is set to empty.
; If empty, default_charset is used.
; mbstring or iconv output handler is used.
; See also output_buffer.
; http://php.net/output-encoding
;output_encoding =

Die hierfür gesetzten Werten sollte natürlich mit den Einstellungen für die Datenbankverbindung zusammenpassen.

Kundenvorgaben

Wir parametrieren in Kundenprojekten die PHP-Methoden für den Verbindungsaufbau zum Datenbankserver meist gemäß expliziter Kundenvorgaben. In unserem Fall hatten wir “SET NAMES ‘latin1′” gewählt. (Hinweis: Die Zeichensatz-Namen auf einem MySQL-Server enthalten grundsätzlich keine Trennzeichen; daher ist statt iso8859-1 “latin1” zu verwenden).

Der Grund dafür war, dass das Apache-PHP-Modul des (gehosteten) Kunden-Web-Servers auf “iso-8859-1” eingestellt war. Das bestätigte eine Überprüfung mit phpinfo(). Diese Einstellungen sollten wir gem. Kundenvorgabe nicht ändern, da auf dem Web-Server auch andere PHP-Programme als unsere eigenen laufen müssen.

Auf dem Kundenserver waren die Zeichensatzeinstellungen also konsistent.

Ursache unseres Problems und die Rolle von htmlentities()

Man ahnt es bereits: Durch das Kopieren der PHP-Klasse für die Steuerung von Datenbankverbindungen vom Kundenserver auf die Testdomäne unseres lokalen Web-Servers entstand dort u.a. eine Inkonsistenz zwischen der Zeichensatzbehandlung unter PHP (utf8!) und dem Zeichensatz für den Transfer von Resultsets aus der MySQL-Datenbank (latin1).

Diese Inkonsistenz kam jedoch noch nicht zum Tragen, als wir die Daten per Copy/Paste über lokale Web-Interfaces einer UTF8-Standard-Domäne in die Bank einbrachten.

Für unsere Testdomäne dagegen muss man jedoch u.a. eine fehlerhafte Konvertierung von deutschen Umlauten erwarten! Warum aber verschwanden die Text-Strings in Gänze aus den Web-Oberflächen?

Die Antwort lieferte schließlich eine von mir bislang zu wenig beachtete Eigenschaft der PHP-Funktion “htmlentities()”.

Unsere Web-Generatoren jagen Strings vor einer Web-Darstellung durch eine Reihe von Prüfroutinen, Transformatoren für erlaubte Zeichenfolgen und durch Filter (u.a. HTMLPurifier, aber auch eigene Filter). Dabei wird in einem Zwischenschritt (nach einer vorhergehenden Konvertierung erlaubter HTML-Zeichenfolgen) auch “htmlentities()” eingesetzt.

htmlentities() erlaubt selbst eine Vorgabe des “Character Sets” über einen Parameter. Ein Check zeigte: Dieser Parameter stand in der lokalen Testdomäne explizit auf “UTF-8”. Diese Einstellung betrifft jedoch eine Konvertierung in den gewünschten Ziel-Zeichensatz für den HTML-Output. Hier hatten wir kein Problem, da die
HTML-Header der Webseiten bzw. die vom Apache-Server generierten HTTP-Header tatsächlich auf UTF-8 ausgerichtet waren.

Allerdings sorgten schon die vorhergehenden Widersprüche zwischen dem Zeichensatz der Datenbank-Resultsets und dem Zeichensatz für die anschließende Behandlung von Strings durch PHP. Das hatte gravierende Folgen. Unter http://php.net/manual/de/function.htmlentities.php findet man nämlich folgenden Hinweis:

Rückgabewerte
Gibt die kodierte Zeichenkette zurück. Enthält der string eine in dem übergebenen encoding ungültige Code Unit Sequenz, wird eine leere Zeichenkette zurückgegeben, sofern weder das ENT_IGNORE noch das ENT_SUBSITUTE Flag gesetzt sind.

(Hervorhebung durch mich).

Das war des Rätsels Lösung:

Eine Inkonsistenz in der Zeichensatzbehandlung im Datenaustausch zwischen unserer MySQL-Datenbank und PHP führte zu nicht behandelbaren Zeichen bei der Anwendung von htmlentities() auf Strings – und diese Funktion produzierte dann gemäß ihrer Default-Einstellungen leere Strings.

Trivial – man muss es halt nur wissen! Ein Test mit

“SET NAMES ‘utf8′”

ließ denn alle verschwundenen Strings auch in unserer Testdomäne prompt wieder erscheinen!

Notwendige Checks vor der Anwendung von SET NAMES (oder von mysqli_set_charset())

Hat man die Kette der Zeichensatzsetzungen bzw. Zeichensatzbehandlung im Austausch zwischen PHP und einer MySQL-Datenbank erst einmal verstanden, so ist auch klar, wo man mit vorbeugenden Maßnahmen ansetzen kann. Solche Vorkehrungen sind – wie das Beispiel zeigt – vor allem dann notwendig, wenn man die Zeichensatz-Einstellungen für PHP nicht auf allen involvierten Web-Servern beeinflussen kann oder darf.

Bevor man “SET NAMES” (oder mysqli_set_charset(); s.u.) in einem PHP-Programm tatsächlich anwendet, sollte man die Setzung des “Default Character Sets” in der php.ini für den aktuellen Server explizit abfragen – mittels

ini_get(‘default_charset’);

– und dann mit der ebenfalls abfragbaren Kollation der Datenbanktabellen vergleichen. Das Ergebnis dieses Vergleichs kann man dann nach bestimmten Regeln behandeln:

Z.B. Warnhinweise bei (ernsthafter) Inkompatibilität ausgeben. Oder wenn man wirklich sicher ist, dass nur deutsche Umlaute zu potentiellen Problemen führen können: Wahl des zu den PHP-Einstellungen konsistenten Zeichensatzes für den Transfer von Resultsets aus der Datenbank. In unserem Fall also “utf8”.

Alternative: Ändern der php.ini-Vorgaben für den Zeichensatz durch und für das laufende PHP-Programm

Auf festgestellte Inkonsistenzen in den Zeichensatzeinstellungen kann man u.U. – nämlich wenn man die dafür nötigen Rechte besitzt – auch mit einer Modifikation der php.ini-Vorgaben für das laufende Programm reagieren. Dazu muss man natürlich die Funktion ini_set() bemühen; Bsp.:

ini_set( string ‘default_charset’, ‘ISO-8859-1’)

bemühen.

Empfohlener Weg zum Setzen des “Character Sets” für eine MySQL-Verbindung

Ich möchte explizit darauf hinweisen, dass es andere Möglichkeiten als die SQL-Direktive “SET NAMES” gibt, Character Sets für die Datenbankverbindung zu setzen. Das PHP-Manual empfiehlt explizit, die Funktion

mysqli_set_charset(mysqli $link , string $charset)

anstelle von SQL und “Set Names” zu verwenden. Siehe: http://php.net/manual/de/mysqli.set-charset.php

Zeichensätze und die Web-Client-Seite

Obwohl nicht Kernthema dieses Artikels werfen wir noch einen ergänzenden Blick auf den
Datenaustausch des PHP/Web-Servers mit Web-Clients (z.B. einem Browser). Die Zeichensatzthematik setzt sich natürlich auch auf dieser Kommunikationsstrecke fort; der linke Teil der obigen Skizze verdeutlicht das am Beispiel von Ajax/Ajaj-Programmen. Diese erfordern i.d.R. UTF-8 für einen ordnungsgemäßen Datenaustausch mit dem Web-Server.

Ist der Parameter “default_charset” in der php.ini-Datei aber auf “iso-8859-1” gesetzt, so muss man ein- und ausgehende Daten entsprechend konvertieren. Dafür eignen sich die Funktionen utf8_decode() für einlaufende POST-/GET-Daten aus Ajax-Programmen und utf8_encode() bei der Erzeugung des Ajax/Ajaj-Outputs in Richtung Web-Client.

Für reinen HTML-Output gilt analoges; dabei sind aber auch HTTP- und HTML-Header-Anweisungen für Character Sets zu setzen. Siehe hierzu:
http://www.html-info.eu/php/php-als-script-sprache/item/zeichensatz-latin1-oder-unicode-utf-8.html

Fazit

In unbeabsichtigter Weise kann htmlentities() plötzlich zu einer Testfunktion für die Konsistenz zwischen

der Zeichensatz-Einstellungen durch “SET NAMES” bzw. mysqli_set_charset() für die MySQL/MariaDB-Datenbankanbindung an ein PHP-Programm
und den Zeichensatzeinstellungen für das PHP-Modul selbst auf dem Webserver

werden – und in letzter Konsequenz zu leeren Strings auf Webseiten führen.

Es lohnt sich vor einem Einsatz von “SET NAMES” – oder besser mysqli_set_charset() – eigentlich immer, die Einstellungen in der “php.ini” bzgl. des “default_charset” und verwandter Parameter abzufragen und daraus angemessene Konsequenzen für den Aufbau der Datenbankverbindung zu ziehen.

Dies gilt vor allem dann, wenn man im Rahmen von Tests und Produktivierungen auf verschiedenen Servern arbeiten will oder muss – und dabei nicht alle Konfigurationsparameter der Server selbst beeinflussen kann und darf.

Neben dem Setzen von Server- und Verbindungsparametern kann man auf festgestellte oder vorgegebene Zeichensatzanforderungen oder Zeichensatzdiskrepanzen aber auch gezielt mit verschiedenen Funktionen reagieren, die PHP für eine Zeichensatz-Detektion und eine explizite Zeichensatzkonvertierung von Strings anbietet.

Linux-Blog – Dr. Mönchmeyer / anracon

Notes about Linux, ML and some simple math …

Nvidia’s kernel module nvidia_uvm kills normal operation of the lsns command

LXC-Container – Fehlermeldungen mit Bezug zum File memory.memsw.usage_in_bytes

LXC-Betriebssystem-Container mittels virt-manager unter Opensuse Leap 42.2 – I

Zeichensatzvorgaben für MySQL und PHP – SET NAMES – und leere Strings durch htmlentities()