Opensuse Leap, KVM/QEMU-guests, KMS – problems with QXL/Spice after upgrade to Leap 15

Many services in my LAN are provided by virtual KVM/QEMU-guests on a dedicated KVM-server-host. More specialized services are provided by local KVM-guests on workstations. All my virtualized systems are normally equipped with a qxl/spice-combination to provide a graphical interface - which can be used on the KVM-host or by remote spice clients. Direct graphical access via a spice client is only seldom required (besides ssh connections) - but in some situations it is useful, if not mandatory.

Recently, I upgraded the central KVM-server, its guests and also the guests on my workstations from Opensuse Leap 42.3 to Leap 15.0. Unfortunately, after the upgrade graphical access to my guest-systems via spice-clients (such as virt-viewer or the graphical spice-console of virt-manager) was no longer possible.

After some tests I found out that this was due to missing KMS on the guest systems - the current qxl-modules do require KMS. However, if you update/install from an ISO image, KMS may not be compatible with the graphical SuSE installer. And due to previous configurations, "nomodeset" may already have been set as a kernel parameter in the installation you upgrade. Here is the story ...

The problem: Unreadable, freezing interfaces on spice-clients

Normally, I upgrade in a 5-step sequence: 1) update all packages, 2) reduce the repositories to the Update- and OSS-repository, 3) switch to the repositories of the new distribution, 4) "zypper dup --download-only", 5) "zypper --no-refresh dup" (see e.g. how-to-upgrade-from-opensuse-leap-422-to-423/).
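
A minimal sketch of this sequence on the command line (repository handling is simplified; the sed call that rewrites the repo URLs is an assumption about how the repositories are defined on your system):

zypper refresh && zypper update                       # 1) bring the old release fully up to date
zypper lr -u                                          # 2) review repos; disable all but OSS and Update with "zypper mr -d <alias>"
sed -i 's/42\.3/15.0/g' /etc/zypp/repos.d/*.repo      # 3) point the remaining repos to the new release
zypper refresh
zypper dup --download-only                            # 4) fetch all packages first
zypper --no-refresh dup                               # 5) perform the actual distribution upgrade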

The KVM server-host itself gave me no major problems during its upgrade. The KVM-guests - with their server-services - also seemed to work well. Most often I access the KVM-guest-systems via ssh to perform regular administration tasks, so for a while I did not even notice that something was wrong with the qxl/spice-configuration. But when I used "virt-manager" in an "ssh -X" session from my workstation to the KVM-server and tried to open the graphical console for a guest there, I got an unreadable virtual screen, which froze instantly and no longer reacted to any input - special commands sent via "virt-manager" to switch to a non-graphical console terminal were ignored. The same happened with "virt-viewer" and when I executed virt-manager directly on the graphical screen of the KVM-server.

Independent test with a new installation of a "Leap 15"-guest-system

To find out more, I tested a new installation of a Leap 15 guest on my Leap 15 workstation acting as a KVM-server. I chose a guest configuration with standard equipment - among other things qxl/spice-components. The installation was started via the "virt-install" component of virt-manager, using an ISO-image of the Leap 15 installation medium.

The first thing I stumbled across was that I had to choose "No KMS" for the text console setting on the first screen of the Opensuse installer (see the options for the graphical setup there; F3-key). Otherwise the installer froze during the first "udev" checks. I already knew this effect from installations on some physical systems. Note that the choice of "No KMS" leads to a kernel parameter entry "nomodeset" on the command line for the kernel call in the Grub2 configuration (see the file /etc/default/grub).

Such a full new installation led into a void. SuSE's graphical installer itself worked perfectly with high resolution. However, after a restart of the freshly installed guest-system the switch to the graphical screen led to a flickering virtual screen, and the graphical display of the SDDM login manager never appeared.

Still and luckily, it was possible to log in as root and execute an "init 3" command - which brought me to a working, non-flickering ASCII console interface (TTY1).

I had experienced this type of behavior before on some real physical systems, too. I recommend Leap users to be prepared: activate the ssh-service during installation and open the firewall for the ssh-port! The SuSE-installer allows for such settings on its summary screen of the installation configuration. This gives you a chance to switch to a working text console (init 3) from a remote SSH command line - if the graphical console does not accept any input.

Test 2: Upgrade to the latest KVM / QEMU / libvirt versions from the "Virtualization" repository

An installation of the cutting-edge versions of the KVM/QEMU and spice server and client software did not change anything - neither on my dedicated KVM-server-system and its upgraded guests nor on my workstation with the fresh Leap 15 test-guest.

Repair of the older upgraded guest-systems

The emulated "hardware" of my older (upgraded) guest-systems, which stem from an OS 13.2 era, is a bit different from the equipment of the newest test guest. On these older systems I still had the choice to use cirrus, vga or vmwvga graphics. Interestingly, these drivers do not provide the performance of qxl - but they at least worked with the virtual spice client displays.

One central question was whether the QXL driver - more precisely, the corresponding kernel module "qxl" - was loaded at all. The answer for the guests on the central KVM-server: no, it was not.
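
A quick way to check this on a guest - a minimal sketch:

lsmod | grep qxl                 # is the qxl kernel module loaded at all?
dmesg | grep -i -E "qxl|drm"     # any KMS/DRM messages about the virtual graphics device?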

So, on the first guest I simply tried "modprobe qxl" - and hey, the module loaded. An "init 5" then gave me the desired graphical screen.

Later I verified that no "nomodeset" parameter was actually set in these guests. So, something went "wrong" with the system's startup-configuration during the upgrade procedure. I have no real clue why - the qxl-module had loaded without problems on the previous Leap 42.3 installations.

For Opensuse Leap the wrapper-script "mkinitrd" transfers the running configuration with its loaded kernel modules via "dracut" into a suitable and permanent initramfs/initrd- and systemd-startup configuration. So, issue "mkinitrd" after a successful test of the qxl/spice-interface.

Repair of the freshly installed Leap 15 guest-system

Things were slightly different on the freshly installed Leap 15 guest on my workstation: the qxl-module did not load there. A "modinfo qxl" shows you the parameters you can apply. The right thing to do was to try

modprobe qxl modeset=1

This worked! Then I eliminated the "nomodeset"-parameter from the "GRUB_CMDLINE_LINUX_DEFAULT"-entry in the file "/etc/default/grub" and ran "mkinitrd" to get a stable, permanent startup-configuration (via dracut), which afterwards booted the guest into graphical mode.
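
A condensed sketch of this repair, assuming the guest still boots to a text console; the grub2-mkconfig step is my assumption about the usual Opensuse workflow after editing /etc/default/grub and is not explicitly part of the procedure above:

modprobe qxl modeset=1                      # test: load the qxl module with KMS enabled
# remove "nomodeset" from GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
grub2-mkconfig -o /boot/grub2/grub.cfg      # regenerate the Grub2 configuration (assumed step)
mkinitrd                                    # rebuild the initramfs via dracut while qxl is loaded
init 5                                      # switch to the graphical target for a test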

Adjustment of the TTYs and KDE

As soon as you have a reasonable configuration, you may want to adjust the screen dimensions of the text consoles tty1 to tty6, the sddm-login-screen and the KDE or Gnome screen. These are topics for separate articles. See, however, previous articles in this blog on KVM guests for some hints.

For the text console TTYs try reasonable settings for the following entries in the "/etc/default/grub" - below e.g. for a resolution of "1680x1050":

GRUB_CMDLINE_LINUX_DEFAULT=" .... OTHER PARAMETERS ...... video=1680x1050"
GRUB_GFXMODE="1680x1050"
GRUB_GFXPAYLOAD=keep

(see https://www.suse.com/de-de/support/kb/doc/?id=7017979)

Do not forget to execute "mkinitrd" again afterwards!

For KDE adjustments a user can use the command "systemsettings5" and then specify screen resolutions in the dialog for "display and monitors".

Conclusion

The graphical installer of Opensuse, but also upgrade-procedures on working virtual KVM/QEMU guests with qxl/spice, can lead to situations where KMS is or must be deactivated - both for physical and virtual systems. As a consequence, the "qxl" module may not be loadable automatically afterwards. This can lead to failures during the start of a graphical qxl/spice-interface for local and remote spice-clients.

The remedy is to remove any "nomodeset"-parameter which may be a part of the entry for "GRUB_CMDLINE_LINUX_DEFAULT" in the file "/etc/default/grub".

For tests of the qxl-driver try loading the qxl-module with "modprobe qxl modeset=1". After a successful start of a graphical interface use the command "mkinitrd" (whilst the qxl-module is loaded!) to establish a permanent configuration which loads "qxl" during system start.

VMware Workstation – Virtual Ethernet failed – sometimes for good!

I use VMware Workstation as a hypervisor for hosting a few MS Windows guests, which I need in customer projects. As I do not trust Windows systems, I sometimes place such guests in separate, isolated virtual host-only networks on my workstation or a dedicated server. On other systems I run virtualized Linux server systems for production and tests with the help of KVM/QEMU, LXC and libvirt. Some time ago I learned the hard way that I have to keep track of the different "local" virtual networks more thoroughly. As the VMware side was affected by this misconfiguration in a rather peculiar way, I thought this experience might be interesting for others, too.

Host interface to VMware virtual network sometimes not available?

The whole thing coincided with an upgrade of one of my Opensuse workstations. I am always a bit nervous about whether and how such an upgrade may impact the VMware WS installation. In the past I quite often experienced problems with the automatic compilation of the VMware modules, and the start of some modules led to secondary errors. However, at a first test VMware WS 14 compiled without problems on a system with Opensuse Leap 15. No wonder, I thought, as the kernel version 4.12 of Leap 15 is relatively old.

Then I had to set up a Windows guest. I gave it an IP address within a virtual VMware host-only network which covered the IPv4 address range of a class C network. VMware uses a kind of virtual bridge to support such a network; normally one associates a virtual network device on the host with it, to which you can assign a certain IP address in the net's address range. For a C-net VMware uses yyy.yyy.yyy.1 as a default on the host's interface (with yyy.yyy.yyy defining the C-network). In my case the device on the host was named "vmnet6". In the beginning this virtual interface also worked as expected.

However, during the following days I noticed trouble with "vmnet6". In a normal host configuration the VMware WS initialization script ("/etc/init.d/vmware") is started as an LSB service by systemd. The script loads the VMware modules and configures the defined virtual networks. Unfortunately, relatively often my "vmnet6" interface did not become available directly after the start of the Linux host. Some experiments showed, however, that I could circumvent this problem by a somewhat "dubious" action:

I had to enter the "Virtual Network Editor" with root rights and save the already existing virtual network again. Afterwards ifconfig or the ip command listed the interface "vmnet6" - which then also seemed to work normally.

You get a strange feeling when you are forced to remedy something as root .... However, I ignored this problem, which I could not solve directly, for a while. I also noticed that the problem did not occur all the time. This should have rung a bell ... but I was too stupid. Until yesterday, when I needed to set up another special MS WIN guest in the same address range.

Virtual Ethernet failed

In addition I upgraded to VMware WS version 14.1.3. The (automatic) compilation of the VMware WS modules on Opensuse Leap 15 again worked without major problems. But when I manually (re-)started the VMware modules via "/etc/init.d/vmware restart" I saw an error message:

"Virtual Ethernet failed".

Such a message makes you nervous because you assume some really big problem with the VMware modules. However, starting some of my VMware guests (not the ones linked to vmnet6) proved that these guests operated flawlessly and could communicate via their own and the host's network devices. But my "vmnet6" did not appear in the output of "ip a s".

The problem "Virtual Ethernet failed" is reported in some Internet articles; however without a real solution. See e.g. https://ubuntuforums.org/showthread.php?t=1592977; https://www.linuxquestions.org/questions/slackware-14/vmware-workstation-unable-to-start-services-4175547448/; https://communities.vmware.com/thread/264235.

Now I seriously began to wonder whether and how this problem was related to the already known problem with my virtual device "vmnet6". Some tests quickly showed that the error message disappeared when I eliminated the host-only network related to "vmnet6". Then I tried a new host-only test network with a device "vmnet7" and a different IP range. In that case, too, the "Virtual Ethernet" started flawlessly. So, what was wrong with "vmnet6" and/or its IP range? Some residual garbage from old installations? A search for configuration files gave me no better ideas.

vnetlib-log

After a minute I thought: Maybe VMware is right. A look into the log-file "/var/log/vnetlib" revealed:

Oct 14 14:22:49 VNL_Load - LOG_ERR logged
Oct 14 14:22:49 VNL_Load - LOG_WRN logged
Oct 14 14:22:49 VNL_Load - LOG_OK logged
Oct 14 14:22:49 VNL_Load - Successfully initialized Vnetlib
...
Oct 14 14:22:49 VNL_StartService - Started "Bridge" service for vnet: vmnet0
Oct 14 14:22:49 VNLPingAndCheckSubnet - Return value of vmware-ping: 0
...
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet2 is available
...
Oct 14 14:22:49 VNL_CheckSubnetAvailability - Subnet: yyy.yyy.yyy.yyy on vnet: vmnet6 is not available
Subnet on vmnet6 is no longer available for usage, please run the network editor to reconfigure different subnet
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet7 is available
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet8 is available
.....

(I have replaced the real addresses by xxx and yyy.) So, obviously, at startup VMware performs a kind of ping check which reaches beyond the locally defined virtual devices and networks. Interesting! Why do they ping?

The answer is trivial: when you set up the (virtual) internal bridge for a host-only network, you may want to guarantee that the IP-address for the respective host device ("vmnet6") does not exist anywhere else in the reachable network - especially as the host's interface of a host-only network may be used for host-controlled routing (and NAT) of the otherwise isolated VMware guests to the outside world.

Address overlap

And this resolved my problem: a manual ping for the planned host interface address yyy.yyy.yyy.1 indeed gave me a result. I located the source on another server, where I sometimes perform tests of different virtual network configurations with KVM guests, LXC containers and libvirt. Unfortunately, I had not disabled all of my test networks after my last tests. Due to default DHCP-settings a virtual interface with the address yyy.yyy.yyy.1 got established on the test server whenever it was booted. This also explained the finding that the VMware WS problem did not occur all the time - it only happened when the test server was active in my physical LAN!
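
A minimal sketch of how to check for such an overlap beforehand; the address and interface names are placeholders for the real ones:

ping -c 2 192.168.60.1               # does anything in the LAN already answer on the planned host address?
arping -I eth0 -c 2 192.168.60.1     # check on the ARP level, too (arping must be installed; option syntax differs between implementations)
ip a s vmnet6                        # after starting the VMware services: is the interface up and addressed?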

Then I actually remembered that I had had a similar problem once in 2016, whilst experimenting with QEMU/VMware-bridge-coupling. (See: Opensuse/Linux – KVM, VMware WS – virtuelle Brücken zwischen den Welten). So, I should have known better and planned my "local" virtual experiments a bit more carefully to avoid address overlaps with other potential virtual networks on different hosts in the LAN.

Stupid me ... The VMware WS startup script was absolutely right not to establish a second interface with that address on my workstation! This led to the error message "Virtual Ethernet failed" - which in my opinion should include a bit more detailed information or a hint to a log-file. But the fault clearly was on my side: one should not think in terms of the local host environment when using VMware WS for virtual networks, but consider the global network configuration.

Still, the whole VMware handling of a possible IP address overlap left me a bit puzzled:

Why can you, by saving the questionable virtual network again, establish a host interface despite the fact that another interface with the same address is running somewhere in the network? OK, you are root when you do it - but why is no warning given at this point? (VMware could do a ping check there, too ....)

A second question also worried me: why did the existence of two devices with the same IP-address not lead to more chaos in the network? One reason probably was that I had allowed pinging but no general TCP-transport to the virtual device on the test server. And explicit routes were otherwise defined properly on the different hosts. However, I would bet on some problems on the ARP-level when the test server was up and running. Anyway - such a basic misconfiguration in a network may lead to security holes, too, and should, of course, be avoided.

A third question came up for a second: how do you avoid overlaps in case one wants to assign KVM/QEMU-guests and VMware guests IP addresses within the same network address space on one and the same virtualization host? Such a scenario is not at all as far-fetched as it may seem at first sight. One reason for such a configuration could be the EU GDPR (DSGVO):

If you need to guarantee customer confidentiality and are nevertheless forced to use a standard MS Windows client, you have a problem, as MS may in an uncontrollable way transfer data to their own servers (outside the EU). Just read the license and maintenance agreements you sign when operating a standard Win 10 client! Therefore, you may want to isolate such clients drastically and only allow communication with certain IP-addresses on the Internet (and NOT with MS servers). You may allow communication with some (virtualized) Linux machines in the same sub-network, and one of them may serve as a gateway and perimeter firewall with strict filters. You can build up such scenarios by coupling a QEMU virtual bridge to a VMware virtual bridge. At the same time, however, you need full control over the DHCP-systems on both sides (besides a bit of scripting) to avoid address overlaps. This is actually easy: the DHCP-control files on the VMware side are found under the directory "/etc/vmware/vmnetX/dhcpd", with "X" standing for a virtual VMware interface number. On the QEMU side you find the files at "/etc/libvirt/qemu/networks". There you can control your IP assignments (or even suppress assignments for some special interfaces). OK, but this is the beginning of another story.
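
As an illustration, a minimal sketch of where to look on both sides; the vmnet number is an example and the exact file layout may differ between versions:

# VMware side: ISC-dhcpd-style configuration below /etc/vmware/vmnetX/dhcpd
grep -R "range" /etc/vmware/vmnet6/dhcpd/
# libvirt/QEMU side: network XML definitions with <range start="..." end="..."/> elements
grep -R "range" /etc/libvirt/qemu/networks/*.xml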

Conclusion

Never think "local" or host-based when working with virtual networks! Always include local virtual test networks in the documentation of your global network landscape! And do not just accept the standard address assignment for host interfaces of virtual networks without thinking.

KVM: fsck for guest filesystems in LUKS-encrypted LVM volumes directly on the KVM host – differences between raw devices and qcow2 image files

I run certain Linux systems as KVM guests with an encrypted disk substructure. The partitions of the KVM host itself are encrypted, too; in this article, however, I look at encrypted virtual disks for KVM guest systems. For this purpose I use LVM volumes defined on the KVM host which have been encrypted with dm-crypt/LUKS.

I regularly use two variants:

  • Variant 1: One or more encrypted LVM volumes are provided to the guest system (in decrypted state) as "raw devices". The guest system creates its partitions there (or, in some cases, its own LVM volumes), each with a Linux filesystem.
  • Variant 2: Encrypted LVM volumes of the host contain qcow2 container files which the KVM guest uses as virtual disks. Inside the qcow2 container the guest then creates its own LVM volumes with a Linux filesystem.

For regular fsck checks of the filesystems inside a KVM guest one can, on the one hand, adjust settings of the guest filesystem itself. With "tune2fs" you can set the so-called "maximum mount count" to a sufficiently small number of mounts. This is recommended especially for the root filesystem of the guest: it must not be mounted while fsck runs - otherwise you need tricks if you want to force fsck in the booting KVM guest before the root filesystem is mounted.
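
A minimal sketch of such a setting with "tune2fs" inside the guest (the device path is an example):

tune2fs -c 20 /dev/vda2                          # force a fsck after at most 20 mounts
tune2fs -l /dev/vda2 | grep -i "mount count"     # verify the current and maximum mount count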

Sometimes, however, as part of automated maintenance procedures, one also wants to run filesystem checks with fsck for the filesystems of the KVM guests directly from the KVM host - without booting the guest system, of course. How do you do that for the two variants mentioned above?

fsck in variant 1 - LVM volume as an encrypted raw device of the guest

Variant 1 has the following layering with respect to the physical and virtual disks:

  • Physical disk partitions for RAID   >>  
  • RAID 10   >>  
  • LVM   >>  
  • LVM groups and LVM volumes which the host can use   >>  
  • dm-crypt/LUKS encryption of one (or several) of the LVM volumes, which are provided to the guests as raw devices   >>  
  • KVM/QEMU and guest system with access to the raw volume as a virtual disk   >>  
  • Partitions (or LVM volumes) in the guest system   >>  
  • ext4 filesystems in the guest system

Here the way leads via "cryptsetup" and, e.g., the tool "kpartx" (which, of course, has to be installed). Naturally, we only perform all of these operations when the KVM guest system itself is not running. We have to avoid that several operating systems (host and guest) write to the guest's partitions at the same time.
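
A quick check before touching the volume - a minimal sketch; "mygast" is a hypothetical libvirt domain name:

virsh list --all             # overview of the defined guests and their states
virsh domstate mygast        # must report "shut off" before we work on its disks from the host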

In our example I assume that the decryption of the LVM volume in question has not yet been performed for the guest on the KVM host. Let this volume reside in a logical volume group "lvg2" and be named "lvhd0".

Das Kommando "la" ist in folgendem Beispiel ein Alias für 'ls -la'; alle Kommandos werden direkt auf dem Host ausgeführt. Das jeweilige KVM-Gastsystem ist nicht hochgefahren. Unter dem Verzeichnis "/dev/mapper" finden wir dann bei aktivierten Volume-Groups auf dem KVM-Host etwa folgenden Eintrag vor;

mytux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     340 Aug  4 09:46 .
drwxr-xr-x 22 root root    9720 Aug  4 09:50 ..
crw-------  1 root root 10, 236 Aug  4 09:41 control
... 
lrwxrwxrwx  1 root root       8 Aug  4 09:46 lvg2-lvhd0 -> ../dm-11
...                                                                    

First we have to decrypt this volume:

mytux:~ # cryptsetup open /dev/mapper/lvg2-lvhd0  cr_hd0
Enter passphrase for /dev/mapper/lvg2-lvhd0: 

mytux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     340 Aug  4 09:46 .
drwxr-xr-x 22 root root    9720 Aug  4 09:50 ..
crw-------  1 root root 10, 236 Aug  4 09:41 control
lrwxrwxrwx  1 root root       8 Aug  4 09:46 cr_hd0 -> ../dm-16
...
lrwxrwxrwx  1 root root       8 Aug  4 09:46 lvg2-lvhd0 -> ../dm-11
...                                                                    

OK. The command "qemu-img" informs us that we are indeed dealing with a raw device (of 100 GB size):

 
mytux:~ # qemu-img info /dev/mapper/cr_hd0
image: /dev/mapper/cr_hd0
file format: raw
virtual size: 100G (107372085248 bytes)
disk size: 0

Information about the partitioning of the "raw device"
In our example the decrypted LVM volume of the host contains two partitions of the guest: a swap partition and a partition with an ext4 filesystem. There are several tools with which you can examine the partition structure below a raw device for a KVM guest on the host itself:
"fdisk", "parted", "virt-filesystems" and also "kpartx":

 
mytux:~ # fdisk -l /dev/mapper/cr_hd0
Disk /dev/mapper/cr_hd0: 100 GiB, 107372085248 bytes, 209711104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00041fe1

Device                    Boot   Start       End   Sectors  Size Id Type
/dev/mapper/cr_hd0-part1         2048   3067903   3065856  1.5G 82 Linux swap / Solaris
/dev/mapper/cr_hd0-part2 *    3067904 167772159 164704256 78.6G 83 Linux

mytux:~ # parted /dev/mapper/cr_hd0 unit s print
Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/cr_hd0: 209711104s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start     End         Size        Type     File system     Flags
 1      2048s     3067903s    3065856s    primary  linux-swap(v1)  type=82
 2      3067904s  167772159s  164704256s  primary  ext4            boot, type=83
 
mytux:~ # virt-filesystems -a /dev/mapper/cr_hd0 --extra
/dev/sda1
/dev/sda2
mytux:~ # virt-filesystems -a /dev/mapper/cr_hd0 --extra -l
Name       Type        VFS   Label  Size         Parent
/dev/sda1  filesystem  swap  -      1569718272   -
/dev/sda2  filesystem  ext4  -      84328579072  -
 
mytux:~ # kpartx -l /dev/mapper/cr_hd0 
cr_hd01 : 0 3065856 /dev/mapper/cr_hd0 2048
cr_hd02 : 0 164704256 /dev/mapper/cr_hd0 3067904

Note:

kpartx also provides information about the partitioning (including offsets) for disk image files in "raw" format. However, kpartx does not work for disk image files in qcow2 format!

With the exception of "virt-filesystems", all of the tools presented above also provide us with offset information:
The values 2048(s) and 3067904(s) correspond to the offset addresses of the partitions; we still have to multiply the number of sectors by the number of bytes per sector (512): e.g. 2048 * 512 is the offset of the first (swap) partition.
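
The same arithmetic on the shell, with the sector numbers taken from the fdisk output above:

echo $(( 2048 * 512 ))        # 1048576     -> byte offset of the first (swap) partition
echo $(( 3067904 * 512 ))     # 1570766848  -> byte offset of the second (ext4) partition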

For determining the partitions one could also use "guestfish" with its sub-commands run and list-filesystems, or the tool "qemu-nbd". I discuss "qemu-nbd" in detail below for variant 2.

Accessing the partitions of the raw LVM volume (= raw device of the KVM guest system)
"kpartx -a" gives us a simple way to access the partitions inside a device in raw format.

mytux:~ # kpartx -a /dev/mapper/cr_hd0
mytux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     420 Aug  6 14:36 .
drwxr-xr-x 22 root root    9740 Aug  6 14:36 ..
crw-------  1 root root 10, 236 Aug  6 08:26 control
lrwxrwxrwx  1 root root       8 Aug  6 14:32 cr_hd0 -> ../dm-16
lrwxrwxrwx  1 root root       8 Aug  6 14:36 cr_hd01 -> ../dm-17
lrwxrwxrwx  1 root root       8 Aug  6 14:36 cr_hd02 -> ../dm-18
lrwxrwxrwx  1 root root       8 Aug  6 14:36 cr_hd0_part1 -> ../dm-17
lrwxrwxrwx  1 root root       8 Aug  6 14:36 cr_hd0_part2 -> ../dm-18
....
lrwxrwxrwx  1 root root       7 Aug  6 08:55 lvg2-lvhd0 -> ../dm-11
...
mytux:~ # 

Note the different names that link to the same device. We already know that the second partition contains an ext4 filesystem of the KVM guest. So:

mytux:~ # fsck -f /dev/mapper/cr_hd02
fsck from util-linux 2.29.2
e2fsck 1.42.11 (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/cr_hd02: 1016912/5152768 files (0.1% non-contiguous), 6724050/20588032 blocks
mytux:~ # 

Afterwards we can undo the mapping again with "kpartx -d":

mytux:~ # kpartx -d /dev/mapper/cr_hd0
mytux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     340 Aug  6 14:45 .
drwxr-xr-x 22 root root    9700 Aug  6 14:45 ..
crw-------  1 root root 10, 236 Aug  6 08:26 control
lrwxrwxrwx  1 root root       8 Aug  6 14:32 cr_hd0 -> ../dm-16
...
lrwxrwxrwx  1 root root       7 Aug  6 08:55 lvg2-lvhd0 -> ../dm-11 
mytux:~ # 

What to do if the guest system itself uses LVM, too?
In our case there was no LVM structure inside the guest system. Had there been one, we would have had to perform two further steps - namely "vgscan" and "vgchange -ay". Only after that could we have run "fsck". We will see this below in the discussion of variant 2.

Working with loop devices
For the sake of completeness I briefly show how partitions of KVM raw devices can also be accessed as loop devices via their offsets. In our example the second partition has an offset of 3067904 * 512 = 1570766848 bytes.

So:

mytux:~ # losetup -r -o 1570766848  /dev/loop3 /dev/mapper/cr_hd0
mytux:~ # fsck -f /dev/loop3
fsck from util-linux 2.29.2
e2fsck 1.42.11 (09-Jul-2014)
fsck.ext4: Operation not permitted while trying to open /dev/loop3
You must have r/w access to the filesystem or be root
mytux:~ # losetup -d  /dev/loop3 
mytux:~ # losetup  -o 1570766848  /dev/loop3 /dev/mapper/cr_hd0  
mytux:~ # fsck -f /dev/loop3     
fsck from util-linux 2.29.2
e2fsck 1.42.11 (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop3: 1016912/5152768 files (0.1% non-contiguous), 6724050/20588032 blocks
mytux:~ # 
mytux:~ # tune2fs -l /dev/loop3
tune2fs 1.42.11 (09-Jul-2014)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          4388dd4b-ac1a-5c9c-b8d6-88e53b12bd2d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              5152768
Block count:              20588032
Reserved block count:     267644
Free blocks:              13863982
Free inodes:              4135856
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1019
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Mon Jan  6 15:44:37 2014
Last mount time:          Mon Aug  6 08:56:27 2018
Last write time:          Mon Aug  6 15:01:58 2018
Mount count:              0
Maximum mount count:      2
Last checked:             Mon Aug  6 15:01:58 2018
Check interval:           172800 (2 days)
Next check after:         Wed Aug  8 15:01:58 2018
Lifetime writes:          2270 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ce2daf83-bc25-44d5-fcdf-b35afd5c8f2b
Journal backup:           inode blocks
mytux:~ # 
mytux:~ # losetup -d  /dev/loop3

The reader can see that in the last command sequence I had initially set up the loop device in read-only mode only - out of sheer habit. fsck only runs after this is corrected. Of course, you can then also apply e.g. tune2fs to the loop device. So the route via loop devices works as well - you just have to know the offsets!

fsck in variant 2 - encrypted LVM volume with a disk image file in "qcow2" format

In this scenario we are dealing with a really complex layering:

KVM host -> LVM volume group -> LUKS-encrypted LVM volume -> qcow2 image file -> partition and LVM structure of the KVM guest system -> volume group with an LVM volume of the guest -> ext4 filesystem

In this scenario two important differences to variant 1 come into play:

  • Our disk image file (a kind of container file; "os43.qcow2") does not have a raw format - "kpartx -a", among other things, fails because of this.
  • The container file contains a partition by means of which the guest system has created an LVM logical volume group "lvg1" together with a logical volume "lvroot".

Regarding the first problem:
Keep in mind that we do not want to mount the filesystem to be checked on the host. The only program known to me that makes the contents (i.e. the partitions) of the qcow2 file, including offsets, available in a suitable way is qemu-nbd. "Suitable" here means that the physical volumes (partitions) of the guest can afterwards be used on the KVM host in the form of loop devices. This then enables us to access the LVM volume group and the LVM volumes inside the qcow2 file and to run "fsck -f".

Note:

There is a way to apply a standard variant of fsck within the command "guestfish" from the libguestfs suite (see below); "fsck -f", however, is not possible that way.

Regarding the second problem:
Besides untangling the partition structure (including offsets) of the qcow2 file with its complex format, suitable tools on the host also have to detect and activate the internal LVM structure. For this we will use the combination of "vgscan" and "vgchange".

Decrypting and mounting the LVM volume of the host
All of the following commands are again executed directly on the KVM host. First, as in variant 1, we have to decrypt the appropriate LVM volume of the KVM host. Afterwards, however, we also have to mount the decrypted device at a suitable place on the host in order to be able to access the disk image file contained there:

mxtux:~ # cryptsetup open /dev/mapper/volssd10-kvmos  cr_kvmos
Enter passphrase for /dev/mapper/volssd10-kvmos:
mxtux:~ # 
mxtux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     420 Aug  6 11:52 .
drwxr-xr-x 27 root root   12300 Aug  6 11:52 ..
crw-------  1 root root 10, 236 Aug  6 08:49 control
lrwxrwxrwx  1 root root       8 Aug  6 11:48 cr_kvmos -> ../dm-16
...
lrwxrwxrwx  1 root root       7 Aug  6 11:48 volssd10-kvmos -> ../dm-6
...  
mxtux:~ # 
mount /dev/mapper/cr_kvmos /kvm/os                                                                  
mxtux:~ # 

Information about the partition layout inside the qcow2 image file:
While kpartx does not provide sufficient information, the command "virt-filesystems" from the libguestfs suite (install it if necessary!) gives us an insight into the structure of the qcow2 image file "os43.qcow2":

mytux:~ # virt-filesystems  -a  /kvm/os/os43.qcow2 --extra -l 
Name              Type        VFS   Label  Size        Parent
/dev/sda1         filesystem  swap  -      2153775104  -
/dev/lvg1/lvroot  filesystem  ext4  -      8589934592  -

Nicely enough, "virt-filesystems" even recognizes the LVM volume group "lvg1" and the volume "lvroot"! We could now even mount this logical volume of the guest on the host by means of the command "guestmount" (also part of libguestfs) and work with its contents:

mytux:~ # guestmount -a /kvm/os/os43.qcow2 -m /dev/lvg1/lvroot --ro /mnt2
mytux:~ # la /mnt2
total 136
drwxr-xr-x  23 root root   4096 Jun  7 18:27 .
drwxr-xr-x  40 root root   4096 Jul 25 18:55 ..
drwxr-xr-x   2 root root   4096 May 14 19:20 bin
drwxr-xr-x   3 root root   4096 May 14 19:22 boot
drwxr-xr-x   2 root root   4096 May 14 17:09 dev
drwxr-xr-x 128 root root  12288 Aug  4 11:41 etc
drwxr-xr-x   3 rmu  users  4096 May 28 17:43 extras
drwxr-xr-x   4 root root   4096 May 15 20:29 home
drwxr-xr-x  12 root root   4096 May 14 19:20 lib
drwxr-xr-x   7 root root  12288 May 14 19:21 lib64
drwx------   2 root root  16384 May 14 17:09 lost+found
drwxr-xr-x   2 root root   4096 May 10  2017 mnt
drwxr-xr-x   2 root root   4096 May 10  2017 opt
drwxr-xr-x   2 root root   4096 May 14 17:09 proc
drwx------   9 root root   4096 Jun 12 21:52 root
drwxr-xr-x   2 root root   4096 May 14 17:09 run
drwxr-xr-x   2 root root  12288 May 14 21:12 sbin
drwxr-xr-x   2 root root   4096 May 10  2017 selinux
drwxr-xr-x   5 root root   4096 May 14 17:12 srv
drwxr-xr-x   2 root root   4096 May 14 17:09 sys
drwxrwxrwt  26 root root  12288 Aug  4 11:41 tmp
drwxr-xr-x  13 root root   4096 May 14 17:10 usr
drwxr-xr-x  12 root root   4096 May 14 17:24 var
mytux:~ # umount /mnt2

Unfortunately, with regard to the intended "fsck" this does not help us at all.

Einsatz von "qemu-nbd"
Im Gegensatz zu RAW-Devices oder Raw-Image-Files kommen wir an dieser Stelle nicht um den Einsatz von des qemu-eigenen Kommandos "qemu-nbd" herum. Also:

mxtux:~ # modprobe nbd max_part=8
mxtux:~ # qemu-nbd --connect=/dev/nbd0 /kvm/os/os43.qcow2 
mxtux:~ # la /dev/ | grep nbd0
brw-rw----   1 root disk       43,   0 Aug  6 16:22 nbd0
brw-rw----   1 root disk       43,   1 Aug  6 16:22 nbd0p1
brw-rw----   1 root disk       43,   2 Aug  6 16:22 nbd0p2

What is hidden behind these new devices?

mxtux:~ # fdisk -l /dev/nbd0
Disk /dev/nbd0: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000392a8

Device      Boot   Start      End  Sectors Size Id Type
/dev/nbd0p1         2048  4208639  4206592   2G 82 Linux swap / Solaris
/dev/nbd0p2 *    4208640 31457279 27248640  13G 8e Linux LVM

Aha, fdisk recognizes that the second partition uses LVM. How do we get further? "kpartx" does not lead us to the goal, because the LVM volume groups have to be activated first. Two steps are necessary for this:

  • Step 1: We have to make the LVM device inside the qcow2 file accessible on the host - as a loop device. For this we calculate its offset ( = 4208640 * 512 = 2154823680).
  • Step 2: On the KVM host we have to run the command "vgscan" and afterwards activate the desired volume group with "vgchange".

So:

mxtux:~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "volssd10" using metadata type lvm2
  Found volume group "lvgssd5" using metadata type lvm2
  Found volume group "lvg2" using metadata type lvm2
  Found volume group "lvg10f2" using metadata type lvm2
  Found volume group "volgrp1" using metadata type lvm2

mxtux:~ # losetup -o 2154823680 /dev/loop5 /dev/nbd0

mxtux:~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "lvg1" using metadata type lvm2
  Found volume group "volssd10" using metadata type lvm2
  Found volume group "lvgssd5" using metadata type lvm2
  Found volume group "lvg2" using metadata type lvm2
  Found volume group "lvg10f2" using metadata type lvm2
  Found volume group "volgrp1" using metadata type lvm2
mxtux:~ # 

Aha, vgscan detects a new volume group "lvg1". By the way, we see here that it pays off to choose the names of volume groups on the host and on the guest systems differently and globally unique - something which, to my shame, I neglected to do here. Now we still have to activate the volume group:

mxtux:~ # vgchange -ay lvg1
  1 logical volume(s) in volume group "lvg1" now active
mxtux:~ #
mxtux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     460 Aug  6 16:44 .
drwxr-xr-x 28 root root   12880 Aug  6 16:44 ..
crw-------  1 root root 10, 236 Aug  6 08:49 control
lrwxrwxrwx  1 root root       8 Aug  6 11:48 cr_kvmos -> ../dm-16
...
lrwxrwxrwx  1 root root       8 Aug  6 16:44 lvg1-lvroot -> ../dm-19
...
lrwxrwxrwx  1 root root       8 Aug  6 16:33 nbd0p2p1 -> ../dm-18
...

mxtux:~ # fdisk -l /dev/mapper/lvg1-lvroot
Disk /dev/mapper/lvg1-lvroot: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

mxtux:~ # virt-filesystems -a /dev/nbd0 --extra --parts --blkdevs --filesystems -l --lvs
Name             Type       VFS  Label MBR Size        Parent
/dev/sda1        filesystem swap -     -   2153775104  -
/dev/lvg1/lvroot filesystem ext4 -     -   8589934592  -
/dev/lvg1/lvroot lv         -    -     -   8589934592  /dev/lvg1
/dev/sda1        partition  -    -     82  2153775104  /dev/sda
/dev/sda2        partition  -    -     8e  13951303680 /dev/sda
/dev/sda         device     -    -     -   16106127360 -

Now we can run the filesystem check for the LVM volume of the KVM guest, which resides in the qcow2 file, on the KVM host. And afterwards undo all the commands again:

mxtux:~ # fsck -f /dev/mapper/lvg1-lvroot
fsck from util-linux 2.29.2
e2fsck 1.42.11 (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/lvg1-lvroot: 245825/524288 files (0.1% non-contiguous), 1707336/2097152 blocks
mxtux:~ # 
mxtux:~ #  vgchange -an lvg1
  0 logical volume(s) in volume group "lvg1" now active
mxtux:~ # la /dev/mapper
total 0
drwxr-xr-x  2 root root     440 Aug  6 17:09 .
drwxr-xr-x 27 root root   12840 Aug  6 17:09 ..
crw-------  1 root root 10, 236 Aug  6 08:49 control
lrwxrwxrwx  1 root root       8 Aug  6 11:48 cr_kvmos -> ../dm-16
...
lrwxrwxrwx  1 root root       8 Aug  6 16:33 nbd0p2p1 -> ../dm-18
..
mxtux:~ # losetup -d /dev/loop5 
mxtux:~ # qemu-nbd -d /dev/nbd0 
/dev/nbd0 disconnected
mxtux:~ # rmmod nbd
mxtux:~ # 

Please note that in this case, despite the already very complex layering, we still had one advantage: there was exactly one virtual LVM physical volume of the guest system, namely the second partition inside the qcow2 file. Furthermore, there was only one volume group. With several volume groups spreading over different "physical volumes" from different virtual partitions of the guest, we would have had to provide all of the associated virtual partitions on the host as loop devices. We were spared that in our example.
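
A minimal sketch of what that more general case would look like; the second offset and the use of two loop devices are purely hypothetical, the real offsets would have to be taken from "fdisk -l /dev/nbd0":

# assume the guest uses two partitions of the nbd device as LVM physical volumes
losetup -o $(( 4208640 * 512 ))  /dev/loop5 /dev/nbd0    # PV 1 (offset from the fdisk output)
losetup -o $(( 20971520 * 512 )) /dev/loop6 /dev/nbd0    # PV 2 (hypothetical offset)
vgscan                                                   # should now detect the guest's volume group(s)
vgchange -ay lvg1                                        # activate them, then run fsck on the logical volumes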

fsck in a third variant - encrypted LVM volume with a raw image file

After all the circus with qcow2 the question arises: why not use image files in raw format right away? That is a good question; I will, however, postpone the answer to another article. Let me only mention the advantage of qcow2 that the space consumption grows slowly up to the maximum size. Performance, on the other hand, speaks for disk image files in raw format. Nevertheless, a brief note on using "fsck" for partitions in raw container files:

This case can essentially be handled almost exactly as described above for variant 1. Whoever wants to test something like this can simply convert a qcow2 file into a raw format file by means of the command "qemu-img convert" (see the corresponding man page).
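
A minimal sketch of such a conversion (file names follow the example above):

qemu-img convert -p -O raw /kvm/os/os43.qcow2 /kvm/os/os43.raw   # -p shows progress, -O selects the output format
qemu-img info /kvm/os/os43.raw                                   # verify the result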

Conclusion and an alternative

"fsck" mit zugehörigen Optionen für Filesysteme anzuwenden, die sich in verschlüsselten LVM-Volumes eines KVM-Hosts befinden, ist relativ einfach, wenn das LVM-Volume dem KVM-Gast entweder direkt als Raw-Device zur Verfügung gestellt wird oder aber über ein Disk-Image-File im RAW-Format, das sich auf dem Volume befindet. Befinden sich dann unter dem Raw-Device nur gewöhnliche Partitionen kann man sich das Leben mit "kpartx" bequem machen.

Things get considerably more difficult with qcow2 image files and/or with virtual partitions of image files which the guest system uses in its own LVM volume groups. In the case of qcow2 files you first have to use the kernel module "nbd" and the "qemu-nbd" command. LVM groups inside disk image files furthermore require providing all of the corresponding virtual "physical volumes" (virtual partitions) as loop devices on the KVM host. After that the commands "vgscan" and "vgchange" have to be applied in order to finally obtain the logical volume of the guest with its filesystem under "/dev/mapper". Only then can you apply "fsck" to it. This is rather complex, but in the end you have full control over fsck.

If all this is too complicated for you, you can alternatively try a simple standard variant of the fsck command under "guestfish" for the filesystems of KVM guests. It works for raw devices as well as for qcow2 files!
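
A minimal sketch of that alternative; the filesystem type and the device name inside the image are examples, and guestfish's fsck sub-command performs only the standard check (no "-f"):

guestfish -a /kvm/os/os43.qcow2 <<'EOF'
run
list-filesystems
fsck ext4 /dev/lvg1/lvroot
EOF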