Long boot time of Kali Linux after swap partition changes – swap settings for the initramfs

Recently, I reactivated an old KVM-based installation of Kali Linux from 2018. So, some hurdles to upgrade the Kali distribution to version 2020.4 had to be overcome. Actually, it was a mess. I had to solve multiple circle like problems with package dependencies (Python3, Ruby, special packages in the Discovery section, …). Sometimes I had to delete packages explicitly. The new Kali minimum installation and its enhancement via meta-packages contributed to the problems. In the end I also reinstalled a the Gnome desktop environment – supplemented by a few KDE applications. Now, my Kali was fully functional again. However, the remaining disk space had shrunk …

Resizing the qcow2-file for the virtual disk of the virtual machine for Kali Linux

During all the back and forth with installing meta-packages I came close to exhausting virtual disk space, before I could clean up and remove packages again (with “apt-get autoclean” and “apt-get autoremove”). The virtual hard disk was a qcow2-file in my case. For further experiments I had to expand its capacity. This could easily be done by

qemu-img resize PATH_TO_qcow2_FILE +NG

where I had to chose a suitable “N” (20). The extended file must, of course, still fit into its filesystem on the KVM host!

Extending and rearranging partitions within the Kali system

The more difficult part in my case was the rearrangement of the two partitions ( one for the “/”-fs and one for swap) on the virtual disk for my Kali guest system. I did this within the running Kali system. Bad habit; such operations are dangerous, of course, but I trusted in the abilities of gparted. An extension of a mounted “ext4”-formatted partition is in my experience no major problem as long as there is enough free space behind its current location …. But, I recommend, of course, to make a backups of your virtual machine, before you start with any potentially dangerous operations regarding the file-systems of your Linux machines.

As my old Kali installation unfortunately did not have a LVM-layout and the disk partition table was of the old MS-DOS type, I actually had to move a blocking swap partition which was located directly after the “/”-partition. Meaning: I had to delete and later recreate it at a different position. Of course, after having disabled the swap in the running Kali guest :-). The partition keeping the “/”-filesystem (ext4) could then be extended without any problems on the running system. The new swap partition afterwards got its place behind the “/”-partition. According to “gparted” everything was OK after the partition changes: Then I rebooted…

The problem: A boot time of almost 30 secs …

The next restart took about 28 secs! But the machine came up in the end – which is almost a wonder, after I understood the cause of the problem. A standard boot-process until login normally required about 4 secs, only, before my filesystem changes. A major discrepancy and a clear indications of a major problem! Looking at the “dmesg”-output I got the impression that the delay had to do with operations occurring at the very beginning of the boot process. So, I checked the “/etc/fstab”. And got a first glimpse of the cause: The entries there referred to the UUIDs of the partitions! Not unexpectedly – this is the standard these days for almost all major distributions.

So, stupid me: Of course, the UUID of the swap partition had changed during the mentioned operations! I adapted it to the new value (which I got from gparted). I also checked whether there was
any reference to the swap-partition in the Grub command line (check “/etc/default/grub” for a “resume”-parameter!) To be on the safe side I also checked the result of “update-grub” in the file “/boot/grub/grub.cfg”). I did not find any references there. So, I gladly restarted my Kali system. However, the problem had not disappeared … .

Solution: The initramfs-configuration includes an explicit setting for the swap-partition!

Now, at this point there were not many options left. I started suspecting that the initramfs had a wrong entry. Now, where do we find options regarding the swap for the initramfs on a Debian based systems? A bit of duckduckgoing pointed me to the file

/etc/initramfs-tools/conf.d/resume

And there I did find an entry like

RESUME=UUID=d222524c-5add-2fcf-82dd-4d1b7e528d0c

OK – I changed it to the new value. Then I used

update-initramfs -u

to update the initramfs. And, guess what: The problem disappeared! I got my 4 secs of boot time again.

Conclusion

Never forget to check the UUID-settings for partitions after major changes of the filesystem-layout on your disks. Do the necessary checks not only on real host systems, but also on virtualized systems. Check both the “/etc/fstab” AND the Grub2-configuration AND the initramfs-configuration for possible references to changed or moved partitions.

Off topic – but related: When copying the contents of a partition with “dd” into another partition on your hard disk, e.g. for a backup, you should also care about the fact that you have two partitions with the same UUID afterwards. This may lead to major problems for any active Grub2. Always change the UUID of the copied partition with tune2fs before any reboots. If the copied partition was a bootable one, you also take care of its “/etc/fstab”-entries and initramfs settings, if you want it to be bootable in its new place.

General warning: Whenever you move/copy partitions, write down and save the original UUIDs – you may need them in case of trouble.

 

Opensuse Leap, KVM/QEMU-guests, KMS – problems with QXL/Spice after upgrade to Leap 15

Many services in my LAN are provided by virtual KVM/QEMU-guests of a dedicated KVM-server-host. More special services are provided by local KVM-guests on Workstations. All my virtualized systems get normally equipped with a qxl/spice-combination to provide a graphical interface – which can be used on the KVM-host or by remote spice clients. A direct graphical access via a spice client is required only seldomly (besides ssh connections) – but in some cases or situations it is useful, if not mandatory.

Recently, I upgraded both the central KVM-Server, its guests and also guests on workstations from Opensuse Leap 42.3 to Leap 15.0. Unfortunately, after the upgrade no graphical access was possible any longer to my guest-systems via spice-clients (as virt-viewer or the graphical spice-console of virt-manager).

After some tests I found out that this was due to missing KMS on the guest systems – present qxl-modules, however, do require KMS. But, if you update/install from an ISO image “KMS” may not be compatible with the graphical SuSE installer. And due to previous configurations “nomodeset” may be a kernel parameter used in the installation before the upgrade. Here is the story …

The problem: Unreadable, freezing interfaces on spice-clients

Normally, I upgrade by a 5-step sequence: 1) Update all packets, 2) reduce repositories to the Update- and OSS-repository, 3) switch to the repositories of the new distribution, 4) “zypper dup –download-only”, 5) “zypper –no-refresh dup” (see e.g. how-to-upgrade-from-opensuse-leap-422-to-423/).

The KVM server-host itself gave me no major problem during its upgrade. Also the KVM-guests – with their server-services – seemed to work well. Most often, I access the KVM-guest-systems via ssh to perform regular administration tasks. So, I did not even notice for a while that something was wrong with the qxl/spice-configuration. But when I used “virt-manager” in an “ssh -X” session from my workstation to the KVM-server and tried to open the graphical console for a guest there, I got an unreadable virtual screen, which froze instantly and did no longer react to any input – special commands sent via “virt-manager” to switch to a non-graphical console terminal were ignored. The same happened with “virt-viewer” and/or when I executed virt-manager directly on the graphical screen of the KVM-server.

Independent test with a new installation of a “Leap 15”-guest-system

To find out more I tested a new installation of Leap 15 on my Leap 15 workstation as a KVM-server. I chose a guest system configuration with standard equipment – among other things qxl/spice-components. The installation was started via the “virt-installer” component of virt-manager. I used an ISO-image of the Leap 15 installation image.

First thing I stumbled across was that I had to use a “No KMS” for the text console setting on the first screen of the Opensuse installer (see the options for the graphical setup there; F3-key). Otherwise the installer froze during the first “udev” checks. I knew this effect already from installations on some physical systems. Note that the choice of “No KMS” leads to an kernel parameter entry “nomodeset” in the command line for the kernel call in the Grub2 configuration (see the file /etc/default/grub).

Such a full new installation lead into a void. SuSE’s graphical installer itself worked perfectly with high resolution. However, after a restart of the freshly installed guest-system the switch to the graphical screen lead to a flickering virtual screen and the graphical display of the SDDM login manager never appeared.

Still and luckily, it was possible to login as root and execute a “init 3” command – which brought me to a working, non-flickering ASCII console interface (TTY1).

I had experienced this
type of behavior before, too, on some real physical systems. I recommend Leap users to be prepared: activate ssh-services during installation and open the firewall for ssh-ports! The SuSE-installer allows for such settings on its summary screen for the installation configuration. This gives you a chance to switch to a working text console (init 3) from a remote SSH-command line – if the graphical console does not allow for any input.

Tests 2: Upgrade to latest KVM / QEMU / libvirt – versions of the “Virtualization” repository

An installation of the cutting edge versions of KVM/QEMU and spice sever and client software did not change anything – neither on my dedicated KVM-server-system and its upgraded guests nor on my workstation with the fresh Leap 15 test-guest.

Repair of the older upgraded guest-systems

The emulated “hardware” of my older (upgraded) guest-systems, which stem from an OS 13.2 era, is a bit different from the equipment of the newest test guest. On these older systems I still had the choice to use a cirrus, vga or vmwvga graphics. Interestingly, these drivers do not provide the performance of qxl – but they worked at least with the virtual spice client displays.

One central question was whether the QXL driver – more precisely the corresponding kernel module “qxl” – was loaded at all. Answer for the guests on the central KVM-server: No, it was not.

So, on the first guest I simply tried “modprobe qxl” – and hey – the module loaded. An “init 5” then gave me the aspired graphical screen.

Later I checked that actually no “nomodeset” parameter was set in these guests. So, something went “wrong” with the system’s startup-configuration during the upgrade procedure. I have no real clue why – as the qxl-module was loaded without problems on the previous Leap 42.3 installations.

For Opensuse Leap the wrapper-script “mkinitrd” transfers a running configuration with its loaded kernel modules via “dracut” into a suitable and permanent initramfs/initrd- and systemd-startup configuration. So, issue “mkinitrd” after a successful test of a qxl/spice-interface.

Repair of the freshly installed Leap 15 guest-system

On the freshly installed Leap 15 guest client on my workstation things were slightly different: The qxl-module did not load there. A “modinfo qxl” shows you parameters you can apply. The right thing to do there was to try

modprobe qxl modeset=1

This worked! Then I eliminated the “nomodeset”-parameter from the “GRUB_CMDLINE_LINUX_DEFAULT”-entry in the file “/etc/default/grub” and started “mkinitrd” to get a stable permanent startup-configuration (via dracut) which booted the guest into graphical mode afterwards.

Adjustment of the TTYs and KDE

As soon as you have a reasonable configuration, you may want to adjust the screen dimensions of the text consoles tty1 to tty6, the sddm-login-screen and the KDE or Gnome screen. These are topics for separate articles. See, however, previous articles in this blog on KVM guests for some hints.

For the text console TTYs try reasonable settings for the following entries in the “/etc/default/grub” – below e.g. for a resolution of “1680×1050”:

GRUB_CMDLINE_LINUX_DEFAULT=” …. OTHER PARAMETERS …… video=1680×1050″
GRUB_GFXMODE=”1680×1050″
GRUB_GFXPAYLOAD=keep

(see https://www.suse.com/de-de/support/kb/doc/?id=7017979)

Do not forget to execute “mkinitrd” again afterwards!

For KDE adjustments a user can use the command “systemsettings5” and then specify screen resolutions in the dialog for “display and monitors”.

Conclusion

The graphical installer of Opensuse, but also upgrade-procedures on working virtual KVM/QEMU guests with qxl/spice can lead to situations where KMS is or
must be deactivated both for physical and virtual systems. As a consequence the “qxl-module” may not be loadable automatically afterwards. This can lead to failures during the start of a graphical qxl/spice-interface for local and remote spice-clients.

The remedy is to remove any “nomodeset”-parameter which may be a part of the entry for “GRUB_CMDLINE_LINUX_DEFAULT” in the file “/etc/default/grub”.

For tests of the qxl-driver try loading the qxl-module with “modprobe qxl modeset=1”. After a successful start of a graphical interface use the command “mkinitrd” (whilst the qxl-module is loaded!) to establish a permanent configuration which loads “qxl” during system start.

VMware Workstation – Virtual Ethernet failed – sometimes for good!

I use VMware Workstation as a hypervisor for hosting a few MS Windows guests, which I need in customer projects. As I do not trust Windows systems I sometimes place such guests in different virtual and isolated host-only networks on my workstation or a dedicated server. On other systems I run virtualized Linux server systems for production and tests with the help of KVM/QEMU, LXC and libvirt. Some time ago I learned the hard way that I have to keep track of the different “local” virtual networks more thoroughly. As the VMware side was affected by a misconfiguration in a rather peculiar way I thought this experience might be interesting for others, too.

Host interface to VMware virtual network sometimes not available?

The whole thing coincided with an upgrade of one of my Opensuse workstations. I am always a bit nervous whether and how such an upgrade may impact the VMware WS installation. Quite often I experienced problems with the automatic compilation in the past. In addition the start of some modules lead to secondary errors. However, at a first test VMware WS 14 compiled without problems on a system with Opensuse Leap 15. No wonder, I thought, as the kernel version 4.12 of Leap15 is relatively old.

Then I had to set up a Windows guest. I gave it an IP address within a virtual VMware host-only network, which covered an IPv4 address range of a class C network. VMware uses a kind of virtual bridge to support such a network; normally one associates a virtual network device on the host with it, to which you can assign a certain IP address in the net’s address ange. For a C-net VMware uses yyy.yyy.yyy.1 as a default on the host’s interface (with yyy.yyy.yyy defining the C-network). In my case the device on the host was named “vmnet6”. In the beginning this virtual interface also worked as expected.

However, during the following days I noticed trouble with “vmnet6”. In a normal host configuration the VMware WS initialization script (“/etc/init.d/vmware”) is started as a LSB service by systemd. The script loads VMware modules and configures the defined virtual networks. Unfortunately, relatively often, my “vmnet6” interface did not become directly available after the start of the Linux host. Some experiments showed however that I could circumvent this problem by some “dubious” action:

I had to enter the “Virtual Network Editor” with root rights and save the already existing virtual network again. Then ifconfig or the ip command enlisted interface “vmnet6” – which also seemed to work normally.

You get a strange feeling when you are forced to remedy something as root …. However, I ignored this problem, which I could not solve directly, for a while. I also noticed that the problem did not occur all the time. This should have rang a bell … but I was too stupid. Until, yesterday, when I needed to set up another special MS WIN guest in the same address range.

Virtual Ethernet failed

In addition I upgraded to VMware WS Version 14.1.3. The (automatic) compilation of the VMware WS modules on Opensuse Leap 15 again worked without major problems. But, when I manually (re-) started the VMware modules via “/etc/init.d/vmware restart” I saw an error message:

“Virtual Ethernet failed”.

Such a message makes you nervous because you assume some real big problem with the VMware modules. However, starting some of my VMware guests (not the ones linked to vmnet6) proved that these guests operated flawlessly and could communicate via their and the hosts network devices. But my “vmnet6” did not appear in the output of “ip a s”.

The problem “Virtual Ethernet failed” is reported in some Internet articles; however without a real solution. See e.g. https://ubuntuforums.org/showthread.php?t=1592977; https://www.linuxquestions.org/questions/slackware-14/vmware-workstation-unable-to-start-services-4175547448/; https://communities.vmware.com/thread/264235.

Now, I seriously began to wonder whether and how this problem was related to the already known problem with my virtual device “vmnet6”. Some tests quickly showed that the error message disappeared when I eliminated the host-only network related to “vmnet6”. Then I tried a new host-only test network with a device “vmnet7” and a different IP range. Also then the “virtual Ethernet” started flawlessly. So, what was wrong with “vmnet6” and/or its IP range? Some residual garbage from old installations? A search for configuration files gave me no better ideas.

vnetlib-log

After a minute I thought: Maybe VMware is right. A look into the log-file “/var/log/vnetlib” revealed:

Oct 14 14:22:49 VNL_Load - LOG_ERR logged
Oct 14 14:22:49 VNL_Load - LOG_WRN logged
Oct 14 14:22:49 VNL_Load - LOG_OK logged
Oct 14 14:22:49 VNL_Load - Successfully initialized Vnetlib
...
Oct 14 14:22:49 VNL_StartService - Started "Bridge" service for vnet: vmnet0
Oct 14 14:22:49 VNLPingAndCheckSubnet - Return value of vmware-ping: 0
...
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet2 is available
...
Oct 14 14:22:49 VNL_CheckSubnetAvailability - Subnet: yyy.yyy.yyy.yyy on vnet: vmnet6 is not available
Subnet on vmnet6 is no longer available for usage, please run the network editor to reconfigure different subnet
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet7 is available
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet8 is available
.....

(I have replaced real addresses by xxx and yyy.) So, obviously VMware at startup performs a kind of ping check beyond the locally defined virtual devices and networks. Interesting! Why do they ping?

The answer is trivial: When you set up the (virtual) internal bridge for the host-only network you may want to guarantee that the IP-address for the respective host device (“vmnet6”) does not exist anywhere else in the reachable network – especially as the host’s interface of a host-only network may be used for a host controlled routing (and NAT) of the otherwise isolated VMware guests to the outside world.

Address overlap

And this resolved my problem: A manual ping for the address of the planned host’s interface address yyy.yyy.yyy.1 indeed gave me a result. I located the source on another server, where I sometimes perform tests of different virtual network configurations with KVM guests, LXC containers and libvirt. Unfortunately, I had not disabled all of my test networks after my last tests. Due to default DHCP-settings a virtual interface with address yyy.yyy.yyy.1 got established on the test server whenever it was booted. This also explained the finding that the VMware WS problem did not occur always – it only happened when the test server was active in my physical LAN!

Then I actually remembered that I had had a similar problem once in 2016, whilst experimenting with QEMU/VMware-bridge-coupling. (See: Opensuse/Linux – KVM, VMware WS – virtuelle Brücken zwischen den Welten). So, I should have known better and planned my “local” virtual experiments a bit more carefully to avoid address overlaps with other potential virtual networks on different hosts in the LAN.

Stupid me … The VMware WS startup script was absolutely right to not establish a second address
of that kind on my workstation! This lead to the error message “Virtual ethernet failed” – which in my opinion should include a bit more detailed information or a hint to a log-file. But the fault clearly was on my side: One should not think in terms of a local host environment when using VMware WS for virtual networks, but consider the global network configuration.

Still, the whole VMware handling of a possible IP address overlap left me a bit puzzled :

Why can you by saving the questionable virtual network again establish a host interface despite the fact that another interface with the same address is running somewhere in the network? OK, you are root when you do it – but why is no warning given at this point? (VMware could do a ping check there, too …. )

A second question also worried me: Why did the existence of two devices with the same IP-address did not lead to more chaos in the network? One reason probably was that I had allowed pinging but no general TCP-transport to the virtual device on the test server. And explicit routes were otherwise defined properly on the different hosts. However, I could bet on some problems on the ARP-level when the test server was up and running. Anyway – such a basic misconfiguration in a a network may lead to security holes, too, and should, of course, be avoided.

A third question that came up for a second was: How do you avoid overlaps in case one wants to assign KVM/QEMU-guests and VMware guests IP addresses within the same network address space on one and the same virtualization host? Such a scenario is not at all as far fetched as it may seem at first sight. One reason for such a configuration could be the EU GDPR (DSGVO):

If you need to guarantee a customer confidentiality and are nevertheless forced to use a standard MS Windows client, you have a problem as MS may in an uncontrollable way transfer data to their own servers (outside the EU). Just read the license and maintenance agreements you sign with the operation of a standard Win 10 client! Therefore, you may want to isolate such clients drastically and only allow for communication with certain IP-addresses on the Internet (and NOT with MS servers). You may allow communication with some (virtualized) Linux machines in the same sub-network and one of them may serve as a gateway and perimeter firewall with strict filters. You can build up such scenarios by coupling a QEMU-virtual bridge to a VMware virtual bridge. At the same time, however, you need full control over the DHCP-systems on both sides (besides a bit of scripting) to avoid address overlaps. But this is actually easy: the DHCP-control files on the VMware side are found under the directory “/etc/vmware/vmnetX/dhcpd”, with “X” standing for a virtual VMware interface number. On the QEMU side you find the files at “/etc/libvirt/qemu/networks”. There you can control your IP assignments (or even no assignments for some special interfaces). Ok, but this is the beginning of another story.

Conclusion

Never think “local” or host based when working with virtual networks! Always include local virtual test networks in your documentation of your global network landscape! An do not always just accept the standard address assignment for host interfaces to virtual networks without thinking.