Shrinking an encrypted partition with LVM on LUKS

Posted on 9. November 2018 by eremo

Whilst experimenting with with encrypted partitions I wanted to shrink an “oversized” LUKS-encrypted partition. I was a bit astonished over the amount of steps required; in all the documentations on this topic some points were missing. In addition I also stumbled over the mess of GiB and GB units used in different tools. Safety considerations taking into account the difference are important to avoid data loss. Another big trap was the fact that the tool “parted” expects an end point on the disk given in GB when resizing. I nearly did it wrong.

Otherwise the sequence of steps described below worked flawlessly in my case. I could reboot the resized root-system enclosed in the encrypted partition after having resized everything.

I list up the necessary steps below in a rather brief form, only. I warn beginners: This is dangerous stuff and the risk for a full data loss not only in the partition you want to resize is very high. So, if you want to experiment yourself – MAKE A BACKUP of the whole disk first.

Read carefully through the steps before you do anything. If you are unsure about what the commands mean and what consequences they have do some research on the Internet ahead of any actions. Likewise in case of trouble or error messages.

You should in general be familiar with partitioning, LVM, dm-crypt and LUKS. I take no responsibility for any damage or data loss and cannot guarantee the applicability of the described steps in your situation – you have been warned.

My disk layout

My layout of the main installation was

SSD –> partition 3 –> LUKS –> LVM –> Group “vg1” –> Volume “lvroot” –> ext4 fs [80 GiB]

SSD –> partition 3 –> LUKS –> LVM –> Group “vg1” –> Volume “lvswap” –> swap fs [2 GiB]

The partition had a size around 104 GiB before shrinking. “lvroot” included a bootable root filesystem of 80 GiB in size. The swap volume (2 GiB) will later help us to demonstrate that shrinking may lead to gaps between logical LVM volumes. We shall shrink the file system, its volume, the volume group and also the encrypted the partition in the end.

In my case I could attach the disk in question to another computer where I performed the resizing operation. If you have just one computer available use a Live System from an USB stick or a DVD. Or boot another Linux installation available on one of the other disks of your system. [I recommend to always have a second small bootable installation available on one of the disks of your PC/laptop besides the main installation for daily work :-). This second installation can reside in an encrypted LVM volume (LUKS on LVM), too. It may support administration and save your life one day in case your main installation becomes broken. There you may also keep copies of the LUKS headers and keyfiles …].

Resizing will be done by a sequence of 16 steps – following the disk layout in reverse order as shown above; i.e. we proceed with first resizing the filesystem, then taking care about the LVM layout and eventually changing the partition size.

You open an encrypted partition with LVM on LUKS just as any dm-crypt/LUKS-partition by

cryptsetup open /dev/YOUR_DEVICE… MAPPING-Name

For the mapping name I shall use “cr-ext“. Note that manually closing the encrypted device requires to deactivate the volume groups in the kernel first; in our case

vgchange -a n vg1;
cryptsetup close cr-ext

Otherwise you may not be able to close your device.

A sequence of 16 steps ….

Let us now go through the details of the required steps – one by one.

Step 1: Get an overview over your block devices

You should use

lsblk

In my case the (external) disk appeared as “/dev/sdg” – the encrypted partition was located on “/dev/sdg3”.

***********

Step 2: Open the encrypted partition

cryptsetup open /dev/sdg3 cr-ext

Check that the mapping is done correctly by “la /dev/mapper” (la = alias for “ls -la”). Watch out not only for the “cr-ext”-device, but also for LVM-volumes inside the encrypted partition. They should appear automatically as distinct devices.

***********

Step 3: Get an overview on LVM structure

Use the following standard commands:

pvdisplay
vgdisplay
lvdisplay

“pvdisplay”/”vgdisplay” should show you a PV device “/dev/mapper/cr-ext” with volume group “vg1” (taking full size). If the volume groups are not displayed you may need to make them available by “vgchange –available y” (see the man pages).

“lvdisplay” should show you the logical volumes consistently with the entries in “/dev/mapper”. Note: “lvdisplay” actually shows the volumes as “/dev/vg1/lvroot”, whereas the link in “/dev/mapper” includes the name of the volume group: “/dev/mapper/vg1-lvroot”. In my case the original size of “lvroot” was 80 GiB and the size of the swap “lvswap” was 2 GiB.

***********

Step 4: Check the integrity of the filesystem

Use fsck:

mytux:~ # fsck /dev/mapper/vg1-lvroot 
fsck from util-linux 2.31.1
e2fsck 1.43.8 (1-Jan-2018)
/dev/mapper/vg1-lvroot: clean, 288966/5242880 files, 2411970/20971520 blocks

The filesystem should be clean. Otherwise run “fsck -f” – and answer the questions properly. “fsck” in write mode should be possible as we have not mounted the filesystem. We avoid doing so throughout the whole procedure.

***********

Step 5: Check the physical block size of the filesystem and the used space within the filesystem

Use “fdisk -l” to get the logical and physical block sizes

Disk /dev/mapper/vg1-lvroot: 80 GiB, 85899345920 bytes, 167772160 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Use tunefs to get some block information:

tune2fs -l /dev/mapper/vg1-lvroot

to get the number of total blocks (20971520), blocksize (4096) and free blocks (18559550) – in my case :

Total = 85899345920 Bytes = 85 GB = 80 GiB 
free  = 76019916800 Bytes = 76 GB = 70,8 GiB

If you mount df -h may will show you less available space due to a 5% free limit set => 67G. So, not much space was occupied by my (Opensuse) test installation inside the volume.

***********

Step 6: Plan the reduced volume and filesystem sizes ahead – perform safety and limit considerations

Plan ahead the new diminished size of the LVM-Volume (“lvroot”). Let us say, we wanted 60G instead of the present 80G. Then we must reduce the filesystem [fs] size even more and use a safety-factor of 0.9 => 54G. I stress:

Make the filesystem size at least 10% smaller than the planned logical volume size!

Why this precaution? There may be a difference for the meaning of “G” for the use of the resize commands: “resize2fs” and the “lvresize“. It could be “GB” or “GiB” in either case.

Have a look into the man pages. This difference is really dangerous! What would be the worst case that could happen? We resize the filesystem in GiB and the volume size in GB – then the volume size would be too small.

So, let us assume that “lvresize” works with GB – then the 60G would correspond to 60 * 1024 * 1000 * 1000 Bytes = 6144000000 Bytes. Assume further that “resize2fs” works with GiB instead. Then we would get 54 * 1024 * 1024 * 1024 Bytes = 57982058496 Bytes. We would still have a reserve! We would potentially through away disk space;
however, we will compensate for it afterwards (see below).

If you believe what the present man-pages say: For “resize2fs” the “size” parameter can actually be given in units of “G” – but internally GiB are used. For “lvresize”, however, the situation is unclear.

In addition we have to check the potential reduction range against the already used space! In our case there is no problem (see step 5). However, how far could you maximally shrink if you needed to?

Minimum-consideration: Give the 10 GiB used according to step 5 a good 34% on top. (We always want a free space of 25%). Multiply by a factor of 1.1 to account for potential GB-GiB differences => 15 GiB. This is a rough minimum limit for the minimum size of the filesystem. The logical volume size should again be larger by a factor of 1.1 at least => 16.5 GB.

Having calculated your minimum you choose your actual shrinking size above this limit: You reduce the volume size by something between the 80 GiB and the 16.5 GiB. You set a number – let us say 60 GiB – and then you reduce the filesystem by a factor 0.9 more => 54 GB.

***********

Step 7: Shrink the filesystem

Check first that the fs is unmounted! Otherwise you may run into trouble:

mytux:~ # resize2fs /dev/mapper/vg1-lvroot 60G
resize2fs 1.43.8 (1-Jan-2018)
Filesystem at /dev/mapper/vg1-lvroot is mounted on /mnt2; on-line resizing required
resize2fs: On-line shrinking not supported

So unmount, run fsck again and then resize:

maytux:~ # umount /mnt2
mytux:~ # e2fsck -f /dev/mapper/vg1-lvroot
e2fsck 1.43.8 (1-Jan-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/vg1-lvroot: 288966/5242880 files (0.2% non-contiguous), 2411970/20971520 blocks
mytux:~ # resize2fs /dev/mapper/vg1-lvroot 54G
resize2fs 1.43.8 (1-Jan-2018)
Resizing the filesystem on /dev/mapper/vg1-lvroot to <pre style="padding:8px;"> (4k) blocks.
The filesystem on /dev/mapper/vg1-lvroot is now 14155776 (4k) blocks long.
mytux:~ # e2fsck -f /dev/mapper/vg1-lvroot
e2fsck 1.43.8 (1-Jan-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/vg1-lvroot: 288966/3538944 files (0.3% non-contiguous), 2304043/14155776 blocks

***********

Step 8: Shrink the logical volume

We use “lvreduce” to resize the LVM volume; the option parameter “L” together with a “size” determines how big the volume will become. [ Note that there is a subtle difference if you provide the size with a negative “-” sign! See the man page for the difference!]

mytux:~ # lvreduce -L 60G /dev/mapper/vg1-lvroot 
  WARNING: Reducing active logical volume to 60.00 GiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce vg1/lvroot? [y/n]: y
  Size of logical volume vg1/lvroot changed from 80.00 GiB (20480 extents) to 60.00 GiB (15360 extents).
  Logical volume vg1/lvroot successfully resized.
mytux:~ #

Hey, we worked in GiB – good to know!

***********

Step 9: Extend the fs to the volume size again

We compensate for the lost space of almost 6 GiB:

mytux:~ # resize2fs /dev/mapper/vg1-lvroot 
resize2fs 1.43.8 (1-Jan-2018)
Resizing the filesystem on /dev/mapper/vg1-lvroot to 15728640 (4k) blocks.
The filesystem on /dev/mapper/vg1-lvroot is now 15728640 (4k) blocks long.

Run gparted to get an overview over the present situation.

***********

Step 10: Check for gaps between the volumes of your LVM volume group

Besides the LVM volume for the OS and data related filesystem we had a swap volume, too! As we have not changed it there should be a gap between the volumes now – and indeed:

mytux:~ # pvs -v --segments /dev/mapper/cr-ext
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
  PV                 VG  Fmt  Attr PSize   PFree  Start SSize LV     Start Type   PE Ranges                     
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g     0 15360 lvroot     0 linear /dev/mapper/cr-ext:0-15359    
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 15360  5120            0 free                                 
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 20480   512 lvswap     0 linear /dev/mapper/cr-ext:20480-20991
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 20992  5623            0 free                                 
mytux:~ #

Now, to close the gap we can move the second volume. This requires multiple trials with “pvmove –alloc anywhere”.

Note the information “/dev/mapper/cr-ext:20480-20991” and use this exactly in “pvmove”:

mytux:~ # pvmove --alloc anywhere /dev/mapper/cr-ext:20480-20991
  /dev/mapper/cr-ext: Moved: 0.20%
  /dev/mapper/cr-ext: Moved: 100.00%
mytux:~ # pvs -v --segments /dev/mapper/cr-ext                  
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
  PV                 VG  Fmt  Attr PSize   PFree  Start SSize LV     Start Type   PE Ranges                     
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g     0 15360 lvroot     0 linear /dev/mapper/cr-ext:0-15359    
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 15360  5632            0 free                                 
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 20992   512 lvswap     0 linear /dev/mapper/cr-ext:20992-21503
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 21504  5111            0 free

You may have to apply “pvmove” several times, but each time with the new changed position parameters – be careful !!! (Are your backups OK???).

This may give you some seemingly erratic movements, but eventually, you should see a continuous alignment:

mytux:~ # pvmove --alloc anywhere /dev/mapper/cr-ext:20992-21503
  /dev/mapper/cr-ext: Moved: 1.76%
  /dev/mapper/cr-ext: Moved: 100.00%
mytux:~ # pvs -v --segments /dev/mapper/cr-ext
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
  PV                 VG  Fmt  Attr PSize   PFree  Start SSize LV     Start Type   PE Ranges                     
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g     0 15360 lvroot     0 linear /dev/mapper/cr-ext:0-15359    
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 15360   512 lvswap     0 linear /dev/mapper/cr-ext:15360-15871
  /dev/mapper/cr-ext vg1 lvm2 a--  103.96g 41.96g 15872 10743            0 free                                 
mytux:~ #

***********

Step 11: Resize/reduce the physical LVM

After having aligned the volumes of the volume group vg1 we now can resize the physical LVM volume supporting the vg1 group. We use “pvresize” for this purpose. “pvresize” works in GiB – but its parameters just have a “G”.
In my example I shrank the physical LVM area down to 80G = 80GiB. This left more than enough space for my two LVM volumes.

mytux:~ # pvresize --setphysicalvolumesize 80G /dev/mapper/cr-ext  
/dev/mapper/cr-ext: Requested size 80.00 GiB is less than real size 103.97 GiB. Proceed?  [y/n]: y
  WARNING: /dev/mapper/cr-ext: Pretending size is 167772160 not 218034943 sectors.
  Physical volume "/dev/mapper/cr-ext" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
mytux:~ # pvdisplay
....   
  --- Physical volume ---
  PV Name               /dev/mapper/cr-ext
  VG Name               vg1
  PV Size               80.00 GiB / not usable 3.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              20479
  Free PE               4607
  Allocated PE          15872
  PV UUID               eFf9we-......

Again, we obviously worked in GiB – good.

***********

Step 12: Set new size of the encrypted region

It is disputed whether this step is required. LUKS never registers the size of the encrypted area. However, if you had an active mounted partition … In our case, I think the step is not required, but …..

“cryptsetup resize” wants sectors as a size unit. We have to calculate a bit again based on the amount of sectors (each with 512 Byte; see below) used:

mytux:~ # cryptsetup status cr-ext
/dev/mapper/cr-ext is active.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: dm-crypt
  device:  /dev/sdg3
  sector size:  512
  offset:  65537 sectors
  size:    218034943 sectors
  mode:    read/write

We have to determine the new size in sectors to reduce the size of the LUKS area. I chose:
185329702 blocks (rounded up) – this corresponds to around 88.4 GiB; according to our 10% safety rule this should be sufficient.

mytux:~ # cryptsetup -b 185329702 resize cr-ext
mytux:~ # cryptsetup status cr-ext             
/dev/mapper/cr-ext is active.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: dm-crypt
  device:  /dev/sdg3
  sector size:  512
  offset:  65537 sectors
  size:    185329702 sectors
  mode:    read/write

***********

Step 13: Reduce the size of the physical partition – a pretty scary step!

This is one of the most dangerous steps. And irreversible … Interestingly, gparted will not let you do a resize. Yast’s partitioner allows for it after several confirmations. I used “parted” in the end.

But especially with parted you have to be extremely careful – parted expects a partition’s end position in GB. This means that to calculate the resulting size correctly you must also look at the partition’s starting position. In the end it is the difference that counts. Read the man pages and have a look into other resources.

mytux:~ # parted /dev/sdg
GNU Parted 3.2
Using /dev/sdg
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model: USB 3.0 (scsi)
Disk /dev/sdg: 512GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  16.8MB  15.7MB               primary  bios_grub
 2      33.6MB  604MB   570MB                primary  boot, esp
 3      604MB   112GB   112GB                primary  legacy_boot, msftdata
 4      291GB   312GB   21.5GB                        lvm
 5      312GB   314GB   2147MB               primary  msftdata

(parted) resizepart 
Partition number? 3                                                       
End?  [112GB]? 89GB                                                       
Warning: Shrinking a partition can cause data loss, are you sure you want to continue?
Yes/No? Yes                                                               
(parted) print      
Model: USB 3.0 (scsi)
Disk /dev/sdg: 512GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  16.8MB  15.7MB               primary  bios_grub
 2      33.6MB  604MB   570MB                primary  boot, esp
 3      604MB   89.0GB  88.4GB               primary  legacy_boot, msftdata
 4      291GB   312GB   21.5GB                        lvm
 5      312GB   314GB   2147MB               primary  msftdata

(parted) q                                                                
Information: You may need to update /etc/fstab.

As you see: We end up with a size of 88.4 GB and NOT 89 GB !!!

In addition we shall see in a second that the GB are NOT GiB here! So our safety factor is really important to get close to the aspired 80 GiB.

***********

Step 14: Set new size of the encrypted region

As we have some space left now in the partition we can enlarge the encryption area and later also the PV size to the possible maximum.
First, we set back the size of the encryption area to the full partition size; though not required in our case – it does not harm:

mytux:~ # cryptsetup  resize cr-ext
mytux:~ # cryptsetup status cr-ext
/dev/mapper/cr-ext is active and is in use.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: dm-crypt
  device:  /dev/sdg3
  sector size:  512
  offset:  65537 sectors
  size:    172582959 sectors
  mode:    read/write

***********

Step 15: Reset the PV size to the full partition size

We use “pvsize” again without any size parameter. The volume group will follow automatically.

mytux:~ # pvresize  /dev/mapper/cr-ext    
  Physical volume "/dev/mapper/cr-ext" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized 
mytux:~ # pvdisplay
...  
  --- Physical volume ---
  PV Name               /dev/mapper/cr-ext
  VG Name               vg1
  PV Size               82.29 GiB / not usable 1000.50 KiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              21067
  Free PE               5195
  Allocated PE          15872
  PV UUID               eFf9we-......
..
mytux:~ # vgdisplay
....
  --- Volume group ---
  VG Name               vg1
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  16
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               82.29 GiB
  PE Size               4.00 MiB
  Total PE              21067
  Alloc PE / Size       15872 / 62.00 GiB
  Free  PE / Size       5195 / 20.29 GiB
  VG UUID               dZakZi-......

Our logical volumes are alive, too, and have the right size.

mytux:~ # lvdisplay
...
  --- Logical volume ---
  LV Path                /dev/vg1/lvroot
  LV Name                lvroot
  VG Name                vg1
  LV UUID                5cDvmf-......
  LV Write Access        read/write
  LV Creation host, time install, 2018-11-06 15:33:52 +0100
  LV Status              available
  # open                 0
  LV Size                60.00 GiB
  Current LE             15360
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           254:17
   
  --- Logical volume ---
  LV Path                /dev/vg1/lvswap
  LV Name                lvswap
  VG Name                vg1
  LV UUID                lCU75L-......
  LV Write Access        read/write
  LV Creation host, time install, 2018-11-06 18:21:11 +0100
  LV Status              available
  # open                 0
  LV Size                2.00 GiB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           254:18

And:

mytux:~ # pvs -v --segments /dev/mapper/cr-ext
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
  PV                 VG  Fmt  Attr PSize  PFree  Start SSize LV     Start Type   PE Ranges                     
  /dev/mapper/cr-ext vg1 lvm2 a--  82.29g 20.29g     0 15360 lvroot     0 linear /dev/mapper/cr-ext:0-15359    
  /
dev/mapper/cr-ext vg1 lvm2 a--  82.29g 20.29g 15360   512 lvswap     0 linear /dev/mapper/cr-ext:15360-15871
  /dev/mapper/cr-ext vg1 lvm2 a--  82.29g 20.29g 15872  5195            0 free

We also check with gparted:

Everything is fortunately Ok!

***********

Step 16: Closing and leaving the encrypted device

mytux:~ # vgchange -a n vg1

  0 logical volume(s) in volume group "vg1" now active
mytux:~ # 
mytux:~ # cryptsetup close cr-ext
mytux:~ #

Conclusion

Shrinking an encrypted Luks partition which is used as a physical LVM volume and contains a LVM group with volumes is no rocket science. One has to apply the necessary shrinking steps starting from the innermost objects to the containers around them. I.e, we start shrinking the LVM volumes first, then take care of the LVM volume group and the LVM PV. Eventually, we deal with the encryption area and the hard disk partition.

During each step one has to be careful regarding the tools and sizing units: For each tool it has to be clarified whether it works in GB or GiB. Safety margins during each shrinking step have to be calculated and to be taken into account.

Links

https://www.rootusers.com/lvm-resize-how-to-decrease-an-lvm-partition/
https://wiki.archlinux.org/index.php/Resizing_LVM-on-LUKS
https://blog.shadypixel.com/how-to-shrink-an-lvm-volume-safely/
https://unix.stackexchange.com/questions/41091/how-can-i-shrink-a-luks-partition-what-does-cryptsetup-resize-do
LVM-fragmentation
See the discussion in the following link:
https://askubuntu.com/questions/196125/how-can-i-resize-an-lvm-partition-i-e-physical-volume
https://www.linuxquestions.org/questions/linux-software-2/how-do-i-lvm2-defrag-or-move-based-on-logical-volumes-689335/
Introduction into LVM
https://www.tecmint.com/create-lvm-storage-in-linux/
https://www.tecmint.com/extend-and-reduce-lvms-in-linux/

Cryptsetup close – not working for LVM on LUKS – device busy

Posted on 8. November 2018 by eremo

I am working right now on some blog posts on a full dmcrypt/LUKS-encryption of Laptop SSDs. Whilst experimenting I stumbled across the following problem:

Normally, you would open an encrypted device by

cryptsetup open /dev/YourDevice cr-YourMapperLabel

(You have to replace the device-names and the mapper-labels by your expressions, of course). You would close the device by

cryptsetup close cr-YourMapperLabel

In my case I had created a test device “/dev/sdg3”; the mapper label was “cr-sg3”. However, whenever I tried to close this special crypto-device I got the following message:

mytux:~ # cryptsetup close  cr-g3
Device cr-g3 is still in use.

Ops, had I forgotten a mount or was something operating on the filesystem inside cr-sg3 still? Unfortunately, lsof did not show anything. No process seemed to block the device …

mytux:~ # lsof /dev/mapper/cr-g3
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1111/gvfs
      Output information may be incomplete.

But

mytux:~ # dmsetup info -C  
Name                Maj Min Stat Open Targ Event  UUID                                                                
....
cr-g3               254  16 L--w    2    1      0 CRYPT-LUKS1-Some_UUID_Info-cr-g3                  
....

Note the “2” in the Open column. Clearly, there was a count of 2 of “something” that had a firm grip on the device. I also tried

mytux:~ # dmsetup remove -f cr-g3
device-mapper: remove ioctl on cr-g3  failed: Device or resource busy
Command failed

After this trial the dmsetup table even showed an error:

mytux:~ # dmsetup table
...
cr-g3: 0 218034943 error 
...

What the hack? On the Internet I found numerous messages related to similar experiences – but no real solution.

https://www.linuxquestions.org/ questions/ linux-kernel-70/ device-mapper-remove-ioctl-on-failed-device-or-resource-busy-4175481642/
https://serverfault.com/ questions/ 824425/ cryptsetup-cannot-close-mapped-device
https://gitlab.com/ cryptsetup/ cryptsetup/ issues/315
https://askubuntu.com/ questions/ 872044/ mounting-usb-disk-with-luks-encrypted-partion-fails-with-a-cryptsetup-device-al

Whatever problems these guys had – I tested with other quickly created encrypted filesystems. In most cases I had no problems to close them again. It took me a while to figure out that the cause of my problems was LVM2! Stupid me! Actually, I had defined a LVM volume group “vg1” inside the encrypted partition – on purpose. The related volumes could, of course, be seen in the “/dev/mapper/” directory

mytux:~ # la /dev/mapper 
total 0
drwxr-xr-x  2 root root     440 Nov  8 15:53 .
drwxr-xr-x 28 root root   12360 Nov  8 15:53 ..
crw-------  1 root root 10, 236 Nov  8 15:46 control
lrwxrwxrwx  1 root root       8 Nov  8 16:12 cr-g3 -> ../dm-16
...
lrwxrwxrwx  1 root root       8 Nov  8 15:54 vg1-lvroot -> ../dm-17
lrwxrwxrwx  1 root root       8 Nov  8 15:53 vg1-lvswap -> ../dm-18
...

and also in “dmsetup”

mytux:~ # dmsetup info -C  
Name                Maj Min Stat Open Targ Event  UUID                                                                
.... 
vg1-lvswap          254  18 L--w    0    1      0 LVM-Some_UUID_Info
vg1-lvroot          254  17 L--w    0    1      0 LVM-Some_UUID_Info
....
cr-g3               254  16 L--w    2    1      0 CRYPT-LUKS1-Some_UUID_Info-cr-g3                  
....

So do not let yourself get confused by the fact that “lsof” may not help you to detect processes which access the crypto devices in question! It may be the kernel (with its LVM component) itself! (triggered by udev …). This problem will always appear when you work with LVM on top of LUKS. It also explained the count of 2. The solution was simple afterwards:

mytux:~ # vgchange -a n vg1
  0 logical volume(s) in volume group "vg1" now active
mytux:~ # cryptsetup close cr-g3
mytux:~ #

Conclusion:
Always remember to explicitly deactivate LVM volume groups that may exist on LUKS encrypted partitions before closing the crypto-device! Have a look with the commands “lvs” or “vgdisplay” first!

Addendum 12.11.2018:
The last days I have read a bit more on crypto-partitions and volumes on the Internet. I came across an article which describes the deactivation of the volume groups before closing the crypto-device clearly and explicitly:
https://blog.sleeplessbeastie.eu/ 2015/11/16/ how-to-mount-encrypted-lvm-logical-volume/

Ausgehende HTTP-Anfragen von E-Mail Servern, Spamassassin und sa-learn

Posted on 20. October 2018 by Ralph Mönchmeyer

Es ist grundsätzlich gut, Server abzuschotten. Das gilt natürlich auch für E-Mail-Server unter Linux. Wie meine Leser wissen, ist meine Politik dabei regelmäßig die, nicht nur eingehende sondern auch ausgehende Verbindungen kritischer Systeme so weit wie möglich einzuschränken.

Vor kurzem hatte ich deshalb die Möglichkeiten für Verbindungen von unseren (virtualisierten) Mailservern im LAN ins Internet weiter begrenzt. Das führte dann leider zum Ausfall des Spamassassin-Daemons auf einem Server; da auf dem betroffenen System AmaVis nicht lief, hatte das durchaus spürbare Konsequenzen. Der Grund für den Ausfall ist vielleicht auch für andere Freelancer und Entwickler interessant, die eigene Mailserver im Hausnetz betreiben.

Notwendige ausgehende Verbindungen

Zur Sicherstellung der Kern-Funktionalitäten eines Mailservers im eigenen LAN (mit virtualisierten Segmenten) sind ja eigentlich nur sehr wenige ausgehende Verbindungen erforderlich:

Man muss Mails von Servern diverser Provider oder ggf. auch eigenen Servern, die im Internet positioniert sind, abholen. Das ist unter Linux durch Nutzung von Fetchmail und des IMAP- oder POP-Protokolls möglich.
Man muss Mails über Relay-Server bei Providern (oder eigene Relay-SMTP-Server im Internet) versenden. Hierzu reicht ein eigener MTA im LAN, der SMTP beherrscht. Unter Linux kommt etwa Postfix in Frage.

Die zu adressierenden Server sind von ihrer Anzahl her in der Regel überschaubar. Die zu nutzenden Ports ergeben sich aus der Absicherung der Verbindung mit STARTTLS/TLS/SSL – je nachdem, was der Provider so anbietet. Also kein Problem mit der Begrenzung der Verbindungsaufnahme durch einen Perimeter-Paketfilter im Segment der Mailserver und/oder an der LAN-Grenze vor dem Router ins Internet. (Natürlich kann auch eine lokale Firewall auf dem Linux-Host die Filterung stützen.)

Natürlich sind das aber nicht alle ausgehenden Verbindungen eines MTA wie Postfix im LAN. Hinzu kommen ggf. auch Verbindungen zu MDAs im LAN – etwa Dovecut oder Cyrus. Oft genug sitzen die aber auf demselben Linux-Host oder im gleichen LAN-Segment. Diese zu Systemen im LAN ausgehenden sowie auch eingehende Verbindungen durch Mail-Clients sind in einem strukturierten, segmentierten LAN leicht durch interne Firewalls an der Grenze von Segmenten zu kontrollieren.

Zusätzliche HTTP- odr HTTPS-Verbindungen ins Internet

Leider reicht eine Begrenzung auf Zieladressen von definierten IMAP- und SMTP-Servern im Internet nicht. Der Hauptgrund ist folgender:

Der Linux-Host, der den Unterbau für die Mail-Server-Komponenten (z.B. Postfix, Dovecot, Cyrus, Amavis, Spamassassin, Virusfilter plus TLS-, SASL-, LDAP-Komponenten) stellt, muss als solcher regelmäßig upgedated werden. Für den Freelancer ist die bequemste Methode sicher die, sich die notwendigen Pakete über einen Paketmanager von Repository-Servern des Linux-Distributors seines Vertrauens oder entsprechenden Mirror-Servern herunter zu laden. Auch die Anzahl solcher Server ist begrenzt. Typischerweise kommt für den Pakettransfer das HTTP- oder das HTTPS-Protokoll zum Einsatz. Aus Sicherheitsperspektive gibt es Vor- und Nachteile beider Varianten (sichere Server-Authentifizierung vs. nicht mehr einsehbarer Datentransfer über die nach außen initiierte Verbindung). In jedem Fall kann man aber auch hier die Zieladressen begrenzen UND den HTTP(S)-Kanal ggf. nur zeitweise öffnen. Konsequenterweise habe ich ausgehende HTTP(S)-Verbindungen der Mail-Server beschnitten.

Erwähnt sei an dieser Stelle der Vollständigkeit halber auch Folgendes:

Je nach Ausbau der Spam- und Virus-Bekämpfung kommen weitere Komponenten ins Spiel (Spamassassin, Razor2, Pyzor, …). Dieser Punkt ist dann schon ein wenig heikler. So schön und hilfreich Razor2 und Pyzor von ihrer Grundidee her auch sein mögen; man überträgt bei der Nutzung
solcher Dienste mindestens mal Hashes, die Inhalte in besonderer Weise differenzieren und vergleichbar machen; im ersten Fall an kommerzielle und abgeschottete Server in den USA. Genutzt werden im Fall von Razor2 die TCP-Ports 2703, ggf. auch Port 7 für ausgehende Verbindungen. Pyzor nutzt UDP-Port 24441; ich möchte mich hier nicht weiter über Sicherheitsprobleme offener UDP-Kanäle auslassen :-). Im Zuge der DSGVO sollte man beim Mailaustausch mit Kunden aber in jedem Fall mal nachfragen, ob der Kunde das Versenden von Hashes zu seinen Mails an Razor/Pyzor-Server erlauben will.

Ich betreibe deswegen eine eigene (virtualisierte) Mail-Server-Instanz für den kritischen Mailaustausch mit Kunden; dabei nutze ich ferner spezielle Mail-Accounts bei Providern. Der zugehörige MTA unterlässt den Einsatz von Razor2/Pyzor; das ist sinnvoll, da von Kunden ja hoffentlich keine SPAM-Mails kommen.

Hinweis: Deaktivieren muss man die Nutzung von Razor/Pyzor im Rahmen von Spamassassin übrigens durch Auskommentieren folgender Zeilen in der Konfigurationsdatei “/etc/mail/spamassassin/v310.pre” (die kennt auch nicht jeder 🙂 ):
“loadplugin Mail::SpamAssassin::Plugin::Razor2” und “loadplugin Mail::SpamAssassin::Plugin::Pyzor”.

Von Bedeutung sind bei Einsatz eines Virenscanners auch auch ausgehende HTTP(S)-Verbindungen zu Update-Servern. Im Fall von clamav etwa zu: db.de.clamav.net, database.clamav.net.

Fehler beim Start des Spamassassin-Dämons “spamd” wegen “sa-learn”

Wir nutzen auf unseren Linux-Systemen primär Spamasassin zur Spam-Bekämpfung. Abhängig von der Linux-Distribution, vom Einsatz von AmaVis und eigenen Einstellungen wird auf einem Mailserver dabei der Spamassassin-Daemon gestartet oder nicht.

Amavis nutzt Spamassassin über ein eigenes internes (perl-basiertes) Modul (siehe https://wiki.centos.org/HowTos/Amavisd oder https://help.ubuntu.com/community/PostfixAmavisNew). Obwohl spamd für Amavis nicht gestartet werden muss, greift AmaVis aber auf die vorhandene Spamassassin-Installation und bestimmte zugehörige Einstellungen zu; u.a. Update-Einstellungen (s.u.). Man darf die Spamassassin-Pakete also nicht deinstallieren, wenn man AmaVis in seinen MTA integriert.

Zudem bietet eine zusätzlich laufende Instanz des Spamasassin-Daemons weitere Filtermöglichkeiten, die man etwa im Rahmen von procmail nutzen kann. (Siehe hierzu etwa https://serverfault.com/questions/845712/what-to-do-with-spamassassin-after-installing-amavis und die dortigen Kommentare).

Auf einem meiner Mail-Server wird der Spamassassin-Dämon “spamd” jedenfalls explizit gestartet. Nach dem letzten Hochfahren der betreffenden (virtuellen) Server-Instanz musste ich allerdings ein komisches Verhalten der gesamten Mail-Installation feststellen, was sich nach ein wenig Analyse auf den gescheiterten Start des “spamd“-Dämons zurückführen ließ. Was war da los?

Ich musste eine Weile graben, bis mir klar wurde, dass das Starten des spamd-Services auf einem Suse-Unterbau einige Dinge ablaufen, die man bei der KOnfiguration von Paketfilter-Einstellungen besser nicht vergessen oder ignorieren sollte:

So zeigt die systemd-Service-Konfiguration für “spamd” etwa Folgendes:

mymails # cat /usr/lib/systemd/system/spamd.service
# This file is part of package amavisd.
#
# Copyright (c) 2011 SuSE LINUX Products GmbH, Germany.
# Author: Werner Fink
# Please send feedback to http://www.suse.de/feedback
#
# Description:
#
#  Used to start the spamd the daemonized version of spamassassin
#       spamassassin adds a header line that shows if the mail has been
#       determined 
spam or not. This way, you can decide what to do with the
#       mail within the scope of your own filtering rules in your MUA (Mail
#       User Agent, your mail program) or your LDA (Local Delivery Agent).
#

[Unit]
Description=Daemonized version of spamassassin
Wants=remote-fs.target network.target
After=remote-fs.target network.target
Before=mail-transfer-agent.target

[Service]
Type=forking
PIDFile=/var/run/spamd.pid
ExecStartPre=/bin/bash -c "sa-update || true"
EnvironmentFile=-/etc/sysconfig/spamd
ExecStart=/usr/sbin/spamd $SPAMD_ARGS -r /var/run/spamd.pid
ExecReload=/usr/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

Die interessante Zeile ist hier “ExecStartPre=/bin/bash -c “sa-update || true”. Sie legt fest, dass “sa-update” vorab gestartet werden soll.

“sa-update” hat den Zweck, bestimmte Regeln für “spamassassin” automatisch zu erneuern. Dazu wird auf sog. “channels” zugegriffen, hinter denen bestimmte Server im Internet stehen. Siehe hierzu:
https://spamassassin.apache.org/full/3.1.x/doc/sa-update.html. Die Regeln etc. – und damit deren Erneuerung – werden übrigens auch von AmaVis genutzt!. “sa-update” ist somit auch für ein MTA-System mit AmaVis-Dienst relevant; ggf. verlässt sich die Installation dabei aber auf einen durch cron veranlassten Start!

Leider hatte ich “sa-update” in meinem Kopf völlig verdrängt. Da sich die Fehlermeldungen, die ein “systemctl status spamd.service” ausspuckte u.a. auf die nicht erreichbare Adresse “spamassassin.apache.org” und auch “1.4.3.updates.spamassassin.org” bezogen, war klar, dass ich die Begrenzung der ausgehenden Verbindungen wohl etwas übertrieben hatte.

sa-update kann man auch manuell ausführen. Zum Testen eignet sich der Befehl “sa-update –refreshmirrors -vvvvv -D“:

mymails # sa-update --refreshmirrors -vvvvv -D
Oct 20 13:44:59.164 [4580] dbg: logger: adding facilities: all
Oct 20 13:44:59.164 [4580] dbg: logger: logging level is DBG
Oct 20 13:44:59.164 [4580] dbg: generic: SpamAssassin version 3.4.1
Oct 20 13:44:59.164 [4580] dbg: generic: Perl 5.018002, PREFIX=/usr, DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/mail/spamassassin, LOCAL_STATE_DIR=/var/lib/spamassassin
Oct 20 13:44:59.164 [4580] dbg: config: timing enabled
Oct 20 13:44:59.171 [4580] dbg: config: score set 0 chosen.
Oct 20 13:44:59.181 [4580] dbg: generic: sa-update version svn1652181
Oct 20 13:44:59.181 [4580] dbg: generic: using update directory: /var/lib/spamassassin/3.004001
.....
....
Oct 20 13:44:59.588 [4580] dbg: channel: attempting channel updates.spamassassin.org
Oct 20 13:44:59.588 [4580] dbg: channel: using existing directory /var/lib/spamassassin/3.004001/updates_spamassassin_org
Oct 20 13:44:59.589 [4580] dbg: channel: channel cf file /var/lib/spamassassin/3.004001/updates_spamassassin_org.cf
Oct 20 13:44:59.589 [4580] dbg: channel: channel pre file /var/lib/spamassassin/3.004001/updates_spamassassin_org.pre
Oct 20 13:44:59.590 [4580] dbg: channel: metadata version = 1844313, from file /var/lib/spamassassin/3.004001/updates_spamassassin_org.cf
....
.....

Das oben ist ein Auszug aus einer funktionierenden Konfiguration. Auf dem fehlerhaften System kamen dagegen viele Fehlermeldungen zu diversen durchprobierten channel-Verbindungen. Das betraf dann doch eine ganze Reihe von Servern – wie sich später herausstellte waren das Mirror-Server. Ein Blick in das Protokoll der Firewalls offenbarte dann, dass zu all diesen Servern der Reihe nach wiederholte “http”-Anfragen abgesetzt wurden. Das bestätigte, dass eine zu rigorose Unterbindung von ausgehenden HTTP-Verbindungen für einen Mail-Server mit Spamassassin nicht unbedingt gut ist.

Welche Haupt- und Mirror-
Server für “sa-update” sind in der Firewall zu berücksichtigen?

Nun will man ja nicht unbedingt Fehlerprotokolle und Firewall-Logs studiweren, um herauszufinden, zu welchen sa-channel-Servern man HTTP-Verbindungen zulassen sollte. Dass der Hauptchannel “updates.spamassassin.org” sein soll, geht aus der sa-update Doku zwar hervor, nicht aber der oder die zugehörige Server. Ein “ping updates.spamassassin.org” bringt jedenfalls nichts.

Der aktuelle Server, der im Moment tatsächlich (per DNS!) angefragt wird, ist vielmehr “1.4.3.updates.spamassassin.org”. Darauf sollte man sich aber auf Dauer nicht verlassen. Was ist also bzgl. der Firewall zu tun?

Folgende DNS-Namen sind für Firewall-Regeln wichtig:

Ein relevanter Server für HTTP ist in jedem Fall: spamassassin.apache.org
Weitere Mirror-Server für ausgehende HTTP-Verbindungen sind z.Z. gelistet unter
“/var/lib/spamassassin/3.004001/updates_spamassassin_org/MIRRORED.BY”.
Da kann man ja einige vertrauenswürdige DNS-Namen zulassen. Z.B. “sa-update.spamassassin.org”.

Die zuständige Firewall sollte so konfiguriert werden, dass die IP-Adressen für oben angegebenen DNS-Namen zur Laufzeit aufgelöst werden können. Mit iptables ist das problemlos möglich.

Bzgl. der rules und der Datei MIRRORED.BY noch ein Tip: Man sollte nach Server-Updates oder gar Upgrades immer mal kontrollieren, ob unter “/var/lib/spamassassin/” nicht ein neueres Verzeichnis des Typus 3.00X000Y mit Konfigurations- und Rules-Files aufgetaucht ist. Das relevante Verzeichnis verändert sich im Lauf der Jahre; die alten Verzeichnisse werden oft aber nicht gelöscht.

Bzgl. der Firewall ist in jedem Falle aber auch Folgendes zu beachten:

Der Mailserver muss grundsätzlich DNS-Abfragen stellen können.

Nicht nur, um ggf. die obigen Namen auflösen zu können; sondern auch weil die Überprüfung der Update-Notwendigkeit über DNS-Abfragen erfolgt. Man schaue sich mal mit Wireshark die Kommunikation an! Das Zulassen von DNS in der Firewall erscheint auf den ersten Blick trivial. Aber wer die Kommunikation seines Mail-Server begrenzt, arbeitet ggf. bzgl. SMTP, IMAP, etc. zu externen Servern direkt mit IP-Adressen und lässt DNS außen vor. Im besonderen, wenn diese Server im Internet selbst verwaltete sind.

Dauerhaft laufender Mail-Server und sa-update

Natürlich muss der Update Dienst auch auf einem dauerhaft laufenden System funktionieren. Das wird mittels des “cron”-Dienstes erledigt. Hier ist z.B. auch unter Opensuse ein Eintrag für “sa-update” vorhanden. Vorgaben, die das Update-Verhalten für spamd steuern, findet man bei Opensuse übrigens in der Datei “/etc/sysconfig/spamd”.

Fazit

Bei der Einschränkung ausgehender Verbindungen von E-Mail-Servern sollte man nicht vergessen, dass Spamassassin ebenso wie Virenscanner automatische Updates von grundlegenden Filterregeln vornehmen. In vielen Linux-Distributionen ist ein solcher Update sogar täglich vorgesehen. Für Spamassassin sind in diesem Zusammenhang HTTP-Verbindungen zu “channel”-Servern zu ermöglichen. Daneben sind DNS-Abfragen zuzulassen. Das Scheitern des Startens von “spamd” beim Hochfahren von Mailservern aufgrund von Firewall-Beschränkungen ist möglich.

Links

https://wiki.centos.org/HowTos/Amavisd
https://help.ubuntu.com/community/PostfixAmavisNew
https://serverfault.com/questions/845712/what-to-do-with-spamassassin-after-installing-amavis
https://www.howtoforge.com/community/threads/
spamassassin-automated-update.72393/
https://www.faqforge.com/linux/keep-the-spamassassin-filter-rules-up-to-date-in-ispconfig-3/

VMware Workstation – Virtual Ethernet failed – sometimes for good!

Posted on 16. October 2018 by Ralph Mönchmeyer

I use VMware Workstation as a hypervisor for hosting a few MS Windows guests, which I need in customer projects. As I do not trust Windows systems I sometimes place such guests in different virtual and isolated host-only networks on my workstation or a dedicated server. On other systems I run virtualized Linux server systems for production and tests with the help of KVM/QEMU, LXC and libvirt. Some time ago I learned the hard way that I have to keep track of the different “local” virtual networks more thoroughly. As the VMware side was affected by a misconfiguration in a rather peculiar way I thought this experience might be interesting for others, too.

Host interface to VMware virtual network sometimes not available?

The whole thing coincided with an upgrade of one of my Opensuse workstations. I am always a bit nervous whether and how such an upgrade may impact the VMware WS installation. Quite often I experienced problems with the automatic compilation in the past. In addition the start of some modules lead to secondary errors. However, at a first test VMware WS 14 compiled without problems on a system with Opensuse Leap 15. No wonder, I thought, as the kernel version 4.12 of Leap15 is relatively old.

Then I had to set up a Windows guest. I gave it an IP address within a virtual VMware host-only network, which covered an IPv4 address range of a class C network. VMware uses a kind of virtual bridge to support such a network; normally one associates a virtual network device on the host with it, to which you can assign a certain IP address in the net’s address ange. For a C-net VMware uses yyy.yyy.yyy.1 as a default on the host’s interface (with yyy.yyy.yyy defining the C-network). In my case the device on the host was named “vmnet6”. In the beginning this virtual interface also worked as expected.

However, during the following days I noticed trouble with “vmnet6”. In a normal host configuration the VMware WS initialization script (“/etc/init.d/vmware”) is started as a LSB service by systemd. The script loads VMware modules and configures the defined virtual networks. Unfortunately, relatively often, my “vmnet6” interface did not become directly available after the start of the Linux host. Some experiments showed however that I could circumvent this problem by some “dubious” action:

I had to enter the “Virtual Network Editor” with root rights and save the already existing virtual network again. Then ifconfig or the ip command enlisted interface “vmnet6” – which also seemed to work normally.

You get a strange feeling when you are forced to remedy something as root …. However, I ignored this problem, which I could not solve directly, for a while. I also noticed that the problem did not occur all the time. This should have rang a bell … but I was too stupid. Until, yesterday, when I needed to set up another special MS WIN guest in the same address range.

Virtual Ethernet failed

In addition I upgraded to VMware WS Version 14.1.3. The (automatic) compilation of the VMware WS modules on Opensuse Leap 15 again worked without major problems. But, when I manually (re-) started the VMware modules via “/etc/init.d/vmware restart” I saw an error message:

“Virtual Ethernet failed”.

Such a message makes you nervous because you assume some real big problem with the VMware modules. However, starting some of my VMware guests (not the ones linked to vmnet6) proved that these guests operated flawlessly and could communicate via their and the hosts network devices. But my “vmnet6” did not appear in the output of “ip a s”.

The problem “Virtual Ethernet failed” is reported in some Internet articles; however without a real solution. See e.g. https://ubuntuforums.org/showthread.php?t=1592977; https://www.linuxquestions.org/questions/slackware-14/vmware-workstation-unable-to-start-services-4175547448/; https://communities.vmware.com/thread/264235.

Now, I seriously began to wonder whether and how this problem was related to the already known problem with my virtual device “vmnet6”. Some tests quickly showed that the error message disappeared when I eliminated the host-only network related to “vmnet6”. Then I tried a new host-only test network with a device “vmnet7” and a different IP range. Also then the “virtual Ethernet” started flawlessly. So, what was wrong with “vmnet6” and/or its IP range? Some residual garbage from old installations? A search for configuration files gave me no better ideas.

vnetlib-log

After a minute I thought: Maybe VMware is right. A look into the log-file “/var/log/vnetlib” revealed:

Oct 14 14:22:49 VNL_Load - LOG_ERR logged
Oct 14 14:22:49 VNL_Load - LOG_WRN logged
Oct 14 14:22:49 VNL_Load - LOG_OK logged
Oct 14 14:22:49 VNL_Load - Successfully initialized Vnetlib
...
Oct 14 14:22:49 VNL_StartService - Started "Bridge" service for vnet: vmnet0
Oct 14 14:22:49 VNLPingAndCheckSubnet - Return value of vmware-ping: 0
...
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet2 is available
...
Oct 14 14:22:49 VNL_CheckSubnetAvailability - Subnet: yyy.yyy.yyy.yyy on vnet: vmnet6 is not available
Subnet on vmnet6 is no longer available for usage, please run the network editor to reconfigure different subnet
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet7 is available
Oct 14 14:22:50 VNL_CheckSubnetAvailability - Subnet: xxx.xxx.xxx.xxx on vnet: vmnet8 is available
.....

(I have replaced real addresses by xxx and yyy.) So, obviously VMware at startup performs a kind of ping check beyond the locally defined virtual devices and networks. Interesting! Why do they ping?

The answer is trivial: When you set up the (virtual) internal bridge for the host-only network you may want to guarantee that the IP-address for the respective host device (“vmnet6”) does not exist anywhere else in the reachable network – especially as the host’s interface of a host-only network may be used for a host controlled routing (and NAT) of the otherwise isolated VMware guests to the outside world.

Address overlap

And this resolved my problem: A manual ping for the address of the planned host’s interface address yyy.yyy.yyy.1 indeed gave me a result. I located the source on another server, where I sometimes perform tests of different virtual network configurations with KVM guests, LXC containers and libvirt. Unfortunately, I had not disabled all of my test networks after my last tests. Due to default DHCP-settings a virtual interface with address yyy.yyy.yyy.1 got established on the test server whenever it was booted. This also explained the finding that the VMware WS problem did not occur always – it only happened when the test server was active in my physical LAN!

Then I actually remembered that I had had a similar problem once in 2016, whilst experimenting with QEMU/VMware-bridge-coupling. (See: Opensuse/Linux – KVM, VMware WS – virtuelle Brücken zwischen den Welten). So, I should have known better and planned my “local” virtual experiments a bit more carefully to avoid address overlaps with other potential virtual networks on different hosts in the LAN.

Stupid me … The VMware WS startup script was absolutely right to not establish a second address
of that kind on my workstation! This lead to the error message “Virtual ethernet failed” – which in my opinion should include a bit more detailed information or a hint to a log-file. But the fault clearly was on my side: One should not think in terms of a local host environment when using VMware WS for virtual networks, but consider the global network configuration.

Still, the whole VMware handling of a possible IP address overlap left me a bit puzzled :

Why can you by saving the questionable virtual network again establish a host interface despite the fact that another interface with the same address is running somewhere in the network? OK, you are root when you do it – but why is no warning given at this point? (VMware could do a ping check there, too …. )

A second question also worried me: Why did the existence of two devices with the same IP-address did not lead to more chaos in the network? One reason probably was that I had allowed pinging but no general TCP-transport to the virtual device on the test server. And explicit routes were otherwise defined properly on the different hosts. However, I could bet on some problems on the ARP-level when the test server was up and running. Anyway – such a basic misconfiguration in a a network may lead to security holes, too, and should, of course, be avoided.

A third question that came up for a second was: How do you avoid overlaps in case one wants to assign KVM/QEMU-guests and VMware guests IP addresses within the same network address space on one and the same virtualization host? Such a scenario is not at all as far fetched as it may seem at first sight. One reason for such a configuration could be the EU GDPR (DSGVO):

If you need to guarantee a customer confidentiality and are nevertheless forced to use a standard MS Windows client, you have a problem as MS may in an uncontrollable way transfer data to their own servers (outside the EU). Just read the license and maintenance agreements you sign with the operation of a standard Win 10 client! Therefore, you may want to isolate such clients drastically and only allow for communication with certain IP-addresses on the Internet (and NOT with MS servers). You may allow communication with some (virtualized) Linux machines in the same sub-network and one of them may serve as a gateway and perimeter firewall with strict filters. You can build up such scenarios by coupling a QEMU-virtual bridge to a VMware virtual bridge. At the same time, however, you need full control over the DHCP-systems on both sides (besides a bit of scripting) to avoid address overlaps. But this is actually easy: the DHCP-control files on the VMware side are found under the directory “/etc/vmware/vmnetX/dhcpd”, with “X” standing for a virtual VMware interface number. On the QEMU side you find the files at “/etc/libvirt/qemu/networks”. There you can control your IP assignments (or even no assignments for some special interfaces). Ok, but this is the beginning of another story.

Conclusion

Never think “local” or host based when working with virtual networks! Always include local virtual test networks in your documentation of your global network landscape! An do not always just accept the standard address assignment for host interfaces to virtual networks without thinking.

Eclipse Photon 2018-09 für PHP – endlich gute Content Assist Performance

Posted on 26. September 2018 by Ralph Mönchmeyer

Ende August war ich auf die “R”-Version von Eclipse Photon umgestiegen. Der Start der IDE erwies sich zwar als elend langsam; aber immerhin erschien mir das Zusammenspiel mit GTK3 gegenüber Eclipse Oxygen deutlich verbessert. Auch eine Java10-Runtime-Umgebung ließ sich gut verwenden.

Etwas hat mich jedoch zunehmend bei der Arbeit behindert:
Bei großen PHP-Projekten erwies sich die Content-Assist-Funktion [CA] des PHP Editors als bodenlos langsam. Wenn man den Heap beobachtete, so wechselte der während des Wartens auf eine CA-Antwort ständig die Größe. Gegenüber den in Eclipse ablaufenden Hintergrundsprozessen (Neu-Indizierung vieler Files? Update diverser Oberflächenelemente? …) kam die CA-Funktionalität offenbar erst mit geringer Priorität zum Tragen. Das war völlig irregulär. Mal erhielt man eine Anwtort zu einer lokalen Variable sofort; mal musste man 5- 6 Sekunden warten. Die Meldung “Computing proposals ….” ging mir zunehmend auf den Zeiger. Wer will beim Tippen schon 5 Sek warten, bis der erste Vorschlag kommt? Leider hat keine der angebotenen Optionen unter “Preferences >> PHP >> Editor >> Content Assist” viel geholfen. Auch nicht zur “asynchron Code Completion”. Ich war kurz davor, wieder auf Oxygen zurück zu gehen.

Heute habe ich heute aber die neue “September-Version von Eclipse ( “eclipse-php-2018-09-linux-gtk-x86_64.tar.gz” ) installiert, Updates der “webkit2”-Bibliotheken unter Opensuse vorgenommen und auch die “~/.eclipse”-Eintragungen neu erstellen lassen.

Und siehe da: Die Performance der CA-Funktionalität im PHP-Editor ist nun wieder so wie in alten Zeiten. nach nun 6 Stunden Arbeit kann ich sagen: Das Programmieren mit PHP und Eclipse macht wieder Freude ….

Linux-Blog – Dr. Mönchmeyer / anracon

Notes about Linux, ML and some simple math …

Shrinking an encrypted partition with LVM on LUKS

My disk layout

A sequence of 16 steps ….

Conclusion

Links

Cryptsetup close – not working for LVM on LUKS – device busy

Ausgehende HTTP-Anfragen von E-Mail Servern, Spamassassin und sa-learn

VMware Workstation – Virtual Ethernet failed – sometimes for good!

Eclipse Photon 2018-09 für PHP – endlich gute Content Assist Performance