Upgrade of a mail server to Leap 15.1 – problems with SSSD and clamd

I use email servers based on Postfix (smtp), Cyrus (imap) in combination with an LDAP server for authentication purposes and fetchmail to access external mail provider services. Both the mail servers and the LDAP server are virtualized guests on KVM host servers with LUKS-encrypted disks/partitions. Due to a series of security measures to become compliant to DSGVO and EU-GDPR based customer contracts the whole setup is relatively complicated. However, authentication for mail clients to the different servers is of central importance. Communication of each mail server to the LDAP server is performed via an TLS connection and SSSD. The mail client systems access the mail servers via TLS; login to the client systems partially also depends on LDAP.

Whenever a full upgrade of the server is required I, therefore, first test it on copies of the KVM host installation and each KVM instance. (The “dd” command is of good service during these tests.) One experiences some unwelcome surprises from time to time – and then you may need a quick restauration of a workings system.

When I switched everything to Opensuse Leap 15.1 some days ago I stumbled once again across small problems. It is interesting that one of the problems had to do with SSSD – again.

Previous problems with SSSD during upgrades to Opensuse Leap 15.0

Some time ago 1 described a problem with PAM control files for imap and smtp services on the mail server when I upgraded to Leap 15.0. See:
Mail-server-upgrade to Opensuse Leap 15 – and some hours with authentication trouble
The PAM files included directives for SSSD. The file were unfortunately replaced (without backups) during upgrade from OS 42.3 to Leap 15.0. This hampered all authentication of mail clients via authentication requests from the imap and smtp services to the LDAP system. The cause of the resulting problems at the side of the email clients, namely authetication trouble, was not easy to identify.

New SSSD problem during upgrades to Opensuse Leap 15.1

This time I ran once again into authentication trouble – and suspected some mess with the PAM files again. Yet, this was not the case – the PAM files were all intact and correct. (SuSE learns!) However, after an hour of testing I saw that the SSSD service did not what it should. Checking the status of the service with “systemctl status sssd.service” I got a final status line saying “Backend is offline”.

What did this mean? I had no real clue. You naturally assume that LDAP would be my backend in my server configuration; this is reflected in the file /etc/sssd/sssd.conf:

[sssd]
config_file_version = 2
services = nss,pam
domains = default
[nss]
filter_groups = root
filter_users = root
[pam]
[domain/default]
ldap_uri = ldap://myldap.mydomain.de
ldap_search_base = dc=mydc,dc=de
ldap_schema = rfc2307bis
id_provider = ldap
ldap_user_uuid = entryuuid
ldap_group_uuid = entryuuid
ldap_id_use_start_tls = True
enumerate = True
cache_credentials = False
ldap_tls_cacertdir = /etc/ssl/certs
ldap_tls_cacert = /etc/ssl/certs/mydomainCA.pem
chpass_provider = ldap
auth_provider = ldap

I checked – the LDAP service was active in its KVM machine. Of course, NSS must also be working for SSSD to become functional. No problem there. I checked whether the LDAP service could be reached through the firewalls of the different KVM instances and their hosts. Yes, this worked, too. So, what the hack was wrong?

Eventually, I found some interesting contribution in a Fedora mailing list: See here. What if the problem had its origin really in some systemd glitch? Wouldn’t be the first time.

So, I first made a copy of the original file “/usr/lib/systemd/system/sssd.service” and after that tried a modification of the original file linked by “sss.service” in “/etc/systemd/system/multi-user.target.wants”. I simply added a line “After=network.service” to guarantee a full network setup before sssd was started.

[Unit]
Description=System Security Services Daemon
# SSSD must be running before we permit user sessions
Before=systemd-user-sessions.service nss-user-lookup.target
Wants=nss-user-lookup.target
After=network.service

[Service]
Environment=DEBUG_LOGGER=--logger=files
EnvironmentFile=-/etc/sysconfig/sssd
ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER}
Type=notify
NotifyAccess=main
PIDFile=/var/run/sssd.pid

[Install]
WantedBy=multi-user.target

And guess what? This was successful! The reason being that at the point in time when the sssd.service starts name resolution (i.e. the evaluation of resolv.conf and access to DNS-servers ) may not yet be guaranteed!

Hint:

Note that there may be multiple reasons for such a delay; one you could think of is a firewall which is started at some point and requires time to establish all rules. Your server may not get access to any of the defined DNS-servers up to the point where the firewalls rules are working. Then, depending on when exactly you start your firewall service, you may have to use a different “After”-rule than mine.

Important point:
You should not permanently change the files in “/usr/lib/systemd”. So, after such a test as described you should restore the original systemd file for a specific service in “/usr/lib/systemd/system/” with all its attributes! The correct mechanism to add modifications to systemd service configuration files is e.g. described here “askubuntu.com : how-do-i-override-or-configure-systemd-services“.

So, in my case we need to execute “systemctl edit sssd” on the command line and then (in the editor window) add the lines

[Unit]
After=network.service

This leads to the creation of a directory “/etc/systemd/system/sssd.service” with a file “override.conf” which contains the required entries for service startup modification.

An additional problem with clamd – timeout during the start of the clamd service

One of my anti-virus engines integrated with amavis is clamav. More precisely the daemon based variant, i.e. the “clamd” service. However, when I tested amavis for mail scanning I saw that it used to job instances of “clamscan” instead of “clamdscan”. The impact of Amavis’ using two parallel clamscan threads was an almost 100% CPU utilization for some time.

It took me a while to find out what the cause of this problem was: clamd requires time to start up. And due to whatever reasons this time is now a bit bigger on my mail system than the standard timeout of 90 secs systemd provides. This can be compensated by “systemctl edit sssd” and adding lines as

[Service]
TimeoutSec=3min

After this change clamd ran again as usual. Note however that clamav does not provide sufficient protection on professional mail servers, especially when your email clients are based on a Windows installations. Then you need at least one more advanced (and probably costly) antivirus solution.

Links

how-to-troubleshoot-backend.html
fedora archive contribution
www.clearos.com community : clamd-start-up-times-out
unix.stackexchange.com : how-to-change-systemd-service-timeout-value

Mail-server-upgrade to Opensuse Leap 15 – and some hours with authentication trouble …

Some of my readers know that the mail server in my private LAN resides in an encrypted KVM/qemu guest. It provides IMAP- and SMTP-services to mail-clients – and receives emails from several hosted servers on the Internet via a fetchmail system in a DMZ. It was/is my policy that any access to a mail service account on my private mail server requires authentication against user entries on a separate LDAP-server. The accounts the IMAP/SMTP-service-daemons deal with are not to be confused with Linux user accounts: On the mail server NO Linux user accounts are required for mail handling. I regard the absence of standard Linux user accounts on the mail-server as a small, but effective contribution to security. No mail user can login into the mail server as a Linux user. Nevertheless authentication for the use of mail services is required.

Upgrade top Leap 15 resulted in unusable mail services

The manual upgrade of the mail server system from OS Leap 42.3 to Leap 15 seemed to work perfectly. I got no serious warnings. As root I could login to the virtual KVM guest via ssh – and systemd had started all required services: Fetchmail operated on external servers in the DMZ and transferred new mails to the upgraded mail server. There my postfix queues and secondary services for virus and spam checks seemed to work perfectly. Regarding IMAP I could also see that new entries for new mails appeared in various email accounts (more precisely: in related spool directories). Everything looked great …

However, the big surprise came pretty soon: No mail user could login to the IMAP-service – neither with Kmail, nor Mutt, nor Thunderbird. Access to the SMTP service for sending email was not possible either. System messages appeared on the GUIs saying that there was an authentication error. A very unpleasant situation which required analysis ….

In the meantime I had to start a backup copy of the old mailserver guest installation. On this front
virtualization proved its strengths and simplicity – I had made a copy of the whole KVM guest before the upgrade and just had to include the copy as a virtualization domain into the KVM setup of the virtualization host. But, as the upgraded server had already processed some new mails, I needed to transfer these mails between the servers and reconstruct and reindex some IMAP accounts … A simple, but a task which can be done by some scripts ….

Error analysis

It took me quite a while to figure out where the origin of the upgrade problem was located. I first checked my local firewall protocols both on the mail server and on the LDAP server. Nothing special there … Then I checked the LDAP access protocols – there were successful data requests from various systems – but astonishingly not from the mail server. So it seemed that the mail services did not even try to authenticate their users … strange ….
I then created a new IMAP test user based on a new local Linux user account, i.e. with an entry in “/etc/passwd” and “/etc/shadow”. Guess what? This user could work without any problems. So, obviously the communication of the mail services with the LDAP service for authentication was not triggered or failed in a strange way. I, therefore, tried a manual data request to the LDAP server from the mail server – and got a perfect answer. So, there seemed to be no fundamental problem with the required sssd-daemon and client-configurations. What else was wrong? A bit of despair was in the air …

The role of PAM in the game

Then I tried to remember what I had done in the past to separate the email accounts from the Linux user accounts. Over the last years I had actually forgotten how I had treated this problem. After some reading in old diaries my attention eventually turned to PAM …

The PAM-layer offers Linux admins the possibility of fine-tuning access/authentication conditions for the usage
of a service; among other things you can request certain PAM-modules to authenticate a user via given credentials against defined resources. A sequential series of required, sufficient or optional criteria can be set up. ( When a service needs additional Linux user account information during a session NSS can in addition request information from name resolution backends as “/etc/passwd”. However, this is NOT required for mail services. )

Linux mail services often use the SASL-framework and the saslauthd daemon to perform authentication. SASL can communicate with a variety of authentication backends directly – but it can also use PAM as an intermediate layer. And, in fact, I had used PAM to access my independent LDAP-server with mail account credentials via the PAM-module for SSSD.

I have described my simple approach in a previous article
https://linux-blog.anracom.com/2014/03/15/cyrus-imap-mit-sasl-pam-sssd-und-ldap-opensuse-12-313-1-ii/
Unfortunately, only in German. I, therefore, summarize the basic ideas shortly (see also the graphics in the referenced article).

The things one has to do to force IMAP and/or a SMTP services to use LDAP, but other services to use local authentication resources is:

  • We direct local login-services, ssh-services, etc. on the server via PAM to local resources, only.
  • We eliminate any UIDs corresponding to email account names from local resources as “/etc/passwd” and “/etc/shadow” (and NIS if used).
  • We do not create any local home-directories for email account names.
  • We eliminate LDAP as a usable NSS resource from nsswitch.conf; i.e. we eliminate all LDAP references there.
  • We setup special PAM files in “/etc/pam.d” for the IMAP-service and the SMTP-service. These files will necessarily include pam-modules for the “sss“-service (pam_sss.so).
  • We select entries on the LDAP server for users which for any reasons shall NOT or NO LONGER get access to the mail service accounts, set the “host”-attribute for these entries and configure sssd to use the host-attribute. Or deny certain user names access to the IMAP-service by the help of IMAP -configuration files; cyrus e.g offers the maintenance of a list “userdeny_db”.

This strategy worked very conveniently already on an Opensuse 12.3 platform. Examples for the required special files “/etc/pam.d/imap” and “/etc/pam.d/smtp” are given in the article named above. The mix and sequence of the statements there is due to the fact that locally defined system users as “cyrus” must get access to the IMAP-service, too.

What had happened during the Leap 15 upgrade?

My configuration had survived multiple upgrades of the virtualized server in the past. But not the one to Leap 15! What had happened?

After having remembered the importance of the PAM configuration, I eventually had a look into the directory “/etc/pam.d” on the upgraded system and compared it to the respective directory on the original system. Whereas during previous upgrades “personal” PAM configuration files for services had been respected, the upgrade to Leap 15 simply had removed my PAM configuration files for SMTP and IMAP – without a backup! Incredible, but true! Actually and astonishingly, the contents of some other standard PAM files had been transferred to a backup ….

A restoration of my PAM files for the mail services lead to an immediate success – authentication for the access of mail accounts on the upgraded server was possible again. Some hours in the hell for Linux admins came to an end.

Conclusion

Never (!) trust a standard upgrade of a whole Linux system procedure to keep configuration files in “/etc/pam.d” untouched. Keep an admin or system diary for unusual approaches. And – of course: Full
backups of critical systems before backups are a MUST ….