Leap 15.6, Nvidia driver 580 – resume from S3 deep sleep state after suspend to RAM?

Some months ago I wrote a post about an Nvidia option that, at the time, helped with a problem of suspend-to-RAM [leading to a deep S3 sleep state] followed by an incomplete resume from the S3 state to the original desktop session. The setting “nvidia_modeset vblank_sem_control=0” at least remedied a result described by many users of different Linux flavors: the resume process ended in a black screen with only the mouse cursor visible. This situation was difficult to control, and typically one had to reboot the system.

In my experience, things have changed a bit since that post. Therefore, I give a brief summary of further settings below. These settings appear to support relatively problem-free suspend-to-RAM and resume operations on my Leap 15.6 system with my X11-based KDE Plasma and a newer Nvidia driver of version 580.

Card, driver, services, grub_cmdline

I describe the settings for just one of my systems. The board is an older ASRock Z170 extreme+. The Nvidia card is a 4060 TI. My present driver is the proprietary Nvidia driver of version 580.105.08. It was installed from the usual repository. In particular, the RPM “nvidia-driver-G06-kmp-default” is installed.

Note: I have not yet checked whether things work on other systems or with Wayland.

The default grub cmdline contains the parameter setting

nvidia-drm.modeset=1
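
Whether the parameter really reached the running kernel can be checked against /proc/cmdline. The following is a minimal sketch, not part of my actual setup; the function name has_kernel_param is a hypothetical helper chosen for illustration. It treats “-” and “_” in the parameter name as equivalent, as the kernel does.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if 'name=value' is a token on the given kernel command line.

    Dashes and underscores in the parameter name are treated as equivalent,
    so 'nvidia-drm.modeset=1' and 'nvidia_drm.modeset=1' both match.
    """
    wanted_name = name.replace("-", "_")
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key.replace("-", "_") == wanted_name and val == value:
            return True
    return False


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the running kernel.
        current = Path("/proc/cmdline").read_text()
    except OSError:
        current = ""
    found = has_kernel_param(current, "nvidia-drm.modeset", "1")
    print("nvidia-drm.modeset=1 on kernel command line:", found)
```

If the result is False although the parameter is present in the GRUB configuration, the GRUB configuration has probably not been regenerated or the system has not been rebooted since the change.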

The services

  • nvidia-suspend.service
  • nvidia-hibernate.service
  • nvidia-resume.service
  • nvidia-persistenced.service
  • nvidia-powerd.service

are all enabled.
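
The state of the listed services can be verified in one go. Below is a minimal sketch that asks “systemctl is-enabled” for each unit; the helper names (query_systemctl, enablement_report, not_enabled) are hypothetical and only serve the illustration. The query function is passed in as a parameter so that the logic can also be tried out without systemd.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# The units named in the list above.
SERVICES = (
    "nvidia-suspend.service",
    "nvidia-hibernate.service",
    "nvidia-resume.service",
    "nvidia-persistenced.service",
    "nvidia-powerd.service",
)


def query_systemctl(unit: str) -> str:
    """Return the enablement state systemd reports for one unit."""
    try:
        result = subprocess.run(
            ["systemctl", "is-enabled", unit],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # systemctl is not available on this machine.
        return "systemctl-not-found"
    return result.stdout.strip() or "unknown"


def enablement_report(
    units: Iterable[str], query: Callable[[str], str] = query_systemctl
) -> Dict[str, str]:
    """Map each unit name to its reported enablement state."""
    return {unit: query(unit) for unit in units}


def not_enabled(report: Dict[str, str]) -> List[str]:
    """List the units whose state is anything other than 'enabled'."""
    return sorted(unit for unit, state in report.items() if state != "enabled")
```

Calling not_enabled(enablement_report(SERVICES)) on the target system should return an empty list if all five services are enabled.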

Nvidia configuration settings

In the directory “/etc/modprobe.d”, I have a file “50-nvidia.conf”, which contains the following options:

blacklist nouveau
options nouveau modeset=0

options nvidia NVreg_DeviceFileUID=0
options nvidia NVreg_DeviceFileGID=33
options nvidia NVreg_DeviceFileMode=0660
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1
#
options nvidia_modeset vblank_sem_control=0
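
To compare such a file between systems, the “options” lines can be collected per module. The following is a minimal sketch under the assumption that only simple “options <module> key=value” lines matter; the function name parse_modprobe_options is hypothetical. It ignores comments and “blacklist” lines and normalizes “-” to “_” in module names, so “nvidia-drm” and “nvidia_drm” end up under the same key.

```python
from collections import defaultdict
from typing import Dict


def parse_modprobe_options(text: str) -> Dict[str, Dict[str, str]]:
    """Collect 'options <module> key=value ...' lines into {module: {key: value}}."""
    options: Dict[str, Dict[str, str]] = defaultdict(dict)
    for raw in text.splitlines():
        # Everything after '#' is a comment in modprobe.d files.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options":
            continue  # skips blank lines, 'blacklist ...' and other commands
        module = parts[1].replace("-", "_")
        for assignment in parts[2:]:
            key, sep, value = assignment.partition("=")
            if sep:
                options[module][key] = value
    return dict(options)
```

Applied to the file above, the result for the key “nvidia” should contain, among others, NVreg_PreserveVideoMemoryAllocations and NVreg_TemporaryFilePath, which are the two settings most relevant to suspend/resume.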

Dependency on other things – like the resume-button and the sleep-time

I usually trigger suspend-to-RAM via the options offered on the KDE Plasma desktop, but it also works from the command line. Ahead of reaching the S3 state, my ASRock board indicates storage activity, which shows that memory contents are being written to the system’s root filesystem. When the deep suspend state S3 is reached, the power button starts blinking.
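
Whether a suspend really targets the deep S3 state can be read from /sys/power/mem_sleep, where the kernel lists the available suspend variants and marks the active one with square brackets (for example “s2idle [deep]”). The following is a minimal sketch for reading that file; the function name parse_mem_sleep is hypothetical. For the command-line trigger I assume “systemctl suspend” here, which is the usual command on systemd-based systems.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[List[str], Optional[str]]:
    """Split the content of /sys/power/mem_sleep into (available, active).

    The active suspend variant is the one enclosed in square brackets.
    """
    tokens = text.split()
    available = [token.strip("[]") for token in tokens]
    active = next(
        (t.strip("[]") for t in tokens if t.startswith("[") and t.endswith("]")),
        None,
    )
    return available, active


if __name__ == "__main__":
    try:
        content = Path("/sys/power/mem_sleep").read_text()
    except OSError:
        content = ""
    available, active = parse_mem_sleep(content)
    print("available suspend variants:", available)
    print("active suspend variant:", active)
    # Assumed command-line trigger (not executed here): systemctl suspend
```

If the active variant is “deep”, a suspend-to-RAM should lead to the S3 state discussed in this post.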

The default keyboard button for triggering a resume from sleep is the “blank” key. Using the keyboard to trigger a resume seems to work flawlessly so far. I get to the Plasma lock/login screen of my running KDE session, which is restored to its state before the suspend, with all interrupted processes running again, independent of the duration of the sleep. Logs show that a time jump is recognized and that the system adapts to it.

However and notably:

Resume does not always work when I briefly press the power button on my case. In fact, briefly pressing the power button after a longer sleep time (roughly 15 minutes or more) sometimes leads to a resume that ends in an SDDM login screen for a new KDE Plasma session. The logs then show that the old Plasma session could not be restored due to some error (see below). The system then obviously falls back to a workaround: it drops the old KDE session, restarts into the default graphical target and starts SDDM for a new graphical user session.

This strange behavior does not seem to occur when I press the default keyboard button to trigger a resume. I cannot explain this, but it may be specific to my board and its settings. It may also be that the number of my suspend/resume experiments is still too small to be statistically significant.

Race conditions during resume?

I have checked old logs and found that some resume processes led to a situation where the X11 session appeared to be broken for certain applications starting up, such as pulseaudio and gkrellm. This may be the result of race conditions during resume. In such cases, the restart of the default graphical target happened. I cannot yet exclude that such problems are still lurking in the background. The number of my suspend/resume tests with varying sleep durations may still have been too small to detect such events.

Conclusion

The suspend/resume behavior on Leap systems obviously changes with the kernel and Nvidia driver versions. Presently, I have reached a relatively stable state regarding suspend-to-RAM and resume for KDE Plasma on an X11 basis. I hope this post helps some readers who experience instability problems with suspend/resume on Leap systems with Nvidia graphics cards.

 

Leap 15.6, Nvidia-driver – problems due to dependency on original kernel-default-devel package

After some updates of Opensuse’s Leap 15.6 systems via a variety of repositories (including the SLES 15.6 repo as the main and leading repository), one of my installed packages was removed. I am talking about a very central one, namely “kernel-default-devel version 6.4.0-150600.21.3-x86_64”. (It can be found in the main repository of the 15.6 distribution.) This was reasonable insofar as I no longer have any corresponding kernel installed. All kernels in use are of version “6.4.0-150600.23.xxx”.

However, the removal of the old kernel-default-devel package caused difficulties with the Nvidia drivers from Opensuse’s respective repositories. The installation and compilation of the required driver module delivered by the package “nvidia-driver-G06-kmp-default” would fail, because it requires the original kernel-default-devel version 6.4.0-150600.21. As a consequence, the Nvidia driver could no longer be loaded on the affected systems.

Reinstalling the original kernel-default-devel package (the one coming with the distribution) remedied the problems. I hope this helps others who experience similar problems during system updates.
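
Whether the original devel package is still present can be checked before and after an update from the output of “rpm -q kernel-default-devel”. The following is a minimal sketch; the function name has_devel_version is hypothetical, and the assumed output format is one line of the form “kernel-default-devel-<version>.<arch>” per installed package.

```python
PACKAGE = "kernel-default-devel"


def has_devel_version(rpm_output: str, required: str) -> bool:
    """Return True if 'rpm -q kernel-default-devel' output lists the required version.

    'required' is a version prefix such as '6.4.0-150600.21'. It only matches
    at a '.' or '-' boundary, so '...21' does not match a '...210' release.
    """
    prefix = f"{PACKAGE}-{required}"
    for line in rpm_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            rest = line[len(prefix):]
            if rest == "" or rest[0] in ".-":
                return True
    return False
```

On a real system, the text to pass in could be obtained with subprocess.run(["rpm", "-q", "kernel-default-devel"], capture_output=True, text=True).stdout. A result of False for “6.4.0-150600.21” would indicate the situation described above.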

 

Leap 15.6, Nvidia driver 570 – resume from suspend to RAM not working / workaround

Hint: After some experiments and further Internet digging, this post was rewritten and supplemented on the 5th of March, 2025. Sorry for any inconvenience.
—–

Recently, I upgraded Opensuse Leap to version 15.6 on 5 PC systems, all with (different) Nvidia graphics cards. I use KDE/Plasma on all these systems.

My daily working system is equipped with a 4060 TI Nvidia card. The Nvidia drivers of version 570.124.06-1 on this particular system came from the Nvidia CUDA repository for Opensuse systems at

developer.download.nvidia.com ….

I sadly must say that this particular driver, and also the present Nvidia drivers of version 570.86.16 on the other systems, are unreliable or even buggy (for KDE/Plasma), at least in their cooperation with the Linux kernel (6.4.0) and other components of the present Leap 15.6:

The resume process from “Suspend to RAM” does not work reliably on any of the systems.

NUMA node error for Nvidia cards on Linux PCs

You may have experienced it in various contexts: CUDA, Tensorflow, gaming applications or complex 3D graphics applications may warn you that your Nvidia card is associated with an unexpected negative NUMA value. The warning often refers to a value of “-1”. And the clever application replaces this value by a default value of “0”.

The problem is particularly annoying when dealing with Machine Learning, e.g. in Jupyter notebooks. There, warnings may repeatedly clutter the output of some cells, e.g. during the setup of the graphics card for some ML experiments.

Besides the question why the Nvidia drivers for Linux and/or CUDA drivers do not fix this problem by detecting just one NUMA node on the system and setting the value for the card to “0”, the question for us users is how we can get rid of the warnings.

A basic idea is that we set the right value by ourselves. I have described this simple measure in the sister blog, which unfortunately still is under construction. See:
Setting NUMA node to 0 for Nvidia cards on standard Linux PCs.

There I also briefly discuss what NUMA is basically meant for, and why it normally does not affect consumer PCs.
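
The value the applications complain about can be inspected in sysfs: each PCI device directory below /sys/bus/pci/devices contains a file “numa_node”. The following is a minimal sketch that lists this value for all devices with Nvidia’s PCI vendor ID (0x10de); the function name nvidia_numa_nodes is hypothetical. The sketch only reads the values. Writing “0” into the respective file as root is, as far as I can tell, the usual way to set the value by hand, but for the actual procedure see the linked post.

```python
from pathlib import Path
from typing import Dict

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID of Nvidia


def nvidia_numa_nodes(pci_root: Path = Path("/sys/bus/pci/devices")) -> Dict[str, int]:
    """Map the PCI address of each Nvidia device to its reported NUMA node."""
    nodes: Dict[str, int] = {}
    if not pci_root.is_dir():
        return nodes
    for dev in sorted(pci_root.iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip().lower() != NVIDIA_VENDOR_ID:
            continue
        nodes[dev.name] = int(numa_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    for address, node in nvidia_numa_nodes().items():
        hint = "  <- negative value, triggers the warning" if node < 0 else ""
        print(f"{address}: numa_node={node}{hint}")
```

On a typical consumer PC with one CPU socket, the output shows “-1” for the graphics card (and its audio function) as long as nothing has set the value.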

 

Opensuse Leap 15.5 – installation of CUDA 12.3 for Machine Learning

Working with Machine Learning and Deep Neural Networks not only requires GPU drivers, but in case of Nvidia GPUs also the installation of CUDA and cuDNN. This process is always a bit tricky as additional environment variables have to be set for IPython-based Jupyterlab or classic Jupyter Notebook. On an Opensuse system one must in addition take care of the right settings in /etc/alternatives.

I have described the necessary steps in a post at “machine-learning.anracom.com“.

I hope this helps people who want to use Leap 15.5 for Machine Learning with Nvidia GPUs, Keras/Tensorflow 2 and Jupyterlab.
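
Before starting Jupyterlab, it can help to see which CUDA-related paths the environment actually contains. The following is a minimal sketch and not the procedure from the linked post: the variable names PATH, LD_LIBRARY_PATH and CUDA_HOME are my assumption of what is typically relevant, and the function name cuda_env_report is hypothetical.

```python
import os
from typing import Dict, List, Mapping

# Assumption: these are the variables typically relevant for CUDA setups.
ASSUMED_VARS = ("PATH", "LD_LIBRARY_PATH", "CUDA_HOME")


def cuda_env_report(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return, per variable, the path entries that mention 'cuda'."""
    report: Dict[str, List[str]] = {}
    for name in ASSUMED_VARS:
        entries = [p for p in env.get(name, "").split(os.pathsep) if p]
        report[name] = [p for p in entries if "cuda" in p.lower()]
    return report


if __name__ == "__main__":
    for name, paths in cuda_env_report(os.environ).items():
        print(f"{name}: {paths if paths else 'no CUDA entry found'}")
```

Running the script inside a notebook cell and in a normal shell shows whether the Jupyter kernel sees the same settings as the shell.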

Important addendum 01/27/2024:
Although the combination of CUDA 12.3, cuDNN 8.9.7, Tensorflow 2.15 and Nvidia drivers 545.29.06 works regarding AI-models, there is another major problem:
Nvidia’s driver 545.29.06 is buggy – at least for Leap 15.5, KDE/Plasma with multiple screens. The bug affects Suspend-to-RAM. Suspend-to-RAM seems to work in the suspend phase, and the system also comes up afterward in a seemingly proper state of your KDE/Plasma interface (on your screens).

However, the problems begin when you want to change to another virtual screen via Ctrl-Alt-Fx. You wait and wait and wait … The same happens when you change the run-level or systemd target state, or when you want to shut the system down. This makes Suspend-to-RAM with driver 545.29.06 impossible to use.

Recommendation:
If you have a working older Nvidia driver (e.g. a stable 535 version), do not change to 545.29.06. Unfortunately, it is a mess to return to an older driver version on a multiscreen Leap 15.5 system. The Nvidia community repository does not offer you a choice. (Why, by the way?) Downloading an older proprietary driver from Nvidia and trying to install it afterward on a console terminal (after having stopped X11 or Wayland) did not work in my case: the screens displaying the terminal changed their resolution and froze afterward. So, you may have to completely uninstall the present driver 545, go back to standard VGA and then try to install an older driver via Nvidia’s install mechanism. As I said: It is a mess …