Blender – even on old laptops a graphics card increases rendering performance

My present experiments with Blender on my old laptop take considerable time to render – especially animations. So, I got interested in whether rendering on the laptop’s old Nvidia card, a GT 645M, would make a difference in comparison to rendering on the 8 hyperthreaded (logical) cores of the CPU. The laptop’s CPU is an old one, too, namely an i7-3632QM. The laptop’s operating system is Opensuse Leap 15.3. The system uses Optimus technology; to switch between the Nvidia card and the Intel graphics I invoke Suse’s Prime Select application on KDE.

I got a factor of 2 up to 5.2 faster rendering on the GPU in comparison to the CPU. The difference depends on multiple factors. The number of CPU cores used is an important one.

How to activate GPU rendering in Blender?

Basically three things are required: (1) A working, recent Nvidia driver (with its compute components) for your graphics card. (2) A device setting in Blender’s Preferences. (3) A device setting for the Cycles renderer.

Regarding the CUDA toolkit I quote from Blender’s documentation:

Normally users do not need to install the CUDA toolkit as Blender comes with precompiled kernels.

With respect to the required Blender settings, one first has to choose a CUDA-capable device under the menu point “Preferences >> System”.

You may also select both the GPU and the CPU; then rendering will be distributed across both devices. My graphics card unfortunately only supports a rather low CUDA compute capability. The Nvidia driver I used is version 470.103.01, installed via Opensuse’s Nvidia community repository.
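For those who prefer scripting over clicking, the same preference can also be set via Blender’s Python API. The following is only a minimal sketch, written against Blender 3.1; property names may differ in other versions:

    import bpy

    # Preferences >> System: enable CUDA and activate the detected devices
    # (sketch for Blender 3.1; the device list depends on your hardware)
    cprefs = bpy.context.preferences.addons["cycles"].preferences
    cprefs.compute_device_type = "CUDA"
    cprefs.get_devices()                      # refresh the device list
    for dev in cprefs.devices:
        # activate only the CUDA GPU; set dev.use = True for the CPU entry
        # as well if you want combined GPU + CPU rendering (see further below)
        dev.use = (dev.type == "CUDA")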

In addition, you must set the “Device” option of the Cycles renderer to “GPU” in the scene’s render properties.
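In a script this corresponds to a single assignment (again a sketch for Blender 3.1):

    import bpy

    # Render properties: let Cycles render on the GPU instead of the CPU
    bpy.context.scene.cycles.device = "GPU"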

With all these settings I got rendering on the GPU that was faster by a factor of 2 up to more than 6 in comparison to rendering on the CPU with multiple cores.

The difference in performance, of course, depends on

  • the number of threads used on the CPU with 8 (hyperthreaded) cores available to the Linux OS
  • tiling – more precisely the “tile size” – in the case of both the GPU and the CPU

All other render options, with the exception of “Fast GI”, were kept constant during the experiments.

Scene Setup

To give Blender’s Cycles renderer something to do I set up a scene with the following elements (a small scripting sketch for part of this setup follows the list):

  • a mountain-like landscape (created via the A.N.T. Landscape add-on) with a subdivision of 256 x 128 – plus a Subdivision Surface modifier (Catmull-Clark, render level 2, limit surface quality 3) – plus a simple procedural texture with some noise and bumps
  • a plane with an “Ocean” modifier (no repetition; waves plus a noisy bump texture on the normal to simulate finer wave detail)
  • a world with a sky texture of the Nishita type (a blue sky due to a high air density, some dust and a sun just above the horizon)
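Part of this setup can be reproduced with a few lines of Python. The sketch below only covers the ocean plane and the Nishita sky; the parameter values are illustrative and not exactly the ones used for the measurements, and the landscape is left out because the A.N.T. Landscape add-on provides its own operator for it:

    import bpy

    # World: a Nishita sky texture with the sun just above the horizon
    world = bpy.context.scene.world
    world.use_nodes = True
    nodes = world.node_tree.nodes
    sky = nodes.new("ShaderNodeTexSky")
    sky.sky_type = 'NISHITA'
    sky.sun_elevation = 0.05      # radians - sun barely above the horizon
    sky.dust_density = 2.0        # some dust in the atmosphere
    world.node_tree.links.new(sky.outputs["Color"],
                              nodes["Background"].inputs["Color"])

    # Ocean: a plane with an Ocean modifier, no repetition of the wave pattern
    bpy.ops.mesh.primitive_plane_add(size=20)
    ocean = bpy.context.object.modifiers.new(name="Ocean", type='OCEAN')
    ocean.repeat_x = 1
    ocean.repeat_y = 1
    ocean.wave_scale = 1.0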

The scene looked like this:

The central red rectangle marks the camera perspective and the area to be rendered. With 80 samples and a resolution of 1200×600 we get:

The hardest part for the renderer is the reflection on the water (the ocean with its waves and bump texture). The “landscape” also requires some time. The Nishita world (i.e. the sky with the sun), however, is rendered pretty fast.

Required time for rendering on multiple CPU cores

I used 40 samples to render – no denoising, the progressive multi-jitter sampling pattern, 0 minimum bounces.

The number of threads, the tile size and the use of the Fast GI approximation were varied.
The resolution was chosen to be 1200×600 px.
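For reference, the most important of these settings translate into the following Python snippet – again only a sketch for Blender 3.1, where Cycles uses a single quadratic “tile_size” value:

    import bpy

    scene = bpy.context.scene

    # fixed settings
    scene.render.resolution_x = 1200
    scene.render.resolution_y = 600
    scene.cycles.samples = 40
    scene.cycles.use_denoising = False

    # varied settings
    scene.cycles.use_auto_tile = True
    scene.cycles.tile_size = 256              # 32 ... 512 in the tests below
    scene.render.threads_mode = 'FIXED'       # only relevant for CPU rendering
    scene.render.threads = 4                  # 2, 4 or 8 in the tests below
    scene.cycles.use_fast_gi = False          # the "Fast GI" approximation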

All data below were measured on a flatpak installation of Blender 3.1.2 on Opensuse Leap 15.3.

tile size   threads   Fast GI   time (s)
   64          2        no       82.24
  128          2        no       81.13
  256          2        no       81.01
   32          4        no       45.63
   64          4        no       43.73
  128          4        no       43.47
  256          4        no       43.21
  512          4        no       44.06
  128          8        no       31.25
  256          8        no       31.04
  256          8        yes      26.52
  512          8        no       31.22

A tile size of 256×256 seems to provide an optimum regarding rendering performance. In my experience this depends heavily on the scene and the chosen image resolution.
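If you want to find the optimum for your own scene, a small loop over candidate tile sizes is sufficient. Below is a sketch to be run from Blender’s scripting workspace – not necessarily the procedure used for the tables in this post:

    import time
    import bpy

    # Time a few tile sizes for the current scene and print the results
    scene = bpy.context.scene
    scene.cycles.use_auto_tile = True

    for size in (64, 128, 256, 512):
        scene.cycles.tile_size = size
        t0 = time.time()
        bpy.ops.render.render(write_still=False)   # renders into "Render Result"
        print("tile size %4d: %6.2f s" % (size, time.time() - t0))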

“Fast GI” gives you a slight, but noticeable improvement. The differences in the rendered picture could only be seen in relatively tiny details of my special test case. It may be different for other scenes and illumination.

Note: With all 8 logical CPU cores activated my laptop came under noticeable thermal stress: the CPU temperature went up to 81° Celsius.

Required time for rendering on the mobile GPU

Below are the time consumption data for rendering on the mobile Nvidia GT 645M GPU:

tile size   Fast GI   time (s)
   64         no       18.3
  128         no       16.47
  256         no       15.56
  512         no       15.41
 1024         no       15.39
 1200         no       15.21
 1200         yes      12.80

Bigger tile sizes improve the GPU rendering performance! This may be different for rendering on a CPU, especially for small scenes. There you have to find an optimum for the tile size. Again, we see an effect of Fast GI.

Note: The temperature of the mobile graphics card never rose above 58° Celsius. I measured this whilst rendering a much bigger image of 4800×2400 px. I therefore think that the thermal stress Blender rendering exerts on the GPU is smaller than the heat stress it puts on the CPU.

Required time for rendering both on the CUDA capable mobile GPU and the CPU

Blender also lists the CPU among the devices in the CUDA section of the “Preferences” settings, so one can activate CUDA-based rendering on the CPU in addition to the GPU. With 4 CPU cores this brings the render time down to around 11 secs, with 8 cores down to 10 secs.

tile size   threads   Fast GI   time (s)
   64          4        no       11.01
  128          8        no       10.08

Conclusion

Even on an old laptop with Optimus technology it is worthwhile to use a CUDA-capable Nvidia graphics card for Cycles-based rendering in Blender experiments. The rise in GPU temperature was relatively low in my case. The gain in performance may range from a factor of 2 to 5, depending on how many CPU cores you can invoke without overheating your laptop.

Ceterum censeo: The worst living fascist and war criminal today, who must be isolated, denazified and imprisoned, is the Putler.

 

Switching between Intel and Nvidia graphics cards on an Optimus laptop with Leap 15.3

At my present location an old laptop with an Optimus system still provides me with good service. Recently, I installed Opensuse Leap 15.3 on it. Since then I have tried almost everything I could think of to get Bumblebee running smoothly on it. In the end I gave up – sometimes “primusrun” worked, sometimes not, especially with more complex applications like Blender. And even when “primusrun” worked, “optirun” most often did not.

So, I had a look at “prime-select”, installed by Opensuse’s package “suse-prime”. Again I had mixed experiences: it worked sometimes from the command line and sometimes not. There were multiple things which were somehow interwoven:

  • bbswitch intervened or was used by prime-select when switching to Intel – and when it turned the Nvidia card off, you could activate the card again, but afterward the Nvidia device was not recognized by the native Nvidia driver or by prime-select. Most often I had to reboot.
  • When I was lucky and, after reboots, got a configuration where the Nvidia driver was loaded in parallel to the i915 driver for Intel, and then used “prime-select” together with an init 3 / init 5 sequence, the display manager sddm did not come up. Others had this problem, too.
  • To switch between the graphics cards it was absolutely necessary to log into my graphical KDE desktop, use a root terminal and switch to the Nvidia card with prime-select from there. Then I had to log out, and if I was lucky the display manager came up again and the Nvidia card was really active.

But do not ask me what I had to do, and in which order, to get the desired result. All in all, the behavior of “prime-select” was a bit unpredictable. Then, after updating a lot of packages yesterday, I could no longer get prime-select to do the right things. Most often I got the message:

ERROR: Unable to query GPU information

Or: No such device found.

Solution Part 1: Uninstallation of unneeded RPMs and reinstallation of the required ones

In the end I uninstalled Bumblebee (as recommended by posts in the Suse forums) and also bbswitch. Afterward I reinstalled the packages “suse-prime” and “plasma5-applet-suse-prime” from the Leap 15.3 Update repository (i.e. the Update repo for the Enterprise system) and the Nvidia drivers from Opensuse’s Nvidia repository. I rebooted and my “sddm” display manager came up – with the Intel driver active – but the Nvidia driver was nevertheless already loaded, as a check with lsmod showed.

Later, I found that all files in “/etc/prime” had been rewritten. I suspect that I had something in there which was no longer compatible with the latest drivers and prime versions. In addition, the file “90-nvidia.conf” in “/etc/X11/xorg.conf.d” had been replaced.

Solution Part 2: Use the “Suse Prime Selector” applet!

Then I used the applet in the KDE Plasma desktop to switch to Nvidia. This seemed to work and led to no errors. I just had to log out of my KDE Plasma session and log in again to work with KDE on the running Nvidia card. (Ignore the message that you have to reboot.)
Afterward I also tested switching by using the command “prime-select” as root in a terminal window of the KDE desktop. That worked perfectly, too.
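For the command-line route, a small helper like the following can query the active configuration and switch to the Nvidia card. This is only a sketch: it must run as root, and the prime-select subcommands (“get-current”, “nvidia”, “intel”) are those of the suse-prime package on my system – check the usage output of prime-select if your version differs:

    import subprocess

    # Sketch: query the active GPU configuration and switch to the Nvidia card
    # via suse-prime. Run as root; afterwards log out of the KDE session
    # and log in again.
    current = subprocess.run(["prime-select", "get-current"],
                             capture_output=True, text=True).stdout.strip()
    print("Active configuration:", current)

    if "nvidia" not in current.lower():
        subprocess.run(["prime-select", "nvidia"], check=True)
        print("Switched to Nvidia - now log out and log in again.")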

Conclusion

I am not sure what ultimately got prime-select running again. The removal of Bumblebee and bbswitch? The new Prime and Nvidia configuration files? Or a new udev installed during the other updates?

Anyway – using the Suse Prime Selector applet, or just the command “prime-select” as root in a terminal window inside KDE Plasma, has become a very reliable way to switch between the graphics cards on my Leap 15.3 laptop with KDE. I still have to log out of the KDE session and log in again – but so far I have not experienced any problems with the display manager since.

Just for information: I am using the Nvidia driver packages from Opensuse’s Nvidia repository and the suse-prime package from the Leap 15.3 Update repository.

Links

https://www.reddit.com/r/openSUSE/comments/ib9buv/opensuse_kde_suse_prime_selector_applet_for/
https://forums.developer.nvidia.com/t/kubuntu-18-04-gives-black-screen-when-sddm-starts-no-login/61034/11
https://wiki.archlinux.org/title/PRIME#Configure_applications_to_render_using_GPU
https://www.opensuse-forum.de/thread/63871-suse-prime-error-unable-to-query-gpu-information/
https://opensuse.github.io/openSUSE-docs-revamped-temp/hybrid_graphics/

Ceterum censeo: The worst living fascist and war criminal, who must be isolated, denazified and imprisoned, is the Putler.

 

Laptop – SSD with dm-crypt/LUKS encryption and Opensuse Leap 15 – III – Access layers

In the last posts of this series

Laptop – SSD with dm-crypt/LUKS encryption and Opensuse Leap 15 – I – Preliminary considerations
Laptop – SSD with dm-crypt/LUKS encryption and Opensuse Leap 15 – II – Preliminary considerations on virtualization

we made some preliminary considerations regarding the encryption of a laptop. As in the earlier posts, I refer below to both partitions and LVM volumes as “volumes”; where a distinction is necessary, I will make it explicitly. For the root filesystem I use the abbreviation “/”-FS.

The most important result of our considerations so far was that the system on which we work with the customer data to be protected must be fully encrypted. This means that we encrypt the volume for the “/”-FS as well as the swap and all data volumes. It is not enough to work with data containers.

Our second result was: On a virtualization system (KVM/QEMU, LXC) we should, to be on the safe side, apply this approach both to the volumes the host itself uses and to all volumes that are used or additionally mounted by the guest operating systems. In other words: the virtualization host, too, must be fully encrypted (root filesystem, swap (!), data volumes). There are too many ways in which contents of the guest systems can end up on host filesystems.

We had furthermore discussed that, in principle, both the host system on the laptop and the guest systems themselves can contribute their part to the encryption. Not least, this topic is related to the layering of the access methods for storage devices and filesystems: Anyone who works professionally with Linux sooner or later uses LVM to gain flexibility in resizing filesystems for hosts and guests. How does one combine LVM with LUKS encryption? And what does this look like in the case of virtualization?

Another point of practical importance is the compatibility of volume encryption methods with boot managers and also with systemd. Can Grub2, probably the most widely used boot manager under Linux, handle nested LVM/LUKS or LUKS/LVM structures on block devices?

The article below approaches these questions from a fundamental perspective. Later articles will then be devoted to the practical implementation.

Continue reading