Long boot time of Kali Linux after swap partition changes – swap settings for the initramfs

Recently, I reactivated an old KVM-based installation of Kali Linux from 2018. So, some hurdles to upgrade the Kali distribution to version 2020.4 had to be overcome. Actually, it was a mess. I had to solve multiple circular dependency problems between packages (Python3, Ruby, special packages in the Discovery section, …). Sometimes I had to delete packages explicitly. The new Kali minimum installation and its enhancement via meta-packages contributed to the problems. In the end I also reinstalled the Gnome desktop environment – supplemented by a few KDE applications. Now, my Kali was fully functional again. However, the remaining disk space had shrunk …

Resizing the qcow2-file for the virtual disk of the virtual machine for Kali Linux

During all the back and forth with installing meta-packages I came close to exhausting the virtual disk space before I could clean up and remove packages again (with “apt-get autoclean” and “apt-get autoremove”). The virtual hard disk was a qcow2-file in my case. For further experiments I had to expand its capacity. This could easily be done by

qemu-img resize PATH_TO_qcow2_FILE +NG

where I had to choose a suitable “N” (20 in my case). The enlarged file must, of course, still fit into its filesystem on the KVM host!
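A minimal sketch of the whole step with a hypothetical image path and the 20 GiB extension mentioned above (qemu-img should only operate on images of VMs which are shut down):

qemu-img info /var/lib/libvirt/images/kali.qcow2      # check the current virtual size
qemu-img resize /var/lib/libvirt/images/kali.qcow2 +20G
qemu-img info /var/lib/libvirt/images/kali.qcow2      # verify the new virtual size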

Extending and rearranging partitions within the Kali system

The more difficult part in my case was the rearrangement of the two partitions (one for the “/”-fs and one for swap) on the virtual disk of my Kali guest system. I did this within the running Kali system. Bad habit; such operations are dangerous, of course, but I trusted in the abilities of gparted. In my experience, extending a mounted “ext4”-formatted partition is no major problem as long as there is enough free space behind its current location. But I recommend, of course, that you make a backup of your virtual machine before you start any potentially dangerous operations on the filesystems of your Linux machines.

As my old Kali installation unfortunately did not have an LVM layout and the disk partition table was of the old MS-DOS type, I actually had to move a blocking swap partition which was located directly after the “/”-partition. Meaning: I had to delete it and later recreate it at a different position – of course, only after having disabled the swap in the running Kali guest :-). The partition holding the “/”-filesystem (ext4) could then be extended without any problems on the running system. The new swap partition afterwards got its place behind the “/”-partition. According to “gparted” everything was OK after the partition changes. Then I rebooted…

The problem: A boot time of almost 30 secs …

The next restart took about 28 secs! But the machine came up in the end – which is almost a wonder, as I understood after finding the cause of the problem. Before my filesystem changes a standard boot process up to the login had required only about 4 secs. A major discrepancy and a clear indication of a major problem! Looking at the “dmesg”-output I got the impression that the delay had to do with operations occurring at the very beginning of the boot process. So, I checked “/etc/fstab” – and got a first glimpse of the cause: The entries there referred to the UUIDs of the partitions! Not unexpectedly – this is the standard these days for almost all major distributions.
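As a side remark: On a systemd-based installation like Kali you can get a quick overview of where boot time is lost. A hedged example – the numbers and unit names will of course differ on your system:

systemd-analyze           # time spent in firmware, loader, kernel/initrd and userspace
systemd-analyze blame     # units ordered by their initialization time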

So, stupid me: Of course, the UUID of the swap partition had changed during the mentioned operations! I adapted the fstab entry to the new value (which I got from gparted). I also checked whether there was any reference to the swap partition on the Grub command line (check “/etc/default/grub” for a “resume” parameter!). To be on the safe side I also checked the result of “update-grub” in the file “/boot/grub/grub.cfg”. I did not find any references there. So, I gladly restarted my Kali system. However, the problem had not disappeared …
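For reference, a short sketch of the checks described above, with /dev/vda2 as a hypothetical swap partition; adapt device names and paths to your system:

blkid /dev/vda2                       # show the new UUID of the swap partition
grep -i swap /etc/fstab               # fstab entry which must carry the new UUID
grep -i resume /etc/default/grub      # possible resume=UUID=... kernel parameter
grep -i resume /boot/grub/grub.cfg    # result of update-grub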

Solution: The initramfs-configuration includes an explicit setting for the swap-partition!

Now, at this point there were not many options left. I started suspecting that the initramfs had a wrong entry. But where do we find options regarding the swap for the initramfs on a Debian-based system? A bit of duckduckgoing pointed me to the file

/etc/initramfs-tools/conf.d/resume

And there I did find an entry like

RESUME=UUID=d222524c-5add-2fcf-82dd-4d1b7e528d0c

OK – I changed it to the new value. Then I used

update-initramfs -u

to update the initramfs. And, guess what: The problem disappeared! I got my 4 secs of boot time again.

Conclusion

Never forget to check the UUID settings for partitions after major changes of the filesystem layout on your disks. Do the necessary checks not only on real host systems, but also on virtualized systems. Check the “/etc/fstab” AND the Grub2 configuration AND the initramfs configuration for possible references to changed or moved partitions.

Off topic – but related: When copying the contents of a partition with “dd” into another partition on your hard disk, e.g. for a backup, you should also keep in mind that you end up with two partitions with the same UUID. This may lead to major problems for any active Grub2. Always change the UUID of the copied partition with tune2fs before any reboot. If the copied partition was a bootable one, you also have to take care of its “/etc/fstab” entries and initramfs settings, if you want it to be bootable in its new place.
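A minimal sketch for an ext2/3/4 partition, assuming a hypothetical target partition /dev/sdb5 that received the copy (a copied swap partition would instead simply be recreated with mkswap):

tune2fs -U random /dev/sdb5    # assign a fresh, random UUID to the copied ext filesystem
blkid /dev/sdb5                # verify the new UUID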

General warning: Whenever you move/copy partitions, write down and save the original UUIDs – you may need them in case of trouble.

 

Opensuse, KDE Plasma, X11, Nvidia – stop video and screen tearing

In these times of Corona, home office and increased Internet usage some of us Linux guys may experience an old phenomenon: screen and video tearing. In my case it happened with an Nvidia card and with X11 (Wayland does not yet work on my Opensuse Leap 15.1 – I am too lazy to investigate why). I had ignored the tearing for some months already – but now it really annoyed me. I had seen tearing some years ago, too; at that point in time activating triple buffering helped. But not these days …

Where did I see the tearing?

I observed tearing effects

  • when moving “wobbling windows” (one of KDE’s desktop effects) across the screen – strangely enough when moving them slowly,
  • when watching TV and video streams in browsers (independent of FF, Opera or Chromium) – mostly when major parts of the video changed quickly.

Not much, not always – but enough to find it annoying. So, I invested some time – and got rid of it.

Driver and contents of the xorg.conf file

Driver: Latest Nvidia driver from Opensuse’s NVidia Repository: nvidia-glG05, x11-video-nvidiaG05.

I have three screens attached to my NVidia card (GTX 960); two of them are of the same type, but one has a lower resolution than the others. The screens are configured to work together as a super wide screen via the Xinerama setting in the xorg configuration file. Below, you find the contents of the file “/etc/X11/xorg.conf” with details about the screen configuration and modes.

xorg.conf

# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings:  version 450.80.02


Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from data in "/etc/sysconfig/mouse"
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "IMPS/2"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "yes"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"

    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "DELL U2515H"
    HorizSync       30.0 - 113.0
    VertRefresh     56.0 - 86.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 960"
EndSection

Section "Screen"

    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-2"
    Option         "ForceFullCompositionPipeline"  "on"
#    Option         "ForceCompositionPipeline"  "on"
    Option         "metamodes" "DP-4: nvidia-auto-select +0+0, DP-0: nvidia-auto-select +2560+0, DVI-I-1: nvidia-auto-select +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: nvidia-auto-select +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1920x1080 +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1680x1050 +5120+0; DP-4: nvidia-auto-select 
+2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1600x1200 +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1440x900 +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1280x1024 +5120+0; DP-4: nvidia-auto-select +2560+0, DP-0: nvidia-auto-select +0+0, DVI-I-1: 1280x960 +5120+0"
    Option         "SLI" "Off"
    Option         "TripleBuffer" "True"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

 

The most important statement regarding the suppression of tearing is

    Option         "ForceFullCompositionPipeline"  "on"

Alternatively,

    Option         "ForceCompositionPipeline"  "on"

seems to work equally well. Use the latter if your graphics reacts a bit sluggishly with the full pipeline.

We find more information about these options in the “nvidia-settings” application:

When you move your mouse over the options “ForceCompositionPipeline” and “ForceFullCompositionPipeline”, you get

“The Nvidia driver can use a composition pipeline to apply X screen transformations and rotations. “ForceCompositionPipeline” can be used to force the use of this pipeline, even when no transformations or rotations are applied to the screen. This option is implicitly set by ForceFullCompositionPipeline.”

and, respectively,

“This option implicitly enables “ForceCompositionPipeline” and additionally makes use of the composition pipeline to apply ViewPortOut scaling.”

Important: If you want to test the setting via “nvidia-settings”, you have to activate the option for all three screens!

When I first tested “ForceCompositionPipeline” I set it only on the “nvidia-settings” page for the first of my three screens, wrongly assuming that this setting would be applied globally. However, the tearing did not disappear. After some time I realized that it still happened predominantly on two screens. I even suspected a different quality of the DisplayPort cables to my screens to be the cause of the tearing. Wrong … ForceCompositionPipeline had been applied to one screen only.

So, switch to the other screens by using the first combo-box on the “nvidia-settings” page and set “ForceCompositionPipeline” for all screens. Do this before you eventually save the settings to an “xorg.conf” file (as root). Your resulting xorg.conf file may look a bit different from mine; the CompositionPipeline settings might be included as a side option of the metamode settings – and not in the form of a separate line as shown above (see the hedged fragment below).
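Such a variant could look like the following fragment – a sketch only, using the output names of my setup; your ports, offsets and chosen pipeline option will differ:

    Option         "metamodes" "DP-4: nvidia-auto-select +0+0 {ForceFullCompositionPipeline=On}, DP-0: nvidia-auto-select +2560+0 {ForceFullCompositionPipeline=On}, DVI-I-1: nvidia-auto-select +5120+0 {ForceFullCompositionPipeline=On}"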

Regarding the Xvideo and OpenGL settings you should activate syncing (e.g. “Sync to VBlank”).

KDE Plasma settings

KDE Plasma settings for the screens should be consistent with the “nvidia-settings”. Use KDE’s “system-settings” >> “Display and Monitor” >> “Displays” and “Compositor”.

The combination of all the settings discussed above worked in my case – the tearing disappeared for videos in browsers, in video applications as well as on the Xinerama KDE Plasma screen in general.

Conclusion

It is easy to suppress video and screen tearing on an Opensuse Leap system with KDE Plasma and an Nvidia graphics card. The most important point is to activate “ForceCompositionPipeline” on all individual screens via “nvidia-settings” or to activate this option globally for the Xinerama screen of a multi-monitor configuration.

KDE, Pulseaudio and Browsers – make the LADSPA equalizer the default sink

During these days of Covid-19, home office and lock-downs, browsers and other Internet streaming tools such as VLC become important personal gates to the world. When streaming videos or songs a user, of course, wants to hear some sound. No problem with Linux – Alsa helped you already decades ago. But things used to become a bit complicated if you wanted to direct the output of multiple sound sources through a global equalizer of your Linux desktop environment (in my case preferably KDE). An equalizer may help to compensate deficits of cheap speakers or hearing problems of elderly persons like me. Well, if you found a global desktop equalizer at all. With KDE, no chance – it always was a strange policy of the KDE people to assume that an equalizer is none of their responsibilities. So, a standard Linux user depended on application-specific equalizers – which at least many Linux sound and video players offered. But what about browsers?

This is where “Pulseaudio” and the related LADSPA-based equalizer really were of help to a common user. As a matter of fact, I have never been a real friend of “Pulseaudio” [PA]; you can find some critical posts regarding PA in this blog. However, I gladly admit that Pulseaudio and its control interfaces have become substantially better over the years. At some point in the past PA started to work reasonably well even with multi-channel sound cards. It is now also much better integrated with KDE’s “Phonon” system than some years ago. Today, you can define e.g. a central volume control without destroying the relative volume ratios of different output channels of a sound card. And: We have a well integrated equalizer as a desktop-wide, global tool to improve the sound quality. So, why a post about it?

A problem with (automatically) changing streams and an assignment to a default sink

A problem with KDE and Pulseaudio in the past was the following: Only some applications (such as “Clementine”) gave/give the user a chance to specify a sink of the sound environment to which the sound output of the application is transferred for further processing.

A sound sink is a kind of sound module which accepts a sound stream as input, processes it and may send an output to other processing modules or an amplifier. On KDE you may find some available sinks for your sound card or cards under “system-settings >> Multimedia”. An important sink in our present context is the PA equalizer. See https://doc.qt.io/archives/qt-4.8/phonon-overview.html for the inclusion of media objects and sinks into a sound flow model (“graphs”) for KDE.

However, a lot of applications, such as browsers, do not offer any settings to modify the primary sound sink. Instead they address a “default sink” of the system. What the “default sink” was, was either non-transparent to the user, or some related settings within your KDE desktop were just ignored, or you had to dive deep into the unhandy Alsa and PA configuration options. This led to major inconveniences for normal users:

When a new sound stream was activated, many applications chose a default sound sink which often did not correspond to the preferred one – namely the equalizer.

This problem could only partially be overcome by using “pavucontrol”, a PA tool to control volume settings on channels and sinks in the system. “pavucontrol” actually allowed and allows the user to assign sinks to running applications and their sound streams. However, when the application switched from one stream to another – e.g. automatically in a media player with a list of songs or on the web (Youtube changing videos) – then the newly selected stream fell back to the default sink. Driving the user nuts …

Setting of the default sink for the KDE desktop

I use Opensuse Leap 15.1/2 with KDE as my main working environment (besides Debian and Kali with Gnome 🙂 ). By chance I recently found something working which had not worked for me in previous installations. In KDE we have a specific sound system – “Phonon” – which allows the user to organize the priority of “devices” (sinks) for certain kinds of applications. In my case you see the settings for “music” applications:

You see that I have two sound cards available – but to make things simpler I deactivated one of them for this blog post. The first device listed is PA’s LADSPA equalizer:

It got the highest priority for music streams – more precisely, for applications which follow the Qt/Phonon API rules when playing music streams. But what about browsers (FF, Chromium, Opera, …), what about applications designed for Gnome and GTK3? You can often direct them to use PA, but what does PA respect as a default sink in a KDE environment with Phonon?

Well, the simple “trick” which I recently found to work is to set the priorities for all audio in KDE’s Phonon settings:

Then we get the following PA-settings (install and start the pulseaudio-manager application “paman”):

This is what we need! And this setting is (now) respected by browsers and other applications that seek a default sink.

So: KDE, Pulseaudio and Phonon settings actually give a common KDE user the chance to direct all sound through the Ladspa equalizer as a default sink.
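If you want to cross-check the result outside the KDE dialogs, the PA command line tools offer an alternative. A hedged sketch – the sink name of the equalizer is only an example and will differ on your system:

pactl list short sinks                                 # list all available sinks, including the LADSPA equalizer
pactl info | grep "Default Sink"                       # show which sink currently acts as the default
pactl set-default-sink ladspa_output.mbeq_1197.mbeq    # set the equalizer sink as default manually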

If your media-player offers its own equalizer you can of course combine both equalizers.

By the way: Common volume control

In the above picture on Phonon settings the sink “Simultaneous output to …” directs multiple sound sources to one or multiple sound devices. As we direct all sound through the equalizer first, we give the “Simultaneous output …“-device second priority.

We can use it for a common volume control in KDE’s Kmix: If you right-click on the Kmix symbol or open it you get an option to choose the main output channel :

Now, this setting assigns the desktop’s global volume control to this sink – which leaves all other volume settings, e.g. for the relative volumes of the sound-card channels, untouched:

You may find that this setting is also transferred to the sound control keys of a keyboard with a media control bar (e.g. on a Cherry keyboard).

Conclusion

With the help of KDE’s system-settings and Pulseaudio we can direct the output of all audio applications through a desktop-wide equalizer, which we define via Phonon settings as the default sink. This is simply done by giving PA’s LADSPA equalizer the highest priority for all audio. You do not need to dive into PA configuration depths or use the command line to change PA’s device and sink graphs for sound flows.
The “Simultaneous output …” device (or sink) allows for a global volume control which respects other volume settings controlled e.g. via PA’s “pavucontrol”.

A simple CNN for the MNIST dataset – XII – filter visualization for maps of the first two convolutional layers

In the last article of this series

A simple CNN for the MNIST dataset – XI – Python code for filter visualization and OIP detection
A simple CNN for the MNIST dataset – X – filling some gaps in filter visualization
A simple CNN for the MNIST dataset – IX – filter visualization at a convolutional layer
A simple CNN for the MNIST dataset – VIII – filters and features – Python code to visualize patterns which activate a map strongly
A simple CNN for the MNIST dataset – VII – outline of steps to visualize image patterns which trigger filter maps
A simple CNN for the MNIST dataset – VI – classification by activation patterns and the role of the CNN’s MLP part

I provided some code to create visualizations of “OIPs” (original input patterns). An OIP is a characteristic pattern in an input image to which a selected map of the deepest convolutional layer of a CNN reacts strongly. Most of the patterns we found showed some overall large-scale structure with sub-structures of smaller dimensions. In many cases the patterns were repeated two or even more times with some spatial distance across the image’s surface. That we got unique and relatively big patterns for the last and deepest Conv layer is not surprising, because the maps of this layer cover the original image area with just a few neurons, i.e. with coarse resolution. The related convolutional filters work across relatively large distances. The astonishing ability of deeper layers to detect unique large-scale patterns in input images is based on the weighted superposition of filters working on smaller scales, together with a reduction of resolution.

To get images of OIPs we fed the trained CNN with input images whose pixel values were statistically distributed. We optimized the pixel values of the input images for a maximum response of selected maps of the third, i.e. deepest, convolutional layer. We can, however, apply the same methods also to maps of the first and the second convolutional layer of our CNN. Then we get much simpler patterns – in the sense of repetitions of many small-scale elements.
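As a reminder of the basic mechanism (the full class is presented in article XI of this series), here is a minimal, hedged sketch of such an optimization loop with Keras/TF2. The model name “cnn”, the layer name and all parameter values are assumptions for illustration only:

import numpy as np
import tensorflow as tf

def create_oip(cnn, layer_name="Conv2D_1", map_idx=0, n_epochs=30, step=1.0):
    # sub-model which returns the activations of the chosen convolutional layer
    feature_model = tf.keras.Model(inputs=cnn.inputs,
                                   outputs=cnn.get_layer(layer_name).output)
    # start from an input image with statistically fluctuating pixel values (MNIST size 28x28x1)
    img = tf.Variable(np.random.uniform(0.45, 0.55, (1, 28, 28, 1)).astype("float32"))
    for _ in range(n_epochs):
        with tf.GradientTape() as tape:
            activation = feature_model(img)
            # cost function: mean activation of the selected map
            loss = tf.reduce_mean(activation[..., map_idx])
        grads = tape.gradient(loss, img)
        # normalize the gradient and take one gradient-ascent step on the input pixels
        grads = grads / (tf.sqrt(tf.reduce_mean(tf.square(grads))) + 1e-8)
        img.assign_add(step * grads)
    return img.numpy()[0, :, :, 0]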

Below I just provide images of OIPs triggering maps of the first two convolutional layers without much further comment. I refer to the layer names as discussed in previous articles of this series.

Input image patterns which lead to a maximum activation of the maps of the convolutional layer Conv2D-1

Layer “Conv2D-1” has 32 maps. With simple fluctuations on the length scale of one to two pixels, we can easily create OIP-images for each of the maps.

Most of these images were actually derived from one and the same input image.

Input image patterns which lead to a maximum activation of the maps of the convolutional layer Conv2D-2

Layer “Conv2D-2” has 64 maps. With simple fluctuations on the length scale of one to two pixels, plus some experiments with pixel value fluctuations on longer scales, I could produce OIP images for 51 of the maps. Experiments for the other 13 maps took too long; the systematic approach with large-scale fluctuations, which we discussed thoroughly in previous articles, did not help on layer 2. If you look at the images below, you see that it is more likely that we need specific short- and middle-scale fluctuations. However, the amount of possible data combinations is just too big for a systematic investigation.

Conclusion

In the course of the last articles we got a nice overview of the kinds of patterns to which the maps of the different convolutional layers of a CNN react. We are now well prepared to turn back to the question of what the ominous “features” of objects in input images really are. In the meantime have a look at another application of filter visualization in the realm of Deep Dreams, which I recently started to discuss in another article series of this blog. Stay tuned and wear masks to avoid the Corona virus! Stay healthy!

Other (previous) articles in this series

A simple CNN for the MNIST dataset – IV – Visualizing the activation output of convolutional layers and maps
A simple CNN for the MNIST dataset – III – inclusion of a learning-rate scheduler, momentum and a L2-regularizer
A simple CNN for the MNIST datasets – II – building the CNN with Keras and a first test
A simple CNN for the MNIST datasets – I – CNN basics

 

Deep Dreams of a CNN trained on MNIST data – I – a first approach based on one selected map of a convolutional layer

It is fun to play around with Convolutional Neural Networks [CNNs] on the level of a dedicated amateur. One of the reasons is the possibility to visualize the output of elementary building blocks of this class of AI networks. The resulting images help to understand CNN algorithms in an entertaining way – at least in my opinion. In addition, the required effort is relatively limited: You must be willing to invest a bit of time into programming, but on a quite modest level of difficulty. And you can often find many basic experiments which are within the reach of limited PC capabilities.

A special area where the visualization of CNN-guided processes is the main objective is the field of “Deep Dreams”. Anyone studying AI methods sooner or later stumbles across the somewhat psychedelic, but nonetheless spectacular images which Google presented in 2015 as a side branch of their CNN research. Today, you can download DeepDream generators from GitHub.

When I read a bit more about “DeepDream” experiments, I quickly learned that people use quite advanced CNN architectures, like Google’s Inception CNNs, and apply them to high-resolution images (see e.g. F. Chollet’s book “Deep Learning with Python” and ai.googleblog.com, 2015, inceptionism-going-deeper-into-neural). Even if you pick up an already trained version of an Inception CNN, you need some decent GPU power to do your own experiments. Another questionable point for an interested amateur is: What does one actually learn from applying “generators” which others have programmed, and from just following a “user guide” without understanding what a DeepDream SW actually does? Probably not much, even if you produce stunning images after some time…

So, I asked myself: Can one study basic methods of the DeepDream technology with self-programmed tools and a simple dataset? Could one create a “DeepDream” visualization with a rather simply structured CNN trained on MNIST data?
The big advantage of the MNIST dataset is that the individual samples are small; and the amount of numerical operations which a related simple CNN must perform on input images fits well to the capabilities of PC technology – even if the latter is some years old.

After a first look into DeepDream algorithms, I think: Yes, it should be possible. In a way DeepDream experiments are a natural extension of the visualization of CNN filters and maps which I have already discussed in depth in another article series. Therefore, DeepDream visualizations might even help us to better understand how the internal filters of CNNs work and what “features” are. However, regarding the creation of spectacular images we need to reduce our expectations to a reasonably low level:

A CNN trained on MNIST data works with gray images, low resolution and only simple feature patterns. Therefore, we will never produce such impressive images as published by DeepDream artists or by Google. But, we do have a solid chance to grasp some basic principles and ideas of this side-branch of AI with very simplistic tools.

As always in this blog, I explore a new field step-wise and let you as a reader follow me through the learning process. Throughout most of this new series of articles we will use a CNN created with the help of Keras and filter visualization tools which were developed in another article series of this blog. The CNN has been trained on the MNIST data set already.

In this first post we are going to pick just a single selected feature or response map of a deep CNN layer and let it “dream” upon a down-scaled image of roses. Well, “dream”, as a matter of fact, is a misleading expression; but this is true for the whole DeepDream business – as we shall see. A CNN does not dream; “DeepDream” creation is better seen as an artistic discipline using algorithmic image enhancement.

The input image which we shall feed into our CNN today is shown below:

As our CNN works on a resolution level of only 28×28 pixels, the “dreaming” will occur in a coarse way, very comparable to hallucinations on the blurred vision level of a short-sighted, myopic man. More precisely: of a disturbed myopic man who works the whole day with images of digits and lets this poor experience enter and manipulate his dreamy visions of nicer things :-).

Actually, the setup for this article’s experiment was a bit funny: I got the input picture of roses from my wife, who is very much interested in art and likes flowers. I am myopic and in my soul still a theoretical physicist, who is much more attracted by numbers and patterns than by roses – if we disregard the interesting fractal nature of rose blossoms for a second :-).

What do DeepDreams based on single maps of trained MNIST CNNs produce?

To rouse your interest a bit or to disappoint you from the start, I show you a typical result of today’s exercise: “Dreams” or “hallucinations” based on MNIST and a selected single map of a deep convolutional CNN layer produce gray scale images with ghost-like “apparitions”.


When these images appeared on my computer screen, I thought: This is fun, indeed! But my wife just laughed – and said “physicists” with a known undertone and something about “boys and toys” …. I hope this will not stop you from reading further. Later articles will, hopefully, produce more “advanced” hallucinations. But as I said: It all depends on your expectations.

But let’s focus: How did I create the simple “dream” displayed above?

Requirements – a CNN and analysis and visualization tools described in another article series of this blog

I shall use results and methods, which I have already explained in another article series. You need a basic understanding of how a CNN works, what convolutional layers, kernel based filters and cost functions are, how we can build simple CNNs with the help of Keras, … – otherwise you will be lost from the beginning.
A simple CNN for the MNIST datasets – I – CNN basics
We also need a CNN, already trained on the MNIST data. I have shown how to build and train a very simple, yet suitable CNN with the help of Keras and Python; see e.g.:
A simple CNN for the MNIST datasets – II – building the CNN with Keras and a first test
A simple CNN for the MNIST dataset – III – inclusion of a learning-rate scheduler, momentum and a L2-regularizer
In addition we need some code to create input image patterns which trigger response maps or full layers of a CNN optimally. I called such pixel patterns “OIPs”; others call them “features”. In the other article series I have provided a Python class which contains an optimization loop and other methods to work on OIPs and filter visualization.
A simple CNN for the MNIST dataset – XI – Python code for filter visualization and OIP detection

We shall extend this class by further methods throughout our forthcoming work. To develop and run the code you should have a working Jupyter environment, a virtual Python environment, an IDE like Eclipse with PyDev for building larger code segments and a working CUDA installation for an Nvidia graphics card. My GTX 960 proved to be fully sufficient for what we are going to do.

Deep “Dream” – or some funny image manipulation?

As unfortunately happens so often with AI topics, the vocabulary around the term “DeepDream” is exaggerated and thoroughly misleading. A simple CNN neither thinks nor “dreams” – it is a software manifestation of the results of an optimization algorithm applied to and trained on selected input data. If applied to new input, it will only detect patterns for which it was optimized before. You could also say:

A CNN is a manifestation of learned prejudices.

CNNs and other types of AI networks filter input according to encoded rules which serve a specific purpose and which reflect the properties of the selected training data set. If you ever used the CNN of my other series on your own hand-written images after training it only on the (US-) MNIST images, you will quickly see what I mean. The MNIST dataset reflects an American style of writing digits – a CNN trained on MNIST will fail relatively often when confronted with image samples of digits written by Europeans.

Why do I stress this point at all? Because DeepDreams reveal such kinds of “prejudices” in a visible manner. DeepDream technology extracts and amplifies patterns within images, which fit the trained filters of the involved CNN. F. Chollet correctly describes “DeepDream” as an image manipulation technique which makes use of algorithms for the visualization of CNN filters.

The original algorithmic concept for DeepDreams consists of the following steps:

  • Extend your algorithm for CNN filter visualization (= OIP creation) from a specific map to the optimization of the response of complete layers. Meaning: Use the total response of all maps of a layer to define contributions to your cost function. Then mix these contributions in a defined weighted way.
  • Take some image of whatever motif you like and prepare 4 or more down-scaled versions of this image, i.e. versions with different levels of size and resolution below the original size and resolution.
  • Offer the image with the lowest resolution to the CNN as an input image.
  • Loop over all prepared image sizes:
    • Apply your algorithm for filter visualization of all maps and layers to the input image – but only for a very limited amount of epochs.
    • Upscale the resulting output image (OIP-image) to the next level of higher resolution.
    • Add details of the original image with the same resolution to the upscaled OIP-image.
    • Offer the resulting image as a new input image to your CNN.

Readers who followed me through my last series on “a simple CNN for MNIST” should already raise their eyebrows: What if the CNN expects a certain fixed size of the input image? Well, a good question. I’ll come back to it in a second. For the time being, let us say that we will concentrate more on resolution than on an actual image size.

The above steps make it clear that we manipulate an image multiple times. In a way we transform the image slowly to improve a layer’s response and repeat the process with growing resolution. I.e., we apply pattern detection and amplification on more and more details – in the end using all available large and small scale filters of the CNN in a controlled way without fully eliminating the original contents.

What to do about the low resolution of MNIST images and the limited capability of a CNN trained on them?

MNIST images have a very low resolution, whereas real images have a significantly higher one. With our CNN specialized on MNIST input, the OIP-creation algorithm only works on (28×28)-images (and, with some warnings maybe, on smaller ones). What to do about it when we work with input images of a size of e.g. 560×560 pixels?

Well, we just work on the given level of resolution! We have three options:

  • We can downsize the input image itself or parts of it to the MNIST dimensions – with the help of a bicubic interpolation. Our OIP algorithm then has the chance to detect OIPs on the coarse scale and to change the downsized image accordingly. Afterwards we can upscale the result again to the original image size – and re-add details.
  • We can split the input image into tiles of size (28×28) and offer these tiles as input to the CNN.
  • We can combine both of the above options.

It’s like what a short-sighted human would do: Work with a blurred impression of the full-scale image, or look at parts of it from a close distance and then reassemble his/her impressions to larger scales.

A first step – apply only one specific map of a convolutional layer on a down-scaled image version

In this article we have a very limited goal for which we do not have to change our tools, yet (a hedged code sketch follows after the list):

  • Preparation:
    • We choose a map.
    • We downscale the original image to (28×28) by interpolation, upscale the result again by interpolation (with loss) and calculate the difference to the original full-resolution image (all interpolations are done in a bicubic way).
  • Loop (4 times or so):
    • We apply the OIP-algorithm on the downscaled input image for a fixed amount of epochs
    • We upscale the result by bicubic interpolation to the original size.
    • We re-add the difference in details.
    • We downscale the result again.
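A minimal, hedged sketch of this recipe with Keras/TF2 – the real code follows in the next article. The helper run_oip_epochs() stands for the OIP optimization loop of the filter visualization class and is, like all parameter values, an assumption for illustration:

import numpy as np
import tensorflow as tf

def simple_mnist_dream(cnn, orig_img, layer_name="Conv2D_3", map_idx=56,
                       n_cycles=4, epochs_per_cycle=10):
    # orig_img: gray scale image of shape (H, W, 1) with float values in [0, 1]
    full_size = orig_img.shape[:2]
    img = orig_img[None, ...]
    # preparation: downscale to MNIST size, upscale again and keep the lost details
    small = tf.image.resize(img, (28, 28), method="bicubic")
    blurred = tf.image.resize(small, full_size, method="bicubic")
    lost_details = img - blurred
    for _ in range(n_cycles):
        # downscale and "carve" the pattern of the chosen map into the small image
        small = tf.image.resize(img, (28, 28), method="bicubic")
        small = run_oip_epochs(cnn, small, layer_name, map_idx, epochs_per_cycle)  # hypothetical helper
        # upscale the result and re-add the details of the original image
        img = tf.image.resize(small, full_size, method="bicubic") + lost_details
    return np.clip(np.array(img)[0, :, :, 0], 0.0, 1.0)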

With this approach I try to apply some of the elements of the original algorithm – but just on one scale of coarse resolution. I shall discuss the code for realizing the recipe given above with Python and Jupyter in the next article. For today let us look at some of the ghost-like apparitions in the dreams for selected maps of the 3rd convolutional layer; see:
A simple CNN for the MNIST dataset – IX – filter visualization at a convolutional layer

DeepDreams based on selected maps of the 3rd convolutional layer of a CNN trained on MNIST data

With the image sections displayed below I have tried to collect results for different maps which focus on certain areas of the input image (with the exception of the first image section).

The first two images of each row display the detected OIP patterns on the (28×28) resolution level with pixel values encoded in a (viridis) color map; the third image is in gray scale. The fourth image reveals the dream on the blurry vision level – up-scaled and interpolated to the original image size. You may still detect traces of the original rose blossoms in these images. The last two images of each row display the results after re-adding details of the original image and an adjustment of the width of the value distribution. The detected and enhanced pattern then turns into a whitish, ghostly shadow.

I have given each section a fancy headline.

I never promised you a rose garden …

“Getting out …”

“Donut …”

“Curls to form a 3 …”

“Two of them …”

“The creepy roots of it all …”

“Look at me …”

“A hidden opening …”

“Soft is something different …”

“Central separation …”

Conclusion: A CNN detects patterns or parts of patterns it was trained for in any kind of offered input …

You can compare the results to some of the input patterns (OIPs) which strongly trigger individual maps on the 3rd convolutional layer; you will detect similarities. E.g., four OIP or feature patterns to which map 56 reacts strongly look like:

(Filter visualizations 1 to 4 for CNN map 56)

This explains the basic shape of the “apparition” in the first “dream”:

This proves that the filters of a trained CNN detect patterns which turned out to be useful for a certain training purpose – in any kind of input which shows some traces of such patterns. A CNN simply does not “know” better: If you only have a hammer to interact with the world, everything becomes a nail to you in the end – this is the level of stupidity at which a CNN algorithm works. And it actually is a fundamental ingredient of DeepDream image manipulation – a transportation of learned patterns or prejudices to an environment outside the original training context.

In the next article
Deep Dreams of a CNN trained on MNIST data – II – some code for pattern carving
I provide the code for creating the above images.

Further articles in this series

Deep Dreams of a CNN trained on MNIST data – II – some code for pattern carving
Deep Dreams of a CNN trained on MNIST data – III – catching dream patterns at smaller length scales