Nupro X3000 RC – a solid, high-quality addition to your Linux audio setup

A friend asked me what sound equipment I use on my Linux machine. She wanted to buy some decent new speakers. I had to make a similar decision a year ago, and coming to a conclusion back then turned out to be more difficult than I had expected.

I admit that I am a total amateur regarding sound equipment. I have not changed my sound cards (Asus Xonar D2X, Creative X-Fi Titanium, onboard High Definition GM206) for a long, long time. And I do not hear as well as in my younger years. But during the Corona and home office times I became really discontent with my old Creative speakers. One cannot wear headphones all the time. So some new speakers for my Linux workstation became a topic on my private agenda.

Questions ahead of a decision for some speakers for your PC

When I seriously started thinking about an investment, the following questions came up:

  • A surround system?
  • Active or passive boxes?
  • Suitable for a shelf or standing on the floor?
  • Do you want to use the speakers later also in other contexts than just as background equipment in your working room?
  • What is appropriate for your room size?
  • Connections: cable-based (copper, optical?) or via WiFi or Bluetooth?
  • At my age, with reduced hearing capabilities: Will high-end properties make a difference at all?
  • And the most limiting factor: the budget.

Taking all these factors into account will certainly lead to very personal decisions. So, when I make an explicit recommendation here - take it with caution and a grain of salt.

Guidelines for choosing speakers for a non-professional PC environment

Here are the personal guidelines which I followed - after I had read reviews, listened to Teufel and Edifier speakers at friends' places and to a relatively expensive Logitech surround system at my nephew's. You may have other references, other budgets and hear much better and in a more differentiated way than I do. So relax if you come to other conclusions.

And do not forget: I am talking about sound equipment on a PC for background music enjoyment in a working room - not for professional purposes or high-end specialists.

  • Recommendation 1: If you are interested in sound quality and are a music enthusiast - forget about surround systems. Quantity (many speakers) almost always enforces quality compromises, which you are going to hear in the end. Better invest your money in a 2.0 or 2.1 system which fits the (probably) limited size of your working room.
  • Recommendation 2: If your room size is up to 30 square meters, invest in relatively small speakers - but of studio quality. They will give you a much more pronounced and well-positioned sound than surround systems. Regarding money, think of speakers which you can later supplement with a sub-woofer - e.g. in case you want to move the speakers to a larger room sometime in the future.
  • Recommendation 3: Regarding bass: I am a heavy metal friend - sometimes. I have my phases and periods regarding music ... Sometimes I like Jazz only. Bass has a different meaning to me in these two cases - but in any case I do not like resonances of my speakers. The stereo speakers alone should already provide a solid, broad and resonance-free bass fundament - without a sub-woofer. A sub-woofer can deliver an extra feeling in the case of metal - but for Jazz and classical music I would not consider a sub-woofer really relevant. So go for some solid speakers with the option of adding a sub-woofer in the future.
  • Recommendation 4: Do not underestimate the effect (or limitations) of the DAC in your sound card! At a certain quality level of your future speakers you are probably going to hear differences. So - if you are lucky and can invest in expensive speakers, rethink your sound card equipment, too.
  • Recommendation 5: Do not underestimate the effect of the boxes' positions in the room. Even in small rooms you will experience bass effects around 100 Hz or so if you place your boxes in the room's corners. This leads to the point that you may want some equalizer option to optimize the bass foundation a bit. Well, Linux, or at least most music applications for Linux, supplies you with equalizers; but it is a nice option to be able to do something at the (active) boxes themselves to give your sound environment a basic "direction". And here we would also like the option of defining some "presets".
  • Recommendation 6: Active boxes or an amplifier? A very difficult question! In a PC and mobile environment I would tend towards active speakers, but ... Amplifier technology today is so good that, at least in my case, my hearing deficits are certainly the more important factor.
  • Recommendation 7: WiFi? Personally, I would say: yes, you should have this option. But if so: go for the 5 GHz band. And check whether your router offers the option to define the precise band it should work on, or whether it automatically adapts the channel to avoid interference with other sources.
  • A personal opinion some people would certainly like to crucify me for: Teufel speakers seem to be a bit overrated. Personally, I do not think that their price-performance ratio is convincing. After having listened to a pair of floor-standing speakers I find the balance between bass and mid-range sound strange. Very vague in a way.

Nupro X3000 speakers as a solid option for a reasonable price

Taking all these aspects into account I ended up with a decision for (active) Nupro X3000 RC speakers from the producer "Nubert electronic GmbH".

So far, I have not regretted this decision for a second. These boxes did not disappoint me - neither with classical music, nor with Jazz or Heavy Metal.

Admittedly, if you want to feel bass and drumming in larger rooms, these boxes certainly benefit a bit from a sub-woofer (which I personally drive via a second sound card). But such occasions are rare ...

Ease of setup?

The setup of the active boxes is very simple; the explanations on the accompanying leaflets are fully sufficient. You define everything via a four-direction control button on one of the speakers. The button and a small display are hidden behind magnetically attached front panels.

Basically, you just have to define a master and a slave speaker in the first setup round and choose a connection to your sound source - here the output connectors of a PC sound card. In the end I used the "aux" input and still live with an analog cable-based connection between the sound card and the main box, plus a digital coax cable between the boxes. (Due to the speakers' distance I had to buy an additional coax cable. It disappears behind a shelf.)

But a WiFi connection between the speakers works very well, too. I could see no major conflict with the 5 GHz channels occupied by the WLAN routers in my surroundings.

The basic connection options to your PC and sound card are manifold: The USB interface of the Nupro sound processor appears as a USB sound card on your PC; this "sound card" is well supported on my Opensuse and KDE based Linux systems. You just have to choose the SPDIF stereo variant of the two options offered in the KDE/Phonon sound settings.
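
If you want to check whether the USB "sound card" was recognized, a quick look at the ALSA and Pulseaudio device lists is sufficient. A minimal sketch (the device names on your system will differ):

    # list the ALSA playback devices - the Nupro should appear as a USB audio device
    aplay -l
    # list the Pulseaudio sinks - one entry should correspond to the Nupro's USB interface
    pactl list short sinks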

Besides a USB cable, the connection cables delivered with the speakers include an optical cable with TOSLink adapters, a SPDIF coax cable and analog cables with cinch connectors. And finally there is also the option of a Bluetooth connection - if your PC has such a device.

In the end I personally heard no major difference between analog and digital signal handling - neither with USB, nor with the optical connection to my old ASUS Xonar D2X sound card, the optical connection to the X-Fi Titanium, or the onboard GM206 High Definition sound card. The TI Burr-Brown DAC of the Asus card still seems to be relatively good - at least for my ears.

I also have an additional X-Fi Titanium card from Creative in my PC. With my Sennheiser headphones I like the sound of the Asus card better. Regarding the Nupro X3000 I was actually in doubt: for some music I find the sound slightly crispier with the X-Fi. However, whether this is a sign of quality is questionable. I change the sound card from time to time, just for fun - and still have no real preference.

Regarding distances, the analog cable option for the connection to your PC's sound card may be the most reasonable solution - as the optical, SPDIF coax and USB cables coming with the speakers are of limited length.

There is even a possibility to realize a pure WiFi connection from your PC to the X3000 RC speakers. Such a solution, however, requires a special transceiver (135 €) from the producer Nubert; see below. I have not tested this type of connection yet.

The speakers offer you some basic options regarding the sound balance. A very positive feature is the integrated 5-band equalizer. As said above, this allows for a basic adjustment of the sound signature - not unimportant at my age. In addition, the handheld remote control allows for a change of the basic balance between bass and treble.

You can also define a lower cut-off frequency for the bass and the transition frequency to a sub-woofer. Furthermore, you can set a 6 dB gain for certain analog input channels.

Disappointments?

Something which disappointed me was the Bluetooth connection of the X3000 RC to my old Samsung smartphone - here I got periodic dropouts. I have not clarified this problem up to now. I do not exclude problems with Bluetooth and the VLC player on my phone. In reviews I have not read about any such dropouts - but you have been warned. I recently tried a Bluetooth connection from my laptop, too. This one worked flawlessly. So, I do not know ...

Another major disappointment was and is Nubert's "X-Remote App". In my case it simply does not work on my Android 6 device: it gets stopped by Android just after granting the permission to determine the geo-location - which, by the way, is something I do not like in general. I got in contact with the Nubert company recently. They affirmed that they do not collect data, but that it is Google which enforces the explicit consent for geo-location when setting up WiFi connections. This had to be expected; we know this stupid problem already from the mess with the German Corona App on Android. BBG again - Big Brother Google ... No further comments required.

I had no real need for the app so far. After the basic setup of all the speakers' internal settings (e.g. the equalizer) I can control the most needed adjustments via the handheld remote control accompanying the speakers. The "room calibration" feature of the app would have been nice - but it requires buying an additional piece of microphone equipment from Nubert for Android smartphones.

Sound quality

Do not expect a solid sound quality review from me. I have neither equipment nor objective, trained ears for such a review. I can only describe an impression - very much in analogy to wine - a sort of personal sound "taste and feeling" after having heard a lot of music on the speakers. Do I like them with different kinds of music, vocals and instruments?

In a night-long session I have also compared the Nupro X3000's capabilities with my old Elac 4π (4 Pi) speakers in the living room. They are driven by NAD pre- and power amplifiers plus a NAD CD player. I did the comparison with music pieces of very different styles. I really was astonished how well the small Nupro X3000 speakers could follow the 4π (4 Pi) Elac speakers and fill the room with sound and a solid bass foundation! Well, of course the Elacs do a better job with the bass at some point - no wonder, regarding their dimensions. Still, this first impression of the Nupro speakers was very convincing.

Then I moved the Elac and Nupro boxes into my smaller working room - well, the Nupro X3000 at once felt much more adequate there. They positioned different sound origins in the stereo image much more precisely - which is no wonder either. And they filled the whole room with music easily.

A hint: As the speakers work with a bass reflex opening at their backside, you should not position the boxes directly at a wall - but leave some space.

Meanwhile, I have listened to a broad spectrum of music on these speakers - ranging from Eberhard Weber, Jan Garbarek, Kjetil Bjørnstad (with and without vocals) and Laurie Anderson to compositions of Steve Reich, Rihm and Arvo Pärt, and to recent recordings of classical music, e.g. by the Danish String Quartet or Sol Gabetta. Intermixed with stuff from Riverside, Korn, Linkin Park, Amorphis, Insomnium, Dark Tranquillity, In Flames and Rammstein. As well as a lot of classical symphony and opera recordings. And - as a very welcome side effect - I have re-discovered the wonders in the songs of Tom Waits.

You know what: All of it was pure joy - taking into account the sometimes strange, intentionally distorted mix you find in some heavy metal pieces.

In my opinion the balance between bass, mid-range and treble of the X3000 RC speakers is very good. You (almost) never lose the resolution of instruments covering different frequency regions. Some criticism in the audio press was directed at problems in the mid-range frequency area. Personally, I cannot confirm this. If there is some problem, I would bet it appears in larger rooms. But this is not the target environment of these speakers. In my working room the mid-range appears very present - both with vocals and classical instruments. But probably I do not know what high-end sound really is ... 🙂

I could not hear any bass resonances so far - with standard settings. But when you place the speakers close to a wall or a corner, you may want to reduce the low bass (< 100 Hz) a bit.

Summary: I very seldom use my Sennheiser headphones these days. I really do like the sound of these speakers.

Are there weaknesses? Well, in my opinion the X3000 speakers have a little weakness at very low volume - the relative weight of mid-range vs. bass shifts toward the bass. This may have to do with reflections in the room (or my hearing). But the advantage is that I have so far not felt any need to switch the loudness option on.

Future options?

Now I come to a point which makes the Nupro boxes also an investment in some future wireless audio infrastructure: For 135 € you get the NuConnect trX wireless transceiver (https://www.nubert.de/nuconnect-trx/p4210/). This little brick allows e.g. for multi-room wireless solutions, but also for a transmission of digital signals from your PC or other sources to the active speakers.

Alternatively, you could also think about a combination of the trX transceiver with the "NuControl 2" pre-amplifier or the (cheaper) AmpX amplifier - both interesting products of Nubert. The latter amplifier uses, in my understanding, the same amplifier modules as the active speakers, but combined and supplemented with other electronics and thus turned into a full amplifier. The reviews of this 700 € amplifier are surprisingly good (see: https://www.nubert.de/nuconnect-ampx/p3646/?category=225).

So, the speakers mark an entrance into a much broader eco-system. In my case a completely digitized audio center on a Linux workstation - combined with the trX transceiver, the X3000 speakers, the AmpX and other already existing audio equipment in different rooms - appears on the horizon.

Sound support on my Linux system

Working with two sound cards
As I have two sound cards available, I kept the three front speakers and the subwoofer box of my old Creative speaker set. The front speakers are placed on my working table - the subwoofer on the floor. This allows for astonishing surround feelings even with stereo sound. A little contribution of these desktop speakers to the louder sound coming from the X3000 in the background, and you "swim in an extended audio space". Interesting for some kinds of music. Here the Pulseaudio mixer (pavucontrol) on a Linux system is of advantage: it lets you balance the sound contributions between the different channels of the active sound cards accurately and to your taste.
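
The same balancing can also be scripted via pactl on the command line. A small sketch - the sink names below are hypothetical examples; take the real ones from the first command:

    # show the sinks of all active sound cards
    pactl list short sinks
    # set the relative volumes of the two cards (sink names are examples!)
    pactl set-sink-volume alsa_output.pci-0000_04_00.0.analog-stereo 80%
    pactl set-sink-volume alsa_output.pci-0000_05_00.0.analog-stereo 35%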

Regarding the Linux sound support in general
As a Linux user I have made my peace with Pulseaudio, pavucontrol, the Ladspa equalizer and KDE's Phonon over the years. It is sometimes still a mess to reproduce working settings for multiple multi-channel sound cards after system upgrades - but once PA and Phonon work as expected, they do their job well.

The last time strange things happened was when I upgraded to Opensuse Leap 15.2. Reason: substantial changes to the Phonon user interface combined with a loss of differentiated setting options. As a result I had to manipulate the directives in the PA configuration files locally in my home directory and below /etc/pulse to get everything right again. The loss or hiding of options is a sickness that has spread across central KDE applications in recent years ... I now always make a backup of my personal PA settings in my home directory and of the central Alsa and PA settings.
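
A minimal backup sketch, assuming the usual locations of the PA and Alsa configuration files (adapt the paths to your system):

    # archive personal and central Pulseaudio/Alsa settings
    tar czf ~/pa_alsa_backup_$(date +%F).tgz \
        ~/.config/pulse /etc/pulse /etc/asound.conf ~/.asoundrc 2>/dev/null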

A major topic always is to find working settings which direct all sound output of any application through the Ladspa equalizer and from there to multiple sound cards. On a KDE desktop such settings have to be consistent with the Phonon settings - or the system will forget and overwrite your preferences with the next system start. Then you know that you have to manually change entries in the configuration files ...
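
For reference, such a routing is typically set up by loading a Ladspa sink in "~/.config/pulse/default.pa" (or "/etc/pulse/default.pa"). The following lines are only a sketch: the master sink name is an example, and the 15-band multiband equalizer (mbeq) from the SWH Ladspa plugin package must be installed:

    .include /etc/pulse/default.pa
    # create an equalizer sink on top of the real hardware sink (master name is an example)
    load-module module-ladspa-sink sink_name=ladspa_eq master=alsa_output.pci-0000_04_00.0.analog-stereo plugin=mbeq_1197 label=mbeq
    # make the equalizer sink the default so that applications play through it
    set-default-sink ladspa_eq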

Be careful with your new speakers when experimenting and switching to new sound configurations - e.g. from analog to digital signals, changes of the sound card, or moving from PA to pure Alsa. The resulting sound and, in some cases, also distortions may be louder than you expect! Always turn the volume of your external speakers to a minimum ahead of such experiments - and also reduce the volume of sound sources to a very low level.

During the last three to four years I have used the PA mixer "pavucontrol" to control the relative volumes of sound sources (i.e. applications) and the audio channels of the different sound cards on my system. But be careful with your settings here, too. In the past, Pulseaudio did some strange things with audio signals from the system - e.g. suddenly turning them up to 100%. I have not experienced such things in the past three years, but Nupro X boxes are too expensive to risk any accidental damage.

The 15-band PA Ladspa equalizer helps to define some basic sound presets with very slight adjustments - the Nupro speakers basically do not need any significant deviation from a flat frequency curve of the equalizer.

Note that changes of the equalizer's settings may be accompanied by a general volume reduction in pavucontrol and a loss of relative channel weights there. Saving (and losing) presets of the equalizer is no fun either. Some mess will probably always remain with PA ... You just need to invest some time into balanced presets - and then not touch the central equalizer again.

The good thing is that you can redirect the output of applications to a sound sink directly with pavucontrol. So, you can configure the sound output of music applications to run through an equalizer or not. Again - be careful with the impact of such changes on the volume.
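
The same redirection can be done non-interactively with pactl. A sketch - the sink-input index and the sink name are examples:

    # list the running streams ("sink inputs") with their index numbers
    pactl list short sink-inputs
    # move stream no. 42 to the equalizer sink defined above
    pactl move-sink-input 42 ladspa_eq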

My favorite player still is Clementine, at a 48 kHz or 96 kHz sampling rate. It offers its own equalizer. If you want to fiddle with an equalizer, then use this one.

Sound extraction from CD recordings I do with K3B - with Ogg Vorbis or lossless FLAC encoding.

Conclusion

The active Nupro X3000 RC speakers are worth the money you have to pay for them. They suit any Linux workstation well. The connection options to sound sources are manifold. Basic analog cable connections work, of course. A USB connection was directly supported on my Opensuse Linux. Optical and SPDIF coax connections to the respective output connectors of sound cards work well, too. The possibility to create a fully WiFi-based solution with some extra (135 €) equipment from Nubert is an additional goody.

The setup and the configuration of a speaker pair are very simple. Each speaker includes a 5-band equalizer, which allows for basic room and position adjustments.

The general sound quality is, in my opinion and for my ears, excellent. The speakers easily fill small rooms, and even rooms up to 40 square meters, with sound and provide a solid bass. The balance between bass, mid-range and treble fits my ears. Single instruments in complicated arrangements are well distinguished. The positioning of sources in the stereo image is very good.

Links

https://www.igorslab.de/en/welcher-passt-besser-nubert-nupro-x-3000-rc-oder-nupro-x-4000-rc-und-die-qual-der-wahl-2/4/
https://www.lite-magazin.de/2018/11/aktivlautsprecher-nubert-nupro-x-3000-kompakte-komplettloesung-auf-audiophilem-niveau/
https://www.technic3d.com/article/audio/lautsprecher/2087-test-aktive-kompaktbox-nubert-nupro-x-3000-rc/1.htm

Enforcing specific command arguments for a selected user with sudo

In one of my last articles I needed to enforce the execution of a command with certain arguments - specific for the present user. I.e., I wanted to take away the user's freedom to set arbitrary command arguments as he/she liked.

This had to be done in addition to another set of rules - namely a bunch of iptables filter rules which also depended on the UID. So the command had to be run with the UID of the user him/herself and not with the UID of root.

The solution came with sudo. This may appear a bit surprising to some readers. The reason is that sudo normally is used to allow selected users to run commands with another UID than the one they have themselves. I call "the other user" the "effective user" below. In a sudo context the effective user corresponds to a SUDO_UID variable in addition to the UID environment variable. The predominant example for invoking the sudo mechanism certainly is to allow users to run a command as root. But it can be extended to any other user (made harmless by taking away his/her login shell, or being especially privileged due to membership in a special group).

In my case I needed to enforce command execution with an effective user identical to the original user him/herself - but with special arguments. In such a situation you have to take the permission to execute/read the original command completely away from the user. Otherwise he/she could use it with any argument. But sudo requires that the defined effective user is able to read and execute the command. This seemingly contradictory situation can be solved by invoking a special user group.

Maybe the recipe described below helps some readers to enforce command execution with specific arguments in other contexts.

A simple scenario

Let us assume you have developed a program "myprog" which accesses a special web service that you have installed on some web servers in your Intranet. Let us further assume that some specific users shall be restricted to accessing the service on a defined server only - and there only with certain arguments. Such parameters may reduce or grant rights to access certain data the service could in principle provide. All this is regulated by two arguments to the myprog program: a FQDN for the host and a "level". "Level 0" allows free access to a very basic service version. "Level 1" invokes the program with personalized options and requires a login. But your people have started to play around with the login. So, you want a group of users to issue the command with a certain "host" and "level 0" only. Let us assume that user "mark" is one of those users who should invoke the command only in the form

mark@mytux:~> myprog -h myserv.anraconc.de -L 0

How can we achieve this with sudo?

Restricting command use to specific arguments

Below I discuss modifications of the "/etc/sudoers" file. This is risky in very many ways - not only regarding security.

Disclaimer: I take no responsibility whatsoever for the consequences of the sudo approach described below and its application to your computers. The sudoers rules have to be tested carefully before they are used in a production environment, and their setup must be supervised by an expert.

I assume that you have installed your program at the path "/usr/bin/myprog" with standard rights

-rwxr-xr-x 1 root root 334336 17. Mai 2020  /usr/bin/myprog

Then one can follow this recipe (as root) to get "mark" under control:

  • Step 1: Create a special group for the command in question, e.g. "mygroup". Ensure that mark does not become a member of this group.
  • Step 2: Change the ownership and access rights of "/usr/bin/myprog" according to
    • chown root.mygroup /usr/bin/myprog
    • chmod 750 /usr/bin/myprog
  • Step 3: Add some lines to "/etc/sudoers" (with visudo):
     
    ....
    Defaults env_reset
    Defaults env_keep = "LANG LC_ADDRESS LC_CTYPE LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE LC_ATIME LC_ALL LANGUAGE LINGUAS XDG_SESSION_COOKIE"
    
    Defaults:mark env_keep += "DISPLAY"
    
    #Defaults targetpw   # ask for the password of the effective target user e.g. root
    #ALL   ALL=(ALL) ALL
    
    mark ALL=(mark:mygroup) /usr/bin/myprog -h myserv.anraconc.de -L 0
    ....
    

Explanation:
Due to step 2, user "mark" cannot read, change or execute the command directly any longer. The rest depends a bit on ensuring that "mark" never becomes a member of group "mygroup". (But other users whom you trust may become members.)

Regarding the sudoers rules I assumed that you reset the environment of a sudo user as a default. Keeping the "DISPLAY" variable helps to get around some access problems with the present X11 screen of "mark". I also assumed that you use the sudo mechanism in a way which requires the user to enter his/her password. The last line allows "mark" on all hosts/terminals to execute "/usr/bin/myprog" as him/herself, but with the GID of group "mygroup" and exactly with the options "-h myserv.anraconc.de -L 0".

Note that sudo compares the command including arguments as one string!

User "mark" must run the command myprog from now on in the form:

sudo -u mark -g mygroup  myprog -h myserv.anraconc.de -L 0

and enter his password. Any deviation will be blocked by the sudoer mechanism.

Some practical hints

After carefully evaluating security implications you can make life easier for "mark" in two ways:

  • Let him execute the command (with the defined arguments) without providing a password. Use the NOPASSWD tag; see the man pages for the sudoers file. (See the sketch after this list.)
  • Write a script which encapsulates the described sudo command with the options (also sketched below).
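
Both measures as a minimal sketch; the rule and the wrapper path are, of course, only examples mirroring the scenario above:

    # /etc/sudoers (edit with visudo): no password required for the fixed command
    mark ALL=(mark:mygroup) NOPASSWD: /usr/bin/myprog -h myserv.anraconc.de -L 0

    #!/bin/bash
    # /usr/local/bin/myprog0 - hypothetical wrapper script saving "mark" some typing
    exec sudo -u mark -g mygroup /usr/bin/myprog -h myserv.anraconc.de -L 0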

By such measures, you may save "mark" some typing time.

Another point is to maintain the file permissions in the future. This may become a problem if and when you apply the described mechanism to some standard Linux commands which are installed and updated by the package administration tool of your distribution. You have to check carefully that the installation routines do not overwrite the permission settings! A handwritten systemd service or a cron job may help you with this task (see the sketch below).
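
A cron-based sketch (the file name is hypothetical; ownership and permissions correspond to step 2 of the recipe above):

    #!/bin/bash
    # /etc/cron.hourly/fix_myprog_perms - re-establish ownership and permissions
    # in case a package update has overwritten them
    chown root:mygroup /usr/bin/myprog
    chmod 750 /usr/bin/myprog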

With some reading or experience it should be easy to extend the described recipe to groups of users and to other commands.

There are multiple ways to allow other users to execute the command freely if this should be required. The sudoers file knows a logical NOT operator (!); this helps to add a sudoers rule for all users but NOT "mark" (sketched below). Another simple approach would be to add all users but "mark" to the group "mygroup".
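
A sketch of the first variant - but note that the sudoers man page warns that excluding users via "!" in a user list is not a reliable security boundary, so test this carefully:

    # hypothetical rule: all users except "mark" may run the command with any arguments
    ALL, !mark ALL=(ALL) /usr/bin/myprog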

Conclusion

The sudoers mechanism is a powerful Linux tool. We can not only allow users to execute commands as another user, but also with the permissions of another group. AND we can enforce the usage of commands with predefined arguments for selected users or user groups.

As fiddling with the sudoers mechanism is always a bit risky: Please write me a mail if you find some major mistake or security problem in my approach.

A simple CNN for the MNIST dataset – VIII – filters and features – Python code to visualize patterns which activate a map strongly

Our series about a simple CNN trained on the MNIST data turns back to some coding.

A simple CNN for the MNIST dataset – VII – outline of steps to visualize image patterns which trigger filter maps
A simple CNN for the MNIST dataset – VI – classification by activation patterns and the role of the CNN’s MLP part
A simple CNN for the MNIST dataset – V – about the difference of activation patterns and features
A simple CNN for the MNIST dataset – IV – Visualizing the activation output of convolutional layers and maps
A simple CNN for the MNIST dataset – III – inclusion of a learning-rate scheduler, momentum and a L2-regularizer
A simple CNN for the MNIST datasets – II – building the CNN with Keras and a first test
A simple CNN for the MNIST datasets – I – CNN basics

In the last article I discussed an optimization algorithm which should be able to create images of pixel patterns which trigger selected feature maps of a CNN strongly. In this article we shall focus on the required code elements. I again want to emphasize that I apply and modify some basic ideas which I read in a book by F. Chollet and in a contribution of a guy called Mohamed to a discussion at kaggle.com (see my last article for references). A careful reader will notice differences, not only with respect to coding: there are major differences regarding the normalization of intermediate data and the construction of input images. To motivate the latter point, I first want to point out that OIPs are idealized technical abstractions and that not all maps may react to purely statistical data on short length scales.

Images of OIPs which trigger feature maps are artificial technical abstractions!

In the last articles I made an explicit distinction between two types of patterns which we can analyze in the context of CNNs:

  • FCP: A pattern which emerges within and across activation maps of a chosen (deep) convolutional layer due to filter operations which the CNN applied to a specific input image.
  • OIP: A pattern which is present within the pixel value distribution of an input image and to which a CNN map reacts strongly.

Regarding OIPs there are some points which we should keep in mind:

  • We do something artificial when we create an image of an elementary OIP pattern to which a chosen feature map reacts. Such an OIP is already an abstraction in the sense that it reflects an idealized pattern - i.e. a specific geometrical correlation between pixel values of an input image which passes a very specific filter combination. We forget about all other figurative elements of the input image which may trigger other maps.
  • There is an additional subtle point regarding our present approach to OIP-visualization:
    Our algorithm - if it works - will lead to OIP images which trigger a map's neurons maximally on average. What does "average" mean with respect to the image area? A map always covers the whole input image area. Now let us assume that a filter combination of lower layers reacts to an elementary pattern limited in size and located somewhere on the input image. But some filters or filter combinations may be capable of detecting such a pattern at multiple locations of an input image.
    One example would be a crossing of two relatively thin lines. Such a crossing could appear at many places in an input image. In fact, a trained CNN has seen several thousand images of handwritten "4"s where the crossing of the horizontal and the vertical line actually appeared at many different locations and may have learned about this when establishing filters. Under such conditions it is obvious that a map gets optimally activated if a relatively small elementary pattern appears multiple times within the image which our algorithm artificially constructs out of initial random data.
    So our algorithm will with a high probability lead to OIP images which consist of a combination or a superposition of elementary patterns at multiple locations. In this sense an OIP image constructed due to the rule of a maximum average activation is another type of idealization. In a real MNIST image the re-occurrence of elementary patterns may not be present at all. Therefore, we should be careful and not directly associate the visualization of a pattern created by our algorithm with an elementary OIP or "feature".
  • The last argument can in some cases also be reverted: There may be unique large scale patterns which can only be detected by filters of higher (i.e. deeper) convolutional levels which filter coarsely and with respect to relatively large areas of the image. In our approach such unique patterns may only appear in OIP images constructed for maps of the deepest convolutional layer.

Independence of the statistical data of the input image?

In the last article I showed you already some interesting patterns for certain maps which emerged from randomly varying pixel values in an input image. The fluctuations were forced into patterns by the discussed optimization loop. An example of the resulting evolution of the pixel values is shown below: With more and more steps of the optimization loop an OIP-pattern emerges out of the initial "chaos".

Images were taken at optimization steps 0, 9, 18, 36, 72, 144, 288 and 599 of the loop. Convergence is measured by the change of the loss value between two steps divided by the present loss value:
3.6/41 => 3.9/76 => 3.3/143 => 2.3/240 => 0.8/346 => 0.15/398 => 0.03/418

As we work with gradient ascent in OIP detection a lower loss means a lower activation of the map.

If we change the wavelength of the initial input fluctuations we get a somewhat, though not fundamentally different result (actually with a lower loss value of 381):

This gives us confidence in the general usability of the method. However, I would like to point out that during your own experiments you may also experience the contrary:

For some maps and for some initial statistical input data varying at short length scales only, the optimization process will not converge. It will not even start to do so. Instead you may experience a zero activation of the selected map during all steps of the optimization for a given random input.

You should not be too surprised by this fact. Our CNN was optimized to react to patterns present in written digits. As digits have specific elements (features?) such as straight lines, bows, circles, line crossings, etc., we should expect that not every input will trigger the activation of a selected map which reacts to pixel value variations at relatively large length scales. Therefore, it is helpful to be able to vary the statistical input pattern at different length scales when you start your hunt for a nice visualization of an OIP and/or elementary feature.

All in all we cannot exclude a dependency on the statistical initial input image fluctuations. Our algorithm will find a maximum with respect to the specific input data fluctuations presented to it. Due to this point we should always be aware of the fact that the visualizations produced by our algorithm will probably mark a local maximum in the multidimensional parameter or representation space - not a global one. But also a local maximum may reveal significant sub-structures a specific filter combination is sensitive to.
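
The composition of such initial images is handled later by a method "_build_initial_img_data()" of the class below. Just to illustrate the principle independently of the class, here is a minimal sketch (the length scales and weight factors correspond to the class defaults; the function name is hypothetical):

import numpy as np
from scipy.ndimage import zoom

def compose_initial_img(img_dim=28,
                        dim_steps=((3, 3), (7, 7), (14, 14), (28, 28)),
                        facts=(0.5, 0.5, 0.5, 0.5)):
    # superpose zero-centered random fluctuations defined on 4 different
    # length scales; each is smoothly upscaled to the full image dimensions
    img = np.zeros((img_dim, img_dim))
    for (dy, dx), fact in zip(dim_steps, facts):
        fluct = np.random.random((dy, dx)) - 0.5
        img += fact * zoom(fluct, (img_dim / dy, img_dim / dx), order=3)
    return img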

Libraries

To build a suitable code we need to import some libraries, which you first must install into your virtual Python environment:

  
import numpy as np
import scipy
import time 
import sys 
import math

from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow import keras as K
from tensorflow.python.keras import backend as B 
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from tensorflow.keras import optimizers
from tensorflow.keras.optimizers import schedules
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist

from tensorflow.python.client import device_lib

import matplotlib as mpl
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpat 

import os 
from os import path as path

 

Basic code elements to construct OIP patterns from statistical input image data

To develop some code for OIP visualizations we follow the outline of steps discussed in the last article. To encapsulate the required functionality we set up a Python class. The following code fragment gives an overview of some variables which we are going to use. In the comments and explanations I sometimes used the word "reference" to mark a difference between

  • addresses to some intermediate Tensorflow 2 [TF2] objects handled by a "watch()"-method of TF2's GradientTape-object for eager calculation of gradients
  • and eventual return objects (of a function based on keras' backend) filled with values which we can directly use during the optimization iterations for corrections.

This is only a reminder of complex internal TF2 background operations based on complex layer models; I do not want to stress any difference in the sense of pointers and objects. Actually, I do not even know the precise programming patterns used behind TF2's watch()-mechanism; but according to the documentation it basically records all operations involving the "watched" objects. The objective is the ability to produce gradient values of defined functions with respect to any variable changes instantaneously in eager execution later on.

 
# class to produce images of OIPs for a chosen CNN-map
# *******************************************************
class My_OIP:
    '''
    Version 0.2, 01.09.2020
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    This class allows for the creation and the display of OIP-patterns 
    to which a selected map of a CNN-model reacts   
    
    Functions:
    ~~~~~~~~~~
    1) _load_cnn_model()             => load cnn-model
    2) _build_oip_model()            => build an oip-model to create OIP-images
    3) _setup_gradient_tape()        => Implements TF2 GradientTape to watch input data
                                        for eager gradient calculation
    4) _oip_strat_0_optimization_loop():
                                     => Method implementing a simple strategy to create OIP-images 
                                        based on superposition of random data on large distance scales
    5) _oip_strat_1_optimization_loop():
       (NOT YET DEVELOPED)           => Method implementing a complex strategy to create OIP-images 
                                        based on partially evolved intermediate image 
                                        getting enriched by small scale fluctuations
    6) _derive_OIP():                => Method used externally to start the creation of 
                                        an OIP for a chosen map 
    7) _build_initial_img_data()     => Method to construct an initial image based on 
                                        a superposition of random data on 4 different length scales 
    
    
    Requirements / Preconditions:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In the present version 
    * a CNN-model is assumed which works with standardized (!) input images,
    * a CNN-Modell trained on MNIST data is assumed ,
    * exactly 4 length scales for random data fluctuations are used
      to compose initial statistical image data 
      (roughly with a factor of 2 between them) 
      
    
    '''
    def __init__(self, cnn_model_file = 'cnn_best.h5', 
                 layer_name = 'Conv2D_3', map_index = 0, 
                 img_dim = 28, 
                 b_build_oip_model = True  
                ): 
        '''
        Input: 
        ~~~~~~
            cnn_model_file:     Name of a file containing  a full CNN-model
                                can later be overwritten by _load_cnn_model()
            layer_name:         We can define a layer name we are interested in already when starting; s. below
            map_index:          We may define a map we are interested in already when starting; s. below
            img_dim:            The dimension of the assumed quadratic images 
        
        Major internal variables:
        **************************
            _cnn_model:             A reference to the CNN model object
            _layer_name:            The name of convolutional layer 
                                    (can be overwritten by method _build_oip_model() ) 
            _map_index:             index of the map in the layer's output array 
                                    (can later be overwritten by other methods) 
            _r_cnn_inputs:          A reference to the input tensor of the CNN model (here: 1 image - NOT a batch of images)
            _layer_output:          Tensor with all maps of a certain layer
           
            _oip_submodel:          A new model connecting the input of the cnn-model with a certain map
            
            _tape:                  An instance of TF2's GradientTape-object 
                                    Watches input, output, loss of a model 
                                    and calculates gradients in TF2 eager mode 
            _r_oip_outputs:         A reference to the output of the new OIP-model 
            _r_oip_grads:           Reference to gradient tensors for the new OIP-model 
            _r_oip_loss:            Reference to a loss defined by operations on the OIP-output  
            _val_oip_loss:          Reference to a loss defined by operations on the OIP-output  
            
            _iterate:               Keras backend function to invoke the new OIP-model 
                                    and calculate both loss and gradient values (in TF2 eager mode) 
                                    This is the function to be used in the optimization loop for OIPs
            
            Parameters controlling the optimization loop:
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            _oip_strategy:          0, 1 - There are two strategies to evolve OIP patterns out of statistical data - only the first strategy is supported in this version  
                                    0: Simple superposition of fluctuations at different length scales
                                    1: Evolution over partially evolved images based on longer scale variations 
                                       enriched with fluctuations on shorter length scales 
                                    Both strategies can be combined with a precursor calculation 
            
            
            _b_oip_precursor:       True/False - Use a precursor analysis of long range variations 
                                    regarding loss => search for optimum variations for a given map
                                    (Some initial input images do not trigger a map at all or 
                                    sub-optimally => We test out multiple initial fluctuation patterns). 
            
            _ay_epochs:             A list of 4 optimization epochs to be used whilst 
                                    evolving the img data via strategy 1 and intermediate images 
            _n_epochs:              Number of optimization epochs to be used with strategy 0 
            
            _n_steps:               Defines at how many intermediate points we show images and report 
                                    on the optimization process 
            
            _epsilon:               Factor to control the amount of correction imposed by 
                                    the gradient values of the OIP-model 
            
            Input image data of the OIP-model and references to it 
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            _initial_precursor_img  The initial image to start a precursor optimization with.
                                    Would normally be an image of only long range fluctuations. 
            _precursor_image:       The evolved image updated during the precursor loop 
            
            _initial_inp_img_data:  A tensor representing the data of the input image 
            _inp_img_data:          A tensor representing the data of the input img during optimization  
            _img_dim:               We assume quadratic images to work with 
                                    with dimension _img_dim along each axis
                                    For the time being we only support MNIST images 
            
            Parameters controlling the composition of random initial image data 
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            _li_dim_steps:          A list of the intermediate dimensions for random data;
                                    these data are smoothly scaled to the image dimensions 
            _ay_facts:              A Numpy array of 4 factors to control the amount of 
                                    contribution of the statistical variations 
                                    on the 4 length scales to the initial image
                                   
        '''    
        
        # Input data and variable initializations
        # ****************************************
        
        # the model 
        self._cnn_model_file = cnn_model_file
        self._cnn_model      = None 
        
        # the layer 
        self._layer_name = layer_name
        # the map 
        self._map_index  = map_index
        
        # References to objects and the OIP sub-model
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        self._r_cnn_inputs  = None # reference to input of the CNN_model
                                   # also used in the OIP-model  
        
        self._layer_output  = None
        self._oip_submodel  = None
        
        self._tape          = None # TF2 GradientTape variable
        # some "references"
        self._r_oip_outputs = None # output of the OIP-submodel to be watched 
        self._r_oip_grads   = None # gradients determined by GradientTape   
        self._r_oip_loss    = None 
        # respective values
        self._val_oip_grads = None
        self._val_oip_loss  = None
        
        # The Keras function to produce concrete outputs of the new OIP-model  
        self._iterate       = None
        
        # The strategy to produce an OIP pattern out of statistical input images 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~--------~~~~~~
        # 0: Simple superposition of fluctuations at different length scales 
        # 1: Move over 4 intermediate images - partially optimized 
        self._oip_strategy = 0
        
        # Parameters controlling the OIP-optimization process 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~--------~~~~~~
        # Use a precursor analysis ? 
        self._b_oip_precursor = False
       
        # number of epochs for optimization strategy 1
        self._ay_epochs    = np.array((20, 40, 80, 400))
        len_epochs         = len(self._ay_epochs)

        # number of epochs for optimization strategy 0
        self._n_epochs     = self._ay_epochs[len_epochs-1]   
        self._n_steps      = 7   # divides the number of n_epochs into n_steps 
                                 # to produce intermediate outputs
        
        # size of corrections by gradients
        self._epsilon       = 0.01 # step-size for gradient correction
        
        
        # Input images and references to it 
        # ~~~~~~~~
        # precursor image
        self._initial_precursor_img = None
        self._precursor_img         = None
        # The eventually used input image - a superposition of initial random fluctuations
        self._initial_inp_img_data  = None  # The initial data constructed 
        self._inp_img_data          = None  # The data used and varied for optimization 
        # image dimension
        self._img_dim               = img_dim   # = 28 => MNIST images for the time being 
        
        
        # Parameters controlling the setup of an initial image 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~--------~~~~~~~~~~~~~~~~~~~
        # The length scales for initial input fluctuations
        self._li_dim_steps = ( (3, 3), (7,7), (14,14), (28,28) )
        # Parameters for fluctuations  - used both in strategy 0 and strategy 1  
        self._ay_facts     = np.array( (0.5, 0.5, 0.5, 0.5) )
        
        
        # ********************************************************
        # Model setup - load the cnn-model and build the oip-model
        # ************
        
        if path.isfile(self._cnn_model_file): 
            # We trigger the initial load of a model 
            self._load_cnn_model(file_of_cnn_model = self._cnn_model_file, b_print_cnn_model = True)
            # We trigger the build of a new sub-model based on the CNN model used for OIP search 
            self._build_oip_model(layer_name = self._layer_name, b_print_oip_model = True ) 
        else:
            print("<\nWarning: The standard file " + self._cnn_model_file + 
                  " for the cnn-model could not be found!\n " + 
                  " Please use method _load_cnn_model() to load a valid model")
            
        return
 

 
The purpose of most of the variables will become clearer as we walk through the class's methods below.

Loading the original trained CNN model

Let us say we have a trained CNN-model with all final weight parameters for node-connections saved in some h5-file (see the 4th article of this series for more info). We then can load the CNN-model and derive sub-models from its layer elements. The following method performs the loading task for us:

 
    #
    # Method to load a specific CNN model
    # **********************************
    def _load_cnn_model(self, file_of_cnn_model='cnn_best.h5', b_print_cnn_model=True ):
        
        # Check existence of the file
        if not path.isfile(file_of_cnn_model): 
            print("<\nWarning: The file " + file_of_cnn_model + 
                  " for the cnn-model could not be found!\n" + 
                  "Please change the parameter \"file_of_cnn_model\"" + 
                  " to load a valid model")

        # load the CNN model 
        self._cnn_model_file = file_of_cnn_model
        self._cnn_model = models.load_model(self._cnn_model_file)

        # Inform about the model and its file
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        print("Used file to load the model = ", self._cnn_model_file)
        # we print out the models structure
        if b_print_cnn_model:
            print("Structure of the loaded CNN-model:\n")
            self._cnn_model.summary()

        # handle/references to the models input => more precise the input image 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        #    Note: As we only have one image instead of a batch 
        #    we pick only the first tensor element!
        #    The inputs will be needed for building the oip-model 
        self._r_cnn_inputs = self._cnn_model.inputs[0]  # !!! We have a batch with just ONE image 
        
        # print out the shape - it should be known from the original cnn-model
        print("shape of cnn-model inputs = ", self._r_cnn_inputs.shape)
        
        return

 
Actually, I used this function already in the class' "__init__()"-method - provided the standard file for the last CNN exists. (In a more advanced version you would in addition check that the name of the standard CNN-model meets your expectations.)

The code is straightforward. You see that the structure of the original CNN-model is printed out if requested by the user.

Note also that we assigned the first element of the input tensor of the CNN-model, i.e. a single image tensor, to the variable "self._r_cnn_inputs". This tensor will become a major ingredient in a new Keras model which we are going to build in a minute and which we shall use to calculate gradient components of a loss function with respect to all pixel values of the input image. The gradient's component values will in turn be used during gradient ascent to correct the pixel values. Repeated corrections should lead to a systematic approach of a maximum of our loss function, which describes the map's activation. (Remember: Such a maximum may depend on the input image fluctuations).
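
The core of the later optimization loop is then nothing but a repeated gradient ascent update. As a sketch (the variable names follow the class; "_iterate" will be defined as a Keras backend function further below):

# one step of gradient ascent on the input image (sketch)
loss_val, grads_val = self._iterate([self._inp_img_data])
# push the pixel values in the direction which increases the map's mean activation
self._inp_img_data += self._epsilon * grads_val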

Build a new Keras model based on the input tensor and a chosen layer

The next method is more interesting. We need to build a new Keras layer "model" based on the input layer and a deeper layer of the original CNN-model. (We already used the same kind of "trick" when we tried to visualize the activation output of a convolutional layer of the CNN.)

 
    #
    # Method to construct a model to optimize input for OIP-detection 
    # ***************************************
    def _build_oip_model(self, layer_name = 'Conv2D_3', b_print_oip_model=True ): 
        '''
        We need a Conv layer to build a working model for input image optimization. 
        We get the Conv layer by the layer's name. 
        The new model connects the first input element of the CNN to 
        the output maps of the named Conv layer CNN 
        '''
        # free some RAM - hopefully 
        del self._oip_submodel
        
        self._layer_name = layer_name
        if self._cnn_model == None: 
            print("Error: cnn_model not yet defined.")
            sys.exit()
        # We build a new model based on the model's inputs and the output of the chosen layer 
        self._layer_output = self._cnn_model.get_layer(self._layer_name).output
        
        # We do not care at the moment about the composition of the input 
        # We trust in that we handle only one image - and not a batch
        model_name = "mod_oip__" + layer_name 
        self._oip_submodel = models.Model( [self._r_cnn_inputs], [self._layer_output], name = model_name)                                    
        
        # We print out the oip model structure
        if b_print_oip_model:
            print("Structure of the constructed OIP-sub-model:\n")
            self._oip_submodel.summary()
        return

 
We use the tensor for a single input image and the output of a layer (= a collection of maps) of the original CNN-model as the defining elements of the new Keras model.

TF2's GradientTape() - watch the change of variables which have an impact on model gradients and the loss function

What do we have so far? We have defined a new Keras model connecting input and output data of a layer of the original model. TF2 can determine related gradients by the node connections defined in the original model. However, we cannot rely on a graph analysis by Tensorflow as we were used to with TF1. TF2 uses eager mode - i.e. it calculates gradients directly. What does "directly" mean? Well - as soon as changes to variables occur which have an impact on the gradient values. This in turn means that "something" has to watch out for such changes. TF2 offers a special object for this purpose: tf.GradientTape. See:
https://www.tensorflow.org/guide/eager
https://www.tensorflow.org/api_docs/python/tf/GradientTape

So, as a next step, we set up a method to take care of "GradientTape()".

    #
    # Method to watch gradients 
    # *************************
    def _setup_gradient_tape(self):
        '''
        For TF2 eager execution we need to watch input changes and trigger gradient evaluation
        '''   
        # Working with TF2 GradientTape
        self._tape = None
        
        # Watch out for input and output variables with respect to gradient changes
        with tf.GradientTape() as self._tape: 
            # Input
            # ~~~~~~~
            self._tape.watch(self._r_cnn_inputs)
            # Output 
            # ~~~~~~~
            self._r_oip_outputs = self._oip_submodel(self._r_cnn_inputs)
            
            # Loss 
            # ~~~~~~~
            self._r_oip_loss = tf.reduce_mean(self._r_oip_outputs[0, :, :, self._map_index])
            print(self._r_oip_loss)
            print("shape of oip_loss = ", self._r_oip_loss.shape)

 
Note that the loss definition included in the code fragment is specific for a chosen map. This implies that we have to call this method every time we choose a different map for which we want to create OIP visualizations.

The advantage of the above code element is that "_tape()" can automatically produce gradient values for the relation between the input data and the loss data of the model whenever we change the input image data. The gradient values can be retrieved via

  self._r_oip_grads  = self._tape.gradient(self._r_oip_loss, self._r_cnn_inputs)

Gradient ascent

As already discussed in my last article, we apply a gradient ascent method to our "loss" function, whose outcome rises with the activation of the neurons of a map. The following code sets up a method which first calls "_setup_gradient_tape()" and afterwards applies a normalization to the gradient values which "_tape()" produces. It then defines a convenience function and eventually calls a method which runs the optimization loop.

    #        
    # Method to derive OIP from a given initial input image
    # ********************
    def _derive_OIP(self, map_index = 1, n_epochs = None, n_steps = 4, 
                          epsilon = 0.01, 
                          conv_criterion = 5.e-4, 
                          b_stop_with_convergence = False ):
        '''
        V0.3, 31.08.2020
        This method starts the process of producing an OIP of statistical input image data
        
        Requirements:    An initial input image with statistical fluctuations of pixel values 
        -------------    must have been created. 
        
        Restrictions:    This version only supports the most simple strategy - "strategy 0":  
        -------------    Optimize in one loop - starting from a superposition of fluctuations
                         No precursor, no intermediate evolution of input image data 
        
        Input:
        ------
        map_index:       We can choose a map here      (overwrites previous settings)
        n_epochs:        Number of optimization steps  (overwrites previous settings) 
        n_steps:         Defines number of intervals (with length n_epochs/ n_steps) for reporting
                         This number also sets a requirement for providing n_steps external axis-frames 
                         to display intermediate images of the emerging OIP  
                         => see _oip_strat_0_optimization_loop()
        epsilon:         Size for corrections by gradient values
        conv_criterion:  A small threshold number for (difference of loss-values / present loss value )
        b_stop_with_convergence: 
                         Boolean which decides whether we stop a loop if the conv-criterion is fulfilled
                         
        '''
        self._map_index = map_index
        self._n_epochs  = n_epochs   
        self._n_steps   = n_steps
        self._epsilon   = epsilon
        
        # number of epochs for optimization strategy 0 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        if n_epochs is None:
            len_epochs = len(self._ay_epochs)
            self._n_epochs   = self._ay_epochs[len_epochs-1]
        else: 
            self._n_epochs = n_epochs
            
        # Reset some variables  
        self._val_oip_grads = None
        self._val_oip_loss  = None 
        self._iterate       = None 
        
        # Setup the TF2 GradientTape watch
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        self._setup_gradient_tape()
        print("GradientTape watch activated ")
    
        # Gradient handling - so far we only deal with addresses 
        # ~~~~~~~~~~~~~~~~~~
        self._r_oip_grads  = self._tape.gradient(self._r_oip_loss, self._r_cnn_inputs)
        #print("shape of grads = ", self._r_oip_grads.shape)
        
        # normalization of the gradient 
        self._r_oip_grads /= (B.sqrt(B.mean(B.square(self._r_oip_grads))) + 1.e-7)
        #grads = tf.image.per_image_standardization(grads)
        
        # define an abstract recallable Keras function 
        # producing loss and gradients for corrected img data 
        # the first list of addresses points to the input data, the last to the output data 
        self._iterate = B.function( [self._r_cnn_inputs], [self._r_oip_loss, self._r_oip_grads] )
        
        # get the initial image into a variable for optimization 
        self._inp_img_data = None
        self._inp_img_data = self._initial_inp_img_data
        
        # Optimization loop for strategy 0 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        if self._oip_strategy == 0: 
            self._oip_strat_0_optimization_loop( conv_criterion = conv_criterion, 
                                                b_stop_with_convergence = b_stop_with_convergence )
    

 
Gradient value normalization is done here with respect to the L2-norm of the gradient: we divide the gradient by the root mean square of its components, so that the typical component size becomes 1. Why does such a normalization help us with respect to convergence? You remember from a previous article series about MLPs that we had to care about a reasonable balance between an adaptive application of gradient values and a systematic reduction of the learning rate when approaching the global minimum of a cost function. The various available adaptive gradient methods care about speed and a proper deceleration of the steps by which we move across the cost hyperplane. When we approach a minimum we need to care about overshooting. In our present gradient ascent case we have no such adaptive fine-tuning available. What we can do, however, is to control the size of the gradient components such that the changes remain of the same order as long as we do not change the step-size factor (epsilon). I.e.:

  • We do not wildly accelerate our path on the loss hyperplane when some local pixel areas want to drive us into a certain direction.
  • We reduce the chance of creating pixel values out of normal boundaries.

The last point fits well with a situation where the CNN has been trained on normalized or standardized input images. Due to the normalization the size of the pixel value corrections now depends significantly on the factor "epsilon". We should choose it small enough with respect to the pixel values.
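
A quick numerical cross-check of this claim - a standalone sketch with illustrative values (epsilon = 0.01 here), not part of the class:

    import numpy as np

    # Standardized image data have a pixel value spread of order 1;
    # after the normalization the typical gradient component is also of order 1
    g = np.random.randn(28, 28).astype(np.float32)
    g /= np.sqrt(np.mean(np.square(g))) + 1.e-7
    epsilon = 0.01
    print(np.mean(np.abs(epsilon * g)))   # typical per-pixel correction ~ epsilon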

Another interesting statement in the above code fragment is

        self._iterate = B.function( [self._r_cnn_inputs], [self._r_oip_loss, self._r_oip_grads] )

Here we use the Keras Backend to define a convenience function which relates input data to dependent output data whose calculation we defined by the statements above. The list used as the first parameter of this function "_iterate()" defines the input variables; the list used as the second parameter defines the output variables which will internally be calculated via the GradientTape-functionality defined before. The "_iterate()"-function makes it much easier for us to build the optimization loop.
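
For readers unfamiliar with this Backend utility, here is a detached miniature example; the tiny Dense-model below is an assumption for illustration only:

    import numpy as np
    from tensorflow.keras import layers, models, backend as B

    # A miniature analogue of "_iterate": map the model input to the model output
    m = models.Sequential([layers.Dense(1, input_shape=(2,))])
    f = B.function([m.input], [m.output])       # list of inputs -> list of outputs
    res = f([np.array([[1.0, 2.0]], dtype=np.float32)])
    print(res[0].shape)                         # => (1, 1)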

The optimization loop for the construction of images that visualize OIPs and features

The optimization loop must systematically correct the pixel values to approach a maximum of the loss function. The following method "_oip_strat_0_optimization_loop()" does this job for us. (The "0" in the method's name refers to a first simple approach.)

    #        
    # Method to optimize an emerging OIP out of statistical input image data 
    # (simple strategy - just optimize, no precursor, no intermediate pattern evolution 
    # ********************************
    def _oip_strat_0_optimization_loop(self, conv_criterion = 5.e-4, 
                                            b_stop_with_convergence = False ):
        
        '''
        V 0.2, 28.08.2020 
        
        Purpose: 
        This method controls the optimization loop for OIP reconstruction of an initial 
        input image with a statistical distribution of pixel values. 
        It also provides intermediate output in the form of printed data and images.
        
        Parameter: 
        conv_criterion:  A small threshold number for (difference of loss-values / present loss value )
        b_stop_with_convergence: 
                         Boolean which decides whether we stop a loop if the conv-criterion is fulfilled
        
        
        This method produces some intermediate output during the optimization loop in form of images.
        It uses an external grid of plot frames and their axes-objects. The addresses of the 
        axis-objects must be provided by an external list "li_axa[]".  
        We need a sequence of >= n_steps axis-frames (len(li_axa) >= n_steps).    
        With Jupyter the grid for plots can externally be provided by statements like 
        
        # figure
        # -----------
        #sizing
        fig_size = plt.rcParams["figure.figsize"]
        fig_size[0] = 16
        fig_size[1] = 8
        fig_a = plt.figure()
        axa_1 = fig_a.add_subplot(241)
        axa_2 = fig_a.add_subplot(242)
        axa_3 = fig_a.add_subplot(243)
        axa_4 = fig_a.add_subplot(244)
        axa_5 = fig_a.add_subplot(245)
        axa_6 = fig_a.add_subplot(246)
        axa_7 = fig_a.add_subplot(247)
        axa_8 = fig_a.add_subplot(248)
        li_axa = [axa_1, axa_2, axa_3, axa_4, axa_5, axa_6, axa_7, axa_8]
        
        '''
        
        # Check that we really have an input image tensor
        if ( (self._inp_img_data is None) or 
             (self._inp_img_data.shape[1] != self._img_dim) or 
             (self._inp_img_data.shape[2] != self._img_dim) ) :
            print("There is no initial input image or it does not fit the required dimensions (" +
                  str(self._img_dim) + ", " + str(self._img_dim) + ")")
            sys.exit()
        
        # Print some information
        print("*************\nStart of optimization loop\n*************")
        print("Strategy: Simple initial mixture of long and short range variations")
        print("Number of epochs = ", self._n_epochs)
        print("Epsilon =  ", self._epsilon)
        print("*************")
        
        # some initial value
        loss_old = 0.0
       
        # Preparation of intermediate reporting / img printing
        # --------------------------------------
        # Interval length for evenly spaced reporting (informational - not used below)
        steps = math.ceil(self._n_epochs / self._n_steps )
        # Logarithmic spacing of the reporting points (most changes happen initially);
        # e.g. n_epochs = 600, n_steps = 4  =>  li_int = [37, 74, 148, 296]
        n_el = math.floor(self._n_epochs / 2**(self._n_steps) ) 
        li_int = []
        for j in range(self._n_steps):
            li_int.append(n_el*2**j)
        print("li_int = ", li_int)
        # A counter for intermediate reporting  
        n_rep = 0
        # Array for intermediate image data (reserved - not used in this version)
        li_imgs = np.zeros((self._img_dim, self._img_dim, 1), dtype=np.float32)
        
        # Convergence? - A list for steps meeting the convergence criterion
        # ~~~~~~~~~~~~
        li_conv = []
        
        
        # optimization loop 
        # *******************
        # A counter for steps with zero loss and gradient values 
        n_zeros = 0
        
        for j in range(self._n_epochs):
            
            # Get output values of our Keras iteration function 
            # ~~~~~~~~~~~~~~~~~~~
            self._val_oip_loss, self._val_oip_grads = self._iterate([self._inp_img_data])
            
            # loss difference to last step - should steadily become smaller 
            loss_diff = self._val_oip_loss - loss_old 
            #print("loss_val = ", loss_val)
            #print("loss_diff = ", loss_diff)
            loss_old = self._val_oip_loss
            
            if j > 10 and (loss_diff/(self._val_oip_loss + 1.e-7)) < conv_criterion:
                li_conv.append(j)
                lenc = len(li_conv)
                # print("conv - step = ", j)
                # stop only after the criterion has been met in 4 successive steps
                if b_stop_with_convergence and lenc >= 4 and li_conv[-4] == j-3:
                    return
            
            grads_val     = self._val_oip_grads
            # the gradients average value 
            avg_grads_val = (tf.reduce_mean(grads_val)).numpy()
            
            # Check if our map reacts at all
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            if self._val_oip_loss == 0.0 and avg_grads_val == 0.0:
                print( "0-values, j= ", j, 
                       " loss = ", self._val_oip_loss, " avg_loss = ", avg_grads_val )
                n_zeros += 1
            
            if n_zeros > 10: 
                print("More than 10 times zero values - Try a different initial random distribution of pixel values")
                return
            
            # gradient ascent => Correction of the input image data 
            # ~~~~~~~~~~~~~~~
            self._inp_img_data += grads_val * self._epsilon
            
            # Standardize the corrected image - we won't get a convergence otherwise 
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            self._inp_img_data = tf.image.per_image_standardization(self._inp_img_data)
            
            # Some output at intermediate points 
            #     Note: We use logarithmic intervals because most changes 
            #           appear in the initial third of the loop's range  
            if (j == 0) or (j in li_int) or (j == self._n_epochs-1) :
                # print some info 
                print("\nstep " + str(j) + " finalized")
                #print("Shape of grads = ", grads_val.shape)
                print("present loss_val = ", self._val_oip_loss)
                print("loss_diff = ", loss_diff)
                # display the intermediate image data in an external grid 
                imgn = self._inp_img_data[0,:,:,0].numpy()
                #print("Shape of intermediate img = ", imgn.shape)
                li_axa[n_rep].imshow(imgn, cmap=plt.cm.viridis)
                # counter
                n_rep += 1
         
        return

 
The code is easy to understand: We use our convenience function "self._iterate()" to produce actual loss and gradient values within the loop. Then we change the pixel values of our input image and feed the changed image back into the loop. Note the positive sign of the correction! All complicated gradient calculations are automatically and "eagerly" done internally thanks to "GradientTape".

We said above that we normalized the gradient values. How big is the size of the resulting corrections compared to the image data? If we use standardized image data and scale the gradient as described, then the relative size of the changes is of the order of the step size "epsilon" for the correction. In our case we set epsilon to 5.e-4.

The careful reader has noticed that I standardized the image data after the correction with the (normalized) gradient. Well, this step proved to be absolutely necessary to get convergence in the sense that we really approach an extremum of the cost function. Reasons are:

  • My CNN was trained on standardized MNIST input images.
  • We did not include any normalization layers into our CNN.
  • Without counter-measures our normalized gradient corrections would eventually drive the activation values to unlimited growth.

The last point deserves some explanation:
We used the ReLU-function as the activation function of nodes in the inner layers of our CNN. For positive input values ReLU actually is a linear function. Now let us assume that we have found a rather optimal input pattern which via a succession of basically linear operations drives an activation pattern of a map's neurons. What happens if we just add constant small values to each pixel per iteration step? The output values after the sequence of linear transformations will just grow! With our method and the ReLU activation function we walk around a surface until we reach a linear ramp and climb it up. Without compensatory steps we will not find a real maximum because there is none. The situation is very different at the convolutional layers than at the eventual output layer of the CNN's MLP-part.
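
A two-line demonstration of this unbounded growth (illustrative only):

    import numpy as np

    # ReLU is linear on positive inputs: scaling the input scales the activation
    relu = lambda x: np.maximum(x, 0.0)
    x = np.array([0.5, 1.0, 2.0])
    print(relu(10.0 * x))   # => [ 5. 10. 20.] - no upper bound exists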

You may ask yourself why we experienced nothing of this during the classification training? First answer: We did not optimize input data but weights during the training. Second answer: During training we did NOT maximize potentially unbound activation values but minimized a cost function based on the output values of the last MLP-layer. And these values were produced by a sigmoid function! The sigmoid function maps any input into the range ]0, +1[. In addition, the cost function (categorical_crossentropy) is designed to be convex for deviations of limited calculated values from a limited target vector.
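
Again as a small illustration:

    import numpy as np

    # The sigmoid squashes any input into the open interval ]0, 1[
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    print(sigmoid(np.array([-100.0, 0.0, 100.0])))   # => approx. [0.0, 0.5, 1.0]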

The consequence is that we have to limit the values of the (corrected) input data and the related gradients in our present optimization procedure at the same time! This is done by the standardization of the image data. Remember that the correction values are of a relative order of 5.e-4. In the end this is the order of the fluctuations which are unavoidable in the final OIP image; but now we have a chance to converge to a small region around a real maximum.
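
The effect of the standardization step can be checked directly; a small sketch, independent of the class code:

    import tensorflow as tf

    # per_image_standardization rescales image data to zero mean and unit variance
    img = tf.random.uniform((1, 28, 28, 1))
    std_img = tf.image.per_image_standardization(img)
    print(float(tf.reduce_mean(std_img)))       # ~ 0.0
    print(float(tf.math.reduce_std(std_img)))   # ~ 1.0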

The last block in the code deals with intermediate output - not only printed data on the loss function but also intermediate images of the hopefully emerging pattern. These images can be provided in an external grid of figures, e.g. in a Jupyter environment. The important point is that we define a suitable number of Matplotlib's axis-objects and deliver their addresses via an external list "li_axa[]". I am well aware that the plotting solution coded here is a very basic one and that it requires some planning ahead by the user - feel free to program it in a better way.
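
One obvious improvement - my suggestion, not part of the original code - is to create the whole grid with a single subplots() call:

    import matplotlib.pyplot as plt

    # Create a 2x4 grid of axis-objects in one call and flatten it into "li_axa"
    fig_a, axes = plt.subplots(2, 4, figsize=(16, 8))
    li_axa = list(axes.flatten())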

Initial input image data - with variations on different length scales

We lack just one further ingredient: We need a method to construct an input image with statistical data. I have discussed already that it may be helpful to vary data on different length scales. A very simple approach to tackle this problem manually is the following one:

We use squares - each with a different small and limited number of cells; e.g. a (4x4)-, a (7x7)-, a (14x14)- and a (28x28)-square. Note that (28x28) is the original size of the MNIST images. For other samples we may need different sizes of such squares and more of them. We then fill the cells with random numbers in [-1, 1]. We smoothly scale the variations of the smaller squares up to the real input image size; here: (28x28). We can do this by applying a bicubic interpolation to the data. In the end we add up all random data and normalize or standardize the resulting distribution of pixel values. See the code below for details:

    # 
    # Method to build an initial image from a superposition of random data on different length scales 
    # ***********************************
    def _build_initial_img_data( self, 
                                 strategy = 0, 
                                 li_epochs    = (20, 50, 100, 400), 
                                 li_facts     = (0.5, 0.5, 0.5, 0.5),
                                 li_dim_steps = ( (3,3), (7,7), (14,14), (28,28) ), 
                                 b_smoothing = False):
        
        '''
        V0.2, 31.08.2020
        Purpose:
        ~~~~~~~~
        This method constructs an initial input image with a statistical distribution of pixel-values.
        We use 4 length scales to mix fluctuations with different "wave-length" by a simple  
        approach: 
        We fill four squares with a different number of cells below the number of pixels 
        in each dimension of the real input image; e.g. (4x4), (7x7), (14x14), (28x28) <= (28,28). 
        We fill the cells with random numbers in [-1.0, 1.0]. We smoothly scale the resulting pattern 
        up to (28,28) (or whatever the input image dimensions are) by bicubic interpolations 
        and eventually add up all values. As a final step we standardize the pixel value distribution.          
        
        Limitations
        ~~~~~~~~~~~
        This version works with 4 length scales. It only supports a simple strategy for 
        evolving OIP patterns. 
        '''

        self._oip_strategy = strategy
        self._ay_facts     = np.array(li_facts)
        self._ay_epochs    = np.array(li_epochs)
        
        self._li_dim_steps = li_dim_steps
        
        fluct_data = None

        
        # Strategy 0: Simple superposition of random patterns at 4 different wave-length
        # ~~~~~~~~~~
        if self._oip_strategy == 0:
            
            dim_1_1 = self._li_dim_steps[0][0] 
            dim_1_2 = self._li_dim_steps[0][1] 
            dim_2_1 = self._li_dim_steps[1][0] 
            dim_2_2 = self._li_dim_steps[1][1] 
            dim_3_1 = self._li_dim_steps[2][0] 
            dim_3_2 = self._li_dim_steps[2][1] 
            dim_4_1 = self._li_dim_steps[3][0] 
            dim_4_2 = self._li_dim_steps[3][1] 
            
            fact1 = self._ay_facts[0]
            fact2 = self._ay_facts[1]
            fact3 = self._ay_facts[2]
            fact4 = self._ay_facts[3]
            
            # print some parameter information
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            print("\nInitial image composition - strategy 0:\n Superposition of 4 different wavelength patterns")
            print("Parameters:\n", 
                 fact1, " => (" + str(dim_1_1) +", " + str(dim_1_2) + ") :: ", 
                 fact2, " => (" + str(dim_2_1) +", " + str(dim_2_2) + ") :: ", 
                 fact3, " => (" + str(dim_3_1) +", " + str(dim_3_2) + ") :: ", 
                 fact4, " => (" + str(dim_4_1) +", " + str(dim_4_2) + ")" 
                 )
            
            # fluctuations
            fluct1 =  2.0 * ( np.random.random((1, dim_1_1, dim_1_2, 1)) - 0.5 ) 
            fluct2 =  2.0 * ( np.random.random((1, dim_2_1, dim_2_2, 1)) - 0.5 ) 
            fluct3 =  2.0 * ( np.random.random((1, dim_3_1, dim_3_2, 1)) - 0.5 ) 
            fluct4 =  2.0 * ( np.random.random((1, dim_4_1, dim_4_2, 1)) - 0.5 ) 

            # Scaling with bicubic interpolation to the required image size
            fluct1_scale = tf.image.resize(fluct1, [28,28], method="bicubic", antialias=True)
            fluct2_scale = tf.image.resize(fluct2, [28,28], method="bicubic", antialias=True)
            fluct3_scale = tf.image.resize(fluct3, [28,28], method="bicubic", antialias=True)
            fluct4_scale = fluct4

            # superposition
            fluct_data = fact1*fluct1_scale + fact2*fluct2_scale + fact3*fluct3_scale + fact4*fluct4_scale
        
        
        # get the standardized plus smoothed and unsmoothed image 
        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        #    TF2 provides a function performing a standardization of image data
        fluct_data_unsmoothed = tf.image.per_image_standardization(fluct_data) 
        fluct_data_smoothed   = tf.image.per_image_standardization(
                                    tf.image.resize( fluct_data, [28,28], 
                                                     method="bicubic", antialias=True) )

        if b_smoothing: 
            self._initial_inp_img_data = fluct_data_smoothed
        else:
            self._initial_inp_img_data = fluct_data_unsmoothed

        # There should be (almost) no difference - fluct_data already has the target size
        img_init_unsmoothed = fluct_data_unsmoothed[0,:,:,0].numpy()
        img_init_smoothed   = fluct_data_smoothed[0,:,:,0].numpy()
        
        # Plot into externally provided axis-objects (ax1_1, ax1_2 must exist in the enclosing scope)
        ax1_1.imshow(img_init_unsmoothed, cmap=plt.cm.viridis)
        ax1_2.imshow(img_init_smoothed, cmap=plt.cm.viridis)

        print("Initial images plotted")
        
        return self._initial_inp_img_data    

 
The factors fact1, fact2, fact3 and fact4 determine the relative amplitudes of the fluctuations at the different length scales. Thus the user is e.g. able to suppress short-scale fluctuations completely.

I only took exactly four squares to simulate fluctuations on different length scales. A better code would make the number of squares and length scales parameterizable. Or it would work with a Fourier series right from the start. I was too lazy for such elaborate refinements. The plots again require the definition of some external Matplotlib figures with axis-objects. You can provide a suitable figure in a Jupyter cell.
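
To put the pieces together: a hypothetical Jupyter driver cell could look like the sketch below. The class name "MyOIP", its constructor and the map index are my assumptions; only the two method calls follow the fragments discussed in this article:

    # Hypothetical usage - class name, constructor and map index are assumptions
    my_oip = MyOIP()
    my_oip._build_initial_img_data( strategy = 0,
                                    li_facts = (0.5, 0.5, 0.5, 0.5),
                                    b_smoothing = False )
    my_oip._derive_OIP( map_index = 56, n_epochs = 600,
                        n_steps = 8, epsilon = 0.01 )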

Conclusion

In this article we have built a simple class to create OIPs for a specific CNN map out of an input image with a random distribution of pixel values. The algorithm should have made it clear that this is constructive work performed during the iteration:

  • We start from the "detection" of slight traces of a pattern in the initial statistical pixel value distribution; the
    pattern actually is a statistical pixel correlation which triggers the chosen map,
  • then we amplify the recognized pattern elements
  • and suppress pixel values which are not relevant into a homogeneous background.

So the headline of this article is a bit misleading: We do not only "detect" a map related OIP; we also "create" it.

Our new Python class makes use of a given trained CNN-model and follows the outline of steps discussed in a previous article. The class has many limitations - in its present version it is only usable for MNIST images and the user has to know a lot about its internals. However, I hope that my readers nevertheless have understood the basic ingredients. From there it is only a small step towards a more general and more capable version.

I have also underlined in this article that the images produced by the coded methods may only represent local maxima of the loss function for average map activation and idealized patterns composed of re-occurring elementary sub-structures. In the next article

A simple CNN for the MNIST dataset – IX – filter visualization at a convolutional layer

I am going to apply the code to most of the maps of the highest, i.e. inner-most convolutional layer of my trained CNN. We shall discover a whole zoo of simple and complex input image patterns. But we shall also confirm the suspicion that our optimization algorithm for finding an OIP for a specific map does not react to each and every kind of initial statistical pixel value fluctuation presented to the CNN.