Walking a directory tree with Python to eliminate unwanted files

Recently I got an enormous number of text files and related image files of scanned texts (around 210,000). The whole dataset had a size of around 25 GB. The text data was to be analyzed with some machine learning [ML] algorithms. One of the first things to do in such a situation is to get rid of the jpg-files, as they consume most of the disk space.

In my case I also got the data from a Mac machine. Hidden “._”-files were created on the Mac when the original data were downloaded from the Internet. These files control Mac security operations. I had to eliminate these files, too.

Due to the doubling of files and additional “._”-files the total number of files was around 830,000. The number of files really required was much smaller. Eliminating files which one does not need is an exercise which is often required in a Machine Learning context for text files.

In such a situation the function “os.walk()” in a Python environment comes in handy.

os.walk

os.walk() allows us to walk recursively through a directory tree. For each directory it visits we get a tuple back containing

  1. the path to the present directory,
  2. a list of all sub-directories,
  3. a list of all files (by their names) in the directory.

For typical applications this is enough information to perform analysis and file operations within the directories.
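To make the structure of these tuples concrete, here is a minimal sketch that builds a small throw-away tree in a temporary directory and walks it. The helper name list_tree and the file names are made up for illustration only; they have nothing to do with my real dataset.

```python
import os
import tempfile

def list_tree(root):
    """Collect one (relative path, sub-directories, files) entry per directory."""
    result = []
    for dirpath, subdirs, files in os.walk(root):
        # Sorting in place also fixes the order in which os.walk descends.
        subdirs.sort()
        result.append((os.path.relpath(dirpath, root), list(subdirs), sorted(files)))
    return result

# Build a tiny example tree (hypothetical names) and walk it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "scans", "vol1"))
    for rel in ("readme.txt", "scans/page1.txt", "scans/vol1/page2.jpg"):
        with open(os.path.join(root, *rel.split("/")), "w") as fh:
            fh.write("x")
    tree = list_tree(root)
    for entry in tree:
        print(entry)
```

Each printed entry corresponds to one directory: first the path, then the list of sub-directories, then the list of file names found directly in that directory.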

Application

In my case the usage was very simple. In a Jupyter cell the following code helped:

import os
import time

dir_path = "/py/projects/CA22/catch22/"

# use os.walk to recursively run through the directory tree
v_start_time = time.perf_counter()

for (dirname, subdirs, filesdir) in os.walk(dir_path): 
    print('[' + dirname + ']')
    for filename in filesdir:
        filepath = os.path.join(dirname, filename) 
        #print(filepath)
        if filename.endswith(('.jpg', '.db', '-GS.txt')):
            os.remove(filepath) 
            
# extra loop as there are hidden "."-files also for ".jpg"-files      
for (dirname, subdirs, filesdir) in os.walk(dir_path): 
    print('[' + dirname + ']')
    for filename in filesdir:
        filepath = os.path.join(dirname, filename) 
        #print(filepath)
        if filename.startswith('._'):
            os.remove(filepath) 

v_end_time = time.perf_counter()
print("Total elapsed time ", v_end_time - v_start_time)

If you do not print out file paths, this should be a matter of seconds only; on an SSD below a second.
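As deleting files cannot be undone, it may be worth a dry run first. The following is only a sketch of how the two loops above could be combined into one pass with a dry-run switch. The names is_unwanted, clean_tree and dry_run are my own invention for this example, and the demo works on a temporary directory with made-up file names.

```python
import os
import tempfile

# Same patterns as in the loops above.
UNWANTED_SUFFIXES = ('.jpg', '.db', '-GS.txt')

def is_unwanted(filename):
    """True for Mac '._' files and for files with one of the unwanted suffixes."""
    return filename.startswith('._') or filename.endswith(UNWANTED_SUFFIXES)

def clean_tree(root, dry_run=True):
    """Return the paths of unwanted files below root; delete them unless dry_run."""
    hits = []
    for dirname, subdirs, files in os.walk(root):
        for filename in files:
            if is_unwanted(filename):
                path = os.path.join(dirname, filename)
                hits.append(path)
                if not dry_run:
                    os.remove(path)
    return hits

# Demo on a throw-away directory (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "a.jpg", "._a.txt", "Thumbs.db", "a-GS.txt"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("x")
    planned = clean_tree(root, dry_run=True)    # only reports, deletes nothing
    untouched = sorted(os.listdir(root))
    removed = clean_tree(root, dry_run=False)   # now really deletes
    remaining = sorted(os.listdir(root))
    print(len(planned), "files to delete; remaining:", remaining)
```

Checking the list returned by the dry run before switching dry_run to False gives you a chance to spot a pattern which matches more than intended.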

Counting the remaining files

We can also use os.walk() to count the number of remaining files:

n = 0 
v_start_time = time.perf_counter()
for (dirname, subdirs, filesdir) in os.walk(dir_path): 
    print('[' + dirname + ']')
    n += len(filesdir)
v_end_time = time.perf_counter()
print('Number of files = ', n )
print("Total elapsed time ", v_end_time - v_start_time)

On a Linux system you could also use

mytux:~ # find /py/projects/CA22/catch22/ -type f | wc -l
208123

Thus I could bring down the total size to 780 MB and the number of txt-files to be processed to around 208,000.
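The total size and the number of files per type can be determined with os.walk(), too. The following sketch is not the code I used for the numbers above; the function name tree_stats is an assumption for illustration, and the demo again works on a small temporary tree.

```python
import os
import tempfile
from collections import Counter

def tree_stats(root):
    """Return (file counts per lower-cased extension, total size in bytes)."""
    counts = Counter()
    total_bytes = 0
    for dirname, subdirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(dirname, filename)
            counts[os.path.splitext(filename)[1].lower()] += 1
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # e.g. a broken symbolic link; counted, but without a size
                pass
    return counts, total_bytes

# Demo on a throw-away tree with files of known size (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel, size in (("a.txt", 10), ("sub/b.txt", 20), ("sub/c.JPG", 5)):
        with open(os.path.join(root, *rel.split("/")), "wb") as fh:
            fh.write(b"x" * size)
    counts, total_bytes = tree_stats(root)
    print(dict(counts), total_bytes)
```

Note that this sums up the sizes of the file contents; the disk space actually occupied can be somewhat larger due to the block size of the file system.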

Nupro X3000 RC – a solid high quality supplement to your Linux Audio

A friend asked me what sound equipment I use on my Linux machine. She wanted to buy some new decent speakers. I had to make a similar decision a year ago. Coming to a conclusion back then became a more difficult process than I had expected.

I admit that I am a total amateur regarding sound equipment. I have not changed my sound cards (Asus Xonar D2X, Creative X-Fi Titanium, Onboard High Definition GM206) for a long, long time. And I do not hear as well as in my younger years. But during Corona and home office times I became really discontent with my old Creative speakers. One cannot wear headphones all the time. So some new speakers for my Linux workstation became a topic on my private agenda.

Questions ahead of a decision for some speakers for your PC

When I seriously started thinking about some investment the following questions came up:

A surround system? Active or passive boxes? Suitable for a shelf or standing on the floor? Do you want to use the speakers later also in other contexts than just as background equipment in your working room? What is appropriate for your room size? Cable based connections (copper, optical?) or WiFi or Bluetooth based ones? At my age, when hearing capabilities are reduced: Will high end properties make a difference at all? And the most limiting factor: budget.

Taking all these factors into account will certainly lead to very personal decisions. So, when I make an explicit recommendation here – take it with caution and a grain of salt.

Guidelines to choosing speakers for a non-professional PC environment

Here are the personal guidelines which I followed – after I had read reviews, listened to Teufel and Edifier speakers at friends' places and listened to a relatively expensive Logitech surround system at my nephew's. You may have other references, other budgets and hear much better and more differentiated than I do. So relax if you come to other conclusions.

And do not forget: I am talking about sound equipment on a PC for background music enjoyment in a working room – not for professional objectives and High End specialists.

  • Recommendation 1: If you are interested in sound quality and are a music enthusiast – forget about surround systems. Quantity (many speakers) almost always enforces quality compromises, which you are going to hear in the end. Better invest your money into a 2.0 or 2.1 system which fits the (probably) limited size of your working room.
  • Recommendation 2: If your room size is up to 30 square meters, invest into relatively small speakers – but of studio quality. They will give you a much more pronounced and positioned sound than surround systems. Regarding money think of speakers which you later can supplement with a sub-woofer – e.g. in case you want to move the speakers to a larger room sometime in the future.
  • Recommendation 3: Regarding bass: I am a heavy metal friend – sometimes. I have my phases and periods regarding music … Sometimes I like Jazz, only. Bass in the named two cases has a different meaning to me – but in any case I do not like resonances of my speakers. The stereo speakers alone should already provide a solid, broad and resonance free bass fundament – without a sub-woofer. A sub-woofer can deliver an extra feeling in the case of metal – but for Jazz and classical music I would not consider a sub-woofer as really relevant. So go for some solid speakers with the option of adding a sub-woofer in the future.
  • Recommendation 4: Do not underestimate the effect (or limitations) of the DAC in your sound card! At a certain quality level of your future speakers you are probably going to hear differences. So – if you are lucky and can invest into expensive speakers rethink your sound card equipment, too.
  • Recommendation 5: Do not underestimate the effect of the boxes’ positions in the room. Also in small rooms you will experience bass line effects around 100 Hz or so if you place your boxes in the room’s corners. This leads to the point that you may want some equalizer option to optimize the bass base a bit. Well, Linux or at least most music applications for Linux supply you with equalizers; but it is a nice option to be able to do something at the (active) boxes themselves to get a basic “direction” into your sound environment. And here we would also like to have the option of defining some “presets”.
  • Recommendation 6: Active boxes or amplifier? A very difficult question! In a PC and mobile environment I would tend to active speakers, but … The amplifier technique today is so good that at least in my case my hearing deficits are certainly more important.
  • Recommendation 7: WiFi? Personally, I would say: Yes, you should have this option. But if so: Go for a 5 GHz band. And check whether your router offers you the option to define the precise band it should work on or whether the router automatically adapts the precise channel to avoid disturbances with other sources.
  • Personal opinion some people certainly would like to crucify me for: Teufel speakers seem to be a bit overrated. Personally I do not think that the quality-price relation is convincing. After having listened to a pair of floor-standing speakers I think that the balance between bass and mid-range frequency sound is strange. Very vague in a way.

Nupro X3000 speakers as a solid option for a reasonable price

Taking all these aspects into account I ended up with a decision for (active) Nupro X3000 RC speakers from the producer “Nubert electronic GmbH“.

So far, I have not regretted this decision for a second. These boxes did not disappoint me – neither with Classical music, Jazz nor Heavy Metal.

Though admittedly, if you want to feel bass and drumming in larger rooms, these boxes certainly improve a bit when combined with a sub-woofer (which I personally use on a second sound card). But this happens on rare occasions …

Ease of setup?

The setup of the active boxes is very simple; the explanations on the accompanying leaflets are fully sufficient. You define everything by a 4 direction control button on one of the speakers. The button and a small display are hidden behind magnetically attached front panels.

Basically, you just have to define a master and a slave speaker in the first setup round and choose a connection to your sound source – here to the output connectors of a PC soundcard. In the end I used the “aux” entry and still live with an analog cable based connection between the sound card and the main box plus a digital coax cable between the boxes. (Due to the speakers’ distance I had to buy an additional coax cable. It disappears behind a shelf).

But a WiFi connection between the speakers works very well, too. I could see no major conflict with the 5 GHz channels occupied by the WLAN routers in my surroundings.

The basic connection options to your PC and sound card are manifold: The USB-interface of the Nupro sound processor appears as a USB sound card on your PC; this “sound card” is well supported on my Opensuse and KDE based Linux systems. You just have to choose the SPDIF stereo variant of the two options offered in the KDE/Phonon sound settings.

Besides a USB cable the connection cables delivered with the speakers include an optical cable with TOSLink adapters, a SPDIF cable and analog cables with cinch connectors. And finally there also is the option of a Bluetooth connection – if your PC has such a device.

In the end I personally heard no major difference between analog and digital signal handling. Neither with USB nor the optical connection to my old ASUS Xonar D2X sound card or the optical connection to the X-FI Titanium nor the onboard GM206 High Definition soundcard. The TI-Burr-Brown DAC of the Asus card still seems to be relatively good – at least for my ears.

I also have an additional X-Fi Titanium card from Creative in my PC. I like the sound of the Asus card better with my Sennheiser headphones. Regarding the Nupro X3000 I was actually in doubt: For some music I find the sound slightly crisper with the X-Fi. However, whether this is a sign of quality is questionable. I change the sound card from time to time, just for fun – and still have no real preference.

Regarding distances the analog cable option for the connection to your PC’s sound card may be the most reasonable solution – as the optical, SPDIF coax and USB cables coming with the speakers are of limited length.

There is even a possibility to realize a pure WiFi connection from your PC to the X3000 RC speakers. Such a solution, however, requires a special transceiver (135 €) from the producer Nubert; see below. I have not tested this type of connection yet.

The speakers offer you some basic options regarding the sound balance. A very positive feature is the integrated 5 band equalizer. As said above this allows for a basic adjustment of the sound signature. Not unimportant at my age. In addition the handheld remote control device allows for a change of the relative basic balance between bass and treble.

You can also define a lower cut-off frequency for the bass and the transition frequency to a sub-woofer. Furthermore you can set a 6 dB gain for certain analog input channels.

Disappointments?

Something which disappointed me was the Bluetooth connection of the X3000 RC to my old Samsung smartphone – here I got periodic dropouts. I have not clarified this problem up to now. I do not exclude problems with the Bluetooth and the VLC player on my phone. In reviews I have not read about any such dropouts – but you have been warned. I recently tried a Bluetooth connection from my laptop, too. This one worked flawlessly. So, I do not know …

Another major disappointment was and is Nubert’s “X-Remote App”. In my case it simply does not work on my Android 6 device. It gets stopped by Android just after granting permission to determine the geo-location. Which by the way is something I do not like in general. I got in contact with the Nubert company recently. They affirmed that they do not collect data, but that it is Google which enforces the explicit consent to geo-location when building up WiFi connections. This had to be expected; we know this stupid problem already from the mess with the German Corona App on Android. BBG again – Big Brother Google … No further comments required.

I had no real need for the App so far. After the basic setup of all the speaker’s internal settings (e.g. the equalizer) I can control the most needed adjustments via the handheld remote control accompanying the speakers. The “room calibration” feature of the App would have been nice – but it requires buying an additional piece of microphone equipment from Nubert for Android smartphones.

Sound quality

Do not expect a solid sound quality review from me. I have neither the equipment nor objective, trained ears for such a review. I can only describe an impression – very much in analogy to wine – a sort of personal sound “taste and feeling” after having heard a lot of music on the speakers. Do I like them with different kinds of music, vocals and instruments?

In a nightlong session I have also compared the Nupro X3000 capabilities with my old Elac 4π (4 Pi) speakers in the living room. They are controlled by NAD pre- and end-amplifiers plus a NAD CD player. I did the comparison with music pieces of very different styles. I really was astonished how well the small Nupro X3000 speakers could follow the 4π (4 Pi) Elac speakers and fill the room with sound and a solid bass base! Well, of course the Elacs do a better job with the bass at some point, but that is no wonder regarding their dimensions. Still, this first impression of the Nupro speakers was very convincing.

Then I moved the Elacs and Nupros boxes into my smaller working room – well, the Nupro X3000 at once felt much more adequate. They positioned different sound origins in the stereo sound cloud much more precisely – which is no wonder either. And they filled the whole room with music easily.

A hint: As the speakers work with a bass reflex opening at their backside you should not position the boxes directly at a wall – but leave some space.

Meanwhile, I have listened to a broad spectrum of music on these speakers – ranging from Eberhard Weber, Jan Garbarek, Kjetil Bjørnstad (with and without vocals), Laurie Anderson to compositions of Steve Reich, Rihm, Arvo Pärt and to recent recordings of classical music such as those of the Danish String Quartet or Sol Gabetta. Intermixed with stuff from Riverside, Korn, Linkin Park, Amorphis, Insomnium, Dark Tranquillity, In Flames and Rammstein. As well as a lot of classical symphony and opera recordings. And – as a very welcome side effect – I have rediscovered the wonders in the songs of Tom Waits.

You know what: All of it was pure joy – taking into account the sometimes strange, intentionally distorted mix you find in some heavy metal pieces.

In my opinion the balance between bass, mid-range and treble of the X3000 RC speakers is very good. You (almost) never lose the resolution of instruments covering different frequency regions. Some criticism in the audio press was directed at problems in the mid-range frequency area. Personally, I cannot confirm this. If there is some problem, I would bet it appears in larger rooms. But this is not the target environment of these speakers. In my working room the mid range appears very present – both with vocals and classical instruments. But, probably I do not know what high end sound really is … 🙂

I could not hear any bass resonances so far – with standard settings. But when you place the speakers close to a wall or corner you may want to reduce the low bass (< 100 Hz) a bit.

Summary: I very seldom use my Sennheiser headphones these days. I really do like the sound of these speakers.

Are there weaknesses? Well, the X3000 speakers have a little weakness at very low volume in my opinion – the relative weight of mid-range vs. bass changes to bass. May have to do with reflections in the room (or my hearing). But the advantage is that I have so far not felt any need for setting the loudness option to on.

Future options?

Now, I come to a point which makes the Nupro boxes also an investment into some future wireless audio infrastructure: For 135€ you get the NuConnect trX Wireless transceiver (https://www.nubert.de/nuconnect-trx/p4210/). This little brick allows e.g. for multi-room wireless solutions, but also for a transmission of digital signals from your PC or other sources to the active speakers.

Alternatively, you could also think about a combination of the trX Transceiver with the “NuControl 2 pre-amplifier” or (a cheaper) AmpX amplifier – both interesting products of Nubert. The latter amplifier uses in my understanding the same amplifying bricks as the active speakers, but now combined and supplemented with other electronics and thus turned into a full amplifier. The reviews of this 700 € amplifier are surprisingly good (see: https://www.nubert.de/nuconnect-ampx/p3646/?category=225).

So, the speakers mark an entrance into a much broader eco-system. In my case a completely digitized audio center on a Linux workstation combined with the trX transceiver, the X3000 speakers, the AmpX and other already existing audio equipment in different rooms appears on the horizon.

Sound support on my Linux system

Working with two soundcards
As I have two sound cards available I kept the three front speakers and the subwoofer box of my old Creative speaker set. The front speakers are placed on my working table – the subwoofer on the floor. This allows for astonishing surround feelings even with stereo sound. A little contribution of these desktop speakers to the louder sound coming from the X3000 in the background and you “swim in an extended audio space”. Interesting for some kinds of music. Here the Pulseaudio mixer (pavucontrol) on a Linux system is of advantage to balance sound contributions between the different channels of the active sound cards accurately and al gusto.

Regarding the Linux sound support in general
As a Linux user I have made my peace with Pulseaudio, pavucontrol, the Ladspa equalizer and KDE’s Phonon over the years. It is sometimes still a mess to reproduce working settings for multiple multi-channel sound cards after system upgrades – but once PA and Phonon do work as expected, they do their work well.

The last time when strange things happened was when I upgraded to Opensuse Leap 15.2. Reason: Substantial changes to the Phonon user interface combined with a loss of differentiated setting options. As a result I had to manipulate the directives in the PA configuration files locally in my home directory and below /etc/pulse to get everything right again. The loss or hiding of options is a sickness that has spread itself over central KDE applications during the last years …. I always make a backup of my personal PA settings in my home directory and central Alsa and PA settings, now.
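Such a backup can be automated with a few lines of Python. The following is just a sketch under assumptions: the function name backup_config_dirs and the layout of the backup directory are invented for this example. The real paths to save depend on your system; ~/.config/pulse, /etc/pulse and /etc/asound.conf are typical locations, but please check what exists on your machine. The demo below only works on placeholder files in a temporary directory.

```python
import shutil
import tempfile
import time
from pathlib import Path

def backup_config_dirs(sources, dest_root):
    """Copy existing files/directories from sources into a time-stamped folder."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root).expanduser() / f"audio-config-{stamp}"
    target.mkdir(parents=True)
    copies = []
    for i, src in enumerate(sources):
        src = Path(src).expanduser()
        # The index prefix avoids name clashes, e.g. two directories named "pulse".
        dst = target / f"{i:02d}_{src.name}"
        if src.is_dir():
            shutil.copytree(src, dst)
        elif src.is_file():
            shutil.copy2(src, dst)
        else:
            continue  # silently skip paths which do not exist
        copies.append(dst)
    return copies

# Demo with placeholder directories instead of the real configuration.
with tempfile.TemporaryDirectory() as tmp:
    user_cfg = Path(tmp) / "home" / ".config" / "pulse"
    system_cfg = Path(tmp) / "etc" / "pulse"
    for d in (user_cfg, system_cfg):
        d.mkdir(parents=True)
        (d / "default.pa").write_text("# placeholder\n")
    copies = backup_config_dirs(
        [user_cfg, system_cfg, Path(tmp) / "missing"], Path(tmp) / "backups"
    )
    names = [c.name for c in copies]
    all_present = all((c / "default.pa").is_file() for c in copies)
    print(names, all_present)

# Possible real-world call (paths are assumptions, adapt them to your system):
# backup_config_dirs(["~/.config/pulse", "/etc/pulse", "/etc/asound.conf"], "~/backups")
```

Running something like this ahead of every system upgrade gives you a dated copy to compare against when settings get lost.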

A major topic always is to find working settings which direct all sound output of any application through the Ladspa equalizer and then its output to multiple sound cards. On a KDE desktop such settings have to be consistent with Phonon settings – or the system will forget and overwrite your preferences with the next system start. Then you know that you have to manually change entries in the configuration files …

Be careful with your new speakers when experimenting and switching to new sound configurations – e.g. from analog to digital signals or changes of the sound card or moving from PA to pure Alsa. The resulting sound and, in some cases, also distortions may be louder than you expect! Always turn the volume of your external speakers to a minimum ahead of such experiments – and also reduce the volume of sound sources to a very low level.

During the last three to four years I have used the PA mixer “pavucontrol” to control the relative volumes of sound sources (i.e. applications) and the audio channels of the different sound cards on my system. But be careful with your settings here, too. In the past Pulseaudio did some strange things with audio signals from the system – e.g. turning them suddenly to 100%. I have not experienced such things in the past 3 years, but Nupro X boxes are too expensive to risk any accidental damage.

The 15-band PA Ladspa equalizer helps to define some basic sound presets with very slight adjustments – the Nupro speakers basically do not need any significant changes from a flat frequency curve of the equalizer.

Note that changes of the equalizer’s settings may be accompanied by a general volume reduction on pavucontrol and a loss of relative channel weights there. Saving (and losing) presets of the equalizer is no fun either. Some mess will probably always remain with PA … You just need to invest some time into balanced presets – and then do not touch the central equalizer again.

The good thing is that you can change the direction of the output of applications to a sound sink directly with pavucontrol. So, you can configure the sound output of music applications to run through an equalizer or not. Again – be careful with the impact of such changes on the volume.

My favorite player still is Clementine at a 48,000 or 96,000 Hz sampling rate. It offers its own equalizer. If you want to fiddle with an equalizer then use this one.

I extract sound from CD recordings with K3B, encoding to Ogg Vorbis or (lossless) Flac.

Conclusion

The active Nupro X3000 RC speakers are worth the money you have to pay for them. They suit any Linux workstation well. The connection options to sound sources are manifold. Basic analog cable connections work, of course. A USB connection was directly supported on my Opensuse Linux. Optical and SPDIF coax connections to respective output connectors of sound cards work well, too. The possibility to create a full WiFi based solution with some extra (135€) equipment from Nubert is an additional goody.

The setup and the configuration of a speaker pair were very simple. You get an included 5 band equalizer in each speaker, which allows for basic room and position adjustment.

The general sound quality is in my opinion and for my ears excellent. The speakers easily fill small rooms and even rooms of up to 40 square meters with sound and provide a solid bass. The balance between bass, mid range and treble fits my ears. Single instruments in complicated arrangements are well distinguished. The positioning of sources in the stereo range is very good.

Links

https://www.igorslab.de/en/welcher-passt-besser-nubert-nupro-x-3000-rc-oder-nupro-x-4000-rc-und-die-qual-der-wahl-2/4/
https://www.lite-magazin.de/2018/11/aktivlautsprecher-nubert-nupro-x-3000-kompakte-komplettloesung-auf-audiophilem-niveau/
https://www.technic3d.com/article/audio/lautsprecher/2087-test-aktive-kompaktbox-nubert-nupro-x-3000-rc/1.htm