Google Colab, RAM, VRAM and GPU usage limits – I – no clear conditions over multiple sessions

I am a retired physicist with a hobby: Machine Learning [ML]. I travel sometimes, and I would like to work on my ML programs even when only a laptop with inadequate hardware is available. One of my ex-colleagues recommended Google Colab as a solution to my problem. Well, I am no friend of the tech giants, and for everything they offer as “free” cloud services you actually pay a lot, by handing over your personal data in the first place. My general experience is also that sooner or later you have to pay for the resources a serious project requires, i.e. when you want and need more than just a playground.

Nevertheless, I gave Colab a try some days ago. My first impression of the alternative “Paperspace” was unfortunately not a good one: “No free GPU resources” is not a good advertisement for a first-time visitor. When I afterward tried Google’s Colab I directly got a Virtual Machine [VM] providing a Jupyter environment and an optional connection to a GPU with a reasonable amount of VRAM. So, is everything nice with Google Colab? My answer is: Not really.

Google’s free Colab VMs have hard limits regarding RAM and VRAM. In addition, there are unclear limits on CPU/GPU usage over multiple sessions within an unspecified period of days. In this post series I first discuss some of these limits. In a second post I describe a few general measures on the coding side of ML projects which may help to make your project compatible with the RAM and VRAM limitations.

The 12.7 GB RAM limit of free Colab VMs

Even for mid-size datasets you soon feel the 12.7 GB RAM limit as a serious obstacle. Some RAM (around 0.9 to 1.4 GB) is already consumed by the VM for general purposes, so we are left with around 11 GB. My opinion: This is not enough for mid-size projects with either big amounts of text or hundreds of thousands of images – or both.

When I read about Colab I found articles on the Internet claiming that 25 GB RAM was freely available. The trick was to drive the VM into a crash by allocating too much RAM; afterward Google would generously offer you more RAM. Really? Nope! This has not worked since July 2020. Read through the discussions linked at the end of this post.

Google instead wants you to pay for Colab Pro. But as reports on the Internet will tell you: you still get only 25 GB RAM with Pro. So as soon as you want to do some serious work with Colab you are supposed to pay – a lot, in the case of Colab Pro+. This is what many professionals will do, as it often takes more time to rework the code than to just pay a limited amount per month. I shall go a different way …

Why is a high RAM consumption not always negative?

I admit: When I work on ML experiments on my private PCs, RAM is seldom a resource I think much about. I have enough RAM (128 GB) on one of my Linux machines for most of the things I am interested in. So, when I started with Colab I naively copied and ran cells from one of my existing Jupyter notebooks without much consideration. And pretty soon I crashed the VMs due to RAM exhaustion.

Well, normally we do not use RAM to the maximum for fun or to irritate Google. The basic idea of keeping the objects of an ML dataset in a Numpy array or tensor in RAM is a fast transfer of batch chunks to and from the GPU – you do not want a disk involved when you do the real number-crunching, especially not during training runs of a neural network. But the limits of Colab VMs make a different and more time-consuming strategy obligatory. I discuss elements of such a strategy in the next post.

15 GB of GPU VRAM

The GPU offer is OK from my perspective. The GPU is not the fastest available, but 15 GB of VRAM is something you can do a lot with. Still, there are datasets for which you may have to implement a batch-based data flow to the GPU via a Keras/TF2 generator. I will also discuss this approach in more detail in the next post.
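As a teaser, here is a minimal sketch of what I mean by such a generator – a Keras Sequence which reads image batches from a memory-mapped Numpy file instead of holding the whole dataset in RAM. All names, paths and shapes are illustrative assumptions, not values from my actual project:

import numpy as np
import tensorflow as tf

class ImgBatchSequence(tf.keras.utils.Sequence):
    # Feeds batches to the GPU without loading the full dataset into RAM
    def __init__(self, npy_path, batch_size=128):
        # mmap_mode='r' keeps the array on disk; only the accessed slices hit RAM
        self.ay_imgs    = np.load(npy_path, mmap_mode='r')
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.ay_imgs) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # assuming uint8 image data; normalize to [0, 1]
        batch = np.asarray(self.ay_imgs[sl], dtype=np.float32) / 255.0
        return batch, batch   # input == target for an Autoencoder

# Usage (illustrative): AE.model.fit(ImgBatchSequence("/home/ml_data/imgs_96x96.npy"), epochs=10)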

Sometimes: No access to a GPU or TPU

Whilst preparing this article I was “punished” by Google for my Colab usage during the last three days. My test notebook was not allowed to connect to a GPU any more – instead I was asked to pay for Colab Pro. Actually, this happened after some successful measures to keep RAM and VRAM consumption rather low during some “longer” test runs the day before. Two hours later – and after having worked on the VM’s CPU only – I got access to a GPU again. By what criterion? Well, you have no control over and no clear overview of the usage limits and how close you have come to such a limit (see below). And uncontrollable phases during which Google may deny you access to a GPU or TPU are not conditions you want to see in a serious project.

No clear resource consumption status over multiple sessions and no overview of general limitations

Colab provides an overview of RAM, GPU VRAM and disk space consumption during a running session. That’s it.

On a web page about Colab resource limitations you find the following statement (05/04/2023): “Colab is able to provide resources free of charge in part by having dynamic usage limits that sometimes fluctuate, and by not providing guaranteed or unlimited resources. This means that overall usage limits as well as idle timeout periods, maximum VM lifetime, GPU types available, and other factors vary over time. Colab does not publish these limits, in part because they can (and sometimes do) vary quickly. You can relax Colab’s usage limits by purchasing one of our paid plans here. These plans have similar dynamics in that resource availability may change over time.”

In short: Colab users get no complete information and have no control over resource access – independent of whether they pay or not. Not good. And there are no price plans for students or elderly people. We understand: in the mindset of Google’s management, serious ML is something for the rich.

The positive side of RAM limitations

Well, I am retired and have no time pressure in my ML projects. For me the positive side of limited resources is that you really have to care about splitting project processes into cycles over scalable batches of objects. In addition one must take care of Python’s garbage collection to free as much RAM as possible after each cycle. This is a good side-effect of Colab, as it teaches you to meet future resource limits on other systems.
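To give a first impression of what I mean – a hedged sketch of such a cleanup step after a processing cycle (the array name is purely illustrative; details follow in the next post):

import gc
import numpy as np
import tensorflow as tf

ay_batch = np.zeros((20000, 96, 96, 3), dtype=np.float32)   # stands in for real cycle data
# ... use ay_batch for one training or prediction cycle ...

del ay_batch                       # drop the reference to the big array
gc.collect()                       # let Python's garbage collector reclaim the RAM
tf.keras.backend.clear_session()   # also reset the Keras/TF2 backend state between cycles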

My test case

As you can see from other posts in this blog, I presently work with (Variational) Autoencoders and study data distributions in latent spaces. One of my favorite datasets is CelebA. When I load all of my prepared 170,000 training images into a Numpy array on my Linux PC, more than 20 GB of RAM is used (and I already use centered and cropped images with a 96×96 pixel resolution). This will not work on Colab. Instead we have to work consecutively with much smaller batches of images. From my image arrays I normally take slices and provide them to the GPU for training or prediction; the tool for this is a generator. This should work on Colab, too.

One of my neural network models for experiments with CelebA is a standard Convolutional Autoencoder (with additional Batch Normalization layers). The model was set up with the help of Keras for Tensorflow 2.

First steps with Colab – and some hints

The first thing to learn about Colab is that you can attach your Google MyDrive (which comes with a Google account) to the VM environment where you run your Jupyter notebooks. But you should not interactively work with data files and datasets on the mounted disk (at /content/MyDrive on the VM). The mount is done over a network and not via a local system bus. Transfers to MyDrive are pretty slow – slower, actually, than what I have experienced with sshfs-based mounts on other hosted servers. So: copy individual files to and from MyDrive, but work with such files in some directory on the VM (e.g. under /home) afterward.

This means: The first thing you have to take care of in a Colab project is a preparation process which copies your ML datasets, your own modules defining the details of your (Keras based) ML model architecture, model weights and maybe latent space data from MyDrive to the VM.
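A hedged sketch of such a preparation cell; the mount point and all file paths are my own assumptions and have to be adapted to your MyDrive layout:

from google.colab import drive
import os, shutil

drive.mount('/content/drive')      # requires an interactive authorization step

os.makedirs('/home/ml_data', exist_ok=True)

# Copy a packed dataset, own modules and model weights from MyDrive to the local VM disk
shutil.copy('/content/drive/MyDrive/ml/celeba_96x96.npy', '/home/ml_data/')
shutil.copy('/content/drive/MyDrive/ml/my_AE_model.py',   '/home/ml_data/')
shutil.copy('/content/drive/MyDrive/ml/AE_weights.h5',    '/home/ml_data/')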

A second thing you may have to do is to install some helpful Python modules which the standard Colab environment may not contain. One of them is the Nvidia smi interface for Python. It took me a while to find out that the right smi-module for current Python 3 versions is “nvidia-ml-py3”. So the required Jupyter cell command is:

!pip install nvidia-ml-py3
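Once installed, the module can be used to check the GPU’s VRAM consumption from within a notebook cell. A minimal sketch via the pynvml interface provided by this package:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # the first (and only) Colab GPU
info   = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("VRAM total: %.1f GB" % (info.total / 1024**3))
print("VRAM used : %.1f GB" % (info.used  / 1024**3))
pynvml.nvmlShutdown()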

Other modules (e.g. seaborn) can be installed under their standard names.

Conclusion

Google Colab offers you a free Jupyter based ML environment. However, you have no guarantee that you can always access a GPU or a TPU. In general the usage conditions over multiple sessions are not clear. This alone, in my opinion, disqualifies the free Colab VMs as an environment for serious ML projects. But if you have no money for adequate machines, Colab is at least good for development, limited tests and learning purposes.

In addition, the 12.7 GB limit on RAM usage is a problem when you deal with reasonably large datasets. It makes it necessary to split the work with such datasets into multiple batch-based steps. One also has to code in a way that allows Python’s garbage collection to free memory at short intervals. In the next post I present and discuss some simple measures to control RAM and VRAM consumption. It was a bit surprising for me that one sometimes has to take care of the Keras backend status manually to keep the RAM consumption low.

Links

Tricks and tests
https://damor.dev/your-session-crashed-after-using-all-available-ram-google-colab/
https://github.com/googlecolab/colabtools/issues/253
https://www.analyticsvidhya.com/blog/2021/05/10-colab-tips-and-hacks-for-efficient-use-of-it/

Alternatives to Google Colab
See a YouTube video by “1littlecoder”, who discusses three alternatives to Colab: https://www.youtube.com/watch?v=xfzayexeUss

Kaggle (which is also Google)
https://towardsdatascience.com/kaggle-vs-colab-faceoff-which-free-gpu-provider-is-tops-d4f0cd625029

Criticism of Colab
https://analyticsindiamag.com/explained-5-drawback-of-google-colab/
https://www.reddit.com/r/GoogleColab/comments/r7zq3r/is_it_just_me_or_has_google_colab_suddenly_gotten/
https://www.reddit.com/r/GoogleColab/comments/lgz04a/regarding_usage_limits_in_colab_some_common_sense/
https://github.com/googlecolab/colabtools/issues/1964
https://medium.com/codex/can-you-use-google-colab-free-version-for-professional-work-69b2ba4392d2

 

Autoencoders and latent space fragmentation – VII – face images from statistical z-points close to the latent space region of CelebA

I continue with my analysis of the z-point and latent vector distribution which a trained Autoencoder creates in its latent space for CelebA images. These images show human faces. Making the Autoencoder produce new face images from statistically generated latent vectors is a problem. See the previous posts in this series for the reasons.

Autoencoders and latent space fragmentation – I – Encoder, Decoder, latent space
Autoencoders and latent space fragmentation – II – number distributions of latent vector components
Autoencoders and latent space fragmentation – III – correlations of latent vector components
Autoencoders and latent space fragmentation – IV – CelebA and statistical vector distributions in the surroundings of the latent space origin
Autoencoders and latent space fragmentation – V – reconstruction of human face images from simple statistical z-point-distributions?

These problems are critical for a generative usage of standard Autoencoders. Generative tasks in Machine Learning very often depend on a clear and understandable structure of the latent space regions an Encoder/Decoder pair uses. In general we would like to create statistical latent vectors such that a reasonable object creation (here: image creation) is guaranteed. In the last post

Autoencoders and latent space fragmentation – VI – image creation from z-points along paths in selected coordinate planes of the latent space

we saw that we at least get some clear face features when we make use of some basic information about the shape and location of the z-point distribution for the images the AE was trained with. This distribution is specific to an Autoencoder, the image set used and the details of the training run. In our case the z-point distribution could be analyzed by rather simple methods after the training of an AE with CelebA images had been concluded. The number distribution curves per vector component revealed value limits for each latent vector component. The core of the z-point distribution itself appeared to occupy a single and rather compact sub-volume inside the latent space. (The exact properties depend on the AE’s layer structure and the training run.) Of the N=256 dimensions of our latent space only a few determined the off-origin position of the center of the z-point distribution’s core. This multidimensional core had an overall ellipsoidal shape. We could see this both from the Gaussian-like number distributions for the components and more directly from projections onto 2-dimensional coordinate planes. (We will have a closer look at these properties, which indicate a multivariate normal distribution, in forthcoming posts.)

As long as we kept the statistical values of the artificial latent vector components within the value ranges set by the distribution’s core, our chances rose significantly that the AE’s Decoder produced images with new and clearly visible faces. So far we have only used z-points along defined paths crossing the distribution’s core. In this post I will vary the components of our statistically created latent vectors a bit more freely. This will again show us that correlations of the vector components are important.

Constant probability for each component value within a component specific interval

In the first posts of this series I naively created statistical latent vectors from a common value range for all components. We saw that this was an inadequate approach – both for general mathematical and for problem specific reasons. The following code snippet shows an approach which takes into account the value ranges derived from the Gaussian-like number distributions for the individual components of the latent vectors for CelebA. The arrays “ay_mu_comp” and “ay_mu_hw” have the following meaning:

  • ay_mu_comp: Component values of a latent vector pointing to the center of the CelebA related z-point distribution
  • ay_mu_hw: Half-width of the Gaussian like number distribution for the component specific values

import numpy as np

# Assumed to be defined already (see text): z_dim, ay_mu_comp, ay_mu_hw, AE
num_per_row  = 7
num_rows     = 3
num_examples = num_per_row * num_rows

fact = 1.0

# Get component specific value ranges into a list of (left, right) borders
li_b = []
for j in range(0, z_dim):
    add_val = fact * abs(ay_mu_hw[j])
    b_l = ay_mu_comp[j] - add_val
    b_r = ay_mu_comp[j] + add_val
    li_b.append((b_l, b_r))

# Statistical latent vectors: uniform random values within the component borders
ay_stat_zpts = np.zeros( (num_examples, z_dim), dtype=np.float32 )
for i in range(0, num_examples):
    for j in range(0, z_dim):
        b_l = li_b[j][0]
        b_r = li_b[j][1]
        val_c = np.random.uniform(b_l, b_r)
        ay_stat_zpts[i, j] = val_c

# Prediction: let the Decoder reconstruct images for the statistical vectors
reco_img_stat = AE.decoder.predict(ay_stat_zpts)
# print("Shape of reco_img_stat = ", reco_img_stat.shape)

The main difference is that we now take random values from real value intervals defined per component. Within each interval we assume a constant probability density. The factor “fact” controls the width of the value intervals we use: a small value covers the vicinity of the center of the CelebA z-point distribution; a larger value of fact also leads to values in the border region of the z-point distribution.

Image results for different value ranges

(Grids of reconstructed images follow for fact = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0.)

Selected individuals

Below you find some individual images created for a variety of statistical vectors. They are ordered by growing distance from the center of the CelebA related z-point distribution.
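The ordering is trivial to compute with the arrays from the code above; a small sketch:

# Order the statistical latent vectors by their distance from the center
# of the CelebA z-point distribution (ay_mu_comp; see the code above)
dists        = np.linalg.norm(ay_stat_zpts - np.asarray(ay_mu_comp, dtype=np.float32), axis=1)
idx_ord      = np.argsort(dists)              # indices from smallest to largest distance
reco_ordered = reco_img_stat[idx_ord]         # reconstructed images in that order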

Quality? Missing correlations?

The first thing we see is that we get problems for all values of fact. Some images are OK, but others show disturbances, and the contrast of the face against the background is not well defined – even for small values of fact. The reason is that our random selection completely ignores correlations between the components. But we already know that there are major correlations between certain vector components.

For larger values of fact the risk of placing a generated latent vector outside the core of the CelebA z-point distribution gets bigger. Still, some images show interesting face variations.

Obviously, we have no control over the transitions from face to hair and from hair to background. Our suspicion is that micro-correlations of the latent vector components for CelebA images may encode the respective information. To understand this aspect we would have to investigate the vicinity of a z-point in a bit more detail.

Conclusion

We are able to create images with new human faces by using statistical latent vectors whose component values fall into component-specific real value intervals. We can derive the limits of these value ranges from the real z-point distribution which a trained AE creates for CelebA images. But again we saw:

One should not ignore major correlations between the component values.

We have to take better care of this point in a future post when we perform a transformation of the coordinate system to align with the main axes of the z-point distribution. But there is another aspect which is interesting, too:

Micro-correlations between latent vector components may determine the transition from faces to complex hair and background-patterns.

We can understand such component dependencies if we assume that the superposition, especially of small-scale patterns, which a convolutional Decoder must arrange during image creation is a subtle balancing act. A first step towards understanding such micro-correlations better could be to have a closer look at the nearest CelebA z-point neighbors of an artificially created latent z-point. If they form some kind of pattern, then maybe we can shift the components of our z-point a bit in the right direction?
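A brute-force sketch of such a neighborhood analysis; “ay_z_celeba” stands for an assumed array of CelebA z-points and “z_art” for an artificially created latent vector – neither name appears in my actual code:

import numpy as np

def nearest_celeba_neighbors(z_art, ay_z_celeba, k=10):
    # distances of the artificial z-point to all CelebA z-points
    dists = np.linalg.norm(ay_z_celeba - z_art, axis=1)
    idx_k = np.argsort(dists)[:k]             # indices of the k nearest CelebA z-points
    return idx_k, dists[idx_k]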

Or do we have to deal with correlations on a much coarser level? What do the Gaussians and the roughly elliptic form of the core of the z-point distribution for CelebA images really imply? This is the topic of the next post.

Autoencoders and latent space fragmentation – VIII – approximation of the latent vector distribution by a multivariate normal distribution and ellipses

 

Autoencoders and latent space fragmentation – VI – image creation from z-points along paths in selected coordinate planes of the latent space

It is well known that standard (convolutional) Autoencoders [AEs] cause problems when you want to use them for creative purposes. An example: creating images with human faces by feeding the Decoder of a suitably trained AE with random latent vectors does not work well. In this series of posts I want to identify the cause of this specific problem. Another objective is to circumvent some of the related obstacles and nevertheless create reasonably clear images. Note that I speak about standard Autoencoders, not Variational Autoencoders or transformer based Encoder/Decoder systems. For basic concepts, terms and methods see the previous posts:

Autoencoders and latent space fragmentation – I – Encoder, Decoder, latent space
Autoencoders and latent space fragmentation – II – number distributions of latent vector components
Autoencoders and latent space fragmentation – III – correlations of latent vector components
Autoencoders and latent space fragmentation – IV – CelebA and statistical vector distributions in the surroundings of the latent space origin
Autoencoders and latent space fragmentation – V – reconstruction of human face images from simple statistical z-point-distributions?

So far I have demonstrated that randomly generated vectors most often do not hit the relevant regions in the AE’s latent space – if we do not take some data specific precautions. A relevant region is a confined volume which a trained Encoder fills with z-points for its training objects after the training has been completed. z-points and corresponding latent vectors are the result of an encoding process which maps digitized input objects into the latent space. Depending on the data objects we may get multiple relevant regions or just one compact region. In the case of a convolutional AE which I had trained with the CelebA dataset of human face images I found a single region with a rather compact core.

In this post I want to create statistical latent vectors whose end-points are located inside the relevant region for CelebA images. Then I will create images from such latent vectors with the help of the AE’s Decoder. My hope is to get at least some images with clearly visible human faces. The basic idea behind this experiment is that the most important features of human faces are encoded by a few dominant vector components defining the overall position and shape of the multidimensional z-point region for CelebA images. We will see that the theory is indeed valid: Here is a first example for a vector pointing to an outer area of the core region for CelebA images in the latent space:

Our AE is a convolutional one. The number of latent space dimensions N was chosen to be N=256.
Note: We are NOT using a Variational Autoencoder, but a simple standard Autoencoder. The AE’s properties were discussed in previous posts.

What have we found out so far?

The Encoder of the convolutional AE, which I had trained with the CelebA dataset, mapped the human face images into a compact region of the latent space. The core of the created z-point distribution was located within or very close to a tiny hyper-volume of the latent space spanned by only a few coordinate axes. The confined multi-dimensional volume occupied by most of the z-points had an overall ellipsoidal shape with major extensions along a few main axes. We saw that some of the coordinates of the CelebA z-points and the components of the corresponding latent vectors were strongly correlated. In addition the value range of each of the latent vector components had specific individual limits – confining the angles and lengths of the vectors for CelebA. Therefore we had to conclude:

Whenever we base our method to create statistical vectors on the assumptions

  • that one can treat the vector components as independent statistical variables
  • that one can assign statistical values to the components from a common real value interval

the vectors will almost certainly not point to the relevant region. In addition one has to take into account unexpected mathematical properties of statistical vector distributions in high dimensional spaces. See the previous posts for more details. Indeed we could show that such a vector generation method missed the CelebA region.

Objective of this post

In this post I want to use some of the knowledge which we have gathered about the latent vector distribution for CelebA images. We shall use a very simple approach to probe the image reconstruction abilities of the Decoder for a defined variety of z-points:

We restrict the vectors’ component values such that most of the vectors point to the region formed by the bulk of CelebA z-points. To achieve this we define straight line segments which cross the ellipsoidal region of CelebA z-points. This is possible due to the known value intervals which we have identified for each of the components in a previous post. Then we place some artificial z-points onto our line segments. At least some of these z-points will fall into the relevant CelebA region. We then let the Decoder reconstruct images for the latent vectors corresponding to these z-points.

In some cases our paths will even respect some major component correlations, but for some paths I will explicitly disregard such correlations. Nevertheless our rather simple restrictions imposed on the vector-component values will already enable us to produce images with clearly recognizable face features.

Among other things our results confirm the idea that the real pixel correlations for basic face features are represented by relatively narrow limits for the angles and lengths of respective latent vectors. The extension and shape of the bulk region of CelebA z-points is defined by only a few latent vector components. These components apparently encode a prescription for the (convolutional) Decoder to create face features by a superposition of some elementary patterns extracted during the AE’s training.

A path from the latent space origin to the center of the relevant z-point region

How do we restrict latent vectors to the required value ranges? In the 2nd post we saw that the number distribution curve for the values of each latent vector component was very similar to a Gaussian. We identified the mean value and average value range for each component by analyzing its specific distribution curve. The mean values gave us the coordinates of the center of the relevant latent space region. In addition we, of course, know the coordinates of the origin of the latent space. So, for a first test, let us create a multi-dimensional line segment between the origin and the center of the CelebA z-point distribution. And let the AE’s Decoder create images for latent vectors pointing to some intermediate z-points along this path.
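A hedged sketch of how such a path and the corresponding Decoder images can be produced. Here “ay_mu_comp” denotes an assumed array with the component mean values (the center coordinates); the number of points on the path is my choice:

import numpy as np

num_pts  = 6
t_vals   = np.linspace(0.0, 1.0, num_pts)     # 0 = origin, 1 = distribution center
ay_path  = np.outer(t_vals, ay_mu_comp).astype(np.float32)   # shape: (num_pts, z_dim)

imgs_path = AE.decoder.predict(ay_path)       # decode the z-points along the path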

The following plots show orthogonal projections of 5000 CelebA z-points (in blue) onto some 2-dimensional planes, each spanned by two selected coordinate axes. The yellow dot indicates the origin; the orange dot marks the center of the z-point distribution. Red dots indicate coordinates of points along the straight path between the origin and the distribution center.

Please take note of the different scales on the x- and y-axes: some distributions are much more elongated than the scaled images show. That some paths appear shorter than others is due to the projection of the diagonal line through the multi-dimensional space onto planes which are differently oriented with respect to this line. A simple 3D analog should make this clear. Some small wiggles in the positions of the red dots are due to resolution problems of the plot in the browser interface. We also see a reflection of the fact that the origin is located in a border region of the bulk.

Below you see a plot which shows the path in higher resolution (projected onto a particular plane):

Again: Take note of the different axis scales. The blue dot distribution is much more stretched in C1-direction than it appears in the plot.

Ok, now we have a multidimensional path and six well defined latent vectors for the end points and intermediate points on this path. So let us provide these vectors as input to our AE’s Decoder. The resulting images look like this:

Success! Images in the surroundings of the center show a clearly visible face. And we also see: The average face at the center of the z-point distribution is female – at least according to the CelebA dataset. 🙂 However: In the vicinity of the origin of the latent space we get no images with reasonable face features.

Images along a path within a selected coordinate plane for two dominant vector components

I now choose a different path within the plane spanned by the coordinate axes 151 and 195. This is depicted in the plot below:

A look into the second post shows that the components 151 and 195 were members of the group of dominant components. Those were components for which the number distribution showed a mean value at some distance from the origin of the latent space and also had a half-width bigger than 1.0 (as do most of the other components). The images reconstructed by the Decoder from the latent vectors are:

Hey, we get some variation – as expected. Now, let us rotate the path in the plane:

Not so much of a difference. But we have learned that a variation of some vector component values within the allowed value ranges may already give us some major variation in the facial expressions.
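As a sketch of how such in-plane paths can be constructed: all components are pinned to the center values and only the two selected components are varied along a line through the center. The names ay_mu_comp (mean values) and ay_mu_hw (half-widths) are assumed arrays; the rotation angle is an illustrative choice:

import numpy as np

j1, j2  = 151, 195                             # the two dominant components of this plane
num_pts = 7
angle   = 0.25 * np.pi                         # rotates the path within the plane
t_vals  = np.linspace(-1.0, 1.0, num_pts)      # -1 ... +1 times the half-widths

ay_path = np.tile(np.asarray(ay_mu_comp, dtype=np.float32), (num_pts, 1))
ay_path[:, j1] += t_vals * abs(ay_mu_hw[j1]) * np.cos(angle)
ay_path[:, j2] += t_vals * abs(ay_mu_hw[j2]) * np.sin(angle)

imgs_plane = AE.decoder.predict(ay_path)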

Images for other coordinate planes

The following images show the variations for paths in other coordinate planes. All of the paths have in common that they pass through the center of the CelebA bulk region. For the first four examples I kept the paths within the core region of CelebA z-points. The last examples show images for paths with z-points at the core’s border regions or a bit outside of them.

(Image series follow for paths in the planes spanned by the axis pairs (5, 8), (17, 180), (44, 111), (55, 56), (15, 242), (58, 202), (68, 178), (177, 202) and (180, 242).)

The images for z-points farther away from the bulk’s center show more interesting variations. But obviously, in the outer areas of the CelebA region, correlations between the latent vector components become more important if we want to avoid irregular and unrealistic disturbances. All in all we also get the impression that a much more subtle correlation of component values is key to the reproduction of realistic transitions to the hairdos presented in the CelebA images and to realistic background patterns. The components of our latent vectors are still too uncorrelated for such details and for an appropriate superposition of micro-patterns in the images created by the Decoder.

Conclusion

This post shows that we do not need a Variational Autoencoder to produce images with recognizable human faces from statistical latent vectors. We can get image reproductions with varying face features also from the Decoder of a standard convolutional Autoencoder. A basic requirement seems to be that we keep the vector components within reasonable value intervals. The valid component-specific value ranges are defined by the shape of the compact hyper-volume which an AE’s Encoder fills with z-points for its training objects. So we need to construct statistical latent vectors which point to this specific sub-region of the latent space. Vectors with arbitrary components will almost certainly miss this region and give no interpretable image content.

In this post we have looked at vectors defining z-points along specific line segments in the latent space. Some of the paths were explicitly kept within the inner core region of the z-point distribution for CelebA images. From these z-points the most important face features were clearly reconstructed. But we also saw that some micro-correlations of the latent vector components seem to control the appearance of the background as well as the transitions from face to hair and from hair to the background environment.

I have not yet looked at line segments which do not cross the center of the bulk of the z-point distribution for CelebA images in the latent space. But in the next post

Autoencoders and latent space fragmentation – VII – face images from statistical z-points close to the latent space region of CelebA

I first want to look at z-points for which we relatively freely vary the component values within ranges given by the respective number distributions.