Autoencoders and latent space fragmentation – VII – face images from statistical z-points within the latent space region of CelebA

I continue with my analysis of the z-point and latent vector distribution a trained Autoencoder creates in its latent space for CelebA images. These images show human faces. To make the Autoencoder produce new face images from statistically generated latent vectors is a problem. See some previous posts in this series for reasons.

Autoencoders and latent space fragmentation – I – Encoder, Decoder, latent space
Autoencoders and latent space fragmentation – II – number distributions of latent vector components
Autoencoders and latent space fragmentation – III – correlations of latent vector components
Autoencoders and latent space fragmentation – IV – CelebA and statistical vector distributions in the surroundings of the latent space origin
Autoencoders and latent space fragmentation – V – reconstruction of human face images from simple statistical z-point-distributions?

These problems are critical for a generative usage of standard Autoencoders. Generative tasks in Machine Learning very often depend on a clear and understandable structure of the latent space regions an Encoder/Decoder pair uses. In general we would like to create statistical latent vectors such that a reasonable object creation (here: image creation) is guaranteed. In the last post

Autoencoders and latent space fragmentation – VI – image creation from z-points along paths in selected coordinate planes of the latent space

we saw that we at least get some clear face features when we make use of some basic information about the shape and location of the z-point distribution for the images the AE was trained with. This distribution is specific for an Autoencoder, the image set used and details of the training run. In our case the z-point distribution could be analyzed by rather simple methods after the training of an AE with CelebA images had been concluded. The number distribution curves per vector component revealed value limits per latent vector component. The core of the z-point distribution itself appeared to occupy a single and rather compact sub-volume inside the latent space. (The exact properties depend on the AE’s layer structure and the training run.) Of the N=256 dimensions of our latent space only a few determined the off-origin position of the center of the z-point distribution’s core. This multidimensional core had an overall ellipsoidal shape. We could see this both from the Gaussian like number distributions for the components and more directly from projections onto 2-dimensional coordinate planes.

As long as we kept the statistical values for artificial latent vector components within the value ranges set by the distribution’s core our chances that the AE’s Decoder produced images with new and clearly visible faces rose significantly. So far we have only used z-points along defined paths crossing the distributions core. In this post I will vary the components of our statistically created latent vectors a bit more freely. This will again show us that component correlations are important.

Constant probability for each component value within a component specific interval

In the first posts of this series I naively created statistical latent vectors from a common value range for the components. We saw this was an inadequate approach – both for general mathematical and for problem specific reasons. The following code snippets shows an approach which takes into account value ranges coming from the Gaussian-like distributions for the individual components of the latent vectors for CelebA. The arrays “ay_mu_comp” and “ay_mu_hw” have the following meaning:

  • ay_mu_comp: Component values of a latent vector pointing to the center of the CelebA related z-point distribution
  • ay_mu_hw: Half-width of the Gaussian like number distribution for the component specific values
num_per_row  = 7
num_rows     = 3
num_examples = num_per_row * num_rows

fact = 1.0

# Get component specific value ranges into an array 
li_b = []
for j in range(0, z_dim):  
    add_val = fact*abs(ay_mu_hw[j])
    b_l = ay_mu_comp[j] - add_val
    b_r = ay_mu_comp[j] + add_val
    li_b.append((b_l, b_r))
    
# Statistical latent vectors
ay_stat_zpts = np.zeros( (num_examples, z_dim), dtype=np.float32 )     
for i in range(0, num_examples): 
    for j in range(0, z_dim):
        b_l = li_b[j][0]
        b_r = li_b[j][1]
        val_c = np.random.uniform(b_l, b_r) 
        ay_stat_zpts[i, j] = val_c

# Prediction 
reco_img_stat = AE.decoder.predict(ay_stat_zpts)
# print("Shape of reco_img = ", reco_img_stat.shape)

The main difference is that we take random values from real value intervals defined per component. Within each interval we assume a constant probability density. The factor “fact” controls the width of the value interval we use. A small value covers the vicinity of the center of the CelebA z-point distribution; a larger fact leads to values at the border region of the z-point distribution.

Image results for different value ranges

fact=0.4

fact=0.5

fact=0.6

fact=0.7

fact=0.8

fact=0.9

fact=1.0

Selected individuals

Below you find some individual images created for a variety of statistical vectors. They are ordered by a growing distance from the center of the CelebA related z-point distribution.

Quality? Missing correlations?

The first thing we see is that we get problems for all factors. Some images are OK, but others show disturbances and the contrasts of the face against the background are not well defined – even for small factors fact. The reason is that our random selection ignores correlations between the components completely. But we know already that there are major correlations between certain vector components.

For larger values of fact the risk to place a generated latent vector outside the core of the CelebA z-point distribution gets bigger. Still some images interesting face variations.

Obviously, we have no control over the transitions from face to hair and from hair to background. Our suspicion is that micro-correlations of the latent vector components for CelebA images may encode the respective information. To understand this aspect we would have to investigate the vicinity of a z-point a bit more in detail.

Conclusion

We are able to create images with new human faces by using statistical latent vectors whose component values fall into component specific defined real value intervals. We can derive the limits of these value ranges from the real z-point distribution for CelebA images of a trained AE. But again we saw:

One should not ignore major correlations between the component values.

We have to take better care of this point in a future post when we perform a transformation of the coordinate system to align with the main axes of the z-point distribution. But there is another aspect which is interesting, too:

Micro-correlations between latent vector components may determine the transition from faces to complex hair and background-patterns.

We can understand such component dependencies when we assume that the superposition especially of small scale patterns a convolutional Decoder must arrange during image creation is a subtle balancing act. A first step to understand such micro-correlations better could be to have a closer look at the nearest CelebA z-point neighbors of an artificially created latent z-point. If they form some kind of pattern, then maybe we can change the components of our z-point a bit in the right direction? This is the topic of the next post.

 

Autoencoders and latent space fragmentation – V – reconstruction of human face images from simple statistical z-point-distributions?

In this post series we have so far studied the distribution of latent vectors and z-points for CelebA images in the latent space of an Autoencoder [AE]. The CelebA images show human faces. We want to reconstruct images with new faces from artificially created, statistical latent vectors. Our latent space had a number dimensions N=256. For basics of Autoencoders, terms like latent space, z-points, latent vectors etc. see the first blog of this series:

Autoencoders and latent space fragmentation – I – Encoder, Decoder, latent space

During the experiments and analysis discussed in the other posts

Autoencoders and latent space fragmentation – II – number distributions of latent vector components
Autoencoders and latent space fragmentation – III – correlations of latent vector components
Autoencoders and latent space fragmentation – IV – CelebA and statistical vector distributions in the surroundings of the latent space origin

we have learned that the multi-dimensional latent space volume which the Encoder of a convolution AE fills densely with z-points for CelebA images has a special shape and location. The bulk of z-points is confined to a multi-dimensional ellipsoid which is relatively small. Its center has a position, which does not coincide with the latent space’s origin. The bulk center is located within or very close to a hyper-sub-volume spanned by only a few coordinate axes.

We also saw also that we have difficulties to hit this region by artificially created z-points via latent vectors. Especially, if the approach for statistical vector generation is a simple one. In the beginning we had naively assumed that we can treat the vector components as independent variables. We assigned each vector component values, which we took from a value-interval [-b, b] with a constant probability density. But we saw that such a simple and seemingly well founded method for statistical vector creation has a variety of disadvantages with respect to the bulk distribution of latent vectors for CelebA images. To name a few problems:

  • The components of the latent vectors for CelebA images are correlated and not independent. See the third post for details.
  • If we choose b too large (&gt. 2.0) then the length or radius values of the created vectors will not fit typical radii of the CelebA vectors. See the 2nd post for details.
  • The generated vectors only correspond to z-points which fill parts of a thin multi-dimensional spherical shell within the cube filled by points with coordinate values -b < x_j < +b. This is due to mathematical properties of the distribution in multi-dimensional spaces. See the second post for more details.
  • Optimal parameter values are 1.0 < b < 2.0, just to guarantee at least the right vector length. However, due to the fact that the origin of the latent space is located in a border region of the CelebA bulk z-point distribution, most points will end up outside the bulk volume.

In this post I will show you that our statistical vectors and the related z-points do indeed not lead to a successful reconstruction of images with clearly visible human faces. Afterward I will briefly discuss whether we still can have some hope to create new human face images by a standard AE’s Decoder and statistical points in its latent space.

Relevant values of parameter b for the interval from which we choose statistical values for the vector components

What does the number distribution for the length of latent CelebA vectors look like? For case I explained in the 2nd post of this series we get:

For case II:

Now let me remind you of a comparison with the lengths of statistical vectors for different values of b:

See the 2nd post of this series for more details. Obviously, for our simple generation method used to create statistical vectors a parameter value b = 1.5 is optimal. b=1.0 an b=2.0 mark the edges of a reasonable range of b-values. Larger or smaller values will drive the vector lengths out of the required range.

Image reconstructions for statistical vectors with independent component values x_j in the range -1.5 < x_j < 1.5

I created multiples of 512 images by feeding 512 statistical vectors into the Decoder of our convolutional Autoencoder which was trained on CelebA images (see the 1st and the 2nd post for details). The vector components x_j fulfilled the condition :

-1.5 ≤   x_j   ≤ 1.5,    for all j in [0, 256]

The result came out to be frustrating every time. Only singular image showed like outlines of a human face. The following plot is one example out of series of tenths of failures.

And this was an optimal b-value! 🙁 But due to the analysis of the 4th post in this series we had to expect this.

Image reconstructions for statistical vectors with independent component values x_j in the range -1.0 < x_j < 1.0

The same for

-1.0 ≤   x_j   ≤ 1.0,    for all j in [0, 256]

Image reconstructions for statistical vectors with independent component values x_j in the range -2.0 < x_j < 2.0

The same for

-2.0 ≤   x_j   ≤ 2.0,    for all j in [0, 256]

Image reconstructions for statistical vectors with independent component values in the range -5.0 < x_j < 5.0

If you have read the previous posts in this series then you may think: Some of the component values of latent vectors for CelebA images had bigger values anyway. So let us try:

-1.0 ≤   x_j   ≤ 1.0,    for all j in [0, 256]

And we get:

Although some component values may fall into the value regions of latent CelebA vectors most of them do not. The lengths (or radii) of the statistical vectors for b=5.0 do not at all fit the average radii of latent vectors for CelebA images.

What should we do?

I have described the failure to create reasonable images from statistical vectors in some regions around the origin of the latent space also in other posts on standard Autoencoders and in posts on Variational Autoencoders [VAEs]. For VAEs one enforces that regions around the origin are filled systematically by special Encoder layers.

But is all lost for a standard Encoder? No, not at all. It is time that we begin to use our knowledge about the latent z-point distribution which our AE creates for CelebA images. As I have shown in the 2nd post it is simple to get the number distribution for the component values. Because these distributions were similar to Gaussians we have rather well defined value regions per component which we can use for vector creation. I.e. we can perform a kind of first order approximation to the main correlations of the components and thus put our artificially created values inside the densely populated bulk of the z-point region for CelebA images. If that region in the latent space has a meaning at all then we should find some interesting reconstruction results for z-points within it.

We can do this even with our simple generation method based on constant probability densities, if we use individual value regions -b_j ≤ x_j ≤ b_j for each component. (Instead of a common value interval for all components). But even better would be Gaussian approximations to the real number distributions for the component values. In any case we have to restrict the values for each component much stronger than before.

Conclusion

What we already had expected from previous analysis became true. A simple method for the creation of statistical latent vectors does not give us z-points which are good for the creation of reasonable images by a trained AE’s Decoder. The simplifications

  1. that we can treat the component values of latent vectors as independent variables
  2. and that we can assign the components x_j values from a common interval -b ≤ x_j &le b

cause that we miss the bulk region in the AE’s latent space, which gets filled by its trained Encoder. We have demonstrated this for the case of an AE which had been trained on images of human faces.

In the next post

Autoencoders and latent space fragmentation – VI – image creation from z-points along paths in selected coordinate planes of the latent space

I will show that even a simple vector creation can give us latent vectors within the region filled for CelebA images. And we will see that such points indeed lead to the reconstruction of images with clearly visible human face images by the Decoder of a standard AE trained on the CelebA dataset.