This post requires Javascript to display formulas!
In this post series I discuss results of a private study about some simple statistical vector distributions in multi-dimensional latent vector spaces. Latent spaces often appear in Machine Learning contexts and can be represented by the ℜN. My main interest is:
What kind of regions of such a space may we miss by choosing a vector distribution based on a simple statistical creation process?
This problem is relevant for statistical surveys of extended regions in latent vector spaces which were filled by encoding or embedding Neural Networks. A particular reason for such a survey could be the study of the reaction of a Decoder to statistical vectors in an Autoencoder’s latent space. E.g. for creative purposes. During such surveys we want to fill extended regions of the latent space with statistical data points. More precisely: With points defined by vectors reaching out from the origin. The resulting point distribution does not need to be a homogeneous one, but it should cover the whole target volume somehow and should not miss certain sub-regions in it.
Theoretically derived results for a uniform probability distribution per vector component
In my last post
I derived some formulas for central properties of a very simple statistical vector distribution. We assumed that each component of the vectors could be created independently and with the help of a uniform, constant probability distribution: Each vector component was based on a random value taken from a defined real number interval [-b, b] with a constant and normalized probability density. Obviously, this process treats the components as statistically independent variables.
Resulting vector end points fill a quadratic area in a 2D-space or a cubic volume in 3D-space relatively well. See my last post for examples. The formulas revealed, however, that the end points of our vectors lie within a multi-dimensional spherical shell of an average radius <R>. This shell is relatively broad for small dimensions (N=2,3). But it gets narrower and narrower with a growing number of dimensions N ≥ 4.
In this post I will first test my formulas and approximations for a constant probability density in [-b, b] with the help of a numerical experiment. Afterward I discuss what kind of regions in a latent space we may miss even when we fill a sequence of growing cubes around the origin with statistical points based on our special vector distribution.