KMeans as a classifier for the WIFI and MNIST datasets – II – PCA in combination with KMeans for the WIFI-example

I continue with my series of posts about using KMeans as a classifier for some simple ML datasets. In the last post

KMeans as a classifier for the WIFI and MNIST datasets – I – Cluster analysis of the WIFI example

I applied KMeans to the “WIFI” dataset which was discussed in the German “Linux Magazin” (see the November and December 2021 editions). We found 4 to 5 well separated clusters in a projection of the data points onto a 2-dimensional space defined by just two out of seven features.

In this post I briefly discuss how this result is related to a PCA analysis. In my opinion the discussion in the “Linux Magazin” was a bit misleading on this point.

PCA analysis

A PCA analysis helps to identify the main orthogonal axes of the distribution of the samples’ data-points in their multidimensional “feature space”. If the data points of the samples are distributed in all directions and over all regions of the feature space in a similar way we may indeed need all of the feature space’s dimensions to describe the data distribution. Still, we could find some complicated curved hyperplanes which separate groups of data points with identical labels quite well.

But often the data points are positioned along certain preferred directions, i.e. the data points are located along specific lines or (multidimensional) flat planes in the feature space – notwithstanding an additional clustering. In such cases the data distribution may exhibit some intrinsic major axes in the feature space AND/OR the distribution may be confined to a subspace of the original feature space. This sub-space is defined and spanned by fewer axes than the original space. We speak of the “primary components” of the data distribution when we refer to the (orthogonal) axes of such a sub-space.

Of course, we can span the full original feature space by the most important primary component axes plus some extra orthogonal axes. I.e., we just need the same number of main components as the dimension of the original feature space. Then we get a new coordinate system which describes the vector space of the original features in a different way: The differences are

  • another orientation of the main component axes in comparison with the original axes,
  • a difference in the position of the origins of the two coordinate systems.

Of course, there is a mathematically well defined transformation which maps coordinates of data points with respect to the original feature space’s axes to coordinates in a coordinate system defined by the main component axes.

Of course, we can project a vector describing a data point onto unit vectors along the axes of either coordinate system. By using such components we can describe the data distribution in terms of a sub-space spanned by only the most important primary component axes. A projection of ML data points onto primary component axes is thus equivalent to a reduction of the dimensionality of the ML problem: Whenever fewer main component axes than the feature space’s number of dimensions describe the data distribution reasonably well, we project the original data vectors onto the axes of the main components’ lower-dimensional coordinate system.

How do we measure the “importance” of a main or primary component?

A metric for the importance of a primary component axis is its contribution to the so called “explained variance”. This quantity measures correlations of data in the original feature space and thus also the amount of information residing in the data.

From a mathematical point of view determining the main component axes corresponds to the diagonalization of the so called covariance matrix and the identification of its eigenvectors. (You can find a short explanation in the book of S. Raschka on “Python Machine Learning”, 2016, PACKT, or the book of J. Frochte on “Maschinelles Lernen”, 2019, Carl Hanser.) The eigenvectors define the directions of the main components’ axes. The variation of the data along such a main component axis contributes to the total variance by its measure of specific correlations, i.e. weighted quadratic data point distances. We are interested in finding those main component axes for which the projected data distribution explains most of the data variation.
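For readers who want to see this math at work: below is a minimal NumPy sketch which diagonalizes the covariance matrix of a (hypothetical) data array and sorts the eigenvectors by their contribution to the explained variance. The random array only serves as a placeholder.

```python
import numpy as np

# hypothetical placeholder data: 2000 samples, 7 features
X = np.random.rand(2000, 7)

# center the data and compute the covariance matrix of the features
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)      # shape (7, 7)

# diagonalization: eigenvalues measure the variance along each eigenvector
eig_vals, eig_vecs = np.linalg.eigh(cov)    # eigh, as the covariance matrix is symmetric

# sort in descending order of the explained variance
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

print(eig_vals / eig_vals.sum())            # normalized contribution of each main component
```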

Note that the axes which a PCA-analysis determines are orthogonal axes. Thus PCA describes a transformation of vectors in the feature space from one coordinate system with orthogonal axes to another coordinate system with orthogonal axes. Geometrically, this can be described by a sequence of a translation, followed by a rotation – and in case of a dimensionality reduction in addition by a projection.

Let us visualize an example. The following picture shows the surface of an asymmetric “ellipsoidal” data distribution in a 3-dim feature space. (Actually, the image does not display a real ellipsoid – but we ignore the differences for reasons of simplicity.) The dark arrows in the picture indicate the orthogonal axes and unit-vectors per dimension of the 3-dim feature space. The colored arrows of the ellipsoid show the direction of the main component axes of the “ellipsoid”.

Regarding the dimensions of the ellipsoid the elongation along the red vector is biggest. The width of the data point distribution in the direction of the blue vector is significantly smaller, but still bigger than in the direction of the green vector. So, we would expect that the two main component axes in the directions of the red and the blue arrows explain most of the data variation in the feature space. The projections of the data point vectors in a coordinate system defined by the main component axes onto the “green” axis would only give us rather small values. Therefore, we can reduce the dimensionality of the problem by describing the data distribution in only two dimensions: We project the vectors of the original data points in the feature space onto the two most important main component axes – in the directions of the red and the blue unit vectors.

Note: The axis in the direction of the dominant elongation of the ellipsoid has a diagonal orientation in the feature space. This means that all of the three original features contribute to the data points’ distribution in this direction. Therefore, the following point must be underlined:

A PCA analysis is not a selection process with respect to the original features. A vector describing a main component axis of the data distribution is a linear combination of unit vectors along all of the original axes of the feature-space. Normally many – if not all – original features contribute to a main component axis. Without a detailed analysis you cannot assume that only one or two original features determine a primary component axis.

In particular: You cannot assume that only one special feature dominates a main component axis. Unfortunately, the text in the Linux Magazin on the WIFI example could be read and interpreted in this way. So, let us have a closer look at the reason, why the projections of the WIFI-data from the original 7-dimensional feature-space onto a reduced 2-dim space of the signal-combinations [WLAN-4, WLAN-0] or [WLAN-3, WLAN-0] worked so well.

How many main components dominate the variance of the WIFI data distribution?

How many main components explain most of the variation of the samples’ data of the WIFI example? How do we get the required information about the contributions to the explained variance from PCA applications? Sklearn’s PCA implementation provides the usual “fit()”-method to perform the PCA calculation for a given data set. After the fit the PCA object exposes an array named “explained_variance_” which contains the individual contributions of the main components to the “explained variance” – and an array “explained_variance_ratio_” with the normalized contributions.
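Just to illustrate the interface, a minimal sketch; the file name assumes that the dataset from the UCI link given in the first post resides in the working directory:

```python
import numpy as np
from sklearn.decomposition import PCA

# assumption: the UCI file has been downloaded to the working directory
data = np.loadtxt("wifi_localization.txt")
X = data[:, :7]                                   # the 7 WLAN signal strengths, unscaled

pca = PCA()                                       # keep all 7 main components
pca.fit(X)

print(pca.explained_variance_)                    # absolute contributions per component
print(pca.explained_variance_ratio_)              # normalized contributions (used for the bar plot)
print(np.cumsum(pca.explained_variance_ratio_))   # accumulated percentages
```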

So, as a first step, we simply apply Sklearn’s PCA directly to the original WIFI data given in their 7-dimensional feature space – i.e. without any scaling, just as it was done in the Linux Magazin. The bar plot below displays the “importance” of a maximum of 7 main components. The importance of each component is measured by its normalized contribution to the “explained variance”:

An accumulation gives the following percentages:

We see that only two of the main PCA components already explain 85% of the data variance in the feature space. We thus could choose these two main components as our “primary” components. But this does NOT automatically mean that only two features dominate the data distribution in the feature space.

Which features determine the primary components’ orientation?

As we have discussed above the projection of a unit vector along the main component axis onto the original axes of the feature-space may give similarly big values for each of the features. It is rather seldom that only a few features determine the direction of a main component’s axis.

But: The WIFI data are (on purpose?) indeed distributed in a very special way. Actually, only two original features already determine the direction of the most important primary component’s axis quite well. And just one feature dominates the second most important PCA component. So, in the WIFI example the first and the second main components are oriented more or less within a 2-dim feature plane and along a special feature axis, respectively. I.e. the data are more or less confined to a 3-dimensional sub-space of the feature space. How and from where did I get this information?

Sklearn’s PCA() provides an object which, after a call to its method “fit()”, has a filled property “components_”. This array gives us the 7 vector-components of each unit vector oriented along a PCA main component axis with respect to the various original axes of the feature space. I.e. we get the components of the unit vectors along the main component axes in terms of the original coordinate system spanning our feature space. The related coefficients of such a vector tell us whether the respective main component axis is confined to a subspace spanned by just a few original features.
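A sketch of how this property can be inspected – under the same assumptions as in the sketch above; I print absolute values, since only the relative weights of the features matter here:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.loadtxt("wifi_localization.txt")[:, :7]   # assumed file location, unscaled signals
pca = PCA().fit(X)

# components_ has shape (7, 7): one row per main component,
# already sorted by decreasing contribution to the explained variance
for i, comp in enumerate(pca.components_, start=1):
    print(f"PCA-{i}:", np.round(np.abs(comp), 3))
```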

Below, the elements (rows) of the array are sorted in descending order of the respective component’s contribution to the “explained variance”. So the first two rows correspond to the two most important PCA main components.

[[6.22e-01 7.47e-04 1.03e-02 6.29e-01 2.02e-01 3.03e-01 2.91e-01]
 [1.71e-01 9.05e-02 4.31e-01 1.95e-01 8.50e-01 1.02e-01 7.62e-02]
 [2.41e-01 2.78e-01 2.18e-01 2.90e-01 9.80e-02 5.28e-01 6.67e-01]
 [1.67e-01 3.44e-01 7.05e-01 1.69e-01 4.22e-01 9.65e-02 3.75e-01]
 [1.13e-01 1.77e-01 3.15e-01 1.33e-01 1.82e-01 6.97e-01 5.66e-01]
 [6.37e-01 2.31e-01 3.28e-01 6.45e-01 1.26e-01 1.32e-02 3.22e-02]
 [2.80e-01 8.43e-01 2.50e-01 1.42e-01 2.43e-02 3.52e-01 5.12e-02]]

The first primary component has vector-components along the original axes of the feature space given by

6.22e-01, 7.47e-04, 1.03e-02, 6.29e-01, 2.02e-01, 3.03e-01, 2.91e-01

The features, i.e. the WLAN signals, are numbered in the given order from left to right. Obviously, the features “WLAN-0” and “WLAN-3” dominate the direction of the first primary component.

The second main component

1.71e-01, 9.05e-02, 4.31e-01, 1.95e-01, 8.50e-01, 1.02e-01, 7.62e-02

is instead dominated by the feature “WLAN-4”.

Actually, a closer look shows that the signal “WLAN-6” dominates the third main component with a relatively big value. This might be an indication that we had better use three primary components instead of two … I come back to this point below.

So, what do we learn from the results of the projections of the PCA unit vectors onto the axes of the feature space?

  1. In the very special case of the WIFI example around 3 original features dominate the overall data distribution.
  2. In the WLAN-0/WLAN-3 plane we should see an approximate diagonal distribution of data. Reason: the vector components are of almost equal size.
  3. As we already know from an elbow-analysis we have 4 to 5 clusters. So, we should see them clearly in a 3D-plot for the axes WLAN-0, WLAN-3 and WLAN-4 (see the sketch after this list).
  4. If the data are well separated into clusters along the diagonal in the WLAN-0/WLAN-3 plane AND the WLAN-4 direction then they will also be well separated in the 2-dim space of the 2 main PCA components.
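A minimal Matplotlib sketch for such a 3D scatter plot; the file location and the mapping of column indices to the WLAN numbering are my assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("wifi_localization.txt")       # assumed file location
X, y = data[:, :7], data[:, 7].astype(int)       # 7 signals plus the room label

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# the three dominant original features, colored by the room label
ax.scatter(X[:, 0], X[:, 3], X[:, 4], c=y, cmap='viridis', s=8)
ax.set_xlabel('WLAN-0'); ax.set_ylabel('WLAN-3'); ax.set_zlabel('WLAN-4')
plt.show()
```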

Ok, let us visualize it. The next plot shows the data distribution from a view almost perpendicular to the WLAN-3/WLAN-4 plane. The colors indicate the labels of the data (i.e. the rooms where the strength of each of the 7 WLAN signals has been measured).

3D-plot of the WIFI data distribution in the space of the dominant 3 original features – the WLAN-3/WLAN-4 plane

We already see the clustering. The next plot shows the data distribution from a direction almost perpendicular to the WLAN-0/WLAN-4 plane.

3D-plot of the WIFI data distribution in the space of the dominant 3 original features – the WLAN-0/WLAN-4 plane

And now a view from above showing the diagonal distribution of very many data points. We clearly see that there is something strange going on in the “orange” room.

3D-plot of the WIFI data distribution in the space of the dominant 3 original features – the WLAN-3/WLAN-0 plane

So far, so good! We have again identified the clusters which we already got familiar with in my last post.

Data distribution in the vector space of the three most important PCA components

In full consistency with the results derived above we expect a good cluster separation in the plane of the first two main PCA components. These components are called “PCA-1” and “PCA-2” in the following plot:

3D-plot of the transformed WIFI data distribution in the space of the dominant 3 PCA components – the PCA-1/PCA-2 plane

But looking from a different perspective, we see that there still is a significant distribution along the axis of the third main component – at least with the scaling used along the PCA-3 axis.

3D-plot of the transformed WIFI data distribution in the space of the dominant 3 PCA components – the PCA-1/PCA-3 plane

Even when we take into account the different scales of the axes: The spread in z-direction (PCA-3) is relatively big compared with the data spread in the PCA-2 direction. Again, we see that the data indicate three main components. Why did we not get this information already in our bar plot for the “explained variance”?

Working on scaled data

Well, part of the answer to the last question is that we did not really treat the various WLAN signals on an equal footing. For simple reasons, a PCA analysis of truly independent features measured in different units should be applied to scaled data. So, just for curiosity’s sake, let us apply Sklearn’s StandardScaler() to our WIFI data ahead of a PCA-analysis. Then, we indeed get a different bar plot:

The elbow is now centered at a point corresponding to 3 main components! Below I show respective 3D-plots for the standardized and PCA-transformed data:

3D-plot of the scaled WIFI data distribution in the space of the dominant 3 PCA components – the PCA-1/PCA-3 plane

3D-plot of the scaled WIFI data distribution in the space of the dominant 3 PCA components – the PCA-1/PCA-2 plane

However, the 5th cluster – a subcluster of the orange one – is no longer as clearly visible as before. This is due to the fact that the standard deviation of the data around the mean value of each feature is adjusted to a value of 1.0 by StandardScaler(). A MinMaxScaler() does a better job:

In the case of the WIFI example there is also a strong counter-argument against scaling:
The individual features and their scales are NOT really independent of each other. A weaker signal or a specific spread around the mean value of a specific signal do actually mean something! When we have multiple maxima in a signal distribution (see the previous post) this carries some important information – and then the adjustment of the standard deviation to a standard value is not a really good idea. This means that it depends on the data and their meaning which kind of scaler one should use ahead of a PCA analysis.
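A sketch of the two scaling variants discussed above – a hypothetical pipeline; whether one should scale at all depends, as argued, on the meaning of the data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

X = np.loadtxt("wifi_localization.txt")[:, :7]   # assumed file location

for scaler in (StandardScaler(), MinMaxScaler()):
    X_scaled = scaler.fit_transform(X)
    pca = PCA(n_components=3).fit(X_scaled)
    # compare how the explained variance redistributes over the first three components
    print(type(scaler).__name__, np.round(pca.explained_variance_ratio_, 3))
```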

Conclusions

The existence of a few dominant primary components does not automatically mean that only a few features contribute to the data distribution’s variance in the feature space. However, in the case of the WIFI data example we have a special situation in which only three out of seven features determine the primary components and the directions of the respective preferred axes of the data distribution. We also saw that we may have to scale feature data properly before applying a PCA analysis.

In the next post of this series

KMeans as a classifier for the WIFI and MNIST datasets – III – KMeans as a classifier for the WIFI-example

we shall answer the question whether and how we can use the cluster algorithm KMeans also as a classifier for the WIFI data.

Ceterum censeo: The worst fascist today who really and urgently must be denazified is the Putler.

KMeans as a classifier for the WIFI and MNIST datasets – I – Cluster analysis of the WIFI example

In the November and December 2021 editions of the German “Linux Magazin” R. Pleger discussed a simple but nevertheless interesting example for the application of a cluster algorithm. His test case was based on a dataset from the UCI Irvine Machine Learning Repository. This dataset contains 2000 samples with (fictitious?) data describing WIFI signals which stem from seven WLAN spots around a building. The signal strength of each source was measured at varying positions in four different rooms. I call the whole setup the “WIFI example” below.

One objective of the articles in the Linux Magazin was to demonstrate how simple it is today to apply basic Machine Learning methods. In a first step the author used a ML classifier algorithm to determine the location (i.e. the room) of a measuring instrument just from the strengths of the different WIFI signals. This task can be solved by a variety of algorithms – e.g. by a Decision Tree, SVM/SVC or a simple Multilayer Perceptron. The author used Sklearn’s RandomForestClassifier. This method is a good example of the powerful “Ensemble Learning” technique. When applied to the simple and well structured WIFI example it predicts the rooms for test samples with an accuracy of more than 98%.

The author afterward performed a deeper analysis of the WIFI data via KMeans, MiniBatchKMeans and PCA. His second article underlined a major question which sometimes is not taken seriously enough:

Do the data, which we feed into ML algorithms, really cover all aspects of the problem? Is the set of target labels complete or sufficient in the sense that the separation of the samples into labeled groups really reflects the problem’s internal structure? Or do the data contain more information than the labels reveal?

Unfortunately, in my opinion, the Linux Magazin covered an important point, namely the relation of the results of a PCA analysis to a 2-dimensional cluster visualization, in an incomplete and also slightly misleading way. In addition another interesting question was not discussed at all:

Can we use KMeans also as a classifier? How would we do this?

In this series of posts I want to dig a bit deeper into these topics – both for the WIFI example and also for the MNIST dataset. For MNIST we will not be able to visualize clusters as easily as for the WIFI example. Therefore, we should have a clear idea about what we do when we use clusters for classifying.

In this first post I focus on the results of a cluster analysis for the WIFI example. In a second article I will discuss the relation of cluster results to a PCA analysis. A third post will then present a very simple method of how to turn a cluster algorithm into a classifier algorithm. In later articles we shall transfer our knowledge to the MNIST data. More precisely: We shall combine a PCA analysis with a cluster classifier to predict the labels of handwritten digit images. We will use the PCA technique to reduce the dimensions of the MNIST feature space from 784 down to below 80. It will be interesting to see what accuracy we can reach with a relatively crude clustering approach on only about 30 main PCA components. As a side aspect we shall also have a look at standardization and normalization of the MNIST data.

Aside from a few minimal code sketches, I do not present full programs in the first three posts as the required Python programs can be built relatively straightforwardly and most of the core statements were already given in the Linux Magazin. You unfortunately have to buy the articles of the magazine; but see https://www.linux-magazin.de/ausgaben/2021/11/maschinenlernen/. However, as soon as we turn to MNIST I shall provide a Jupyter notebook.

The WIFI example: Two thousand samples, each with data for the signal strength of seven WLAN sources measured in four rooms

You can download the WIFI dataset from the following address:
https://archive.ics.uci.edu/ml/machine-learning-databases/00422/wifi_localization.txt
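A minimal sketch for reading the file, assuming it was saved under its original name in the working directory; the columns consist of the 7 signal values plus a room label:

```python
import numpy as np

data = np.loadtxt("wifi_localization.txt")   # whitespace/tab separated values
X = data[:, :7]                              # signal strengths of the 7 WLAN spots
y = data[:, 7].astype(int)                   # room label in [1, 4]
print(X.shape, np.unique(y))                 # (2000, 7) and the four room labels
```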

The feature space of this example is 7-dimensional: 7 WLAN spots provide WIFI signals in the building. We have 2000 samples. Each sample provides the signal strength of each of the WLAN sources measured at different times and positions within a specific room. An integer number in [1,4] is provided as a label which identifies the room. The following plot shows the interpolated frequency distribution over the signal strength for each of the 7 signals in the four rooms:

Cluster analysis of the WIFI data – more than four rooms?

The original label-data of the WIFI example imply the existence of four rooms. But can we trust this information? The measurements in the room which we called “Diele” (German for an entrance hall) in the plots above indicate a consistent second peak for both the signals 0 and 3. Is this due to an opening into another room?

A simple method to analyze the inner structure of the distribution of data points in a configuration or feature space is a “cluster analysis”. The KMeans algorithm provides such an analysis for an assumed number of clusters.

KMeans is a basic but important ML method which reveals a lot about the data distribution in feature space and indirectly about the complexity of hyperplanes required to separate data according to their labels. Among other things KMeans determines the positions of cluster centers – the so called centroids – by measuring and systematically optimizing distances of samples to assumed centroids. Actually, the sum over all intra-cluster variances, i.e. the summed quadratic distances of the associated samples to their cluster’s centroid, is minimized. The respective quantity is called the “inertia” of the cluster distribution. See e.g. the excellent book of P. Wilmott, “Machine Learning – an applied mathematics introduction”, on this topic.

A simple method to find out into how many clusters a distribution probably segregates is to look for an elbow in the variation of the inertia with the number of clusters. When we look at the variation of inertia values with the number of potential clusters “k” for the WIFI example we get the following curve:

This indicates an elbow at k = 4 to 5.
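Such an inertia curve can be produced with a few lines; a minimal sketch, with the tested k-range and the file location as my assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.loadtxt("wifi_localization.txt")[:, :7]   # assumed file location

inertias = []
k_range = range(1, 11)
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)                 # sum of intra-cluster squared distances

plt.plot(list(k_range), inertias, 'o-')
plt.xlabel('number of clusters k'); plt.ylabel('inertia')
plt.show()
```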

Another method to identify the most probable number of distinct clusters in a multidimensional data point distribution is the so called “silhouette analysis”. See the book of A. Geron, “Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow”, 2nd edition, for a description. For the WIFI example the plots of the silhouette score data support the result of the elbow analysis:

The second plot shows ordered silhouette data for k = 3,4,5,6 clusters. Again, we get the most consistent pictures for k=4 and k=5.
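A corresponding sketch for the average silhouette score per k; the ordered per-sample silhouette plots shown above require a bit more Matplotlib work:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.loadtxt("wifi_localization.txt")[:, :7]   # assumed file location

for k in (3, 4, 5, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))   # higher average score = better separation
```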

So, the data indicate a fragmentation into 4 or 5 clusters. How can we visualize this with respect to the feature space?

Scatter plots for 2-dim sub-spaces of the feature space

A general problem with the visualization of cluster data for multidimensional data is that we are limited to 2, at most 3, dimensions. And a projection down to two dimensions may not reflect the real cluster separation in the multidimensional feature space in a realistic way. But sometimes we are lucky.

We shall later see that there are two primary components which dominate the data and signal distributions in the WIFI example. A major question, however, is whether we will also find that only a few original features contribute dominantly to these major components. A PCA analysis does not mean that a “primary component” only depends on the same number of features!

As I did not know the relation of “primary components” to features I just plotted the results of “KMeans” for a variety of 2-dim signal combinations. I used Sklearn’s version of KMeans; due to the very small data ensemble KMeans is applicable without consuming too much CPU time (this will change with MNIST; there we need to invoke MiniBatchKMeans):

Note that the colorization of the data points in all plots was done with respect to the cluster number predicted by KMeans for the samples – and not with respect to their labels.
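A sketch of how one of these scatter plots can be created – the number of clusters and the selected signal combination are illustrative; the coloring uses the predicted cluster number, not the room label:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.loadtxt("wifi_localization.txt")[:, :7]            # assumed file location
clusters = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# projection onto the WLAN-0 / WLAN-3 plane, colored by the predicted cluster number
plt.scatter(X[:, 0], X[:, 3], c=clusters, cmap='tab10', s=8)
plt.xlabel('WLAN-0'); plt.ylabel('WLAN-3')
plt.show()
```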

It is interesting that the projections onto two special feature combinations – namely the WLAN-4/WLAN-0 and the WLAN-3/WLAN-0 signal combinations – show a very distinct separation of the clusters.

Four or five clusters ?

The data displayed above depend a bit on the initial distribution of cluster centers as an input into the KMeans algorithm. But for 4 and 5 clusters we get very consistent results. The next plots show the positions of the centroids:

This time the colorization was done with respect to the labels. What we see is: Five clusters represent the situation a bit better than only 4 clusters.

When we align this with the rooms: Five “rooms” may describe the signal variation better than only 4 rooms. The reason for this might be that one of the four rooms has a wall which partially separates different areas from one another. We often find this in “entrances” [German: “Diele”] to houses. Sketches of the rooms in the Linux Magazin article actually show that this is the case. And, of course, such a wall or an opening into another room would have an impact on the damping of the WLAN signals.

Addendum 19.03.2022: Comparing clusters with groups of labeled data points

An important question which we have not answered yet by the images shown above is the following:

How well do clusters coincide with groups of data points having a specific label?

Note that in general you cannot be sure that clusters reflect data points of the same label. Actually, a cluster is only a way to describe a close spatial vicinity of data points in some region of the multidimensional feature space – i.e. some kind of clumping of the data points around some centroids. But spatial vicinity does not necessarily reflect a label: A label border may often separate data points which are very close neighbors. And a cluster may contain a mixture of samples with different labels.

Well, in the case of the WIFI example the identified 4 to 5 clusters match the groups of data points with different labels quite well. Below I superimposed the sample’s data points with different colors: First I colorized the data points according to their label. On top of the resulting scatter plot I placed the same data points again, but this time with a different and transparent colorization according to their cluster association. In addition I shifted the second data layer a bit to get a better contrast:

You see that the areas are not completely identical, but they overlap quite well. Obviously, I used 5 clusters. Also the fifth cluster fits well into a region characterized by just one label.
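A sketch of the overlay technique described above; the small shift of the second layer is only there to improve the visual contrast:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = np.loadtxt("wifi_localization.txt")        # assumed file location
X, y = data[:, :7], data[:, 7].astype(int)
clusters = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X)

# first layer: points colored by the room label
plt.scatter(X[:, 0], X[:, 3], c=y, cmap='tab10', s=10)
# second layer: same points, slightly shifted and transparent, colored by cluster association
plt.scatter(X[:, 0] + 0.4, X[:, 3] + 0.4, c=clusters, cmap='Set2', s=10, alpha=0.4)
plt.xlabel('WLAN-0'); plt.ylabel('WLAN-3')
plt.show()
```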

Conclusion

The simple WIFI example shows that a cluster analysis may give you new insights into the structure of ML data sets which a simple classifier algorithm cannot provide. In the next article

KMeans as a classifier for the WIFI and MNIST datasets – II – PCA in combination with KMeans for the WIFI-example

we shall link the information contained in the “clusters” to the results of a PCA analysis of the WIFI example.

Stay tuned …

Ceterum censeo: The most important living fascist which must be denazified is the Putler.

Blender – complexity inside spherical and concave cylindrical mirrors – IV – reflective images of a Blender variant of Mr Kapoor’s S-curve

The topic of the last post in this series

Blender – complexity inside spherical and concave cylindrical mirrors – I – some impressions
Blender – complexity inside spherical and concave cylindrical mirrors – II – a step towards the S-curve
Blender – complexity inside spherical and concave cylindrical mirrors – III – a second step towards the S-curve

was the construction of a metallic object with basic similarities to the “S-curve” of artist Anish Kapoor. Our object was a bit more extreme than the artist’s real object; we had smaller curvature radii in two dimensions, and at the outer ends our surface approximated a half-circle boundary curve. We therefore could expect multiple reflections of light rays on the concave side(s) of our virtual object when applying ray tracing.

At the end of my last article I already presented some images of the reflection of a far distant horizontal line marked by a sun close to the horizon at dusk or dawn. In this article I am going to add some simple objects – a small red and a small green sphere at varying positions – plus a point-like light source. I take some shots with the virtual Blender camera from different angles and with varying focal lengths. I present the results below without many comments.

What we see is a rich variation of patterns and figures. Mathematically it is all the result of a single and simple mapping operation. Each operation maps one point on our surface to another point on the S-curve (or on a hit sphere). The points are given by millions and millions of light rays which in the end reach our virtual camera from different angles. The basic message is:

Simplicity can create a complexity which our brain would not predict without some deeper analysis. And a complex apparition may be based on simple rules and the selection of special circumstances.

So, besides many other philosophical aspects Mr. Kapoor’s “S-curve” reveals a very fundamental idea in physics, certain branches of mathematics and in information theory.

Reflections of a horizon line

Reflections of a horizon line and a red sphere

Reflections of a horizon line, a red and a green sphere

Note that the concave side of the S-curve gives us a first idea about what we can expect from a full half-sphere where even more reflections on the surface are possible before a light ray reaches the camera.

But, in my next article
Blender – complexity inside spherical and concave cylindrical mirrors – V – a video of S-curve reflections
I am first going to produce a movie of objects moving in front of the concave part of the S-curve.

 

Blender – complexity inside spherical and concave cylindrical mirrors – III – a second step towards the S-curve

I continue with my mini-series on how to (re-) build something like the S-curve of Mr. Kapoor in Blender. See :

Blender – complexity inside spherical and concave cylindrical mirrors – I – some impressions

In my last post

Blender – complexity inside spherical and concave cylindrical mirrors – II – a step towards the S-curve

I discussed how we can add a smooth transition between convex and concave curvature around the y-axis of an originally flat rectangular Blender mesh positioned in the (y,z)-plane. The rectangle had its longer side in y-direction. Starting from a flat area around the vertical middle axis (in z-direction) of the object we bent the wings to the left and right around the y-axis with systematically growing curvature, i.e. with shrinking curvature radius, in y-direction. The curvature around the y-axis left of the central z-axis got a different sign than the curvature right of the central z-axis. At the outer edges of both wings we approximated the form of a half cylinder. So curvature became a function of both x and y.

To create a really smooth surface with differentiable gradient and curvature we had to apply a modifier called “Subdivision Surface”. The trick to make this modifier work sufficiently well with only a few data points in z-direction was to keep the curvature almost constant in z-direction for a given y-position along the horizontal axis. We achieved this by putting the vertices of our mesh on the central rotation-axis at the same z-(height)-values as the vertices on the circles at the outer edges. In the end we had established a smoothly varying gradient of the surface curvature around the y-axis with the y-coordinate, while the partial derivative of this curvature in z-direction was close to zero for any fixed y-position.

In this post I want to add an “S-curvature” of the object in y-direction. Meaning: We are now going to bend the object along a S-shaped path in the (x,y)-plane. Physically, we are adding curvature in x-direction, more or less constant around two imaginary vertical axes positioned at some distance in y-direction from the central z-axis – and with different signs of the curvature. So, we are creating a superposition of a growing curvature around the y-axis with a constant curvature around two z-axes for each of the wings left and right of the central rotational z-axis of our object. Eventually, we build something like shown below:

When we look at images of Mr. Kapoor’s real S-curve we see that he keeps curvature at zero both in z- and a diagonal x/y-direction at the central rotational axis – due to the “S”. The same is true for our object. But: In comparison to the real S-curve of Mr. Kapoor our object is more extreme:

The ratio of height in z-direction to length in y-direction is bigger in our case. The object is shorter in y-direction and thus relatively higher in z-direction. The curvature radii around the y-axis are significantly smaller. Our surface approximates full half-circle curves at the outer edges; in contrast, the real S-curve only approximates a shifted, partial cut of a cylinder there.

We could, of course, adapt our S-curve in Blender a bit more to the real S-curve. However: Our more extreme bending around the y-axis at the outer left and right edges of the object allows for multiple reflections of light falling in along the x-direction on the concave sides of the object. This is not the case for the real S-curve.

Steps to give the surface an S-shape

How do we get to the surfaces presented in the above figures in Blender? The following steps comprise building a suitable, symmetric and very fine grained Bezier-path and the use of a curve modifier. To achieve a relatively smooth bending in x-direction we must first subdivide our original object into sufficiently many sections in y-direction, in addition to the already existing subdivisions in z-direction.

Step 1: Sub-dividing our object in y-direction

The final object mesh constructed in my last post is depicted below:

We “apply” (via menu “Object” > “apply”) any rotations we may have done so far. The object’s center should now coincide with the world center. We position the camera some distance apart in x-direction from our object, but at y=z=0. The following image shows the camera perspective.

Remember that our object consists of two meshes which we have joined at the central axis. We are now going to separate each half in 32 sections in y-direction.

Go to Edit mode, position the cursor in one half and press Ctrl-R:

Turn your mouse wheel until you have created 32 subdivisions. (The number is shown at the bottom left of your view-port). Left click twice to fix the subdivision lines at their present positions; do NOT move the mouse in between the clicks.

Doing this on both halves eventually gives us:

We check that y- and x-distances of the vertices have equal absolute values. We in addition check z-heights for selected vertices and compare them to the height of vertices on the outer circles.

Step 2: Design a S-path

We move our object, which has dimensions of 6m in y-direction and 2m in z- and x-direction, respectively, to the left (y=-6m) – just to get some space at the world origin. We now add a Bezier curve to our scene. Its origin is located at the world center and it is stretched along the x-axis. Rotate the curve around the z-axis by 90 degrees such that it stretches along the y-axis. Choose a top-view position and rotate the viewport such that the y-axis points to the right.

Stretch the curve to a y-dimension of 6m. Select the curve. Go to Edit mode. We now rotate the rightmost and the leftmost tangent bars such that we get the following curve:

You fix the required symmetry by watching and adjusting the position information of the tangent bar’s end vertices in the sidebar of the viewport. Note that a tangent bar has 2 associated handles and vertices. By changing the vertex positions systematically to [(x=-0.5, y=-3.5), (x=0.5, y=-2.5)] and [(x=-0.5, y=2.5), (x=0.5, y=3.5)] on the left and the right side, respectively, you get a seemingly nice flat S-curve.

Go to Object mode and change the dimension of the curve in x-direction to 2m. You then clearly see that the path is not a very smooth one, but consists of distinct linear segments. This would be a major problem later on. In addition the curvature of the S seems to be a bit extreme. We change the x-dimension to just 1m. Then we subdivide the curve into further segments in Edit mode – multiple times.

And we get a smoothly curved and well dimensioned path:

Eventually, we end up with the following situation:

Step 3: Apply a curve modifier to our object

Now comes an important point: If you have assigned the origin of the curve to its midpoint, you should do the same with your object – see the Internet for the appropriate Blender operations:

We then move our object to y=3.171m and z=1m, i.e. a bit further than the rightmost end-point of the path and above the world’s central plane.

Now, we add a curve-modifier to our object, select the Bezier curve (our path) and get:

We center our view and choose a top position. We adjust the y-location of the object until its present center coincides with the world center.

And there we have our personal S-curve – more extreme than Kapoor’s real S-curve – but this is only a question of dimension adjustments AND the right choice of how to cut the limiting circle mesh in the beginning (see the last post). Our S-curve will show multiple reflections on its concave side(s).
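For readers who prefer to script such steps: a minimal bpy sketch of adding the curve modifier; the object names are hypothetical and have to be adapted to your scene:

```python
import bpy

obj = bpy.data.objects["S_Surface"]       # hypothetical name of our joined mesh object
path = bpy.data.objects["BezierCurve"]    # the S-shaped Bezier path

mod = obj.modifiers.new(name="Curve", type='CURVE')
mod.object = path                         # bend the mesh along the path
mod.deform_axis = 'POS_Y'                 # deform along the object's y-direction
```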

Apply the modifier “subdivision surface”

The rest is routine for us already. We apply the subdivision surface modifier to get a smooth surface.
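Scripted, this step might look like the following bpy sketch; the level values are illustrative:

```python
import bpy

obj = bpy.context.active_object                        # the joined S-curve mesh
mod = obj.modifiers.new(name="Subdivision", type='SUBSURF')
mod.levels = 3                                         # subdivisions shown in the viewport
mod.render_levels = 4                                  # subdivisions used for rendering
bpy.ops.object.shade_smooth()                          # smooth shading on top
```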

When we modify the world’s sky texture a bit and place a sun just at the horizon, we get images like the following from the horizon line alone – taken from different camera perspectives and with different focal lengths (wide angle shots).

We clearly see multiple reflections in vertical direction on the left concave side of our object.

Complexity out of simple things … Real fun …

Conclusion

Rebuilding something like the S-curve of Mr. Kapoor was hard work for me, as I use Blender just as a hobby tool. But it was worth the effort. In my next post,

Blender – complexity inside spherical and concave cylindrical mirrors – IV – reflective images of a Blender variant of Mr Kapoor’s S-curve

I am going to show what happens when we place objects in front of our S-curve. This will give us a first impression of what might happen with a totally concave surface as an open half sphere.
Stay tuned …

 

Blender – complexity inside spherical and concave cylindrical mirrors – II – a step towards the S-curve

In my last post

Blender – complexity inside spherical and concave cylindrical mirrors – I – some impressions

I briefly discussed some interesting sculptures and optical experiments in reality. The basic ideas are worth some optical experiments in the virtual ray-tracing world of Blender. In this post I start with my attempt to reconstruct something like the so called “S-curve” of the artist Anish Kapoor with Blender meshes.

If you looked at the link I gave in my last article or googled for other pictures of the S-curve you certainly saw that the metallic surface the artist placed at the Kistefos museum is not just a simple combination of mirrored cylindrical surfaces. It is much more elegant:

The first point is that it consists of one continuous, coherent piece of metal. The surface is deformed and changes its curvature continuously. It shows symmetry and rotational axes. When my wife and I first saw it we stood at a rather orthogonal position opposite of it. We only became aware of the different cylindrical deformations on the left and right side. We wondered what Kapoor had done at the middle vertical axis as we expected a gap there. Later we went to another position – and there was no gap at all, but a smooth variation of curvatures along the main axes of the object.

The second point is the combination of different curvatures: a cylindrical curvature in vertical direction (mirrored in left/right direction) plus the elegant S-like curvature in horizontal direction. The curvature in vertical direction grows with horizontal distance from the center – it is zero at the central vertical axis. The left and right part of the object are identical – they reflect a 180 degree rotation (not a mirroring process) around the central vertical axis. Actually, the gradient at the central rotational axis and along the horizontal symmetry axes disappears. And there is no curvature at all at the central vertical axis.

All in all a lot of different symmetries and smooth curvature transitions! The artist plays with the appeal of symmetries to the human brain. But, at the same time, he breaks symmetry strongly in the visual impression of the viewer with the help of the rules of optics. Wonderful!

In this article I want to tackle the problem of a smooth transition between two cylindrically deformed surfaces in Blender first. The S-curvature is the topic of the next post.

The result first

I first show you what we want to achieve:

We get an impression of the mirroring effects in “viewport shading mode” by adding a sky texture to the world background and a simple textured plane:

The reader may have detected small dips (indentations) at the centers of the upper and lower edge. I come to this point later on. Compared to the real S-curve a major difference in vertical direction is that Kapoor did not use the full curvature of a half cylinder at the outermost left and right ends. He maybe took only a cut-off part of a half circle there. But what part of a half-circle you use in the end is a minor point regarding the general construction of such a surface in Blender.

How to get there?

As I use Blender only seldom, I really wondered how to create a surface like the one shown above. Mesh based or nurbs based? And how to get a really smooth surface? Regarding the latter point you may think of subdivisions, but this alone is a wrong approach, as a simple subdivision of a mesh just splits the linear connections between existing vertices. Therefore, if you applied simple subdivision to the object you would create points not residing on a circle/cylinder/surface – which in the end would disturb the optics by visible lines and flat planes. Even if you added a smoothing modifier afterward.

The solution in the end was simple and mesh based. There is one important point to note which has to do with rules for object creation in Blender:

You define the resolution of the mesh(es) we are going to construct in the beginning!

As we need to edit some vertex positions manually the resolution in first experiments should rather be limited. For a continuous surface we shall apply a surface smoothing modifier anyway. This modifier rounds up edges a bit – which leads to the “dips” I mentioned. They will be smaller the higher you choose the meshes’ resolution – but this is something for a final polished version.

Constructional steps

All in all there are many steps to follow. I only give a basic outline. Read the Blender manual for details.
Note: I added the application of a modifier in the middle of the steps for illustration purposes. You should skip this step and apply the modifier only in the end. I sometimes experienced strange effects when applying and deleting the modifier during work with vertices.

Step 1: You first create a mesh based circle. You now decide which number of mesh nodes and which basic resolution of the later surface you want to have. This is done via the tool menu that opens in Blender version 2.82 in the lower left of the viewport. Let’s keep to the standard value of 32 mesh points (vertices). This obviously means that a half-circle later on will contain 17 vertices. All vertices of our first mesh reside exactly on the circle line. The circle’s center resides at the global world center. You also see that 4 points of the circle sit on the world axes X, Y. Leave the circle exactly where it is. Do not apply any translation. (It would be hard to realign it with the world axes later on.)
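If you prefer to script this step, a minimal bpy sketch; the vertex count matches the choice above:

```python
import bpy

# add a mesh circle with 32 vertices at the world center, without any filling face
bpy.ops.mesh.primitive_circle_add(vertices=32, radius=1.0,
                                  fill_type='NOTHING',
                                  location=(0.0, 0.0, 0.0))
circle = bpy.context.active_object   # the new circle mesh becomes the active object
```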

Step 2: Change to Edit mode and remove one side of the circle (left of the X-axis) by eliminating the superfluous vertices. Do it such that the end points of the remaining half-circle reside exactly on the X-axis of the world grid. Keep the origin of the mesh where it is. Do NOT close the circle mesh on the X-axis, i.e. do not create a closed loop of vertices!

Step 3: You then add a line mesh in Object mode. This can e.g. be achieved by first creating a path. Move it along the world Y-axis to get some distance from the half-circle (-3m). Select the path by right clicking and convert the path to a mesh with the help of a menu point. Go to Edit mode again and eliminate vertices (or add some by subdividing) – until the resulting line mesh has exactly the same number of vertices (17) as your half-circle (including the end points). In Object mode set the origin to the mesh’s geometry, i.e. its center. Move the line mesh to X=0. Change its X-dimension to the same value the half-circle has (2m).

Step 4: Rotate the half-circle by 90 degrees around the X-axis to get a basic scene like in the picture below. Join the two meshes to one object.

Step 5: Go to Edit mode and provide missing edges to connect the line segment with the half-circle.

Step 6: Add faces by selecting all vertices and choosing menu point “Face > Grid Fill”.

Hey, this was a major step. Save your results – and make a backup copy for later experiments.

Step 7: Add a Sky Texture to the world. Activate the Cycles renderer. Rotate the object by 90 degrees around the Y-axis. Choose viewport shading mode.

Step 8: Move the object to Z=1m. Right click on your object in Object mode; choose “Shade Smooth”.

Just to find that you still see the edges of the faces. Smooth is not really smooth, unfortunately.

Step 9: Skip this step in your own experiment and perform it at the end of our construction. Just for illustrating that the flat surfaces can be eliminated later on, I add a modifier to our object – namely the modifier “subdivision surface” – which offers a more intelligent algorithm than “Shade smooth”. Just for testing I give it the following parameters:

We get:

Much more convincing! You see e.g. at the left side that the corners have been rounded – this will later lead to the dips I mentioned.

Intermediate consideration
We could now duplicate our object, rotate the duplicate and join it with the original. But before we do this we change the height values of the vertices along the left edge (actually a line segment). From our construction it is clear that corresponding vertices on the half-circle and the left edge cannot have the same Z-coordinate values – they reside at different heights above the ground. The “Catmull-Clark” algorithm of our modifier therefore creates a surface with gradients and curvature varying in all coordinate directions. There is no real problem with this. However, we reduce the chance for certain caustics and cascades of multiple reflections on the concave side of the final surface. Cylindrical surfaces (i.e. with constant curvature) give rise to sharp reflective caustics. To retain a bit of this and keep the curvature rather constant in Z-direction (whilst varying in X-direction), we are going to adjust the heights of the vertices along the straight left edge to the heights of the vertices along the half-circle.

Step 10: Go to Edit mode. Do NOT move the vertices of the half-circle! Check the Z-value of each of the vertices of the half-circle by selecting them one by one and looking at the information in the sidebar of the Blender interface (View > Sidebar). Change the Z-coordinate of each vertex’s counterpart on the left straight edge to the very same value. Repeat this process for all vertices of the half-circle and the corresponding ones of the straight edge.

You see that the vertices are now non-equidistantly distributed along the Z-axis on the left side!
This gives us already a slightly different shading in the lower part.

Step 11: Important! Remove the modifier if you applied it. Then: Move the object such that all vertices on the left edge are at Y=0 and X=0. For Y=0 you can just adjust the median of the vertices. Check also that the corners of the half-circle have X=0 and Y=3. All vertices of the half-circle should have Y=3.

Then snap the cursor to the grid at X=0, Y=0, Z=1. Afterward snap the origin of the object to the cursor. The object’s coordinates should now be X=0, Y=0, Z=1.

Step 12: In Object mode: Duplicate the object by SHIFT D + Enter. Do not move the mouse in between; don’t touch it. Rotate the active duplicate around the Z-axis by 180 degrees.

Check the coordinates of the vertices of the rotated duplicate. If its right vertices reside at Y=0 and its left ones at Y=-3, then join the two objects into one. Note: At the middle there are still two rows of vertices. But corresponding vertices should coincide exactly in their X=0, Y=0 and Z-values. If not, you will see it later by some distortions in the optics.

Step 13: Add a metallic material

Place the camera at a suitable position in front of the object and add the modifier again with the settings given above. Render with the help of the material preview:

Step 14: Add a Sun at almost 180 degrees and play a bit with the sky

We get in full viewport shading:

Watch the sharp edges created by multi-reflections on the left concave side of the object. This we got due to our laborious adjustment of the Z-coordinates of our central vertices.

Save your result for later purposes!

Adding some elements to the scene

After having created such an object we can move and rotate it as we like. In the following images I mirrored it (by 2 rotations!). The concave curvature is now at the right side. Then I added a plane with some minimal texture with disturbances. Eventually, I added some objects and extended light sources, plus a change of the sun’s color to the red side of the spectrum. (Hint: When moving spacious light sources around relatively close to the object the reflections should not show any straight-line disturbances. It’s a way to test the smoothness of the surface created by the modifier.)

Yeah, one piece of metal with growing cylindrical concave and convex curvatures to the left and the right. We are getting closer to a reconstruction of the S-curve. And have a look at the nice deformations of the reflected images of a red cylinder, a green cone and a blue sphere, which I have placed relatively close to the concave surface on the right side. Physics and Blender are fun! But all respect and tribute again to Anish Kapoor for his original idea!

In the next post

Blender – complexity inside spherical and concave cylindrical mirrors – III – a second step towards the S-curve

I have a look at an additional S-curvature in horizontal direction. Stay tuned …