People who have a physics education and want to start with Machine Learning [ML] often stumble across the use of the term “tensor” in typical Machine Learning frameworks. For a physicist the description of a “tensor” in ML appears strange from the very beginning – and one asks oneself: Have I missed something? Is there a secret relation and mapping between the term in physics and the term in ML?
Guys, let me warn you: The expression “tensor” in theoretical and mathematical physics means something (very) different from what ML people call a tensor. To say it clearly:
Tensors used in physics (e.g. in General Relativity or QFT) are not the same as tensors in ML.
For historical reasons one could even say:
The central term “tensor” in ML (e.g. by Google in TensorFlow) represents a kind of misuse of the original term and related framework developed by Ricci.
I admit that the last time I worked professionally in physics and astrophysics was three decades ago. But I shall describe the major properties of physical tensors below as well as I remember them from that time. For people interested in a quick refresher on the topic I recommend the book “Tensors made easy”, 6th edition, 2018, ISBN 978-1-326-29253-9.
Tensors in theoretical physics
Physicists regard tensors as multi-linear forms defined on a multidimensional (metrical) vector space. A tensor is a function which is linear in each of its arguments, and the number of arguments it takes is called the “rank”. The arguments are vectors or covectors (1-forms, i.e. linear maps which assign a scalar to each vector). Most importantly:
Tensor objects follow certain rules regarding their component transformation under a change of the base vectors of the vector space. Components of tensors transform either in a contravariant or covariant form with respect to the (linear) change of base vectors.
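The transformation rule can be made concrete with a few lines of NumPy. This is only a minimal sketch under my own assumptions: the basis-change matrix A and all component values are arbitrary illustration numbers, and the rank-2 tensor is taken to be a purely covariant bilinear form g. The point is that vector components transform with the inverse of A (contravariant), the components of g transform with A itself (covariant), and the scalar g(u, v) comes out the same in both bases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # dimension of the vector space

# Components of a bilinear form g (rank-2, covariant) and of two vectors u, v
# in the old basis. The values are arbitrary illustration numbers.
g = rng.normal(size=(n, n))
u = rng.normal(size=n)
v = rng.normal(size=n)

# Columns of A hold the new basis vectors, expressed in the old basis.
# Hypothetical example matrix; its determinant is 7, so it is invertible.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
A_inv = np.linalg.inv(A)

# Vector components transform contravariantly (with the inverse of A) ...
u_new = A_inv @ u
v_new = A_inv @ v

# ... while the components of the bilinear form transform covariantly (with A).
g_new = A.T @ g @ A

# The scalar g(u, v) is independent of the chosen basis.
s_old = u @ g @ v
s_new = u_new @ g_new @ v_new
print(np.isclose(s_old, s_new))
```

Note that the array g_new has the same shape as g: the basis change alters the component values, but not the rank.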
Tensor fields can be defined on differentiable and curved affine (metrical) manifolds defined over R^n. The description of curved (affinely connected) manifolds requires coordinate systems and terms of differential geometry. Tensor equations keep their form under a transformation of the coordinate system (by differentiable transformation functions defining a Jacobi matrix for old and new coordinates at each point). In the case of differential equations this requires the introduction of so-called covariant derivatives. These allow certain tensor relations (including those based on differential operations) to be preserved during coordinate transformations.
So, tensors in mathematical physics are bound to transformation properties regarding the change of the base vectors of underlying vector spaces and the change of coordinate systems in complex geometries. Tensors in physics describe the properties of physical objects independently of the choice of a specific coordinate system in space-time or in other finite or infinite vector spaces such as Hilbert spaces. The reference of the tensor definition to multi-linear forms, vectors and covectors reveals their complex structure and properties.
Note: Physical tensors with rank ≥ 2 have n**rank components, where n is the dimension of the underlying vector space. The rank and the vector-space define the number of components.
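A small sketch of this counting rule, assuming n = 4 (e.g. space-time) purely as an illustration value: if the components of a physical tensor are stored in an array, every index runs over the same n values, so the array has shape (n, n, ..., n) and n**rank entries.

```python
import numpy as np

n = 4  # dimension of the vector space (illustration value, e.g. space-time)

counts = {}
for rank in (2, 3, 4):
    # Every index of a physical tensor runs over the same n values.
    components = np.zeros((n,) * rank)
    counts[rank] = components.size
    print(rank, components.shape, components.size, n ** rank)
```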
Tensors in ML => ML-tensors
Tensors in Machine Learning are above all variants of multidimensional arrays. To distinguish them clearly from tensors in mathematical physics I will call them “ML-tensors” below.
I quote from the documentation of TensorFlow:
“Tensors are multi-dimensional arrays with a uniform type (called a dtype). You can see all supported dtypes at tf.dtypes.DType. If you’re familiar with NumPy, tensors are (kind of) like np.arrays. All tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one.”
(Highlighting was done by me.)
An introductory book on PyTorch – “PyTorch for Deep Learning”, Ian Pointer, 2021, O’Reilly – expresses the following opinion (freely summarized from the German text version): A tensor is a container for numbers. It is also accompanied by a set of rules which define transformations between tensors. The easiest way to understand a tensor in ML is probably to think of it as a multidimensional array.
Arrays simply allow for a specific “multidimensional” organisation of data to describe some objects of interest by basically independent variables. The important point is that these variables get ordered with respect to a few central, often apparent aspects of the object. In the case of a picture two such aspects might be the spatial dimensions in terms of width and height.
ML-tensors thus have axes (the dimensions of the array) which describe the organization of their components in a kind of multidimensional structure. Each axis symbolizes one major aspect. Along each axis we allow for a number of distinct, indexed positions fixing the “size” of this dimension. Actually an axis corresponds to an indexed list of discrete elements; it should not be confused with an axis describing a dimension of a continuous spatial geometry mapped to R^N. The axes of an array define a discrete multidimensional lattice of points at defined distances.
The number of dimensions is the so-called rank of an ML-tensor. All sizes for the individual dimensions are gathered in a tuple called shape.
An ML-tensor of rank 3 can be thought of as an arrangement of numbers (components) in a 3-dimensional lattice. However, the shape of this lattice can show a different number of elements for each of the three dimensions. E.g. an array of shape (50, 2, 16) can be an ML-tensor with values for 50 x 2 x 16 = 1600 logically independent variables.
Think of a color image with a rectangular form and a certain pixel resolution (100 x 50 px). We may arrange the RGB-pixel values in form of an array with shape (100, 50, 3). Here we would have 15000 logically independent variables to characterize our object.
The rank alone obviously does not define the number of elements of an ML-tensor – we need the shape in addition.
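The two examples above can be checked quickly with NumPy arrays, which I use here as a stand-in for ML-tensors (the variable names are my own). Both arrays have rank 3, but their shapes and thus their numbers of elements differ.

```python
import numpy as np

a = np.zeros((50, 2, 16))     # the rank-3 example from the text
img = np.zeros((100, 50, 3))  # the RGB image example from the text

# ndim is the rank, shape the tuple of sizes, size the number of elements.
print(a.ndim, a.shape, a.size)
print(img.ndim, img.shape, img.size)
```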
Note also that the organization of the elements of an ML-tensor can be changed if required – one can e.g. transform a tensor of rank 3 with 1600 elements into one of rank 1 with the same number of elements in a certain linearly ordered way. Such an ordered reorganization of a tensor’s elements into a tensor of different rank corresponds in NumPy to a reshaping of an array.
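A minimal sketch of such a reshaping in NumPy, assuming the default row-major (C) ordering in which the last index varies fastest. The array is filled with the numbers 0 to 1599 only to make the ordering visible.

```python
import numpy as np

t3 = np.arange(1600).reshape(50, 2, 16)  # rank 3, shape (50, 2, 16)
t1 = t3.reshape(-1)                      # rank 1, shape (1600,)

print(t3.ndim, t3.shape)
print(t1.ndim, t1.shape)

# Row-major ordering: element [i, j, k] lands at position i*2*16 + j*16 + k.
i, j, k = 1, 0, 5
flat_index = i * 2 * 16 + j * 16 + k
print(t3[i, j, k] == t1[flat_index])

# The reorganization loses nothing: reshaping back restores the original.
restored = t1.reshape(50, 2, 16)
```

Nothing comparable exists for a physical tensor: there the rank is fixed by the number of arguments of the multi-linear form.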
ML-tensors can, in the same way as arrays, be subject to certain algebraic operations with other ML-tensor objects. For such operations the two involved tensors must obey certain rules regarding their shapes. There is a partial overlap of such operations with those used in Linear Algebra for linear mappings between objects of vector spaces of (different) finite dimensions. So, it’s no wonder that libraries for linear algebra dominate the field of ML-calculations.
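As an illustration of such a shape rule, here is a small sketch with a matrix product in NumPy. The shapes are arbitrary example values of mine: the product works when the inner sizes agree and is rejected otherwise.

```python
import numpy as np

W = np.ones((3, 4))
x = np.ones((4, 5))

# Inner sizes match: (3, 4) @ (4, 5) gives shape (3, 5).
y = W @ x
print(y.shape)

# Reversed order: (4, 5) @ (3, 4) has inner sizes 5 and 3, which do not match.
try:
    x @ W
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("shape mismatch:", err)
```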
Some of the required operations can be performed very fast and distributed over a series of processing units (or cores) which can work in parallel.
Note that the layers of modern Artificial Neural Networks [ANNs] transform ML-tensors to ML-tensors. A change of ranks and the number of elements is possible during the operations of certain layers of ANNs.
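To show what such a change of rank looks like, here is a hedged sketch in plain NumPy rather than in a real ML framework. The batch size of 8, the 10 output units, the random weights and the ReLU activation are all hypothetical choices of mine; only the image shape (100, 50, 3) is taken from the example further up.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 8 RGB images: an ML-tensor of rank 4.
batch = rng.random((8, 100, 50, 3))

# A "flatten" step turns each image into one long row: rank 2.
flat = batch.reshape(batch.shape[0], -1)

# A dense layer with 10 units (hypothetical random weights, zero bias).
W = rng.normal(size=(flat.shape[1], 10)) * 0.01
b = np.zeros(10)
out = np.maximum(flat @ W + b, 0.0)  # linear map followed by ReLU

print(batch.ndim, batch.shape, batch.size)
print(flat.ndim, flat.shape, flat.size)
print(out.ndim, out.shape, out.size)
```

Both the rank (4 to 2) and the number of elements (120000 to 80) change on the way through this small pipeline.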
ML-tensors simply are a clever arrangement of data describing objects of interest in the form of a multidimensional array. ANNs transform such ML-tensors and a change of the rank during such transformations is very common.
Tensors in physics are complex objects corresponding to multi-linear forms defined on vector spaces. Tensors in physics must fulfill certain properties regarding their components and their relations under linear transformations of the base vectors and, for tensor fields in affine and metrical geometries, also under changes of the coordinate system. Tensors keep their rank during such basic transformations.
Summary: ML-tensors are very different objects compared to tensors in mathematical physics. One should not mix them up or confuse them.
If any of my experienced friends in physics has found some reasonable mapping of ML-tensors to multi-linear forms, please send me an email.