Last week I stared preparing posts for my new blog on Machine Learning topics (see the blog-roll). During my studies last week I came across a scientific publication which covers an interesting topic for ML enthusiasts, namely the question what kind of statistical distributions we may have to deal with when working with data of natural objects and their properties.

The reference is:

S. A. FRANK, 2009, “The common patterns of nature”, Journal of Evolutionary Biology, Wiley Online Library

Link to published article

I strongly recommend to read this publication.

It explains statistical large-scale patterns in nature as limiting distributions. Limiting distributions result from an aggregation of the results of numerous small scale processes (neutral processes) which fulfill constraints on the preservation of certain pieces of information. Such processes will damp out other fluctuations during sampling. The general mathematical approach to limiting distributions is based on entropy maximization under constraints. Constraints are mathematically included via Lagrangian multipliers. Both are relatively familiar concepts. The author explains which patterns result from which basic neutral processes.

However, the article also discusses an intimate relation between aggregation and convolutions. The author furthermore presents an related interesting analysis based on Fourier components and respective damping. For me this part was eye-opening.

The central limit theorem is explained for cases where a finite variance is preserved as the main information. But the author shows that Gaussian patterns are not the only patterns we may directly or indirectly find in the data of natural objects. To get a solid basis from a spectral point of view he extends his Fourier analysis to the occurrence of infinite variances and consequences for other spectral moments. Besides explaining (truncated) power law distributions he discusses aspects of extreme value distributions.

All in all the article provides very clear ideas and solid arguments why certain statistical patterns govern common distributions of natural objects’ properties. As ML people we should be aware of such distributions and their mathematical properties.