Wednesday, March 2, 2022

Variance, Attractors and Behavior of Chaotic Statistical Systems

We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation, and benchmarking tests of statistical hypotheses. They are also related to numeration systems. One of the highlights in this article is the discovery of a simple variance formula for an infinite sum of highly correlated random variables. We also try to find and characterize attractor distributions: these are the limiting distributions for the systems in question, just as the Gaussian distribution is the universal attractor for sums with finite variance in the central limit theorem framework. Each of these systems is governed by a specific functional equation, typically a stochastic integral equation whose solutions are the attractors. This equation helps establish many of their properties. The material discussed here is state-of-the-art and original, yet presented in a format accessible to professionals with limited exposure to statistical science. Physicists, statisticians, data scientists and people interested in signal processing, chaos modeling, or dynamical systems will find this article particularly interesting. Connections to other similar chaotic systems are also discussed.

1. The Geometric System: Definition and Properties

We consider the infinite series

Z = X(1) + X(1) X(2) + X(1) X(2) X(3) + ...

where X(1), X(2) and so on are independently and identically distributed random variables. We use the notation X to represent any of these random variables. Let F denote the distribution function of Z. This system satisfies the following functional equation:

Z = X (1 + Z),

where the equality is in distribution, with X independent of Z on the right-hand side (see section 1.2).

The k-th moment of Z thus satisfies

E(Z^k) = E(X^k) E[(1 + Z)^k].

This can be re-written as

E(Z^k) (1 - E(X^k)) = E(X^k) [ C(k,0) E(Z^0) + C(k,1) E(Z^1) + ... + C(k,k-1) E(Z^(k-1)) ],

where C(k,j) denotes the binomial coefficient and E(Z^0) = 1.

The latter formula can be used to compute all the moments recursively. In particular,

E(Z) = E(X) / (1 - E(X)),   Var(Z) = Var(X) / [ (1 - E(X^2)) (1 - E(X))^2 ].

Note that to have convergence, we need |E(X)| < 1. We also assume here that E(X^2) < 1, so that the variance exists.
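To double-check these formulas numerically, here is a minimal Python sketch. The choice X ~ Uniform(-0.9, 0.9) is an arbitrary illustration (any X with |E(X)| < 1 and E(X^2) < 1 would do), and the series is truncated after 100 terms:

import numpy as np

# Minimal sketch: simulate Z = X(1) + X(1)X(2) + X(1)X(2)X(3) + ... and
# check E(Z) and Var(Z). X ~ Uniform(-0.9, 0.9) is an arbitrary choice.
rng = np.random.default_rng(0)
X = rng.uniform(-0.9, 0.9, size=(50_000, 100))
Z = np.cumprod(X, axis=1).sum(axis=1)   # partial products X(1), X(1)X(2), ...

EX, EX2 = 0.0, 0.9**2 / 3               # exact moments of Uniform(-0.9, 0.9)
print("empirical:  ", Z.mean(), Z.var())
print("theoretical:", EX / (1 - EX), (EX2 - EX**2) / ((1 - EX2) * (1 - EX)**2))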

1.1. Parameter estimation and test for independence

It is pretty amazing that we were able to establish a formula for the variance, despite the fact that all the terms in the infinite series defining Z are highly correlated. If the terms were independent, as in Z = X(1) + X(2) X(3) + X(4) X(5) X(6) + ..., then we would have Var(Z) = Var(X) / [ (1 - E(X^2)) (1 - E^2(X)) ] instead. This leads to a new test to check whether the terms in the series in question are independent or come from the original model Z = X(1) + X(1) X(2) + X(1) X(2) X(3) + ... The statistic for the test is T = Var(Z) (1 - E(X^2)) (1 - E^2(X)) / Var(X), computed on the data set. It is expected to be equal to 1 only in case of independence, or if (1 - E(X))^2 = 1 - E^2(X). The latter can happen only if E(X) = 0 (resulting in E(Z) = 0) or if E(X) = 1. But E(X) = 1 can be excluded since |E(X)| < 1. Here the notation E^2(X) stands for the square of E(X).
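Below is a minimal Python sketch of the test. The choice X ~ Uniform(0, 0.8) is arbitrary; under the original (correlated) model, T is approximately (1 + E(X)) / (1 - E(X)), about 2.33 here, while it is close to 1 under independence:

import numpy as np

# Minimal sketch of the test statistic T, with the illustrative choice
# X ~ Uniform(0, 0.8), so E(X) = 0.4 and E(X^2) = 0.64/3.
rng = np.random.default_rng(1)
n = 50_000
EX, EX2 = 0.4, 0.64 / 3
T = lambda Z: Z.var() * (1 - EX2) * (1 - EX**2) / (EX2 - EX**2)

# Original model: Z = X(1) + X(1)X(2) + X(1)X(2)X(3) + ...
Z1 = np.cumprod(rng.uniform(0, 0.8, size=(n, 100)), axis=1).sum(axis=1)
# Independent terms: Z = X(1) + X(2)X(3) + X(4)X(5)X(6) + ...
Z2 = sum(rng.uniform(0, 0.8, size=(n, m)).prod(axis=1) for m in range(1, 25))

print(T(Z1))   # about (1 + E(X))/(1 - E(X)) = 2.33: dependence detected
print(T(Z2))   # about 1: consistent with independence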

To estimate the mean and variance of X if Z = X(1) + X(1) X(2) + X(1) X(2) X(3) + ..., assuming you only observe Z, and X is a hidden variable, use the following formulas, obtained by inverting the moment formulas of section 1:

E(X) = E(Z) / (1 + E(Z)),   E(X^2) = E(Z^2) (1 - E(X)) / [ 1 + E(X) + E(Z^2) (1 - E(X)) ],

and Var(X) = E(X^2) - E^2(X).

A consequence of the first formula is that E(Z) > -1/2: as E(X) ranges over (-1, 1), E(Z) = E(X) / (1 - E(X)) ranges over (-1/2, infinity). For details, see here.
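The following sketch illustrates the estimation procedure. Z is simulated under the hood (again with the arbitrary choice X ~ Uniform(0, 0.8)), but only the Z values are used to recover the mean and variance of X:

import numpy as np

# Minimal sketch: recover E(X) and Var(X) from observed values of Z alone,
# using the inversion formulas above. Z is simulated only so that the
# estimates can be checked against the known truth.
rng = np.random.default_rng(2)
Z = np.cumprod(rng.uniform(0, 0.8, size=(50_000, 100)), axis=1).sum(axis=1)

EZ, EZ2 = Z.mean(), np.mean(Z**2)
m = EZ / (1 + EZ)                                # estimate of E(X), true 0.4
s = EZ2 * (1 - m) / (1 + m + EZ2 * (1 - m))      # estimate of E(X^2)
print("E(X):", m, "  Var(X):", s - m**2)         # true values: 0.4, 0.0533...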

1.2. Connection to the Fixed-Point Theorem

Let Z(k) = X(k) + X(k) X(k+1) + X(k) X(k+1) X(k+2) + ... We have Z(k) = X(k) (1 + Z(k+1)). As k tends to infinity, Z(k) tends to Z; the convergence is in distribution. So at the limit, Z and X(1 + Z) have the same distribution. Also, X(k) is independent of Z(k+1). In other words, the distribution of Z is a fixed point of the backward stochastic recurrence equation Z(k) = X(k) (1 + Z(k+1)). Solving for Z amounts to solving a stochastic recurrence equation.

2. Geometric and Uniform Attractors

We focus here on special cases, where X has a discrete distribution. In particular, we prove the following:

  • Z has a geometric distribution if and only if X has a Bernoulli distribution.
  • Z has a uniform distribution on [-1, 1] if and only if X has the following distribution: P(X = -0.5) = P(X = 0.5) = 0.5.
  • Z cannot have an arbitrary distribution.

2.1. General formula

Here X and Z are discrete, taking the values 0, 1, 2 and so on. Using the notation

p(k) = P(X = k),   q(k) = P(Z = k),

we have, based on the definition of Z:

q(0) = p(0), and for k ≥ 1, q(k) is the sum of p(d) q(k/d - 1) over all divisors d ≥ 1 of k.

2.2. The geometric attractor

It is easy to directly prove, using the definition of Z, that if X is Bernoulli(p), then Z is geometric with parameter 1 - p, that is, P(Z = k) = (1 - p) p^k, for k = 0, 1, and so on. Likewise, if Z is geometric, then X must be Bernoulli. To prove this, using the notation of section 2.1, let us assume that P(Z = k) = q(k) = q(0) (1 - q(0))^k, and that P(X = k) = p(k), for k = 0, 1, and so on. The equation p(1) p(0) = q(1) = q(0) (1 - q(0)), combined with p(0) = q(0), yields p(1) = 1 - q(0). As a result, p(0) + p(1) = q(0) + (1 - q(0)) = 1. Thus if k > 1, then P(X = k) = p(k) = 0. This corresponds to a Bernoulli distribution for X.
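Here is a quick numerical confirmation in Python, with the arbitrary choice p = 0.6:

import numpy as np

# Minimal sketch: with X ~ Bernoulli(p), simulated values of Z should follow
# P(Z = k) = (1 - p) p^k. Truncation is harmless: the partial products
# vanish as soon as one X(k) equals 0.
rng = np.random.default_rng(3)
p = 0.6
X = (rng.random((100_000, 100)) < p).astype(np.int64)
Z = np.cumprod(X, axis=1).sum(axis=1)
for k in range(5):
    print(k, (Z == k).mean(), (1 - p) * p**k)    # empirical vs theoretical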

2.3. Not any distribution can be an attractor

In the Central Limit Theorem (CLT) framework, by far the main attractor is the Gaussian attractor. The few other ones are all stable distributions with infinite variance. Here we have far more attractors, for instance the geometric distribution. Yet not every distribution can be an attractor. For instance, if Z and X have discrete distributions, using the formulas in section 2.1, we must have p(1) = q(1) / p(0) = q(1) / q(0). Since p(1) ≤ 1, we must have q(1) ≤ q(0) for Z to be an attractor. In short, any discrete distribution with P(Z = 0) < P(Z = 1) cannot be an attractor. Also, any distribution (discrete or continuous) with E(Z) ≤ -1/2 cannot be an attractor: see section 1.1 for the explanation.

We also found in section 2.2 that there is only one distribution for X that leads to the geometric distribution for Z. Just like in the CLT framework, there is only one distribution for X leading to the Cauchy attractor: X must be Cauchy itself. This raises an interesting question: can two different distributions (for X) lead to the same attractor? The answer appears to be negative here (in contrast with the CLT framework), but I haven't proved it. In section 2.4, we show that there is only one possible distribution for X leading to the uniform attractor.

2.4. The uniform attractor

If X is such that P(X = -0.5) = P(X = 0.5) = 0.5, then Z is uniform on [-1, 1]. This is easy to prove using the formula for the moments, in section 1. In particular, E(Z^k) = 1 / (k + 1) if k is even, and 0 otherwise. Also, E[(1 + Z)^k] = 2^k / (k + 1). We also have E(X^k) = 1 / 2^k if k is even, and 0 otherwise. Thus the only way to satisfy E(Z^k) = E(X^k) E[(1 + Z)^k] for all k is if all the moments of Z are those of a uniform distribution on [-1, 1]. Thus Z has the prescribed distribution.

To prove the converse, note that in order for Z to be uniform on [-1, 1], we must have E(X^k) = 1 / 2^k if k is even, and 0 otherwise. The only distribution having these moments is the one discussed at the very beginning of section 2.4, namely P(X = -0.5) = P(X = 0.5) = 0.5.
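The following sketch verifies the uniform attractor by simulation, using scipy's Kolmogorov-Smirnov test (the truncation at 60 terms is harmless since the partial products are of order 2^-k):

import numpy as np
from scipy import stats

# Minimal sketch: with P(X = -0.5) = P(X = 0.5) = 0.5, Z should be uniform
# on [-1, 1]; the KS test should not reject uniformity.
rng = np.random.default_rng(4)
X = rng.choice([-0.5, 0.5], size=(100_000, 60))
Z = np.cumprod(X, axis=1).sum(axis=1)
print(stats.kstest(Z, stats.uniform(loc=-1, scale=2).cdf))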

3. Discrete X Resulting in a Gaussian-looking Attractor

In section 2.2, we saw that if X is Bernoulli, then Z has a discrete distribution. By slightly modifying the Bernoulli distribution in section 2.4, we obtained a continuous uniform distribution for Z. Here we consider another discrete distribution for X, not very different from a Bernoulli, to obtain a continuous, Gaussian-looking distribution for Z. Specifically, we work with X defined by

P(X = -1 ) = P(X = -0.5) = P(X = 0.5) = P(X = 1) = 0.25.

The functional equation in section 1 can be re-written as

f(z) = (1/4) [ f(z - 1) + f(-z - 1) + 2 f(2z - 1) + 2 f(-2z - 1) ].

Here f represents the continuous density attached to Z. It satisfies f(z) = f(-z) and f(0) = f(1) = f(-1). We also have E(Z) = 0 and, by the variance formula of section 1, Var(Z) = E(X^2) / (1 - E(X^2)) = 5/3.

The empirical percentile distribution attached to Z is pictured below.

Figure 1: Z percentiles if P(X = -1) = P(X = -0.5) = P(X = 0.5) = P(X = 1) = 0.25

Despite appearances, it is possible that the distribution of Z, though visually very smooth, is differentiable nowhere, and that the density, at least in classical terms, does not exist. I conjecture that the density does exist in this example. For instance, in a similar stochastic system based on infinite nested square roots, it was found that some attractors (the limiting distributions) are almost smooth, but with a distribution that is continuous everywhere yet differentiable nowhere: see here for details. This is especially the case when the seed distribution (X in this case) is discrete, particularly when it takes on only a finite number of values. In some cases, even the support domain for Z has gaps, sometimes large, visible ones: see here. However, here the distribution and its support domain look well behaved; the distribution is even very well approximated by a uniform distribution for z between -1 and 1.

Similar distributions have been analyzed by David Bailey in his book Experimental Mathematics in Action, published in 2007. In particular, sections 5.2 and 5.3 (pages 114-137) are very relevant to this context. One of the densities he studied, namely 2q f(z) = f((z - 1) / q) + f((z + 1) / q), is very similar to the one I studied in my article A Strange Family of Statistical Distributions, and the functional equation for f is somewhat similar to the one discussed in this section. All these problems lead to attractors and functional equations like the ones discussed here.

3.1. Towards a numerical solution

A possible way to find a numerical solution is as follows. Rather than focusing on the density f, we focus on its distribution function. Build a sequence of distributions that are piecewise uniform on the support domain (that is, with piecewise linear distribution functions), starting in this particular case with the uniform distribution on [-1, 1], which Figure 1 suggests is already a decent approximation.

At iteration n > 1, the approximated solution (a distribution function) is piecewise linear on n disjoint contiguous intervals (these intervals eventually cover all the real numbers as n tends to infinity). Its value is always between 0 and 1, and it must be a strictly increasing function. It is chosen to minimize an error criterion, namely the distance between the distribution of the n-th iterate Z and that of g(X, Z) = X (1 + Z).

In short, you want the n-th iterate to be as close as possible to solving the theoretical functional equation of this system, while being piecewise uniform on n intervals, with successive iterations reducing the error, eventually to zero as n tends to infinity. Here X has the distribution specified at the beginning of section 3, and g(X, Z) = X (1 + Z) is at the core of the functional equation defined at the very beginning of section 1. Of course this assumes that the solution satisfying these constraints is unique. It also assumes that the algorithm in question converges to the solution. Techniques about how to solve this problem are described in my article New Perspectives on Statistical Distributions and Deep Learning.
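A much simpler, sampling-based alternative (not the algorithm of the article just mentioned) is sketched below: iterating the backward map Z ← X(1 + Z) of section 1.2 on a large sample produces, after enough passes, a sample whose distribution approximates the attractor, and whose percentiles can be compared with Figure 1:

import numpy as np

# Sketch of a sampling-based approximation of the attractor: iterate the
# stochastic map Z <- X(1 + Z) on a large sample; each pass adds one layer
# of depth to the backward recursion of section 1.2.
rng = np.random.default_rng(5)
values = np.array([-1.0, -0.5, 0.5, 1.0])
n = 1_000_000
Z = np.zeros(n)
for _ in range(80):                      # 80 passes: ample for convergence
    Z = rng.choice(values, size=n) * (1 + Z)
print(np.percentile(Z, [1, 5, 25, 50, 75, 95, 99]))   # compare with Figure 1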

4. Special Cases with Continuous Distribution for X

This section can be skipped. It features two interesting exercises, with solution, to help the reader dig deeper into the material presented so far.

4.1. An almost perfect equality

Let X = sin(πY) with Y being Normal(0, 1). Simulate Z, and estimate its variance based on 20,000 deviates. Prove that Var(Z) is not equal to 1, despite the very strong empirical evidence.

Solution

For simulations, you can use the techniques offered here. Note that E(X) = 0, thus Var(Z) = E(X^2) / (1 - E(X^2)). Also, since E[cos(tY)] = exp(-t^2 / 2) when Y is Normal(0, 1), we have

E(X^2) = E[sin^2(πY)] = (1 - E[cos(2πY)]) / 2 = (1 - exp(-2π^2)) / 2.

Thus Var(Z) = (1 - exp(-2π^2)) / (1 + exp(-2π^2)) = 0.99999999465..., which differs from 1 by about 5.4 × 10^-9, an amount far too small to detect with 20,000 simulated deviates. For the details of the computation, see here.
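Here is a minimal Python simulation matching the setting of the exercise (the truncation depth of 300 terms is an arbitrary but safe choice):

import numpy as np

# Minimal sketch for exercise 4.1: simulate Z with X = sin(pi * Y), Y
# standard normal, and compare the empirical variance with the exact value.
rng = np.random.default_rng(6)
X = np.sin(np.pi * rng.standard_normal((20_000, 300)))
Z = np.cumprod(X, axis=1).sum(axis=1)
eps = np.exp(-2 * np.pi**2)
print("empirical Var(Z):", Z.var())                  # about 1
print("exact     Var(Z):", (1 - eps) / (1 + eps))    # 0.99999999465...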

4.2. Is the log-normal distribution an attractor?

Using simulated values of Z based on X being log-normal, show that Z looks almost log-normal too. However, it is just a good approximation, not an exact solution. Prove that Z is not log-normal.

Solution

For simulations, you can use the techniques offered here. I solved this exercise using X = exp(Y) / 5, with Y being Normal(0, 1). Thus X is log-normal and has the following moments (see here): E(X^k) = exp(k^2 / 2) / 5^k, for k = 0, 1, 2, and so on. Since E(Z^k) = E(X^k) E[(1 + Z)^k], it is possible to compute the exact value of E(Z^k). The approximate values are

E(Z) = 0.4919678...,   E(Z^2) = 0.8324035...,   E(Z^3) = 12.7967051...

These values are obtained using the following formulas:

E(Z) = E(X) / (1 - E(X)),
E(Z^2) = E(X^2) (1 + 2 E(Z)) / (1 - E(X^2)),
E(Z^3) = E(X^3) (1 + 3 E(Z) + 3 E(Z^2)) / (1 - E(X^3)).

Now let us assume that Z is log-normal. We must have E(Z^k) = exp( kμ + 0.5 (kσ)^2 ) for some parameters μ and σ. Solving exp( μ + 0.5 σ^2 ) = 0.4919678... and exp( 2μ + 0.5 (2σ)^2 ) = 0.8324035... yields μ = -1.3269649... and 0.5 σ^2 = 0.6176230...

Now E(Z^3) = exp( 3μ + 0.5 (3σ)^2 ) = 4.8438607... This is very different from the value computed earlier (namely, E(Z^3) = 12.7967051...) and thus we must conclude that Z cannot be log-normal.
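The whole computation fits in a short program; it uses the moment recursion of section 1 together with E(X^k) = exp(k^2 / 2) / 5^k:

from math import comb, exp, log

# Minimal sketch for exercise 4.2: compute E(Z^k) via the moment recursion
# of section 1, then test the log-normal hypothesis on the first two moments.
r = lambda k: exp(k**2 / 2) / 5**k       # moments of X = exp(Y)/5
m = [1.0]                                # m[k] = E(Z^k)
for k in (1, 2, 3):
    m.append(r(k) * sum(comb(k, j) * m[j] for j in range(k)) / (1 - r(k)))
print(m[1:])                             # 0.4919678..., 0.8324035..., 12.7967051...

a = log(m[1])                            # mu + 0.5 sigma^2
s2 = log(m[2]) - 2 * a                   # sigma^2
mu = a - s2 / 2
print(exp(3 * mu + 4.5 * s2))            # 4.8438607..., far from 12.7967051...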

5. Connection to Binary Digits and Singular Distributions

Here we explore cases that are far more chaotic than the well behaved examples of sections 2 and 3. These cases have a common denominator with other chaotic statistical systems, and we explore some of these connections.

5.1. Numbers made up of random digits

Let's revisit the case discussed in section 2.4, with P(X = -0.5) = P(X = 0.5) = 0.5. We proved that the attractor Z has a uniform distribution on [-1, 1]. Now, let

Z = (2 B(1) - 1) / 2 + (2 B(2) - 1) / 2^2 + (2 B(3) - 1) / 2^3 + ...

Here the B(k)'s are independent Bernoulli variables with parameter p, that is, P(B(k) = 1) = p. Note that the B(k)'s are the binary digits of a random number 1 + Z in [0, 2]. Then we have

X(1) X(2) ... X(k) = (2 B(k) - 1) / 2^k, that is, X(k) = (2 B(k) - 1) / [ 2 (2 B(k-1) - 1) ], with the convention B(0) = 1.

Conversely, for k ≥ 0, we have

B(k) = [ 1 + 2^k X(1) X(2) ... X(k) ] / 2, with the empty product (case k = 0) equal to 1.
This establishes the reciprocity and one-to-one mapping between the binary numeration system and the representation of Z introduced at the beginning of section 1. If you pick a number Z at random, its binary digits are independently and identically distributed with a Bernoulli distribution of parameter p = 0.5. Such numbers, called normal numbers, are thus uniformly distributed, as the main result in section 2.4 suggests. Non-normal numbers are extremely rare: the set of non-normal numbers has Lebesgue measure zero. Examples include numbers generated with p ≠ 1/2. We explore this case in the next section.
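Here is a small Python illustration of the correspondence, using the indexing conventions of this section:

import numpy as np

# Minimal sketch of the digit correspondence: draw the digits B(k) from a
# Bernoulli(p), build Z as the sum of (2B(k) - 1)/2^k, then recover the
# binary digits of 1 + Z = B(1).B(2)B(3)... by repeated doubling.
rng = np.random.default_rng(7)
p, depth = 0.5, 50
B = (rng.random(depth) < p).astype(int)              # B(1), ..., B(50)
Z = np.sum((2 * B - 1) / 2.0 ** np.arange(1, depth + 1))

t, digits = 1 + Z, []
for _ in range(depth):
    digits.append(int(t >= 1))                       # extract next digit
    t = 2 * (t - digits[-1])
print(np.array_equal(np.array(digits), B))           # True: digits recovered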

5.2. Singular distributions

In most cases, if X is discrete, then Z's distribution is nowhere differentiable. A typical example is the following: if P(X = -0.5) = 0.25 and P(X = 0.5) = 0.75, then the percentile distribution for Z, shown below, clearly corresponds to a singular distribution on [-1, 1].

Figure 2: Z percentiles if P(X = -0.5) = 0.25 and P(X = 0.5) = 0.75

This is the same type of distribution as the one featured in section 1 in this article, associated with a random number that does not have 50% of its digits equal to 1 in base 2. 

In such cases where the distribution of Z is difficult to identify, one might use the Mellin transform M(Z) to solve the functional equation, provided both X and Z are positive. Here we have M(Z) = M(X) M(1 + Z). Mellin transforms are similar to characteristic functions (themselves based on Fourier transforms). If X and Y are independent, then M(XY) = M(X) M(Y), while the characteristic function CF(X + Y) is equal to the product CF(X) CF(Y). See also the Wikipedia article on the product distribution, here. However, as mentioned in section 3.2 in this article, neither the Mellin transform nor the characteristic function may exist in such irregular cases.

By contrast with Figure 2, see below the percentile distribution (corresponding to a uniform distribution for Z) if P(X = -0.5) = P(X = 0.5) = 0.5.

Figure 3:  Z percentiles if P(X = -0.5) = P(X = 0.5) = 0.5

5.3. Connection to Infinite Random Products

Here we discuss a totally different system, closely related though to our discussion in section 5.1. The purpose is to show how striking the similarities are between what appears at first glance to be two unrelated systems.

Every real number Z in [1, 2] can be represented as a product of distinct factors of the form 1 + 1 / 2^k, that is:

Z = (1 + X(1)/2) (1 + X(2)/2^2) (1 + X(3)/2^3) ..., with X(k) ∈ {0, 1}.

The algorithm to find the X(k)'s is a simple version of the greedy algorithm. It is also related to Feynman's algorithm, see here. If all the X(k)'s are equal to 1, then Z = 2.384231029... and 1 - 1/Z is Pell's constant: see here, here and here. A number can have two representations, a standard and a non-standard one, for instance:

More precisely:

The standard one is on the left-hand side. As usual, we use the notation X to represent a generic X(k), since those are also (by definition) all independently and identically distributed. Below is the distribution (CDF) of Z, in blue, if X is Bernoulli(0.5). The red curve represents its approximation by a logarithmic function.

Figure 4: Z's CDF (blue) versus log approximation (red) for Bernoulli(0.5) infinite products

The approximation for Z's CDF in Figure 4 is equal to (log z) / (log λ), and the support domain is [1, λ] where λ = 2.384231029... is the constant discussed earlier in section 5.3. Now this is becoming very interesting: compare this chart with the one obtained for continued radicals in section 2.2 in this article. How similar! Below is the error term, that is, the difference between the log approximation and the exact CDF. 

Figure 5: Difference between the log approximation (red curve) and true CDF (blue curve) in Figure 4

Below is the percentile distribution if this time P(X = 0) = 5/6 and P(X = 1) = 1/6. 

Figure 6: Percentile distribution, but this time P(X = 0) = 5/6 and P(X = 1) = 1/6

Figure 6 shows the exact same type of distribution as in Figure 2, despite the fact that the two systems are unrelated. Now, if P(X = 0) = P(X = 1) = P(X = 2) = P(X = 3) = 0.25, then the percentile distribution for Z looks very smooth, and its inverse S shape (not displayed here) is very similar to that in Figure 1.
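For completeness, here is a minimal simulation of the Bernoulli(0.5) product system, comparing the empirical CDF of Z with the log approximation of Figure 4:

import numpy as np

# Minimal sketch of the product system: Z = product of (1 + X(k)/2^k) with
# X(k) i.i.d. Bernoulli(0.5); compare the empirical CDF with (log z)/(log lambda).
rng = np.random.default_rng(8)
X = rng.random((100_000, 40)) < 0.5
Z = np.prod(1 + X / 2.0 ** np.arange(1, 41), axis=1)

lam = np.prod(1 + 0.5 ** np.arange(1, 60))           # 2.384231029...
z = np.linspace(1.05, lam - 0.05, 7)
print(np.c_[z, (Z[:, None] <= z).mean(axis=0), np.log(z) / np.log(lam)])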

6. A General Classification of Chaotic Statistical Distributions

Over the last few years, I have analyzed many systems similar to the ones discussed here. A summary table can be found here. Many, but not all, are related to numeration systems. For details, see my book Applied Stochastic Processes, published in 2018, as well as Appendix B (Stochastic Processes and Organized Chaos) in my book Statistics: New Foundations, Toolbox, and Machine Learning Recipes, published in 2019. See also the following articles:

I am currently studying a new system, based on the discrete difference operator. You can see my research progress here and here. It will be published in a new article, and all these articles will eventually be turned into a new book. I have discovered that all these systems share a number of common features: attractor distributions, functional equations, chaotic statistical distributions, fractal behavior, and auto-correlations (long-range or quickly decaying). Below is a first attempt at classifying the various types of attractors found in these systems.

Figure 7: Very wild percentile distribution related to the infinite, scaled difference operator (source: here)

In many cases where X has a continuous distribution, it looks like the limiting distribution of Z can be normal, or at least very smooth. If X has a discrete distribution, we have several possibilities, listed below.

  • The limiting distribution can be normal or some other regular, smooth distribution, either well known or not.
  • The limiting distribution can look very smooth yet be differentiable nowhere, while staying very close to a normal or related distribution (see Figure 1).
  • The limiting distribution can look very un-smooth.
  • The limiting distribution can be very peculiar, as in Figure 6, yet sometimes computable.
  • The limiting distribution can be a piecewise linear mix of uniform distributions, or some other weird mix such as a staircase (see here).
  • The limiting distribution can be very wild, totally chaotic (see Figure 7 and also here), as if it were a Fourier series with a bunch of missing terms, not unlike the Weierstrass function.

In some of the chaotic cases, the distribution has a fractal behavior (see section 2.2 in this article).
