Vincent Granville Articles and Books: February 2022

Tuesday, February 22, 2022

Bernoulli Lattice Models - Connection to Poisson Processes

Bernouilli lattice processes may be one of the simplest examples of point processes, and can be used as an introduction to learn about more complex spatial processes that rely on advanced measure theory for their definition. In this article, we show the differences and analogies between Bernouilli lattice processes on the standard rectangular or hexagonal grid, and the Poisson process, including convergence of discrete lattice processes to continuous Poisson process, mainly in two dimensions. We also illustrate that even though these lattice processes are purely random, they don't look random when seen with the naked eye.

We discuss basic properties such as the distribution of the number of points in any given area, or the distribution of the distance to the nearest neighbor. Bernouilli lattice processes have been used as models in financial problems, see here. Most of the papers on this topic are hard to read, but here we discuss the concepts in simple English. Interesting number theory problems about sums of squares, deeply related to these lattice processes, are also discussed. Finally, we show how to identify if a particular realization is from a Bernouilli lattice process, a Poisson process, or a combination of both.

See below a realization of a Bernouilli process on the regular hexagonal lattice. The main feature of such a process is that the point locations are fixed, not random. But whether a point is "fired" or not (that is, marked in blue) is purely random and independent of whether any other point is fired or not. The probability for a point to be fired is a Bernouilly variable of parameter p.

Figure 1: realization of Bernouilli hexagonal lattice process

The source code to produce this plot is available here. More sophisticated models, known as Markov random fields, allows for neighboring points to be correlated. They are useful in image analysis, see here and here.

To the contrary, Poisson processes assume that the point locations are random. The points being fired are uniformly distributed on the plane, and not restricted to integer or grid coordinates. In short, Bernouilli lattice processes are discrete approximations to Poisson processes. Below is an example of a realization of a Poisson process.

Figure 2: realization of a Poisson process

1. Definition and Convergence to Poisson Process

We are dealing here with square lattices covering the entire two-dimensional plane. A point of the lattice is a vector (rx, ry) where x, y are integers (positive or negative) and r a strictly positive real number, called the scale of the lattice. A Bernouilli variable is attached to each point: it is equal to 1 with probability p, and to 0 with probability 1 - p. A point of the lattice is said to be fired if its Bernouilli variable is equal to 1. Such points are marked in blue in all the plots. The Bernouilli variables are independent and have the same parameter p: in other words, this 2-parameter lattice process is homogeneous, as p does not depend on the location.

We compare these processes to homogeneous Poisson process of intensity λ, covering the entire plane. For Poisson processes, point locations are random, see figure 2. The number of points in any given area D has a Poisson distribution with mean λ S(D), where S represents the surface of the area. The distance to the nearest neighbor has the following distribution, see here:

1.1. Poisson approximation to Bernouilli lattice process

In the same way that Bernouilli variables are approximated by Poisson variables when p is small, Bernouilli processes are approximated by Poisson processes when p is small. The parameters p and λ play the same role.

The way convergence occurs is very intuitive. Start with a Bernouilli lattice process as pictured in figure 3, with r = 1. Reduce r by a factor 2 and p by a factor 4: see result in figure 4. Keep repeating this step indefinitely, and you end up with figure 2.

Figure 3: Realization of a Bernouilli square lattice process

Figure 4: A more granular version, with smaller r and p

2. Properties of Bernouilli lattice processes

As for Poisson processes, two main features are:

the distribution between a point of the lattice and its closest "blue" neighbor
the distribution of the number of blue points in any given area.

The number N of blue points in any given area D is a Binomial variable with expectation p S(D), with S(D) being the total number of lattice points (blue or not) in D. That is,

For any lattice point z, let Y be the distance to the closest blue neighbor. Let D be a circle centered at z, with radius y. The probability that Y is larger than y is the probability that there is no blue point in D, that is, based on the previous formula:

Note that if z is a blue point, it is not considered to be a nearest neighbor to itself.

2.1. More about the Poisson approximation

When r and p get smaller and smaller as discussed in section 1.1, we have convergence to the Poisson process, and S(D) in the above formula (for Bernouilli lattice processes) tends to the surface area of D, that is

Likewise, for the Poisson process, based on the first formula in section 1, when λ is small, we have

Now the analogy with Poisson processes becomes more clear. Also, p and λ are equivalent, at the limit. Below is the distribution of Y for a Bernouilli lattice process with p = 0.001.

Figure 5: Distribution of distance to nearest neighbor: P(Y < y)

It is indistinguishable from that of a Poisson process with λ = 0.001. There is a major difference though: the distribution is a discrete one for the lattice process, and a continuous one for the Poisson process, but you can't tell the difference with the naked eye, for such a small value of p.

2.2. Model testing

Since Poisson and Bernouilli lattice processes can be quite similar depending on the granularity of the lattice, one might be interested in testing if some observed data fits with either model. An easy test involves computing the empirical distribution of the distance to the nearest neighbor, and check whether it fits the Bernouilli lattice or Poisson theoretical distribution.

For blended models, say a mixture of Poisson and Bernouilli lattice, one can do simulations. Simulations will also allow you to estimate the proportions of Bernouilli versus Poisson in the mixture model.

2.3. Connection to number theory: sums of two squares

An interesting result that can easily be derived from the formulas in section 2.1 is the following. The number of (x, y) solution to x^2 + y^2 ≤ n, where n is a positive integer and the unknown x, y are positive or negative integers, tends to π n as n tends to infinity. This is known as the Gauss circle problem. So, on average, the equation x^2 + y^2 = n has π solutions, though for some n, for instance n = 1105, it can be much higher (32 in this case). Source code to compute the number of solutions is available here. More generally, the number of integer solutions to such inequalities can be approximated by the surface area under the curve using integration techniques, the curve here being x^2 + y^2 ≤ n.

A deeper result is the following. Let r(n) be the number of solutions to x^2 + y^2 = n. This function is known as the sum of squares function. For k between n and 2n, we have

integers such that r(k) > 0, based on this well known result, where c = 0.7642... is the Landau-Ramanujan constant. Also, based on the Gauss circle problem, the total number of solutions across all k's between n and 2n is asymptotically 2π n − π n = π n. These π n solutions are distributed across cn / SQRT(log n) integers satisfying r(k) > 0. This means that for large n's satisfying r(n) > 0, the average value of r(n) grows as π SQRT(log n) / c. A consequence of this is that the records for r(n) grow faster than π SQRT(log n) / c. How much faster?

Technical note: Sums of squares are far more abundant than primes, yet their natural density is also zero. This means that as n becomes larger and larger, the chance for n to be a sum of two squares, tends to zero. But for the few remaining n's that can be expressed as a sum of two squares, the number of ways they can be expressed that way, increases more and more on average. That is, r(n) tends to increase with n, for those few large n satisfying r(n) > 0. As a result, for any n large or small, the number of solutions to x^2 + y^2 = n is equal to π on average. What changes (it increases) when n is large is the probability that r(n) = 0, but the average value of r(n) remains unchanged.

More about sums of squares in the context of number theory, can be found here and here. See also this post and this article.

Sunday, February 20, 2022

Fascinating New Results in the Theory of Randomness

Updated on March 24. See new sections on Fibonacci numbers [3.2.(b)], comparing stochastic processes [4.1.(b)], connection with Brownian motions [4.1.(c)], new material in section 4.3, and new addition in the Appendix [5.4].

I present here some innovative results from my most recent research on stochastic processes. chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge in statistical or mathematical theory. It introduces new material not covered in my recent book (available here) on applied stochastic processes. You don't need to read my book to understand this article, but the book is a nice complement and introduction to the concepts discussed here.

None of the material presented here is covered in standard textbooks on stochastic processes or dynamical systems. In particular, it has nothing to do with the classical logistic map or Brownian motions, though the systems investigated here exhibit very similar behaviors and are related to the classical models. This cross-disciplinary article is targeted to professionals with interests in statistics, probability, mathematics, machine learning, simulations, signal processing, operations research, computer science, pattern recognition, and physics. Because of its tutorial style, it should also appeal to beginners learning about Markov processes, time series, and data science techniques in general, offering fresh, off-the-beaten-path content not found anywhere else, contrasting with the material covered again and again in countless, identical books, websites, and classes catering to students and researchers alike.

Some problems discussed here could be used by college professors in the classroom, or as original exam questions, while others are extremely challenging questions that could be the subject of a PhD thesis or even well beyond that level. This article constitutes (along with my book) a stepping stone in my endeavor to solve one of the biggest mysteries in the universe: are the digits of mathematical constants such as Pi, evenly distributed? To this day, no one knows if these digits even have a distribution to start with, let alone whether that distribution is uniform or not. Part of the discussion is about statistical properties of numeration systems in a non-integer base (such as the golden ratio base) and its applications. All systems investigated here, whether deterministic or not, are treated as stochastic processes, including the digits in question. They all exhibit strong chaos, albeit easily manageable due to their ergodicity. .

Interesting connections to the golden ratio, Fibonacci numbers, Pisano periods, special polynomials, Brownian motions, and other special mathematical constants, are discussed throughout the article. All the analyses were done in Excel. You can download my spreadsheets from this article; all the results are replicable. Also, numerous illustrations are provided.

Content of this article

1. General framework, notations and terminology

Finding the equilibrium distribution
Auto-correlation and spectral analysis
Ergodicity, convergence, and attractors
Space state, time state, and Markov chain approximations
Examples

2. Case study

First fundamental theorem
Second fundamental theorem
Convergence to equilibrium: illustration

3. Applications

Potential application domains
Example: the golden ratio process
Finding other useful b-processes

4. Additional research topics

Perfect stochastic processes and Brownian motions
Characterization of equilibrium distributions (the attractors)
Probabilistic calculus and number theory, special integrals

5. Appendix

Computing the auto-correlation at equilibrium
Proof of the first fundamental theorem
How to find the exact equilibrium distribution
Perfect process with no auto-correlation

6. Additional Resources 1. General framework, notations and terminology

We are dealing here with sequences { x(n) }, starting with n = 1, recursively defined by an iterative formula x(n + 1) = g(x(n)). We will explore various functions g in the next sections. Typically, x(n) is a real number in [0, 1], and g is a mapping such that x(n+1) = g(x(n)) is also in [0, 1]. The first, value, x(1), is called the seed. In short, { x(n) } is a time series or stochastic process, and the index n denotes the (discrete) time or iteration.

Typically, the values x(n) appear to be distributed somewhat randomly, according to some statistical distribution called the equilibrium distribution, and generally, the x(n)'s are auto-correlated. So x(n) can be seen as a realization or observation of a random variable X, whose distribution is the equilibrium distribution. That is, the empirical distribution of the x(n)'s, when computed on a large number of terms, tends to the theoretical equilibrium distribution in question. Also, in practice, the vast majority of seeds yield the same exact equilibrium distribution. Such seeds are known as good seeds, the other ones are called bad seeds.

1.1. Finding the equilibrium distribution

The equilibrium distribution can be obtained by solving the equation P(X < y) = P(g(X) < y) with y in [0, 1]. This is actually a stochastic integral equation: the probability distribution P is the solution, and corresponds to the distribution of X. This distribution is sometimes denoted as F. Whether the equilibrium distribution exists or not, and whether it is unique or not (for good seeds), is not discussed here. However, we will provide several examples with unique equilibrium distribution, throughout this article, including how to solve the stochastic integral equation. The density attached to the equilibrium distribution is called the equilibrium density and is denoted as f.

1.2. Auto-correlation and spectral analysis

The theoretical auto-correlation between successive values of x(n) can be computed as follows:

Once computed, it is interesting to compare its value to the observed auto-correlation measured on the first few thousand terms of the sequence { x(n) }. Longer-term auto-correlations (lag-2, lag-3 and so on) can be computed using the same principle, either theoretically or empirically (on data). The entire auto-correlation structure, given the equilibrium distribution, uniquely characterizes the stochastic process. The study of these auto-correlations is called spectral analysis.

1.3. Ergodicity, convergence, and attractors

So far we have looked at one instance or realization { x(n) } of the underlying process, characterized by a mapping g and a seed x(1). This provides enough information to determine the auto-correlation structure and equilibrium distribution, which do not depend on the good seed.

There is another way to look at things. You can simulate m deviates of a random variable Z(1) with any pre-specified distribution, say uniform on [0, 1]. Then apply the mapping g to each of these deviates, to obtain another set of m values. These new values are m deviates of a random variable denoted as Z(2), also with known statistical distribution. Repeat this step over and over, to obtain Z(3), Z(4), and so on. For large n, Z(n) converges in distribution to the equilibrium distribution, regardless of the initial distribution chosen for Z(1). We illustrate how it works on an example, later in this article. Because convergence to the same equilibrium distribution occurs regardless of the initial distribution, the equilibrium distribution, in the language of dynamical systems, is called an attractor distribution. The method described here can be used to identify these attractors.

A stochastic process where the equilibrium distribution does not depend on Z(1) nor on the good seed, is called ergodic. All the processes studied here are ergodic. An example of non-ergodic process can be found here.

1.4. Space state, time state, and Markov chain approximations

The space state is the space where { x(n) } takes its values; here it is [0, 1] and is thus continuous. The time space is attached to the index n, and it is discrete here. However, in some of our examples, x(n) can be written explicitly as a function of n, thus it can easily be extended to real values of n, making it a time-continuous process.

We can also divide the continuous space state [0, 1] into a finite number of disjoint intervals S(1), S(2), ... , S(k). Rather than studying the full equilibrium distribution, we could compute the probabilities

Then we are dealing with a state-discrete Markov chain with k states, and p(i, j) represents the transition probability for moving from state i to state j, estimated on n observations x(1), ..., x(n). One can compute the steady state probability vector, by solving a linear system of k equations; it represents the stable state of the system. As k and n tend to infinity and the intervals S(1), ..., S(k) become infinitesimal, the steady state probability vector converges to the equilibrium density. This is another way to approximate the equilibrium distribution.

1.5. Examples

The most well known examples of { x(n) } are random walks (n discrete) and Brownian motions (n continuous). However, since the space state considered in this article is [0, 1], a better suited example would be a random walk constrained to stay within [0, 1]. Such processes are discussed in my book, in chapter 3. Other less well known examples, but more relevant to this article, are also discussed in my book (chapters 7 to 12), and here, including the logistic map x(n+1) = 4 x(n) (1 - x(n)). One that I plan to study is x(n+1) = (sin(Pi x(n))^b, where b is a parameter between 0 and 1.

Let us now introduce the example that we will discuss in detail throughout the remaining of this article. It is defined with the following mapping:

Here the curly brackets represent the fractional part function, also denoted as FRAC. The straight brackets on the right-hand side, represent the integer part or floor function, also denoted as INT. Since the seed is between 0 and 1, we also have this interesting property: INT(b x(n)) is the n-th digit of the seed x(1), in base b. Thus the parameter b is called the base, and it can be any real number greater than 1. In the next section, we consider the case where b is in ]1, 2]. Non-integer bases are also discussed here and here, and also extensively in my book. While this process looks very peculiar, there is a mapping between the base-2 system, and the very popular logistic map system: see chapter 10 in my book for details, or here for a summary.

It makes sense to call this process the base-b process. If b is an integer, its equilibrium distribution is uniform on [0, 1] assuming you use a good seed. Also, if b is an integer, the auto-correlation between successive values of x(n) is 1 / b. This fact was probably mentioned for the first time, in my book. It was never proved, but assumed based on simulations and a lot of data crunching. The proof is actually quite simple and worth reading; it shows how to compute such auto-correlations, and constitutes an interesting classroom problem or exam question. You will find it in the appendix in this article.

Pretty much all numbers in [0, 1] are good seeds for the b-process. However, there are infinitely many exceptions: in particular, none of the rational numbers is a good seed. Identifying the class of good seeds is an incredibly complicated problem, still unsolved today. If we knew which numbers are good seeds, we would know whether or not the digits of Pi or any popular mathematical constant, are evenly distributed. Another question is whether or not a good seed is just a normal number, and conversely. The two concepts are closely related, and possibly identical. Later in this article, we will discuss a stochastic process where all seeds are good seeds.

Finally, the most interesting values of b are those that are less than two. In some ways, the associated stochastic processes are also much easier to study. But most interestingly, the similarities between these b-processes and stochastic dynamical systems, are easier to grasp, for instance regarding branching behavior, and attractors. This is the subject of the next section. The second fundamental theorem in the next section is one of the fascinating results published here for the first time, and still a work in progress.

Note that if b is an integer, it is easy to turn the time-discrete b-process into a time-continuous one. We have

Thus the formula can be extended to values of n that are not integers.

2. Case study

In this section, we consider the b-process introduced as an example in section 1.5, with b in ]1, 2]. Thus, x(n+1) = g(x(n)), with g(x) = bx - INT(bx), and x(1) is the seed. We now jump right away to the two fundamental theorems, and cool applications will follow afterwards.

2.1. First fundamental theorem

Let Z be a random variable with an arbitrary distribution F, admitting a density function f on [0, 1]. Let Y = g(Z) be the fractional part of bZ, and b in ]1, 2]. Then we have:

This result is easy to obtain and constitutes an interesting classroom problem, or exam question. The proof is in the appendix. This theorem allows you to design a simple iterative algorithm to approximate the equilibrium distribution, and to asses how fast it converges to the solution. The result is valid even if the density function of Z has an infinite but countable number of discontinuities. This will be the case in the examples discussed here, in which a uniform distribution on [0, 1] is chosen for Z.

The algorithm, already discussed in the first section (see the ergodicity, convergence and attractors subsection), consists in iteratively computing the distribution of g(Z), g(g(Z)), g(g(g(Z))), and so on, until the difference between two successive iterates is small enough. Here, the difference is measured as the distance between two distributions, using one of the many distance metrics discussed in the literature (see here.)

The next theorem tells you in more details what happens if you choose a uniform distribution on [0, 1], for Z. This was our favorite choice in most of our simulations.

2.2. Second fundamental theorem

We use the same assumptions as in the first theorem, but here Z has a uniform distribution on [0, 1]. The following theorem can be used to find the equilibrium density, as illustrated in the appendix, using the supergolden ratio constant for b.

Let Z(1) = Z, and Z(n+1) = g(Z(n)). Then Z(n+1) has a piece-wise uniform distribution, more precisely, a mixture of n+1 uniform distributions on n+1 intervals. These intervals are denoted as

[0, c(1)[, [c(1), c(2)[, [c(2), c(3)[, ... , [c(n), 1],

and the constant value of the density of Z(n+1) on the k-th interval (k = 1, ... , n +1) is denoted as d(k). The distribution of Z(n+1) has the following features:

Sometimes c(k-1) = c(k) depending on b, k, and n
b^n d(k) is an integer and { d(k) } is a decreasing sequence
c(k) is a polynomial of degree n in b, with coefficients equal to 0, 1, or -1
Only the dominant coefficient of this polynomial is equal to 1

It is convenient to use the notation c(0) = 0 and c(n+1) = 1. The c(k)'s, for k = 1, ... , n, are called the change points. A change point is thus a discontinuity in the density function. One of these change points is always equal to b - 1.

I haven't completed the proof of the second theorem yet, and the theorem itself can probably be further refined. However, using the first fundamental theorem, it is easy to see that when moving from iteration n to n+1, we observe the following:

Because b is smaller then 2 and Z(n+1) takes on value between 0 and 1, it is clear that Z(n+1), the fractional part of bZ(n), takes more frequently on smaller values (closer to 0) than on larger ones (closer to 1.) Thus the interval densities d(k) are highest next to zero, and lowest next to 1, and decreasing in between. This explains why { d(k) } is a decreasing sequence.
The densities are also constant on each interval, as we are only dealing with uniform densities, throughout the iterations. Also b^k d(k) must be an integer, as the formula in the first fundamental theorem only involves adding integers (divided by b). This is easy to prove by recursion.
Finally, at iteration n = 2, we have a single change point c(1) = b - 1, and two intervals. Any new iteration, because of the formula in the first fundamental theorem, creates a whole new set of new change points, each one either equal to cb or c(b - 1), where c is any change point from the current iteration. This explains the special polynomial expression for c(k).

For any value of n, the exact distribution of Z(n+1) can be computed explicitly. The computation is elementary, but becomes incredibly tedious (and should be automated using some software) as soon as n is larger than 5: this is a combinatorial problem. But particular results are easier to obtain. The simplest case is as follows:

Exercise: prove that this is actually a density, that is,

Other easy cases include the full solution (for any value of b between 1 and 2) when n = 2 or n =3. This is left as an exercise. Note that if for some finite value of n, Z(n) has the equilibrium density, then Z(n+1), Z(n+2) and so on also have that exact same density. This is why the equilibrium distribution is called an attractor.

2.3. Convergence to equilibrium: illustration

The first picture below illustrates the convergence of the empirical equilibrium densities to the theoretical solution, starting with a simulated uniform density on [0, 1] for Z(1), and computing the empirical densities for Z(2), Z(3), and so on, up to Z(7). You can check out the computations in this spreadsheet. The parameter b used here is the supergolden ratio constant (see next section) and we used 100,000 observations to estimate each density.

Below are a few equilibrium densities (approximated using the empirical density) for various values of b.

The spreadsheet used to produce the 4 above charts, with detailed computations, is available here. Some exact solutions (where the theoretical equilibrium density is easy to compute) are provided in the next section and in the appendix, with a short tutorial on how to discover these solutions and to apply the methodology to the general case (see appendix.).

3. Applications

In this section, we discuss applications, still focusing for the most part on b-processes with b smaller than 2. But we also discuss other stochastic processes.

3.1. Potential application domains

Stochastic processes or time series models are routinely used in Fintech to model the price of some commodities. Thus, b-processes offer a new tool for quants, behaving quite differently than traditional processes and time series. The variety in these b-processes is such, and the behavior so unique depending on b, that it allows the data scientist to attach a unique number to an observed time series: its estimated parameter b. Two different values of b provide two wildly different types of patterns, as is usually the case with all chaotic dynamical systems, for instance, with the logistic map. Whether in finance or other fields, these processes model situations in which an auto-correlated system evolves chaotically over time, with sharp drops every now and then in the equilibrium density, occurring at what we defined earlier as change points. Depending on b, the number of change points in [0, 1] can be 2, 3, 4, and so on, up to values so large that the equilibrium density looks perfectly smooth (this is the case, for instance if b is very close to 1.) Thus the parameter b can be chosen to fit with a wide array of change point locations, as well as various downward trends and gaps, in the equilibrium density. As discussed in section 2, the b-process can be seen as an infinite mixture of uniform distributions on infinitesimal intervals, or finite mixture on larger intervals, depending on b.

Other specific applications include:

Generation of non-periodic, continuous, replicable pseudo-random numbers. By far, the largest class of pseudo random number generators currently in use is made of periodic, discrete generators, though the period is extremely large in modern generators. And random numbers produced using physical devices are typically not replicable. To get a good generator, one would have to use a value of b resulting in a known equilibrium distribution, start with a good seed, and map the sequence { x(n) } so that the distribution of the mapped x(n)'s (each one representing a random number) becomes uniform, with no auto-correlation. How to do this is described in the next sub-section about the golden ratio process.
Thus, with a careful choice of b and proper mapping, b-processes can be used in cryptographic systems and Blockchain. Digits produced by b-processes, and defined as a(n) = INT(b x(n)), have the following particular property. The digits are binary (equal to 0 or 1), so each digit can be called a bit, using the language of information theory. Indeed, when b = 2, this is just the standard base-2 numeration system that we are all familiar with. However, when b is smaller than 2, each digit carries an amount of information smaller than the standard bit. Indeed, that amount is equal to the logarithm of b in base 2 (and of course, equal to 0 if b = 1.) So not only we invented a unit of information that is smaller than the smallest unit currently in use, but it allows you to create encryption systems filled with some amount of natural blurring, which may or may not be useful depending on the purpose.
Another application is to benchmark computer systems, testing for accuracy when performing heavy computations that require a large number of decimals. If you compute the successive values of x(1), x(2) and so on up to x(n), all your numbers will be completely wrong once n is larger than 45. You may not notice it initially, but try in Excel with a base b that is an even integer: it will become very obvious! Sometimes it does not matter (for instance when studying asymptotic properties such as auto-correlations or the equilibrium distribution) because b-processes are ergodic, and sometimes it matters. This is discussed in detail in my book, available here: see the chapters about high precision computing, or read this article.
Also, b-process can be used to benchmark and test the power of statistical tests, the sample size needed, and other statistical procedures. Since the "observations" { x(n) } have a known statistical distribution (depending on b, see next subsection about the golden ratio process) -- a property never found in actual, real life data sets -- you can test a number of hypotheses (for instance about the value of some auto-correlation), and check when your statistical tests provide the right or wrong answer. Here, the right answer is known in advance!
Another application is to design a lottery, where the winning number is a sequence of digits generated after re-mapping a b-process so that its associated digits are uniformly distributed, and with no auto-correlation. See the golden ratio process below about how to do this re-mapping. Use digits in positions n = 1,001 to 1,020 the first week, 1,021 to 1,040 the second week, and so on. The advantage of such a lottery is that the winning numbers are known in advance yet unpredictable unless you know the secret base b, the secret seed x(1), and the secret starting point n = 1,001. So it can be labeled as a game of skills rather than a game of chance, and not subject to lottery laws. Another example of such a "lottery" is described here. See also my new stock trading and lottery game (number guessing).

3.2. Example: the golden ratio process

The golden ratio process, as its name indicates, corresponds to b = (1 + SQRT(5)) / 2. Its associated numeration system, with INT(b x(n)) representing the n-th digit of the seed x(1) in base b, has been studied in some details, see here. This b-process is the best behaved one, and the easiest to study, besides b = 2. It is probably the only b-process with exactly one change point, and its equilibrium distribution is known (probably published here for the first time.) Thus, it is a good candidate for applications such as encryption, random number generation, model fitting, testing statistical tests, or Blockchain.

(a) Properties and use in cryptography

Using the notations introduced in section 2, this process has the following features:

The unique change point is c(1) = b - 1
The equilibrium distribution is a mixture of two uniform distributions: one on [0, c(1)[, and one on [c(1), 1[
At equilibrium, the two respective densities are d(1) = (5 + 3*SQRT(5)) / 10 and d(2) = (5 + SQRT(5)) / 10.

Below is a picture of the equilibrium density associated with this process:

In order to make this process suitable for use in cryptography, one has to map { x(n) } onto a new sequence { y(n) }, so that the new equilibrium density becomes uniform on [0, 1]. This is achieved as follows:

If x(n) < b -1, then y(n) = x(n) / (b - 1) else y(n) = (x(n) - (b - 1)) / (2 - b).

Now the { y(n) } sequence has a uniform equilibrium distribution on [0, 1]. However, this new sequence has a major problem: high auto-correlations, and frequently, two or three successive values that are identical (this would not happen with a random b, but here b is the golden ratio -- a very special value -- and this is what is causing the problem.)

A workaround is to ignore all values of x(n) that are larger than b - 1, that is, discarding y(n) if x(n) is larger than b -1. This is really a magic trick. Now, not only the lag-1 auto-correlation in the remaining { y(n) } sequence is equal to 1/2, the same value as for the full { x(n) } sequence with b = 2, but the lag-1 auto-correlation in the remaining sequence of binary digits (digits are defined as INT(b y(n)) is also equal to zero, just like for ordinary digits in base 2! The proof of these facts is left as an exercise.

(b) Bad seeds and connection to Fibonacci numbers

Fibonacci numbers are defined by the recursion F(n + 1) = F(n) + F(n - 1), with F(1) = F(2) = 1. Also, F(n) = INT(x b^n) +1 if n is odd, and F(n) = INT(x b^n) otherwise, with x = 1 / SQRT(5). Thus they are related to the golden ratio process with b = (1 + SQRT(5)) / 2 and the seed x = x(1) = 1 / SQRT(5). That seed is actually a bad seed, resulting in periodicity: x(n + 4) = x(n).

But the link to bad seeds of the golden ratio process does not stop here. All fractions 1 / k, with k a positive integer, are also bad seeds, resulting in periodicity in { x(n) }. Here we compare these periods with those (well studied) of Fibonacci numbers modulo k, known as Pisano periods. If you are not familiar with elementary modulo arithmetic, you can check the Wikipedia page on the subject, here.

I only tested a few values of k, but in all cases, both periods were identical.

3.3. Finding other useful b-processes

There are different ways to compute the equilibrium distribution of a b-process when you have 3 change points or less. Finding the change points is easy: one of them is always b - 1, and the other ones can be any of these:

You can identify them by visual inspection of the empirical equilibrium density. And among the 8 potential change points listed above, you must ignore those below 0 or above 1. Note that the golden ratio process actually has two change points: b^2 - b and b - 1. But b^2 - b = 1 in this case, thus the first one is not a real change point. If you try with b = 1.61 (very close to the golden ratio) this ghost change point is now visible, and it is very close to 1. If you try b = 1.60, you now have 3 change points. And with b = 1.59, the empirical equilibrium density looks entirely different, possibly with a lot of change points and no visible drop (just a linear, bumpy downward trend), though it is hard to tell. In some cases, a change point can be double (or triple) for instance if b^3 - b^2 - b = b - 1. It typically results in an equilibrium density with very few change points. Once the change points are known, the densities can be computed using the first fundamental theorem (see section 2 in this article) and solving a linear system of equations. This is illustrated in the appendix.

Interestingly, the b-processes most likely to have a simple equilibrium density with very few change points, correspond to b's for which two of the above polynomials have the exact same value, or one is equal to 0 or 1, when evaluated for the b in question. This was the case for the golden ratio process. Below are other examples:

b^3 - b^2 = 1 yields b = 1.4655712318767... (3 change points, b is the supergolden ratio)
b^3 - b^2 - b = b - 1 yields b = 1.8019377358048... (3 change points)
b^5 - b^4 = 1 or b^3 - b = 1 yields b = 1.32471795724475... (4 change points, b is the plastic number)

You can find more about these three special constants, in this article. The exact values, respectively for the supergolden ratio and the plastic number, are

To get an approximation of the equilibrium distribution and see the change points, start with a good seed x(1). For whatever reasons x(1) = SQRT(2)/2 works better than Pi/4. Then compute x(n) up to N = 1,000,000, then plot the empirical equilibrium density computed on the x(n)'s. This is illustrated in my spreadsheet, available here. See also the picture below based on values of x(n) for n =1, ...,300,000, with b being the plastic number.

As a general rule, the lower the value of b, the more change points, on average. Also, most values of b (whether special or not) always produce a few major change points (and frequently a large number of minor ones), with big drops in the density function occurring at the major change points. Analyzing the polynomials discussed in the second fundamental theorem, can help you identify these major change points.

In the appendix, we completely solve the case where b is the supergolden ratio.

4. Additional research topics

Here we discuss three potential topics for future research: stochastic processes free of bad seeds, the asymptotic properties of attractors and the construction of a table of attractors summarizing their features, and finally, some applications of b-processes to probabilistic and experimental number theory, including the discussion of some special integrals.

4.1. Perfect stochastic processes and Brownian motion

The b-process, defined by g(x) = bx - INT(bx), has bad seeds, as discussed earlier. For a b-process, the vast majority of seeds are good seeds (the set of bad seeds actually has Lebesgue measure zero), but nobody knows if mathematical constants such as PI or SQRT(2) are good or bad seeds. Are there any stochastic processes free of bad seeds? Such processes can have some benefits (but mostly drawbacks!) and are called perfect processes, until someone comes up with a better word. The term universally good averaging sequence is sometimes used. One example is the following.

The process defined by g(x) = x + b - INT(x + b), where b is a positive irrational number, fits the bill. Since by definition, x(n+1) = g(x(n)), it is easy to see that

x(n) = (n-1)b + x(1) - INT((n-1)b + x(1)).

The fact that there is no bad seed is guaranteed by the equidistribution theorem. Even x(1) = 0 is a good seed.

(a) Properties of perfect processes

This process is investigated in chapter 11 in my book, available here (see page 70.) The n-th binary digit is defined as INT(2 x(n)), and these digits carry even less information than those generated by b-processes with b between 1 and 2. If b = log(2), the first few digits of the seed x(1) = 0 are as follows:

0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0

In contrast to b-processes, all seeds (regardless of b) have 50% of digits equal to 0, and 50% equal to 1. This process is related to integral C defined later in this article, in the sub-section Probabilistic calculus and number theory, special integrals.

The equilibrium distribution is always uniform if b is irrational, thus it is possible to compute the theoretical lag-1 auto-correlation of the sequence { x(n) } (using the first formula in section 1) and search for the b's that minimize, in absolute value, that auto-correlation. See appendix for a detailed solution. The empirical equilibrium distribution converges much faster to the theoretical one, than with b-processes. However, I've found a striking, unusual pattern for b = Pi and b = exp(Pi).

The empirical density, computed on x(1), ..., x(n) and binned into N intervals, shows strong periodic bumps that other irrational b do not produce, not even b = Pi - 0.00001. It occurs even with the seed x(1) = 0, with specific values of n and N, for instance n = 10,000 and N = 100, or n =1,000,000 and n = 100, but not with N = 100,000 and N = 100. These bumps decrease as n increases, and convergence to uniform [0, 1] still occurs for b = Pi and b = exp(Pi). Initially, I thought it was an issue with my internal machine arithmetic, but both my Perl and Excel implementations reproduce the same patterns. The Perl code is available here. The pattern is illustrated in the figure below.

The above picture shows the empirical density, with n = 56,000 observations and N = 100 bins, for four values of b. It is extremely close to the theoretical uniform equilibrium on [0, 1]. I truncated the Y-axis to visually amplify the pattern. The spreasheet is available here.

(b) Comparison with b-processes

Below we contrast some of the properties of b-processes, with those of perfect processes.

Below is a chart comparing the auto-correlation of b-processes with that of perfect processes. The red curve was computed empirically (based on simulations) while the blue curve represents the exact values. The small bumps in the red curve are real, they are not caused by a small sample size. Note that b is in [1, 4] here. While in this article, we focused on b between 1 and 2 for b-processes, it can easily be extended to any b larger than 2.

Auto-correlations and cross-correlations in multivariate processes are studied in more details in this article. For b-processes, the lag-k auto-correlation in base b is equal to the lag-1 auto-correlation in base b^k. For perfect processes, the lag-k auto-correlation in base b is equal to the lag-1 auto-correlation in base bk. These results are valid for any good seed.

(c) Connection with Brownian motions

Now, let us investigate the connection with Brownian motions. It seems that few of the perfect processes investigated here can emulate Brownian motions, because they are usually too strongly auto-correlated. But there are a few exceptions. The opposite is true for b-processes.

Let us define

where E is the expected value of { x(n) }, and u(n), v(n) and w(n) are functions chosen to stabilize the variances of { y(n) } and { z(n) } respectively, as n becomes large. For details about stabilizing the variance to turn time-discrete processes such as { y(n) } or { z(n) } into time-continuous processes such as Brownian motions (the time is the index n), see chapter 1 and 2 in my book, here. In short, it consists of re-scaling both the Y-axis (values) and X-axis (time, or n) to make sure that variances stay finite and non-zero as n tends to infinity. As time increments become infinitesimal and n tends to infinity, convergence to a continuous process is obtained. The textbook example is the transformation of a random walk into a Brownian motion. Here we only provide two examples, and technical details are omitted.

The first example is the b-process with b an integer larger than 1. In this case, with x(1) a good seed thus E = 1/2,with u(n) = SQRT(n), v(n) = SQRT(n^3), and w(n) =0, after re-scaling the time axis, { y(n) } becomes a Brownian motion, while { z(n) } becomes an integrated Brownian motion. There is nothing new here. What is new though, is the fact that this works too even if b is not an integer.

The second example involves a perfect process, with the seed x(1) = 1, b = SQRT(2) - 1, E = 1/2, u(n) = 1, v(n) = SQRT(n), and w(n) = n / 2. After re-scaling, this time { z(n) }, not { y(n) }, looks like a Brownian motion, see picture below.

If you try the perfect process with a different parameter b and a different seed, the result will usually be very different, sometimes unexpectedly beautiful with many smooth bumps, if your sample is large enough, but it won't look at all like a Brownian motion. All the computations are in my spreadsheet, available here. You can play with it to see the variety of patterns that it can produce. Here only the first 500 values of z(n) have been used. If you try with the first 50,000 values instead, it still looks Brownian, and indeed, strikingly similar due to the fractal nature of Brownian motions (when you zoom in or out, the randomness patterns stay the same.) A statistical test to assess the Brownian character of this time series, would probably conclude that it is likely to be Brownian. One such test was designed by Grzegorz Sikora, see here; his article was submitted for publication in 2018.

One of the issues with very large n is machine precision. As a test, replicate this example with a large sample, using only 8 correct decimals in your computation, rather than the 15 that Excel offers by default. Is the resulting chart still the same? If you want to replicate the example with the b-process and b an integer, avoid b = 2, try b =3 instead, because programming languages and Excel rely on base-32 or base-64 arithmetic, and assign the value 0 to x(n) when n > 55 or so. A workaround is to use high precision computing, or use b = 1.9999999 (1.999 won't work, and indeed theoretically, it is not supposed to work.)

The connection to Brownian motions (including the smoothness of b-processes versus perfect processes) is further studied in my article Long-range Correlations in Time Series: Modeling, Testing, Case Study, available here.

4.2. Characterization of equilibrium distributions (the attractors)

Here, we focus again on b-processes with b in ]1, 2]. Another interesting research topic is about characterizing the class of attractors. That is, what kind of distribution is an equilibrium distribution? What makes them peculiar, compared to other distributions? Another question is about how the number of attractors grows as the number of change points increases. Is there an asymptotic relationship between the number of change points (say N), and the number of attractors that have N change points?

It is not even known if the number of attractors with a finite number of change points, is finite or infinite. Surely, there are more than two attractors with two change points, and much more than one attractor with three change points. The ones listed in the above table are only those that I have studied. So this table is a work in progress.

4.3. Probabilistic calculus and number theory, special integrals

When I first started this research project a while back, the initial purpose was to study the behavior of the digits of numbers such as Pi. In fact, in this article, INT(b x(n)) represents the n-th digit of the seed x(1) in base b, whether b is an integer, a real number between 1 and 2, or any real number above 1. My book Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems published in June 2018 (see here) was the first milestone: developing a general framework to study this kind of problems. Since then, I have had new ideas. Here, I present some of them, related to this article, that I am still pursuing today.

In this subsection, the notation { x } represents the fractional part of the number x, in contrast to the remaining of the article, where { x(n) } represents the entire sequence x(1), x(2), and so on. Here we will only consider the case b = 2. Also, the seed x(1) is denoted as x.

If b is an integer and if the seed x = x(1) is in [0, 1], we have, for k larger or equal to 0:

(a) Interesting series, limits, and integrals

One of the promising starting points is the following result. The proportion of digits of x equal to 0 in base 2, is 50% if and only if the series below, with b = 2, converges:

In particular, it always converges if x is a good seed in base 2. It would be interesting to study the wildly erratic behavior of this function, which is not only discontinuous everywhere, but admits a dense set of singularities (where it does not converge.) Note that if we replace b^k by k in the above series, it always converges whenever x is an irrational number, a consequence of the equidistribution theorem. What happens if we replace b^k by k^2 or k^b ?

A related quantity is the following:If b is an integer, M(b) = (log 2) / 2 and does not depend on x, assuming x is a good seed. If b is not an integer, M(b) is smaller than (log 2) / 2. More precisely, M(b) = E(b) log(2), where E(b) is the expected value of the equilibrium distribution of a b-process. If b is between 1 and 2, then E(b) is approximately equal to the proportion of binary digits in base b, that are equal to 1, for a good seed x; it does not depend on x. For instance, based on results established in section 3.2.(a), we have:

Finally, let us consider the following integrals:

It seems that A is related to M(b). After a change of variable that makes the parameter b disappear, A = B, so A does not depend on b. One can prove that B = (log 2) / 2, see here for details. What about C? That one is also equal to (log 2) / 2, as one would expect, so A = B = C = (log 2) / 2. Other similar integrals, known as Frullani integrals, can be found here.

Integral A is associated with b-processes, which have bad seeds, and are sometimes called universally bad averaging sequences for that reason. Integral C is associated with a process with no bad seed, defined at the beginning of section 4, see Perfect stochastic processes in this article. The integrals A and C are associated with time-continuous versions of these processes, respectively for b-processes and perfect processes.

(b) Some expected values, distribution of binary digits

Let E(b) be the expected value of the equilibrium distribution of a b-process. What is the average value of E(b) between 1 and 2? That is,

The value is around 0.38. See references at the end of this section, for a tentative solution. Interestingly, the exact value of E(b) is not even known for most b's. The figure below shows E(b), as well as the proportion P(b) of digits equal to 1 for a b-process. The largest peak takes place at b = (1 + SQRT(5)) / 2, the golden ratio. The case b = 2 corresponds to the standard base-2 numeration system. The n-th digit of the seed x = x(1) is INT(2 x(n)), and it is equal to either 0 or 1 depending on x. It is assumed that x is a good seed in base b, thus P(b) and E(b) do not depend on x. Also, b is in ]1, 2].

(c) References

StackExhange: Computation of B
StackExhange: Average value of M(b) on [1, 2]
Research paper: Arithmetics on number systems with irrational bases, by P. Ambroz, et al.
Wikipedia: Golden ration numeration system

5. Appendix

Here we dive into more technical details, regarding four problems discussed in the article.

5.1. Computing the auto-correlation at equilibrium

We consider a b-process where b is an integer, so the equilibrium distribution is "known" to be uniform on [0, 1]. This fact has been taken for granted probably for more than a thousand years (and that's why people believe that the digits of Pi and other mathematical constants, are uniformly distributed), but it would be nice (and easy) to prove it, if the seed is a good seed. This is left as an exercise. It is not true usually if the seed is a bad seed. Pi is believed to be a good seed, but no one has ever managed to prove it: it is one of the biggest mathematical challenges of all times, and I once offered a $500,000 award either for a solid proof or rebuttal of this fact.

Note that at equilibrium, both X and g(X) have the same distribution, so their mean and variance are identical. So if b is an integer and the seed x(1) is a good seed, the only challenge in the auto-correlation formula mentioned in the first section, is the computation of E[X g(X)].

We have:

By definition, g(X) = bX - INT(bX). Thus,

Combined with the fact that E(X) = E(g(X)) = 1/2 and Var(X) = Var(g(X)) = 1/12, as the equilibrium distribution is uniform on [0, 1], we obtain the final result: the correlation between X and g(X), that is, the theoretical auto-correlation between two successive values of x(n), is equal to 1 / b. It is easy to check this result by computing the estimated value of the lag-1 auto-correlation on x(1), ..., x(n), with n = 1,000,000: this test provides a very good approximation.

5.2. Proof of the first fundamental theorem

Here b is in ]1, 2]. If Y = g(X), we have, for y in [0, 1]:

Thus, using the notation F for the probability distribution function (and f for the density) we have:

Taking the derivative with respect to y on both sides of the equation, we obtain the final result:

5.3. How to find the exact equilibrium distribution

We focus on b-processes with b in ]1, 2]. Finding the equilibrium distribution (actually, its density) is accomplished in two steps.

First, compute P(b) for all the polynomials P mentioned in the second fundamental theorem. Any value of P(b) between 0 and 1 corresponds to a potential change point. By looking at the empirical equilibrium density, computed on 100,000 values of { x(n) }, you can find the approximate value of the major change points: they correspond to points of discontinuity in the density function. For instance, if b = 1.4656... (the supergolden ratio), there is clearly a change point around 0.46, see the first picture in section 2. There is another one around 0.68. The exact values c(1) and c(2) of the two change points are derived from two of these polynomials:

c(1) = b - 1, and c(2) = b^2 - b,

because no other polynomial (in the small list that you have to check) gets so close to 0.46 and 0.68 respectively, when evaluated at b.

Then, once the change points are identified, take three different values -- one in each interval -- for instance

0.25 in S(1) = [0, c(1)[, 0.50 in S(2) = [c(1), c(2)[, and 0.75 in S(3) = [c(2), 1].

This assumes that you have three intervals, but you can easily generalize if you have more. Now apply the first fundamental theorem with y = 0.25, y = 0.50, and y = 0.75 respectively. You get:

Note that

0.853... = (1 + 0.25) / b, and it is in S(3), and 0.171... = 0.25 / b, and it is in S(1)
0.341... = 0.50 / b, and it is in S(1)
0.512... = 0.75 / b, and it is in S(2)

At equilibrium, the density functions of X and Y are identical. Thus, if d(1), d(2) and d(3) denote the density values in each of the 3 intervals, we end up with the following linear system, where d(1), d(2) and d(3) are the unknowns:

d(1) = (d(3) + d(1)) / b

d(2) = d(1) / b

d(3) = d(2) / b

It has an infinite number of solutions, and you need to add one constraint, the fact that the total density sums to 1, to be able to solve it. That constraint is

d(1)c(1) + d(2)(c(2) - c(1)) + d(3)(1 - c(2)) = 1,

that is,

d(1)(b-1) + d(2)(b^2 - 2b +1) + d(3)(1 - b^2 + b) = 1.

Finally, the solution, in this case, is

d(1) = b^2 / (2b^3 - 4b^2 + 2b +1), d(2) = d(1) / b, and d(3) = d(1) / b^2.

If you can not easily determine which polynomials yield the change points or you want to automate the method, you may as well try any two combinations of the potential polynomials (assuming you have two change points), and for each pair of polynomials (that that is, for each pair of change point candidates) solve a similar linear system. You then plug the tentative equilibrium densities obtained for each pair of polynomials, into the formula in the first fundamental theorem. Only one of them will satisfy the fact that X and Y have the same density everywhere on [0, 1], and that is the solution.

5.4. Perfect process with no auto-correlation

We compute the covariance E(X, g(X)) between successive observations x(n) in a perfect process. These processes were introduced in subsection 4.1 and are defined by g(x) = x + b - INT(x + b). We show that the covariance between X and g(X), and thus the lag-1 auto-correlation, is zero if and only if the fractional part of b is equal to (3 + SQRT(3)) / 6 or (3 - SQRT(3)) / 6. Also, the lag-1 auto-correlation is minimum, and equal to -1/2, when the fractional part of b tends to 1/2.

Here, b is any positive irrational number. The seed x(1) can be any real number in [0, 1], even x(1) = 0, since perfect processes don't have bad seeds. Also, as usual and by definition, x(n+1) = g(x(n)).

To prove the result, we start with the fact that since X is in [0, 1], we have:

INT(X + b) = INT(b) if X < INT(b + 1) - b, otherwise INT(X + b) = INT(b + 1).

The equilibrium distribution being uniform on [0, 1], and using the brackets to represent the INT function, we thus have:

At equilibrium, we have E(X) = E(g(X)) = 1/2, and E(X^2) = 1/3 since the distribution is uniform. Thus,

With the notation k = INT(b), A = 1, B = 1 - 2(k + 1), and C = (k + 1)^2 - (k + 1) + 1/6, finding the values of b that yield Cov(X, g(X)) = 0, consists in solving the quadratic equation Ab^2 + Bb + C =0. There may be a solution for each k = 0, 1, 2, and so on. Note that the discriminant of the quadratic equation does not depend on k:

Thus the solutions are b = k + (3 + SQRT(3)) / 6 and b = k + (3 - SQRT(3)) / 6, for k = 0, 1, and so on. Note that b must be irrational, otherwise the equilibrium distribution may not exist or may not be uniform.

Vincent Granville Articles and Books

Tuesday, February 22, 2022

Bernoulli Lattice Models - Connection to Poisson Processes

Sunday, February 20, 2022

Fascinating New Results in the Theory of Randomness

Fuzzy Regression: A Generic, Model-free, Math-free Machine Learning Technique

Blog Archive