Saturday, September 4, 2021

Machine Learning Perspective on the Twin Prime Conjecture

This article focuses on the machine learning aspects of the problem, and the use of pattern recognition techniques leading to interesting, new findings about twin primes. Twin primes are prime numbers p such that p + 2 is also prime: for instance, 3 and 5, or 29 and 31. A famous, old, and still unsolved mathematical conjecture states that there are infinitely many such primes, but a proof remains elusive to this day. Twin primes are far rarer than primes: there are infinitely more primes than there are twin primes, in the same way that there are infinitely more integers than there are prime integers.
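For illustration purposes, here is a quick Python sketch (not part of the original article) that counts primes and twin prime pairs below one million with a basic sieve, to give a feel for how much rarer twin primes are:

```python
# Sketch: count primes and twin prime pairs below n with a Sieve of
# Eratosthenes. Below 10^6, primes outnumber twin prime pairs by
# roughly an order of magnitude.
def sieve(n):
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):
                is_prime[j] = False
    return is_prime

n = 10**6
is_prime = sieve(n)
primes = sum(is_prime)
twins = sum(1 for p in range(2, n - 2) if is_prime[p] and is_prime[p + 2])
print(primes, twins)
```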

Here I discuss the results of my experimental math research, based on big data, algorithms, machine learning, and pattern discovery. The level is accessible to all machine learning practitioners. I first discuss my experiments in section 1, and then how they relate to the twin prime conjecture, in section 2. Mathematicians may be interested as well, as this leads to a potential new path to prove the conjecture. Machine learning readers short on time, or not curious about the mathematical aspects, can read section 1 and skip section 2.

I do not prove the twin prime conjecture (yet). Rather, based on data analysis, I provide compelling evidence (the strongest I have ever seen), supporting the fact that it is very likely to be true. It is not based on heuristic or probabilistic arguments (unlike this version dating back to around 1920), but on hard counts and strong patterns.

This is no different from analyzing data and finding that smoking is strongly correlated with lung cancer: the relationship may not be causal, as there might be confounding factors. To prove causality, more than data analysis is needed (in the case of smoking, of course, causality was firmly established long ago).

Read full article here

Thursday, May 13, 2021

Fascinating Facts About Complex Random Variables and the Riemann Hypothesis

Despite my long statistical and machine learning career, both in academia and in industry, I had never heard of complex random variables until recently, when I stumbled upon them by chance while working on a number theory problem. I have since learned that they are used in several applications, including signal processing, quadrature amplitude modulation, information theory, and actuarial sciences.

In this article, I provide a short overview of the topic, with application to understanding why the Riemann hypothesis (arguably the most famous unsolved mathematical conjecture of all time) might be true, using probabilistic arguments. State-of-the-art, recent developments about this conjecture are discussed in a way that most machine learning professionals can understand. The style of my presentation is very compact, with numerous references provided as needed. It is my hope that this will broaden the horizon of the reader, offering new modeling tools for her arsenal, and an off-the-beaten-path read. The level of mathematics is rather simple, and you need to know very little (if anything) about complex numbers. After all, these random variables can be understood as bivariate vectors (X, Y), with X representing the real part and Y the imaginary part. They are typically denoted as Z = X + iY, where the complex number i (whose square is equal to -1) is the imaginary unit. There are some subtle differences with bivariate real variables. The complex Gaussian variable is of course the most popular case.
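As a toy illustration (my own sketch, not from the article), a standard circularly-symmetric complex Gaussian can be simulated as a pair of independent real Gaussians, and its first two moments checked empirically:

```python
import numpy as np

# Sketch: Z = X + iY with X, Y independent N(0, 1/2), so that
# E[|Z|^2] = 1 (the standard complex Gaussian normalization).
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0, np.sqrt(0.5), n)
y = rng.normal(0, np.sqrt(0.5), n)
z = x + 1j * y

print(z.mean())               # close to 0
print(np.mean(np.abs(z)**2))  # close to 1: the variance E[|Z|^2]
print(np.mean(z**2))          # close to 0: the pseudo-variance, zero in the circular case
```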

Read full article here.

Sunday, August 30, 2020

The Exponential Mean: Alternative to Classic Means

Given n observations x_1, ..., x_n, the generalized mean (also called power mean) is defined as

$$M_p = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p}.$$
The case p = 1 corresponds to the traditional arithmetic mean, while p = 0 yields the geometric mean, and p = -1 yields the harmonic mean. See here for details. This metric is favored by statisticians. It is a particular case of the quasi-arithmetic mean.

Here I introduce another kind of mean, called the exponential mean, also based on a parameter p, that may have an appeal to data scientists and machine learning professionals. It is also a special case of the quasi-arithmetic mean. Though the concept is basic, there is very little (if any) literature about it. It is related to LogSumExp and the Log semiring. It is defined as follows:

$$m_p = \log_p\left(\frac{1}{n}\sum_{i=1}^n p^{x_i}\right).$$
Here the logarithm is in base p, with p positive. When p tends to 0, m_p tends to the minimum of the observations. When p tends to 1, it yields the classic arithmetic mean, and as p tends to infinity, it yields the maximum of the observations.
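Below is a minimal sketch (my own, not from the article) implementing m_p and checking the three limiting cases numerically:

```python
import numpy as np

# Sketch: the exponential mean m_p = log_p( (1/n) * sum(p**x_i) ),
# computed via natural logs, using log_p(u) = ln(u) / ln(p).
def exponential_mean(x, p):
    x = np.asarray(x, dtype=float)
    return np.log(np.mean(p ** x)) / np.log(p)

x = [1.0, 2.0, 5.0, 8.0]
print(exponential_mean(x, 1e-9))    # ~ min(x) = 1
print(exponential_mean(x, 1.0001))  # ~ arithmetic mean = 4
print(exponential_mean(x, 1e9))     # ~ max(x) = 8
```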

Content of this article

  • Advantages of the exponential mean
  • Illustration on a test data set
  • Doubly exponential mean

Read the full article here. 

Friday, June 5, 2020

Bernoulli Lattice Models - Connection to Poisson Processes

Bernoulli lattice processes may be one of the simplest examples of point processes, and can serve as an introduction to more complex spatial processes that rely on advanced measure theory for their definition. In this article, we show the differences and analogies between Bernoulli lattice processes on the standard rectangular or hexagonal grid and the Poisson process, including the convergence of discrete lattice processes to a continuous Poisson process, mainly in two dimensions. We also illustrate that even though these lattice processes are purely random, they don't look random to the naked eye.
We discuss basic properties such as the distribution of the number of points in any given area, or the distribution of the distance to the nearest neighbor. Bernoulli lattice processes have been used as models in financial problems. Most of the papers on this topic are hard to read, but here we discuss the concepts in plain English. Interesting number theory problems about sums of squares, deeply related to these lattice processes, are also discussed. Finally, we show how to identify whether a particular realization comes from a Bernoulli lattice process, a Poisson process, or a combination of both.
See below a realization of a Bernoulli process on the regular hexagonal lattice. The main feature of such a process is that the point locations are fixed, not random. But whether a point is "fired" or not (that is, marked in blue) is purely random and independent of whether any other point is fired. The firing indicator of each point is a Bernoulli variable with parameter p.
Figure 1: realization of a Bernoulli hexagonal lattice process
More sophisticated models, known as Markov random fields, allow neighboring points to be correlated. They are useful in image analysis.
By contrast, Poisson processes assume that the point locations themselves are random: the fired points are uniformly distributed over the plane, not restricted to integer or grid coordinates. In short, Bernoulli lattice processes are discrete approximations to Poisson processes. Below is an example of a realization of a Poisson process.
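To make the contrast concrete, here is a minimal simulation sketch (mine, not from the article) of both processes on the unit square, using a square grid for simplicity:

```python
import numpy as np

rng = np.random.default_rng(42)

# Bernoulli lattice process: fixed grid locations, each point
# independently "fired" with probability p.
m, p = 50, 0.1
xx, yy = np.meshgrid(np.arange(m) / m, np.arange(m) / m)
fired = rng.random((m, m)) < p
lattice_pts = np.column_stack([xx[fired], yy[fired]])

# Poisson process with matching intensity: the number of points is
# Poisson-distributed, and the locations are uniform over the square.
n = rng.poisson(m * m * p)
poisson_pts = rng.random((n, 2))

print(len(lattice_pts), len(poisson_pts))  # similar counts, different geometry
```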

Thursday, May 28, 2020

New Probabilistic Approach to Factoring Big Numbers

The product of two large primes is at the core of many encryption algorithms, as factoring such a product is very hard for numbers with a few hundred digits. The two prime factors are associated with the encryption keys (public and private keys). Here we describe a new approach to factoring a big number that is the product of two primes of roughly the same size. It is designed especially to handle this problem and to identify flaws in encryption algorithms.
Figure: the Riemann zeta function in the complex plane
While at first glance it appears to substantially reduce the computational complexity of traditional factoring, at this stage a lot of progress is still needed to make the new algorithm efficient. An interesting feature is that its success depends on the probability that two numbers are co-prime, given that they do not share any of the first few primes (say 2, 3, 5, 7, 11, 13) as common divisors. This probability can be computed explicitly and is about 99%.
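As a quick sanity check (my own sketch, not from the article), the classic density heuristic gives the probability that two random integers are co-prime as the product over all primes p of (1 - 1/p^2), which equals 6/pi^2; conditioning on sharing none of the primes up to 13 divides out those primes' factors:

```python
import math

# Sketch: P(coprime) = prod over all primes p of (1 - 1/p^2) = 6/pi^2.
# Conditioning on "no common divisor among 2, 3, 5, 7, 11, 13" removes
# those primes' factors, leaving the product over primes p > 13.
small_primes = [2, 3, 5, 7, 11, 13]
p_coprime = 6 / math.pi ** 2
p_conditional = p_coprime / math.prod(1 - 1 / p**2 for p in small_primes)
print(p_conditional)  # about 0.98, in line with the ~99% figure quoted above
```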
The methodology relies heavily on solving systems of congruences, the Chinese Remainder Theorem, and the modular multiplicative inverse of some carefully chosen integers. We also discuss computational complexity issues. Finally, the off-the-beaten-path material presented here leads to many original exercises or exam questions for students learning probability, computer science, or number theory: proving the various simple statements made in my article. 
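For readers unfamiliar with these tools, here is a small illustrative sketch (not the article's algorithm) that combines two congruences via the Chinese Remainder Theorem, using Python's built-in modular inverse:

```python
# Sketch: solve x = a1 (mod m1) and x = a2 (mod m2) for coprime m1, m2.
# pow(base, -1, mod) returns the modular multiplicative inverse (Python 3.8+).
def crt(a1, m1, a2, m2):
    inv = pow(m1, -1, m2)                   # inverse of m1 modulo m2
    return (a1 + m1 * ((a2 - a1) * inv % m2)) % (m1 * m2)

# Example: x = 2 (mod 3) and x = 3 (mod 5)  ->  x = 8 (mod 15)
print(crt(2, 3, 3, 5))  # 8
```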
Some Number Theory Explained in Simple English
  • Co-primes and pairwise co-primes
  • Probability of being co-prime
  • Modular multiplicative inverse
  • Chinese remainder theorem, version A
  • Chinese remainder theorem, version B
The New Factoring Algorithm
  • Improving computational complexity
  • Five-step algorithm
  • Probabilistic optimization
  • Compact formulation of the problem
Read the full article here

Wednesday, May 6, 2020

Simple Trick to Dramatically Improve Speed of Convergence

We discuss a simple trick to significantly accelerate the convergence of an algorithm when the error term decreases in absolute value over successive iterations, with the error term oscillating (not necessarily periodically) between positive and negative values. 
We first illustrate the technique on a simple, well-known case: the computation of log 2 using its classic, slow-converging series. We then discuss a very interesting and more complex case, before finally focusing on a more challenging example in the context of probabilistic number theory and experimental math.
The technique must be tested for each specific case to assess the improvement in convergence speed. There is no general, theoretical rule to measure the gain, and if the error term does not oscillate in a balanced way between positive and negative values, this technique does not produce any gain. However, in the examples below, the gain was dramatic. 
Let's say you run an algorithm, for instance gradient descent. The input (model parameters) is x, and the output is f(x), for instance a local optimum. We consider f(x) to be univariate, but the technique easily generalizes to the multivariate case, by applying it separately to each component. At iteration k, you obtain an approximation f(k, x) of f(x), and the error is E(k, x) = f(x) - f(k, x). The total number of iterations is N, starting with the first iteration k = 1.
The idea consists in first running the algorithm as is, and then computing the "smoothed" approximations, using the following m steps.
  • General framework and simple illustration
  • A strange function
  • Even stranger functions
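To give a feel for the trick on the log 2 series mentioned above, here is a minimal sketch (my own illustration, not the article's exact m-step scheme), showing a single pass of averaging consecutive partial sums of the alternating series:

```python
import math

# Sketch: for log 2 = 1 - 1/2 + 1/3 - ..., the error of the partial
# sums oscillates in sign, so averaging consecutive partial sums
# cancels most of it.
N = 20
partial, s = [], 0.0
for k in range(1, N + 1):
    s += (-1) ** (k + 1) / k
    partial.append(s)

smoothed = [(a + b) / 2 for a, b in zip(partial, partial[1:])]

print(abs(partial[-1] - math.log(2)))   # ~ 1/(2N): slow raw convergence
print(abs(smoothed[-1] - math.log(2)))  # much smaller after one averaging pass
```

Repeating the averaging pass several times improves the accuracy further, which is the spirit of the m-step smoothing described in the article.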

Sunday, March 1, 2020

State-of-the-Art Statistical Science to Tackle Famous Number Theory Conjectures

The methodology described here has broad applications, leading to new statistical tests, new types of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers leading to cryptography, gaming, and Fintech applications, and high-quality random number generators (and when you really need them). It also features exact arithmetic / high performance computing and distributed algorithms to compute millions of binary digits for an infinite family of real numbers, including detection of auto- and cross-correlations (or lack thereof) in the digit distributions.
The data processed in my experiment, consisting of raw irrational numbers (described by a new class of elementary recurrences), led to the discovery of unexpected apparent patterns in their digit distribution: in particular, the fact that a few of these numbers, contrary to popular belief, do not have 50% of their binary digits equal to 1. It turned out that perfectly random digits, simulated in large numbers with a good enough pseudo-random generator, exhibit the same strange behavior, pointing to the fact that pure randomness may not be as random as we imagine. Ironically, failure to exhibit these patterns would indicate a real departure from pure randomness in the digits in question.
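A minimal simulation (my own, to illustrate the point, not the article's experiment): among many sequences of perfectly random bits, the most extreme observed proportions of 1s naturally drift away from 50%, so such "outliers" are expected even under pure randomness:

```python
import numpy as np

# Sketch: 1,000 sequences of 100,000 fair random bits each. Even under
# pure randomness, the extreme sequences show proportions of 1s visibly
# away from 0.5 -- an expected effect of taking extremes over many tests.
rng = np.random.default_rng(1)
bits = rng.random((1000, 100_000)) < 0.5
ones = bits.mean(axis=1)
print(ones.min(), ones.max())  # e.g. roughly 0.495 and 0.505, not exactly 0.5
```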
In addition to new statistical / mathematical methods, discoveries, and interesting applications, you will learn in my article how to avoid the type of statistical traps that lead to erroneous conclusions when performing a large number of statistical tests, and how not to be misled by false appearances. I call them statistical hallucinations and false outliers.
This article has two main sections: section 1, with deep research in number theory, and section 2, with deep research in statistics and its applications. You may skip one of the two sections depending on your interests and how much time you have. Both sections, despite being state-of-the-art in their respective fields, are written in simple English. It is my wish that with this article, I can get data scientists interested in math, and the other way around: the topics in both cases have been chosen to be exciting and modern. I also hope that this article will give you new, powerful tools to add to your arsenal of tricks and techniques. Both topics are related, the statistical analysis being based on the numbers discussed in the math section.
One of the interesting new topics discussed here for the first time is the cross-correlation between the digits of two irrational numbers. These digit sequences are treated as multivariate time series. I believe this is the first time ever that this subject is not only investigated in detail, but in addition comes with a deep, spectacular probabilistic number theory result about the distributions in question, with important implications for security and cryptography systems. Another related topic discussed here is a generalized version of the Collatz conjecture, with some insights on how to potentially solve it.
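To give a feel for the kind of computation involved, here is a minimal exact-arithmetic sketch (mine, not the article's distributed algorithm) that extracts binary digits of a quadratic irrational such as sqrt(2) and tallies the proportion of 1s:

```python
from math import isqrt

# Sketch: the first n binary digits of sqrt(2) after the binary point,
# via exact integer arithmetic: floor(sqrt(2) * 2^n) = isqrt(2 * 4^n).
n = 10_000
bits = bin(isqrt(2 * 4**n))[2:]   # '1' (integer part) + n fractional bits
frac = bits[1:]
print(frac.count("1") / n)        # proportion of 1s, close to 0.5
```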
1. On the Digit Distribution of Quadratic Irrational Numbers
  • Properties of the recursion
  • Reverse recursion
  • Properties of the reverse recursion
  • Connection to Collatz conjecture
  • Source code
  • New deep probabilistic number theory results
  • Spectacular new result about cross-correlations
  • Applications
2. New Statistical Techniques Used in Our Analysis
  • Data, features, and preliminary analysis
  • Doing it the right way
  • Are the patterns found a statistical illusion, or caused by errors, or real?
  • Pattern #1: Non-Gaussian behavior
  • Pattern #2: Illusionary outliers
  • Pattern #3: Weird distribution for block counts
  • Related articles and books
