Thursday, December 9, 2021

Poisson-binomial Stochastic Processes: Introduction, Simulations, Inference

In simple English, we introduce a special, off-the-beaten-path type of point process called Poisson-binomial. We analyze its properties and perform simulations to see the distribution of points that it generates, in one and two dimensions, as well as to make inference about its parameters. Statistics of interest include distance to nearest neighbors or interarrival times  (some times called increments).  Combined with radial processes, it can be used to model complex systems involving clustering, for instance the distribution of matter in the universe. Limiting cases are Poisson processes, and this may lead to what is probably the simplest definition of of stochastic, stationary Poisson point process. Our approach is not based on traditional statistical science, but instead on modern machine learning methodology. Some of the tests performed reveal strong patterns invisible to the naked eye.

Probably the biggest takeaway from this article is its tutorial value. It shows how to handle a machine learning problem involving the construction of a new model, from start to finish, with all the steps described at a high level, accessible to professionals with one year worth of academic training in quantitative fields. Numerous references are provided to help the interested reader dive into the technicalities. This article covers a large amount of concepts in a compact style: for instance, stationarity, isotropy, intensity function, paired and unpaired data, order statistics, hidden processes, interarrival times, point count distribution, model fitting, model-free confidence intervals, simulations, radial processes, renewal process, nearest neighbors, model identifiability, compound or hierarchical processes, Mahalanobis distance, rescaling, and more. It can be used as a general introduction to point processes. 

Related probability distributions include logistic, Laplace, uniform, Cauchy, Poisson, Erlang, Poisson-binomial (the distribution these processes are named after), and many that don't have a name. These distributions are used in this article. Part 1 of this article can be found here. The link below points to part 2.

The accompanying spreadsheet has its own tutorial value, as it uses powerful Excel functions that are overlooked by many. Source code is also provided and included in the spreadsheet.

Read the full article here.


1. Unpaired two-dimensional processes

2. Cluster and hierarchical point processes

  • Radial process
  • Basic cluster process
  • The spreadsheet
3. Machine learning inference
  • Interarrival time, and estimation of the intensity
  • Estimation of the scaling factor
  • Confidence intervals, model fitting, and identifiability issues
  • Estimation in higher dimensions
  • Results from some testing, including about radiality
  • Distributions related to the inverse or hidden process
  • Convergence to the Poisson process

Wednesday, December 1, 2021

A Gentle, Original Approach to Stochastic Point Processes

 In this article, original stochastic processes are introduced. They may constitute one of the simplest examples and definitions of point processes. A limiting case is the standard Poisson process - the mother of all point processes - used in so many spatial statistics applications. We first start with the one-dimensional case, before moving to 2-D. Little probability theory  knowledge is required to understand the content. In particular, we avoid discussing measure theory, Palm distributions, and other hard-to-understand concepts that would deter many users. Nevertheless, we dive pretty deep into the details, using simple English rather than arcane abstractions, to show the potential. A spreadsheet with simulations is provided, and model-free statistical inference techniques are discussed, including model-fitting and test for radial distribution. It is shown that two realizations of very different processes can look almost identical to the naked eye, while they actually have fundamental differences that can only be detected with machine learning techniques. Cluster processes are also presented in a very intuitive way.  Various probability distributions, including logistic, Poisson-binomial and Erlang, are attached to these processes, for instance to model the distribution and/or number of points in some area, or distances to nearest neighbors. 

Read full article here

Neat R Plots with the Cairo Graphics Library

As I spent much of my career in business intelligence and marketing, programming in R was never one of my top skills. While I started progra...