Monday, December 31, 2018

Announcement: Winner of the Data Science Central Competition

Back in 2017, we posted a problem related to stochastic processes and controlled random walks, offering a $2,000 award for a sound solution (see here for full details). The problem, which had a FinTech flavor, was only solved recently (December 2018) by Victor Zurkowski.
About the problem:
Let's start with X(1) = 0, and define X(k) recursively as follows, for k > 1:
and let's define U(k), Z(k), and Z as follows:
where the V(k)'s are deviates from independent uniform variables on [0, 1].
So there are two positive parameters in this problem, a and b, and U(k) is always between 0 and 1. When b = 1, the U(k)'s are just standard uniform deviates, and if b = 0, then U(k) = 1. The case a = b = 0 is degenerate and should be ignored. The case a > 0 and b = 0 is of special interest, and it is a number theory problem in itself, related to this problem when a = 1. Also, just like in random walks or Markov chains, the X(k)'s are not independent; they are indeed highly auto-correlated.
Prove that if a < 1, then X(k) converges to 0 as k increases. Under the same condition, prove that the limiting distribution Z
  • always exists, (Note: if a > 1, X(k) may not converge to zero, causing a drift and asymmetry)
  • always takes values between -1 and +1, with min(Z) = -1 and max(Z) = +1,
  • is symmetric, with mean and median equal to 0,
  • and does not depend on a, but only on b.
For instance, for b = 1, even a = 0 yields the same triangular distribution for Z as any a > 0.
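As a rough illustration of how a simulated sample of Z could be checked against the triangular distribution in the case b = 1, here is a minimal Python sketch of a one-sample Kolmogorov-Smirnov test. It is not the code used for the competition. The sample below is not produced by the recursion above; it is a stand-in (an assumption) built as the difference of two independent uniform deviates on [0, 1], a standard way to obtain a symmetric triangular distribution on [-1, 1]. The helper names triangular_cdf, ks_statistic and ks_critical_value are hypothetical and introduced only for this example.

```python
import math
import random


def triangular_cdf(z):
    """CDF of the symmetric triangular distribution on [-1, 1] (density 1 - |z|)."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    if z < 0.0:
        return 0.5 * (1.0 + z) ** 2
    return 1.0 - 0.5 * (1.0 - z) ** 2


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: sup |empirical CDF - cdf|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, alpha=0.05):
    """Asymptotic critical value of the KS statistic at significance level alpha."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


rng = random.Random(2018)
n = 10_000

# ASSUMPTION: stand-in sample, not generated by the recursion in the article.
sample = [rng.random() - rng.random() for _ in range(n)]

d = ks_statistic(sample, triangular_cdf)
threshold = ks_critical_value(n)
print(f"KS statistic = {d:.4f}, 5% critical value = {threshold:.4f}")
print("fit rejected" if d > threshold else "fit not rejected")
```

To test the actual process, one would replace the stand-in sample with simulated values of Z and keep the rest unchanged. A small KS statistic only says the data are compatible with the triangular distribution; it is not a proof.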
Main question: In general, what is the limiting distribution of Z? I guessed, using empirical data science techniques such as model fitting, simulations, and goodness-of-fit tests, that the solution (which implied solving a stochastic integral equation) was, with z in [-1, 1]:
About the author and the solution:
Victor not only confirmed that the above density function is a solution to this problem, but also proved that the solution is unique, focusing on convergence issues, in a 27-page paper. One detail still needs to be worked out: whether or not scaled Z visits the neighborhood of every point in [-1, 1] infinitely often. Victor believes that the answer is positive. You can read his solution here, and we hope it will result in a publication in a scientific journal.
Victor Zurkowski, PhD, is a predictive modeling, machine learning, and optimization expert with 20+ years of experience and deep expertise in developing pricing models and optimization engines across industries, including retail and financial services. He has published academic papers in mathematics and statistics on numerous topics, and is currently VP of Data Science at Polymatiks. Victor holds a Ph.D. in Mathematics from the University of Minnesota and an M.Sc. in Statistics from the University of Toronto.

Thursday, December 27, 2018

Why You Should be a Data Science Generalist - and How to Become One

The new advice today for data scientists is not to become a generalist. You can read recent articles on this topic, for instance here. In this blog, I explain why I believe it should be the opposite. I wrote about this here not long ago, and in this article I provide additional arguments as to why it helps to be a generalist.
Of course, it is difficult, and probably impossible, to become a data science generalist right after graduating. It takes years to acquire all the skills, yet you don't need to master all of them. It might be easier for a physicist, engineer, or biostatistician who is learning data science after years of corporate experience than it is for a data scientist with no business experience. Possibly the easiest way to become one is to work for start-ups or small companies, wearing many hats, as you will probably be the only data scientist in your company and will have to change jobs more frequently than if you worked for a big company. By contrast, in a big company you are expected to work in a very specialized area, though it does not hurt to be a generalist, as I will illustrate shortly. Being a specialized data scientist could put you on a very predictable path that limits your career growth and flexibility, especially if you want to create your own company down the line. Let's start by explaining what a data science generalist is.
The data science generalist
The generalist has experience working in different roles and different environments, for instance, over a period of 15 years, having worked as a
  • Business analyst or BI professional, communicating insights to decision makers, mastering tools such as Tableau, SQL and Excel; or maybe being the decision maker herself
  • Statistician / data analyst with expertise in predictive modeling
  • Expert in algorithm design and optimization
  • Researcher in an academic-like setting, or experience in testing / prototyping new data science systems and proofs of concept (POC)
  • Builder / architect: designing APIs, dashboards, and databases, and deploying and maintaining some modest production systems yourself
  • Programmer (statistical or scientific programmer with exposure to high performance computing and parallel architectures - you might even have designed your own software)
  • Consultant, directly working with clients, or adviser
  • Manager or director role rather than individual contributor
  • Professional with roles in various industries (IT, media, Internet, finance, health care, smart cities) in both big and small companies, in various domains ranging from fraud detection, to optimizing sales or marketing, with proven, measurable accomplishments
In short, the generalist has been involved, at one time or another, in all phases of the data science project lifecycle.
The generalist might not command a higher salary, but has more flexibility career-wise. Even in a big company, when downsizing occurs, it is easier for the generalist to make a lateral move (get transferred to a different department) than it is for the "one-trick pony".
Timing is important too. If you become a generalist at age 50 (as opposed to age 45), it might not help, as getting hired becomes more difficult once you are past 45. Still, even at 50 or older, it opens up some possibilities, for instance starting your own business. And if you can prove that you have been consistently broadening your skills throughout your career, as generalists do by definition, it will be easier to land a job, especially if your salary expectations are reasonable and your health is not an issue for your future employer.
To read the full article, click here.

Fuzzy Regression: A Generic, Model-free, Math-free Machine Learning Technique

  A different way to do regression with prediction intervals. In Python and without math. No calculus, no matrix algebra, no statistical eng...