The problem of missing mass goes back to the cryptographic work of Good and Turing during WWII, but has been also studied in the context of linguistic, ecology, and by probability theoreticians. The missing mass is defined as the *weight of elements not observed in a sample*, due to pure chance:

Such an event and the total mass of all unseen elements are exponentially small in the sample size.

Quite curiously, proving it rigorously has been quite hard. The first proof appeared at COLT’00, but has since then been reworked many times, in attempts to simplify and numerically improve. The arguments were based on a thermodynamic framework, logarithmic Sobolev inequalities and information theory. Formally, for some constant \(K>0\) we want

$$\Pr[\pm(M-\boldsymbol{E}M)>\epsilon]\leq \mathrm{e}^{-K\cdot n \epsilon^2}$$

where \(n\) is the sample size and \(M\) is the missing mass for the i.i.d. sample of size \(n\).

The problem of proving it “standard” concentration inequalities has been open so far. In reponse to this challenge, in my recent note I have **proved it with Bernstein’s inequality**, century old. So, no complicated approaches and refinements were necessary!