On Subgaussian Concentration of Missing Mass

The problem of missing mass goes back to the cryptographic work of Good and Turing during WWII, but has been also studied in the context of linguistic, ecology, and by probability theoreticians. The missing mass is defined as the weight of elements not observed in a sample, due to pure chance: Such an event and […]

Approximating Tails of Beta Distribution

Beta distribution is ubiquitous in statistics, but particularly popular in real-world modeling. The beta-binomial model is perhaps the most known example, given the recent interest in Bayesian inference. But it was in use nearly 50 years ago, for example in toxicology. Unfortunately, computing probabilities from the density depends on intractable incomplete beta integrals. This creates […]

Improving State-of-Art on Sparse Random Projections

Random projections are widely used to reduce data dimension in various analyses. Provable guarantees were developed first in the important result of Johnson and Lindenstrauss on Lipschitz maps, but more recently there has been a lot of follow-up work in the context of machine-learning. Particularly attractive are sparse random projections, which share similar guarantees as […]

Performance drawbacks of Tensorflow Datasets

Tensorflow, the popular framework for machine-learning, recommends its new dataset API for preprocessing and serving data. It supports useful tricks, such as caching data in memory, prefetching in parallel threads and others described in tutorials. Still, Tensorflow has issues with slow data slicing, so the dataset API may actually do harm in setups where computations […]