Demystifying TestDom

Recruiters nowadays use online timed tests when screening developers. I recently looked at Python & Algorithms Hard questions at TestDome. While the timing and hints seem to push towards implementing tricks from scratch, for the quality in long term it is better to structure the problem and use established solutions (divide & conquer). The battery …

On Subgaussian Concentration of Missing Mass

The problem of missing mass goes back to the cryptographic work of Good and Turing during WWII, but has been also studied in the context of linguistic, ecology, and by probability theoreticians. The missing mass is defined as the weight of elements not observed in a sample, due to pure chance: Such an event and …

Approximating Tails of Beta Distribution

Beta distribution is ubiquitous in statistics, but particularly popular in real-world modeling. The beta-binomial model is perhaps the most known example, given the recent interest in Bayesian inference. But it was in use nearly 50 years ago, for example in toxicology. Unfortunately, computing probabilities from the density depends on intractable incomplete beta integrals. This creates …

Improving State-of-Art on Sparse Random Projections

Random projections are widely used to reduce data dimension in various analyses. Provable guarantees were developed first in the important result of Johnson and Lindenstrauss on Lipschitz maps, but more recently there has been a lot of follow-up work in the context of machine-learning. Particularly attractive are sparse random projections, which share similar guarantees as …

Performance drawbacks of Tensorflow Datasets

Tensorflow, the popular framework for machine-learning, recommends its new dataset API for preprocessing and serving data. It supports useful tricks, such as caching data in memory, prefetching in parallel threads and others described in tutorials. Still, Tensorflow has issues with slow data slicing, so the dataset API may actually do harm in setups where computations …