Teams that collaborate on data-science tasks on cloud platforms often share a preconfigured ML environment, such as the Kaggle Docker Python image. This addresses reproducibility and dependency issues, while individual team members can still add custom packages on top in local virtual environments, for example less common computer-vision packages.
This setup requires creating the local virtual environment with the --system-site-packages flag, which makes the packages of the base environment visible inside it. Below is an example of a local environment that adds the package DeepForest (not present in the Kaggle image).
root@cf1b6f63d729:/home/jupyter/src/tree_counting# python -m venv .deepforest --system-site-packages
root@cf1b6f63d729:/home/jupyter/src/tree_counting# source .deepforest/bin/activate
(.deepforest) root@cf1b6f63d729:/home/jupyter/src/tree_counting# pip install --upgrade pip --quiet
(.deepforest) root@cf1b6f63d729:/home/jupyter/src/tree_counting# pip install deepforest --quiet
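The same kind of environment can also be created from Python with the standard-library venv module. The sketch below is only an illustration: the helper names (create_overlay_env, read_cfg) and the directory name .demo_env are made up for this example, and pip bootstrapping is skipped to keep it quick. It creates an environment in a temporary directory and reads back pyvenv.cfg, the file in which venv records whether system site packages are visible.

```python
import tempfile
import venv
from pathlib import Path


def create_overlay_env(env_dir: Path) -> Path:
    """Create a venv that can also import packages from the base interpreter."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # same effect as --system-site-packages
        with_pip=False,             # assumption: skip pip to keep the sketch fast
    )
    builder.create(env_dir)
    return env_dir / "pyvenv.cfg"


def read_cfg(cfg_path: Path) -> dict:
    """Parse the simple 'key = value' lines of pyvenv.cfg."""
    settings = {}
    for line in cfg_path.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical throwaway location; the article uses .deepforest instead.
with tempfile.TemporaryDirectory() as tmp:
    cfg = read_cfg(create_overlay_env(Path(tmp) / ".demo_env"))

print(cfg["include-system-site-packages"])
```

Checking pyvenv.cfg this way is a quick test of whether an existing environment was created with the flag, since the setting cannot be seen from the directory layout alone.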
The local environment can then be exposed to Jupyter as a custom kernel.
root@cf1b6f63d729:/home/jupyter/src/tree_counting# source .deepforest/bin/activate
(.deepforest) root@cf1b6f63d729:/home/jupyter/src/tree_counting# python -m ipykernel install --user --name .deepforest --display-name "Kaggle+DeepForest"
Installed kernelspec .deepforest in /root/.local/share/jupyter/kernels/.deepforest
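A kernelspec is essentially a directory containing a kernel.json file that tells Jupyter which interpreter to launch. The sketch below builds a roughly equivalent spec by hand to show why the kernel ends up using the local environment. It is an illustration, not a copy of the file ipykernel writes: the real file may contain additional fields, and build_kernel_spec is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path, PurePosixPath


def build_kernel_spec(env_dir: str, display_name: str) -> dict:
    """Return a minimal kernel.json payload pointing at the venv's interpreter."""
    python_bin = str(PurePosixPath(env_dir) / "bin" / "python")
    return {
        # Jupyter substitutes {connection_file} when it starts the kernel.
        "argv": [python_bin, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


spec = build_kernel_spec(
    "/home/jupyter/src/tree_counting/.deepforest",  # path taken from the example above
    "Kaggle+DeepForest",
)

# Write to a temporary directory instead of the real kernels folder.
with tempfile.TemporaryDirectory() as tmp:
    kernel_json = Path(tmp) / "kernel.json"
    kernel_json.write_text(json.dumps(spec, indent=1))
    loaded = json.loads(kernel_json.read_text())

print(loaded["argv"][0])
```

Because argv starts with the interpreter inside .deepforest, notebooks that select this kernel import packages with the same lookup rules as the activated environment.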
The architecture is shown below.
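The following interactive session shows the difference between system-level and local packages: deepforest is loaded from the local environment, while tensorflow comes from the base image.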
(.deepforest) root@cf1b6f63d729:/home/jupyter/src/tree_counting# python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import tensorflow
>>> import deepforest
>>> deepforest.__file__
'/home/jupyter/src/tree_counting/.deepforest/lib/python3.7/site-packages/deepforest/__init__.py'
>>> tensorflow.__file__
'/opt/conda/lib/python3.7/site-packages/tensorflow/__init__.py'
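The same check can be automated by comparing a module's file path with the environment prefixes. The sketch below uses a hypothetical helper, classify_origin, and assumes POSIX-style paths as in the session above. Inside a live environment the two prefixes would come from sys.prefix (the virtual environment) and sys.base_prefix (the base installation); here they are hard-coded to the values from the example.

```python
from pathlib import PurePosixPath


def is_inside(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path lies under root (works on Python versions before 3.9 too)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def classify_origin(module_file: str, venv_prefix: str, base_prefix: str) -> str:
    """Label a module file as 'local', 'system', or 'other'."""
    path = PurePosixPath(module_file)
    # Check the venv first, in case it happens to live under the base prefix.
    if is_inside(path, PurePosixPath(venv_prefix)):
        return "local"
    if is_inside(path, PurePosixPath(base_prefix)):
        return "system"
    return "other"


VENV = "/home/jupyter/src/tree_counting/.deepforest"  # sys.prefix in the session
BASE = "/opt/conda"                                   # sys.base_prefix in the session

deepforest_origin = classify_origin(
    VENV + "/lib/python3.7/site-packages/deepforest/__init__.py", VENV, BASE
)
tensorflow_origin = classify_origin(
    BASE + "/lib/python3.7/site-packages/tensorflow/__init__.py", VENV, BASE
)
print(deepforest_origin, tensorflow_origin)
```

In a notebook running on the custom kernel, calling classify_origin(deepforest.__file__, sys.prefix, sys.base_prefix) would give the same kind of answer for any imported package.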
Finally, it is worth mentioning the Dev Containers extension, which connects the IDE to a running container. With it, we can enjoy all the VS Code features inside the container 🙂