Prerequisites

On Ubuntu, python3.8-venv must be installed before Poetry:

sudo apt-get install python3.8-venv

Poetry

curl -sSL https://install.python-poetry.org | python3 -

Add the following line to your .bashrc (is this still necessary?):

export PATH="$HOME/.local/bin:$PATH"
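
If you are unsure whether the line is already in your .bashrc, a small helper can append it only once. This is a minimal sketch; the function name add_line_once is hypothetical and is not part of Poetry or of this project.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"    # create the file if it does not exist yet
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

It could then be called as add_line_once 'export PATH="$HOME/.local/bin:$PATH"' "$HOME/.bashrc", and running it a second time leaves the file unchanged.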

Activate Python 3.8

poetry env use 3.8

Installing the dependencies

poetry config virtualenvs.in-project true
poetry install

poetry config virtualenvs.in-project true installs the environment as a subfolder of the project rather than in the home directory. This is recommended so that VSCode can find the environment.

To develop the pipeline, additional packages are required:
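
With this setting, Poetry creates the environment in a .venv folder at the project root. A quick way to check that it is there, as a sketch for Linux layouts (the function name has_project_venv is hypothetical):

```shell
# Hypothetical helper: succeed if the given project directory (default: the
# current one) contains a .venv folder with an executable Python interpreter.
has_project_venv() {
    project_dir="${1:-.}"
    [ -x "$project_dir/.venv/bin/python" ]
}
```

For example, has_project_venv . && echo "in-project environment found" run from the project root after poetry install.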

poetry install --extras "pipeline"

Debugging Poetry

To remove an environment, see https://python-poetry.org/docs/managing-environments/

poetry env list
poetry env remove 3.7

To clean everything:

rm poetry.lock
poetry env list
poetry env remove leximpact-prepare-data-0Rkp9wuO-py3.8
poetry cache clear --all pypi
poetry env use -vvv 3.8
poetry install
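
The environment name used above is specific to one machine. As a sketch, the names can be extracted from the output of poetry env list, assuming each line starts with the environment name, optionally followed by (Activated). The helper name env_names is hypothetical.

```shell
# Hypothetical helper: read `poetry env list` output on stdin and print the
# first whitespace-separated field of each non-empty line, i.e. the
# environment name without the "(Activated)" marker.
env_names() {
    awk 'NF { print $1 }'
}
```

It could be combined with the commands above as poetry env list | env_names, and each printed name can then be passed to poetry env remove.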

To display the dependency tree:

poetry show --tree

Telling Poetry which Python version to use

poetry env use /usr/bin/python3.8

How to develop

!ln -s ../leximpact_prepare_data
!cd analyses && ln -s ../../leximpact_prepare_data
!cd extractions_base_des_impots && ln -s ../../leximpact_prepare_data
!cd retraitement_erfs-fpr && ln -s ../../leximpact_prepare_data

Update packages to their latest versions

poetry update

Jupyter

The first time, and after adding a library:

!~/.local/bin/poetry run python -m ipykernel install --name leximpact-prepare-data-kernel --user

Launch Jupyter

poetry run jupyter lab

Check style

make precommit

Update pre-commit

Run this from time to time to stay up to date:

poetry run pre-commit autoupdate

NBDev

# Run pre-commit before converting notebooks
poetry run pre-commit run --all-files
# Build lib from notebook
poetry run nbdev_build_lib
# Build docs from notebook
poetry run nbdev_build_docs
# Re-run pre-commit
poetry run pre-commit run --all-files
!make precommit
#!poetry run nbdev_build_docs
!cd .. && make docs

Secure link to the ERFS-FPR

sudo mkdir -p /mnt/data-in /mnt/data-out
sudo chown $USER:$USER /mnt/data-*
sshfs dc5:/rpool/private-data/input /mnt/data-in
sshfs dc5:/rpool/private-data/output /mnt/data-out

How we build the docs

The documentation is available at https://documentation.leximpact.dev/leximpact_prepare_data/

It is built with NBDev in the GitLab CI.

Due to dependency conflicts, we have to proceed as follows:

  • Use the Poetry env as the default environment.
  • Use a separate venv to remove notebook outputs, because --clear-output does not work with nbconvert < 6, which nbdev requires. We do this to avoid publishing sensitive data. We still have to find a better way to publish outputs without sensitive data.
  • Use Docker for nbdev_build_docs because it does not work in our env, for an unknown reason.

Then we copy the docs via scp to our server and build the final static docs with Jekyll there.

NBDev builds the docs with Jekyll because GitHub supports it for free hosting.

Testing the docs locally

To convert the notebooks to Jekyll:

docker run -v $PWD:/project -w /project -v /media/data-in:/mnt/data-in -v /media/data-out:/mnt/data-out fastai/jekyll sh deploy/build_docs.sh

To render the Jekyll site:

make docs_serve

Then open http://127.0.0.1:4000/leximpact_prepare_data//.

Anaconda on CASD

Building the package

docker run -i -t -v $PWD:/src continuumio/miniconda3 /bin/bash
cd /src
python3 gitlab-ci/src/get_pypi_info.py -p leximpact-prepare-data
conda install -y conda-build anaconda-client
conda config --set anaconda_upload yes
conda build -c conda-forge -c leximpact -c openfisca .conda

To upload:

anaconda login
anaconda upload \
    /opt/conda/conda-bld/noarch/leximpact-prepare-data-0.0.8-py_0.tar.bz2 \
    /opt/conda/conda-bld/noarch/leximpact-prepare-data-casd-0.0.8-py_0.tar.bz2 \
    /opt/conda/conda-bld/noarch/leximpact-prepare-data-dev-0.0.8-py_0.tar.bz2
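
The version number is hard-coded in the paths above. Here is a sketch that builds the three paths from a version argument, assuming the file naming stays the same from one release to the next (the function name conda_package_paths is hypothetical):

```shell
# Hypothetical helper: print the paths of the three built packages
# (base, -casd, -dev) for the version given as first argument.
conda_package_paths() {
    version="$1"
    for suffix in "" "-casd" "-dev"; do
        printf '/opt/conda/conda-bld/noarch/leximpact-prepare-data%s-%s-py_0.tar.bz2\n' \
            "$suffix" "$version"
    done
}
```

The upload step could then be written as anaconda upload $(conda_package_paths 0.0.8), changing only the version for each release.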

Local test

Install the package in a clean environment:

mkdir -p casd-test
cd casd-test
git clone https://git.leximpact.dev/leximpact/leximpact-prepare-data.git
rm -r ./conda-env
conda create --prefix ./conda-env python=3.8
conda activate ./conda-env
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install -c conda-forge -c openfisca -c leximpact leximpact-prepare-data-casd
ipython kernel install --user --name=prepare-data-conda-env

To check that everything worked:

jupyter lab

Then open the file leximpact-prepare-data/notebook/extractions_base_des_impots/test_install.ipynb and run it.

To leave the environment:

conda deactivate