Can I make conda solve this environment faster?
I am using geopandas in a module that is run through GitLab CI... and the environment solving step takes forever. Like, around 30 minutes of solving for 2 minutes of running the job.
For each CI job:
- a container with the ad hoc image is started
- a conda environment is created with the dependencies needed for the package
- the package is installed and a script is run
Of course, I could create a specific image for this job and go through the burden of solving only once but this means dependencies would be frozen... and this is not the expected behavior.
As is recommended in geopandas documentation, I use the conda-forge channel.
Here is the environment file:
name: my_package
channels:
- conda-forge
dependencies:
- conda-forge::python
- conda-forge::numpy
- conda-forge::pandas
- conda-forge::geopandas
- conda-forge::geopy
- conda-forge::pyarrow
- conda-forge::scikit-learn
- conda-forge::matplotlib
- conda-forge::coverage
- conda-forge::shapely
- conda-forge::intake
- conda-forge::pytest
- conda-forge::sphinx
- conda-forge::pysmb
- conda-forge::xlrd
- conda-forge::openpyxl
- conda-forge::sphinx_rtd_theme
Any idea on how to speed up environment solving?
Solution 1:[1]
I agree with @Olsgaard's suggestion, that it's worth considering a redesign of the CI workflow to decouple the image generation from the testing phase. However, that doesn't technically "speed up environment solving" as was queried.
For faster solves:
Use Mamba, as @FlyingTeller mentioned. This provides fast solving by using a compiled SAT solver rather than Python.
At least pin the python version, e.g., python=3.9. Consider also adding minimum versions for DAG "hubs" like numpy, pandas, etc. This would vastly reduce the solution space.
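As a rough sketch of the pinning advice, the environment file from the question could look something like the following. Only python=3.9 comes from the answer; the lower bounds on numpy and pandas are illustrative assumptions and should be replaced with whatever minimum versions the package actually supports.

```yaml
name: my_package
channels:
  - conda-forge
dependencies:
  - conda-forge::python=3.9      # pinned, as suggested in the answer
  - conda-forge::numpy>=1.21     # assumed lower bound, for illustration only
  - conda-forge::pandas>=1.3     # assumed lower bound, for illustration only
  - conda-forge::geopandas
  - conda-forge::geopy
  - conda-forge::pyarrow
  - conda-forge::scikit-learn
  - conda-forge::matplotlib
  - conda-forge::coverage
  - conda-forge::shapely
  - conda-forge::intake
  - conda-forge::pytest
  - conda-forge::sphinx
  - conda-forge::pysmb
  - conda-forge::xlrd
  - conda-forge::openpyxl
  - conda-forge::sphinx_rtd_theme
```

If Mamba is installed in the image, the same file should work unchanged with `mamba env create -f environment.yml` in place of `conda env create -f environment.yml`.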
Solution 2:[2]
There are a few paths to solve this. What you could do is have the CI pipeline run 3 steps
- step a: load custom image and install dependencies
- step b: create a new image with the new dependencies
- step c: run your tests
As long as steps b and c run in parallel, the image creation won't hold up your tests, and since the image already contains a recently solved environment, step a only has to apply updates and will run much faster. You can add logic in step b to make sure it only builds a new image when needed.
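The three steps above might be laid out in a `.gitlab-ci.yml` roughly as follows. This is only a sketch under several assumptions that are not in the original answer: the image lives in the project's GitLab container registry under a hypothetical `env:latest` name, a `Dockerfile` that creates the `my_package` environment sits at the repository root, the runner permits Docker-in-Docker, and the job's script is `pytest`. The `rules: changes` filter is just one possible form of the "only build when needed" logic.

```yaml
# .gitlab-ci.yml (sketch; image name, tag, and test command are hypothetical)
stages:
  - test

# Steps a and c: start from the prebuilt image, update the environment, run the job.
test:
  stage: test
  image: $CI_REGISTRY_IMAGE/env:latest
  script:
    - conda env update -n my_package -f environment.yml
    - conda run -n my_package pip install .
    - conda run -n my_package pytest

# Step b: rebuild the image in the same stage, so it runs in parallel with the tests.
build-image:
  stage: test
  image: docker:24
  services:
    - docker:24-dind
  rules:
    # Assumed trigger: rebuild only when the environment definition changes.
    - changes:
        - environment.yml
        - Dockerfile
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/env:latest" .
    - docker push "$CI_REGISTRY_IMAGE/env:latest"
```

Because both jobs share one stage, GitLab schedules them concurrently, so the test job never waits on the image build; it simply picks up the refreshed image on the next pipeline run.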
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | merv |
| Solution 2 | |
