Activating conda environment by name from within snakemake workflow

Detailed info here, tl;dr can be found towards the end...

I have a bioinformatics workflow that I run with snakemake, driven by a python3 wrapper script that uses the snakemake API to simplify the snakemake command (https://github.com/charlesfoster/covid-illumina-snakemake). Most of the required programs are installed in a 'master' conda environment; programs with incompatible dependencies are installed in dedicated environments using conda directives within snakemake rules.

However, some programs cannot easily be included this way because their installation is more complex. An example is pangolin (https://github.com/cov-lineages/pangolin), which requires cloning the pangolin repo, creating a conda environment, and then running "pip install ." inside it. To run pangolin within the workflow, I have the following rule:

rule pangolin:
    input:
        fasta=os.path.join(RESULT_DIR, "{sample}/variants/{sample}.consensus.fa"),
    output:
        report=os.path.join(RESULT_DIR, "{sample}/pangolin/{sample}.lineage_report.csv"),
    shell:
        """
        set +eu
        eval "$(conda shell.bash hook)" && conda activate pangolin && pangolin --outfile {output.report} {input.fasta} &> /dev/null
        set -eu
        """

I've also tried the named conda environment directive, which is new as of snakemake version ~6.15.5:

rule pangolin:
    input:
        fasta=os.path.join(RESULT_DIR, "{sample}/variants/{sample}.consensus.fa"),
    output:
        report=os.path.join(RESULT_DIR, "{sample}/pangolin/{sample}.lineage_report.csv"),
    conda:
        "pangolin"
    shell:
        """
        pangolin --outfile {output.report} {input.fasta} &> /dev/null
        """

Steps to run the workflow:

  1. conda activate CIS
  2. CIS [options] directory_name/

This works on my main development PC, but when I try to install the pipeline on a new computer, I get the following error:

Could not find conda environment: pangolin
You can list all discoverable environments with `conda info --envs`.

If I run conda info --envs manually within the terminal, I get the following:

$USER/Programs/covid-illumina-snakemake/.snakemake/conda/520fff074cd181af7ee385f2520fdd81
$USER/Programs/covid-illumina-snakemake/.snakemake/conda/cb6755e5de757f643e542e3ec52055b7
base                     $USER/miniconda3
CIS                   *  $USER/miniconda3/envs/CIS
pangolin                 $USER/miniconda3/envs/pangolin

If I run conda info --envs within the snakemake workflow itself, I get the following:

$USER/Programs/covid-illumina-snakemake/.snakemake/conda/520fff074cd181af7ee385f2520fdd81
$USER/Programs/covid-illumina-snakemake/.snakemake/conda/cb6755e5de757f643e542e3ec52055b7
$USER/miniconda3
base                  *  $USER/miniconda3/envs/CIS
$USER/miniconda3/envs/pangolin

(username redacted here in both for brevity)

So, as you can see, the environment names are no longer detected within the snakemake workflow, and the 'CIS' environment is incorrectly reported as 'base'. As a result, the pangolin conda environment cannot be activated by name with eval "$(conda shell.bash hook)" && conda activate pangolin.
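To make the difference between the two listings explicit, they can be parsed and compared programmatically. The following is only a minimal sketch, assuming the text layout shown above (an optional name, an optional `*`, then a path without spaces); `parse_env_listing` and `name_changes` are hypothetical helpers, not part of conda or snakemake.

```python
def parse_env_listing(text):
    """Parse `conda info --envs` text output.

    Returns {path: (name_or_None, is_active)}.
    Assumption: paths contain no whitespace.
    """
    envs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and conda's "# conda environments:" header
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        path = parts[-1]
        is_active = "*" in parts[:-1]
        names = [p for p in parts[:-1] if p != "*"]
        envs[path] = (names[0] if names else None, is_active)
    return envs


def name_changes(outside, inside):
    """Map each shared path to (name_outside, name_inside) where the names differ."""
    a = parse_env_listing(outside)
    b = parse_env_listing(inside)
    return {
        path: (a[path][0], b[path][0])
        for path in a
        if path in b and a[path][0] != b[path][0]
    }
```

Fed the two listings above, `name_changes` would report every path whose name was lost or reassigned, for example the CIS prefix changing from 'CIS' to 'base'.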

tl;dr: conda info --envs has unexpected and different behaviour when invoked from within a snakemake workflow, which is 'driven' by a python script within a 'master' conda env.
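One thing that might help narrow this down is to record which conda executable and which conda-related environment variables are visible in each context (the terminal versus a process started by the workflow). This is a rough diagnostic sketch only; `conda_context` is a hypothetical helper, and the variables listed are the ones conda's activation scripts normally set.

```python
import os
import shutil

# Variables normally exported when a conda environment is activated
CONDA_VARS = ("CONDA_EXE", "CONDA_PREFIX", "CONDA_DEFAULT_ENV", "CONDA_SHLVL")


def conda_context(environ=None):
    """Collect the conda-related state visible to the current process."""
    environ = os.environ if environ is None else environ
    context = {name: environ.get(name) for name in CONDA_VARS}
    # Which `conda` would be found first on PATH (None if there is none)
    context["conda_on_path"] = shutil.which("conda", path=environ.get("PATH"))
    return context
```

Printing `conda_context()` once from the terminal and once from inside the workflow would show whether the two contexts resolve `conda` to the same executable.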

Does anyone know why this might be, and/or how to fix it? Is there a better way to activate a named conda environment within a snakemake workflow?
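One possible workaround to try is activating the environment by its full path instead of its name, since `conda activate` accepts a path as well as a name. The sketch below only builds the shell command string; `activate_by_prefix` and `PANGOLIN_PREFIX` are hypothetical names, the prefix location is an assumption based on the listing above, and I have not verified that this avoids the underlying discrepancy.

```python
import os
import shlex

# Assumption: the pangolin env sits in the default miniconda3 envs directory
PANGOLIN_PREFIX = os.path.expanduser("~/miniconda3/envs/pangolin")


def activate_by_prefix(env_prefix, command):
    """Return a bash snippet that activates a conda env by path, then runs command."""
    quoted = shlex.quote(env_prefix)  # guard against spaces in the path
    return f'eval "$(conda shell.bash hook)" && conda activate {quoted} && {command}'
```

Any snakemake placeholders in `command` (such as `{output.report}`) pass through unchanged, so the returned string keeps the same shape as the shell command in the first rule.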

Thanks!

snakemake version: 6.15.5

conda version: 4.11.0



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
