Link rule_1 output targeting files with rule_2 input targeting the files' folder [snakemake]
I am trying to create a workflow in snakemake with two rules:
- pool_files, which takes a list of genomes saved in different folders and creates a copy of each genome in the same folder
- run_pairwise (named calculate_ANI in the code below), which takes the path of the folder containing the genome copies, runs a function (in my case an ANI calculation, but that is not relevant) and saves all the results in an output folder
My issue is that the input and output of the first rule, pool_files, are single files, while the input and output of the second rule, run_pairwise, are folders. My workaround is to provide both the copied files of pool_files and the output folder of run_pairwise as inputs for rule all. However, in the best case, I get an error like:
ChildIOException: File/directory is a child to another output
The table (object gnm_table in the example below) that I read in and that contains the path of all genomes looks like this:
   dir                 file
0  _input/genomes/ref  aaa_v1.0.fa
1  _input/genomes      bbb.fa
2  _input/genomes      ccc.fa
3  _input/genomes      ddd.fa
The tentative code that I have come up with so far looks like this:
import os

rule all:
    input:
        expand("_results/pool_gnms/{target}", target=gnm_table.file),
        "_plots/ANI"

rule pool_files:
    input:
        i_gnm = lambda wildcards: os.path.join(gnm_table.dir[gnm_table.file == wildcards.target].to_string(), wildcards.target)
    output:
        gnm_link = "_results/pool_gnms/{target}",
    shell:
        'ln -s '
        '{input.i_gnm} '
        '{output.gnm_link}'

rule calculate_ANI:
    input:
        pool_dir = "_results/pool_gnms",
    output:
        ANI_dir = directory("_results/ANI")
    shell:
        'average_nucleotide_identity.py '
        '-o {output.ANI_dir} '
        '-i {input.pool_dir}'
What strategy should I follow to accomplish this task? Maybe I should use a checkpoint? Many thanks for any input!
Solution 1:[1]
There is no need for checkpoints. Checkpoints are needed when you don't know in advance which files a rule will create (e.g. you don't know the number of clusters that an algorithm will find). In your case, gnm_table already contains everything you need. You can define a rule that takes as its input all the files that rule pool_files has to copy. The output of this rule can be a flag file, and this flag can then serve as an input for rule calculate_ANI.
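A minimal sketch of this flag-file idea could look like the Snakefile below. Several names are assumptions and not part of the question: the table location `_input/genomes.tsv` and its tab-separated format, the `SOURCE` dictionary, the intermediate rule name `pool_done`, and the flag path `_results/pool_gnms.done`. The sketch copies the genomes with `cp` rather than `ln -s`, and it passes the pooled folder to the ANI script through `params` so that the folder itself is never declared as an input or output.

```python
import os
import pandas as pd

# Assumption: the genome table is a tab-separated file with "dir" and "file"
# columns; the path below is hypothetical.
gnm_table = pd.read_csv("_input/genomes.tsv", sep="\t")

# Map each genome file name to its full source path,
# e.g. "bbb.fa" -> "_input/genomes/bbb.fa".
SOURCE = {f: os.path.join(d, f) for d, f in zip(gnm_table["dir"], gnm_table["file"])}

rule all:
    input:
        "_results/ANI"

# Copy one genome into the pooled folder.
rule pool_files:
    input:
        lambda wildcards: SOURCE[wildcards.target]
    output:
        "_results/pool_gnms/{target}"
    shell:
        "cp {input} {output}"

# Hypothetical helper rule: it requires every pooled genome and only
# creates an empty flag file once all of them exist.
rule pool_done:
    input:
        expand("_results/pool_gnms/{target}", target=gnm_table["file"])
    output:
        touch("_results/pool_gnms.done")

# The ANI step depends on the flag, not on the folder itself.
rule calculate_ANI:
    input:
        flag = "_results/pool_gnms.done"
    output:
        ANI_dir = directory("_results/ANI")
    params:
        pool_dir = "_results/pool_gnms"
    shell:
        "average_nucleotide_identity.py "
        "-o {output.ANI_dir} "
        "-i {params.pool_dir}"
```

The flag file sits next to the pooled folder rather than inside it, so the ANI script only sees genome files, and no rule's output is nested under another rule's output directory.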
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dmitry Kuzminov |
