'Snakemake: MissingInputException Missing input file

I'm beginner with Python but I have to run a snakemake script with a rules file and a config file. But I'm blocked with this error:

MissingInputException(job.rule, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule all:
Kmer/FS01377_plot.pdf
Kmer/FS070_summary.txt
Kmer/FS01255_summary.txt
Kmer/FS01245_summary.txt [...]

I guess I'm missing something but I can't find it. When I write SAMPLES, = glob_wildcards(join(datadir, "{sample}_fastp.fastq.gz")) without the R1, it work for "rule all" but not for the rest of the script. Here my rule file :

#!/usr/bin/env python
from snakemake.shell import shell
from os.path import join
shell.executable("/bin/bash")
#set the workdir
workdir:config["workdir"]
datadir=config["datadir"]

SAMPLES, = glob_wildcards(join(datadir, "{sample}_R1_fastp.fastq.gz"))

# Patterns for the 1st mate and the 2nd mate using the 'sample' wildcard. Snakemake regular expression matching the forward mate FASTQ files.
PATTERN_R1 = '{sample}_R1_fastp.fastq.gz'
PATTERN_R2 = '{sample}_R2_fastp.fastq.gz'

## SNAKEMAKE PROCESS

#All
rule all:
  input:
    expand("Kmer/{sample}_histo_jellyfish.histo", sample=SAMPLES),
    expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),
    expand("Kmer/{sample}_plot.log.pdf", sample=SAMPLES),
    expand("Kmer/{sample}_model.txt", sample=SAMPLES),
    expand("Kmer/{sample}_progress.txt", sample=SAMPLES),
    expand("Kmer/{sample}_summary.txt", sample=SAMPLES)

## K-mer process
rule merge_fastq:
  input:
    r1 = join(datadir, PATTERN_R1),
    r2 = join(datadir, PATTERN_R2)
  output:
    merge_fastq="Kmer/{sample}_merge_fastp.fastq" 
  message:
    "Merge fastq : {wildcards.sample}"
  run:
            shell(
               "cat {input.fastp_R1} {input.fastp_R2} > Kmer/{wildcards.sample}_merge_fastp.fastq.gz "
               "gzip -d {wildcards.sample}_merge_fastp.fastq.gz "
               )
rule jellyfish_process:
  input:
    merge_fastq="Kmer/{sample}_merge_fastp.fastq"
  output:
    reads_jellyfish="Kmer/{sample}_reads_jellyfish"
  message:
    "Jellyfish processing - Counting k-mer : {wildcards.sample}"
    run:
            shell(
               ". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
               "jellyfish count -C -m 21 -s 1000000 -t 30 {input.merge_fastq} -o {output.reads_jellyfish} ; "
               ". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
               )
rule jellyfish_histogram:
  input:
    reads_jellyfish="Kmer/{sample}_reads_jellyfish"
  output:
    histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
  message:
    "Jellyfish processing - Exporting the histogram : {wildcards.sample}"
  run:
            shell(
               ". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
               "jellyfish histo -t 10 {input.reads_jellyfish} > {output.histo_jellyfish} ; "
               "rm Kmer/{wildcards.sample}_reads_jellyfish "
               "rm Kmer/{wildcards.sample}_merge_fastp.fastq "
               ". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
               )
rule GenomeScope:
  input:
    histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
  output:
    plot_genomescope="GenomeScope/{sample}_plot.pdf",
    model_genomescope="GenomeScope/{sample}_model.txt",
    plotlog_genomescope="GenomeScope/{sample}_plot.log.pdf",
    progress_genomescope="GenomeScope/{sample}_progress.txt",
    summary_genomescope="GenomeScope/{sample}_summary.txt"
  message:
    "GenomeScope processing - Fitting the model : {wildcards.sample}"
  run:
            shell(
               ". /appli/bioinfo/genomescope2/2.0/env.sh ; "
               "mv {input.histo_jellyfish} ~ "
               "module load R "
               "R --vanilla --slave --args "
               "{wildcards.sample}_histo_jellyfish.histo "
               "21 "
               "150 "
               "OutputGenomeScope "
               "10000 "
               "Summary "
               "{wildcards.sample} "
               "< ~/GenomeScope/genomescope_cluster.R "
               "mv OutputGenomeScope/model.txt {output.model_genomescope} && "
               "mv OutputGenomeScope/plot.log.pdf {output.plotlog_genomescope} && "
               "mv OutputGenomeScope/plot.pdf {output.plot_genomescope} && "
               "mv OutputGenomeScope/progress.txt {output.progress_genomescope} && "
               "mv OutputGenomeScope/summary.txt {output.summary_genomescope} && "
               "mv {wildcards.sample}_histo_jellyfish.histo fastp.fastq/{wildcards.species}/GenomeScope"
               ". /appli/bioinfo/genomescope2/2.0/delenv.sh"
               )

Thank you for your help.



Solution 1:[1]

There may be something else going on but I guess that in rule all:

expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),

should be:

expand("GenomeScope/{sample}_plot.pdf", sample=SAMPLES),

in order to be consistent with the output of rule GenomeScope (and the same for other files in rule all).

(This is one of the things of snakemake I really like: you catch these inconsistencies very early and the pipeline doesn't even start before you fix them)


By the way, instead of using the shell() function inside the run directive, I would do:

shell:
    r"""
    . /appli/bioinfo/genomescope2/2.0/env.sh
    mv {input.histo_jellyfish} ~
    ..etc etc
    """

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1