[Fixed] MultiQC snakemake wrapper: ModuleNotFoundError No module named 'imp' – Snakemake

by
Alexei Petrov
python snakemake

The Problem:

The MultiQC Snakemake wrapper fails with "ModuleNotFoundError: No module named 'imp'" when running a FastQC and MultiQC pipeline. Python cannot find the 'imp' module, which is surprising at first because 'imp' was long part of the Python standard library. Both earlier and newer versions of the wrapper yield the same error.

The Solutions:

Solution 1: Use Python version less than 3.12.0

The imp module was deprecated in favor of importlib and removed from the standard library in Python 3.12.0. If your program call or Conda environment uses Python 3.12 or newer, any code that still imports imp raises a ModuleNotFoundError.
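To check whether a given environment is affected, you can inspect the interpreter version and ask Python whether it can locate imp. This is only a minimal sketch; the helper name imp_is_available is made up for illustration.

```python
import importlib.util
import sys


def imp_is_available(version_info=sys.version_info):
    """Return True if the standard-library 'imp' module should still exist.

    'imp' was removed in Python 3.12, so earlier Python 3 releases still ship it.
    """
    return tuple(version_info[:2]) < (3, 12)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("imp expected by version:", imp_is_available())
    # find_spec returns None when the module cannot be located
    print("imp actually found:", importlib.util.find_spec("imp") is not None)
```

Running this with the interpreter of the environment that executes MultiQC shows whether that environment is on Python 3.12 or newer.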

To resolve this issue, create a new Conda environment with a Python version below 3.12 and install MultiQC in it. Here’s an example YAML file (multiqc_env.yaml) that defines such an environment:

name: multiqc_env
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python<3.12
  - multiqc

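Once the environment exists, you can activate it and run MultiQC as usual. If a rule points to the YAML file through its conda directive, Snakemake creates and activates the environment itself when it is run with --use-conda. Here’s an example of a Snakemake rule (run_multiqc) that runs MultiQC this way: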

import os  # used by the params lambdas below

rule run_multiqc:
  input:
    some_input_file
  output:
    some_output_file
  params:
    # -d (--dirs) prepends the directory name to each sample name
    extra="-d",
    # directory and file name of the report, derived from the output path
    output_dir=lambda wildcards, output: os.path.dirname(output[0]),
    output_file_name=lambda wildcards, output: os.path.basename(output[0]),
    # unique directories that contain the input files
    input_directories=lambda wildcards, input: set(os.path.dirname(fp) for fp in input)
  log:
    ...
  conda:
    "multiqc_env.yaml"
  shell:
    "multiqc {params.extra} --force "
    "-o {params.output_dir} -n {params.output_file_name} "
    "{params.input_directories} &> {log}"

This rule assumes that you have an input file (some_input_file) and that you want to generate an output file (some_output_file). You can adjust the input and output files as needed for your specific use case.
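To see what the params in the rule evaluate to, the same os.path calls can be tried in plain Python. The paths below are hypothetical and chosen only for illustration; the rule itself builds a set, which is sorted here just to make the printed order predictable.

```python
import os

# Hypothetical input and output paths, for illustration only
inputs = [
    "results/fastqc/sample_a_fastqc.zip",
    "results/fastqc/sample_b_fastqc.zip",
    "results/trimmed/sample_a_fastqc.zip",
]
output = ["results/multiqc/report.html"]

# Same expressions as in the rule's params section
output_dir = os.path.dirname(output[0])
output_file_name = os.path.basename(output[0])
input_directories = sorted(set(os.path.dirname(fp) for fp in inputs))

print(output_dir)         # results/multiqc
print(output_file_name)   # report.html
print(input_directories)  # ['results/fastqc', 'results/trimmed']
```

With these values, MultiQC would be asked to scan the two input directories and write report.html into results/multiqc.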

By using a Python version below 3.12, you avoid the ModuleNotFoundError and MultiQC runs successfully.

Q&A

What module is used for loading code in packages and modules?

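The imp module gave access to the machinery behind Python's import statement, that is, loading code from packages and modules. It has been replaced by importlib.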
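For code you maintain yourself, loading a module from a file path can be done with importlib instead of imp. The following is a rough sketch, not what the MultiQC wrapper does; the function name load_source and the file demo_module.py are invented for the example.

```python
import importlib.util
import os
import tempfile


def load_source(module_name, path):
    """Load a Python source file as a module using importlib."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Write a tiny throwaway module and load it from its path
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_module.py")
    with open(path, "w") as fh:
        fh.write("ANSWER = 42\n")
    demo = load_source("demo_module", path)

print(demo.ANSWER)  # 42
```

This works on Python versions both before and after 3.12, so it does not depend on pinning the interpreter.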

What is the reason behind ModuleNotFoundError?

The 'imp' module was removed from the standard library in Python 3.12.0.

How to fix the ModuleNotFoundError?

Pin the Python version used to run MultiQC to one below 3.12.0.