Error when running Python FLEX Template: module from subdirectory cannot be found
I am attempting to run a Dataflow job using a Flex Template, but I am stuck on a 'module not found' error and cannot figure out why. Here is the structure of my directory:
|__ modules
|____ edgar_quarterly_form4.py
|____ __init__.py
|__ main.py
|__ setup.py
|__ __init__.py
My main.py has this import in its code:
from modules import edgar_quarterly_form4
And here's my Dockerfile:
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
RUN mkdir -p ${WORKDIR}/modules
WORKDIR ${WORKDIR}
COPY spec/python_command_spec.json ${WORKDIR}/python_command_spec.json
COPY modules ${WORKDIR}/modules
ENV DATAFLOW_PYTHON_COMMAND_SPEC ${WORKDIR}/python_command_spec.json
RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.27.0
COPY __init__.py ${WORKDIR}/__init__.py
COPY setup.py ${WORKDIR}/setup.py
COPY main.py ${WORKDIR}/main.py
# Super important to add these lines.
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
And here's my setup.py file
import setuptools
REQUIRED_PACKAGES = [
'numpy',
'beautifulsoup4',
'pandas',
'sendgrid==6.2.1',
'lxml',
'pandas_datareader',
'apache-beam[gcp]==2.27.0',
]
setuptools.setup(
packages=setuptools.find_packages(),
install_requires=REQUIRED_PACKAGES,
)
However, every time my template runs I get this error:
368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 827, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
ModuleNotFoundError: No module named 'modules'
I cannot figure out why. I added some echo statements to my Dockerfile to check whether all of the files were copied, and they were all copied to the image successfully, so I cannot tell what is going on. Please note that I get exactly the same error even if the edgar_quarterly_form4.py file is in the same directory as main.py.
Kind regards, Marco
Solution 1:[1]
OK, it seems that with Beam 2.27 this solution does not work. Instead, you should follow what is outlined in this thread:
Including another file in Dataflow Python flex template, ImportError
You'll have to add a setup_file parameter to your metadata, and pass a
--parameter setup_file=
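As a rough sketch of what such a metadata entry might look like, the snippet below builds the template metadata as a Python dict and serializes it to JSON. Only the setup_file parameter name comes from the answer above; the template name, description, label and help text are placeholders assumed for illustration.

```python
import json


def build_metadata():
    """Return Flex Template metadata declaring a setup_file parameter.

    Everything except the parameter name "setup_file" is a placeholder
    (an assumption for illustration, not taken from the thread).
    """
    return {
        "name": "example-template",             # placeholder
        "description": "Example Flex Template",  # placeholder
        "parameters": [
            {
                "name": "setup_file",
                "label": "Path to setup.py inside the container image",
                "helpText": "Location of setup.py in the template image.",
                "isOptional": True,
            }
        ],
    }


def metadata_json():
    """Serialize the metadata the way it would be saved to a metadata file."""
    return json.dumps(build_metadata(), indent=2)


if __name__ == "__main__":
    print(metadata_json())
```

With the Dockerfile from the question, setup.py is copied to /dataflow/template/setup.py, so that path would presumably be the value to supply for setup_file when launching the job.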
Solution 2:[2]
The issue is a mismatch between Dataflow's implementation and its documentation.
For some reason FLEX_TEMPLATE_PYTHON_SETUP_FILE is not honored, so you need to pass the setup_file parameter explicitly, like --parameter setup_file=.
Also, the name of the setup file MUST be setup.py.
A working example can be found here: https://github.com/toransahu/apache-beam-eg/tree/main/python/using_flex_template_adv1
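The traceback in the question shows dill failing inside load_session while importing 'modules', which suggests the package is simply not importable where the session is restored. The stand-alone sketch below is not Dataflow code; it only illustrates the general mechanism with a hypothetical package name (modules_demo_pkg): a package that exists on disk still raises ModuleNotFoundError until its parent directory is on sys.path or the package is installed, which is what a setup file is meant to arrange.

```python
import importlib
import os
import sys
import tempfile


def try_import(name):
    """Return True if `name` imports, False on ModuleNotFoundError."""
    importlib.invalidate_caches()
    sys.modules.pop(name, None)  # force a fresh lookup
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


def demo():
    """Create a throwaway package and import it before/after a path fix."""
    pkg = "modules_demo_pkg"  # hypothetical name, stands in for `modules`
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, pkg))
        with open(os.path.join(root, pkg, "__init__.py"), "w") as fh:
            fh.write("VALUE = 1\n")

        # The files exist, but Python does not know where to look yet.
        before = try_import(pkg)

        # Making the parent directory visible is what fixes the import.
        sys.path.insert(0, root)
        try:
            after = try_import(pkg)
        finally:
            sys.path.remove(root)
            sys.modules.pop(pkg, None)
    return before, after


if __name__ == "__main__":
    print(demo())
```

This would also be consistent with the observation in the question that the files being present in the image is not enough on its own.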
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user1068378 |
| Solution 2 | Toran Sahu |
