Running PySpark script in EC2 via docker build

I have some PySpark code in a Bitbucket repository that also contains a Dockerfile. I would like to run it on my EC2 instance via docker build; however, I keep getting errors.

The PySpark (version 3.2) code reads files, applies transformations, and writes them to CSVs.

The Dockerfile:

# For more information, please refer to https://aka.ms/vscode-docker-python
FROM python:3.8-slim

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# Activate virtualenv
ENV VIRTUAL_ENV=/opt/crn-venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install pip requirements
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

WORKDIR /app
COPY . /app

# Creates a non-root user with an explicit UID and adds permission to access the /app folder
# For more info, please refer to https://aka.ms/vscode-docker-python-configure-containers
RUN adduser -u 5678 --disabled-password --gecos "" appuser && chown -R appuser /app
USER appuser

# Install Java and set JAVA_HOME
# ADDED THIS AFTER GETTING A JAVA_HOME NOT FOUND ERROR
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    apt-get install -y ant && \
    apt-get clean && \
    mkdir -p /var/lib/apt/lists/partial && \
    rm -rf /var/lib/apt/lists/ && \
    rm -rf /var/cache/oracle-jdk8-installer;

ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME

# During debugging, this entry point will be overridden. For more information, please refer to https://aka.ms/vscode-docker-python-debug
CMD ["python", "crn_fix.py"]

On the EC2 instance, this is how I'm trying to build the image:

sudo docker build -t myimage https://path/to/myrepo.git#main

The error I receive is:

Step 13/17 : RUN apt-get update &&     apt-get install -y openjdk-8-jdk &&     apt-get install -y ant &&     apt-get clean &&     sudo mkdir -p /var/lib/apt/lists/partial &&     rm -rf /var/lib/apt/lists/ &&     rm -rf /var/cache/oracle-jdk8-installer;
 ---> Running in e8d942dac9c3

Reading package lists...
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
The command '/bin/sh -c apt-get update &&     apt-get install -y openjdk-8-jdk &&     apt-get install -y ant &&     apt-get clean &&     sudo mkdir -p /var/lib/apt/lists/partial &&     rm -rf /var/lib/apt/lists/ &&     rm -rf /var/cache/oracle-jdk8-installer;' returned a non-zero code: 100

I've tried adding mkdir -p /var/lib/apt/lists/partial, but I still get the same error. I also tried it with sudo, with the same result.
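A likely cause, judging from the Dockerfile above: the apt-get step comes after USER appuser, and every RUN instruction after a USER instruction executes as that user. apt-get update needs root to write under /var/lib/apt/lists, which matches the "Permission denied" in the log. Because apt-get update is the first command in the && chain and it already fails, the mkdir later in the chain is never reached, with or without sudo. Below is a minimal sketch of a reordered Dockerfile that installs Java while the build is still root and only then switches to appuser. Several details are assumptions rather than anything from the original setup: the pinned python:3.8-slim-bullseye tag, the switch to openjdk-11-jre-headless (openjdk-8-jdk may not be packaged for the Debian release behind the slim image, and Spark 3.2 also runs on Java 11), the amd64 JAVA_HOME path, and dropping ant, which PySpark does not appear to need here.

```dockerfile
# Assumption: pin the Debian release so the Java package name below is predictable
FROM python:3.8-slim-bullseye

# Keeps Python from generating .pyc files and turns off output buffering
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# Install Java while the build still runs as root, i.e. before any USER instruction.
# Assumption: Java 11 instead of Java 8; the man1 directory is created only as a
# precaution, since slim images have been known to lack it during JRE installs.
RUN mkdir -p /usr/share/man/man1 && \
    apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jre-headless && \
    rm -rf /var/lib/apt/lists/*

# Assumption: amd64 instance; on an arm64 instance the directory suffix differs
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

# Activate virtualenv
ENV VIRTUAL_ENV=/opt/crn-venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install pip requirements
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

WORKDIR /app
COPY . /app

# Create the non-root user last, so that everything above ran as root
RUN adduser -u 5678 --disabled-password --gecos "" appuser && chown -R appuser /app
USER appuser

CMD ["python", "crn_fix.py"]
```

The ENV line on its own makes JAVA_HOME available to later build steps and to the running container, so the separate RUN export JAVA_HOME step is not needed; an export inside a RUN only lasts for that one shell.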



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
