Unable to add/import additional Python library datacompy in AWS Glue

I am trying to import an additional Python library, datacompy, into a Glue job that uses Glue version 2.0, with the steps below:

  1. Opened the AWS Glue console.

  2. Under Job parameters, added the following:

  3. For Key, added --additional-python-modules. For Value, added datacompy==0.7.3, s3://python-modules/datacompy-0.7.3.whl.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

import datacompy

from py4j.java_gateway import java_import
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

## @params: [JOB_NAME, URL, ACCOUNT, WAREHOUSE, DB, SCHEMA, USERNAME, PASSWORD]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'URL', 'ACCOUNT', 'WAREHOUSE', 'DB', 'SCHEMA','additional-python-modules'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)


But the job returns this error:

ModuleNotFoundError: No module named 'datacompy'

How can I resolve this issue?



Solution 1:[1]

With Spark 2.4, Python 3 (Glue Version 2.0)

I set the following job parameter:

[Screenshot: the job parameter setting]

Then I can import it in my job like so:
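
The parameter value itself is only available as an image in the original answer, so what follows is a minimal sketch of what such a setting could look like, not the answerer's exact configuration. The helper name `build_additional_modules_arg` is hypothetical. It joins module specs into a comma-separated string with no surrounding whitespace, which matches the form used in AWS's documented examples; stripping whitespace is a precaution here, not a confirmed fix for the error in the question.

```python
def build_additional_modules_arg(modules):
    """Build a Glue job-argument dict for --additional-python-modules.

    `modules` is a list of pip-style specs or S3 wheel paths.
    Whitespace around each entry is stripped and empty entries are dropped,
    so the result is a plain comma-separated string.
    """
    cleaned = [m.strip() for m in modules if m.strip()]
    return {"--additional-python-modules": ",".join(cleaned)}


# Example: request a single pinned package (version taken from the question).
default_arguments = build_additional_modules_arg(["datacompy==0.7.3"])
print(default_arguments)
```

A dict of this shape is what boto3's Glue client accepts as `DefaultArguments` in `create_job` or as `Arguments` in `start_job_run`; in the console, the same key and value go into the Job parameters section.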

import pandas as pd
import numpy as np
import datacompy


df1 = pd.DataFrame(np.random.randn(10,2), columns=['a','b'])
df2 = pd.DataFrame(np.random.randn(10,2), columns=['a','b'])

compare = datacompy.Compare(df1, df2, join_columns='a')


print(compare.report())
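
If the import still fails, it can help to confirm whether the module is visible to the job's interpreter before importing it. The snippet below is a small diagnostic sketch using only the standard library; the helper name `module_available` is an assumption for illustration and is not part of Glue or datacompy.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


if module_available("datacompy"):
    import datacompy
    # Not every package exposes __version__, so fall back to "unknown".
    print("datacompy version:", getattr(datacompy, "__version__", "unknown"))
else:
    # Printing sys.path shows where the interpreter actually looked.
    print("datacompy not found; sys.path is:", sys.path)
```

Running this at the top of the job script writes the result to the job's log, which makes it easier to tell an installation problem from a problem elsewhere in the script.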

When I check the CloudWatch log for the job run, I see the output:

[Screenshot: CloudWatch log output for the job run]

If you're using a Python shell job, try the following:

Create a datacompy .whl file, or download one from PyPI.

Upload that file to an S3 bucket.

Then enter the S3 path to the .whl file in the Python library path box, for example:

s3://my-bucket/datacompy-0.8.0-py3-none-any.whl
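
The upload step can also be scripted. The following is a rough sketch, assuming boto3 is installed and AWS credentials are configured; the helper names `split_s3_uri` and `upload_wheel` are hypothetical, and the bucket and file name are simply the example path from above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a full s3://bucket/key URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_wheel(local_path, s3_uri):
    """Upload a local wheel file to the given S3 URI (needs boto3 and credentials)."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(s3_uri)
    boto3.client("s3").upload_file(local_path, bucket, key)


# Parsing only; upload_wheel is not called here because it needs AWS access.
print(split_s3_uri("s3://my-bucket/datacompy-0.8.0-py3-none-any.whl"))
```

Calling `upload_wheel("datacompy-0.8.0-py3-none-any.whl", "s3://my-bucket/datacompy-0.8.0-py3-none-any.whl")` would then place the file at the path entered in the Python library path box.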

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1