'Unable to add/import additional python library datacompy in aws glue
i am trying to import additional python library - datacompy in to the glue job which use version 2 with below step
Open the AWS Glue console.
Under Job parameters, added the following:
For Key, added --additional-python-modules. For Value, added datacompy==0.7.3, s3://python-modules/datacompy-0.7.3.whl.
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import datacompy
from py4j.java_gateway import java_import
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
## @params: [JOB_NAME, URL, ACCOUNT, WAREHOUSE, DB, SCHEMA, USERNAME, PASSWORD]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'URL', 'ACCOUNT', 'WAREHOUSE', 'DB', 'SCHEMA','additional-python-modules'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
but the job return the error
module not found error no module named 'datacompy'
how to resolve this issue?
Solution 1:[1]
With Spark 2.4, Python 3 (Glue Version 2.0)
I set the following Job Parameter
Then I can import it my Job like so
import pandas as pd
import numpy as np
import datacompy
df1 = pd.DataFrame(np.random.randn(10,2), columns=['a','b'])
df2 = pd.DataFrame(np.random.randn(10,2), columns=['a','b'])
compare = datacompy.Compare(df1, df2, join_columns='a')
print(compare.report())
and when I check the CW Log for the Job Run
If you're using a Python Shell Job, try the following
Create a datacompy whl file or you can download it from PYPI
upload that file to an S3 bucket
Then enter the path to the s3 whl file in the Python library path box
s3://my-bucket/datacompy-0.8.0-py3-none-any.whl
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |


