AWS Glue 3.0 PySpark: different behavior when installing dependencies as wheels vs. letting Glue install the same dependencies itself
I'm having a problem launching a PySpark job that connects to Redshift via the awswrangler library. Everything works fine when using the `--additional-python-modules: awswrangler==2.10.0` parameter (which, I suppose, makes Glue run `pip install awswrangler==2.10.0` under the hood). But this approach is off-limits for us because we use the company's Artifactory as the dependency repository.
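For context, this is how the working variant is wired up: the module list is passed through the job's default arguments. A minimal sketch of the relevant part of the job definition (the job name and other fields are illustrative, not from the original post):

```json
{
  "Name": "example-redshift-job",
  "GlueVersion": "3.0",
  "DefaultArguments": {
    "--additional-python-modules": "awswrangler==2.10.0"
  }
}
```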
However, if I instead supply the awswrangler wheel (and its transitive dependencies, also as wheels) via Glue's 'Python library path', I get a Redshift connection error (`NotADirectoryError`, presumably caused by SSL settings). The question is: why does the behavior differ? I got the list of wheels from `pip freeze` after installing awswrangler into a clean virtual env.
Any clues/ideas would be appreciated.
Update: a second question — can a custom artifact repository be configured as the source Glue pulls dependencies from?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow