'Properly handling Escape Characters in Boto3
I have a S3 Bucket Streaming logs to a lambda function that tags files based on some logic.
While I have worked around this issue in the past and I understand there are some characters that need to be handled I'm wondering if there is a safe way to handle this with some API or is it something I need to handle on my own.
For example I have a lambda function like so:
import boto3
def lambda_handler(event, context):
s3 = boto3.client("s3")
for record in event["Records"]:
bucket = record["s3"]["bucket"]["name"]
objectName = record["s3"]["object"]["key"]
tags = []
if "Pizza" in objectName:
tags.append({"Key" : "Project", "Value" : "Great"})
if "Hamburger" in objectName:
tags.append({"Key" : "Project", "Value" : "Good"})
if "Liver" in objectName:
tags.append({"Key" : "Project", "Value" : "Yuck"})
s3.put_object_tagging(
Bucket=bucket,
Key=objectName,
Tagging={
"TagSet" : tags
}
)
return {
'statusCode': 200,
}
This code works great. I upload a file to s3 called Pizza-Is-Better-Than-Liver.txt then the function runs and tags the file with both Great and Yuck (sorry for the strained example).
However If I upload the file Pizza Is+AmazeBalls.txt things go sideways:
Looking at the event in CloudWatch the object key shows as: Pizza+Is%2BAmazeBalls.txt.
Obviously the space is escaped to a + and the + to a %2B when I pass that key to put_object_tagging() it fails with a NoSuchKey Error.
My question; is there a defined way to deal with escaped characters in boto3 or some other sdk, or do I just need to do it myself? I really don't and to add any modules to the function and I could just use do a contains / replace(), but it's odd I would get something back that I can't immediately use without some transformation.
I'm not uploading the files and can't mandate what they call things (i-have-tried-but-it-fails), if it's a valid Windows or Mac filename it should work (I get that is a whole other issue but I can deal with that).
Solution 1:[1]
Since no other answers I guess I post my bandaid:
def format_path(path):
path = path.replace("+", " ")
path = path.replace("%21", "!")
path = path.replace("%24", "$")
path = path.replace("%26", "&")
path = path.replace("%27", "'")
path = path.replace("%28", "(")
path = path.replace("%29", ")")
path = path.replace("%2B", "+")
path = path.replace("%40", "@")
path = path.replace("%3A", ":")
path = path.replace("%3B", ";")
path = path.replace("%2C", ",")
path = path.replace("%3D", "=")
path = path.replace("%3F", "?")
return path
I'm sure there is a simpler, more complete way to do this but this seems to work... for now.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | hoss |
