Convert a bytes array read from S3 to a NumPy array or tensor in AWS SageMaker
I have read some X_train and y_train data and uploaded them to S3 as an in-memory bytes array, as below.
X_train and y_train are single-column arrays like:
X_train:
array([[ 2. ],[12.9],[ 1.3],[ 5.1],[ 9.6],[ 8.2],...
y_train:
array([[ 43525.],[135675.],[ 46205.],[ 66029.],[112635.],...
import io
import os
import boto3
import sagemaker
import sagemaker.amazon.common as smcl
sm_session = sagemaker.Session()
bucket = sm_session.default_bucket()
buffer = io.BytesIO()
# Serialize the training data as RecordIO-protobuf dense tensors (labels must be 1-D):
smcl.write_numpy_to_dense_tensor(buffer, X_train, y_train.reshape(-1))
buffer.seek(0)
# Uploading to s3
file_name = 'Train_data'
folder_name = 'Test_folder'
path_to_train_data = os.path.join(folder_name,'train',file_name)
boto3.resource('s3').Bucket(bucket).Object(path_to_train_data).upload_fileobj(buffer)
I want to read them back from S3 and convert them to their original form:
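import io
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket)  # bucket: the bucket name used for the upload
buf = io.BytesIO()
bucket.download_fileobj(key_from_s3, buf)  # key_from_s3: the object key, e.g. path_to_train_data
filecontent_bytes = buf.getvalue()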
The value of filecontent_bytes looks something like this:
b'\n#\xd7\xce(\x00\x00\x00\n\x12\n\x06values\x12\x08\x12\x06\n\x04\x00\x00\x00@\x12\x12\n\...
How can I convert them to their original form? Thanks.
Solution 1:[1]
You need to decode the byte array according to the format it was written in, using a library that understands that format. For example, np.frombuffer interprets a raw byte buffer as a NumPy array:
import numpy as np
s = b'hello world'
np.frombuffer(s, dtype='S1', count=5, offset=6)
# returns array([b'w', b'o', b'r', b'l', b'd'], dtype='|S1')
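The same call works on numeric data, but only when the bytes are a raw copy of the array and you already know its dtype and shape, since neither is stored in the buffer. The sketch below round-trips a small illustrative column array through tobytes, then reads the first 8 bytes shown in the question as two little-endian uint32 values. They decode to 0xCED7230A, the RecordIO magic number, and 40, a record length, which indicates the S3 object is not a raw array dump.

```python
import numpy as np

# Raw round trip: works only because dtype and shape are known in advance.
X = np.array([[2.0], [12.9], [1.3]])  # illustrative values, like X_train
raw = X.tobytes()  # raw float64 bytes in native byte order, no header
restored = np.frombuffer(raw, dtype=X.dtype).reshape(-1, 1)

# The first 8 bytes of filecontent_bytes from the question, read as uint32.
header = b"\n#\xd7\xce(\x00\x00\x00"
magic, length = np.frombuffer(header, dtype="<u4")
print(hex(int(magic)), int(length))  # 0xced7230a 40
```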
Documentation: https://numpy.org/doc/stable/reference/generated/numpy.frombuffer.html
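np.frombuffer alone cannot restore X_train from filecontent_bytes, though. write_numpy_to_dense_tensor writes a sequence of RecordIO records, each wrapping a protobuf Record message whose features and label maps hold one row under the key "values". The SageMaker Python SDK includes a reader for this, sagemaker.amazon.common.read_records, which returns the parsed Record messages. As a dependency-free alternative, here is a minimal sketch that decodes the format by hand. It assumes the field numbers of SageMaker's record.proto (Record.features = 1, Record.label = 2, Value.float32_tensor = 2, Value.float64_tensor = 3) and handles only dense float32/float64 tensors stored under "values". decode_dense_records and its helpers are hypothetical names, not part of any library.

```python
import io
import struct

import numpy as np

# Magic number that prefixes every RecordIO record (little-endian on disk).
RECORDIO_MAGIC = 0xCED7230A

# Value oneof field numbers, assumed from SageMaker's record.proto.
TENSOR_DTYPES = {2: "<f4", 3: "<f8"}  # float32_tensor, float64_tensor


def read_recordio(stream):
    """Yield the raw payload bytes of each RecordIO record in a binary stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream
        magic, length = struct.unpack("<II", header)
        if magic != RECORDIO_MAGIC:
            raise ValueError("stream is not RecordIO-encoded")
        payload = stream.read(length)
        stream.read((4 - length % 4) % 4)  # skip padding to a 4-byte boundary
        yield payload


def _varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def _fields(buf):
    """Yield (field_number, value) pairs for a protobuf message in wire format."""
    pos = 0
    while pos < len(buf):
        tag, pos = _varint(buf, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = _varint(buf, pos)
        elif wire_type == 2:  # length-delimited: bytes, strings, sub-messages
            size, pos = _varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, value


def _tensor_values(value_msg):
    """Decode the dense values array of a Value message (float32/float64 only)."""
    for field, tensor in _fields(value_msg):
        if field in TENSOR_DTYPES:
            # Tensor field 1 holds the values; keys (2) and shape (3) are ignored.
            data = b"".join(v for f, v in _fields(tensor) if f == 1)
            return np.frombuffer(data, dtype=TENSOR_DTYPES[field])
    raise ValueError("only dense float32/float64 tensors are handled")


def decode_dense_records(raw):
    """Return (features, labels) NumPy arrays from RecordIO-protobuf bytes."""
    features, labels = [], []
    for payload in read_recordio(io.BytesIO(raw)):
        for field, entry in _fields(payload):
            if field not in (1, 2):
                continue  # only Record.features (1) and Record.label (2)
            entry_fields = dict(_fields(entry))  # map entry: key = 1, value = 2
            if entry_fields.get(1) != b"values":
                continue
            row = _tensor_values(entry_fields.get(2, b""))
            (features if field == 1 else labels).append(row)
    x = np.vstack(features) if features else np.empty((0, 0))
    y = np.concatenate(labels) if labels else None
    return x, y
```

With the question's variables, X, y = decode_dense_records(filecontent_bytes) should give X with one row per record and y as a 1-D array. y.reshape(-1, 1) restores the column layout of the original y_train, and torch.from_numpy(X) turns the result into a PyTorch tensor if PyTorch is installed.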
Also note that you do not always need the buffered (RecordIO-protobuf) format: depending on the algorithm you are using, CSV or libsvm input may also be accepted, so check which data formats your algorithm supports.
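If your algorithm accepts text/csv, the arrays can instead be stored as plain CSV, which reads back without any special decoding. SageMaker's documentation on common training data formats states that CSV training files should have no header row and the target in the first column; confirm this for your algorithm. The helper names below are hypothetical, and this is a minimal NumPy-only sketch with illustrative values.

```python
import io

import numpy as np


def to_training_csv(features, labels):
    """Return headerless CSV text with the label in the first column."""
    table = np.column_stack([np.asarray(labels).reshape(-1), features])
    buf = io.StringIO()
    # %.17g keeps enough digits for float64 values to round-trip exactly.
    np.savetxt(buf, table, delimiter=",", fmt="%.17g")
    return buf.getvalue()


def from_training_csv(text):
    """Split CSV text back into (features, labels) column arrays."""
    table = np.loadtxt(io.StringIO(text), delimiter=",", ndmin=2)
    return table[:, 1:], table[:, :1]


X_train = np.array([[2.0], [12.9], [1.3]])
y_train = np.array([[43525.0], [135675.0], [46205.0]])
csv_text = to_training_csv(X_train, y_train)
X_back, y_back = from_training_csv(csv_text)
```

The CSV text can be uploaded with the same upload_fileobj call used in the question by wrapping it as io.BytesIO(csv_text.encode()).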
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ram Vegiraju |
