Java S3 upload of a large file (~1.5 TB) erroring out with ResetException. File is read/processed via InputStream
I have an application running in Java. I have a large file that I encrypt and upload to S3. As the file is huge, I cannot keep it in memory, so I use PipedInputStream and PipedOutputStream to do my encryption. I have a BufferedInputStream wrapping the PipedInputStream, which is then passed to the S3 PutObjectRequest. I have already calculated the size of the encrypted object and set it on the ObjectMetadata. Here are some code pieces:
PipedInputStream pis = new PipedInputStream(uploadFileInfo.getPout(), MAX_BUFFER_SIZE);
BufferedInputStream bis = new BufferedInputStream(pis, MAX_BUFFER_SIZE);
LOG.info("Is mark supported? " + bis.markSupported());
PutObjectRequest putObjectRequest = new PutObjectRequest(uploadFileInfo.getS3TargetBucket(),
uploadFileInfo.getS3TargetObjectKey() + ".encrypted",
bis, metadata);
// Set read limit to more than the expected stream size, i.e. 20 MB
// https://github.com/aws/aws-sdk-java/issues/427
LOG.info("set read limit to " + (MAX_BUFFER_SIZE + 1));
putObjectRequest.getRequestClientOptions().setReadLimit(MAX_BUFFER_SIZE + 1);
Upload upload = transferManager.upload(putObjectRequest);
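For context, the writer side of the pipe runs on a separate thread, roughly like the sketch below. The cipher setup, executor, and source path are placeholders I'm assuming for illustration (checked-exception handling omitted); they are not the actual application code:
PipedOutputStream pout = uploadFileInfo.getPout(); // the stream pis is connected to
// Illustrative key/cipher setup only; real key management will differ.
KeyGenerator keyGen = KeyGenerator.getInstance("AES");
keyGen.init(256);
Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, keyGen.generateKey());
ExecutorService writerThread = Executors.newSingleThreadExecutor();
writerThread.submit(() -> {
    // The pipe must be written from a different thread than the one S3 reads on,
    // otherwise PipedInputStream/PipedOutputStream deadlocks.
    try (CipherOutputStream cos = new CipherOutputStream(pout, cipher);
         InputStream plain = Files.newInputStream(Paths.get("/path/to/source/file"))) {
        plain.transferTo(cos); // Java 9+: copies the plaintext through the cipher into the pipe
    }
    return null;
});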
My stack trace shows that the reset() call on the BufferedInputStream is throwing an exception:
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Exception from S3 transfer
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1423)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1240)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3734)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3719)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:258)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:143)
at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1421)
... 22 more
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Reset exception caught ==> If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
        ... (stack trace identical to the one above, caused by java.io.IOException: Resetting to invalid mark)
However, I am already setting the readLimit to MAX_BUFFER_SIZE + 1, which is the reliability tip from AWS. Has anyone faced this issue before? Side points: since I'm encrypting the file, I need to use an InputStream rather than a File or FileInputStream, and I don't have permission to write to local disk either.
Solution 1:[1]
I think you're misinterpreting the recommendation. Quoting from the link you provided, with emphasis added:
For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset will always work for 100,000 bytes or less. Be aware that this might cause some streams to buffer that number of bytes into memory.
As I interpret this, it configures the client to be able to locally buffer content from the source stream, when that stream does not support mark/reset on its own. This is consistent with the documentation for RequestClientOptions.DEFAULT_STREAM_BUFFER_SIZE:
Used to enable mark-and-reset for non-mark-and-resettable non-file input stream
In other words, it's used to buffer an entire source stream within the client, not to specify how large a chunk to send from the source stream. And in your case, I think it's ignored, because (1) you're not buffering the entire stream, and (2) the stream that you pass does implement mark/reset on its own.
A multi-part upload, which is what TransferManager is doing in your example, breaks the input stream into chunks of at least 5 MB (the actual chunk size depends on the declared size of the stream; for a 1.5 TB file it's around 158 MiB). These are uploaded using the UploadPart API call, which attempts to send an entire chunk at a time. If a part fails from a retryable cause, then the client attempts to reset the stream to the start of the chunk.
You can probably make this work by sizing your BufferedInputStream buffer (and the request read limit) large enough to hold a single part. The calculation that the transfer manager uses is the size of the file divided by 10,000 (the maximum number of parts in a multi-part upload). So, again, 158 MiB. I'd use 200 MiB just to be safe (and because I'm sure you have bigger files).
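A rough sketch of that sizing, assuming the content length set on the ObjectMetadata is accurate (the 10,000-part limit and 5 MiB minimum are S3 constraints; the exact part size the SDK picks may differ slightly by version):
long contentLength = metadata.getContentLength();          // declared size of the encrypted stream
long partSize = Math.max(5L * 1024 * 1024,                  // S3 minimum part size
        (contentLength / 10_000) + 1);                      // at most 10,000 parts per upload
int readLimit = (int) Math.min(Integer.MAX_VALUE, partSize + 1);
BufferedInputStream bis = new BufferedInputStream(pis, readLimit);
PutObjectRequest putObjectRequest = new PutObjectRequest(uploadFileInfo.getS3TargetBucket(),
        uploadFileInfo.getS3TargetObjectKey() + ".encrypted", bis, metadata);
// Lets the SDK rewind to the start of a failed part; note this buffers ~150-200 MiB on the heap.
putObjectRequest.getRequestClientOptions().setReadLimit(readLimit);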
If it were me, however, I would probably use the low-level multi-part upload methods directly. The main benefit of TransferManager, in my opinion, is to be able to upload a file, where it can utilize multiple threads to perform concurrent part uploads. With a stream, you have to process each part sequentially.
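A minimal sketch of that low-level approach, assuming the same encrypted stream and a plain AmazonS3 client named s3; bucket, key, and the fixed part size are placeholders, and on failure you would also call abortMultipartUpload so the incomplete upload does not keep accruing storage:
List<PartETag> partETags = new ArrayList<>();
InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key));
int partSize = 200 * 1024 * 1024;              // must be >= 5 MiB for every part except the last
byte[] partBuffer = new byte[partSize];
int partNumber = 1;
int bytesRead;
// Read one part's worth of bytes at a time from the non-seekable encrypted stream (Java 9+).
while ((bytesRead = bis.readNBytes(partBuffer, 0, partBuffer.length)) > 0) {
    UploadPartRequest partRequest = new UploadPartRequest()
            .withBucketName(bucket)
            .withKey(key)
            .withUploadId(init.getUploadId())
            .withPartNumber(partNumber++)
            .withInputStream(new ByteArrayInputStream(partBuffer, 0, bytesRead))
            .withPartSize(bytesRead);
    partETags.add(s3.uploadPart(partRequest).getPartETag());
}
s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));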
Actually, if it was me I'd seriously reconsider uploading a single 1.5 TB file. Yes, you can do it. But I can't imagine that you're downloading the entire file every time that you want to read it. Instead, I expect that you're downloading a byte range. In which case, you'll probably find it just as easy to work with, say, 1500 files that are each 1 GiB in size.
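For reference, reading a byte range from an object (whether one huge object or many 1 GiB objects) looks roughly like this; bucket, key, and the offsets are placeholders:
// Download only the requested byte range instead of the whole object.
GetObjectRequest rangeRequest = new GetObjectRequest(bucket, key)
        .withRange(0, 1024 * 1024 - 1);            // inclusive byte range: first 1 MiB
try (S3Object rangeObject = s3.getObject(rangeRequest);
     InputStream rangeContent = rangeObject.getObjectContent()) {
    // process only the requested slice of the object
}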
Solution 2:[2]
This seems to be a known issue with the S3 SDK and BufferedInputStream
See https://github.com/aws/aws-sdk-java/issues/427#issuecomment-273550783
The simplest workaround (even if not ideal) is to spool the stream to a local temporary file and pass the File object to the S3 SDK, like so:
InputStream inputStream = ...;  // the (encrypted) source stream
File tempFile = File.createTempFile("upload-temp", "");
FileUtils.copyInputStreamToFile(inputStream, tempFile); // Apache Commons IO, or any other copy utility
PutObjectRequest putObjectRequest = new PutObjectRequest(bucket, key, tempFile);
Upload upload = transferManager.upload(putObjectRequest);
tempFile.deleteOnExit();
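If you go this route, you would typically also wait for the transfer to finish and delete the temporary file explicitly rather than relying only on deleteOnExit:
upload.waitForCompletion();                 // blocks until the upload succeeds or throws
Files.deleteIfExists(tempFile.toPath());    // java.nio.file.Files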
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Parsifal |
| Solution 2 | |
