Google Cloud Storage not strongly consistent, returning 404 until 500ms after a resumable upload completes

I am building a file uploader on Cloud Storage and I'm seeing inconsistent behaviour that seems contrary to the documentation.

When you upload an object to Cloud Storage, and you receive a success response, the object is immediately available for download and metadata operations from any location where Google offers service. This is true whether you create a new object or replace an existing object. Because uploads are strongly consistent, you will never receive a 404 Not Found response or stale data for a read-after-write or read-after-metadata-update operation.

https://cloud.google.com/storage/docs/consistency#strongly_consistent_operations

... but I receive a 404 if I read it immediately after upload.

The process is as follows:

  1. my backend NodeJS API initiates a resumable upload creating a Session URI to a bucket
  2. then the user uploads the file directly to GCS via a PUT to the Session URI from the browser
  3. the front-end posts an update to my API to say the upload is complete.
  4. my API then tries to download the same file as a stream and ingest it

I got it all working, but then found that when a new file is uploaded (i.e. doesn't already exist in the bucket), there is a 500ms delay required between the upload finishing (step 2) and the read succeeding (step 4). If I do it without the delay I get a 404.

The docs state that uploads are generally available immediately, unless there's some caching in place:

Important: Cached objects that are publicly readable might not exhibit strong consistency. See Cache control and consistency for details.

I'm using XMLHttpRequest to upload the file to GCS, and the load event to detect the completed upload. From what I read, this should mean the 200 response has been received and therefore the file is in place, although debugging the load event shows it's just another "progress" event at 100%.

What I've tried

The workaround is to add a setTimeout(done, 500) to the final callback in the load event handler, before calling my API at step 3.

I've tested this dozens of times and it's reliably repeatable: 0-400ms always fails, and roughly 500ms or more always "fixes" it.

I've tried adding the cache-control headers to the original POST as recommended, which sets up the upload session with no caching - I added no-store, which seemed the right one. I can see this reflected in the headers of the PUT (it actually puts more no-cache options in the response than I set), but it didn't affect the behaviour at all.

If the file is already there in the bucket and gets overwritten, this doesn't happen. (Although I guess there might still be a race condition on the contents if I uploaded a different file.)

I can't seem to catch the exception, so I don't really know which call to GCS is returning the 404 - whether it's bucket.file(), remoteFile.createReadStream(), or the later read from the stream (which happens deep in another library that I pass the readable stream into).

I haven't tried a try/retry loop because I can't even catch the error. That's what I'd like to do if consistent behaviour can't be guaranteed.

I have tried both the gcs-resumable-upload package and the direct use of Storage.File; both seem to behave the same.

The Code

The NodeJS API call which creates the upload session (step 1 above) looks like this:

1a) gcs-resumable-upload version

    const {createURI} = require('gcs-resumable-upload');

    const sessionURI = await createURI({
        bucket: bucketName,
        file: filename,
        origin: origin,
        customRequestOptions: {      // todo: this doesn't fix the race
            headers: {
                'Cache-Control': 'no-store',
            },
        },
    });

1b) Storage.File version

    const {Storage} = require('@google-cloud/storage');
    const storage = new Storage();

    const bucket = storage.bucket(bucketName);
    const file = bucket.file(filename);
    const resp = await file.createResumableUpload({origin: origin});
    const sessionURI = resp[0];

2) The upload step (step 2 above) looks like this - JS in the browser, which opens the file and uploads it:

    var reader = new FileReader();
    var xhr = new XMLHttpRequest();

    xhr.upload.addEventListener("load", function(e){
        setTimeout(done, 500); // todo I get 404s in the next step without 500ms delay?
        // done();  // fails
    }, false);

    xhr.open("PUT", sessionUrl);
    xhr.overrideMimeType('text/plain; charset=x-user-defined-binary');
    reader.onload = function(evt) {
        xhr.send(evt.target.result);
    };
    reader.readAsBinaryString(file);

3) The backend NodeJS API (step 4 above) basically does this (with some error handling):

    const {Storage} = require('@google-cloud/storage');
    const storage = new Storage();

    const bucket = storage.bucket(bucketName);
    const remoteFile = bucket.file(filename);
    const stream = remoteFile.createReadStream();

The stream is then returned and handed off to a library which reads the contents from it.

This is where it errors, although the error is emitted asynchronously on a later tick, and I haven't managed to try/catch it from anywhere yet (which is a bit odd).
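
For what it's worth, a Node readable stream reports asynchronous failures via its 'error' event rather than by throwing, which is why a try/catch around createReadStream() never sees the 404. A minimal sketch, reusing the remoteFile and filename names from the snippet above:

    // Attach an 'error' listener before handing the stream to the library;
    // the ApiError shown below (code 404) is delivered here, not thrown.
    const stream = remoteFile.createReadStream();
    stream.on('error', (err) => {
        if (err.code === 404) {
            console.warn(`GCS returned 404 for ${filename} - object not visible yet`);
        } else {
            console.error('Read stream failed:', err);
        }
    });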

The error

The error stack is:

 <ref *2> ApiError: No such object: MY-BUCKETNAME/MY-FILENAME
    at new ApiError (node_modules/@google-cloud/common/build/src/util.js:59:15)
    at Util.parseHttpRespMessage (node_modules/@google-cloud/common/build/src/util.js:161:41)
    at Util.handleResp (node_modules/@google-cloud/common/build/src/util.js:135:76)
    at Duplexify.<anonymous> (node_modules/@google-cloud/storage/build/src/file.js:880:31)
    at Duplexify.emit (events.js:314:20)
    at Duplexify.EventEmitter.emit (domain.js:548:15)
    at PassThrough.emit (events.js:314:20)
    at PassThrough.EventEmitter.emit (domain.js:548:15)
    at onResponse (node_modules/retry-request/index.js:208:19)
    at PassThrough.<anonymous> (node_modules/retry-request/index.js:155:11)
    at PassThrough.emit (events.js:326:22)
    at PassThrough.EventEmitter.emit (domain.js:548:15)
    at node_modules/teeny-request/build/src/index.js:184:27
    at processTicksAndRejections (internal/process/task_queues.js:93:5) 

and the error object is a big structure:

{
  code: 404,
  errors: [],
  response: <ref *1> PassThrough {
    _readableState: ReadableState {
      objectMode: false,
      highWaterMark: 16384,
      buffer: BufferList { head: null, tail: null, length: 0 },
      length: 0,
      pipes: [],
      flowing: false,
      ended: true,
      endEmitted: true,
      reading: false,
      sync: false,
      needReadable: false,
      emittedReadable: false,
      readableListening: false,
      resumeScheduled: false,
      errorEmitted: false,
      emitClose: true,
      autoDestroy: true,
      destroyed: true,
      errored: null,
      closed: true,
      closeEmitted: true,
      defaultEncoding: 'utf8',
      awaitDrainWriters: Set(0) {},
      multiAwaitDrain: true,
      readingMore: false,
      decoder: null,
      encoding: null,
      [Symbol(kPaused)]: true
    },
    _events: [Object: null prototype] {
      prefinish: [Function: prefinish],
      error: [Array],
      close: [Array],
      end: [Function: onend],
      finish: [Function: onfinish]
    },
    _eventsCount: 5,
    _maxListeners: undefined,
    _writableState: WritableState {
      objectMode: false,
      highWaterMark: 16384,
      finalCalled: false,
      needDrain: false,
      ending: true,
      ended: true,
      finished: true,
      destroyed: true,
      decodeStrings: true,
      defaultEncoding: 'utf8',
      length: 0,
      writing: false,
      corked: 0,
      sync: false,
      bufferProcessing: false,
      onwrite: [Function: bound onwrite],
      writecb: null,
      writelen: 0,
      afterWriteTickInfo: null,
      buffered: [],
      bufferedIndex: 0,
      allBuffers: true,
      allNoop: true,
      pendingcb: 0,
      prefinished: true,
      errorEmitted: false,
      emitClose: true,
      autoDestroy: true,
      errored: null,
      closed: true
    },
    allowHalfOpen: true,
    statusCode: 404,
    statusMessage: 'Not Found',
    request: {
      agent: false,
      headers: [Object],
      href: 'https://storage.googleapis.com/storage/v1/b/MY-BUCKETNAME/o/MY-FILENAME?alt=media'
    },
    body: [Circular *1],
    headers: {
      'alt-svc': 'h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"',
      'cache-control': 'private, max-age=0',
      connection: 'close',
      'content-length': '55',
      'content-type': 'text/html; charset=UTF-8',
      date: 'Tue, 23 Mar 2021 08:08:50 GMT',
      expires: 'Tue, 23 Mar 2021 08:08:50 GMT',
      server: 'UploadServer',
      vary: 'Origin, X-Origin',
      'x-guploader-uploadid': 'ABg5-Uz0P1kWSLFABXOpJ_mbQY5-4wEnMekQduBli1S4aYDWoIgqVKG1M5zlZ_ePd0iJDlzCl_ThYvmFpvcXpgwCcnN993kZog'
    },
    toJSON: [Function: toJSON],
    [Symbol(kCapture)]: false,
    [Symbol(kTransformState)]: {
      afterTransform: [Function: bound afterTransform],
      needTransform: false,
      transforming: false,
      writecb: null,
      writechunk: null,
      writeencoding: 'buffer'
    }
  },
  domainEmitter: PassThrough {
    _readableState: ReadableState {
      objectMode: false,
      highWaterMark: 16384,
      buffer: BufferList { head: null, tail: null, length: 0 },
      length: 0,
      pipes: [],
      flowing: true,
      ended: false,
      endEmitted: false,
      reading: true,
      sync: false,
      needReadable: true,
      emittedReadable: false,
      readableListening: false,
      resumeScheduled: false,
      errorEmitted: true,
      emitClose: true,
      autoDestroy: true,
      destroyed: true,
      errored: [Circular *2],
      closed: true,
      closeEmitted: false,
      defaultEncoding: 'utf8',
      awaitDrainWriters: null,
      multiAwaitDrain: false,
      readingMore: false,
      decoder: null,
      encoding: null,
      [Symbol(kPaused)]: false
    },
    _events: [Object: null prototype] {
      prefinish: [Function: prefinish],
      reading: [Function: makeRequest],
      data: [Function (anonymous)],
      end: [Function (anonymous)]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    _writableState: WritableState {
      objectMode: false,
      highWaterMark: 16384,
      finalCalled: false,
      needDrain: false,
      ending: false,
      ended: false,
      finished: false,
      destroyed: true,
      decodeStrings: true,
      defaultEncoding: 'utf8',
      length: 0,
      writing: false,
      corked: 0,
      sync: true,
      bufferProcessing: false,
      onwrite: [Function: bound onwrite],
      writecb: null,
      writelen: 0,
      afterWriteTickInfo: null,
      buffered: [],
      bufferedIndex: 0,
      allBuffers: true,
      allNoop: true,
      pendingcb: 0,
      prefinished: false,
      errorEmitted: true,
      emitClose: true,
      autoDestroy: true,
      errored: [Circular *2],
      closed: true
    },
    allowHalfOpen: true,
    _read: [Function: bound ],
    _write: [Function (anonymous)],
    [Symbol(kCapture)]: false,
    [Symbol(kTransformState)]: {
      afterTransform: [Function: bound afterTransform],
      needTransform: true,
      transforming: false,
      writecb: null,
      writechunk: null,
      writeencoding: null
    }
  },
  domainThrown: false
}


Solution 1:[1]

I have since discovered that the error happens when the first read from the stream occurs, which in my case was deep inside a third-party library that attached on('data') handlers to the stream I passed it. So bucket.file(path) works, file.createReadStream() also works and doesn't error, but as soon as you read from the stream it emits the "file not found" error.

So I wrote a pre-test workaround: I open the stream myself before passing it to the third-party library, read just enough for one 'data' event, close it, and then create and pass on a second stream. If it fails during this pre-read, I catch the error and wait recursively using a timer. This works, and I found it catches and fixes errors about 30% of the time, sometimes waiting for about 2s.
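
A rough sketch of that pre-read probe (a hypothetical helper built on the standard stream API; the real code also wrapped it in the retry timer described above):

    // Probe the object with a throwaway stream: a first 'data' event means it
    // is readable; an 'error' (the 404 ApiError) means we should wait and retry.
    function probeFile(remoteFile) {
        return new Promise((resolve, reject) => {
            const probe = remoteFile.createReadStream();
            probe.once('data', () => { probe.destroy(); resolve(true); });
            probe.once('end', () => resolve(true)); // zero-byte objects emit no 'data'
            probe.once('error', (err) => { probe.destroy(); reject(err); });
        });
    }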

I then discovered that the utility method file.exists() reports the same boolean as my test-read loop, so I could simplify the pre-test workaround to use that as the flag to wait on.

    async waitTillFileExists(filename, file){
        const maxRetries = 20;
        let retries = maxRetries;
        const delay = 500; // ms

        // Recursive setTimeout, with a promise wrapper that the caller can await.
        return new Promise((resolve, reject) => {
            const fnTest = async () => {
                const fileExistsResponse = await file.exists();
                if (fileExistsResponse[0]) {
                    return resolve(true);
                }
                else if (retries-- > 0) {
                    setTimeout(fnTest, delay);
                }
                else {
                    reject(new Error(`waitTillFileExists(${filename}): ${maxRetries} retries exhausted!`));
                }
            };
            // begin
            setTimeout(fnTest, delay);
        });
    }
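
A usage sketch, assuming the bucketName/filename variables from the question and treating the method above as if it were a standalone function - await the check before opening the read stream:

    // Wait until GCS reports the object exists, then read it as before.
    const remoteFile = storage.bucket(bucketName).file(filename);
    await waitTillFileExists(filename, remoteFile);
    const stream = remoteFile.createReadStream();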

This doesn't address the original consistency problem itself, but it works around it effectively by assuming the file save and the subsequent fetch will not be immediately consistent.

Solution 2:[2]

Are you using Cloud CDN or any third-party CDN?

Asking for two reasons:

Cloud Storage is also compatible with third-party CDNs

source

For the best performance when delivering content to users, we recommend using Cloud Storage with Cloud CDN.

source

I would suggest first checking whether you already have any CDN in place that might be affecting object caching and therefore causing the delay you mentioned.

If that is not the case, I suggest using Cloud CDN, as the docs state that combined with Cloud Storage it gives the best performance. Besides the performance benefit it might bring, Cloud CDN also has some caching settings you might find interesting.

Finally, you mentioned the use of the no-store flag in the HTTP request, but note the following:

Note: Cache-Control is also a header you can specify in your HTTP requests for an object; however, Cloud Storage ignores this header and sets response Cache-Control headers based on the stored metadata values.

source
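
If you do want to influence caching, Cache-Control therefore has to be set as object metadata rather than as a request header. A minimal sketch with the @google-cloud/storage client, assuming the bucketName/filename variables from the question:

    // Cache-Control is stored as object metadata; setMetadata() updates it
    // on the existing object (it cannot be set via a request header).
    const file = storage.bucket(bucketName).file(filename);
    await file.setMetadata({ cacheControl: 'no-store' });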

Solution 3:[3]

I had a similar issue and solved it by waiting for the last readystatechange event to determine that the upload had ended, instead of the load or loadend events.

xhr.onreadystatechange = (event) => {
    if (xhr.readyState === 4) {
        console.log(xhr.status);
        // upload is really completed
    }
};
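
This likely works because the load event the question registers on xhr.upload only signals that the request body has finished sending, whereas readyState 4 means the response from GCS has actually arrived. An equivalent sketch, listening on the XHR object itself (done is the completion callback from the question):

    // 'load' on the XHR itself (not xhr.upload) fires only after the
    // response has been received, i.e. the session PUT was finalized by GCS.
    xhr.addEventListener('load', () => {
        if (xhr.status === 200 || xhr.status === 201) {
            done();
        } else {
            console.error('Upload finalization failed with status', xhr.status);
        }
    });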

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: scipilot
Solution 2: afarre
Solution 3: Pier