Possible Stuckness: Google Cloud PubSub to Cloud Storage

I have a streaming Dataflow job that writes PubSub messages to files in Cloud Storage in 3-minute windows. After a few hours, the "Data Freshness by stages" graph displays "Possible Stuckness" and "Possible slowness".
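For reference, a Beam pipeline with 3-minute fixed windows assigns each message to the window whose start is its timestamp rounded down to a multiple of the window size, and each window's contents end up in a separate output file. A minimal, dependency-free sketch of that bucketing (the `assign_window` helper is illustrative, not part of any Beam API):

```python
WINDOW_SECONDS = 3 * 60  # 3-minute fixed windows, as in the job described


def assign_window(ts_seconds):
    """Return the [start, end) fixed window containing a Unix timestamp.

    This mirrors how fixed windowing buckets elements: the window start
    is the timestamp rounded down to a multiple of the window size.
    """
    start = ts_seconds - (ts_seconds % WINDOW_SECONDS)
    return (start, start + WINDOW_SECONDS)


# Two messages 200 seconds apart fall into different windows,
# so they would be written to different files in Cloud Storage.
w1 = assign_window(1_000_000)  # (999900, 1000080)
w2 = assign_window(1_000_200)  # (1000080, 1000260)
```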

I have checked the logs, and the info logs display the following: "Setting socket default timeout to 60 seconds."; "socket default timeout is 60.0 seconds."; "Attempting refresh to obtain initial access_token."; "Refreshing due to a 401 (attempt 1/2)". That last message kept repeating every few minutes for four hours before the job reported possible slowness/stuckness.

I am not entirely sure what is happening here. Are these logs related to why the job slowed down and got stuck?



Solution 1:[1]

The "potential stuckness" and "potential slowness" are basically the same thing, they are documented here.

The logs might be red herrings.

You can view all available logs by category (job-message, worker, worker-startup, etc.). Try the following:

  • check the worker logs to determine whether the workers started successfully with their dependencies installed;
  • search for "Operation ongoing" to see whether any work item is taking too long;
  • search for any worker errors that are blocking the streaming job from making progress.
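One way to run those searches is with a Cloud Logging filter scoped to the job. A sketch of a filter builder, assuming the usual Dataflow logging layout (`resource.type="dataflow_step"`, log names containing `dataflow.googleapis.com%2F<category>`); the job name and search text below are placeholders:

```python
def dataflow_log_filter(job_name, log_type, text=None):
    """Build a Cloud Logging filter string for one Dataflow job's logs.

    log_type is one of the categories above, e.g. "worker",
    "worker-startup", or "job-message". The resource type and
    log-name fragment follow Dataflow's logging convention.
    """
    parts = [
        'resource.type="dataflow_step"',
        f'resource.labels.job_name="{job_name}"',
        f'logName:"dataflow.googleapis.com%2F{log_type}"',
    ]
    if text:
        # Substring match on the log message text.
        parts.append(f'textPayload:"{text}"')
    return " AND ".join(parts)


# e.g. look for long-running work items in the worker logs:
stuck_filter = dataflow_log_filter(
    "my-streaming-job", "worker", "Operation ongoing"
)
```

The resulting string can be pasted into the Logs Explorer query box or passed to `gcloud logging read`.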

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ningk