How do I capture the time a TCP socket moves from ESTABLISHED to a WAIT state?

I have a Java WebSocket server that uses Netty to establish WebSocket connections. This is basically a chat application where a few agents will be interacting with many players.

The agents connect to the chat server from a UI running in their browser, where they can pick players and chat with them, whereas the players connect to the server from the game websites.

The agents send pings every 5 seconds to keep the connection alive. This worked until a Chrome update introduced throttling, which terminated socket connections when the agent's tab had been idle for more than 5 minutes. To tackle this, I increased the server-side timeout for agents from 30 seconds to 75 seconds. I observed that, even when throttled, pings still happen about once a minute, which is sufficient to keep the connection alive. What I want is for agents never to be disconnected unless the person logs out or their internet connection drops.

There are about 10 agents, and all of them connect via a VPN to a remote machine that has a very good configuration and a highly available internet connection.

The problem I am facing is that some agents are experiencing random WebSocket disconnections, which I can confirm from the server logs.

I have an account on production and tested this myself: I stayed idle for more than 2 hours (browser minimized) and there was no disconnection. I am trying to figure out what else could cause the disconnections, and whether it is my server or the client that is terminating the connection.

I am trying to check the socket state on the server side using ss -apn. As I understand it, if I see CLOSE_WAIT, the termination request came from the agent UI side, whereas TIME_WAIT would indicate that my server is terminating the connection. Knowing which one it is, I can start debugging on the right side.

An example of the output of the command:

Netid  State      Recv-Q  Send-Q  Local Address:Port        Peer Address:Port
tcp    TIME-WAIT  0       0       [::ffff:127.0.0.1]:1234   [::ffff:127.0.0.1]:45962
tcp    ESTAB      0       0       [::ffff:127.0.0.1]:1234   [::ffff:127.0.0.1]:26670   users:(("java",pid=7571,fd=2096))
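To make the CLOSE_WAIT versus TIME_WAIT reasoning concrete, here is a minimal Python sketch of the mapping from a state seen on the server to the side that closed first. It follows the standard TCP state machine: the side that sends the first FIN passes through FIN-WAIT-1, FIN-WAIT-2 and TIME-WAIT, while the other side passes through CLOSE-WAIT and LAST-ACK. The function name `who_closed_first` is hypothetical, and the sketch assumes an orderly FIN-based close; a connection torn down by a reset may never show any of these states.

```python
# States as printed by ss (hyphenated); underscores are normalized below.
ACTIVE_CLOSE = {"FIN-WAIT-1", "FIN-WAIT-2", "TIME-WAIT"}   # this host sent the first FIN
PASSIVE_CLOSE = {"CLOSE-WAIT", "LAST-ACK"}                 # the peer sent the first FIN


def who_closed_first(state):
    """Return 'local', 'peer', 'both' or 'unknown' for a TCP state seen on this host.

    Hypothetical helper for illustration only. TIME-WAIT is reported as
    'local' although it can also follow a simultaneous close.
    """
    normalized = state.strip().upper().replace("_", "-")
    if normalized in PASSIVE_CLOSE:
        return "peer"
    if normalized in ACTIVE_CLOSE:
        return "local"
    if normalized == "CLOSING":
        return "both"      # both sides sent a FIN at about the same time
    return "unknown"       # ESTAB, LISTEN, etc. say nothing about a close


if __name__ == "__main__":
    for s in ("TIME-WAIT", "CLOSE_WAIT", "ESTAB"):
        print(s, "->", who_closed_first(s))
```

Run on the server, 'peer' would point at the agent's browser (or something in between, such as the VPN) and 'local' at the Java process.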

How do I capture the exact moment when a socket's state changes from ESTABLISHED to one of the WAIT states? I wrote a script that prints the above output, together with the date, every second, but I don't think that is right, because the WAIT state may already have been cleared before the next sample is taken.

From my server logs I know the exact time at which an agent got disconnected, so I would like to check the socket state at that time.
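The once-a-second polling idea described above can be sketched in Python as a diff between two successive snapshots, so that only the transitions are logged with a timestamp that can be compared against the server log. This is a sketch under assumptions, not a definitive solution: the helper names `parse_ss` and `diff_states` are hypothetical, the parser assumes the column layout of the sample output (a leading Netid column followed by State, Recv-Q, Send-Q, local and peer address), and the "before" snapshot in the demo is invented for illustration. Polling of this kind can still miss any state that lasts less than the sampling interval.

```python
from datetime import datetime, timezone


def parse_ss(output):
    """Map (local, peer) -> state for each tcp line of `ss -apn` style output."""
    sockets = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: Netid State Recv-Q Send-Q Local Peer [Process]
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # skips the header, blank lines and non-tcp sockets
        sockets[(fields[4], fields[5])] = fields[1]
    return sockets


def diff_states(prev, curr, now=None):
    """Return (timestamp, (local, peer), old_state, new_state) for each change."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    changes = []
    for key, state in curr.items():
        old = prev.get(key)
        if old is not None and old != state:
            changes.append((stamp, key, old, state))
    for key, old in prev.items():
        if key not in curr:
            # The socket vanished between samples; its last states were not seen.
            changes.append((stamp, key, old, "GONE"))
    return changes


if __name__ == "__main__":
    # Hypothetical earlier snapshot: the same two sockets, both still established.
    before = (
        "tcp  ESTAB  0  0  [::ffff:127.0.0.1]:1234  [::ffff:127.0.0.1]:45962\n"
        "tcp  ESTAB  0  0  [::ffff:127.0.0.1]:1234  [::ffff:127.0.0.1]:26670\n"
    )
    # Later snapshot, matching the sample output in the question.
    after = (
        "tcp  TIME-WAIT  0  0  [::ffff:127.0.0.1]:1234  [::ffff:127.0.0.1]:45962\n"
        "tcp  ESTAB      0  0  [::ffff:127.0.0.1]:1234  [::ffff:127.0.0.1]:26670\n"
    )
    for change in diff_states(parse_ss(before), parse_ss(after)):
        print(change)
```

In a real loop, the text passed to `parse_ss` would come from running ss at each interval. An event-driven source, such as the Linux kernel's sock:inet_sock_set_state tracepoint where the kernel provides it, would avoid the sampling gap entirely.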

Any thoughts please?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
