Inconsistent behaviour with TCP socket's recv() watermark on Linux
I was delving into socket tunables and came across the SO_RCVLOWAT option, so I wrote a test case to determine whether the watermark eliminates a specific issue I've had when developing servers before, namely premature data truncation:

    import socket
    import threading
    import time


    def run_server(host: str, port: int, low_watermark: int):
        sk_server = socket.socket()
        sk_server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
        # The low-water mark is set on the listening socket, before accept().
        sk_server.setsockopt(
            socket.SOL_SOCKET, socket.SO_RCVLOWAT, low_watermark
        )
        sk_server.bind((host, port))
        sk_server.listen(1)
        cl_sockfd, cl_addr = sk_server.accept()
        print(f"connected to client, receiving at least {low_watermark}"
              " bytes of data")
        assert len(cl_sockfd.recv(low_watermark)) == low_watermark
        # *** never reached
        print("all good!")


    if __name__ == "__main__":
        host, port = "localhost", 6969
        watermark = 100
        thd_server = threading.Thread(
            target=run_server,
            args=(host, port, watermark)
        )
        thd_server.start()
        time.sleep(0.5)  # crude wait for the server to start listening
        client = socket.socket()
        client.connect((host, port))
        # Split exactly `watermark` bytes into two fragments.
        fragment_1 = b"a" * (n := watermark // 2)
        fragment_2 = b"a" * (watermark - n)
        print("sending first fragment, and waiting 1 second")
        print("sent", client.send(fragment_1), "bytes")
        time.sleep(1)
        print("sending second fragment")
        print("sent", client.send(fragment_2), "bytes")
        print("done. waiting for server thread to finish")
        thd_server.join()

In short, the client sends exactly watermark bytes to the server, which is configured to receive at least watermark bytes. The principal issue is that the server hangs on cl_sockfd.recv(low_watermark).
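For context, the truncation problem mentioned at the top is commonly handled in user space rather than with a socket option: keep calling recv() until the required number of bytes has arrived. The sketch below assumes a blocking stream socket; recv_exactly is a hypothetical helper name, not part of the socket module, and it does not rely on SO_RCVLOWAT at all.

```python
import socket


def recv_exactly(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes from a blocking stream socket (hypothetical helper)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # recv() may return fewer bytes than requested, so loop on the rest.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection early.
            raise ConnectionError(
                f"peer closed with {remaining} of {nbytes} bytes outstanding"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the test case above, this would stand in for the single cl_sockfd.recv(low_watermark) call; whether it also sidesteps the hang when SO_RCVLOWAT is still set is not something this sketch establishes.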
As a sanity check, I sent a single fragment of size watermark, and of course this works as expected; any single fragment smaller than watermark is rejected. The curious observation is that, with two fragments, the only way the server will accept the data is when the second fragment has size watermark. In that case the two fragments combine as expected, but the remainder beyond watermark is discarded. If I instead send three fragments, where the first two combine to make length watermark and the third has size watermark // 2, then it all works, but now the third fragment is discarded.
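The fragment combinations described here are easier to compare when the client side is parameterised. A sketch of such a helper follows; send_fragments and its delay and fill parameters are illustrative names and are not part of the original test.

```python
import socket
import time


def send_fragments(sock: socket.socket, sizes, delay: float = 1.0,
                   fill: bytes = b"a") -> int:
    """Send one fragment per entry in sizes, pausing delay seconds between them.

    Returns the total number of bytes sent. Hypothetical helper for the
    experiments described in the question.
    """
    total = 0
    for index, size in enumerate(sizes):
        if index:
            time.sleep(delay)  # gap between consecutive fragments
        sock.sendall(fill * size)
        total += size
        print(f"sent fragment {index + 1} ({size} bytes)")
    return total
```

With watermark = 100, the cases above would correspond roughly to send_fragments(client, [100]), send_fragments(client, [50, 50]), send_fragments(client, [50, 100]) and send_fragments(client, [50, 50, 50]).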
This does not make sense at all. Is there any explanation for this?
Post-notes:
- I experimented by enabling TCP_NODELAY so that send() buffers are flushed immediately, to check whether Linux was flushing the buffer late (only after the third fragment triggered a flush), but the behaviour remained identical.
- In the scenario of three fragments, totalling 104 bytes of data, no extraneous data is left to recv(), which is expected considering the watermark enforces a minimum. But after sending the remaining amount to total 2 * watermark, the recv() works perfectly as expected even though the data is spread over two send()s, which again makes no sense considering it failed to receive the initial two fragments despite them matching the same criteria.
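For reference, the TCP_NODELAY experiment in the first note amounts to one extra setsockopt() call on the client socket. A minimal sketch, assuming an IPv4 TCP client; make_nodelay_client is an illustrative name only.

```python
import socket


def make_nodelay_client() -> socket.socket:
    """Create a TCP client socket with Nagle's algorithm disabled."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the IPPROTO_TCP level, not SOL_SOCKET.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return client
```

TCP_NODELAY is a sender-side setting that disables Nagle's algorithm, so it affects when small writes leave the client rather than how the receiving socket applies its watermark.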
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow