How to check what's causing a write() syscall to hang? Python subprocesses stop working after a while

Here is the strace output from when the Python script is working properly. Both the main process (PID 412) and its subprocess (PID 1693 here) are working as intended. However, after roughly 40-60 minutes, the subprocesses start to fail: their write() calls no longer complete.

strace: Process 1693 attached
[pid  1693] close(4)                    = 0 <0.000028>
[pid  1693] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 4 <0.000077>
[pid  1693] write(1, "options ['Forex']\n", 18) = 18 <0.000048>
[pid  1693] openat(AT_FDCWD, "csv/forex_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000064>
[pid  1693] read(6, "{\"feature\": \"Stocks\", \"speed\": \""..., 348) = 347 <0.000034>
[pid  1693] read(6, "", 1)              = 0 <0.000025>
[pid  1693] close(6)                    = 0 <0.000039>
[pid  1693] openat(AT_FDCWD, "feature_titles/forex.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000062>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0`\0\0\0 \4\3\0\0\0\37\356`"..., 4096) = 388 <0.000035>
[pid   412] close(5)                    = 0 <0.000064>
[pid   412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000143>
[pid   412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000060>
[pid   412] read(5, "", 1)              = 0 <0.000040>
[pid   412] close(5)                    = 0 <0.000097>
[pid   412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000125>
[pid  1693] close(6)                    = 0 <0.000027>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000052>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/stocks/down-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000154>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000073>
[pid   412] read(5,  <unfinished ...>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 199 <0.000035>
[pid   412] <... read resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000554>
[pid   412] read(5, "", 8192)           = 0 <0.000047>
[pid   412] close(5)                    = 0 <0.000095>
[pid  1693] close(6)                    = 0 <0.000044>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/EUR.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000074>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 968 <0.000040>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000146>
[pid  1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000038>
[pid  1693] close(7)                    = 0 <0.000056>
[pid  1693] close(6)                    = 0 <0.000042>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/stocks/up-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000068>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 192 <0.000039>
[pid  1693] close(6)                    = 0 <0.000045>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000112>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000039>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/JPY.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000151>
[pid  1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 769 <0.000047>
[pid  1693] close(7)                    = 0 <0.000060>
[pid  1693] close(6)                    = 0 <0.000043>
[pid  1693] openat(AT_FDCWD, "./display_images/Forex.ppm", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE|O_CLOEXEC, 0666) = 6 <0.000359>
[pid  1693] write(6, "P6\n507 32\n255\n", 14) = 14 <0.000104>
[pid  1693] write(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48672) = 48672 <0.000269>
[pid  1693] close(6)                    = 0 <0.000400>
[pid  1693] +++ exited with 0 +++
[pid   412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1693, si_uid=1, si_status=0, si_utime=5, si_stime=9} ---
[pid   412] close(4)                    = 0 <0.000064>

After the script has been running for about 40-60 minutes, strace shows the following. PID 2220 appears to be stuck in the write(1, "options ['Forex']\n", 18 <unfinished ...> syscall for about 9.5 seconds, until the main process (PID 412) sends it SIGTERM. How can I check what is causing it to hang?

strace: Process 2220 attached
[pid   412] close(5)                    = 0 <0.000048>
[pid   412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000163>
[pid   412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000071>
[pid  2220] close(4 <unfinished ...>
[pid   412] read(5,  <unfinished ...>
[pid  2220] <... close resumed> )       = 0 <0.000083>
[pid   412] <... read resumed> "", 1)   = 0 <0.000105>
[pid  2220] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC <unfinished ...>
[pid   412] close(5 <unfinished ...>
[pid  2220] <... openat resumed> )      = 4 <0.000137>
[pid   412] <... close resumed> )       = 0 <0.000136>
[pid   412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000099>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000046>
[pid  2220] write(1, "options ['Forex']\n", 18 <unfinished ...>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000079>
[pid   412] read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000322>
[pid   412] read(5, "", 8192)           = 0 <0.000067>
[pid   412] close(5)                    = 0 <0.000048>
[pid  2220] <... write resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <9.479316>
[pid  2220] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=412, si_uid=1} ---
[pid  2220] +++ killed by SIGTERM +++
[pid   412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2220, si_uid=1, si_status=SIGTERM, si_utime=0, si_stime=1} ---
[pid   412] close(4)                    = 0 <0.000218> 

Is there a way to find out what is causing the write call to hang? Thanks.
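
One way to narrow this down, offered as a sketch rather than a diagnosis: the blocked call is a write to file descriptor 1 (stdout), so it helps to know what the stuck child's stdout is connected to. If it turns out to be a pipe, a blocked write commonly means the pipe buffer is full because nothing is reading the other end, although the trace above does not confirm that this is what happens here. The helper names `describe_fd` and `holders_of` below are hypothetical. They only read Linux's `/proc/<pid>/fd` and must run as the same user as the target process, or as root.

```python
import os
import stat


def describe_fd(pid, fd=1):
    """Return (target, kind) for an open file descriptor of a Linux process.

    Hypothetical helper: reads /proc/<pid>/fd/<fd>, so it is Linux-only.
    """
    path = f"/proc/{pid}/fd/{fd}"
    # The symlink text looks like 'pipe:[12345]', 'socket:[678]' or '/dev/pts/0'.
    target = os.readlink(path)
    # os.stat follows the link to the open file itself.
    mode = os.stat(path).st_mode
    if stat.S_ISFIFO(mode):
        kind = "pipe"
    elif stat.S_ISSOCK(mode):
        kind = "socket"
    elif stat.S_ISCHR(mode):
        kind = "character device"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return target, kind


def holders_of(target):
    """List (pid, fd) pairs whose descriptor points at the same target.

    For a pipe, both the read end and the write end report the same
    'pipe:[inode]' string, so this shows which processes hold either end.
    Processes that cannot be inspected (permissions, already exited) are skipped.
    """
    found = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    found.append((int(pid), int(fd)))
            except OSError:
                continue
    return found
```

To use it, call `describe_fd(pid)` with the PID of the stuck child while it is still blocked (in the trace above the child survives about 9.5 seconds before the SIGTERM), then pass the returned target to `holders_of` to see which process holds the other end and whether it is actually reading from it.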



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
