Linux - How to track all files accessed by a process?

Is there a way to track all file I/O for a given process? All I really need is the locations of the files being read or written by that process (and ideally whether each access was a read or a write, although that's not as important).

I can launch the process myself and track it from the start, rather than attaching to an existing process, which I assume is significantly simpler. Is there any kind of wrapper utility I can run a process through that will monitor file access?



Solution 1:[1]

lsof:

As a starting point, try:

lsof -p <PID>

This command lists all files, file descriptors, and sockets currently open in the process with the given process ID.
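
To get a rough idea of where that listing comes from, here is a minimal sketch that reads the same kind of information directly from /proc on Linux. The function name list_open_files is made up for this example, and the output is far less detailed than what lsof prints; it only shows each descriptor number and what it points to.

```shell
# list_open_files PID: print "fd -> target" for every descriptor open in PID.
# Linux-only sketch: relies on /proc/PID/fd and on permission to inspect PID.
list_open_files() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to a path, or to socket:[inode], pipe:[inode], etc.
        target=$(readlink "$fd" 2>/dev/null) || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}
```

Calling list_open_files $$ shows the descriptors of the current shell. Like lsof -p, this is only a snapshot: files that were opened and closed between two calls will not show up.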

For your specific case, here is one way to monitor a PHP script:

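php foo.php & _pid=$!   # start the script in the background and save its PID
lsof -r1 -p $_pid       # re-list its open files every second (Ctrl-C to stop)
kill $_pid              # if you want to kill the PHP script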

strace:

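I recommend strace. Unlike lsof, which takes a snapshot of what is open at a given moment, strace keeps running for as long as the process does and prints each syscall as it is made, so short-lived accesses are not missed. -e trace=file restricts the output to syscalls that take a file name as an argument (open, stat, unlink, and so on); -f also follows child processes, and -t prefixes each line with the time of day: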

sudo strace -f -t -e trace=file php foo.php

or, for an already running process:

sudo strace -f -t -e trace=file -p <PID>
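
Since the question also asks whether each access was a read or a write, here is a minimal sketch that post-processes strace output into a list of successfully opened paths together with the access mode taken from the open flags. The function name classify_opens and the file name trace.log are assumptions made for this example. It only looks at open and openat calls, ignores failed calls, and does not handle paths that contain a double quote.

```shell
# classify_opens: read strace output on stdin and print "<mode><TAB><path>"
# for every open()/openat() call that succeeded.
classify_opens() {
    awk '
    /open(at)?\(/ && !/ = -1 / {
        # The path is the first double-quoted string on the line.
        if (!match($0, /"[^"]*"/)) next
        path = substr($0, RSTART + 1, RLENGTH - 2)
        # O_RDONLY is the default; O_WRONLY and O_RDWR imply writing.
        mode = "read"
        if ($0 ~ /O_RDWR/) mode = "read+write"
        else if ($0 ~ /O_WRONLY/) mode = "write"
        print mode "\t" path
    }'
}
```

strace writes to stderr by default, so one way to use this is to save the trace with -o first, for example strace -f -e trace=file -o trace.log php foo.php, and then run classify_opens < trace.log | sort -u.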

Solution 2:[2]

Besides strace, there is another option that does not substantially slow down the monitored process. Using the Linux kernel's fanotify (not to be confused with the more popular inotify), it is possible to monitor whole mount points for I/O activity. With unshared mount namespaces, the mounts of a given process can be isolated from the rest of the system (a key technology behind Docker).

An implementation of this concept can be found in shournal, which I am the author of.

Example on the shell:

$ shournal -e sh -c 'cat foo > bar'
$ shournal --query --history 1
...
  1 written file(s):
     /home/user/bar
  1 read file(s):
     /home/user/foo 

Solution 3:[3]

strace is an amazing tool, but its output is a bit verbose.
If you want, you can use a tool I've written that processes strace output and provides a CSV report of all files accessed (TCP sockets too) with the following data:
1. Filename
2. Read/Written bytes
3. Number of read/write operations
4. Number of times the file was opened

It can be run on new processes or on processes that are already running (using /proc/<PID>/fd data).
I found it useful for debugging scenarios and performance analysis.
You can find it here: iotrace

Example output:

Filename, Read bytes, Written bytes, Opened, Read op, Write op
/dev/pts/1,1,526512,0,1,8904
socket_127.0.0.1:47948->127.0.0.1:22,1781764,396,0,8905,11
myfile.txt,65,0,9,10,0
pipe:[3339],0,0,0,1,0

Afterward, you can load the CSV data into Excel or other tools for sorting or any further analysis.
The downside is that you need to download and compile it, and it isn't always 100% accurate.
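
If a spreadsheet is more than you need, the sorting can also be done on the command line. This is a minimal sketch that assumes the column layout of the example output above (written bytes in the third column, one header line) and file names that contain no commas; the function name top_writers is made up for this example.

```shell
# top_writers CSV_FILE [N]: print the N rows with the most written bytes.
top_writers() {
    csv=$1
    n=${2:-10}
    # Skip the header line, then sort numerically, descending, on column 3.
    tail -n +2 "$csv" | sort -t, -k3,3nr | head -n "$n"
}
```

For example, top_writers report.csv 5 would list the five files or sockets that received the most written bytes, where report.csv stands for wherever you saved the report. Changing -k3,3nr to -k2,2nr sorts by read bytes instead.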

Solution 4:[4]

Tracing only in short bursts, as in the following command, may lessen the performance impact of monitoring file activity, at the cost of missing any access that happens between bursts.

$ watch -n 2.0 timeout 0.2 strace -p `pgrep myprogram` -fe trace=file

Here myprogram is the process name, 2.0 is the idle time between monitoring periods, and 0.2 is the length of each monitoring period, both in seconds.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Flimm
Solution 2
Solution 3 Avner Levy
Solution 4 Roger Dahl