Tracing processes
It can be tricky to know exactly what a process is doing, especially if you don't have access to the source code for the program. Thankfully there are a few debugging tools which can provide some insight into what a process is up to.
Backtraces with pstack
It's often useful seeing which function a process is currently executing, you can quickly get a stack trace from a running process using pstack:
$ pstack 1764
#0 0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
#1 0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
#2 0x00007f9a7c04ddc1 in ap_wait_or_timeout ()
#3 0x00007f9a71c6813e in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#4 0x00007f9a7c04d5ae in ap_run_mpm ()
#5 0x00007f9a7c046b46 in main ()
Using ltrace
Unlike pstack which shows a point in time, ltrace can be used
to see library calls made by a process over time. For example if you run
whoami using ltrace, you can see the calls being made to
geteuid, and then to getpwuid to work out
the username of the user running the command:
$ ltrace whoami > /dev/null
__libc_start_main(0x401510, 1, 0x7ffd52295f98, 0x4040e0 <unfinished ...>
strrchr("whoami", '/') = nil
setlocale(LC_ALL, "") = "en_GB.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x401940, 0, 0, 0x736c6974756572) = 0
getopt_long(1, 0x7ffd52295f98, "", nil, nil) = -1
__errno_location() = 0x7fabe410f6c0
geteuid() = 0
getpwuid(0, 0x40438c, 0x7fabe3ef9280, -1) = 0x7fabe3efc2a0
puts("root") = 5
exit(0 <unfinished ...>
__fpending(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0) = 0
fileno(0x7fabe3efa400) = 1
__freading(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0) = 0
__freading(0x7fabe3efa400, 0, 2052, 0x7fabe3efaeb0) = 0
fflush(0x7fabe3efa400) = 0
fclose(0x7fabe3efa400) = 0
__fpending(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c) = 0
fileno(0x7fabe3efa1c0) = 2
__freading(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c) = 0
__freading(0x7fabe3efa1c0, 0, 4, 0xfbad000c) = 0
fflush(0x7fabe3efa1c0) = 0
fclose(0x7fabe3efa1c0) = 0
Limiting output
ltrace can very quickly produce too much output. The -e option can be used
to filter output based on an expression:
$ ltrace -e gethostname hostname
hostname->gethostname("server.example.com", 128) = 0
server.example.com
+++ exited (status 0) +++
You can also use the -c option to count library calls instead of displaying
each call:
$ ltrace -c python -c '2+2'
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
50.24 0.019741 19741 1 __libc_start_main
49.01 0.019257 19257 1 Py_Main
0.76 0.000297 297 1 exit_group
------ ----------- ----------- --------- --------------------
100.00 0.039295 3 total
Following processes
By default ltrace does not follow child processes or threads:
$ ltrace python -c 'import subprocess;subprocess.call("hostname")' > /dev/null
__libc_start_main(0x4006f0, 3, 0x7ffe699f7828, 0x4007e0 <unfinished ...>
Py_Main(3, 0x7ffe699f7828, 0x7ffe699f7848, 0x4007e0 <no return ...>
--- SIGCHLD (Child exited) ---
<... Py_Main resumed> ) = 0
+++ exited (status 0) +++
The -f option can be use to follow child threads and processes:
$ ltrace -f python -c 'import subprocess;subprocess.call("hostname")' > /dev/null
[pid 2909] __libc_start_main(0x4006f0, 3, 0x7fff48454048, 0x4007e0 <unfinished ...>
[pid 2909] Py_Main(3, 0x7fff48454048, 0x7fff48454068, 0x4007e0 <no return ...>
[pid 2910] --- Called exec() ---
[pid 2910] __libc_start_main(0x401230, 1, 0x7ffd5dc351c8, 0x401ea0 <unfinished ...>
[pid 2910] rindex("hostname", '/') = nil
[pid 2910] strcmp("hostname", "domainname") = 4
[pid 2910] strcmp("hostname", "ypdomainname") = -17
[pid 2910] strcmp("hostname", "nisdomainname") = -6
[pid 2910] getopt_long(1, 0x7ffd5dc351c8, "aAdfbF:h?iIsVy", 0x4028a0, nil) = -1
[pid 2910] __errno_location() = 0x7f6b6baa16c0
[pid 2910] malloc(128) = 0x25cb010
[pid 2910] gethostname("server.example.com", 128) = 0
[pid 2910] memchr("server.example.com", '\0', 128) = 0x25cb022
[pid 2910] puts("server.example.com") = 19
[pid 2910] +++ exited (status 0) +++
[pid 2909] --- SIGCHLD (Child exited) ---
[pid 2909] <... Py_Main resumed> ) = 0
[pid 2909] +++ exited (status 0) +++
Attaching to a live process
ltrace doesn't have to be used when a process starts, you can also attach to
a live process using the -p option:
$ ltrace -p "$(pgrep --oldest httpd)"
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1) = 0x11176
apr_sleep(0xf4240, 0x7ffd9ee480bc, 3, 0) = 0
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1) = 0x11176
...
Note: use ctrl+c to stop tracing the process.
Using strace
strace is very similar to ltrace, except it looks at system
calls instead of library calls. Unlike library calls, system calls are made to
the Kernel. For example if you were tracing hostname, you would want to
filter on something like the uname system call instead of the
gethostname library call:
$ strace -e gethostname hostname > /dev/null
strace: invalid system call 'gethostname'
$ strace -e uname hostname > /dev/null
uname({sys="Linux", node="server.example.com", ...}) = 0
+++ exited with 0 +++
Almost all of the ltrace options can be used with strace. For example if
you wanted to trace the parent httpd process and any new children it spawns,
you could use the -f and -p options:
$ strace -f -p "$(pgrep --oldest httpd)"
Process 1764 attached
select(0, NULL, NULL, NULL, {0, 598303}) = 0 (Timeout)
socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 13
...