Tracing processes

03 March 2017

It can be tricky to know exactly what a process is doing, especially if you don't have access to the source code for the program. Thankfully, there are a few debugging tools that can provide some insight into what a process is up to.

Backtraces with pstack

It's often useful to see which function a process is currently executing. You can quickly get a stack trace from a running process using pstack:

$ pstack 1764
#0  0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
#2  0x00007f9a7c04ddc1 in ap_wait_or_timeout ()
#3  0x00007f9a71c6813e in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#4  0x00007f9a7c04d5ae in ap_run_mpm ()
#5  0x00007f9a7c046b46 in main ()
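
If you want to work with a backtrace from a script (for example, to take several samples and see which function keeps turning up), the frames are easy to pick apart. The following is only a sketch: the names FRAME_RE, Frame and parse_frames are made up for illustration, and the pattern assumes frame lines shaped like the ones above.

```python
import re
from typing import List, NamedTuple, Optional

# Frame lines look like:
#   #0  0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
# The trailing "from <library>" part is missing for some frames.
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+(?P<address>0x[0-9a-fA-F]+)\s+in\s+"
    r"(?P<function>\S+)\s+\(.*?\)(?:\s+from\s+(?P<library>\S+))?\s*$"
)


class Frame(NamedTuple):
    index: int
    function: str
    library: Optional[str]  # None when no "from ..." part was printed


def parse_frames(text: str) -> List[Frame]:
    """Return the frames found in pstack-style output, in printed order."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # anything that is not a frame line is skipped
            frames.append(
                Frame(int(match["index"]), match["function"], match["library"])
            )
    return frames


# Output copied from the pstack run above.
SAMPLE = """\
#0  0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
#2  0x00007f9a7c04ddc1 in ap_wait_or_timeout ()
#3  0x00007f9a71c6813e in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#4  0x00007f9a7c04d5ae in ap_run_mpm ()
#5  0x00007f9a7c046b46 in main ()
"""

for frame in parse_frames(SAMPLE):
    print(frame.index, frame.function, frame.library or "-")
```

Frame #0 is the function currently executing, so in this trace the process is sitting in a select call made on behalf of apr_sleep.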

Using ltrace

Unlike pstack, which shows a single point in time, ltrace shows the library calls a process makes over time. For example, if you run whoami under ltrace, you can see the call to geteuid, followed by the call to getpwuid to work out the username of the user running the command:

$ ltrace whoami > /dev/null
__libc_start_main(0x401510, 1, 0x7ffd52295f98, 0x4040e0 <unfinished ...>
strrchr("whoami", '/')                                              = nil
setlocale(LC_ALL, "")                                               = "en_GB.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")                    = "/usr/share/locale"
textdomain("coreutils")                                             = "coreutils"
__cxa_atexit(0x401940, 0, 0, 0x736c6974756572)                      = 0
getopt_long(1, 0x7ffd52295f98, "", nil, nil)                        = -1
__errno_location()                                                  = 0x7fabe410f6c0
geteuid()                                                           = 0
getpwuid(0, 0x40438c, 0x7fabe3ef9280, -1)                           = 0x7fabe3efc2a0
puts("root")                                                        = 5
exit(0 <unfinished ...>
__fpending(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0)                   = 0
fileno(0x7fabe3efa400)                                              = 1
__freading(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0)                   = 0
__freading(0x7fabe3efa400, 0, 2052, 0x7fabe3efaeb0)                 = 0
fflush(0x7fabe3efa400)                                              = 0
fclose(0x7fabe3efa400)                                              = 0
__fpending(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c)           = 0
fileno(0x7fabe3efa1c0)                                              = 2
__freading(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c)           = 0
__freading(0x7fabe3efa1c0, 0, 4, 0xfbad000c)                        = 0
fflush(0x7fabe3efa1c0)                                              = 0
fclose(0x7fabe3efa1c0)                                              = 0
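
When a trace gets long, it can help to reduce it to just the call names and return values. Here is a rough sketch of that, assuming the "name(arguments) = result" layout shown above; parse_calls and the two patterns are hypothetical names, not part of ltrace.

```python
import re
from typing import List, Optional, Tuple

# A completed call: name(arguments) = result
CALL_RE = re.compile(r"^(?P<name>[A-Za-z_]\w*)\(.*\)\s+=\s+(?P<result>\S.*)$")
# A call that had not returned yet when ltrace printed it.
UNFINISHED_RE = re.compile(r"^(?P<name>[A-Za-z_]\w*)\(.*<unfinished \.\.\.>$")


def parse_calls(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (function name, result) pairs; result is None if unfinished."""
    calls = []
    for line in text.splitlines():
        line = line.strip()
        match = CALL_RE.match(line)
        if match:
            calls.append((match["name"], match["result"]))
            continue
        match = UNFINISHED_RE.match(line)
        if match:
            calls.append((match["name"], None))
    return calls


# A few lines copied from the whoami trace above.
SAMPLE = """\
strrchr("whoami", '/')                                              = nil
geteuid()                                                           = 0
getpwuid(0, 0x40438c, 0x7fabe3ef9280, -1)                           = 0x7fabe3efc2a0
puts("root")                                                        = 5
exit(0 <unfinished ...>
"""

for name, result in parse_calls(SAMPLE):
    print(name, result)
```

The results are kept as strings because ltrace prints a mix of numbers, pointers and quoted text.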

Limiting output

ltrace can very quickly produce too much output. The -e option can be used to filter output based on an expression:

$ ltrace -e gethostname hostname
hostname->gethostname("server.example.com", 128)                    = 0
server.example.com
+++ exited (status 0) +++

You can also use the -c option to count library calls instead of displaying each call:

$ ltrace -c python -c '2+2'
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 50.24    0.019741       19741         1 __libc_start_main
 49.01    0.019257       19257         1 Py_Main
  0.76    0.000297         297         1 exit_group
------ ----------- ----------- --------- --------------------
100.00    0.039295                     3 total
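
The summary table is regular enough to load into a script if you want to sort or compare runs. This is a minimal sketch, assuming the five-column layout shown above; SummaryRow and parse_summary are illustrative names only.

```python
from typing import List, NamedTuple


class SummaryRow(NamedTuple):
    percent_time: float
    seconds: float
    usecs_per_call: int
    calls: int
    function: str


def parse_summary(text: str) -> List[SummaryRow]:
    """Return one row per function from 'ltrace -c' style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # the header and "total" lines have a different field count
        try:
            rows.append(
                SummaryRow(
                    float(fields[0]),
                    float(fields[1]),
                    int(fields[2]),
                    int(fields[3]),
                    fields[4],
                )
            )
        except ValueError:
            continue  # separator lines made of dashes
    return rows


# Output copied from the ltrace -c run above.
SAMPLE = """\
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 50.24    0.019741       19741         1 __libc_start_main
 49.01    0.019257       19257         1 Py_Main
  0.76    0.000297         297         1 exit_group
------ ----------- ----------- --------- --------------------
100.00    0.039295                     3 total
"""

# List the functions with the slowest first.
for row in sorted(parse_summary(SAMPLE), key=lambda r: r.seconds, reverse=True):
    print(row.function, row.calls, row.seconds)
```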

Following processes

By default, ltrace does not follow child processes or threads:

$ ltrace python -c 'import subprocess;subprocess.call("hostname")' > /dev/null 
__libc_start_main(0x4006f0, 3, 0x7ffe699f7828, 0x4007e0 <unfinished ...>
Py_Main(3, 0x7ffe699f7828, 0x7ffe699f7848, 0x4007e0 <no return ...>
--- SIGCHLD (Child exited) ---
<... Py_Main resumed> )                                                            = 0
+++ exited (status 0) +++

The -f option can be used to follow child threads and processes:

$ ltrace -f python -c 'import subprocess;subprocess.call("hostname")' > /dev/null 
[pid 2909] __libc_start_main(0x4006f0, 3, 0x7fff48454048, 0x4007e0 <unfinished ...>
[pid 2909] Py_Main(3, 0x7fff48454048, 0x7fff48454068, 0x4007e0 <no return ...>
[pid 2910] --- Called exec() ---
[pid 2910] __libc_start_main(0x401230, 1, 0x7ffd5dc351c8, 0x401ea0 <unfinished ...>
[pid 2910] rindex("hostname", '/')                                                 = nil
[pid 2910] strcmp("hostname", "domainname")                                        = 4
[pid 2910] strcmp("hostname", "ypdomainname")                                      = -17
[pid 2910] strcmp("hostname", "nisdomainname")                                     = -6
[pid 2910] getopt_long(1, 0x7ffd5dc351c8, "aAdfbF:h?iIsVy", 0x4028a0, nil)         = -1
[pid 2910] __errno_location()                                                      = 0x7f6b6baa16c0
[pid 2910] malloc(128)                                                             = 0x25cb010
[pid 2910] gethostname("server.example.com", 128)                                  = 0
[pid 2910] memchr("server.example.com", '\0', 128)                                 = 0x25cb022
[pid 2910] puts("server.example.com")                                              = 19
[pid 2910] +++ exited (status 0) +++
[pid 2909] --- SIGCHLD (Child exited) ---
[pid 2909] <... Py_Main resumed> )                                                 = 0
[pid 2909] +++ exited (status 0) +++
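
With -f the lines from different processes are interleaved, which gets hard to read once there are more than a couple of children. One way to untangle them is to group the lines by the "[pid N]" prefix. The sketch below assumes that prefix format; group_by_pid is a made-up helper name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Lines from "ltrace -f" start with a prefix such as "[pid 2910] ".
PID_RE = re.compile(r"^\[pid (?P<pid>\d+)\]\s+(?P<event>.*)$")


def group_by_pid(text: str) -> Dict[int, List[str]]:
    """Map each pid to its trace lines, keeping the original order."""
    events = defaultdict(list)
    for line in text.splitlines():
        match = PID_RE.match(line.strip())
        if match:
            events[int(match["pid"])].append(match["event"])
    return dict(events)


# A few lines copied from the ltrace -f run above.
SAMPLE = """\
[pid 2909] __libc_start_main(0x4006f0, 3, 0x7fff48454048, 0x4007e0 <unfinished ...>
[pid 2910] --- Called exec() ---
[pid 2910] gethostname("server.example.com", 128)                                  = 0
[pid 2910] puts("server.example.com")                                              = 19
[pid 2910] +++ exited (status 0) +++
[pid 2909] --- SIGCHLD (Child exited) ---
[pid 2909] +++ exited (status 0) +++
"""

for pid, events in group_by_pid(SAMPLE).items():
    print(pid)
    for event in events:
        print("   ", event)
```

In the sample, pid 2909 is the python process and pid 2910 is the child that execs hostname.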

Attaching to a live process

ltrace doesn't have to be used from the moment a process starts. You can also attach to a live process using the -p option:

$ ltrace -p "$(pgrep --oldest httpd)"
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1)         = 0x11176
apr_sleep(0xf4240, 0x7ffd9ee480bc, 3, 0)                                           = 0
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1)         = 0x11176
...

Note: press Ctrl+C to stop tracing the process.

Using strace

strace is very similar to ltrace, except that it traces system calls instead of library calls. Unlike library calls, which are ordinary function calls into shared libraries, system calls are requests made to the kernel. For example, if you were tracing hostname, you would filter on something like the uname system call instead of the gethostname library call:

$ strace -e gethostname hostname > /dev/null 
strace: invalid system call 'gethostname'

$ strace -e uname hostname > /dev/null 
uname({sys="Linux", node="server.example.com", ...}) = 0
+++ exited with 0 +++
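
The reason uname shows up here is that, on Linux with glibc, gethostname is a library function built on top of the uname system call (the gethostname(2) man page notes this). You can see the same relationship from Python. The sketch below assumes a Linux system, and the two function names are made up for illustration.

```python
import os
import socket


def hostname_from_library_call() -> str:
    # socket.gethostname() wraps the C library's gethostname(),
    # the call that ltrace reported for the hostname command.
    return socket.gethostname()


def hostname_from_system_call() -> str:
    # os.uname() calls uname(), a thin wrapper around the system call
    # that strace reported; the nodename field holds the host name.
    return os.uname().nodename


print(hostname_from_library_call())
print(hostname_from_system_call())
```

Both functions should print the same name, because the library call gets its answer from the system call.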

Many of the ltrace options, including -e, -c, -f and -p, work the same way with strace. For example, if you wanted to trace the parent httpd process and any new children it spawns, you could use the -f and -p options:

$ strace -f -p "$(pgrep --oldest httpd)"
Process 1764 attached
select(0, NULL, NULL, NULL, {0, 598303}) = 0 (Timeout)
socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 13
...
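
Because the -e, -c, -f and -p options mean the same thing to both tools, a script that drives them can share one code path. The following is a sketch of that idea rather than a complete wrapper: build_trace_command is a hypothetical helper, and requiring exactly one of pid or command is a simplification made here.

```python
from typing import List, Optional, Sequence


def build_trace_command(
    tool: str,
    *,
    pid: Optional[int] = None,
    command: Optional[Sequence[str]] = None,
    follow: bool = False,
    count: bool = False,
    expression: Optional[str] = None,
) -> List[str]:
    """Build an argv list for ltrace or strace from the shared options."""
    if tool not in ("ltrace", "strace"):
        raise ValueError("tool must be 'ltrace' or 'strace'")
    if (pid is None) == (command is None):
        # Simplification for this sketch: attach to a pid OR run a command.
        raise ValueError("give exactly one of pid or command")

    argv = [tool]
    if follow:
        argv.append("-f")  # follow child processes and threads
    if count:
        argv.append("-c")  # print a summary table instead of each call
    if expression:
        argv += ["-e", expression]  # filter the calls that are shown
    if pid is not None:
        argv += ["-p", str(pid)]  # attach to a live process
    else:
        argv += list(command)  # start a new process under the tracer
    return argv


print(build_trace_command("strace", pid=1764, follow=True))
print(build_trace_command("ltrace", expression="gethostname", command=["hostname"]))
```

The resulting list can be handed to subprocess.run. Both tools write their trace to stderr by default, so capture stderr rather than stdout if you want to parse it.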