Tracing processes
It can be tricky to know exactly what a process is doing, especially if you don't have access to the source code for the program. Thankfully, there are a few debugging tools that can provide some insight into what a process is up to.
Backtraces with pstack
It is often useful to see which function a process is currently executing. You can quickly get a stack trace from a running process using pstack:
$ pstack 1764
#0 0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
#1 0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
#2 0x00007f9a7c04ddc1 in ap_wait_or_timeout ()
#3 0x00007f9a71c6813e in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#4 0x00007f9a7c04d5ae in ap_run_mpm ()
#5 0x00007f9a7c046b46 in main ()
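If you want to compare several backtraces, it can help to reduce each one to a list of function names. The following is a minimal Python sketch that assumes the frame layout shown above; the parse_frames helper and FRAME_RE pattern are hypothetical names introduced here for illustration, and other pstack versions may format frames differently.

```python
import re

# Matches frame lines such as:
#   #1 0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
# The "from <library>" part is optional (assumed layout, taken from the sample).
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+(?P<address>0x[0-9a-fA-F]+)\s+in\s+(?P<function>\S+)"
    r"\s+\(.*?\)(?:\s+from\s+(?P<library>\S+))?\s*$"
)


def parse_frames(text):
    """Return one dict per stack frame found in pstack-style output."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frame = match.groupdict()
            frame["index"] = int(frame["index"])
            frames.append(frame)
    return frames


SAMPLE = """\
#0 0x00007f9a7a614b83 in __select_nocancel () from /lib64/libc.so.6
#1 0x00007f9a7ad2b585 in apr_sleep () from /lib64/libapr-1.so.0
#2 0x00007f9a7c04ddc1 in ap_wait_or_timeout ()
"""

frames = parse_frames(SAMPLE)
# Innermost frame first, followed by its callers.
print(" <- ".join(frame["function"] for frame in frames))
```

Frames without a "from" clause get a library value of None, which usually means the function lives in the main executable.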
Using ltrace
Unlike pstack, which shows a point in time, ltrace can be used to see library calls made by a process over time. For example, if you run whoami using ltrace, you can see the calls being made to geteuid, and then to getpwuid to work out the username of the user running the command:
$ ltrace whoami > /dev/null
__libc_start_main(0x401510, 1, 0x7ffd52295f98, 0x4040e0 <unfinished ...>
strrchr("whoami", '/') = nil
setlocale(LC_ALL, "") = "en_GB.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils") = "coreutils"
__cxa_atexit(0x401940, 0, 0, 0x736c6974756572) = 0
getopt_long(1, 0x7ffd52295f98, "", nil, nil) = -1
__errno_location() = 0x7fabe410f6c0
geteuid() = 0
getpwuid(0, 0x40438c, 0x7fabe3ef9280, -1) = 0x7fabe3efc2a0
puts("root") = 5
exit(0 <unfinished ...>
__fpending(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0) = 0
fileno(0x7fabe3efa400) = 1
__freading(0x7fabe3efa400, 0, 64, 0x7fabe3efaeb0) = 0
__freading(0x7fabe3efa400, 0, 2052, 0x7fabe3efaeb0) = 0
fflush(0x7fabe3efa400) = 0
fclose(0x7fabe3efa400) = 0
__fpending(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c) = 0
fileno(0x7fabe3efa1c0) = 2
__freading(0x7fabe3efa1c0, 0, 0x7fabe3efba00, 0xfbad000c) = 0
__freading(0x7fabe3efa1c0, 0, 4, 0xfbad000c) = 0
fflush(0x7fabe3efa1c0) = 0
fclose(0x7fabe3efa1c0) = 0
Limiting output
ltrace can very quickly produce too much output. The -e option can be used to filter output based on an expression:
$ ltrace -e gethostname hostname
hostname->gethostname("server.example.com", 128) = 0
server.example.com
+++ exited (status 0) +++
You can also use the -c option to count library calls instead of displaying each call:
$ ltrace -c python -c '2+2'
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
50.24 0.019741 19741 1 __libc_start_main
49.01 0.019257 19257 1 Py_Main
0.76 0.000297 297 1 exit_group
------ ----------- ----------- --------- --------------------
100.00 0.039295 3 total
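The summary table is easy to post-process if you want to sort or compare counts between runs. This is a rough Python sketch that assumes the five-column layout shown above; parse_call_summary is a hypothetical helper name, not part of ltrace.

```python
def parse_call_summary(text):
    """Parse the per-function rows of an `ltrace -c` style summary table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly five columns; the header has six fields
        # and the total row has four, so both are skipped here.
        if len(fields) != 5:
            continue
        try:
            percent = float(fields[0])
            seconds = float(fields[1])
            usecs_per_call = int(fields[2])
            calls = int(fields[3])
        except ValueError:
            continue  # separator row made of dashes
        rows.append({
            "percent": percent,
            "seconds": seconds,
            "usecs_per_call": usecs_per_call,
            "calls": calls,
            "function": fields[4],
        })
    return rows


SAMPLE = """\
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
50.24 0.019741 19741 1 __libc_start_main
49.01 0.019257 19257 1 Py_Main
0.76 0.000297 297 1 exit_group
------ ----------- ----------- --------- --------------------
100.00 0.039295 3 total
"""

rows = parse_call_summary(SAMPLE)
for row in sorted(rows, key=lambda r: r["calls"], reverse=True):
    print(row["function"], row["calls"], row["seconds"])
```

Skipping rows by column count and numeric conversion keeps the sketch tolerant of the header, separator and total lines without matching them explicitly.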
Following processes
By default ltrace does not follow child processes or threads:
$ ltrace python -c 'import subprocess;subprocess.call("hostname")' > /dev/null
__libc_start_main(0x4006f0, 3, 0x7ffe699f7828, 0x4007e0 <unfinished ...>
Py_Main(3, 0x7ffe699f7828, 0x7ffe699f7848, 0x4007e0 <no return ...>
--- SIGCHLD (Child exited) ---
<... Py_Main resumed> ) = 0
+++ exited (status 0) +++
The -f option can be used to follow child threads and processes:
$ ltrace -f python -c 'import subprocess;subprocess.call("hostname")' > /dev/null
[pid 2909] __libc_start_main(0x4006f0, 3, 0x7fff48454048, 0x4007e0 <unfinished ...>
[pid 2909] Py_Main(3, 0x7fff48454048, 0x7fff48454068, 0x4007e0 <no return ...>
[pid 2910] --- Called exec() ---
[pid 2910] __libc_start_main(0x401230, 1, 0x7ffd5dc351c8, 0x401ea0 <unfinished ...>
[pid 2910] rindex("hostname", '/') = nil
[pid 2910] strcmp("hostname", "domainname") = 4
[pid 2910] strcmp("hostname", "ypdomainname") = -17
[pid 2910] strcmp("hostname", "nisdomainname") = -6
[pid 2910] getopt_long(1, 0x7ffd5dc351c8, "aAdfbF:h?iIsVy", 0x4028a0, nil) = -1
[pid 2910] __errno_location() = 0x7f6b6baa16c0
[pid 2910] malloc(128) = 0x25cb010
[pid 2910] gethostname("server.example.com", 128) = 0
[pid 2910] memchr("server.example.com", '\0', 128) = 0x25cb022
[pid 2910] puts("server.example.com") = 19
[pid 2910] +++ exited (status 0) +++
[pid 2909] --- SIGCHLD (Child exited) ---
[pid 2909] <... Py_Main resumed> ) = 0
[pid 2909] +++ exited (status 0) +++
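With several processes interleaved, the output can be hard to read. As an illustration, the Python sketch below groups lines by the "[pid N]" prefix seen in the output above; group_by_pid and PID_RE are hypothetical names, and the sample is a shortened copy of that output.

```python
import re
from collections import defaultdict

# Lines from `ltrace -f` are assumed to start with "[pid N]" as shown above.
PID_RE = re.compile(r"^\[pid (\d+)\]\s+(.*)$")


def group_by_pid(text):
    """Map each pid to the list of trace lines it produced, in order."""
    events = defaultdict(list)
    for line in text.splitlines():
        match = PID_RE.match(line)
        if match:
            events[int(match.group(1))].append(match.group(2))
    return dict(events)


SAMPLE = """\
[pid 2909] Py_Main(3, 0x7fff48454048, 0x7fff48454068, 0x4007e0 <no return ...>
[pid 2910] --- Called exec() ---
[pid 2910] gethostname("server.example.com", 128) = 0
[pid 2910] +++ exited (status 0) +++
[pid 2909] --- SIGCHLD (Child exited) ---
[pid 2909] +++ exited (status 0) +++
"""

events = group_by_pid(SAMPLE)
for pid, lines in sorted(events.items()):
    print(pid, len(lines), "lines")
```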
Attaching to a live process
ltrace doesn't have to be used when a process starts. You can also attach to a live process using the -p option:
$ ltrace -p "$(pgrep --oldest httpd)"
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1) = 0x11176
apr_sleep(0xf4240, 0x7ffd9ee480bc, 3, 0) = 0
apr_proc_wait_all_procs(0x7ffd9ee48180, 0x7ffd9ee4817c, 0x7ffd9ee48178, 1) = 0x11176
...
Note: press Ctrl+C to stop tracing the process.
Using strace
strace is very similar to ltrace, except it looks at system calls instead of library calls. Unlike library calls, system calls are made to the kernel. For example, if you were tracing hostname, you would want to filter on something like the uname system call instead of the gethostname library call:
$ strace -e gethostname hostname > /dev/null
strace: invalid system call 'gethostname'
$ strace -e uname hostname > /dev/null
uname({sys="Linux", node="server.example.com", ...}) = 0
+++ exited with 0 +++
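The same split between a library call and the underlying system call can be seen from Python. This small sketch assumes a Unix-like system: socket.gethostname() wraps the C library's gethostname function, while os.uname() exposes the fields reported by uname. On the system traced above the two values would be expected to match, but that is an assumption about the platform and not something the code checks.

```python
import os
import socket

# Library-level view: wraps the C library's gethostname().
library_name = socket.gethostname()

# Kernel-level view: the nodename field reported by uname
# (os.uname() is only available on Unix-like systems).
kernel_name = os.uname().nodename

print("gethostname():", library_name)
print("uname nodename:", kernel_name)
```

Running this script under ltrace or strace is one way to see which layer each call shows up in.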
Many ltrace options, including -c, -e, -f and -p, can also be used with strace. For example, if you wanted to trace the parent httpd process and any new children it spawns, you could use the -f and -p options:
$ strace -f -p "$(pgrep --oldest httpd)"
Process 1764 attached
select(0, NULL, NULL, NULL, {0, 598303}) = 0 (Timeout)
socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 13
...