pstree/pstack/strace

介绍

今天介绍的几个命令主要是追踪进程执行状态用的,在平时工作中经常遇到进程卡住的情况,导致后续任务失败,如果当前现场得以保留,我们可以用这三个命令来完成对问题根因的查找以及处理。

pstree

pstree命令主要以树状图的形式展现进程间的调用关系,通过ps获取到进程的进程id后传入pstree进行进程查看,官方介绍如下:

pstree shows running processes as a tree. The tree is rooted at either pid or init if pid is omitted.
If a user name is specified, all process trees rooted at processes owned by that user are shown.

以下为各选项的说明文档:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
pstree: invalid option -- '-'
Usage: pstree [ -a ] [ -c ] [ -h | -H PID ] [ -l ] [ -n ] [ -p ] [ -u ]
[ -A | -G | -U ] [ PID | USER ]
pstree -V
Display a tree of processes.

-a show command line arguments
-A use ASCII line drawing characters
-c don't compact identical subtrees
-h highlight current process and its ancestors
-H PID highlight this process and its ancestors
-G use VT100 line drawing characters
-l don't truncate long lines
-n sort output by PID
-p show PIDs; implies -c
-u show uid transitions
-U use UTF-8 (Unicode) line drawing characters
-V display version information
-Z show SELinux security contexts
PID start at this PID; default is 1 (init)
USER show only trees rooted at processes of this user

pstack

pstack跟pstree不太一样,pstree主要看进程间的调用关系,pstack主要是查看进程函数堆栈,核心实现是使用了gdb以及threadapply all bt 命令

一般用法为直接加pid,如下

pstack

命令结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
pstack 4551
Thread 7 (Thread 1084229984 (LWP 4552)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0 0x000000302b80baa5 in __nanosleep_nocancel ()
#1 0x000000000079e758 in comcm::ms_sleep ()
#2 0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3 0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
....

strace

strace常用来跟踪进程执行时的系统调用和所接收的信号。 在Linux世界,进程不能直接访问硬件设备,当进程需要访问硬件设备(比如读取磁盘文件,接收网络数据等等)时,必须由用户态模式切换至内核态模式,通 过系统调用访问硬件设备。strace可以跟踪到一个进程产生的系统调用,包括参数,返回值,执行消耗的时间

如下为strace的参数定义

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
strace: invalid option -- '-'
usage: strace [-dDffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]
[-p pid] ... [-s strsize] [-u username] [-E var=val] ...
[command [arg ...]]
or: strace -c [-D] [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...
[command [arg ...]]
-c -- count time, calls, and errors for each syscall and report summary
-f -- follow forks, -ff -- with output into separate files
-F -- attempt to follow vforks, -h -- print help message
-i -- print instruction pointer at time of syscall
-q -- suppress messages about attaching, detaching, etc.
-r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs
-T -- print time spent in each syscall, -V -- print version
-v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args
-x -- print non-ascii strings in hex, -xx -- print all strings in hex
-a column -- alignment COLUMN for printing syscall results (default 40)
-e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
options: trace, abbrev, verbose, raw, signal, read, or write
-o file -- send trace output to FILE instead of stderr
-O overhead -- set overhead for tracing syscalls to OVERHEAD usecs
-p pid -- trace process with process id PID, may be repeated
-D -- run tracer process as a detached grandchild, not as parent
-s strsize -- limit length of print strings to STRSIZE chars (default 32)
-S sortby -- sort syscall counts by: time, calls, name, nothing (default time)
-u username -- run command as username handling setuid and/or setgid
-E var=val -- put var=val in the environment for command
-E var -- remove var from the environment for command

已经有很多文章对这个进行了详细的介绍,这里就不重复造轮子了,具体了解可以看如下链接:

https://man.linuxde.net/strace

总结

今天了解到的命令都是进程追踪的命令,当我们遇到进程出现问题需要追踪调试的时候可以使用。

-------------本文结束,感谢您的阅读-------------