Why Is My Process Exiting?

Last week during my DTrace class, I talked a bit about the proc DTrace provider. This provider allows one to watch fork(2), exec(2), and exit(2) events, as well as sending and receiving of signal.h(3HEAD) events and thread-related events. A question I am sometimes asked is: "I have a process that sometimes exits. I would like to know why it is exiting."

Processes exit for 2 reasons:

  • Either they call exit(2).
  • Or they get a signal.h(3HEAD).

The call to exit(2) is either made explicitly in the code, or is in the crt1.o (C runtime library) if the main() function returns. The entry point for most processes is the _start function. To see that, here is the entry point from the elf header for a given application.

# elfdump -e /bin/viELF Header  ei_magic:   { 0x7f, E, L, F }  ei_class:   ELFCLASS32          ei_data:       ELFDATA2LSB  ei_osabi:   ELFOSABI_SOLARIS    ei_abiversion: EAV_SUNW_CURRENT  e_machine:  EM_386              e_version:     EV_CURRENT  e_type:     ET_EXEC  e_flags:                     0  e_entry:             0x8078a34  e_ehsize:     52  e_shstrndx:  30  e_shoff:              0x18bc48  e_shentsize:  40  e_shnum:     31  e_phoff:                  0x34  e_phentsize:  32  e_phnum:      6## mdb /bin/vi> 8078a34::dis_start:                         pushl  $0x0_start+2:                       pushl  $0x0_start+4:                       movl   %esp,%ebp...

And here is the part of the _start that calls exit(2):

# mdb -p 18063  <- run mdb on pid 18063> _start::dis_start:                         pushq  $0x0_start+2:                       pushq  $0x0..._start+0x67:                    call   +0x17a   
_start+0x6c: pushq %rax_start+0x6d: pushq %rax_start+0x6e: movq %rax,%rdi_start+0x71: call -0xde _start+0x76: popq %rdi_start+0x77: popq %rdi_start+0x78: call -0xd5 _start+0x7d: hlt>

Let's try DTrace for the exit(2) system call.

# dtrace -n 'syscall::exit:entry{}'dtrace: invalid probe specifier syscall::exit:entry{}: probe description syscall::exit:entry does not match any probes

So, the actual system call is different. Let's see if we can find it.

# dtrace -l -n 'syscall::*exit*:entry{}'   ID   PROVIDER            MODULE                          FUNCTION NAME 9934    syscall                                               rexit entry10214    syscall                                            lwp_exit entry#

Looks like rexit is what we want. Let's try:

# dtrace -qn 'syscall::rexit:entry{printf("%s: pid %d exited with exit value %d\n", execname, pid, arg0);}'less: pid 18080 exited with exit value 0man: pid 18079 exited with exit value 0ksh93: pid 18082 exited with exit value 0man: pid 18081 exited with exit value 0...(^C)#

If the program is sent a signal, it typically does not call exit(2) directly. Instead, the operating system calls exit on behalf of the receiving process. This is done when the action for the signal is to terminate (unless the signal is being caught or ignored). We can use the proc provider to watch these events. Here's a script to watch for exits of a given process.

#!/usr/sbin/dtrace -qsproc:::exit/pid == $target && arg0 <= 3/{    printf("%d: exited due to %s\n", pid,      (arg0 == 1)? "call to exit system call":       ((arg0 == 2)? "receiving a signal": "receiving a signal and has a core dump"));}proc:::signal-send/args[1]->pr_pid == $target/{    printf("%d: sending signo %d to pid %d\n", pid, args[2], $target);}syscall::rexit:entry/pid == $target/{    printf("%d exited with exit value = %d\n", pid, arg0);    ustack();}

The script captures exit and signal events using the proc provider, and exit(2) system calls using the syscall provider.

Let's try it.

# ./exit.d -c "/bin/echo a"a18046 exited with exit value = 0              libc.so.1`0xfee70dd818046: exited due to call to exit system call#

Let's try with a simple program that should demonstrate all the possibilities. Here's the source for the program

/* exit in different ways */#include #include voidsub(void){    exit(101);}intmain(int argc, char *argv[]){    char c;    char *p = NULL;    c = getchar();    switch (c) {    case 's':        *p = 'a';  /* should SEGV */        break;    case 'p':        p = (char *)main;        *p = 'a';  /* also SEGV, protection error */        break;    case 'e':        sub();        break;    default:        pause();    }}

In one terminal window, we'll run the program:

# ./foo

In another window, we'll start the DTrace script from above.

# pgrep foo  <-- foo is the compiled C program18110## ./exit.d -p 18110

Back in the first window, typing control-c causes the following to appear in the window running the exit.d script.

# ./exit.d -p 181105468: sending signo 2 to pid 1811018110: exited due to receiving a signal## ptree 54682632  /usr/lib/ssh/sshd  5467  /usr/lib/ssh/sshd    5468  /usr/lib/ssh/sshd      5471  -bash#

So, typing a control-c causes sshd to send a SIGINT signal (signo = 2). Let's run it again, and this time we'll make it exit by sending a signal via kill(1).

# ./foo[1] 18122#

And in the second window, we'll run the script:

# ./exit.d -p 18122

Back in the first window:

# kill -TERM 18122[1]+  Stopped                 ./foo# fg./fooTerminated#

In the window running the DTrace script:

5471: sending signo 15 to pid 1812218122: exited due to receiving a signal## ptree 54712632  /usr/lib/ssh/sshd  5467  /usr/lib/ssh/sshd    5468  /usr/lib/ssh/sshd      5471  -bash#

The kill is a bash built-in. This bash is running in the terminal where the foo process was started.

Let's have the program get a SEGV signal.

# ./foo

Start the script again:

# pgrep foo18127# ./exit.d -p 18127

And in the foo window, typing an 's' gives:

sSegmentation Fault (core dumped)#

And we see this in the window running exit.d

18127: sending signo 11 to pid 1812718127: exited due to receiving a signal and has a core dump

So, the process "sent" the SEGV to itself.

Next week I'll blog about using DTrace to do debugging. Have fun!



Post written by Mr. Max Bruning