Debugging using system calls in Mac OS X

My name is Bryce. I build software. This website is a collection of my personal projects and writings.

github

prettymuchbryce

twitter

@prettymuchbryce

brycedneal@gmail.com

pgp

5DE7777595F5B778

subscribe via Atom

Debugging using system calls in Mac OS X

Published on 2016-07-30 — Permalink

Today I want to talk about an issue I was having with UDP sockets in Go, and how I learned more about my program by making use of two tools that ship with Mac OS X called dtruss, and DTrace.

I do most of my programming on a Mac, and I have been working on a project written in Go which makes use of UDP sockets. When trying to add some unit tests to my code I decided to create one around reconnections. In my test I created two sockets on localhost with two different ports, read/wrote from them, closed them, and then repeated this several hundred times. Eventually I would receive a "bind: address already in use" error upon the creation a socket. I wondered, why was this happening if I was closing the sockets ?

Understanding UDP in Go

In Go we can use the net.ListenPacket method to start listening for UDP packets on a given network address. This function returns a net.PacketConn interface.

According to the Go documentation for PacketConn, the Close method should either close the connection, or return an error:

type PacketConn interface {
...
    // Close closes the connection.
    // Any blocked ReadFrom or WriteTo operations will be unblocked and return errors.
    Close() error
...
}

If this were true then why was I getting these EADDRINUSE errors ? Time to investigate 🔎 But first a brief recap on system calls.

System calls

Systems calls are the way your program interfaces with your operating system. In unix-based x86 systems specifically your compiled code will contain a software interrupt which is passed a value corresponding to a function in the kernel. In our case we were interested in two system calls; bind and close. bind assigns a file descriptor to a network address. close deallocates a file descriptor.

How can we see when our program is making system calls like bind and close ?

dtruss

dtruss is a tool that ships with Mac OS X. It was written by Brendan Gregg. It is similar to strace on linux systems. You can use it to see which system calls a program is making to the kernel. This means we can see if a program is attempting to close a socket when we expect it to, or if in fact the Go standard library is lying to us. After running dtruss you will see something like this:

$ sudo dtruss -a -t bind go test -run TestReconnect
PID/THRD         RELATIVE  ELAPSD    CPU SYSCALL(args)               = return

82830/0x978a1c:   1357371      99     95 close(0x5)      = 0 0
82830/0x978a20:   1378137      20     16 socket(0x2, 0x2, 0x0)       = 5 0
82830/0x978a20:   1378139       4      0 fcntl(0x5, 0x2, 0x1)        = 0 0
82830/0x978a20:   1378142       2      0 fcntl(0x5, 0x3, 0x0)        = 2 0
82830/0x978a20:   1378144       4      1 fcntl(0x5, 0x4, 0x6)        = 0 0
82830/0x978a20:   1378148       6      2 setsockopt(0x5, 0xFFFF, 0x20)       = 0 0
82830/0x978a20:   1378161       8      4 bind(0x5, 0xC821AA576C, 0x10)       = -1 Err#48

So here we can see a bunch of interesting information.

Which threads are making which syscalls
How long they are taking
The arguments they're called with
The return value

One interesting thing here is that on the last line at the very end we see Err#48. This is our "bind: address already in use" error. In this particular trace we can see on the first line it looks like we close the 0x5 socket with no errors, and in the last line it says 0x5 is already in use when we try to bind it. Confusingly the 0x5 just refers to the file descriptor, and not the actual network address. The network address is stored in the second argument 0xC821AA576C above. This is a pointer to the sockaddr struct passed to the kernel, which when dereferenced would look something like this:

struct sockaddr {
   sa_family_t sa_family;
   char        sa_data[14]; // 14 bytes of protocol address
}

The part we're interested in is the sa_data field.

So we need to dereference the pointer somehow. How is this possible ? Enter DTrace!

DTrace

Dtrace is a powerful debugging tool originally built by Sun Microsystems for Solaris. It is also available on Mac OS X. The amazing thing about Dtrace is that it ships with its own scripting language called D. With these scripts you can write tracing programs which will execute upon entry/return of specified system calls. So using this we can actually write some tracing around bind to dereference the sockaddr struct, and ultimately find out which port is being bound.

dtrace -c "go test -run TestReconnect" -n 'bind:entry { trace(arg0); trace(copyinstr(arg1)); }' -f bind

This command is saying, "Execute my Go program, but whenever you see a bind system call invocation, run this D script." In the script itself I'm tracing the first two arguments. With the second argument I am tracing the result of a function called copyinstr. In D, the copyinstr function will dereference a pointer in user space and output the result as a string.

Running DTrace with these arguments gives me something like this:

 0    358                       bind:entry                 5
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 10 02 0b b9 7f 00 00 00 00 00 00 00 00 00 00 00  ................
        10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        ...

At the end of the first line we see the 5 file descriptor being used like before. On the next lines we see the output of copyinstr(arg1). This is the dereferenced pointer to the sockaddr struct. It looks like mostly unused memory. If we look at 3rd and 4th bytes in the above output we see the port number of the socket which is 0x0bb9 (3001). Now I can see that file descriptor 0x5 is binding to port 3001. Whew! 😅 Using this tool I was able to see that in fact an existing file descriptor was still bound to the same port when the subsequent bind call was made.

Conclusion

After digging through the Go standard library I eventually found my answer. Go documentation does mention that any blocked ReadFrom or WriteTo calls will be unblocked and return errors. What it fails to mention is that until that happens, Go will not close the socket. Meaning that even after calling Close() on a net.PacketConn there is no guarantee that the socket has actually been closed. This is due to a mutex around the underlying file descriptor in Go. The solution in my case was to use a semaphore in my code to ensure that all ReadFrom and WriteTo calls had returned before attempting to open a new socket on the same port.

The important thing about this exercise for me was being able to know for certain that I understood what my program was doing, and what my operating system was doing. This means I could be confident that the problem was sitting in between these layers in the stack.

Happy debugging! 😄