Today I want to talk about an issue I was having with UDP sockets in Go, and how I learned more about my program by making use of two tools that ship with Mac OS X called dtruss, and DTrace.
I do most of my programming on a Mac, and I have been working on a project written in Go which makes use of UDP sockets. When trying to add some unit tests to my code I decided to create one around reconnections. In my test I created two sockets on localhost with two different ports, read/wrote from them, closed them, and then repeated this several hundred times. Eventually I would receive a “bind: address already in use” error upon the creation a socket. I wondered, why was this happening if I was closing the sockets ?
Understanding UDP in Go
In Go we can use the
net.ListenPacket method to start listening for UDP packets on a given network address. This function returns a
According to the Go documentation for
Close method should either close the connection, or return an error:
If this were true then why was I getting these EADDRINUSE errors ? Time to investigate But first a brief recap on system calls.
Systems calls are the way your program interfaces with your operating system. In unix-based x86 systems specifically your compiled code will contain a software interrupt which is passed a value corresponding to a function in the kernel. In our case we were interested in two system calls;
bind assigns a file descriptor to a network address.
close deallocates a file descriptor.
How can we see when our program is making system calls like
dtruss is a tool that ships with Mac OS X. It was written by Brendan Gregg. It is similar to strace on linux systems. You can use it to see which system calls a program is making to the kernel. This means we can see if a program is attempting to close a socket when we expect it to, or if in fact the Go standard library is lying to us. After running dtruss you will see something like this:
So here we can see a bunch of interesting information.
- Which threads are making which syscalls
- How long they are taking
- The arguments they’re called with
- The return value
One interesting thing here is that on the last line at the very end we see
Err#48. This is our “bind: address already in use” error. In this particular trace we can see on the first line it looks like we close the
0x5 socket with no errors, and in the last line it says
0x5 is already in use when we try to
bind it. Confusingly the
0x5 just refers to the file descriptor, and not the actual network address. The network address is stored in the second argument
0xC821AA576C above. This is a pointer to the
sockaddr struct passed to the kernel, which when dereferenced would look something like this:
The part we’re interested in is the
So we need to dereference the pointer somehow. How is this possible ? Enter DTrace!
Dtrace is a powerful debugging tool originally built by Sun Microsystems for Solaris. It is also available on Mac OS X. The amazing thing about Dtrace is that it ships with its own scripting language called D. With these scripts you can write tracing programs which will execute upon entry/return of specified system calls. So using this we can actually write some tracing around
bind to dereference the
sockaddr struct, and ultimately find out which port is being bound.
This command is saying, “Execute my Go program, but whenever you see a
bind system call invocation, run this D script.” In the script itself I’m tracing the first two arguments. With the second argument I am tracing the result of a function called
copyinstr. In D, the
copyinstr function will dereference a pointer in user space and output the result as a string.
Running DTrace with these arguments gives me something like this:
At the end of the first line we see the
5 file descriptor being used like before. On the next lines we see the output of
copyinstr(arg1). This is the dereferenced pointer to the
sockaddr struct. It looks like mostly unused memory. If we look at 3rd and 4th bytes in the above output we see the port number of the socket which is
0x0bb9 (3001). Now I can see that file descriptor
0x5 is binding to port 3001. Whew! Using this tool I was able to see that in fact an existing file descriptor was still bound to the same port when the subsequent
bind call was made.
After digging through the Go standard library I eventually found my answer. Go documentation does mention that any blocked
WriteTo calls will be unblocked and return errors. What it fails to mention is that until that happens, Go will not close the socket. Meaning that even after calling
Close() on a
net.PacketConn there is no guarantee that the socket has actually been closed. This is due to a mutex around the underlying file descriptor in Go. The solution in my case was to use a semaphore in my code to ensure that all
WriteTo calls had returned before attempting to open a new socket on the same port.
The important thing about this exercise for me was being able to know for certain that I understood what my program was doing, and what my operating system was doing. This means I could be confident that the problem was sitting in between these layers in the stack.