
IPv6: Why Bother?

Date: Apr 16, 2010 Article is provided courtesy of Pearson.
IPv6 has been coming Real Soon Now for well over a decade. David Chisnall looks at the benefits it brings and how to support it.

A little while after I got my first modem, I was told that the Internet protocol was going to be replaced by a new and much better one soon. The person explaining this didn't really know why it was better, but was certain it was going to be great and everyone would be using it soon. Almost two decades later, we're mostly still using IPv4, the same protocol that I used with my first modem.

The new protocol, now called IPv6, is finalized and supported by most modern operating systems, but is still not widely used. In this article, I'll look at some of the benefits it provides and how to support it in your own applications.

The Point of IPv6

The most advertised feature of IPv6 is the larger address space. If you've read anything about IPv6, then you probably know that it increases the address size from 32 bits to 128. This is more than enough for every person ever born to have a private network bigger than the current Internet. Even if everything you own (including things that don't contain any electronics) had its own IPv6 address, then you would still not be using more than a tiny fraction of the address space.

This is quite important because it can make routing easier. Routers typically connect a relatively small number of networks together. The simplest case is your home router, which connects your local network to the Internet. For every packet that it receives, it must do one of three things: drop it, forward it to the internal network, or forward it to the external network.

For a typical home network, this is quite an easy decision: If the destination address is in one of the reserved private ranges, send it inside; otherwise send it out. Big commercial routers have to make much more complex decisions. Since the mid '90s, when IPv4 addresses started to be seen as a scarce resource, they have been allocated in 8-bit ranges (blocks of 256 addresses). This means that you may find three adjacent blocks on completely different networks. With this allocation scheme, there are 2^24 possible networks (a little less than 17 million), and a router needs to be able to decide along which connection a packet destined for any of them should be sent. Fortunately, many of these share a route with their neighbors, so their entries can be combined, but it's still difficult to make routing decisions.
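The home-router test described above can be sketched in a few lines of C. This is only an illustration, assuming that "reserved private ranges" means the three RFC 1918 blocks; the function name is_private_ipv4() is hypothetical, and a real router would also handle dropping packets, which is omitted here.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the dotted-quad address lies in one of
 * the RFC 1918 private ranges, 0 if it does not, and -1 if the text is not
 * a valid IPv4 address. */
int is_private_ipv4(const char *text)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1) { return -1; }

    /* Convert to host byte order so the masks below read naturally. */
    uint32_t ip = ntohl(addr.s_addr);

    if ((ip & 0xFF000000u) == 0x0A000000u) { return 1; } /* 10.0.0.0/8     */
    if ((ip & 0xFFF00000u) == 0xAC100000u) { return 1; } /* 172.16.0.0/12  */
    if ((ip & 0xFFFF0000u) == 0xC0A80000u) { return 1; } /* 192.168.0.0/16 */
    return 0;
}
```

A packet whose destination makes this return 1 would go to the internal network; anything else would be forwarded to the external one.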

With IPv6, there are enough addresses now that every country or major network can be assigned a large range. It can then assign subranges within that to networks that it connects to, and so on. This hierarchical assignment (in theory, at least) simplifies routing decisions.

One of the major complaints about IPv6 comes from people who think NAT is security and confuse “routable” with “accessible.” With IPv4, most home users (and almost all mobile users) use network address translation (NAT). Your computer has a private IP address, and the router has a public one. Every connected port on your private IP is mapped to a port on the public IP address. This does not provide any security. Most NAT implementations also default to denying connections originating outside, while some will forward these to a designated default host.

The policy of denying externally-originating connections provides security, but that is provided by the firewall part of the router and is not intrinsic to NAT. Most non-NAT firewalls will do the same.

Just because your computer has an externally routable IPv6 address does not mean that it's accessible. The firewall device that you plug in to your Internet connection still defines the policy of who can connect. Given the number of hacks used to penetrate NATs to make things like Voice over IP work, it's surprising that anyone still thinks it adds security, but apparently some do.

Secure by Default

A few of the changes in IPv6 are relatively simple, just making things that are optional parts of IPv4 compulsory. One of the most interesting is IPsec. This is currently used mainly for VPNs, establishing an encrypted connection between two routers and preventing any intermediate packets from being intercepted.

With IPv6, you can guarantee that any endpoint will support IPsec, which means that you can always establish an encrypted connection. With IPv4, most of the time, you will use SSL for encryption. This operates slightly higher up the protocol stack and requires every application that uses it to be specifically configured to do so.

Currently, IPsec is most commonly used with encryption keys that are shared out of band. One alternative way of using it is to embed a public key that can be used to negotiate an IPsec connection in the DNS records. For this to work, the DNS record itself needs signing with DNSSEC. This is due to be supported on the major root domains over the next few months.

If you do your DNS queries over IPsec, then you can also get the request and response encrypted so no one can tell which sites you are looking up (although your ISP can obviously tell which IP addresses you connect to). Unlike SSL, IPsec works for every kind of connection, including UDP, so things like Voice over IP can benefit significantly from IPv6. The endpoints can connect directly without having to navigate NATs, and the entire connection can be encrypted.

The other addition is multicast. In IPv4, one address on each subnet is reserved as the broadcast address. Packets sent here are delivered to every machine on the subnet. This made a lot of sense when most networks were buses. All packets were broadcast anyway; this just told every receiver to look at them. Sending to the broadcast address was more efficient than sending two copies of a packet if two people wanted to look at it.

With switched networks, this is not the case. Each computer can send and receive a certain amount, and the switch will route these packets between the machines, so four computers on a 100Mbit network can be having two independent conversations at 100Mb/s. If you send a packet to the broadcast address, it's sent to everyone, even people who don't want it.

Multicast is a bit more clever. It defines groups of computers and assigns them a shared IP address. Packets sent to this address are routed to every computer that has opted into the group. Unlike broadcast, multicast packets are routable. You can have lots of computers on different networks in a multicast group and only generate two packets when the source packet reaches a router that has members on two downstream networks. If, for example, you had an Internet radio station using IPv6 multicast, then the station would send one stream of packets to your ISP. Your ISP would send a copy of the stream to each of its customers who were listening. When the packets hit your router, it would send a copy to each machine you have that is listening to the stream.
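Opting into a group, as described above, is done per socket. The following is a minimal sketch using the standard IPV6_JOIN_GROUP socket option; the helper names make_group_request() and join_group() are hypothetical, and an interface index of 0 is assumed to mean "let the system choose".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills in a membership request for the given group.
 * Returns 0 on success, or -1 if the text is not an IPv6 multicast address. */
int make_group_request(const char *group, unsigned int ifindex,
                       struct ipv6_mreq *req)
{
    memset(req, 0, sizeof(*req));
    if (inet_pton(AF_INET6, group, &req->ipv6mr_multiaddr) != 1) { return -1; }
    /* Multicast addresses all start with the byte 0xff. */
    if (!IN6_IS_ADDR_MULTICAST(&req->ipv6mr_multiaddr)) { return -1; }
    req->ipv6mr_interface = ifindex;
    return 0;
}

/* Hypothetical helper: opts an existing IPv6 UDP socket into the group.
 * Returns 0 on success, -1 on failure. */
int join_group(int sock, const char *group, unsigned int ifindex)
{
    struct ipv6_mreq req;
    if (make_group_request(group, ifindex, &req) < 0) { return -1; }
    return setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP,
                      &req, sizeof(req));
}
```

After a successful join_group() call on a socket bound to the right port, packets sent to the group address should be delivered to that socket along with those of every other member.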

You can use the same mechanism for things such as conference calls. Currently, if you have ten people in a video conferencing session, then either each one needs to stream ten copies of his or her camera's output, or you need a server somewhere that can handle the relaying. With multicast, everyone would send one copy to ten people. With consumer network connections, which typically have a lot more downstream bandwidth than upstream, this is particularly attractive.

Broadcast is also used when you don't know which computer you should be using, for example for service discovery. IPv6 replaces this use with anycast. Anycast is somewhat like multicast in that a group of addresses is associated with the anycast address, but unlike multicast, a packet sent to that address is only delivered to one machine. This is useful for things such as autoconfiguration.

Sockets and Protocols

Hopefully now you can see that supporting IPv6 is useful. The question then becomes how? Most networking code either uses Berkeley sockets or a high-level API built on top of them. If you're using a high-level API, then you probably already have IPv6 support provided by someone else. If you're using sockets, then you need to make some small changes to your code.

The Berkeley socket API was based on the UNIX idea that everything is a file. You communicate with remote machines using file descriptors with some special features. Once you have a socket, the API is completely protocol-agnostic. Whether you're using IPv4, IPv6, AppleTalk, or some other protocol, the code is the same.

Unfortunately, creating sockets is where things start to go wrong. You create a socket with this function:

     int socket(int domain, int type, int protocol);

The three parameters all define some aspect of the protocol. Most of the time, you end up hard-coding the domain as PF_INET, meaning IPv4. Things then go even more wrong when you bind the socket to a local address and then either connect or accept connections. All of the relevant functions take a pointer to a sockaddr structure as an argument.

This structure is quite simple, but you never actually use it directly. Instead, you use something like a sockaddr_in structure, which starts the same way as a sockaddr structure (with a size and address family) but then contains an IPv4 address and port number. You usually use the gethostbyname() function to look these up.
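Because every protocol-specific structure begins with the same address family field, code can inspect a generic sockaddr pointer and cast it to the right type. A small sketch of that idea follows; the function name port_of() is hypothetical and only the two IP families are handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the port stored in a generic socket address,
 * in host byte order, or -1 if the address family is not IPv4 or IPv6. */
int port_of(const struct sockaddr *sa)
{
    switch (sa->sa_family)
    {
        case AF_INET:
            /* IPv4 addresses are really struct sockaddr_in. */
            return ntohs(((const struct sockaddr_in *)sa)->sin_port);
        case AF_INET6:
            /* IPv6 addresses are really struct sockaddr_in6. */
            return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
        default:
            return -1;
    }
}
```

Every extra case in a switch like this is a separate code path to maintain, which is the problem the protocol-agnostic lookup functions are meant to remove.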

If you want to support another protocol, then you need to add a different code path with a different lookup mechanism. One of the improvements in the 2004 version of POSIX was the definition of the getaddrinfo() function. This looks up a server identified by a string description of an address and a service, which means that you don't have to hardcode port numbers either. Using it is quite simple. This code will connect to the InformIT HTTP server using any available connection:

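    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Returns a connected socket, or -1 on failure.
       (The function name is only illustrative.) */
    int connect_to_informit(void)
    {
        struct addrinfo hints, *results;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;     /* any address family */
        hints.ai_socktype = SOCK_STREAM; /* stream sockets only */
        int error = getaddrinfo("informit.com", "http", &hints, &results);
        if (error)  { /* fail */ return -1; }
        int s = -1;
        /* Walk the result list until one address connects */
        for (struct addrinfo *res = results;
            res != NULL && s < 0 ;
            res = res->ai_next)
        {
            s = socket(res->ai_family, res->ai_socktype,
                res->ai_protocol);
            //If the socket failed, try the next address
            if (s < 0)  { continue ; }
            //If the connection failed, try the next address
            if (connect(s, res->ai_addr, res->ai_addrlen) < 0)
            {
                close(s);
                s = -1;
            }
        }
        freeaddrinfo(results);
        return s;
    }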

The first two arguments to getaddrinfo() are the name of the server and the name of the service. The resolver will look up the address of the server and service combination. On some platforms, such as OS X, this even includes resolving DNS SRV records, so this will set the port correctly even if the server is running on a non-standard port, as long as the DNS entry advertises that fact. The hints structure is used to specify some constraints on the returned address info. In this case, we only want stream sockets, and we're willing to accept them in absolutely any protocol the underlying network stack supports. The final argument is a pointer that is used to return a linked list of address info structures, which the loop walks using the ai_next field.

Note that all of the arguments to socket() and connect() come from the address info returned by the getaddrinfo() call. None of this code is at all aware of whether you are using IPv4, IPv6, or some other new protocol. As long as your operating system supports a protocol, this code can use it.
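The same approach works for address literals. The sketch below, which assumes the standard AI_NUMERICHOST and NI_NUMERICHOST flags and uses a hypothetical function name, canonical_address(), parses an address string and prints it back without ever naming a specific protocol and without performing a DNS lookup.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parses a numeric address literal, writes its
 * canonical text form to out, and returns the address family that the
 * resolver chose. Returns -1 if the text is not a numeric address. */
int canonical_address(const char *literal, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept any address family */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */
    hints.ai_flags = AI_NUMERICHOST;  /* parse only, never query DNS */

    if (getaddrinfo(literal, NULL, &hints, &res) != 0) { return -1; }

    int family = res->ai_family;
    /* Convert the binary address back to text, again without DNS. */
    int error = getnameinfo(res->ai_addr, res->ai_addrlen,
                            out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return error ? -1 : family;
}
```

The caller learns the family from the return value only if it cares to; nothing in the function itself branches on IPv4 versus IPv6.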

IPv6: The Protocol of the Present

Hopefully in this article you've seen that IPv6 provides some compelling features that make it worth supporting and, importantly, that supporting it is not much effort. There is no excuse for writing new socket code that doesn't support IPv6, and updating old code to support it is quite easy too.

I had to look up the getaddrinfo() function when I wrote this article. A while ago, I wrapped it up in a class that just takes the host and protocol names as arguments and returns a connected socket, and now I just use that class in all of my networking code.

With IPv4 addresses becoming increasingly scarce, it looks like more consumer ISPs are going to start implementing NAT for all of their customers, at which point the only ones that will be able to make end-to-end connections to each other will be the ones using apps that support IPv6 out of the box. If your code doesn't yet, now would be a good time to get hacking.