Thursday, March 15, 2018

NAT444 (CGN/LSN) and What it Breaks

Before we look at what breaks, I should probably make sure that you know what it is that I’m talking about here. If you already know all about traditional NAT and address overloading, skip to the NAT444 section. If you are familiar with that as well, feel free to skip right to the list of what it breaks. In any case; enjoy! =)

NAT, PAT and Address Overloading

When people talk about Carrier Grade NAT (CGN) or Large Scale NAT (LSN) they are talking primarily about NAT444. The NAT part of those acronyms stands for Network Address Translation and NAT is already very common in IPv4 networks, particularly on LAN/WAN gateway devices. The basic idea is to use non-globally-unique (“private”) addresses on the LAN (Local Area Network) and only use globally-unique (“public”) addresses on the WAN (Wide Area Network) / Internet facing interfaces. When deployed in this way, NAT is usually combined with less-often-spoke-of PAT (Port Address Translation) to allow address overloading on the WAN side.
Whoa – we just went into geek speak, I know. It’s really not too scary though; overloading just refers to a many to one relationship of inside (local/private) addresses to outside (global/public) addresses. This is sometimes called NAPT (Network Address Port Translation) or NAT overloading and it allows you to place a very large number of hosts (all with ‘private’ addresses) behind a NAT device with a much smaller number of globally unique (public) addresses. This is done by using port numbers as unique identifiers. Let’s walk through a quick example to see how this works (as always, click pics for full size).
NAT Example
On the far left of this diagram we have two hosts in our local network (a laptop and a tablet), the Internet is on the right, and there’s a NAT overload (NAPT) router in the middle connecting them. You can see that both of our local network hosts have locally unique (“private”) inside addresses (192.168.1.23 and .42) and the router has a single globally unique (“public”) outside address (203.0.113.57).
When the laptop wants to connect to another host out on the Internet, it sends its traffic to the router (it’s default gateway). The router takes that traffic and swaps it’s outside (global/public) address in place of the laptop’s inside (local/private) address and sends the traffic on, to the Internet. Now all Internet devices see this traffic as if it came from the router itself, so they send return traffic directly to the router. When the router receives that return traffic, it swaps the laptop’s inside address back in place of it’s own outside address and sends the traffic on, to the laptop. Simple enough.
Things get more interesting when both the laptop and the tablet need to talk to the Internet at the same time though. Now the router needs to know what return traffic goes to the tablet and which traffic is supposed to go to the laptop. This is where those port numbers I mentioned earlier come into play. Each two-way connection between network hosts is called a session (or sometimes a flow) and to keep these sessions from getting mixed up, the router uses a unique port number to identify each one.NAT Overload Example
Here you can see this in action. In this diagram, both the laptop and the tablet are sending traffic to the Internet (initiating one session each). When the router does the address translation (the swap), it not only changes the address but also adds a unique port number when sending the traffic out to the Internet (:2001 and :2002 in the diagram). In this way, when the router receives return traffic that is addressed to 203.0.113.57:2001 it knows that this should be swapped for 192.168.1.42 (which is sent to the laptop) and that 203.0.113.57:2002 translates to 192.168.1.23 and goes to the tablet. Easy peasy. It’s kind of like having a post office box. All the mail comes to the same physical street address (the outside IP address) but each letter is addressed to a unique box number (the port number), so everybody gets the right mail.
Of course, this example is very simple. In real life most networks have more than two hosts and almost all hosts open up many more than one session. Since each IP address has about 65,000 ports available, that is the absolute maximum number of sessions that one address can handle. This is plenty for most small networks, and larger ones can typically add a few outside addresses to what is called a NAT pool, so this scales pretty well at the LAN level. There are problems with this approach however.

Problems with NAT

The primary issue with NAT (we are still talking about NAPT/NAT-overlaod but just saying NAT is easier, it’s what most folks mean when they say NAT anyway) is that it breaks the end to end principle of networking. WTF does that mean? Well, basically, the end to end principle states that network communication should happen between the devices at the very edges of the network and that all devices in the middle should just pass that traffic along. This is a primary design goal of the Internet and for good reason. Without delving into all the gory philosophical details, we can over-simplify again and say that, in practical terms, the end to end principle allows network hosts to talk to each other unobstructed.
Aha! Now we start to see the problem with NAT: It introduces an obstruction into the communication path. In order for NAT to work, the device doing the NATing has to mangle each and every packet of data being sent and received. It must dig into those packets’ headersand change the source or destination address (and port number). The primary effect of this is that network hosts behind an overloaded NAT must initiate all communication. That is; inbound communication attempts are impossible because there is no way for a host outside of the local network to know or understand the inside (local/private) address of that host.
If we look at our laptop in the examples above, it has an RFC 1918 address of 192.168.1.42. Because this is a “private” address from a shared pool, it can’t be advertised on the public Internet. This means that other hosts on the Internet have no way to know the laptops address (and even if they did, there would be no route to get to it). See, we need that outside address and port combination set up on the router before any traffic can get in, to the laptop, so that the router knows where the traffic is supposed to go. This is fine for web browsing and many other client side applications but it is a major problem for server and peer to peer applications (VoIP, gaming, webcams, VPNs, bittorrent, video streaming, chat, etc) where communication needs to be initiated from the outside in. Major meaning they don’t work.

UPnP and NAT Traversal

Because there are simply not enough IPv4 addresses to provide a globally unique (public) address to all of the devices already connected to the Internet, network operators have been forced to deploy NAT in the LAN. As such, tools have been developed to help combat the problems it causes.
One notable hack is enabled through UPnP (Universal Plug and Play) and is called Internet Gateway Device (IGD) Protocol. UPnP-IGD is a protocol that gives the hosts behind a NAT some control over the NAT device. Things like discovering the router’s outside IP address and setting up static port mappings. This allows many of the applications that would otherwise be broken by the NAT to work in spite of it. NAT Port Mapping Protocol (NAT-PMP) offers similar functionality.
There are several other NAT traversal techniques and protocols that have been developed to get around the problems that NAT causes. These various tools are together a looming testament to two things:
  1. Human ingenuity and our ability to overcome obstacles.
  2. The brokenness that NAT introduces into inter-networking.

Security

While it pains me to do so, I feel compelled to address this largest myth of NAT: Network Address Translation is NOT a security technique!
Some folks will tell you that because hosts outside of a NAT’d network can not initiate communication with hosts inside of the NAT’d network that NAT is providing security. This is a lot like saying that since a key snapped off in the door lock will make it harder to get into my car, that it provides added security. While it may be true that jamming a door lock will make it harder to get in the car, I would have a hard time recommending that as an anti-theft method. There is a difference between broken access and access security.
Stateful packet inspection is what you want for network access security, not NAT brokenness.

The NAT444 Model

So far we have discussed traditional NAT (mainly NAPT/NAT overload) which is starting to be called NAT44 because it translates one IPv4 address for another (4 to 4). Now let’s explore what I (still) call NAT444, which was also called Carrier Grade NAT (CGN) at one point and is currently called Large Scale NAT (LSN) by all the cool geeks. I like NAT444 because it explains what is really going to occur in most places when LSN is implemented; a triple NAT (IPv4 to IPv4 to IPv4). Sounds like a nightmare already, doesn’t it? We just doubled the NAT, which means doubling the interference with network traffic and further impeding the end to end principle. Let’s see what that looks like.
This diagram illustrates the NAT444 model as envisioned in the IETF NAT444 draft and as it will most likely be deployed (unless we can figure out a way around it altogether). In this diagram you should notice two things right off the bat:
  1. Large Scale NAT for IPv4 will be deployed dual-stacked with global (“public”) IPv6.
  2. Large Scale NAT adds a second layer of NAT and thus a second area of “private”/inside addressing.
As you can guess from point two, NAT444 exacerbates all the problems that traditional NAT44 introduced. What may not be obvious at first is that LSN/NAT444 aggravates those issues in new ways as well.
In addition to adding a second layer of NAT that creates major problems with law enforcement and abuse logging as well as geolocation and others (all because many distinct customers are behind one provider address), we have to deal with the fact that the second layer of NAT is not going to participate in UPnP, NAT-PMP or other LAN-based NAT traversal protocols. No ISP (Internet Service Provider) in their right mind is going to open up their own routers (or other network devices) to customer control – which is exactly what these protocols require. They are simply not secure and the risk of one customer being able to impact other customers’ service is too great. So where does that leave us?

What NAT444 Breaks

We are left with a number of applications (and application types) that currently break when Large Scale NAT is introduced. To avoid the doom and gloom feeling that is sure to follow a list of just the broken stuff, let’s start with a list of what isn’t broken by NAT444/LSN:
  • Web browsing
  • Email
  • FTP download
    • Small files
  • BitTorrent and Limewire
    • Leeching (download)
  • Skype video and voice calls
  • Instant messaging
  • Facebook and Twitter chat
Not too shabby really, all things considered. That is quite a bit of functionality for being behind a fairly large kludge. If that were the end of the story I wouldn’t have written this article though. So, without further adieu, here is the list you’ve been waiting for; what NAT444 breaks:
  • FTP download
    • Large files
  • BitTorrent and Limewire
    • Seeding (upload)
  • On-line gaming
    • Xbox
    • PlayStation
    • Etc.
  • Video streaming
    • Hulu
    • Netflix
    • Slingcatcher
    • Etc.
  • Webcam
    • Remote viewing
  • Tunneling
    • 6to4
    • Teredo
    • Etc.
  • VPN & Encryption
    • IPSec
    • SSL
  • VoIP
    • Limited ALG/SIP support
  • All custom applications with the IP embedded
    • Lack of ALGs
Wow, is it just me or is that list a bit longer? There’s that doom and gloom feeling creeping up.
For our purposes here, “breaks” means that the service was degraded or completely failed. The data behind this list primarily comes from Assessing the Impact of NAT444 on Network Applications, an IETF draft which documents testing that was done by CableLabs, Time Warner Cable, and Rogers Communication on “many popular Internet services using a variety of test scenarios, network topologies, and vendor equipment.” If this kind of thing interests you at all, I highly recommend checking out the full draft, it’s a quick and informative read. I also have a bit of experience dealing with NAT444 myself, but that’s a story for another day.

Port Control Protocol

I would be remiss if I didn’t at least mention Port Control Protocol (PCP) in this discussion. The basic goal of PCP is to create a new, more advanced, technique to control port forwarding on NAT devices so that the brokenness fixed with protocols like UPnP-IGD and NAT-PMP in the LAN can be solved in a NAT444 (or other LSN) environment. I have not, as of yet, dug very deep into the work being done but I do see some challenges:
  1. New protocols require new equipment (or at least new code).
  2. Will providers sign on to allow customer application control of their network devices?
  3. Time. Is there enough? We will hit RIR exhaustion in at least 3 out of 5 regions before the proposed standards are published.

The Solution

You knew I was going to say it eventually: The only true solution is to deploy IPv6. This is why the NAT444 model includes global IPv6; the only way to get around the brokenness introduced by NAT is to eliminate it. Luckily we have the means to do so; Internet Protocol version 6 (IPv6). So, dual-stack today, with NAT444 if you must, and then do everything you can to get everyone you do business with to do the same.