Network Working Group                                     A. Yourtchenko
Internet-Draft                                                   D. Wing
Intended status: Standards Track                                   cisco
Expires: February 26, 2011                               August 25, 2010


NAT confessions: revealing the hosts behind the translator
draft-yourtchenko-nat-reveal-hash-00

Abstract

When an IP address is shared among several subscribers, it is impossible to determine which subscriber initiated a given TCP connection. This memo describes a technique for sharing, with the TCP server, the identity of the subscriber that initiated a TCP connection. The proposed method avoids altering the application-level payload and works well with SSL-protected connections.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on February 26, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Introduction
2.  Notational Conventions
3.  Description
4.  Calculating the Internal Address Mapping
5.  Calculating the Verifier
6.  Encoding of the VFY into the packet: IP ID encoding
7.  Encoding of the VFY into the packet: TSval encoding
8.  Operation of the mechanism
    8.1.  Translator Operation
    8.2.  Server Operation
9.  Interaction with TCP SYN cookies
10.  Other Mechanisms to Encode Client Identifier
    10.1.  Defining a new TCP option to store the address
    10.2.  Using TSecr in TCP SYN
    10.3.  Reserving the different port ranges per client
11.  Security Considerations
12.  IANA considerations
13.  Acknowledgements
14.  References
    14.1.  Normative References
    14.2.  Informative References
Authors' Addresses





1.  Introduction

There are several scenarios where it is valuable to know the identity of a TCP client, including geolocation, DoS blocking, and spam blacklists. Today, this is done by equating an IPv4 address with an 'identity'. However, the identity of a TCP client is obscured when an IP address is shared [I-D.ietf-intarea-shared-addressing-issues]. IP address sharing is done both by network address and port translators (NAPT) and by application-layer proxies (e.g., HTTP or FTP proxies).

The current state of the art requires the address-sharing device to alter the application-level payload and include the identity of the internal host, usually the internal host's private IP address. This incurs several drawbacks:

With SSL-protected applications, the current state of the art requires breaking the end-to-end encrypted connection. This results in several undesirable consequences:

This specification avoids the problems described above and defines a method of communicating the TCP client's identity to the TCP server by overloading the TCP timestamp field and the IP Identification (IP ID) field of the initial TCP SYN.

This extension is necessary because IP address sharing, as deployed by NAT64 devices, will allow malicious users to connect to IPv4-capable servers from behind a shared address. Thus, until a server is reachable only via IPv6 (and not via IPv4), the IPv4-capable server will be unable to identify individual TCP clients, as discussed in [I-D.ietf-intarea-shared-addressing-issues].




2.  Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].




3.  Description

This proposal leverages two observations: TCP timestamps are commonly deployed, and a timestamp-aware TCP server echoes the client's timestamp value (TSval) back in the TSecr field.

The caveat is that the server must be able to tell whether a given timestamp carries an encoded internal address: a rewritten timestamp looks just like an ordinary one. Manual configuration could resolve this but is impractical, so an automatic detection mechanism is proposed. The mechanism calculates a hash over the values of interest and places the result into another field; the receiver performs the same calculation and compares. If the received and computed values match, then the received TCP timestamp does contain the encoded internal address. The verifier value is computed as a hash over the mapped value encoded into the timestamp, the address after translation, and the TCP initial sequence number (i.e., the sequence number of the SYN segment). Including the initial sequence number prevents the verifier from being nearly constant across connections, which is needed to satisfy the protocol constraints of the field used to convey it.

To find a place to store this verification value, we make another observation: TCP SYN segments are generally small, far below the 576-octet datagram size that every IPv4 host must be able to accept, and typical stacks send the TCP SYN with DF=1, so SYNs are not fragmented in transit. The 16-bit IP ID field can therefore carry the verifier value. Because the verifier depends on the initial sequence number (ISN), which should have the randomness properties described in [RFC1948], the IP ID will still vary enough to serve its original purpose even in the extremely unlikely case that the TCP SYN is fragmented.

Using a 16-bit verifier gives a 1 in 65536 (about 0.0015%) probability of erroneously concluding that the timestamp contains an encoded internal address. This may be insufficient assurance for some scenarios. Therefore, we calculate the verifier (referred to as the VFY value) as a 32-bit integer and store 16 or more bits of it, at the expense of storing fewer bits of the Internal Address Mapping (iAM). We expect the range of iAM values for a single public address to be relatively small, so no information is lost in this process.




4.  Calculating the Internal Address Mapping

The essential property of the iAM is that it MUST stay the same for a given internal address unless the configuration of the translator changes. Since the goal is to provide a stable mapping rather than to fully reveal the internal address, any method with this property is acceptable, and the choice is left to the implementors of the translator. If the addresses to be translated are configured as a prefix, the iAM can be obtained by taking the host bits of the address within the prefix. If the addresses are assigned individually, simple enumeration can be used. If the internal addresses are assigned to the pool as a set of subnets, a combination of the two methods (the host bits in the least significant part and the subnet enumeration in the most significant part) gives good results. This also encourages allocating internal addresses in equal-sized chunks, which should make network maintenance easier.
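The prefix and subnet-enumeration approaches above can be illustrated with a minimal Python sketch using the standard `ipaddress` module. The function names are hypothetical, and the subnet variant assumes that the pool consists of equal-sized subnets listed in a stable configuration order; this is only one of the many methods the text allows.

```python
import ipaddress


def iam_from_prefix(addr: str, prefix: str) -> int:
    """iAM = host bits of the internal address within the configured prefix."""
    net = ipaddress.ip_network(prefix)
    ip = ipaddress.ip_address(addr)
    if ip not in net:
        raise ValueError(f"{addr} is not inside {prefix}")
    return int(ip) - int(net.network_address)


def iam_from_subnets(addr: str, subnets: list[str]) -> int:
    """iAM = subnet index in the most significant part, host bits in the least.

    Hypothetical assumption: all subnets have the same length and their
    configured order is stable, so the mapping does not change over time.
    """
    if not subnets:
        raise ValueError("empty address pool")
    nets = [ipaddress.ip_network(s) for s in subnets]
    if any(n.prefixlen != nets[0].prefixlen for n in nets):
        raise ValueError("subnets must be equal-sized")
    host_bits = nets[0].max_prefixlen - nets[0].prefixlen
    ip = ipaddress.ip_address(addr)
    for index, net in enumerate(nets):
        if ip in net:
            return (index << host_bits) | (int(ip) - int(net.network_address))
    raise ValueError(f"{addr} is not in the translator pool")
```

For example, `iam_from_prefix("10.0.1.5", "10.0.0.0/22")` yields 261, and `iam_from_subnets("192.168.7.9", ["10.1.0.0/24", "192.168.7.0/24"])` yields 265 (subnet index 1 placed above 8 host bits, plus host 9).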

As a result, the calculation of the iAM for the outgoing SYN segment MUST return two values: the iAM itself, and siAM, the number of significant bits in the iAM, which ranges from 9 to 24.

The minimum siAM value of 9 was chosen based on the following logic:

By encoding only the significant bits of the internal address mapping, the operator of the translator minimizes the probability of error: all unused bits are allocated to the value used to "fingerprint" the presence of the internal identifier. The more bits this verifier contains, the lower the chance of an accidental match, that is, of erroneously recording an internal identifier when there is none.

The range from 9 to 24 bits allows encoding between 512 (2^9) and 16777216 (2^24) internal identifiers for a single public IP address.
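The trade-off between the number of identifiers and the strength of the verifier can be tabulated with a short Python sketch. It assumes, as Section 7 describes, that the 24 bits below the Epoch and selector fields are split between siAM bits of iAM and S = 24 - siAM bits of VFYhi, and that the compared bits of an unmodified SYN behave as uniformly random values.

```python
def layout(siam: int) -> tuple[int, int, int, float]:
    """Return (identifiers, S, verifier bits, false-match probability) for siAM."""
    if not 9 <= siam <= 24:
        raise ValueError("siAM must be between 9 and 24")
    s = 24 - siam              # VFYhi bits stored in the TSval
    vfy_bits = 16 + s          # plus the 16 bits carried in the IP ID
    return 2 ** siam, s, vfy_bits, 2.0 ** -vfy_bits


for siam in range(9, 25):
    ids, s, vfy_bits, p = layout(siam)
    print(f"siAM={siam:2}  identifiers={ids:>8}  S={s:2}  "
          f"verifier bits={vfy_bits}  false match~{p:.2e}")
```

With siAM = 24 the verifier is the 16-bit IP ID alone, matching the 1 in 65536 figure in Section 3; each bit removed from the iAM halves the false-match probability.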




5.  Calculating the Verifier

The verifier is calculated as the 32-bit result of a hash function. This hash function is not expected to be cryptographically strong (the Security Considerations section explains why); however, it should have good distribution, good collision resistance, and good avalanche behavior, and it should be fast and cheap to compute. The Murmur hash function [URL.Murmur-hash] satisfies these properties, so it is the hash we use.

The calculation of the VFY is performed as follows:

VFY = murmur(iAM | AddrPub | siAM, TCP-ISN)
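The formula can be sketched in Python as follows. The draft does not fix the Murmur variant or the byte layout, so this sketch makes hypothetical choices: it uses the 32-bit MurmurHash2, reads "|" as byte concatenation with the iAM as 4 bytes in network order, AddrPub as a 4-byte IPv4 address and siAM as a single byte, and passes the TCP ISN as the hash seed. A deployment would need the translator and the server to agree on exactly these choices.

```python
import ipaddress
import struct

MASK32 = 0xFFFFFFFF


def murmur2(data: bytes, seed: int) -> int:
    """32-bit MurmurHash2 with little-endian 4-byte block reads."""
    m, r = 0x5BD1E995, 24
    h = (seed ^ len(data)) & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = ((h * m) & MASK32) ^ k
    tail = data[4 * nblocks:]          # 0..3 leftover bytes
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32
    h ^= h >> 13                       # final avalanche
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def compute_vfy(iam: int, siam: int, addr_pub: str, isn: int) -> int:
    """VFY = murmur(iAM | AddrPub | siAM, TCP-ISN) under the assumed layout."""
    key = (struct.pack("!I", iam)                    # iAM, 4 bytes, network order
           + ipaddress.IPv4Address(addr_pub).packed  # public address after translation
           + struct.pack("!B", siam))                # siAM, 1 byte
    return murmur2(key, isn & MASK32)               # TCP ISN used as the seed
```

For example, `compute_vfy(261, 12, "192.0.2.1", 0x12345678)` yields a 32-bit value that depends on the ISN of the connection.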




6.  Encoding of the VFY into the packet: IP ID encoding

The low 16 bits of the VFY are encoded in network byte order into the IP ID of the packet after translation. The remaining 16 bits form the "VFYhi" value, which we attempt to fit into the TSval along with the other information.
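A minimal Python sketch of this split, assuming the 32-bit VFY has already been computed; the function name is hypothetical.

```python
import struct


def split_vfy(vfy: int) -> tuple[bytes, int]:
    """Return the IP ID bytes (low 16 bits of VFY, network order) and VFYhi."""
    vfy &= 0xFFFFFFFF
    ip_id = struct.pack("!H", vfy & 0xFFFF)  # written into the IPv4 Identification field
    vfy_hi = vfy >> 16                       # remaining 16 bits, destined for the TSval
    return ip_id, vfy_hi
```

For instance, a VFY of 0xDEADBEEF puts the bytes 0xBE 0xEF into the IP ID and leaves VFYhi = 0xDEAD.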




7.  Encoding of the VFY into the packet: TSval encoding

The TSval field of the TCP timestamp option encodes the iAM and VFYhi as follows:

 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E E E E|S S S S| iAM MSB ... iAM LSB  | VFYhi MSB .. VFYhi LSB |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The range of siAM gives 16 possible ways to store the iAM (with the same number of degrees of assurance for the detection). To distinguish between them, we introduce the 4-bit encoding selector (S) field, which determines how the low 24 bits are split between the iAM and the upper 16 bits of the VFY. Note that because the smallest siAM is 9, at most 15 bits of VFYhi fit into the TSval, so the most significant bit of the VFY is never stored.

The value of S is the number of zero-fill right shifts needed to "normalize" the iAM held in the low 24 bits; in other words, it is the number of VFYhi bits stored within the timestamp (S = 24 - siAM).

The best practices in [I-D.ietf-tcpm-tcp-timestamps] mention that, to reduce the TIME-WAIT state, the timestamp value should be monotonically increasing across connections with the same 5-tuple. To give translators an opportunity to achieve this property, we reserve the four most significant bits of the timestamp for the "Epoch" (E). This would require storing additional state per 5-tuple, and the implementation of such a mechanism is outside the scope of this document. Implementations that do not provide monotonically increasing timestamps MUST keep the Epoch bits of the original timestamp value intact.
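The TSval layout, the selector S and the Epoch rule above can be combined into the following Python sketch. It relies on an interpretation the text implies but does not state outright: the VFYhi bits placed in the TSval are its low S bits, which is why the most significant bit of the VFY can never be stored. Function names are hypothetical, and the Epoch is simply copied from the original TSval.

```python
def encode_tsval(orig_tsval: int, iam: int, siam: int, vfy_hi: int) -> int:
    """Build the translated TSval: E (4 bits) | S (4 bits) | iAM | low S bits of VFYhi."""
    if not 9 <= siam <= 24:
        raise ValueError("siAM must be between 9 and 24")
    if not 0 <= iam < (1 << siam):
        raise ValueError("iAM does not fit in siAM bits")
    s = 24 - siam                            # number of VFYhi bits stored
    epoch = (orig_tsval >> 28) & 0xF         # Epoch kept intact from the original TSval
    low24 = (iam << s) | (vfy_hi & ((1 << s) - 1))
    return (epoch << 28) | (s << 24) | low24


def decode_tsval(tsval: int) -> tuple[int, int, int, int]:
    """Return (epoch, siAM, iAM, stored VFYhi bits) from a received TSval."""
    epoch = (tsval >> 28) & 0xF
    s = (tsval >> 24) & 0xF
    low24 = tsval & 0xFFFFFF
    return epoch, 24 - s, low24 >> s, low24 & ((1 << s) - 1)
```

For instance, with an original TSval of 0xA0000000, iAM 261, siAM 12 and VFYhi 0xDEAD, `encode_tsval` returns 0xAC105EAD, and `decode_tsval` recovers the Epoch 0xA, siAM 12, iAM 261 and the stored VFYhi bits 0xEAD.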




8.  Operation of the mechanism

This section outlines the use of this mechanism by the translators and servers.




8.1.  Translator Operation

The translator is involved in processing the initial SYN segment (calculating the new TCP timestamp value and IP ID) as well as the SYN-ACK segment (restoring the original value of the TCP timestamp within the TSecr field).
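The two translator steps can be sketched as a small piece of per-flow state in Python. The class and method names are hypothetical; `flow` stands for whatever key the translator uses to identify the translated connection, and the encoded TSval is assumed to have been produced as described in Section 7. The IP ID needs no per-flow state here, because nothing echoes it back to the client.

```python
from dataclasses import dataclass, field


@dataclass
class TsvalState:
    """Hypothetical translator state: flow -> (client TSval, TSval sent to server)."""
    flows: dict = field(default_factory=dict)

    def rewrite_syn(self, flow: tuple, client_tsval: int, encoded_tsval: int) -> int:
        """Remember the client's TSval; return the TSval for the translated SYN."""
        self.flows[flow] = (client_tsval, encoded_tsval)
        return encoded_tsval

    def restore_synack(self, flow: tuple, tsecr: int) -> int:
        """Return the TSecr to forward to the client in the SYN-ACK."""
        entry = self.flows.get(flow)
        if entry is not None and tsecr == entry[1]:
            return entry[0]  # server echoed our encoded value: give back the original
        return tsecr         # unknown flow or unexpected echo: pass through unchanged
```

The check against the stored encoded value keeps the translator from altering a TSecr it did not produce.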




8.2.  Server Operation

The server operates on every SYN that is of interest for logging. It extracts the candidate iAM and calculates the VFY value based on the public address and TCP ISN within the received SYN segment. It then compares the VFY against the corresponding bits in the TSval and IP ID fields. A match means, with reasonable probability, that the iAM is valid and was calculated by a translator in between. This information is stored for later access by the application listening on that socket (e.g., in the TCB).
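A Python sketch of the server-side check follows. To be self-contained it includes a 32-bit MurmurHash2, and it assumes a hypothetical VFY input layout (iAM as 4 bytes in network order, the IPv4 source address seen by the server as AddrPub, siAM as one byte, and the TCP ISN as the seed), plus the reading that the low S bits of VFYhi are stored in the TSval. These choices are illustrative, not taken from the draft.

```python
import ipaddress
import struct
from typing import Optional

MASK32 = 0xFFFFFFFF


def murmur2(data: bytes, seed: int) -> int:
    """32-bit MurmurHash2 with little-endian 4-byte block reads."""
    m, r = 0x5BD1E995, 24
    h = (seed ^ len(data)) & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = ((h * m) & MASK32) ^ k
    tail = data[4 * nblocks:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def verify_syn(src_addr: str, isn: int, ip_id: int, tsval: int) -> Optional[int]:
    """Return the iAM carried by a SYN if its verifier checks out, else None."""
    s = (tsval >> 24) & 0xF            # selector: number of VFYhi bits in the TSval
    siam = 24 - s                      # significant bits of the iAM
    low24 = tsval & 0xFFFFFF
    iam = low24 >> s                   # candidate iAM, "normalized" by S shifts
    key = (struct.pack("!I", iam)
           + ipaddress.IPv4Address(src_addr).packed
           + struct.pack("!B", siam))
    vfy = murmur2(key, isn & MASK32)
    if (vfy & 0xFFFF) != ip_id:        # low 16 bits of the VFY travel in the IP ID
        return None
    mask = (1 << s) - 1
    if ((vfy >> 16) & mask) != (low24 & mask):  # low S bits of VFYhi in the TSval
        return None
    return iam
```

The caller would record a non-None result alongside the connection (for example, in the TCB) for the application to retrieve.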




9.  Interaction with TCP SYN cookies

TCP SYN cookies are commonly deployed to mitigate TCP SYN flooding attacks [RFC4987]. The mechanism described in this document requires the server to store extra information that arrives in the TCP SYN, which increases the TCP server's attack surface. To mitigate this, the translator should apply a similar algorithm to the timestamp of the ACK segment that the connection initiator sends in response to the server's SYN-ACK. The authors considered that the server side might instead use the TSval in its SYN-ACK segment; however, this would interfere with extended SYN cookies. This section needs further discussion.




10.  Other Mechanisms to Encode Client Identifier

This section outlines other mechanisms that we considered and the reasons we consider them not applicable.




10.1.  Defining a new TCP option to store the address

This would be the cleanest and simplest approach, and is discussed in [I-D.wing-reveal-address].




10.2.  Using TSecr in TCP SYN

This value is set to zero and is effectively unused, so it looks like a convenient place. However, this violates [RFC1323] and would require much more thorough testing, as well as an update to [RFC1323].




10.3.  Reserving the different port ranges per client

This approach is appealing due to its simplicity, but it would be specific to each NAPT device operated by each service provider. That is, there is no way to identify the device or to know the source port range assigned to a TCP client without contacting the administrator of the NAPT device. Restricting clients to a specific range also exposes them to some security risk [I-D.ietf-tsvwg-port-randomization].




11.  Security Considerations

Connections made today without a NAPT necessarily reveal the source address of the TCP client, so revealing the identity of the client should not be a concern, except for installations that attempt to use NAPT for "privacy" reasons. For such installations, any 1:1 remapping of, e.g., the IP ID would cause the validation algorithm to fail, thereby "protecting the identity".

Consequently, an organization that has more than one level of NAPT and wants to ensure that its internal translators do not disclose information about internal addresses can alter any of the elements used in the calculation, e.g., randomize the ISN or remap the IP ID.

An attacker might use this functionality to make it appear that IP address sharing is occurring, in the hope that a naive server will allow additional attack traffic. TCP servers and applications SHOULD NOT assume that the mere presence of the functionality described in this document indicates that other (benign) users are sharing the same IP address.

Modifying the TSval value will break TCP-AO [RFC5925], which provides integrity protection of the TCP SYN (including TCP options). However, TCP-AO is already known not to survive address sharing (through a NAPT or through an application proxy).




12.  IANA considerations

None.




13.  Acknowledgements

Thanks to Nicholas Leavy for the review.




14.  References




14.1. Normative References

[RFC1323] Jacobson, V., Braden, B., and D. Borman, “TCP Extensions for High Performance,” RFC 1323, May 1992.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC5925] Touch, J., Mankin, A., and R. Bonica, “The TCP Authentication Option,” RFC 5925, June 2010.



14.2. Informative References

[I-D.ietf-intarea-shared-addressing-issues] Ford, M., Boucadair, M., Durand, A., Levis, P., and P. Roberts, “Issues with IP Address Sharing,” draft-ietf-intarea-shared-addressing-issues-01 (work in progress), June 2010.
[I-D.ietf-tcpm-tcp-timestamps] Gont, F., “Reducing the TIME-WAIT state using TCP timestamps,” draft-ietf-tcpm-tcp-timestamps-00 (work in progress), June 2010.
[I-D.ietf-tsvwg-port-randomization] Larsen, M. and F. Gont, “Transport Protocol Port Randomization Recommendations,” draft-ietf-tsvwg-port-randomization-09 (work in progress), August 2010.
[RFC1948] Bellovin, S., “Defending Against Sequence Number Attacks,” RFC 1948, May 1996.
[RFC4987] Eddy, W., “TCP SYN Flooding Attacks and Common Mitigations,” RFC 4987, August 2007.
[URL.Murmur-hash] “Murmur hash.”



Authors' Addresses

  Andrew Yourtchenko
  cisco
  6a de Kleetlaan
  Diegem 1831
  BE
Phone:  +32 2 704 5494
Email:  ayourtch@cisco.com
  
  Dan Wing
  cisco
  170 West Tasman Drive
  San Jose CA 95134
  USA
Email:  dwing@cisco.com