Network Working Group | D.F. Skoll |
Internet-Draft | Roaring Penguin Software Inc. |
Intended status: Informational | October 21, 2011 |
Expires: April 23, 2012 |
Reputation Reporting Protocol
draft-dskoll-reputation-reporting-03
This draft specifies a protocol for reporting various events associated with IP addresses. These events can be collected and aggregated to form a database containing information about the reputation of IP addresses.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 23, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
Several organizations (Project Honeypot, Spamhaus, RepuScore, etc.) maintain databases detailing the reputation of various IP address. These organizations use various ad hoc methods to collect the reputation data. There is no standard for reporting events to reputation-collectors; this makes it hard to instrument a large number of systems to be data gatherers in a standard way.
This draft proposes a standard for reporting events back to a reputation-collecting system. A standard way to report events will make it easy to instrument systems to collect data and report it to one or more reputation-collection systems.
This standard has the following goals:
This protocol is currently in use by Roaring Penguin Software Inc's CanIt anti-spam products. A web page devoted to the protocol is found at http://www.mimedefang.org/reputation; this page includes a link to a mailing list for discussions of the protocol.
An EVENT is the basic unit of reputation reporting. An event consists of an IPv4 or IPv6 address and an associated event TYPE. Examples of event types are "SMTP client at IP address issued an SMTP RCPT command for a nonexistent recipient" or "SMTP client at IP address delivered an email message considered by a human to be spam."
A SUBREPORT is usually a list of one or more events. However, a subreport may contain other types of information such as software name, version, etc.
A SENSOR is a system that reports events.
An AGGREGATOR is a system that receives events and builds a reputation database.
A REPORT is a series of subreports sent from a sensor to an aggregator.
A USER NAME is included in each report. The user name is a key used by the aggregator to look up a shared secret.
A SHARED SECRET is used by the sensor and aggregator to authenticate reports.
A REPUTATION DATABASE is a dictionary indexed by IPv4 or IPv6 address that contains reputation information about a particular IP address. Note that the exact format of the reputation database as well as what constitutes "reputation" are beyond the scope of this document. We are concerned only with a standard for reporting events.
Reports are transported via UDP. The aggregator SHOULD listen for UDP packets on port 6568, the IANA-assigned port number for this protocol [PORTS].
A report consists of the following items:
A subreport consists of either the single byte 0, indicating the end of the subreports, or a three-byte preamble followed by the subreport contents. The three-byte preamble consists of:
The following subreport formats are defined. The value of the FORMAT byte is given in decimal.
An aggregator MUST skip over subreports with format values it does not understand. (It can do this by skipping ahead LENGTH bytes.)
An aggregator MUST ignore the entire report if any subreports have invalid LENGTHs. For example, it MUST reject a subreport of IPv4-EVENTS if the LENGTH is not a multiple of 5. The aggregator SHOULD log information about invalid reports.
The following sections define exactly how various subreports are formatted.
FORMAT values 1 and 2 specify IPv4 and IPv6 subreports, respectively. Each IPv4 and IPv6 subreport contains one or more events. The length of each IPv4 event is 5, and the length of each IPv6 event is 17. The events themselves consist of:
The following event types are defined for IPv4 and IPv6 events:
FORMAT values 3 and 4 specify REPEATED-IPv4 and REPEATED-IPv6 subreports, respectively. Each REPEATED-IPv4 and REPEATED-IPv6 subreport contains one or more events. The length of each REPEATED-IPv4 event is 6, and the length of each REPEATED-IPv6 event is 18. The events themselves consist of:
The EVENT TYPE field for a REPEATED-IPv4 or REPEATED-IPv6 event is exactly the same as for IPv4 or IPv6 events. The extra REPEAT byte allows a sensor to compress up to 255 identical IP address events into a single REPEATED event.
FORMAT value 5 specifies the VENDOR-NUMBER subreport. This subreport MUST have a LENGTH of 3. The 3-byte content of the subreport is a 24-bit unsigned integer in network byte order specifying an IANA-assigned Private Enterprise Number [PEN]. The VENDOR-NUMBER subreport is only required if there are subsequent VENDOR-SPECIFIC subreports.
FORMAT value 6 specifies the SOFTWARE-NAME subreport. The LENGTH can range from 1 to 63. The contents MUST be a valid UTF-8 string specifying the name of the software running on the aggregator. The string MUST NOT be zero-terminated. An aggregator MAY include a SOFTWARE-NAME subreport in each report, but MUST NOT include more than one SOFTWARE-NAME subreport.
FORMAT value 7 specifies the SOFTWARE-VERSION subreport. The LENGTH can range from 1 to 31. The contents MUST be a valid UTF-8 string (and SHOULD be a valid ASCII string) specifying the version of the software running on the aggregator. The string MUST NOT be zero-terminated.
An aggregator MAY include a SOFTWARE-VERSION subreport in each report, but MUST NOT include more than one SOFTWARE-VERSION subreport. If an aggregator includes a SOFTWARE-VERSION subreport, it MUST also include a SOFTWARE-NAME subreport.
FORMAT value 8 specifies an END-USER subreport. This report, if present, provides additional information about the specific user who generated subsequent subreports. For example, if an ISP is reporting events, the END-USER subreport may contain enough information for the ISP to determine which of its subscribers reported the events. The contents of the END-USER subreport are a sequence of bytes ranging in length from 1 to 31 bytes. The contents are opaque and have meaning only to the organization running the sensor.
FORMAT values 128 through 254 specify a VENDOR-SPECIFIC subreport. A VENDOR-SPECIFIC subreport MUST be preceded by a VENDOR-NUMBER subreport. A given report MAY contain more than one VENDOR-NUMBER subreport; the interpretation of a VENDOR-SPECIFIC subreport MUST be made according to the VENDOR-NUMBER subreport that most closely preceded the VENDOR-SPECIFIC subreport.
The contents of the subreport are vendor-specific and not defined here. An aggregator that does not understand a VENDOR-SPECIFIC subreport for a given VENDOR-NUMBER MUST ignore the subreport.
Normally, a sensor detects events directly (for example, it communicates with an MTA to detect events) and reports them to an aggregator, which builds the event database. However, it may be desirable to have an aggregator take all of the events from its sensors and report them to a higher-level "upstream" aggregator.
Allowing aggregators to report events to other aggregators introduces the possibility of reporting loops. To break these loops, aggregators MUST include a COLLECTOR-LEVEL subreport. It MUST be the first subreport in the report.
A sensor that detects events directly SHOULD NOT include a COLLECTOR-LEVEL subreport. However, if it does include one, it MUST specify a level of zero.
An aggregator that will forward reports to another aggregator MUST be configured to have an "intrinsic level" greater than zero. The aggregator MUST reject reports with a COLLECTOR-LEVEL greater than or equal to its intrinsic level. When the aggregator forwards reports, it MUST include a COLLECTOR-LEVEL subreport containing its intrinsic level. (If the original report included a COLLECTOR-LEVEL subreport, the aggregator MUST NOT include the original COLLECTOR-LEVEL report, but MUST replace it with its own COLLECTOR-LEVEL.) Note that assignment and management of intrinsic levels is beyond the scope of this document; such assignment must be agreed upon by aggregator managers based on the hierarchy of sensors and aggregators.
FORMAT value 127 specifies the COLLECTOR-LEVEL subreport. The LENGTH MUST be 2, and the contents are an unsigned 16-bit integer in network byte order specifying the intrinsic level of the agent sending the report. If the COLLECTOR-LEVEL subreport is present, it MUST be the first subreport. If it is missing, the aggregator MUST assume a COLLECTOR-LEVEL of zero.
A sensor MUST NOT report GREYLISTED or UNGREYLISTED events unless it implements greylisting. A sensor MUST NOT report HAND-SPAM or HAND-HAM events unless it has a reliable system for accepting a human's decision about a message and can show beyond a reasonable doubt that a human in fact made the decision about the reported message. A sensor MUST NOT report VALID-RECIPIENT or INVALID-RECIPIENT events unless it is capable of validating recipient addresses.
A sensor MUST NOT report events for an IPv4 address that is not a globally-routable unicast address. In particular, no IPv4 address in the Private Address Space of [RFC1918] should appear in a report, nor should any address in 127/8 nor 224/4. An aggregator MUST ignore such addresses and SHOULD log the fact that they were received.
A sensor MUST NOT report events for an IPv6 address that is not an Aggregatable Global Unicast Address as defined in [RFC4291]. An IPv4-compatible IPv6 address or an IPv4-mapped IPv6 address MUST be reported as an IPv4 event and not an IPv6 event. An aggregator MUST ignore events that violate this requirement and SHOULD log the fact that they were received.
A sensor MUST NOT report a VIRUS event unless a virus was detected using a signature-based virus scanner.
A sensor SHOULD include as many events in its report as necessary to make the report size at least 400 bytes. A sensor MAY send out a shorter report, but MUST NOT do so unless failing to do so would result in loss of data OR unless it has not sent a report within the last hour. For example, if a sensor process is about to exit and has buffered events in memory, it SHOULD report the buffered events before exiting, even if the report size would be less than 400 bytes and even if a report has been sent within the last hour.
A sensor SHOULD NOT send a report that would exceed 492 bytes unless it has a priori knowledge that such a large UDP datagram will be received intact by the aggregator. An aggregator MUST be prepared to handle a report as large as the largest possible UDP datagram (65507 bytes of actual data.)
A sensor MUST NOT send an empty report (that is, a report with no subreports.)
When an aggregator logs information about a report, it MUST log the originating IP address of the report. If the report is well-formed, it MUST log the user name associated with the report. It SHOULD log additional information concerning the disposition of the report and the reason for the disposition.
If Hierarchical Aggregation is being used, a sensor MUST NOT report events to more than one aggregator. A lower-level aggregator MUST NOT forward events to more than one higher-level aggregator. These restrictions are required to avoid the possibility of counting the same event more than once.
1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VERSION (=2) | USERNAME LEN | USERNAME ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | USERNAME continued | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | EIGHT RANDOM BYTES | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| | EIGHT RANDOM BYTES continued | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | TIMESTAMP | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | FORMAT 1 | LENGTH 1 | DATA 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DATA 1 CONTINUED ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | FORMAT 2 | LENGTH 2 | DATA 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | DATA 2 CONTINUED ... | EOR (=0) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | HMAC DIGEST | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | HMAC DIGEST continued | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | HMAC DIGEST continued | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The following diagram illustrates the layout of a report. The version byte starts immediately after the UDP header.
Note that there are no alignment requirements or padding bytes. The preceding figure shows some fields aligned to a four-byte boundary purely for ease of drawing.
02 -- Version 2 of the protocol 03 -- Length of user name is 3 bytes 64 66 73 -- "dfs" in UTF-8 2a 9a 82 d6 51 29 64 f7 -- 8 random bytes 4b d9 da eb -- 32-bit timestamp 01 00 0a -- Two IPv4 events (Fmt 1, Len 10) c0 00 02 02 03 -- 192.0.2.2 sent auto-spam (event type 3) c0 00 02 03 01 -- 192.0.2.3 was greylisted (event type 1) 03 00 06 -- One Repeated-IPv4 event (Fmt 3, Len 6) c0 00 02 04 08 03 -- 192.0.2.4 sent to invalid recip 3 times. 02 00 11 -- One IPv6 event (Fmt 2, Len 17 decimal) 20 01 0d b8 00 1d 00 e4 -- IPv6 address (first 8 bytes) 02 e0 18 ff fe ab 14 7f -- IPv6 address (last 8 bytes) 07 -- Event type 7 (Valid Recipient) 00 -- Subreport Format 0 (End of Reports) 0c 10 51 0f 5d -- 10-byte truncated SHA1 HMAC 7e a1 e0 aa 20 -- 10-byte truncated SHA1 HMAC cont'd
The following figure shows a sample report. The report is from the user "dfs" with password "foo". It contains two IPv4 events: 192.0.2.2 sent mail automatically classified as spam, and 192.0.2.3 was greylisted. There is a repeated IPv4 event: 192.0.2.4 sent to an invalid recipient 3 times. Finally, There is one IPv6 event: 2001:db8:1d:e4:2e0:18ff:feab:147f sent mail to a valid recipient. The bytes are shown in hexadecimal.
This document defines a one-byte SUBREPORT FORMAT. Formats 0 through 8 and 127 are defined in Section 4.2 of this document. The IANA should set up the "IP Reputation Reporting Format Numbers" registry for registering formats 9 through 126. Formats 128 through 254 are available for private use without requiring IANA registration. Format 255 should not be used.
This document defines a one-bye EVENT TYPE. Event types 1 through 9 are described in Section 5.1.1 of this document. Event type 0 is reserved. The IANA should set up the "IP Reputation Reporting Event Types" registry for registering event types 10 through 255.
Because reports are transmitted via UDP, aggregators are vulnerable to IP address spoofing. An aggregator MAY use techniques other than HMAC verification to reject reports. For example, if it knows that a particular user always sends reports from a restricted set of IP addresses, it MAY discard reports claiming to be from that user if they originate from outside the restricted set of addresses. The aggregator SHOULD log information about discarded reports.
Reports are transmitted in the clear. An eavesdropper can easily sniff user names from the reports. The eavesdropper can also gain information about SMTP traffic as seen by sensors. If this is undesirable, the sensor SHOULD arrange to transmit data to the aggregator over a secure channel using IPSec or some other VPN solution.
The shared secret SHOULD be at least 8 bytes long and SHOULD be generated by a cryptographically-secure random number generator. The shared secret SHOULD NOT be chosen by a human being because of the risk of picking a weak secret or a secret guessable from the user name.
An aggregator SHOULD use the time stamp and random-number fields to detect duplicate reports and fend off replay attacks. An aggregator SHOULD NOT accept a report whose timestamp is more than two minutes away from the current time. (For this reason, both aggregators and sensors SHOULD synchronize their clocks to a standard time source using NTP [RFC1305].)
Aggregators implicitly trust sensors. Therefore, aggregator administrators SHOULD ensure that sensor administrators follow security best-practices. Both aggregator and sensor administrators MUST take the utmost care to keep their shared secret from leaking out.
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC1305] | Mills, D., "Network Time Protocol (Version 3) Specification, Implementation", RFC 1305, March 1992. |
[RFC1918] | Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G. and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996. |
[RFC4291] | Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 4291, February 2006. |
[RFC2104] | Krawczyk, H., Bellare, M. and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997. |
[PEN] | IANA, , "Private Enterprise Numbers ", 2010. |
[PORTS] | IANA, , "Port Numbers ", 2010. |
[GREY] | Harris, E., "The Next Step in the Spam Control War: Greylisting ", August 2003. |