Internet-Draft | Network-Based Website Fingerprinting | September 2020 |
Goldberg, et al. | Expires March 12, 2021 | [Page] |
The IETF is well on its way to protecting connection metadata with protocols such as DNS-over-TLS and DNS-over-HTTPS, and work-in-progress towards encrypting the TLS SNI. However, more work is needed to protect traffic metadata, especially in the context of web traffic. In this document, we survey Website Fingerprinting attacks, which are a class of attacks that use machine learning techniques to attack web privacy, and highlight metadata leaks used by said attacks. We also survey proposed mitigations for such leakage and discuss their applicability to IETF protocols such as TLS, QUIC, and HTTP. We endeavor to show that Website Fingerprinting attacks are a serious problem that affect all Internet users, and we pose open problems and directions for future research in this area.¶
Source for this draft and an issue tracker can be found at https://github.com/chris-wood/ietf-fingerprinting.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on March 12, 2021.¶
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Internet protocols such as TLS 1.3 [RFC8446] and QUIC [I-D.ietf-quic-transport] bring substantial improvements to end-users. The IETF engineered these with security and privacy in mind by encrypting more protocol messages using modern cryptographic primitives and algorithms, and engineering against flaws found in previous protocols, yielding several desirable security properties, including: forward-secure session key secrecy, downgrade protection, key compromise impersonation resistance, and protection of endpoint identities. Combined, these two protocols are set to protect a significant amount of Internet data. However, significant metadata leaks still exist for users of these protocols. Examples include plaintext TLS SNI and application-specific extensions (ALPN), as well as DNS queries. This information can be used by a passive attacker to learn information about the contents of an otherwise encrypted network connection. Recently, such information has also been studied as a means of building unique user profiles [li2018can]. It has also been used to build flow classifiers that aid network management [foremski2014dns].¶
In the context of Tor, a popular low-latency anonymity network, a common class of attacks that use metadata for such inference is called Website Fingerprinting (WF). These attacks use machine learning techniques built with features extracted from metadata such as traffic patterns to attack web (browsing) privacy. Miller et al. [miller2014know] show how these attacks can be applied to web browsing traffic protected with HTTPS to reveal private information about users. Pironti et al. [pironti2012identifying] use similar attacks based on data sizes to identify individual social media clients using encrypted connections. Fingerprinting attacks using encrypted traffic analysis are also applicable to encrypted media streams, such as Netflix videos. (See work from Reed et al. [reed2017identifying] and Schuster et al. [schuster2017beauty] for examples of these attacks.) WF attacks have also been applied to other IETF protocols such as encrypted DNS, including dnscrypt, DNS-over-TLS, and DNS-over-HTTPS [siby2018dns][shulman2014pretty]. In the past, they have also been conducted remotely [gong2010fingerprinting], using buffer-based side channels in a victim's home router.¶
Protocols such as DNS-over-TLS and DNS-over-HTTPS [RFC8484], and work-in-progress towards encrypting the TLS SNI extension [I-D.ietf-tls-esni], help minimize metadata sent in cleartext on the wire. However, regardless of protocol and even network-layer fingerprinting mitigations, application layer specifics, e.g., web page sizes and client request patterns, reveal a noticeable amount of information to attackers. We argue that much more work is needed to protect encrypted connection metadata, especially in the context of web traffic.¶
In this document, we describe WF attacks in the context of IETF protocols such as TLS and QUIC. We survey WF attacks and highlight metadata features and classification techniques used to conduct said attacks. We also describe proposed mitigations for these attacks and discuss their applicability to IETF protocols. We conclude with a discussion of open problems and directions for future research and advocate for more work in this area.¶
In this section we review how most secure Internet connections are made today. We omit custom configurations such as those using VPNs and proxies since they do not represent the common case for most Internet users. The following steps briefly describe the sequence of events that normally occur when a web client, e.g., browser, curl, etc., connects to a website and obtains some resource. First an unencrypted DNS query is sent to an untrusted DNS recursive resolver to resolve a name to an IP address. Upon receipt, clients then open a TCP and TLS connection to the destination address. During this stage, metadata such as the TLS SNI and ALPN values are sent in cleartext. The SNI is used to denote the destination application or endpoint to which clients want to connect. Servers use this for several purposes, including selecting an appropriate certificate (one with the SNI name in the SubjectAlternativeName list) or routing to a different backend terminator. ALPN values are used to negotiate which application-layer protocol will be used on top of the TLS connection. Common values include "http/1.1", "h2", and (soon) "h3". Upon connection, clients then send HTTP messages to obtain the desired resource.¶
Connections look different (on the wire) with TLS 1.3, encrypted DNS via DNS-over-TLS or DNS-over-HTTPS, and encrypted SNI. DNS queries are encrypted to a (trusted) recursive resolver and TLS metadata such as SNI are encrypted in transit to the terminator. Despite the reduction in cleartext metadata sent over the wire, there still remains several sources of information that an adversary may use for malicious purposes, including: size and timing of DNS queries and responses, size and timing or application traffic, and connection attempts induced while loading a web resource, e.g., Javascript files. So while technologies such as Encrypted SNI, DoT, and DoH help protect some metadata, they are not complete solutions to the larger problem. In the following section, we discuss this overarching problem in detail.¶
Website Fingerprinting (WF) is a class of attacks that exploit metadata leakage to attack end-user privacy on the Internet. In the WF threat model, Adv is assumed to be a passive and local attacker. Local means that Adv can associate traffic with a given client. Examples include proxies to which clients directly connect. Passive means that Adv can only view traffic in transit. It cannot add, drop, or otherwise modify packets between the victim client and server(s). Use of reliable and encrypted transport protocols such as TLS limit on-path attackers to eavesdropping on encrypted packets. (In QUIC, however, reordering packets is possible.)¶
Traffic features used for classification include properties such as packet size, timing, direction, interarrival times, and burstiness, among many others [wang2016website]. Normally, features are restricted to those which are extractable as a passive eavesdropper, and not those which are viewable by modifying client or server behavior. Specifically, this means that attacks such as CRIME [CRIME] and TIME [TIME], which rely on an attacker abusing TLS-layer compression to leak contents of an encrypted connection, are out of scope.¶
Website Fingerprinting attacks have evolved over the years through three phases: (1) Closed-world WF on SSL/TLS, (2) Closed-world WF on Tor, and (3) Open-world WF on Tor.¶
Bissias et al. [bissias2005privacy] use cross correlation of inter-packet times in one second time windows as an WF attack. Danezis [danezis2009traffic] model websites using a Hidden Markov Model (HMM) and use it, along with TLS traffic traces revealing only approximate lengths, to identify requested resources on a page. Their results vary the amount of information available to an adversary when building the HMM. Even in cases where resource popularity is omitted, which reflects the case where an adversary scrapes static websites, resource recall was high (86\%). Liberatore and Levine [liberatore2006inferring] proposed two WF attacks using the Jaccard coefficient and the Naive Bayes classifier. Herrmann et al. [herrmann2009website] extended the work of Liberatore and Levine with a multinomial Naive Bayes classifier computed using three input frequency transformations. Results yielded higher accuracy than that of Liberatore and Levine. Herrmann's attack is the best in this category, but the authors assume packets which do not fill a MTU represent packet trailers. Therefore, uniqueness is only accurate modulo the MTU. Efficacy is limited if endpoints pad packets to the MTU or another fixed length. Modern protocols such as HTTP/2, QUIC, and TLS 1.3 all provide some form of application-controlled padding. (Note: These attacks are not successful on Tor.)¶
Wang et al. [wang2013improved] tuned the OSAD-based attack to improve its accuracy. Specific changes include use of Tor cells instead of TCP packets for packet and burst lengths, as well as heuristics to remove SENDME cells (those not carrying application data) from flows to recover true burst lengths. The authors also modified the distance computation by removing substitutions, increasing the weight for egress packets, and varying the transposition cost across the packet sequence (large weights at the beginning of a trace, and smaller weights near the end, where variations are expected across repeated page loads.) Wang et al. also developed an alternate classifier with lower accuracy yet superior performance (quadratic to linear time complexity). It works by minimizing the sum of two costs: sequence transpositions and sequence deletions or insertions. These two costs are computed separately, in contrast to the first approach which computes them simultaneously.¶
Hayes et al. [hayes2016k] developed an attack called k-fingerprinting, which uses a k-NN classifier with features ranked by random decision forests. Their feature set includes timing information, e.g., statistics on packets per second, among the higher ranked features. (Higher ranked features have more weight in the classification phase.) Yan et al. [yan2018feature] used similar (manually curated) features with a CNN-based classifier. Time-based features were among the more effective features identified. Rahman et al. [rahman2019tik] improved time-based features by focusing on bursts, e.g., burst length, variance, inter-burst delay, etc., rather than more granular per-packet statistics. (The latter tend to vary for inconsistencies across packet traces for websites.) This improved accuracy of existing Deep Learning attacks from Sirinam et al. [sirinam2018deep], especially when coupled with packet direction information.¶
Kota et al. [abe2016fingerprinting] were the first to use Deep Learning (DL) methods based on Stacked Denoising Autoencoders for WF attacks. (Autoencoders reduce feature input dimensions when stacked.) Kota et al. form input vectors from Tor cell directions (+1 or -1). They use no other features. Using a (small) data set from Wang [wang2016website], the classifier achieves a 86% true positive rate and 2% false positive rate in the open world model. Rimmer et al. [rimmer2018automated] applied DL for automated feature generation and classifier construction. Trained with 2,500 traces per website, their system achieves 96.3% accuracy in the open world model. Recently, Bhat et al. [bhat2018var], Oh et al. [oh2017pfp], and Sirinam et al. [sirinam2018deep] used Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) for WF attacks. Results from Sirinam et al. show the best results - 98% on Tor without recent defenses (in Section {{defenses}) - while performing favorably when select defenses are used for both open and closed world models.¶
Yan et al. [yan2018feature] studied manual high-information feature extraction from packet traces. They "exhaustively" examined different levels of features, including packet, burst, TCP, port, and IP address, summing to 35,683 in total, and distilled them into a diverse set of uncorrelated features for eight different communication scenarios. Rahman [rahman2018using] studied the utility of features derived from packet interarrival times, including: median interarrival time (per burst), burst packet arrival time variance, cross-burst interarrival median differences, and others. Using a CNN, results show that these features yield a non-negligible increase in WF attack accuracy.¶
For all WF attacks, one limitation worth highlighting is the base rate fallacy. This can be summarized as follows: highly accurate classifiers with a reliable false positive rate (FPR) decrease in efficacy as the world size increases. Juarez et al. [juarez2014critical] studied its impact by measuring the Bayesian detection rate (BDR) in comparison to the FPR as a function of world size. As the world size increases, the BDR approaches 0 while the FPR remains stable, meaning that the probability of incorrect classifier results increase as well. Juarez et al. partially address the base rate fallacy problem by adding a confirmation step to their classifier. Another problem is that web content is (increasingly) dynamic. Most WF attacks, especially those in closed world models, assume that traces are static. However, Juarez et al. [juarez2014critical] show this is not the case even for "simple" pages such as google.com. Thus, due to the base fallacy rate and dynamic nature of content, classifiers require continual retraining in order to ensure accuracy.¶
There are at least two types of WF defenses: traffic shaping or morphing algorithms, and traffic splitting algorithms. This section describes and illustrates examples of both.¶
WF defenses are deterministic or randomized algorithms that take as input application data or packet sequences and return modified application data or packet sequences. Viable defenses seek to minimize the transformation cost and maximum (theoretical and perfect) attacker accuracy. Naive defenses such as sending a constant stream of (possibly random) bytes between client and server may be effective though clearly not viable from a cost perspective. Relevant cost metrics include bandwidth overhead, added time or latency (and its impact on related metrics such as page load time), and even CPU cost, though the latter is often ignored in favor of the former two. Wang [wang2016website] describe defenses as either limited or general. A limited defense is one which only helps mitigate specific WF attacks by transforming packets in a way to obviate a particular (set of) feature(s) used by said attacks. In contrast, general defenses help mitigate a variety of attacks.¶
Several general defenses have been proposed, including BuFLO [dyer2012peek], which pads packets to a fixed length of 1500B (the normal MTU) and schedules packets for transmission at fixed period intervals (and sends fake data if nothing is yet available). Tamaraw [wang2014comparing] is an improvement over BuFLO that uses two different fixed lengths for packet transmission, rather than one, to save on bandwidth overhead. Tamaraw also uses two different scheduling rates for ingress and egress packets. The authors chose to make the ingress packet period smaller than the egress packet period since HTTP responses are often larger in size and count - if HTTP Push is used - than requests. While provably correct, both BuFLO and Tamaraw limit the rate at which clients send traffic, and requires all clients to send at a uniform rate. Both requirements therefore make it difficult to apply as a generic defense in IETF protocols.¶
Wang et al. also developed Supersequence [wang2016website], which attempts to approximate a bandwidth-optimal deterministic defense. This is done by casting the padding and flow control problem as the shortest common subsequence (SCS) of the transformed packet trace. Supersequence approximates the solution by learning the optimal packet scheduling rate; it uses the same padding scheme as Tamaraw.¶
Walkie-Talkie [wang2015walkie] is a collection of mechanisms for WF defense. It includes running the client (browser) in half-duplex mode to batch requests and responses together, as well as randomly padding traffic so as to mimic traffic of benign websites. It assumes knowledge of traffic patterns for benign websites, which can be information learned over time or provided by a cooperating peer. Goldberg and Wang also propose a "randomized" variant that pads real bursts of requests and generates random request bursts according to a uniform distribution. The half-duplex mode could be implemented as an extension to a protocol such as HTTP/2, QUIC, or TLS.¶
Many limited defenses have also been proposed. We list prominent works below.¶
Traffic splitting is a type of defense wherein application data is sent over multiple, disjoint network paths. Multipath TCP (MPTCP) is one type of "traffic splitting" protocol, wherein an endpoint may send TCP segments for a single connection over multiple interfaces. This is commonly done for multi-homed devices, such as mobile devices with cellular and WiFi or wired network connections. Traffic splitting assumes that guided traffic distribution reduces information available to an adversary, and thereby decreases the success probability of WF attacks. Traffic splitting defenses differ in the algorithm used for traffic distribution.¶
Henri et al. [henri2020protecting] studied several traffic splitting algorithms, including: weighted and non-weighted round-robin path splitting, incoming and outgoing path split, fixed-probability splitting, and variants of per-connection uniform probability splitting. The best results came from a per-connection path splitting variant where the maximum number of packets sent on any given path was limited by a random variable chosen from a geometric distribution. (Once this limit was reached, a new path was selected uniformly at random.) De la Cadena et al. [de2019poster] also study path splitting algorithmss. They conclude that a weighted random path selection algorithm works best. (The authors do not give specifics of path weight probability derivation.)¶
To date, WF attacks target clients running over Tor or some other anonymizing service, meaning that WF attacks are likely more accurate on normal TLS-protected connections. Moreover, attacks normally assume clients use HTTP/1.1 with parallel connections for parallel resource fetches. In recent years, however, protocols such as SPDY, HTTP/2, and QUIC with built-in padding support and multiplexed stream-based connections should make existing attacks more difficult to carry out. That said, it is unclear how exactly these protocol design trends will impact WF attacks. A non-exhaustive list of questions that warrant further research are below:¶
Research into the above questions will help the IETF community better understand the extent to which WF attacks are a problem for Internet users in general.¶
It is worth mentioning that traffic-based WF attacks may not be required to achieve the desired goal of learning a connection's destination. Network connections by nature reveal information about endpoint behavior. The relationship between network address and domains, especially when stable and unique, are a strong signal for website fingerprinting. Trevisan et al. [trevisan2016towards] explored use of this signal as a reliable mechanism for website fingerprinting. They find that most major services (domains) have clearly associated IP address(es), though these addresses may change over time. Jiang et al. [jiang2007lightweight] and Tammaro et al. [tammaro2012exploiting] also previously came to the same conclusion. Address-based website fingerprinting was also explored by Patil and Borisov [patil2019can], wherein they showed that addresses, especially when grouped together as part of a single web page load, leak a substantial amount of information about the corresponding domain. Thus, in general, classifiers that rely solely on network addresses may be used to aid website fingerprinting attacks.¶
New protocols such as TLS 1.3 and QUIC are designed with privacy-protections in mind. TLS 1.3, for example, supports record-layer padding [RFC8446], although it is not used widely in practice. Despite this, TLS connections still leak metadata, including negotiatied ciphersuites. (See [fordTLSMetadata] for a discussion of this issue.) QUIC is more aggressive in its use of encryption as both a mitigation for middlebox ossificatiion and privacy enhancement. IPsec Traffic Flow Confidentiality [RFC4303] and Traffic Flow Security [I-D.ietf-ipsecme-iptfs] are two mechanisms by which endpoints can ESP datagrams to mask size metadata.¶
Future protocols should follow these trends when possible to remove unnecessary metadata from the network.¶
This document surveys security and privacy attacks and defenses on encrypted TLS connections. It does not introduce, specify, or recommend any particular mitigation to the aforementioned attacks.¶
This document makes no IANA requests.¶
The authors would like to thank Frederic Jacobs and Tim Taubert for feedback on earlier versions of this document.¶