TOC |
|
A method for obscuring location information is described. Both static and changing location information can be obscured. A single distance measure is input to the process; this parameter controls the precision of location information that can be extracted by a recipient.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on July 18, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1.
Introduction
2.
Method Characteristics and Applicability
3.
Obscuring Static Locations
3.1.
Known Point Locations
3.2.
Known Locations with Uncertainty
3.3.
Selecting a Offset Vector
3.3.1.
Angle and Distance Method
3.3.2.
Square Peg Method
3.3.3.
Randomness Requirements
3.4.
Multiple Reported Locations
4.
Obscuring Changing Locations
4.1.
Update Conditions
4.1.1.
Bad Triggers
4.1.2.
Hidden Trigger
4.2.
Consecutive Reported Locations
4.2.1.
Reducing Variation between Offset Vectors
4.2.2.
Trade-off in Reducing Variation
4.3.
Returning to the Same Location
4.3.1.
Positional Stability
4.3.2.
Triggering with Positional Stability
4.3.3.
Selecting a Grid
4.3.4.
Random Grid
4.3.5.
Linear Interpolation of Random Offsets
4.3.5.1.
Uniformly Distributed Interpolation
4.3.5.2.
Applying Uniformly Distributed Interpolation
4.3.5.3.
Selecting an Appropriate Grid Size
4.3.6.
The Wonky Grid
4.3.6.1.
Wonky Grid Points at the Poles
4.3.6.2.
Interpolation About the 180th Meridian
4.3.7.
Temporal Interpolation
5.
Examples
6.
Acknowledgements
7.
IANA Considerations
8.
Security Considerations
9.
Informative References
Appendix A.
Sample Implementation
§
Author's Address
TOC |
A method for obscuring location information is described. This method obscures location information such that it can be provided to recipients without revealing the location of the subject to within the desired distance.
Obscuring location has applications for protecting privacy, as described in [I‑D.ietf‑geopriv‑policy] (Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., and J. Polk, “Geolocation Policy: A Document Format for Expressing Privacy Preferences for Location Information,” October 2010.).
This method uses a single configuration parameter as input: an obscuring distance.
A location recipient (or recipient) is the entity that is given location about a target entity. The goal is to ensure that the recipient is unable to recover location information with better accuracy than is desired. Despite this obscuring the recipient should still be able to use the reported locations.
The obscuring process takes a series of known locations, which might have greater accuracy than the recipient is permitted to receive. The obscuring process produces a series of reported locations.
TOC |
The method described here is intended to provide limited protection for location information by constrained degradation. The method has the following characteristics:
- Simple Configuration:
- It might be possible to define a more complete solution for obscuring location information that is more configurable. However, a more configurable option would also demand greater involvement from users so that they would be able to specify a configuration that meets their goals. This method is designed to be easy to understand, which increases the chances that a user is able to successfully choose an appropriate configuration. The method has just one input parameter: the obscuring distance.
A separate parameter for the size of the grid used in the algorithm can affect results; a fixed value is recommended in this document.- Irreversible:
- Obscuring is intended to be irreversible. Information is lost by applying the process. Multiple applications of this process to the same input location is could reduce information more than a single application of the process with the largest obscuring distance.
- Increases Uncertainty:
- A recipient does not need to treat obscured location information any differently to location information that contains uncertainty. The uncertainty of the reported location is increased so that the reported location includes the known location. Thus, the information that is reported is correct, though the accuracy might be reduced. This document relies on a definition of uncertainty for location described in more detail in [I‑D.thomson‑geopriv‑uncertainty] (Thomson, M. and J. Winterbottom, “Representation of Uncertainty and Confidence in PIDF-LO,” May 2010.).
- Two Dimensions:
- The method described in this document operates in two dimensions only. Many of the principles might be applicable in a higher number of dimensions, though no effort has been made to validate their integrity. A three-dimensional location can be reduced to a two-dimensional form for use in this algorithm. This is not contrary to the goal of reducing the amount of information provided.
- Time Invariant:
- The method described in this document does not use time. An entity performing obscuring does not need to consider time in applying this method. Only the location is protected, not the time that the location was determined. The time from the known location is included in the reported location.
- Obscuring Distance Not Secret:
- No attempt is made to protect the obscuring distance as a secret. It is assumed that a recipient is able to learn this value.
- Minimal State:
- An entity that performs obscuring of locations often performs this service for the combination of many targets and recipients. This process requires only that the obscuring entity hold maintain a trigger location for each recipient. The additional state that an obscuring entity retains in order to apply this obscuring method is a small increment over what is typically required. The current known location does not need to be retained; it need only be reacted to when it changes.
TOC |
A static location doesn't change. That is, different locations are not attributed to a single target at different times.
The basic location obscuring case involves a single, isolated instance of location information.
It might be appropriate to apply just this section in protecting the privacy of a single location. A recipient must be unable to acquire multiple location instances for the same entity if this is the only form of obscuring used.
TOC |
A known point location can be obscured by adding a randomized offset vector to the location. The size of the offset vector is randomly selected so that the reported location could be anywhere within the obscuring distance of the known location, see Section 3.3 (Selecting a Offset Vector).
The uncertainty of the reported location is set to the obscuring distance. This ensures that the reported uncertainty region encloses the known location.
- Note:
- It's not sufficient to increase the uncertainty region so that it minimally includes the known location. Doing this reveals that the known location is at the boundary of the reported uncertainty region.
TOC |
A known location with uncertainty is reduced to a circular uncertainty region (see [I‑D.thomson‑geopriv‑uncertainty] (Thomson, M. and J. Winterbottom, “Representation of Uncertainty and Confidence in PIDF-LO,” May 2010.), Section 4.2). An irregularly shaped uncertainty region is difficult to evaluate against the scalar obscuring distance, and it might inadvertently reveal more information than intended.
A known location with uncertainty greater than the obscuring radius does not require additional obscuring. The radius of the circular uncertainty region is compared to the obscuring distance to determine if further obscuring is necessary. A location with sufficient uncertainty can be directly reported.
Randomization is needed if the known location contains insufficient uncertainty. As for a point location, an offset vector is added and the uncertainty increased to the obscuring distance. A smaller offset vector is necessary where the known location has uncertainty - this vector need only be of a size up to the obscuring distance, less the existing uncertainty.
The reported uncertainty is increased so that the reported location contains an uncertainty radius of at least the obscuring distance. An uncertainty in a known location cannot be recovered by a recipient of an obscured location unless it is larger than the obscuring distance.
Paradoxically, more accurate location determination methods are better suited to obscuring.
A location that is reported with uncertainty does not always have a uniform probability distribution. A non-uniform distribution is not conducive to obscuring, since a location with an unevently distributed probability distribution reveals that the location of the target is more likely to be in specific parts of the uncertainty region.
Information on the likely probability distribution cannot be conveyed in many systems, including presence (see [RFC4119] (Peterson, J., “A Presence-based GEOPRIV Location Object Format,” December 2005.), [RFC5491] (Winterbottom, J., Thomson, M., and H. Tschofenig, “GEOPRIV Presence Information Data Format Location Object (PIDF-LO) Usage Clarification, Considerations, and Recommendations,” March 2009.)). The location determination method can be reported, which can reveal characteristics of the probability distribution. Specific measures to counteract this effect are therefore not feasible.
Removing or replacing the location determination method parameter denies a recipient any information about probability distribution.
TOC |
There are two methods that can be used to generate a random vector. Both methods produce random vectors that are evenly distributed on the plane within the maximum size.
The angle and distance (polar) method is considerably simpler, but it is less well suited to the complete algorithm. The square peg method is more conducive to the interpolation used.
Both methods take two uniformly distributed random numbers as input.
TOC |
In the polar method, the first random value is used to select a random angle, the second to select a random distance.
Assuming a random() function produces a number distributed between 0 (inclusive) and 1 (exclusive) - that is, the range [0, 1) - the angle and length can be produced by the following:
angle = random() * 2 * pi length = sqrt(random()) * size or length = (1 - |random() - random()|) * size
...where sqrt(x) takes the square root of x and | takes the absolute value of the enclosed. size is the desired size of the random vector, which could be the obscuring distance less any existing uncertainty.
TOC |
In this method, the two random values are used to select a point in a 2x2 square between -1 and +1 on each axis. Vectors that are in the direction of a corner are reduced in length so that the total length of any vector is limited to 1 (as opposed to the square root of 2).
^y (-1,-1) | (1,1) +-------,,---+---..-------+ | ,-' | `-. | | ,' | `._,| | / | ,-'\ | |/ | _,-' \| | |,-' a | -----+------------+------------+---->x | | | |\ | /| | \ | / | | `. | ,' | | `-._ | _.-' | +--------``--+--''--------+ (-1,-1) | (1,-1)
The effect of this is that the probability of finding a value that is toward the corners of the square (angles of PI/4, 3PI/4, -PI/4, etc...) is twice the probability of finding a value along the axes (angles of 0, PI/2, PI, etc...). This can be corrected by applying the resulting cumulative distribution function to the angle.
x = random() * 2 - 1 y = random() * 2 - 1 length = sqrt(x*x + y*y) angle = atan2(y, x) a_side = round(angle * 2 / PI); a_rem = angle - a_side * PI / 2 length = length * cos(a_rem) * size angle = (tan(a_rem) / 8 + a_side / 4) * 2 * PI
...where atan2 produces the angle of the vector, round(x) produces the nearest whole number to x and the cosine and tangent functions are represented by cos(x) and tan(x) respectively.
This can be more efficiently calculated without trigonometric functions using:
x = random() * 2 - 1 y = random() * 2 - 1 length = max(|x|, |y|) * size if (x == 0) and (y == 0) >> return zero length vector if (|x| > |y|) angle = PI * y / x / 4 else angle = PI * (2 - x / y) / 4 if (y < -x) angle = angle + PI
...where max(x, y) chooses the more positive value of x and y.
TOC |
A recipient that is able to learn the state of the random number generator could use this to determine the offset vector. This would reveal the known location based on a given reported location. A secure pseudo-random number generator (Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” June 2005.) [RFC4086] provides an assurance that recovering the state of the random number generator is made considerably more difficult.
TOC |
Multiple applications of this algorithm produce different results. The intersection of multiple reported locations can be used to recover a better estimate of the known location. This recovered estimate has less uncertainty than the obscuring distance, which is not desirable.
Multiple reported locations for the same known location must not be produced. An entity that is responsible for obscuring location might achieve this by storing the reported location with the obscured location.
It is possible to implement obscuring for a static location without retaining state. Seeding a pseudo-random number generator with data that is not available to the recipient can ensure that the same result is produced from the same input. Taking a hash of the known location combined with a secret key ensures that this seed cannot be easily determined by a recipient (see Section 6.2 of [RFC4086] (Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” June 2005.) for alternative methods). A hash function that includes the values shown in Section 4.3.4 (Random Grid) might be sufficient for this task.
TOC |
Applications that use the location of a target over time, such as presence (Peterson, J., “A Presence Architecture for the Distribution of GEOPRIV Location Objects,” July 2005.) [RFC4079] require additional steps to ensure that the location a recipient acquires does not reveal more information than desired.
The first consideration is the frequency of updates. As the target moves, the known location changes. A continuous stream of reported locations could give a recipient sufficient information to determine the known location with low uncertainty in a fashion close to that described in Section 3.4 (Multiple Reported Locations).
- Note:
- It is not necessary to ensure that a recipient always has accurate location information. Early proposed algorithms wrongly assumed that the reported location was required to cover the known location at all times. Even in the absence of obscuring, changes in location result in a recipient having outdated information. The only necessary constraint is that the location be accurate at the time that it is reported (or the time associated with that report).
TOC |
To limit the amount of information provided to a recipient, new reported locations are not generated in response to all changes in the known location. The trigger for creating a new reported location can be defined.
Any trigger condition needs to be constructed in a way that does not reveal information. At the point that a new reported location is provided to a recipient, the fact that the trigger conditions are met at that point in time provides the recipient with significant information that could - if the trigger conditions were poorly defined - reveal significant information.
The goal is to provide a new reported location when the known location moves by approximately the obscuring distance. This limits the information that a recipient has available with similar accuracy to each individual location.
TOC |
One potential trigger is the movement of the target outside of the reported uncertainty region. At the point that a new reported location is generated, a recipient knows that the target is a) at the boundary of the last uncertainty region, and b) somewhere in the new uncertainty region. The intersection of these two regions produces an area that is significantly smaller than desired.
New Reported Location ..--"""--.. ..--"""--.. / .-' /=\. `-. ,' ,' \\ `. / / \\ \ / / \\ \ | | || | | | || | | | || | \ \ // / `. `. // \ .' `._ `._// \ _.' / `--..___..--' `--..__\..--' Last Reported \ Location Recovered Location Along Border
Figure 1: Trigger on Leaving the Reported Location |
Similarly, information is revealed if the trigger is movement based on the known location. A new reported location might be produced when the known location moves more than the obscuring distance from the known location from the last report.
That is, when a new location is reported, the corresponding known location is saved. A new reported location is determined when the current known location is more than the obscuring distance from the saved location.
If the recipient is able to assume that the target is moving in a straight line, the speed of the target is revealed.
TOC |
To limit the information that is revealed at the point that a new reported location is provided, the trigger conditions can be based on information that is not available to the recipient.
Applying randomization to the trigger reduces the ability of a recipient to make assertions about the significance of a new reported location.
A hidden trigger is established using the following process:
Each new reported location is randomized using the process described in Section 3 (Obscuring Static Locations).
This algorithm ensures that the centroid of the known location moves between 0.5 and 1.5 times the obscuring distance before a new reported location is produced. As a consequence, the uncertainty in the distance moved is equal to the obscuring distance.
TOC |
The obscuring method has a weakness that is as a direct consequence of the triggering conditions. These conditions grant a recipient this information:
For any two consecutive reported locations there is a pair of points that are less than 1.5 times the obscuring distance apart, with one point in the area described by each reported location. The first point is the known location at the time of the first reported location; the second point is the known location at the time of the second reported location.
At the time that a location is reported, the recipient can use this knowledge to determine that the current location of the target is at the intersection of the new reported location and a circle with a radius of 2.5 times the obscuring distance, centered on the last reported location, as shown in Figure 2 (Consecutive Reported Locations)
Known location . is in overlap \ Last \ \ New ,.--"--.. \ \ ,.--"--.. ,-' `-. \ |,-' `-. / \ \_ + \ | | /| | | o |<---------->|| o | | \ | --> 1.5OD \| | \ \ / + / `. \ ,' |`. ,' `-..___,.+' ; `-..___,.-' \ / |<------>|<----\->|<------>|<-/---->|<------>|<--... OD OD \ OD / OD OD \ ,' 2.5OD \ / \ _,' _\/' OD = obscuring _,,-' distance
Figure 2: Consecutive Reported Locations |
Two consecutive reported locations can have their centers up to 3.5 times the obscuring distance apart; making the closest points on each uncertainty region up to 1.5 times the obscuring distance apart. When consecutive reported locations are maximally distant, a recipient can recover the location of the target almost perfectly.
This relies on the recipient being able to determine the obscuring distance. As identified, the obscuring distance is not protected.
TOC |
This shortcoming can be addressed by reducing the difference between the random offset vector added to consecutive reported locations. The extreme case shown in Figure 2 (Consecutive Reported Locations) only arises because the absolute difference between the randomization vector used for in consecutive reported locations is twice the obscuring distance. The problem occurs when the difference between consecutive know locations approachs 1.5 times the obscuring distance in combination with this large difference between randomization vectors.
Reducing the amount that a offset vector can change between consecutive reported locations reduces. If the difference between offset vectors is constrained then the effect of this problem is reduced.
Using the same offset vector for all reported locations removes the problem entirely. However, using the same offset vector increases the chances of that vector being discovered. For instance, if the target is following a road, reported locations that have a fixed offset from the known location will reveal the shape of the road. From this it is trivial to learn the offset vector and hence all presence and past locations can be recovered.
Each time a location is randomized, the offset vector used can be the combination of a new random offset vector and the offset vector that was last used. The proportion of old and new vectors determines the trade-off between the probability that a recipient is able to learn a more accurate location with the probability that a recipient is able to learn the offset.
TOC |
It is more difficult to learn an offset vector if additional randomness is added to each new vector. An adversary that learsn a known location immediately has less information about subsequent known locations based on the amount of additional randomness. As long as the offset vector is able to change significantly as several locations are reported, learning a limited number of offset vectors is of limited use in recovering future known locations.
Too large a change in the offset vector increases the chances of revealing the known location to a small area. Too small a change provides an adversary that discovers a known location information more information about subsequent known locations. A trade-off is necessary.
The only way that the known location can be guaranteed to be unknown over the entire area is when the offset vector doesn't change at all. If the absolute difference in offset vectors is half the obscuring distance, in the worst case the recipient is able to determine the known location to be within 77 percent of the desired area. This varies based on o(diff), as follows:
a(diff) = ((1.5 + diff)^2 - 5.25) / (2*(1.5 + diff)) o(diff) = acos(a(diff)) + 6.25 * acos((1.5 + diff - a(diff)) / 2.5) - (1.5 + diff) * sqrt(1 - a(diff)^2)
...where acos(x) returns the inverse cosine of x. This only produces a result where diff is less than 2.
It might be useful in this case to create a offset vector that is no more than diff times the oscuring distance different to the previous vector. This might be done by taking a weighted average of the previous vector with a new random offset vector as follows:
o[new] = (o[prev] * (2 - diff) + o[random] * diff) / 2
...where o[new] is the new ofset vector, o[prev] is the new previous vector, and o[random] is a completely random vector of the same magnitude.
TOC |
A moving target might return to the same location several times. The method described thus far produces a different reported location each time. A recipient that is able to observe location over time could intersect reported locations to recover the known location as long as they make the assumption that the known location is the same each time.
This can be extended to reveal a path that is habitually followed in the same way. Each time the path is travelled, changing offset vectors eventually reveal a more accurate view of the path.
TOC |
The key to addressing this flaw is to have the randomization of offset vectors based on the known location. If the same known location produced a reported location that was equal or very close to it each time that the location was obscured, this would address the problem.
It might be possible to take the coordinates of the known location and pass them - along with a secret key - through a cryptographic hash function. The resulting bits could be used as randomness that produces an offset vector. This would ensure that the exact same location always produces the same random vector.
The drawback of this sort of method is that the location is obscured inconsistently when the known location changes even slightly. Such imprecision is commonplace in location determination methods, rendering this approach unsuitable.
The goal is to ensure that two known locations in close proximity produce a constant (or near almost constant) random offset vector. It is also desirable that the random vector change as the locations change. This has the consequence of reducing the difference in randomness between consecutive reported locations, provided that the random values do not vary significantly over shorter distances (see Section 4.2.1 (Reducing Variation between Offset Vectors)). The offset vector needs to change over a longer distance to limit the amount that an adversary benefits from learning both known and reported locations.
An approach similar to that described in [PERLIN] (Perlin, K., “An Image Synthesizer,” July 1985.) is used to achieve a continuously varying random field. In this, randomness is constrained to a grid of points with interpolation used to determine values for intervening points.
TOC |
No specific changes are required for the triggering process, though this does require that some state be maintained by the entity that performs obscuring. For a SIP entity that is maintaining a subscription, this is not expected to be onerous.
The advantage of having a specific trigger for providing a new reported location is that it reduces the information provided to a recipient. Providing updates at a higher rate provide a recipient with additional information that could be used to recover the offset.
TOC |
In selecting an appropriate grid with two dimensions, the curvature of the surface of the Earth presents a challenge. The simplest approach might be to select an origin at latitude 0, longitude 0. Grid points could be placed at increments based on a constant ratio between latitude and longitude and distance; for example, 9e-6 degrees per meter assumes a spherical planet of 6366197 meter radius, which is slightly smaller than the semi-major axis of the ellipsoid used in most Earth models.
For a two-dimensional grid with a multiple of m, the following equations identify the latitude and longitude of the four nearest grid points to a given location:
grid = m * obscuring distance * 9e-6 latitude[low] = floor(latitude / grid) * grid latitude[high] = latitude[low] + grid longitude[low] = floor(longitude / grid) * grid longitude[high] = longitude[low] + grid
...where floor(x) produces the nearest whole integer that is more negative than x.
Grid intervals can be set to a multiple of the obscuring distance that ensures that consecutive reported locations have continuously varying offset vectors. These vectors need to change at a rate that ensures maximum change over multiple reported locations without causing too much information to be revealed from two consecutive locations (as described in Section 4.2 (Consecutive Reported Locations)). Selecting a grid size is discussed in more detail in Section 4.3.5.3 (Selecting an Appropriate Grid Size).
The shortcoming of a grid of this nature is that changes in longitude are more rapid as locations get closer to the poles. At approximately 60 degrees of latitude (North or South), grid intervals on the East-West direction are twice as frequent as desired. For this reason, larger intervals between grid points might be chosen for longitudes.
A solution for this problem is described in Section 4.3.6 (The Wonky Grid). An alternative problem might select a local tangent plane removes the effect of the curvature of the Earth, though this introduces the problem of selecting an appropriate tangent plane as locations change and providing consistent transitions between different tangent planes.
In three dimensions, conversion to Earth-centered, Earth-fixed Cartesian coordinates renders this problem moot.
TOC |
At each of the points on the grid, a random offset vector is produced using the method described in Section 3.3.2 (Square Peg Method). Interpolation is used to produce the offset vector for points within each grid cell, as shown in Figure 3 (Grid Interpolation).
Rather than use a random number generator, random numbers are produced using a cryptographic hash function. The input to this hash might include:
The inclusion of a secret ensures that a recipient is unable to construct the offset vector. This secret is persistent so that later applications of the obscuring formula do not produce a different offset vector for the same location.
Section 3.3 (Selecting a Offset Vector) requires that multiple random numbers are produced. The additional identifier produces additional randomness where multiple random (or pseudo-random) numbers are required.
Using a hash in this fashion ensures that each target gets a different set of random offset vectors and that the same grid point coordinates produce the same result.
Though ordering need only be consistent between consequent applications of the obscuring algorithm, the following might be used to produce random bits:
random = HMAC(secret key, target identity | identifier | coordinate | coordinate | ...)
...where HMAC is the hash MAC function [RFC2104] (Krawczyk, H., Bellare, M., and R. Canetti, “HMAC: Keyed-Hashing for Message Authentication,” February 1997.) and | represents concatenation, which might require a delimiter to terminate variable length values.
Alternatively, the same sequence could be used to seed a secure pseudo-random number generator (Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” June 2005.) [RFC4086]. Extracting values in the same order makes the identifier unnecessary.
One consequence of this approach is that changes to the obscuring distance result in the noise pattern being completely changed. This can result in the same known location producing a significantly different reported location before and after the change.
TOC |
Once a grid of random offset vectors is established, an offset vector is calculated based on the centroid of the known location. Figure 3 (Grid Interpolation) shows a centroid at the point (x,y) and the values that are used in the interpolation process.
| | - ---o------------------------------o--- /| ^ /| (x1,y2) | | (x2,y2) | | | (y2-y) | | (x,y) | | | \ v | |<-(x-x1)->X<------(x2-x)----->| | ^ | | | (y-y1) | | v | - ---o------------------------------o--- /| /| (x1,y1) | (x2,y1) |
Figure 3: Grid Interpolation |
The offset vector at the identified point is produced by taking the weighted average of the offset vectors. Two weighted averages are taken between pairs of adjacent grid points along the same axis, then the weighted average of the two resulting vectors is taken along the other axis.
The following equations produce an linearly interpolated offset vector for any point in this grid cell:
tx = (x - x1) / (x2 - x1) ty = (y - y1) / (y2 - y1) w1 = o[x1,y1] * (1 - tx) + o[x2,y1] * tx w2 = o[x1,y1] * (1 - tx) + o[x2,y1] * tx offset = w1 * (1 - ty) + w2 * ty
...where o[x1,y1] is the random offset vector at the grid point (x1,y1).
TOC |
A consequence of performing a weighted average is that the resulting value is not uniformly distributed. Depending on the weighting factor (the value tx or ty in Section 4.3.5 (Linear Interpolation of Random Offsets)), the resulting probability distribution has a higher probability of producing values in the middle of the range of possible values.
For example, the probability distribution for a weighted average of two uniformly distributed random numbers between 0 and 1 is shown in Figure 4. The figure shows the case where t is less than 0.5, though the same distribution is produced for t and (1-t).
P(x) | | ,---------------. | /: :\ | / : : \ | / : : \ |/ : : \ '----+---------------+------ x 0 t (1-t) 1
Figure 4 |
In order to correct for this skewing of results toward the middle of the range, a smoothed interpolation is used.
Over the range from 0 to 1, the following produces a uniformly distributed interpolation between a and b:
r = a * (1 - t) + b * t IF r < t AND r < (1 - t) THEN: r = r * r / 2 / t / (1 - t) ELSE IF r > t AND r > (1 - t) THEN: r = 1 - (1 - r) * (1 - r) / t / (1 - t) ELSE IF t < 0.5 THEN: r = (2 * r - t) / 2 / (1 - t) ELSE: r = (2 * r - 1 + t) / 2 / t
This maps a linearly interpolated value to a smoothed value, using the cumulative distribution function for the weighted sum of a and b. This mapping produces a value between 0 and 1 for inputs between 0 and 1. The mapping is continuous. The mapping is not monotonically increasing for some values of a and b; the intent is to have a uniform distribution between 0 and 1, not between a and b.
For convenience, this interpolation function is represented in shorthand throughout the remainder of the document: uniformDistInterp(a, b, t).
Uniform interpolation alters the rate of change of the output. For a proportional movement in t of dt, the absolute change in output is at most:
dr = 1 - (1 - dt)^2
Toward the middle of the range, for values of a and b that are at the extents of the possible range and small values of dt, changes are magnified by up to two times their magnitude.
This interpolation function has similar characteristics to the smoothing function used in [PERLIN] (Perlin, K., “An Image Synthesizer,” July 1985.), except that the goal is not smoothing, but ensuring a uniform distribution of values in the output. Values are continuous, but their first derivative is not.
TOC |
The methods for producing random vectors described in Section 3.3 (Selecting a Offset Vector) produce a result that is uniformly distributed in a circular area. As a result, the cartesian coordinates produced are not evenly distributed on each axis. Similarly, the polar coordinates have a non-uniformly distributed magnitude. Rather than interpolate on the output of this process, the uniformly distributed interpolation is applied to the random inputs.
Interpolation is performed on a set of random numbers that are produced at each grid vertex. This is used to produce a single set of random numbers that are used as input to the random vector algorithm.
A consequence of this process with the simple polar method described in Section 3.3.1 (Angle and Distance Method) is that the angle of the random vector does not cross 360 degrees (2*pi) when being interpolated. In the worst case, interpolation between two points requires rotation through almost 360 degrees.
The alternative method of interpolating angles - linear interpolation using the shortest path - does produce an uniformly distributed output, but it also produces a discontinuity that could be exploited by a recipient when interpolation is applied in more than one dimension. It is possible to produce a change in the offset vector of up to twice the obscuring distance in size as the known location moves only a short distance.
The more complicated square peg method (Square Peg Method) results produces evenly distributed values without this problem.
TOC |
In the worst case, the polar method of generating a random vector in combination with uniformly distributed interpolation can result in twice the rate of rotation. Interpolation through a complete 360 degrees results in a maximum absolute change of:
d[p] = 2 * sin(pi * dr)))
...where dr is the distance moved as a proportion of the obscuring distance, which is no more than 0.5.
Using the maximum value from Section 4.3.5.1 (Uniformly Distributed Interpolation), the number of multiples required to limit movement can be calculated using:
m[p] = 1.5 / (1 - sqrt(1 - asin(d[p] / 2) / pi))
For an absolute change in the random vector of no more than the obscuring distance, the grid needs to be at least 17.22 multiples of the obscuring distance. If the absolute change is only half this amount, the grid needs to be larger, at 36.53 multiples of the obscuring distance.
Using such a large grid to deal with a low probability case is suboptimal. The square peg method allows for a much smaller grid, with a maximum absolute change being dependent on only the increased rate of change produced by the interpolation method:
d[sp] = 2 * dr = 2 * (1 - (1 - 1.5 / m[sp])^2) m[sp] = 1.5 / (1 - sqrt(1 - d[sp] / 2))
This means that a grid of 5.12 times the obscuring distance limits absolute difference in the offset vector to obscuring distance; a grid of 11.20 times the obscuring distances limits the difference to half.
Selecting a grid size at 8 times the obscuring distances ensures that the absolute change in offset vector is 0.680 times the obscuring distance. A complete change in offset vector can then occurs after linear movement of only 8 times the obscuring distance. In the worst case, movement reveals a location within 66.0% of the area of a circle with a radius of the obscuring distance.
TOC |
To address the concerns caused by the curvature of the Earth, a modified grid-like structure can be used. It is not strictly necessary that the grid be absolutely grid-like in structure. Therefore, it's possible that different grid intervals could be selected.
This structure uses a different interval for points at different latitudes, at the selected low latitude:
grid[llat] = grid / cos(latitude[low]) longitude[low,llat] = floor(longitude / grid[llat]) * grid[llat] longitude[high,llat] = longitude[low,llat] + grid[llat]
...and at the high latitude:
grid[hlat] = grid / cos(latitude[high]) longitude[low,hlat] = floor(longitude / grid[hlat]) * grid[hlat] longitude[high,hlat] = longitude[low,hlat] + grid[hlat]
...where cos(x) produces the cosine of x.
This produces fewer grid points for latitudes that are further from the Equator. At the poles (and above), a single offset vector is sufficient.
Interpolation of these points uses four distinct points, as shown in Figure 5 (Wonky Grid Interpolation).
(x-x1_2) (x2_2-x) |<-------->|<----------------->| | | | - ---o------------------------------o--- - /| | ^ |\ (x1_2,y2) : | | : (x2_2,y2) | | (y2-y) (x,y) ' | \ v X - ------ ^ : . : | (y-y1) | | | v - ---o---------------------------o--------------- - /| | |\ (x1_1,y1) |<--------------------->|<->| (x2_1,y1) (x-x1_1) (x2_1-x)
Figure 5: Wonky Grid Interpolation |
Linear interpolation uses the amended equations:
tx_1 = (x - x1_1) / (x2_1 - x1_1) w1 = uniformDistInterp(r[x1_1,y1], r[x2_1,y1], tx_1) tx_2 = (x - x1_2) / (x2_2 - x1_2) w2 = uniformDistInterp(r[x1_2,y2], r[x2_2,y2], tx_2)
Note that this uses the uniformly distributed random values selected at each grid point, rather than the offset vectors. Each random value is a uniformly distributed random value in the range [0, 1).
TOC |
At 90 degrees North and South, the cosine used to determine the wonky grid produces a zero. This produces an undefined grid spacing.
To avoid this problem, produce a single value at each pole: (90, 0) and (-90, 0). This value replaces w1 or w2 in the interpolation equations. Retaining the same weighting (that is, ty) for determining the final offset is desirable, so that the rate of change is not artificially increased.
TOC |
At 180 degrees East (or West), longitude values cross from positive to negative values. This produces a discontinuity in the values used. This could be exploited to learn when the known location cross the 180th meridian.
180/-180 180/-180 | | +ve: x1a | x2a x1a | x2a ---o---o--X--------o---o--- ... ---o---o--X--------o---o--- x1b | x2b x1b | x2b . | . . | . | | | | |<------------->| |<------------->| overlap = grid interval overlap = grid interval
Figure 6: Interpolation About 180 Degrees |
This problem might only manifest for one of the two interpolations performed across changing longitude values in a wonky grid. To address this, the values produced by the negative and positive aspects are independently generated, then these values are interpolated over a span of one grid interval.
For any point within half of one grid interval from the 180th meridian, this algorithm is used. Perform interpolation using the selected grid points, then add or subtract 360 degrees from the original value to get a value that is either more than 180 degrees or less than -180 degrees. Perform interpolation on this second point.
The two interpolated values are then interpolated using a different proportion. This interpolation is taken on the overlap interval that crosses the 180th meridian, as shown in the Figure 6 (Interpolation About 180 Degrees). This proportion is produced by taking the positive input value (that is, the longitude value, with 360 degrees added if it is negative) and applying the following:
grid = m * obscuring distance * 9e-6 / cos(latitude) IF longitide + grid / 2 > 180 OR longitude - grid / 2 < -180 THEN: t = ((longitude + 360) % 360 - 180 - grid / 2) / grid random[o] = uniformDistInterp(random[+ve], random[-ve], t) ENDIF
...where % represents the modulo operation. The final interpolated value is determined using the uniformly distributed weighted average method described in Section 4.3.5.1 (Uniformly Distributed Interpolation).
TOC |
Providing different values over time is difficult to balance against the need to obscure the same location in the same way. It is possible to add additional dimensions upon which to interpolate the offset vector. Adding time as one such dimension would allow the offset vector to change gradually over time as well as with respect to space.
A form of temporal interpolation might allow the obscuring entity to change the secret key that it maintains over time. However, this does not provide positional stability unless the interpolation is performed over a period that is significantly longer than the period over which the known location might return to the same location. Changing the offset vector applied to the same location would negate much of the benefit derived from the algorithm.
In practice, the period over which the offset would change would have to be significantly longer than the time taken for all potential visited locations to completely change in all aspects. This implies that temporal interpolation is likely only useful on geological time scales.
TOC |
Obscuring a known location at latitude -34.401072, longitude 150.636361 with 100 meter obscuring distance first requires calculation of the grid size and the grid points:
gridsize = 8 * obscuring_distance * 9e-6 = 0.0072
Once that is determined, the two latitude values used for the grid are determined:
lowlat = floor(-34.401072 / 0.0072) * 0.0072 = -34.4016 highlat = lowlat + 0.0072 = -34.3944
For each latitude value, two longitude values are determined using a modified grid size to find the final set of of grid points:
- Note:
- Intermediate values in this example are rounded for presentation purposes.
grid[lowlat] = gridsize / cos(lowlat * pi / 180) = 0.0087262 lowlng[lowlat] = floor(150.636361 / 0.0087262) * 0.0087262 = 150.632339 highlng[lowlat] = lowlng[lowlat] + 0.0087262 = 150.641066 grid[highlat] = 0.0087255 lowlng[highlat] = 150.628105 highlng[highlat] = 150.636831
This gives a set of points for which random values are produced. The actual random values used depend on many factors (see Section 4.3.4 (Random Grid)). The following values are used in this example:
random[-34.4016, 150.632339] = 0.4228538586758077 random[-34.4016, 150.641066] = 0.9430289615411311 random[-34.3944, 150.628105] = 0.9174296103883535 random[-34.3944, 150.636831] = 0.008725488356129405
The random values are interpolated along the same latitude using a t value that is based on the distance from the corresponding low longitude value. The two resulting values are interpolated along the same longitude using a t value that is based on the distance from the low latitude value. Uniformly distributed interpolation is used in both cases.
t[-34.4016] = (150.636361 - 150.632339) / 0.0087262 = 0.460866 r[-34.4016] = 0.770898 t[-34.3944] = (150.636361 - 150.628105) / 0.0087255 = 0.946145 r[-34.3944] = 0.440578 t = (-34.401072 - -34.4016) / 0.0072 = 0.0733055 r = 0.7661978449732944
This first random value is used for the x component. A second random value for y is chosen using the same process, producing 0.16585607985072537.
These values are then input into the square peg algorithm:
d = 100 * max(0.7661978449732944, 0.16585607985072537) = 76.61978449732944 -- since |x| > |y| a = y * pi / x / 4 = 5.3380813420741795 -- no further change since y > -x
Therefore, the location is moved 76.62 meters on a bearing of 305.84 degrees. The resulting reported location is moved along the local tangent plane to {-34.400719, 150.635772} and a circle of 100 meter radius is described.
Finally, a random point is chosen within 50 meters of the original point. No more location is provided until the known location moves more than 100 meters from that point. In this case, the trigger point is set to {-34.401388, 150.636471}. If the known location is updated to {-34.401816, 150.636361}, no new location is reported.
Moving to {-34.400621, 150.635717} is more than 100 meters from the trigger point, even though this is very close to the last reported location. This results in a new location being reported at {-34.400346, 150.634929}.
TOC |
Thanks go to Robert Sparks for identifying key shortcomings in early attempts to obscure location. Richard Barnes, Jorge Cuellar, Cullen Jennings, Warren Kumari, and Hannes Tschofenig variously provided input, feedback, criticisms and insightful ideas.
TOC |
This document has no IANA actions.
[RFC Editor: please remove this section prior to publication.]
TOC |
This document describes a method for obscuring location. An effort has been made to ensure that reported locations do not reveal any more information than the input dictates. However, obscuring location is not a substitute for withholding location information if the goal is to ensure that a recipient remains ignorant of the known location. Alternatively, a recipient might be provided with completely falsified location information.
There is little point in obscuring location when other location-related information is included in a composite document, like a presence document (Sugano, H., Fujimoto, S., Klyne, G., Bateman, A., Carr, W., and J. Peterson, “Presence Information Data Format (PIDF),” August 2004.) [RFC3863]. Removing other information, such as dynamic location information (Shafranovich, Y., Levine, J., and M. Kucherawy, “An Extensible Format for Email Feedback Reports,” August 2010.) [RFC5965] is necessary to ensure that this cannot be used to recover the known location.
A reported location can inadvertently reveal far more information than intended to a recipient in possession of additional information. A recipient might be able to apply this additional information to determine the location of the target with less uncertainty than desired. Additional information includes information about the reported location or information about the Target.
For instance, a recipient with a map might be able to identify areas on that map that a target is more likely to be found. A recipient can combine any additional information with the knowledge that the reported location is correct at the time it is reported to recover a better estimate of the known location. Aside from map-based data, other information that could be used to acquire a more accurate estimate of the location of a target might include knowledge of the target's past behavior, personality traits, or aggregated demographic data.
Increasing the obscuring distance might increase the uncertainty in the location that a recipient with additional information can ultimately recover. The complexity involved and the large volume of additional data involved makes more specific measures difficult.
TOC |
[I-D.ietf-geopriv-arch] | Barnes, R., Lepinski, M., Cooper, A., Morris, J., Tschofenig, H., and H. Schulzrinne, “An Architecture for Location and Location Privacy in Internet Applications,” draft-ietf-geopriv-arch-03 (work in progress), October 2010 (TXT). |
[I-D.ietf-geopriv-policy] | Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., and J. Polk, “Geolocation Policy: A Document Format for Expressing Privacy Preferences for Location Information,” draft-ietf-geopriv-policy-22 (work in progress), October 2010 (TXT). |
[I-D.thomson-geopriv-uncertainty] | Thomson, M. and J. Winterbottom, “Representation of Uncertainty and Confidence in PIDF-LO,” draft-thomson-geopriv-uncertainty-05 (work in progress), May 2010 (TXT). |
[PERLIN] | Perlin, K., “An Image Synthesizer,” ACM SIGGRAPH Computer Graphics v.19 n.3, p.287-296, July 1985. |
[RFC2104] | Krawczyk, H., Bellare, M., and R. Canetti, “HMAC: Keyed-Hashing for Message Authentication,” RFC 2104, February 1997 (TXT). |
[RFC3863] | Sugano, H., Fujimoto, S., Klyne, G., Bateman, A., Carr, W., and J. Peterson, “Presence Information Data Format (PIDF),” RFC 3863, August 2004 (TXT). |
[RFC4079] | Peterson, J., “A Presence Architecture for the Distribution of GEOPRIV Location Objects,” RFC 4079, July 2005 (TXT). |
[RFC4086] | Eastlake, D., Schiller, J., and S. Crocker, “Randomness Requirements for Security,” BCP 106, RFC 4086, June 2005 (TXT). |
[RFC4119] | Peterson, J., “A Presence-based GEOPRIV Location Object Format,” RFC 4119, December 2005 (TXT). |
[RFC5491] | Winterbottom, J., Thomson, M., and H. Tschofenig, “GEOPRIV Presence Information Data Format Location Object (PIDF-LO) Usage Clarification, Considerations, and Recommendations,” RFC 5491, March 2009 (TXT). |
[RFC5965] | Shafranovich, Y., Levine, J., and M. Kucherawy, “An Extensible Format for Email Feedback Reports,” RFC 5965, August 2010 (TXT). |
TOC |
This javascript implements the obscuring algorithm.
/** * Location obscurer: * var f = new GeoShape.Fuzzer(100, secret, target); * var reported = f.fuzz(known); * This object retains state. */ GeoShape.Fuzzer = function(dist, secret, targetIdentity) { this.distance = dist; var key = Hash.HMAC(secret, targetIdentity, Hash.SHA1); this.random = new GeoShape.UIRandom(key, dist); this.trigger = null; this.used = 0; return this; }; GeoShape.Fuzzer.prototype = { /** * Main obscuring function. * @param {GeoShape} a shape * @returns {GeoShape.GeoCircle} a fuzzed circle */ fuzz: function(shape) { var cu = shape.to2d().calculateCentroid(); /* * cu contains two attributes: * centroid: a WGS84 point uncertainty: a distance in metres */ if (!cu.uncertainty) { cu.uncertainty = 0; } if (this.hasMoved(cu.centroid)) { var addunc = Math.max(0, this.distance - cu.uncertainty); var centre = this.fuzzPoint(cu.centroid, addunc); var unc = Math.max(cu.uncertainty, this.distance); this.fuzzed = new GeoShape.GeoCircle(centre, unc); var td = this.distance / 2; this.trigger = this.randomize(cu.centroid, td); this.used = 0; } this.used++; return this.fuzzed; }, /** * Determine if the location has moved sufficient distance * from the trigger to require fuzzing. */ hasMoved: function(centroid) { if (!this.trigger) { return true; } return this.trigger.distanceTo(centroid) > this.distance; }, /** * Use a continuously varying random grid to move a point. */ fuzzPoint: function(point, dist) { this.random.reset(); var x = this.random.next(point.lat, point.lng) * 2 - 1; var y = this.random.next(point.lat, point.lng) * 2 - 1; if (x === 0 && y === 0) { return point; } var d = dist * Math.max(Math.abs(x), Math.abs(y)); var a; if (Math.abs(x) > Math.abs(y)) { a = y / x; } else { a = 2 - x / y; } if (y < -x) { a += 4; } return point.movePoint(d, a * Math.PI / 4); }, /** * Move a point randomly (polar method). */ randomize: function(point, dist) { var d = Math.sqrt(Math.random()) * dist; var a = Math.random() * 2 * Math.PI; return point.movePoint(d, a); } }; /** * A uniformly distributed pseudorandom number generator that * produces the same value for the same key, location and * grid size. * * @param secret a unique, secret key sequence * @param gridSize the desired size of the grid, in metres */ GeoShape.UIRandom = function(secret, gridSize) { this.key = secret; this.grid = 8 * gridSize * 9e-6; this.reset(); return this; }; GeoShape.UIRandom.prototype = { /** * Get next pseudorandom value for a latitude and longitude. */ next: function(lat, lng) { var lowlat = Math.floor(lat / this.grid) * this.grid; var bottom = this.interpLongitude(lowlat, lng); var top = this.interpLongitude(lowlat + this.grid, lng); var tlat = (lat - lowlat) / this.grid; this.rCount++; /* next time produces a different answer */ return this.uniformDistInterp(bottom, top, tlat); }, reset: function() { this.rCount = 0; }, /* Takes a point and produces a "random" value. */ hashRandom: function(lat, lng) { /* need to fix the lat and lng: 7 decimal places */ var flat = Math.round(lat * 1e7).toString(); var flng = Math.round(lng * 1e7).toString(); var input = [].concat(this.rCount, UTF8(flat), 0xff, UTF8(flng)); var h = Hash.HMAC(this.key, input, Hash.SHA1); var r = 0; for (var i = 0; i < h.length; ++i) { r ^= h[i] << ((i % 4) * 8); } /* add 0.5 to deal with sign bit */ return r / Math.pow(2, 32) + 0.5; }, /* interpolate a and b using t, with a uniform distribution */ uniformDistInterp: function(a, b, t) { var r = a * (1 - t) + b * t; if (r < t && r < (1 - t)) { r = r * r / 2 / t / (1 - t); } else if (r > t && r > (1 - t)) { r = 1 - (1 - r) * (1 - r) / 2 / t / (1 - t); } else { r = 0.5 + (r - 0.5) / Math.max(t, 1 - t); } return r; }, interpLongitude: function(lat, lng) { if (Math.abs(lat) >= 90) { return this.hashRandom((lat > 0) ? 90 : -90, lng); } var size = this.grid / Math.cos(lat * Math.PI / 180); if ((lng - size / 2) < -180 || (lng + size / 2) > 180) { var lngpos = (lng + 360) % 360; var rpos = this.interpLongSimple(lat, lngpos, size); var lngneg = lngpos - 360; var rneg = this.interpLongSimple(lat, lngneg, size); var t = ((lng + 360) % 360 - 180 - size / 2) / size; return this.uniformDistInterp(rpos, rneg, t); } return this.interpLongSimple(lat, lng, size); }, interpLongSimple: function(lat, lng, size) { var lowlng = Math.floor(lng / size) * size; var rlow = this.hashRandom(lat, lowlng); var rhigh = this.hashRandom(lat, lowlng + size); var t = (lng - lowlng) / size; return this.uniformDistInterp(rlow, rhigh, t); } };
TOC |
Martin Thomson | |
Andrew Corporation | |
Andrew Building (39) | |
Wollongong University Campus | |
Northfields Avenue | |
Wollongong, NSW 2522 | |
AU | |
Phone: | +61 2 4221 2915 |
Email: | martin.thomson@andrew.com |