Applications Area Working Group | F.E. Echtler
Internet-Draft | Munich Univ. of Applied Sciences
Intended status: Informational | March 07, 2011
Expires: September 08, 2011
GISpL: Gestural Interface Specification Language
draft-echtler-gispl-specification-00
This document introduces GISpL, the Gestural Interface Specification Language. GISpL enables the unambiguous description of gestures used in human-computer interfaces. This includes gestures on touch and multi-touch screens, with digital pens, with hand-held controllers or in free air. A matching engine analyzes motion data produced by the input device(s) and triggers registered gestures.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 08, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This document describes GISpL (Gestural Interface Specification Language), a formal language for describing human-computer interfaces which use gestures. The term "gesture" is used in a very wide sense here, meaning any motion of the user which can be captured by an input device.
As novel types of human-computer interfaces such as multi-touch screens, digital pens, hand-held controllers or even free-air gestures become more and more common, the number of special-purpose applications built to interact with these devices grows accordingly.
Using a dedicated formal language to describe the various types of gestural interaction which are possible with an application has advantages for various distinct groups:
GISpL serves two purposes:
Moreover, GISpL should be usable across a very wide range of platforms, even unconventional ones. One important example is the Firefox web browser, which has recently acquired the capability to deliver multi-touch events to web applications.
Consequently, the requirements regarding GISpL are:
Therefore, the decision was made to base GISpL on JSON [RFC4627]. Compared to XML, JSON strikes a better balance between human readability and code size. It is supported by nearly all programming languages and platforms.
GISpL consists of three core elements: regions, gestures and features.
For the full formal ABNF specification, please see the appendix. This section uses a relaxed syntax for easier readability, with rule names enclosed in angle brackets < >. JSON-related markup ( { } , [ ] "" ) is reproduced verbatim.
In the context of GISpL, the terms input object and input event are used as follows. An input object is any physical object which is detectable by the input hardware, such as a mouse, a Wiimote or the user's hand. Input objects are classified into one of 32 categories as given in the appendix. Every input object is assigned a unique ID number according to the capabilities of the hardware. For example, a uniquely identifiable tangible object always carries the same ID, while anonymous touch points on an interactive surface receive IDs that keep each touch point uniquely identifiable for the duration of the touch. An input event is a single spatial measurement regarding the position (and optionally, orientation, dimensions etc.) of an input object as captured by one or more of the sensors used.
Regions define spatial areas in which a certain set of gestures is valid and which capture motion data that falls within their boundaries. Regions are defined in one of two reference coordinate systems:
Regions are managed in an ordered list. Arriving input events are checked against this list, starting from the first element. If the position of the input event falls within the boundaries of the region and if the bit in the filter bitmask which corresponds to the category of the input event is set, then the input event is captured by the region. Captured events and their history are subsequently used to check whether one or more of the gestures attached to this region have been triggered.
    point       = [ <number>, <number>, <number> ]
    pointlist   = [ null/<point> *( , <point> ) ]
    regionflags = "poly" / "hull"
    region      = { "id":<string>, "flags":<regionflags>, "filters":<number>,
                    "points":<pointlist>, "gestures":<gesturelist> }
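For illustration, the following is one possible region object. The ID string, the filter value and the coordinates are purely hypothetical; the coordinates assume a coordinate system normalized to the range 0 to 1, and the single attached gesture refers by name to a gesture whose definition is assumed to come from the default pool described in the next section.

    {
      "id": "upper_left_quadrant",
      "flags": "poly",
      "filters": 31,
      "points": [ [0.0, 0.0, 0.0], [0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0], [0.0, 0.5, 0.0] ],
      "gestures": [ { "name": "tap", "flags": [ ], "features": [ ] } ]
    }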
Gestures appear in two variants: either as gesture specifications, describing which motions the user has to execute for a certain effect, or as gesture events, describing both the fact that a matching motion just took place and the motion data which triggered the event.
Whether a gesture object is a gesture event can be determined by looking at its flags. If they contain the string "result", then it is a gesture event and the features composing this gesture contain valid data in their "result" array. Otherwise, the object is a gesture specification and the features contain valid data in their "constraints" array.
When a gesture has the "oneshot" flag, then it can only be triggered once by a given set of input IDs. Repeated triggering is only possible when the set of IDs captured by the containing region changes.
When a gesture has the "default" flag set, it is added to a pool of default gestures. When a gesture specification with an empty feature list is encountered, then the name is looked up in the default gesture pool and if a match is found, the feature list is copied from the default gesture. Otherwise, the empty gesture is ignored.
    gesturelist  = [ null/<gesture> *( , <gesture> ) ]
    gestureflag  = "oneshot" / "default" / "result"
    gestureflags = [ null/<gestureflag> *( , <gestureflag> ) ]
    gesture      = { "name":<string>, "flags":<gestureflags>,
                     "features":<featurelist> }
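As a sketch of how the "default" and "oneshot" flags might be used, a hypothetical gesture named "tap" could first be registered as a default gesture (its feature list, indicated here in relaxed syntax, is defined once) and later attached to individual regions by name only, with an empty feature list that is filled from the default pool:

    { "name": "tap", "flags": [ "default", "oneshot" ], "features": <featurelist> }

    { "name": "tap", "flags": [ ], "features": [ ] }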
Features are the building blocks of gestures: atomic mathematical properties of the raw motion data, such as the average motion vector or the diameter of the convex hull of all motion points.
    featurelist = [ null/<feature> *( , <feature> ) ]
    featureitem = <point>/<number>
    itemlist    = [ null/<featureitem> *( , <featureitem> ) ]
    feature     = { "type":<string>, "filters":<number>,
                    "constraints":<itemlist>, "result":<itemlist> }
A feature is classified by its type, a filter bitmask for input events as described in Section 2.2, a type-specific list of constraints and a type-specific list of results. Depending on whether the containing gesture is a "result" gesture or not (see Section 2.3), either the constraint list or the result list will be empty. All temporal relations (such as the motion vector) are expressed relative to a time unit of one sensor frame; for example, for a sensor setup running at 60 Hz, the temporal unit is 16.66 ms.
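For illustration, a single feature might appear as follows, first inside a gesture specification (constraints filled, result empty) and then inside the corresponding gesture event (result filled). The feature type "Motion" is taken from the table below; the filter value and all numeric limits are hypothetical and expressed per sensor frame:

    { "type": "Motion", "filters": 31,
      "constraints": [ [-0.05, -0.01, 0.0], [0.05, 0.01, 0.0] ],
      "result": [ ] }

    { "type": "Motion", "filters": 31,
      "constraints": [ ],
      "result": [ [0.02, 0.001, 0.0] ] }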
Features are divided into two groups. The first group consists of single-match features, which generate only a single result instance regardless of the number of input objects.
Feature Type | Constraint Types | Result Types | Description |
---|---|---|---|
Motion | <point>, <point> | <point> | Average motion vector, lower and upper limits per component |
Rotation | <number>, <number> | <number> | Relative rotation angle in rad, lower and upper limits |
Scale | <number>, <number> | <number> | Relative size change, lower and upper limits |
Path | <point> *(, <point>) | <number> | Match for template path |
Count | <number>, <number> | <number> | Number of input objects, lower and upper limits |
Delay | <number>, <number> | <number> | Number of frames since first object entered, lower and upper limits |
Single-Match Features
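Combining single-match features, a hypothetical "two_finger_drag" gesture specification might look as follows. The Count feature restricts the gesture to exactly two input objects, while the Motion feature constrains the average motion vector per component; all names and numeric values are illustrative only:

    { "name": "two_finger_drag", "flags": [ ],
      "features": [
        { "type": "Count", "filters": 31,
          "constraints": [ 2, 2 ], "result": [ ] },
        { "type": "Motion", "filters": 31,
          "constraints": [ [-1.0, -1.0, 0.0], [1.0, 1.0, 0.0] ],
          "result": [ ] }
      ] }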
The second group consists of multi-match features, which can generate one result instance for each input object (depending on the constraint values).
Feature Type | Constraint Types | Result Types | Description |
---|---|---|---|
ObjectID | <number>, <number> | <number> | Match range of unique IDs (e.g. for tangible objects) |
ObjectParent | <number>, <number> | <number> | Match range of unique parent IDs (e.g. for fingers belonging to a specific user) |
ObjectPosition | <point>, <point> | <point> | Position, lower and upper limits per component |
ObjectDimension | <point>, <point>, <point>, <point>, <number>, <number> | <point>, <point>, <number> | Object dimensions, lower and upper limits per component (major axis, minor axis, size) |
ObjectGroup | <number>, <number>, <number> | <number>, <point> | Match group of input objects (min. count, max. count, max. radius). Result = count/centroid |
Multi-Match Features
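A multi-match feature can produce one result instance per matching input object. As a hypothetical example, the following ObjectID feature, given inside a gesture specification, would match tangible objects whose unique IDs fall between 100 and 200; the ID range and the filter value are arbitrary assumptions:

    { "type": "ObjectID", "filters": 4,
      "constraints": [ 100, 200 ], "result": [ ] }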
This section details the behavior of the gesture matching engine.
The following steps are performed every time an input event has been received:
The following steps are performed each time a complete set of input events has been received:
For illustration purposes, this section gives some usage examples of GISpL.
TBD
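As a preliminary sketch, the following shows how the elements defined above might combine: a rectangular region covering the full surface (assuming normalized coordinates) to which a "pinch" zoom gesture is attached, built from the Count and Scale features. All identifiers and numeric limits are illustrative assumptions rather than normative values.

    {
      "id": "zoom_area",
      "flags": "hull",
      "filters": 31,
      "points": [ [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0], [0.0, 1.0, 0.0] ],
      "gestures": [
        { "name": "pinch", "flags": [ ],
          "features": [
            { "type": "Count", "filters": 31,
              "constraints": [ 2, 2 ], "result": [ ] },
            { "type": "Scale", "filters": 31,
              "constraints": [ 0.0, 0.95 ], "result": [ ] }
          ] }
      ]
    }

When the matching engine detects a corresponding motion within this region, it would deliver a gesture event named "pinch" whose Scale feature carries the measured relative size change in its "result" array.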
Gesture events, if intercepted while in transit over the network, may disclose private data about the users' actions. Consequently, they should preferably be transmitted over secure channels such as private networks, VPNs or SSL-encrypted links.
This document has no actions for IANA.
[RFC5234] | Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008. |
[RFC4627] | Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006. |
The following formal specification of GISpL is given in ABNF [RFC5234]. JSON-related rule definitions (e.g. <number>, <string> ...) can be found in [RFC4627].
    quote        = quotation-mark
    point        = begin-array number value-separator number
                   value-separator number end-array
    item         = point / number
    itemlist     = begin-array item *( value-separator item ) end-array
    pointlist    = begin-array point *( value-separator point ) end-array
    gesturelist  = begin-array gesture *( value-separator gesture ) end-array
    featurelist  = begin-array feature *( value-separator feature ) end-array
    filters      = quote "filters" quote name-separator number value-separator
    regionid     = quote "id" quote name-separator string value-separator
    regionflags  = quote "flags" quote name-separator
                   ( ( quote "poly" quote ) / ( quote "hull" quote ) )
                   value-separator
    gesturename  = quote "name" quote name-separator string value-separator
    gestureflag  = ( quote "oneshot" quote ) / ( quote "default" quote ) /
                   ( quote "result" quote )
    gestureflags = quote "flags" quote name-separator begin-array
                   gestureflag *( value-separator gestureflag ) end-array
                   value-separator
    featuretype  = quote "type" quote name-separator string value-separator
    results      = quote "result" quote name-separator itemlist
    constraints  = quote "constraints" quote name-separator itemlist
                   value-separator
    feature      = begin-object featuretype constraints results end-object
    gesture      = begin-object gesturename gestureflags featurelist end-object
    region       = begin-object regionid regionflags filters pointlist
                   value-separator gesturelist end-object
TBD - see TUIO 2.0 spec