This document describes an IMAP protocol extension that enables a server to perform searches with inexact matching and to assign relevancy scores to matched messages.
A revised version of this draft document will be submitted to the RFC editor as a Proposed Standard for the Internet Community. Discussion and suggestions for improvement are requested, and should be sent to morg@ietf.org.
This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 27, 2010.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.
In examples, "C:" indicates lines sent by a client that is connected to a server. "S:" indicates lines sent by the server to the client.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [Kwds].
When humans perform searches in IMAP clients, they typically want to see the most relevant search results first. IMAP servers are able to do this in the most efficient way when they're free to internally decide how searches should match messages. This document describes a new SEARCH=FUZZY extension that provides such functionality.
The FUZZY search key takes another search key as its argument. The server is allowed to perform all matching for this search key in an implementation-defined manner. Typically this would be used to search for strings, for example:
C: A01 SEARCH FUZZY (SUBJECT "IMAP break")
S: * SEARCH 1 5 10
S: A01 OK Search completed.
Besides matching messages with the subject "IMAP break", the above search may also match messages with subjects "broken IMAP", "IMAP is broken", or anything else the server decides might be a good match.
Servers SHOULD assign a search relevancy score to each matched message when the FUZZY search key is given. Relevancy scores are given in the range 1-100, where 100 is the highest relevancy. The relevancy scores SHOULD use the full 1-100 range, so that clients can show them to users in a meaningful way, such as a percentage value.
As the name implies, a relevancy score specifies how relevant the matched message is to the search. This is not necessarily the same as how precisely the message matched. For example, a message whose subject fuzzily matches the search string might get a higher relevancy score than a message whose body contains the exact string in the middle of a sentence.
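One way a server might meet the recommendation to use the full 1-100 range is to rescale its internal scores linearly. The following is a minimal Python sketch, assuming hypothetical raw scores in which a higher value means a more relevant message. The function name `scale_scores` and the linear mapping are illustrative assumptions and are not defined by this document.

```python
def scale_scores(raw_scores):
    """Map arbitrary numeric raw scores onto the integer range 1-100."""
    if not raw_scores:
        return []
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All matches are equally relevant; report the top score for each.
        return [100] * len(raw_scores)
    # The least relevant match maps to 1, the most relevant to 100.
    return [1 + round(99 * (s - lo) / (hi - lo)) for s in raw_scores]
```

A server that prefers scores to stay comparable across different searches might instead use a fixed mapping; the document leaves this choice to implementations.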
If the server advertises the ESEARCH capability as defined by [ESEARCH], the relevancy scores can be retrieved using the new RELEVANCY return option for SEARCH:
C: A02 SEARCH RETURN (RELEVANCY ALL) FUZZY TEXT "Helo"
S: * ESEARCH (TAG "A02") ALL 1,5,10 RELEVANCY (4 99 42)
S: A02 OK Search completed.
The RELEVANCY return option MUST NOT be used unless the FUZZY search key is also given.
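A client might pair the returned scores with message numbers roughly as follows. This Python sketch assumes, based on the examples in this document, that scores are listed in the same order as the messages in the ALL result. It handles only the response shape shown in the A02 example (ALL immediately followed by RELEVANCY) and does not handle "*" in sequence sets. The function names are hypothetical.

```python
import re

def expand_sequence_set(seq_set):
    """Expand an IMAP sequence-set such as "1,5:7" into [1, 5, 6, 7]."""
    numbers = []
    for part in seq_set.split(","):
        if ":" in part:
            start, end = (int(n) for n in part.split(":"))
            if start > end:
                start, end = end, start
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers

def parse_relevancy(line):
    """Return {message number: score} from an ESEARCH response line."""
    match = re.search(r"\bALL (\S+) RELEVANCY \(([0-9 ]*)\)", line)
    if match is None:
        raise ValueError("no ALL/RELEVANCY data in response")
    messages = expand_sequence_set(match.group(1))
    scores = [int(s) for s in match.group(2).split()]
    if len(messages) != len(scores):
        raise ValueError("score count does not match message count")
    return dict(zip(messages, scores))

print(parse_relevancy('* ESEARCH (TAG "A02") ALL 1,5,10 RELEVANCY (4 99 42)'))
# {1: 4, 5: 99, 10: 42}
```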
Fuzzy matching is not limited to just string matching. All search keys SHOULD be matched fuzzily, although what exactly that means for different search keys is left up to server implementations to decide -- including deciding that fuzzy matching is meaningless for a particular key, and falling back to exact matching. Some suggestions are given below.
Dates: A typical example could be when a user wants to find a message "from Dave about a week ago". A client could perform this search using SEARCH FUZZY (FROM "Dave" SINCE 21-Jan-2009 BEFORE 24-Jan-2009). The server could return messages outside the specified date range, but the further away the message is, the lower the relevancy score.
Sizes: These should be handled similarly to dates. If a user wants to search for "about 1 MB attachments", the client could do this by sending SEARCH FUZZY (LARGER 900000 SMALLER 1100000). Again, the further away the message size is from the specified range, the lower the relevancy score.
Flags: The server could return messages that don't have the specified flags, but with a lower relevancy score.
UIDs, sequences, modification sequences: These are examples of keys for which exact matching probably makes the most sense. Alternatively, a server might choose, for instance, to expand a UID range by 5% on each side.
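The date and size suggestions above share one idea: full relevancy inside the requested range, and a lower score the further a value lies outside it. A minimal Python sketch of that idea follows. The linear decay, the `tolerance` parameter, and the function name are assumptions made for illustration, and the range bounds are treated as inclusive for simplicity.

```python
def range_relevancy(value, low, high, tolerance):
    """Score 100 inside [low, high], decaying linearly with distance outside.

    Returns 0 (meaning "no match") once the value is `tolerance` or more
    away from the range; matched values always score between 1 and 100.
    """
    if low <= value <= high:
        return 100
    distance = low - value if value < low else value - high
    if distance >= tolerance:
        return 0
    return max(1, round(100 * (1 - distance / tolerance)))

# Sizes from the "about 1 MB" example: LARGER 900000 SMALLER 1100000.
print(range_relevancy(1000000, 900000, 1100000, 500000))  # 100
print(range_relevancy(1350000, 900000, 1100000, 500000))  # 50
print(range_relevancy(2000000, 900000, 1100000, 500000))  # 0
```

The same function could score dates by expressing them as day counts. A real server would likely combine such a per-key score with the scores of the other search keys in the same FUZZY expression.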
If the server advertises the SORT capability as defined by [SORT], the results can be sorted by the new RELEVANCY sort criterion:
C: A03 SORT (RELEVANCY) UTF-8 FUZZY SUBJECT "Helo"
S: * SORT 5 10 1
S: A03 OK Sort completed.
The message with the highest score is returned first. As with the RELEVANCY return option, the RELEVANCY sort criterion MUST NOT be used unless the FUZZY search key is also given.
If the server advertises the ESORT capability as defined by [CONTEXT], the relevancy scores can be retrieved using the new RELEVANCY return option for SORT:
C: A04 SORT RETURN (RELEVANCY ALL) (RELEVANCY) UTF-8 FUZZY TEXT "Helo"
S: * ESEARCH (TAG "A04") ALL 5,10,1 RELEVANCY (99 42 4)
S: A04 OK Sort completed.
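The A04 result is the A02 result reordered by descending score. The short Python sketch below reproduces that ordering from the A02 scores. The function name is hypothetical, and how a server orders messages with equal scores is not specified by this document, so the tie behaviour here (Python's stable sort) is only an assumption.

```python
def sort_by_relevancy(scores):
    """Return message numbers ordered from highest to lowest relevancy score."""
    return sorted(scores, key=lambda msg: scores[msg], reverse=True)

# Message number -> score, taken from the A02 example.
a02_scores = {1: 4, 5: 99, 10: 42}
ordered = sort_by_relevancy(a02_scores)
print(ordered)                           # [5, 10, 1]
print([a02_scores[m] for m in ordered])  # [99, 42, 4]
```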
The following syntax specification uses the Augmented Backus-Naur Form (ABNF) as described in [ABNF]. It includes definitions from [RFC3501], [IMAP-ABNF] and [SORT].
   capability         =/ "SEARCH=FUZZY"
   score              =  1*3DIGIT
                         ;; (1 <= n <= 100)
   score-list         =  "(" [score *(SP score)] ")"
   search-key         =/ "FUZZY" SP search-key
   search-return-data =/ "RELEVANCY" SP score-list
                         ;; Conforms to <search-return-data>, from [IMAP-ABNF]
   search-return-opt  =/ "RELEVANCY"
                         ;; Conforms to <search-return-opt>, from [IMAP-ABNF]
   sort-key           =/ "RELEVANCY"
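The score-list rule and its range comment can be checked mechanically. The following Python sketch is one possible validator, not a normative parser: the regular expression mirrors the score-list production, and the 1 <= n <= 100 constraint from the ABNF comment is enforced separately. The names `SCORE_LIST` and `parse_score_list` are hypothetical.

```python
import re

# score-list = "(" [score *(SP score)] ")", with score = 1*3DIGIT
SCORE_LIST = re.compile(r"\((?:[0-9]{1,3}(?: [0-9]{1,3})*)?\)")

def parse_score_list(text):
    """Parse a score-list string into a list of ints, enforcing 1 <= n <= 100."""
    if SCORE_LIST.fullmatch(text) is None:
        raise ValueError("not a valid score-list: %r" % text)
    scores = [int(token) for token in text[1:-1].split()]
    for score in scores:
        if not 1 <= score <= 100:
            raise ValueError("score out of range: %d" % score)
    return scores

print(parse_score_list("(4 99 42)"))  # [4, 99, 42]
print(parse_score_list("()"))         # []
```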
This document is believed not to have any security implications.
IMAP4 capabilities are registered by publishing a standards track or IESG approved experimental RFC. The registry is currently located at:
http://www.iana.org/assignments/imap4-capabilities
This document defines the X-DRAFT-I02-SEARCH=FUZZY IMAP capability (Note to RFC Editor: fix before publication). IANA is requested to add it to the registry.
Alexey Melnikov, Zoltan Ordogh and Barry Leiba have helped with this document.
[ABNF] Crocker, D., Ed. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” RFC 5234, January 2008.
[CONTEXT] Cridland, D. and C. King, “Contexts for IMAP4,” RFC 5267, July 2008.
[ESEARCH] Melnikov, A. and D. Cridland, “IMAP4 Extension to SEARCH Command for Controlling What Kind of Information Is Returned,” RFC 4731, November 2006.
[IMAP-ABNF] Melnikov, A. and C. Daboo, “Collected Extensions to IMAP4 ABNF,” RFC 4466, April 2006.
[Kwds] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” RFC 2119, March 1997.
[RFC3501] Crispin, M., “INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1,” RFC 3501, March 2003.
[SORT] Crispin, M. and K. Murchison, “Internet Message Access Protocol - SORT and THREAD Extensions,” RFC 5256, June 2008.
Timo Sirainen
Email: tss@iki.fi