Re: [wcsplus] more on asynchronous response

To: Paolo Mazzetti <mazzetti@xxxxxxxxxxx>
Subject: Re: [wcsplus] more on asynchronous response
From: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
Date: Tue, 23 Oct 2007 18:11:03 -0600

Hi Paolo,

Paolo Mazzetti wrote:

Hi Ethan,
I am trying to summarize our respective positions and find a common point of view useful to finalize a discussion paper. These are my opinions and tentative conclusions. Since I think that these issues involve too much technical detail for the mailing list, I am sending my thoughts directly to you (and Stefano in cc).

I hope you don't mind, I'm CCing the list because I think a number of others would be interested in these details. Also, the other reason for the list is to archive the discussions and I'd really like to keep all of this conversation in one place.

(Sorry to make any uninterested parties hit the "delete" key more than necessary. If anyone really wants this conversation taken off-list, let us know.)

a) On resources and representation. I agree with your interpretation of what resources and representations are in the WCS domain in the sense that different subsets, interpolation, etc. identify different resources and not simply different representations. This means that the query string parameters are not the set of input parameters for a single processing service resource, but actually parts of different resource identifiers. (Indeed only the parameter FORMAT should be considered as affecting the representation and not identifying the resource. In a perfect REST world its content should be provided in the Accept header field.) Our (Stefano's and mine) previous note speaking of 'representation' storage was misleading.
In my opinion, what is provided by the possible redirection is not a new resource but a (temporary) URI which is an alias of the original URI for the same resource (a resource can have more than one URI). For example the resource http://someserver.net/coverages/foo?bbox=... is assigned a temporary identifier http://someserver.net/coverages/temp/xyz. Anyway the resource is still retrievable at the original (and authoritative) URI. This alias is useful because, for example, during its period of validity retrieving the resource representation could be faster than retrieving it from the original (canonical) URI.

b) On creation and redirection. Taking into account also the previous interpretation I still prefer the redirection response (302 code). In particular, I think that a GET should not create any resource. RFC 1945 (HTTP/1.0) explicitly stated that "/Of the methods defined by this specification, only POST can create a resource./" In HTTP/1.1 this statement was removed, I suppose, because of the introduction of methods other than GET, HEAD and POST, but I think that its original meaning (GET and HEAD methods cannot create resources) should remain valid. Moreover I think that 302 responses could be cached and the URI provided used more than once. The RFC says that "/Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field./" (upper case as in the original). I interpret it in a weak sense such as "If you are not sure about the validity of the redirection then use the original URI", but if the server knows the redirection validity it can provide it in the header and the client can refer to it.
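
The weak reading of the RFC described in b) could be sketched on the client side roughly as follows. This is only an illustration under stated assumptions: the function name redirect_valid_until is hypothetical, the headers are assumed to arrive as a plain dict with canonical header names, and only the max-age, no-cache and no-store directives are considered.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def redirect_valid_until(headers, received_at):
    """Return the time until which a 302 alias URI may be reused, or None.

    None means "no validity information": the client should keep using
    the original Request-URI, as the RFC recommends.
    `headers` is assumed to be a plain dict (hypothetical input shape).
    """
    cache_control = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                seconds = int(directive[len("max-age="):])
            except ValueError:
                return None
            return received_at + timedelta(seconds=seconds)
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return None
    return None


# Usage: the server states the alias is good for one hour.
received = datetime(2007, 10, 24, 0, 0, tzinfo=timezone.utc)
valid_until = redirect_valid_until({"Cache-Control": "max-age=3600"}, received)
```

A client would then use the alias URI while the current time is before valid_until and fall back to the original URI otherwise.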

I think dealing with asynchronous responses requires a flexible view of GET vs POST, creation, and "resource". An asynchronously created resource is, in general, only temporarily available and so doesn't affect the long-term state of the system. Even if the response is stored more permanently it still does not change the state of the system as the stored resource could be requested again with the original URI.

The key point in my thinking then is the intent of the request. The intent is to retrieve a resource and not change the underlying data (or cause any other "side-effects"). So, I think the intent of the request is both "safe" and "idempotent" in which case GET seems appropriate.

Of course, that is for a server-determined asynchronous response. When a client makes a "store=true" request, the intent of the request is to create a new (though possibly temporary) resource. [Idempotent but not safe?] So, maybe a POST is more appropriate in this case.

Concerning the other two points that you touched on in your last email, these are my opinions:
1) delayed/non-stored/pull case
What happens if two users make the same request around the same time? Does the server have to do the same processing twice?
Yes, I think that if two users make the same request then the server has to do the same processing twice. (Obviously a smart server could recognize that the requests are the same and make use of a sort of internal cache, but this is an implementation problem. By the way, it is not easy to recognize that two requests are the same, in particular due to the query string, which is made of non-hierarchical parameters. E.g. two requests could differ only in the order of the parameters.)

And even worse, a small difference in a BBOX value might result in the same resource.
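
To make the "same request" problem concrete, here is a minimal sketch of how a server might derive a cache key from a request URI. Everything in it is an assumption for illustration, not something agreed in this thread: the function name canonical_request_key, case-insensitive parameter names, a comma-separated numeric BBOX, and rounding BBOX values to a fixed number of decimals as a stand-in for "maps to the same resource" (a real server would more likely snap to its coverage grid).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_request_key(uri, bbox_precision=4):
    """Return a key that is equal for requests considered equivalent.

    Assumptions (hypothetical): parameter names are case-insensitive,
    parameter order is irrelevant, and BBOX values that agree after
    rounding to `bbox_precision` decimals identify the same resource.
    """
    parts = urlsplit(uri)
    params = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        name = name.lower()
        if name == "bbox":
            # Normalize each coordinate to a fixed number of decimals.
            value = ",".join(
                f"{float(v):.{bbox_precision}f}" for v in value.split(",")
            )
        params.append((name, value))
    params.sort()  # make the key independent of parameter order
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path}?{urlencode(params)}"


# Usage: same request with reordered parameters and a tiny BBOX difference.
key_a = canonical_request_key(
    "http://someserver.net/coverages/foo?bbox=-110,35,-100,45&format=NetCDF"
)
key_b = canonical_request_key(
    "http://someserver.net/coverages/foo?FORMAT=NetCDF&BBOX=-110.0,35.00001,-100,45"
)
```

Under these assumptions key_a and key_b are equal, so the second request could be served from an internal cache. Values that straddle a rounding boundary would still get different keys, which is part of why this is hard to get right.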

Why would anyone prefer the delayed/non-stored/pull case over delayed/stored/pull?
From the client's point of view the non-stored use case has only the (really small) advantage of avoiding the redirection. But the server has other advantages (especially in terms of simplicity) and could decide not to support the stored use case for all or some of its resources.
Ah ha. Upon re-reading the "202 Accepted" section of the HTTP spec, I realize that there is nothing in the spec that says anything about the results of the accepted processing. The 202 response seems to have been targeted only at requests for processing where knowing it has been completed is all that is important. Not, as I have interpreted it, that processing is done and may have resulted in a new resource (all encoded in the body of the response or the results of a status monitor). I think our interpretation of the 202 response is the root of the difference in some of our responses. Though I still find the 202 response the cleanest mapping to an asynchronous response. Whether the accepted processing results in an externally accessible artifact or not, the 202 response seems to capture what is going on. It is up to the body of the 202 response and any response to the "status monitor" to communicate information about any artifacts of the accepted processing.
Yes, the 202 specification is very plain. By sending 202 the server informs the client that the request has been accepted but gives no other information about the processing. It simply avoids keeping the connection open for long-running processes. It seems to be designed as the minimal basis for allowing asynchronous interaction over HTTP. It can be used as is for a polling approach. Any richer semantics is delegated to the body content. This is the reason we should define an (XML?) schema for providing information about processing status/result.

I definitely agree that we need to define some XML schema to provide this information.

Taking into account all the previous points we could consider the following approach for asynchronous operations:
a) the client performs a GET on URI Ures
b) if the availability is delayed, the server sends a 202 providing a link to a status monitor resource (identified by the URI Ustatus)
c) the client observes the status monitor (by polling or with a push approach in the future)
d) when the resource is available the status monitor responds:
   d1) 200 and content if storage is not required
   d2) 302 with redirection to alias URI U2 and expiration information (if storage is required)
I think that this approach could be considered really close to what the RFC says. Let me know what you think.

That sounds good. Though I think of the status monitor as an extension of the body of the 202 response (which is the XML document mentioned above that we need to define). Perhaps this is part of why I had not thought of using redirects. I see this status monitor XML document as removing the asynchronous response from the realm of the HTTP specification (sort of) and instead moving it into the xlink:href world. So, rather than the status monitor response code redirecting us to the new resource, the body of the status monitor response would indicate the new resource was available and provide a link to the new resource. So, here's my take:


a) client GETs the Ures URI
b) if delayed, response code 202 with
   b1) Location header providing status monitor URI (Ustatus)
   b2) Body containing XML document with status, estimate of completion, and link to status monitor URI (Ustatus)
c) client GETs the Ustatus URI:
   c1) if still not available, response code 200 with XML document same as response b2 (maybe without Ustatus link).
   c2) if available, response code 200 with XML document (similar to response b2?) that indicates the resource is ready and provides a link to the resulting resource.


Some very simple XML possibilities ...
For b)
<asynchResponse status="processing" completionEstimate="2007-10-24T02:34">
 <statusMonitor xlink:href="some URI" />
</asynchResponse>

For c1)
<asynchResponse status="processing" completionEstimate="2007-10-24T02:34" />

For c2)
<asynchResponse status="done">
 <generatedResource xlink:href="some URI" />
</asynchResponse>
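
For what it's worth, a client could interpret the three documents above along these lines. This is a rough sketch, not a proposal for the schema: the function name interpret and the returned action names are hypothetical, the element and attribute names are taken from the examples above, and the samples declare the xlink namespace on the root element (the examples above omit the declaration, and a namespace-aware parser will reject an unbound prefix).

```python
import xml.etree.ElementTree as ET

# Attribute name of xlink:href once the namespace prefix is resolved.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def interpret(status_code, body):
    """Map a response (b, c1 or c2 above) to the client's next action.

    Returns ("poll", status_monitor_uri_or_None) while processing, or
    ("fetch", resource_uri) once the generated resource is ready.
    """
    if status_code not in (200, 202):
        raise ValueError(f"unexpected response code {status_code}")
    root = ET.fromstring(body)
    if root.tag != "asynchResponse":
        raise ValueError(f"unexpected root element {root.tag}")
    status = root.get("status")
    if status == "done":
        resource = root.find("generatedResource")
        if resource is None:
            raise ValueError("status is done but no generatedResource link")
        return ("fetch", resource.get(XLINK_HREF))
    if status == "processing":
        monitor = root.find("statusMonitor")
        # Response c1 may omit the link; the client already knows Ustatus.
        return ("poll", monitor.get(XLINK_HREF) if monitor is not None else None)
    raise ValueError(f"unknown status {status!r}")


# Sample documents (URIs are placeholders for illustration only).
NS = 'xmlns:xlink="http://www.w3.org/1999/xlink"'
RESPONSE_B = (
    f'<asynchResponse {NS} status="processing" completionEstimate="2007-10-24T02:34">'
    '<statusMonitor xlink:href="http://someserver.net/status/xyz" />'
    "</asynchResponse>"
)
RESPONSE_C1 = '<asynchResponse status="processing" completionEstimate="2007-10-24T02:34" />'
RESPONSE_C2 = (
    f'<asynchResponse {NS} status="done">'
    '<generatedResource xlink:href="http://someserver.net/coverages/temp/xyz" />'
    "</asynchResponse>"
)
```

A polling client would call interpret on each response, sleep (perhaps until completionEstimate) while the action is "poll", and GET the returned URI once the action is "fetch".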


Best regards,
  Paolo


Thanks. This is a great discussion.

Ethan

--
Ethan R. Davis                                Telephone: (303) 497-8155
Software Engineer                             Fax:       (303) 497-8690
UCAR Unidata Program Center                   E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO  80307-3000                       http://www.unidata.ucar.edu/
---------------------------------------------------------------------------
