Re: THREDDS API Question

To: Ethan Davis <edavis@xxxxxxxxxxxxxxxx>
Subject: Re: THREDDS API Question
From: Nathan Potter <ndp@xxxxxxxxxxx>
Date: Mon, 11 Jun 2007 16:47:20 -0700



Ethan,

I've been thinking about it too. It is probably the case that this is an intractable issue. The THREDDS architecture benefits from the ability to write catalogs independent of the data source/archive. THREDDS is designed so that I can write a catalog and serve it on my system, while the data sources referenced in the catalog may exist on different systems - often far away and outside my sphere of influence.

The idea that the Access Link (the link that THREDDS generates that allows data access) can be used to backtrack into the THREDDS catalog is at cross purposes with the THREDDS design. It would be possible for a particular server implementation to have very carefully constructed THREDDS catalogs that would allow it to do this, but in the general case it is simply not possible.


At least as far as I can see it.

Nathan



On Jun 11, 2007, at 4:09 PM, Ethan Davis wrote:

Hi Nathan,
Sorry this is taking a while. I'm trying to figure out some of the trade-offs and such involved in a variety of ways of handling this. I should have a more detailed response tomorrow.
Ethan


Nathan Potter wrote:
Ethan et al.,
After talking with Ethan on the phone today I think I can state the issue more clearly:
The current THREDDS Servlet Framework (TSF) does not allow the collection/dataset information to be retrieved via the request URL.
The API method DataRootHandler.getCatalog(java.lang.String path, java.net.URI baseURI) expects the "path" parameter to be the path in the THREDDS catalog to the catalog file. There is no restriction on the file name of the catalog file. The path in the THREDDS catalog to the file may be different from the access URL.
What this means is that when a servlet receives an access request, even one that comes from a valid access link in a THREDDS catalog (.html), the servlet only knows about the request URL, nothing more. If the servlet needs to get the THREDDS dataset/collection information (and associated metadata, if any) then it has no recourse but to attempt to search the catalog from the highest level looking for a dataset with a matching "urlPath" attribute. This activity may fail if:
- The THREDDS catalog employs <catalogRef> elements.

- The "urlPath" is not unique within the catalog.
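The two failure modes listed above can be illustrated with a small, self-contained sketch. It does not use the thredds.catalog classes: CatalogNode and UrlPathSearch are hypothetical stand-ins invented for this illustration, and the tree is built by hand rather than parsed from a catalog file. The assumption is that a catalogRef entry appears in memory only as a pointer, so anything defined in the referenced file is invisible to a top-down search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a catalog entry; NOT a thredds.catalog class.
class CatalogNode {
    final String name;
    final String urlPath;        // null for containers that have no access URL
    final boolean isCatalogRef;  // true if the entry only points at another catalog file
    final List<CatalogNode> children = new ArrayList<>();

    CatalogNode(String name, String urlPath, boolean isCatalogRef) {
        this.name = name;
        this.urlPath = urlPath;
        this.isCatalogRef = isCatalogRef;
    }

    CatalogNode add(CatalogNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical helper that performs the top-down search described above.
class UrlPathSearch {
    // Returns every node whose urlPath equals the target.
    // An empty list means "not found"; more than one entry means the match is ambiguous.
    static List<CatalogNode> findByUrlPath(CatalogNode root, String urlPath) {
        List<CatalogNode> hits = new ArrayList<>();
        collect(root, urlPath, hits);
        return hits;
    }

    private static void collect(CatalogNode node, String urlPath, List<CatalogNode> hits) {
        if (node.isCatalogRef) {
            return; // the referenced catalog is not loaded, so its datasets cannot be seen
        }
        if (urlPath.equals(node.urlPath)) {
            hits.add(node);
        }
        for (CatalogNode child : node.children) {
            collect(child, urlPath, hits);
        }
    }
}
```

A caller can only trust the result when the returned list has exactly one element; zero hits may simply mean the dataset lives behind a catalogRef, and two or more hits give no way to choose between them.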
I think that the TSF API should be augmented with accessor methods that allow the DataRootHandler to return an InvDataset or an InvCatalog based on information that a servlet has access to at run time, i.e. data that can be retrieved from the HttpServletRequest object.
Nathan





On Jun 4, 2007, at 5:00 PM, Nathan Potter wrote:
On Jun 4, 2007, at 1:05 PM, Ethan Davis wrote:
Hi Nathan,
Can you explain the context for these questions? This is on the server side (in Hyrax)?
Yes, server side.
Nathan Potter wrote:
Greetings,
So I am using the THREDDS API in an attempt to get the <property> elements for a dataset. I've run into a couple of (possibly related) problems.
Just to clarify our terminology. When you say "THREDDS API" you mean both the thredds.catalog and thredds.servlet packages? I generally split those apart and call the thredds.catalog package the "THREDDS Catalog API" and call the thredds.servlet package the "THREDDS Servlet Framework" (TSF).
[Note: the TSF is probably only useful for those writing servers.]
I wasn't distinguishing. But since DataRootHandler is in the TSF then that is where I am suggesting an API change.
** 1) I can't get the dataset information without searching.

In the HttpServletRequest I have the URL for the dataset, say:

http://localhost:8080/opendap/wcs/MODIS/Grid/test.hdf.html
Is this URL for an OPeNDAP HTML response?
Right, but the requested response isn't really meaningful in this discussion since all I am really after is the THREDDS dataset information for the atom/leaf/dataset test.hdf.
Are you trying to get the property from the THREDDS catalog so you can use it in the OPeNDAP response?
Well... In truth it's much more complex than that, but since I will have to do that too we can roll with that vision for the moment.
In order for me to get THREDDS to divulge the <property> elements for the dataset I have to:
- take the dataset name "wcs/MODIS/Grid/test.hdf.html" and backtrack to the
  collection name, "wcs/MODIS/Grid/".
- ask the DataRootHandler for the InvCatalog for "wcs/MODIS/Grid/"
- Ask the InvCatalog for the InvDataset for "wcs/MODIS/Grid/"
- Search the child datasets of the "wcs/MODIS/Grid/" InvDataset for the
  one whose name (lexically) matches "wcs/MODIS/Grid/test.hdf.set"
- Read the properties of that InvDataset
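The first step in the list above, going from the requested path back to the collection path, is plain string handling and can be sketched on its own. RequestPaths and both of its methods are hypothetical names introduced for this illustration, not part of the THREDDS API, and the sketch assumes the response extension (here ".html") is known in advance.

```java
// Hypothetical helper; not part of the thredds.catalog or thredds.servlet packages.
class RequestPaths {
    // Removes a known response extension such as ".html" from the end of the path.
    // The path is returned unchanged if it does not end with that extension.
    static String stripSuffix(String path, String suffix) {
        if (path.endsWith(suffix)) {
            return path.substring(0, path.length() - suffix.length());
        }
        return path;
    }

    // Returns everything up to and including the last '/', i.e. the collection path.
    // A path with no '/' has no parent collection, so the empty string is returned.
    static String collectionOf(String datasetPath) {
        int slash = datasetPath.lastIndexOf('/');
        return slash < 0 ? "" : datasetPath.substring(0, slash + 1);
    }
}
```

For example, stripSuffix("wcs/MODIS/Grid/test.hdf.html", ".html") yields "wcs/MODIS/Grid/test.hdf", and collectionOf of that result yields "wcs/MODIS/Grid/". The remaining steps depend on the catalog objects themselves and are not covered here.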
That seems awfully complex. (Of course there may be a more straightforward way that I am not aware of.)
That is about as simple as it gets. Though I would suggest you make sure the THREDDS configuration (TSF) knows about this dataset first by getting the CrawlableDataset that matches the dataset URL:
      DataRootHandler.getCrawlableDataset("wcs/MODIS/Grid/test.hdf")
      // I dropped off the trailing ".html" assuming it was the OPeNDAP dataset URL extension
When I tried this I could only get CrawlableDataset objects for catalogs that were part of a <datasetScan>.
Are you using InvDataset.findDatasetByName( String name) to find the child dataset?
No.
Also, depending on how you set up your dataset IDs, you could ask the catalog to find the dataset by ID, like
      cat.findDatasetByID( "wcs/MODIS/Grid/test.hdf")
Ahhh... I just tried that and it works. So, that greatly simplifies that step, thanks!
** 2) When I ask for a catalog I have to know the name of the XML file in which it resides.
In the above example, when I ask the DataRootHandler for the InvCatalog I ask for "wcs/MODIS/Grid/catalog.xml", which is all well and good if all of the catalogs are stored in files called catalog.xml. Essentially this means that anyone configuring a THREDDS catalog has to create a hierarchy of directories that mimics the organization of the collections, and all of the THREDDS information must be stored in files called "catalog.xml".
Why do you need to create this hierarchy of directories mimicking the data collection hierarchy? The TSF should keep track of your config catalogs and the automatically generated catalogs.
Right, but if all of the THREDDS catalog files have the name "catalog.xml" they can't all be in the same directory, so they have to live in some kind of directory hierarchy - I just figured it made sense to mimic the collection organization, but that's not necessary.
THREDDS does not actually require this - I can make a complex hierarchy of collections by using either a single (complex) top level catalog.xml file, or a collection of XML files in a single directory that employ <catalogRef> elements to create their organizations.
However the API breaks down in both cases.
If the catalog is composed of a collection of XML files in a single directory that employ <catalogRef> elements to create their organizations, then in order to retrieve catalog information I would have to KNOW how the information was organized (file names, directory hierarchy, etc.). But I don't know - since the catalog may be created by a user after compile time (although THREDDS does know this since it parsed all of the catalog information at start up) - and I shouldn't have to know. For me to know would require that I parse the top level catalog.xml file and build the XML doc tree myself. At which point I can get the elusive <property> elements from the XML doc in memory.
If the catalog is composed of a single (complex) top level catalog.xml file then I would have to know that and just ask for the top level catalog.
(Searching the entire catalog from the top down for my dataset doesn't seem to work either...)
I'm sorry, I'm having a hard time following here. What are you trying to do and why?
For any request that is looking for one of the OPeNDAP data responses I need to search the THREDDS catalog for the dataset, and if found, I need to extract any metadata that may be in the catalog for that dataset.
Is the problem that you may not know if the dataset is contained in a catalog generated because of a datasetScan element or contained directly in one of the THREDDS config catalogs?
I think that's a separate issue.
All of these methods of writing and organizing catalogs are legitimate in THREDDS, and users writing THREDDS catalogs would likely employ one or more of these methods when writing their catalogs.
I propose that the THREDDS API be extended so that one can simply ask the DataRootHandler for an InvDataset or an InvCatalog. Like:
    InvDataset id = drh.getDataSet("wcs/MODIS/foo.nc");
    InvCatalog ic = drh.getCatalog("wcs/MODIS/");

or possibly the InvDataset that represents a collection:

    InvDataset id = drh.getDataSet("wcs/MODIS/");


If the DataRootHandler doesn't have it, return null.
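One way to picture the proposed accessor is as a lookup table keyed by path and filled in while the catalogs are parsed at start up. The sketch below is only an illustration of that idea under stated assumptions: DatasetInfo stands in for InvDataset, PathIndexedHandler stands in for DataRootHandler, and the register method is invented here. None of these names exist in the TSF, and this is not a claim about how the TSF stores its catalogs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for InvDataset: a path plus its <property> name/value pairs.
class DatasetInfo {
    final String path;
    final Map<String, String> properties = new HashMap<>();

    DatasetInfo(String path) {
        this.path = path;
    }
}

// Hypothetical stand-in for a handler offering the proposed path-based lookup.
class PathIndexedHandler {
    private final Map<String, DatasetInfo> byPath = new HashMap<>();

    // Assumed to be called once per dataset while catalogs are read at start up.
    void register(DatasetInfo dataset) {
        byPath.put(dataset.path, dataset);
    }

    // Mirrors the proposed getDataSet: returns null if the handler has no such path.
    DatasetInfo getDataSet(String path) {
        return byPath.get(path);
    }
}
```

With such an index a servlet would not need to know file names or directory layout; it would pass in the path derived from the request and check the result for null. Note that a plain map keeps only the last dataset registered under a given path, so paths would have to be unique for this to work.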


Is that unreasonable?
I'll have to take a closer look at this.

Ethan
Nathan
Nathan Potter                        ndp at opendap.org
OPeNDAP, Inc.                        541.752.1852
==============================================================================
To unsubscribe thredds, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
==============================================================================
--
Ethan R. Davis                       Telephone: (303)497-8155
Software Engineer                    Fax:       (303)497-8690
UCAR Unidata Program Center          E-mail:    edavis@xxxxxxxx
P.O. Box 3000
Boulder, CO 80307-3000               http://www.unidata.ucar.edu/
---------------------------------------------------------------------------
