Hi all,
This is a fascinating conversation -- perhaps too fascinating. I'd
really like to break it out into components, which is what I was
trying to do by proposing a core-and-extensions approach to
standardization. In my plan, we start with a core standard. In the
NASA version, 14 pages sufficed. It contains all the information
needed to understand a "netCDF object." I like that.
http://www.esdswg.org/spg/rfc/esds-rfc-011/ESDS-RFC-011v1.00.pdf
*NetCDF core standard plus extensions for each CF convention*
I also propose we develop the first (of perhaps several) extensions.
That extension will describe the CF conventions for gridded data,
which are mature and in wide use.
http://cf-pcmdi.llnl.gov/documents/cf-conventions
http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html#grid-mappings-and-projections
The CF conventions are essential for understanding the "what, where,
when" semantics of a netCDF object. In the future I envision
extensions for additional data types -- as Gerry suggests. The
recently proposed CF conventions for point/station data would be an
obvious next step. But in my plan, it would be a next step. We
would not try to do this all at once.
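To make the "what, where, when" idea concrete, here is a rough
sketch of the kind of CF metadata the gridded-data extension would
standardize. It assumes the netCDF4-python library, and the file
name, grid sizes, and variable names are invented for illustration
only.

  from netCDF4 import Dataset
  import numpy as np

  nc = Dataset("example_tas.nc", "w")   # hypothetical output file
  nc.Conventions = "CF-1.4"

  nc.createDimension("time", None)
  nc.createDimension("lat", 73)
  nc.createDimension("lon", 144)

  time = nc.createVariable("time", "f8", ("time",))
  time.standard_name = "time"           # the "when"
  time.units = "days since 2009-01-01 00:00:00"

  lat = nc.createVariable("lat", "f4", ("lat",))
  lat.standard_name = "latitude"        # the "where"
  lat.units = "degrees_north"

  lon = nc.createVariable("lon", "f4", ("lon",))
  lon.standard_name = "longitude"
  lon.units = "degrees_east"

  tas = nc.createVariable("tas", "f4", ("time", "lat", "lon"))
  tas.standard_name = "air_temperature" # the "what"
  tas.units = "K"

  lat[:] = np.linspace(-90.0, 90.0, 73)
  lon[:] = np.linspace(0.0, 357.5, 144)
  time[0] = 0.0
  tas[0, :, :] = 288.0                  # placeholder values

  nc.close()

The standard_name and units attributes carry the "what," the lat/lon
coordinate variables the "where," and the time variable the "when" --
that attribute vocabulary is exactly what the extension would pin
down.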
*API extensions to the core?*
As to the API, Jon Blower rightly points out that there are many
netCDF APIs. Perhaps at some point we could also develop extensions
for each API. On the other hand, maybe some aspects of this
standardization can be accomplished by pointing to the netCDF
documentation, which is carefully written and consistently
maintained. I believe the KML spec does this, but I'll have to look
into examples of API standardization more carefully.
*The netCDF data model*
The basic elements of the netCDF data model are described in the NASA
expression of the standard. In addition, there is a formal
publication, with Stefano as lead author, that maps the netCDF
Common Data Model to the ISO 19123 data model. I'll try to dig up
the URL for the online version of this publication.
*Bottom line*
This discussion reinforces my conviction that we have to take a
stepwise approach, starting with the core standard -- where we
already have something in place with NASA -- and at least one
extension, namely the CF conventions for gridded data. As we work on
these, we can continue the discussion of additional extensions for
different APIs and different CF conventions.
I would like to have the core standard in place within 6 months and
the first extension shortly after that. At that point we will have
discussed these other possible extensions and have a plan in place for
making them happen as we come to agreement on them.
-- Ben
PS Unfortunately I'm out of the office until next week, so I'll only
be in touch intermittently. Have at it!!
On Thu, Jul 16, 2009 at 6:37 AM, Gerry Creager <gerry.creager@xxxxxxxx> wrote:
Jon does a good job of identifying the issues and explaining common
usage, something I'd identified on the to-do list but had not
approached yet. I agree with him on all points below.

I remain concerned, though, that by pressing ahead with this, without
identifying a method for CF to address irregular grids and better
address point coverages in NetCDF, we will create a situation where
we ratify a new standard and then immediately have to revisit a lot
of the same issues from a different viewpoint. And understand: this
isn't a matter of creating a spec and then empaneling an RWG
immediately, but rather of sweeping the difficult part (irregular
grid coverages, common in ocean/coastal/marine applications) under
the rug while we ratify what we already know works well. I would like
to see that hard part addressed as an element from the start. The
fundamental data model needs to address this issue, and there is
little point in ratifying what is effectively a standard already
through common use (note Jon's comment on how NetCDF is already used
in the community), knowing that we will have to redo the whole
exercise almost immediately.
gerry
Jon Blower wrote:
Hi Ron, all,
I think it's confusing to talk about "the NetCDF API", because in
reality there are lots of APIs at work in reading data using what
might loosely be called "NetCDF technologies". So when we talk about
"standardizing NetCDF APIs through OGC" we could be talking about
several different things:
1) Standardizing the NetCDF data model as a means of structuring
array-based information (this could be an implementation of a
Coverage; in fact Bryce Nordgren has compared the NetCDF data model
with ISO 19123 Coverages). The data model describes a kind of
language-independent API. Importantly, lots of file formats can be
modelled using the NetCDF data model.

2) Standardizing the NetCDF file format as a means of encoding data
on disk. There are APIs in many languages for reading this format.

3) Standardizing the Climate and Forecast metadata conventions as a
means of georeferencing the arrays and adding semantics. The
interpretation of these conventions requires another API.

4) Standardizing the Data Access Protocol as a request-response
mechanism for getting data using web services. The request-response
mechanism is another API.
In the NetCDF community, we are very accustomed to simply using the
second type of API in our programs, with the rest of the APIs being
handled transparently behind the scenes in our tools.

The following expansion is intended for those who are unfamiliar with
NetCDF technologies - Unidata guys can go to sleep now!
Very briefly, the NetCDF data model considers Datasets, which contain
Variables (temperature, salinity etc.), which contain Arrays of data.
There are structures for holding coordinate systems for the data in
the Arrays. Georeferencing is achieved through the use of attributes,
whose names are standardized in the Climate and Forecast (CF)
conventions.
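In code, that structure looks roughly like the following sketch
(using the netCDF4-python library; the file name and the variable
called "temperature" are purely illustrative):

  from netCDF4 import Dataset

  # Open a Dataset; the file name here is hypothetical.
  ds = Dataset("some_model_output.nc")

  # Walk the Variables and report their Arrays' shapes and CF attributes.
  for name, var in ds.variables.items():
      std_name = getattr(var, "standard_name", "(none)")
      units = getattr(var, "units", "(none)")
      print(name, var.dimensions, var.shape, std_name, units)

  # Read one Variable's Array of data (a NumPy array in this API).
  if "temperature" in ds.variables:
      temp = ds.variables["temperature"][:]

  ds.close()

The Java and C libraries expose essentially the same objects; only
the syntax changes.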
In terms of data transport, we always have the possibility to just
transfer NetCDF files from place to place. However, Steve hit the
nail on the head when he said:

    It is the direct connection between data and applications (or
    intermediate services) -- i.e. the disappearance of the
    "physical" (binary) file -- which seems like the service-oriented
    vision.
We can create "virtual" datasets, then expose them through the Data
Access Protocol. The data model of the DAP is very close to that of
NetCDF, so data transport on the wire is very nearly lossless. The
client can get a handle to a Variable object, which might actually
reside physically on a remote server, and whose data might actually
be spread across different files. It's extremely powerful and useful.

(It's even more powerful when you consider that the NetCDF data model
can be applied to many different file formats such as GRIB, the WMO
standard. This means that the "NetCDF Variable" in question might
actually be a virtual variable consisting of a thousand individual
GRIB files.)
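As a sketch of what this looks like from the client side (the URL
below is invented; any netCDF library built with DAP support behaves
the same way):

  from netCDF4 import Dataset

  # A hypothetical OPeNDAP endpoint; behind it may sit one file, an
  # NcML aggregation, or thousands of GRIB files.
  url = "http://example.org/thredds/dodsC/ocean/sst_aggregation"
  ds = Dataset(url)

  sst = ds.variables["sst"]     # a handle to a remote Variable
  print(sst.shape)              # no array data transferred yet

  # Only the requested subset crosses the wire (assuming a
  # time/lat/lon variable).
  subset = sst[0, 100:200, 100:200]

  ds.close()

From the client's point of view this is no different from opening a
local file -- which is exactly the point.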
One key difference between this method and GML/WFS is that the DAP
protocol knows nothing about geographic information: this information
is carried in the (CF-compliant) attributes, which require
interpretation by an intelligent client. Also, the data are
transported as arrays in compressed binary format, so there's little
chance of a human being able to interpret the data stream on the
wire. However, this allows the efficient transport of large data
volumes. The opaqueness of the DAP is handled through the use of
tools: humans hardly ever construct DAP requests manually.
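For the curious, a hand-built DAP request is just a URL plus a
constraint expression. The sketch below is entirely made up, but it
shows the general shape of the DAP 2.0 request types:

  # Hypothetical dataset URL and variable name, for illustration only.
  base = "http://example.org/opendap/sst.nc"

  dds_url = base + ".dds"    # dataset structure (dimensions, variables)
  das_url = base + ".das"    # attributes (e.g. the CF metadata)

  # Binary subset of one variable, indexed as [start:stride:stop].
  data_url = base + ".dods?sst[0:1:0][0:1:99][0:1:99]"

  # In practice a DAP-aware client library builds and parses these
  # requests for you.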
Hope this helps,
Jon
On Wed, Jul 15, 2009 at 11:53 PM, Ron Lake <rlake@xxxxxxxxxxxxx> wrote:
Hi John:
See comments below.
Ron
From: John Graybeal [mailto:graybeal@xxxxxxxxx]
Sent: July 15, 2009 3:29 PM
To: Ron Lake
Cc: Steve Hankin; Ben Domenico; Unidata Techies; Unidata GALEON;
Mohan Ramamurthy; Meg McClellan
Subject: Re: [galeon] plan for establishing CF-netCDF as an OGC
standard
Ron, I am unfamiliar with GML, and I am not sure I understand what
you are saying. I think of the encoding for _transport_ as a very
different thing than an encoding for files. If I am not mistaken,
the netCDF API provides an encoding for transport also. No?
OK -- perhaps I misspoke -- I am not that familiar with the NetCDF
API. Often an API defines just request/response and uses something
else for the transport; that is the case for GML/WFS or GML/WCS. I
have just often observed in OGC a conflict between encodings and APIs
when we should focus on the two together. Sometimes the API folks
want to enable many transport encodings, and the encoding people want
to support many request/response APIs, etc.
Which brings me to Steve's email, with which I agree in broad terms.
One thing that CF has that is not explicit/required in the netCDF API
definition is at least the possibility of providing one standard name
for each variable. (More would be better, but one step at a
time....) I am sure this information makes it across the API when it
is provided, but to be honest, in this day and age spending a lot of
time standardizing the API, while remaining quiet about the semantics
of the transported information, does not seem cost-effective. I
think there might be some easy strategies for bridging that gap
(mostly by insisting on CF-compliant data on the far side of the
interface).

John
On Jul 15, 2009, at 12:30 PM, Ron Lake wrote:
Hi,
I think one needs to standardize BOTH -- an access API and an
encoding -- AND to do this in a way that they work with one another.
It is for this reason (as an example) that GML exposes the source
data model (as well as acting as the data encoding for transport) so
that WFS can define requests in a neutral manner. It should NOT be a
matter of ONE or the OTHER. You might also look at the work of the
XQuery Data Model group.
R
From: galeon-bounces@xxxxxxxxxxxxxxxx
[mailto:galeon-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Steve Hankin
Sent: July 15, 2009 12:18 PM
To: Ben Domenico
Cc: Unidata Techies; Unidata GALEON; Mohan Ramamurthy; Meg McClellan
Subject: Re: [galeon] plan for establishing CF-netCDF as an OGC
standard
Hi Ben,

Firstly -- applause, applause! This is an important step. Thanks so
much for leading it.
If it is not too late, however, I'd like to open a discussion on a
rather significant change in the approach. As outlined at the URL
you provided, the approach focuses on "CF-netCDF as an OGC binary
encoding standard". Wouldn't our outcomes be more powerful and
visionary if instead we focused on the netCDF API as an OGC standard?
Already today we see great volumes of GRIB-formatted data that are
served as-if NetCDF through OPeNDAP -- an illustration of how the API
as a remote service becomes a bridge for interoperability. The vital
functionalities of aggregation and augmentation via NcML are about
exposing *virtual* files -- again, exposing the API, rather than the
binary encoding.
It is the ability to access remote subsets of a large netCDF virtual
dataset where we see the greatest power of netCDF as a web service.
While this can be implemented as a "fileout" service (the binary
encoding standard approach) -- and that has been done successfully in
WCS and elsewhere -- it does not seem like the optimal strategy. It
is the direct connection between data and applications (or
intermediate services) -- i.e. the disappearance of the "physical"
(binary) file -- which seems like the service-oriented vision. This
would not eliminate the ability of the standard to deliver binary
netCDF files in the many cases where that is the desired result.
Simple REST fileout services are desirable and should perhaps be
included as well in this standards package.
David Artur (OGC representative) indicated at the meeting where we
met with him in May that there were other examples of standardizing
APIs within OGC. He also mentioned that with a community-proven
interoperability standard the OGC process can be relatively forgiving
and streamlined (fingers crossed ... let's hope). As I understand
it, the most recent documents from GALEON allow for an OPeNDAP URL as
the payload of WCS. So the concept of the API standard -- the
reference to the file, rather than the binary file itself -- has
already made its way into the GALEON work, too. I imagine there have
already been discussions about this point. Very interested to hear
your and others' thoughts.
- Steve
--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843