Hi all,
This is a fascinating conversation -- perhaps too fascinating. I'd
really like to break it out into components, which is what I was
trying to do by proposing a core-and-extensions approach to
standardization. In my plan, we start with a core standard. In the
NASA version, 14 pages sufficed. It contains all the information
needed to understand a "netCDF object." I like that.
http://www.esdswg.org/spg/rfc/esds-rfc-011/ESDS-RFC-011v1.00.pdf
*NetCDF core standard plus extensions for each CF convention*
I also propose we develop the first (of perhaps several) extensions.
That extension will describe the CF conventions for gridded data,
which are mature and in wide use.
http://cf-pcmdi.llnl.gov/documents/cf-conventions
http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html#grid-mappings-and-projections
The CF conventions are essential for understanding the "what, where,
when" semantics of a netCDF object. In the future I envision
extensions for additional data types -- as Gerry suggests. The
recently proposed CF conventions for point/station data would be an
obvious next step. But in my plan, it would be a next step. We
would not try to do this all at once.
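To make the "what, where, when" idea concrete, here is a rough
sketch of the kind of CF metadata the gridded-data extension would
standardize. It assumes the netCDF4-python library, and the file
name, grid sizes, and variable names are invented for illustration
only.

  from netCDF4 import Dataset
  import numpy as np

  nc = Dataset("example_tas.nc", "w")   # hypothetical output file
  nc.Conventions = "CF-1.4"

  nc.createDimension("time", None)
  nc.createDimension("lat", 73)
  nc.createDimension("lon", 144)

  time = nc.createVariable("time", "f8", ("time",))
  time.standard_name = "time"           # the "when"
  time.units = "days since 2009-01-01 00:00:00"

  lat = nc.createVariable("lat", "f4", ("lat",))
  lat.standard_name = "latitude"        # the "where"
  lat.units = "degrees_north"

  lon = nc.createVariable("lon", "f4", ("lon",))
  lon.standard_name = "longitude"
  lon.units = "degrees_east"

  tas = nc.createVariable("tas", "f4", ("time", "lat", "lon"))
  tas.standard_name = "air_temperature" # the "what"
  tas.units = "K"

  lat[:] = np.linspace(-90.0, 90.0, 73)
  lon[:] = np.linspace(0.0, 357.5, 144)
  time[0] = 0.0
  tas[0, :, :] = 288.0                  # placeholder values

  nc.close()

The standard_name and units attributes carry the "what," the lat/lon
coordinate variables the "where," and the time variable the "when" --
that attribute vocabulary is exactly what the extension would pin
down.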
*API extensions to the core?*
As to the API, Jon Blower rightly points out that there are many
netCDF APIs. Perhaps at some point we could also develop extensions
for each API. On the other hand, maybe some aspects of this
standardization can be accomplished by pointing to the netCDF
documentation, which is carefully written and consistently
maintained. I believe the KML spec does this, but I'll have to look
into examples of API standardization more carefully.
*The netCDF data model*
The basic elements of the netCDF data model are described in the NASA
expression of the standard. In addition, there is a formal
publication, with Stefano as lead author, that maps the netCDF
Common Data Model to the ISO 19123 data model. I'll try to dig up
the URL for the online version of this publication.
*Bottom line*
This discussion reinforces my conviction that we have to take a
stepwise approach, starting with the core standard -- where we
already have something in place with NASA -- and at least one
extension, namely the CF conventions for gridded data. As we work on
these, we can continue the discussion of additional extensions for
different APIs and different CF conventions.
I would like to have the core standard in place within 6 months and
the first extension shortly after that. At that point we will have
discussed these other possible extensions and have a plan in place for
making them happen as we come to agreement on them.
-- Ben
PS Unfortunately I'm out of the office until next week, so I'll only
be in touch intermittently. Have at it!!
On Thu, Jul 16, 2009 at 6:37 AM, Gerry Creager <gerry.creager@xxxxxxxx> wrote:
Jon does a good job of identifying the issues and explaining common
usage, something I'd identified on the to-do list but had not
approached yet. I agree with him on all points below.

I remain concerned, though, that by pressing ahead with this, without
identifying a method for CF to address irregular grids and better
address point coverages in NetCDF, we will create a situation where
we ratify a new standard and then immediately have to revisit a lot
of the same issues from a different viewpoint. And understand: this
isn't a matter of creating a spec and then empaneling an RWG
immediately, but rather of sweeping the difficult part (irregular
grid coverages, common in ocean/coastal/marine applications) under
the rug while we ratify what we already know works well. I would like
to see that hard part addressed as an element from the start. The
fundamental data model needs to address this issue, and there is
little point in ratifying what is effectively a standard already
through common use (note Jon's comment on how NetCDF is already used
in the community), knowing that we will have to redo the whole
exercise almost immediately.
gerry
Jon Blower wrote:
Hi Ron, all,
I think it's confusing to talk about "the NetCDF API", because in
reality there are lots of APIs at work in reading data using what
might loosely be called "NetCDF technologies". So when we talk about
"standardizing NetCDF APIs through OGC" we could be talking about
several different things:
1) Standardizing the NetCDF data model as a means of structuring
array-based information (this could be an implementation of a
Coverage; in fact Bryce Nordgren has compared the NetCDF data model
with ISO 19123 Coverages). The data model describes a kind of
language-independent API. Importantly, lots of file formats can be
modelled using the NetCDF data model.

2) Standardizing the NetCDF file format as a means of encoding data
on disk. There are APIs in many languages for reading this format.

3) Standardizing the Climate and Forecast metadata conventions as a
means of georeferencing the arrays and adding semantics. The
interpretation of these conventions requires another API.

4) Standardizing the Data Access Protocol as a request-response
mechanism for getting data using web services. The request-response
mechanism is another API.
In the NetCDF community, we are very accustomed to simply using the
second type of API in our programs, with the rest of the APIs being
handled transparently behind the scenes in our tools.

The following expansion is intended for those who are unfamiliar with
NetCDF technologies - Unidata guys can go to sleep now!
Very briefly, the NetCDF data model considers Datasets, which contain
Variables (temperature, salinity etc.), which contain Arrays of data.
There are structures for holding coordinate systems for the data in
the Arrays. Georeferencing is achieved through the use of attributes,
whose names are standardized in the Climate and Forecast (CF)
conventions.
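In code, that structure looks roughly like the following sketch
(using the netCDF4-python library; the file name and the variable
called "temperature" are purely illustrative):

  from netCDF4 import Dataset

  # Open a Dataset; the file name here is hypothetical.
  ds = Dataset("some_model_output.nc")

  # Walk the Variables and report their Arrays' shapes and CF attributes.
  for name, var in ds.variables.items():
      std_name = getattr(var, "standard_name", "(none)")
      units = getattr(var, "units", "(none)")
      print(name, var.dimensions, var.shape, std_name, units)

  # Read one Variable's Array of data (a NumPy array in this API).
  if "temperature" in ds.variables:
      temp = ds.variables["temperature"][:]

  ds.close()

The Java and C libraries expose essentially the same objects; only
the syntax changes.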
In terms of data transport, we always have the possibility to just
transfer NetCDF files from place to place. However, Steve hit the
nail on the head when he said:

    It is the direct connection between data and applications (or
    intermediate services) -- i.e. the disappearance of the
    "physical" (binary) file -- which seems like the service-oriented
    vision.
We can create "virtual" datasets, then expose them through the Data
Access Protocol. The data model of the DAP is very close to that of
NetCDF, so data transport on the wire is very nearly lossless. The
client can get a handle to a Variable object, which might actually
reside physically on a remote server, and whose data might actually
be spread across different files. It's extremely powerful and useful.

(It's even more powerful when you consider that the NetCDF data model
can be applied to many different file formats such as GRIB, the WMO
standard. This means that the "NetCDF Variable" in question might
actually be a virtual variable consisting of a thousand individual
GRIB files.)
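As a sketch of what this looks like from the client side (the URL
below is invented; any netCDF library built with DAP support behaves
the same way):

  from netCDF4 import Dataset

  # A hypothetical OPeNDAP endpoint; behind it may sit one file, an
  # NcML aggregation, or thousands of GRIB files.
  url = "http://example.org/thredds/dodsC/ocean/sst_aggregation"
  ds = Dataset(url)

  sst = ds.variables["sst"]     # a handle to a remote Variable
  print(sst.shape)              # no array data transferred yet

  # Only the requested subset crosses the wire (assuming a
  # time/lat/lon variable).
  subset = sst[0, 100:200, 100:200]

  ds.close()

From the client's point of view this is no different from opening a
local file -- which is exactly the point.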
One key difference between this method and GML/WFS is that the DAP
protocol knows nothing about geographic information: this information
is carried in the (CF-compliant) attributes, which require
interpretation by an intelligent client. Also, the data are
transported as arrays in compressed binary format, so there's little
chance of a human being able to interpret the data stream on the
wire. However, this allows the efficient transport of large data
volumes. The opaqueness of the DAP is handled through the use of
tools: humans hardly ever construct DAP requests manually.
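For the curious, a hand-built DAP request is just a URL plus a
constraint expression. The sketch below is entirely made up, but it
shows the general shape of the DAP 2.0 request types:

  # Hypothetical dataset URL and variable name, for illustration only.
  base = "http://example.org/opendap/sst.nc"

  dds_url = base + ".dds"    # dataset structure (dimensions, variables)
  das_url = base + ".das"    # attributes (e.g. the CF metadata)

  # Binary subset of one variable, indexed as [start:stride:stop].
  data_url = base + ".dods?sst[0:1:0][0:1:99][0:1:99]"

  # In practice a DAP-aware client library builds and parses these
  # requests for you.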
Hope this helps,
Jon
On Wed, Jul 15, 2009 at 11:53 PM, Ron Lake <rlake@xxxxxxxxxxxxx> wrote:
Hi John:
See comments below.
Ron
From: John Graybeal [mailto:graybeal@xxxxxxxxx]
Sent: July 15, 2009 3:29 PM
To: Ron Lake
Cc: Steve Hankin; Ben Domenico; Unidata Techies; Unidata GALEON;
Mohan Ramamurthy; Meg McClellan
Subject: Re: [galeon] plan for establishing CF-netCDF as an OGC
standard
Ron, I am unfamiliar with GML, and I am not sure I understand what
you are saying. I think of the encoding for _transport_ as a very
different thing than an encoding for files. If I am not mistaken,
the netCDF API provides an encoding for transport also. No?
OK -- perhaps I misspoke -- I am not that familiar with the NetCDF
API. Often an API defines just request/response and uses something
else for the transport; that is the case for GML/WFS or GML/WCS. I
have just often observed in OGC a conflict between encodings and APIs
when we should focus on the two together. Sometimes the API folks
want to enable many transport encodings, and the encoding people want
to support many request/response APIs, etc.
Which brings me to Steve's email, with which I agree in broad terms.
One thing that CF has that is not explicit/required in the netCDF API
definition is at least the possibility of providing one standard name
for each variable. (More would be better, but one step at a
time....) I am sure this information makes it across the API when it
is provided, but to be honest, in this day and age spending a lot of
time standardizing the API, while remaining quiet about the semantics
of the transported information, does not seem cost-effective. I
think there might be some easy strategies for bridging that gap
(mostly by insisting on CF-compliant data on the far side of the
interface).

John
On Jul 15, 2009, at 12:30 PM, Ron Lake wrote:
Hi,
I think one needs to standardize BOTH -- an access API and an
encoding -- AND to do this in a way that they work with one another.
It is for this reason (as an example) that GML exposes the source
data model (as well as acting as the data encoding for transport) so
that WFS can define requests in a neutral manner. It should NOT be a
matter of ONE or the OTHER. You might also look at the work of the
XQuery Data Model group.
R
From: galeon-bounces@xxxxxxxxxxxxxxxx
[mailto:galeon-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Steve Hankin
Sent: July 15, 2009 12:18 PM
To: Ben Domenico
Cc: Unidata Techies; Unidata GALEON; Mohan Ramamurthy; Meg McClellan
Subject: Re: [galeon] plan for establishing CF-netCDF as an OGC
standard
Hi Ben,

Firstly -- applause, applause! This is an important step. Thanks so
much for leading it.
If it is not too late, however, I'd like to open a discussion on a
rather significant change in the approach. As outlined at the URL
you provided, the approach focuses on "CF-netCDF as an OGC binary
encoding standard". Wouldn't our outcomes be more powerful and
visionary if instead we focused on the netCDF API as an OGC standard?
Already today we see great volumes of GRIB-formatted data that are
served as-if NetCDF through OPeNDAP -- an illustration of how the API
as a remote service becomes a bridge for interoperability. The vital
functionalities of aggregation and augmentation via NcML are about
exposing *virtual* files -- again, exposing the API, rather than the
binary encoding.
It is the ability to access remote subsets of a large netCDF virtual
dataset where we see the greatest power of netCDF as a web service.
While this can be implemented as a "fileout" service (the binary
encoding standard approach) -- and that has been done successfully in
WCS and elsewhere -- it does not seem like the optimal strategy. It
is the direct connection between data and applications (or
intermediate services) -- i.e. the disappearance of the "physical"
(binary) file -- which seems like the service-oriented vision. This
would not eliminate the ability of the standard to deliver binary
netCDF files in the many cases where that is the desired result.
Simple REST fileout services are desirable and should perhaps be
included as well in this standards package.
David Artur (OGC representative) indicated at the meeting where we
met with him in May that there were other examples of standardizing
APIs within OGC. He also mentioned that with a community-proven
interoperability standard the OGC process can be relatively forgiving
and streamlined (fingers crossed ... let's hope). As I understand
it, the most recent documents from GALEON allow for an OPeNDAP URL as
the payload of WCS. So the concept of the API standard -- the
reference to the file, rather than the binary file itself -- has
already made its way into the GALEON work, too. I imagine there have
already been discussions about this point. Very interested to hear
your and others' thoughts.
- Steve
--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843