Re: Preliminary HDF5 Dimension documents

To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: Preliminary HDF5 Dimension documents
From: John Caron <caron@xxxxxxxxxxxxxxxx>
Date: Mon, 29 Sep 2003 15:58:21 -0600

Quincey Koziol wrote:

Hi John,
Hi Quincey, some thoughts on your proposal:

1. A few notes on naming differences between the netCDF and HDF5 data model:
A netCDF *Variable* is a multidimensional array of primitive values, roughly corresponding to an HDF5 *Dataset*.
   Yup.
A netCDF *Dimension* is a named array index. Dimensions are globally scoped, so they can be shared. A Variable specifies its dimensionality by referencing a set of Dimensions; this set corresponds to an HDF5 *Dataspace*. There is no exact equivalent of a Dimension in HDF5, as I understand it. The fact that Variables can share Dimensions adds important meaning to netCDF files.
   This document introduces dimensions as an optional method of composing
a dataspace in HDF5, so they ought to be completely analogous to netCDF
dimensions.

sorry, i didn't realize you were defining dimensions separately from dimension scales. that's very good, from my POV.

   One possible difference is that I wasn't planning on naming the dimensions
within a dataspace.  They were just going to be indexed by their rank within
the dataspace (i.e. the 0th dimension, the 1st dimension, etc).  This could
reference a named dimension through an indirect dimension (see the shareability
document), but the actual dimensions in the dataspace weren't planned on having
names associated with them.

only shared dimensions need be named.

   Do you think this is an important requirement?  Does the netCDF API
require that the dimensions in a dataspace for a dataset have names, or
will having shared dimensions using the names of dimension objects in the
grouping hierarchy be sufficient?

netcdf only has shared dimensions, so they are always named.
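
The netCDF side of this naming discussion can be illustrated with a minimal sketch. It is not netCDF or HDF5 library code: the `Dimension` and `Variable` classes, the `shared_dimensions` helper, and the example names and lengths are all hypothetical. It only shows how globally named Dimensions let several Variables declare that they use the same index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """Hypothetical netCDF-style Dimension: a named array index with a length."""
    name: str
    length: int


@dataclass
class Variable:
    """Hypothetical Variable: its dimensionality is an ordered tuple of Dimensions."""
    name: str
    dimensions: tuple

    @property
    def shape(self):
        # Rough analogue of a dataspace extent: one length per Dimension.
        return tuple(d.length for d in self.dimensions)


def shared_dimensions(a, b):
    """Names of the Dimensions referenced by both variables, in a's order."""
    return [d.name for d in a.dimensions if d in b.dimensions]


# Example names and sizes are made up for illustration.
time = Dimension("time", 10)
lat = Dimension("lat", 4)
lon = Dimension("lon", 8)

temperature = Variable("temperature", (time, lat, lon))
pressure = Variable("pressure", (time, lat, lon))
station_id = Variable("station_id", (lat,))
```

Because every Dimension here carries a name, asking which indices two Variables have in common reduces to comparing the Dimensions they reference, which is the extra meaning that sharing adds.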

A netCDF *Coordinate Variable* is a 1D Variable whose name matches its dimension's name, and whose values are monotonic. This corresponds to your proposed *Dimension Scale*. Note that a netCDF Dimension describes array indices, whereas a Coordinate Variable / Dimension Scale describes the coordinate values assigned to each index of the corresponding Dimension.
   Yes, I designed the new HDF5 Dimension Scale model to be compatible
with netCDF Coordinate Variables (ideally, Dimension Scales will be a superset of Coordinate Variables). I'm still not totally pleased with the term "scale"
and somewhat lean toward using netCDF's "coordinates" term since that more
accurately describes their true meaning, but since HDF4 used "scale", I may end
up sticking with the term... :-/
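
The Coordinate Variable definition discussed above (1D, named after its dimension, monotonic values) can be written as a small check. This is a hedged sketch, not part of either library: the function names are hypothetical, and treating "monotonic" as strictly increasing or strictly decreasing is an assumption.

```python
def is_monotonic(values):
    """True if values are strictly increasing or strictly decreasing.

    Strictness is an assumption; the text above only says "monotonic".
    """
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_coordinate_variable(var_name, dim_names, values):
    """Check the three conditions: 1D, name matches the dimension, monotonic."""
    return (
        len(dim_names) == 1
        and dim_names[0] == var_name
        and is_monotonic(values)
    )
```

For example, a variable `lat` defined on the single dimension `lat` with values `[-90.0, 0.0, 90.0]` passes the check, while a `lat` defined on `(x, y)` does not, because it is not 1D.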
2. So, generally I like your Dimension Scale proposal. The main things we need are 1) shared Dimensions even when there's not a coordinate variable (perhaps a Dimension Scale without the values?),
   Actually, the HDF5 Dimensions will be able to be shared by different
dataspaces without involving any Dimension Scales.

good

2) each Dimension Scale must have a name;

   Yes, that's the primary method of indexing them from a dimension.  I
imagine we may have an API function to get the n'th scale, but that's not
a requirement at this point.

good

and 3) a Variable/Dataset can specify its dimensionality/Dataspace by listing the Dimensions (or their names).
   I'm planning on adding API functions for "composing" a dataspace from
dimensions and then that "composed" dataspace could be used to create datasets.

good

3. While 1D Coordinate Variables / Dimension Scales are the common case, there are also datasets that need different kinds of coordinate systems, including multidimensional coordinate variables. I am eager that netCDF / HDF5 can support these, but I think they can be built on top of the current functionality, and so we can leave them out of this discussion so as to keep things from getting too complicated. (for more details on those ideas, see chapter 3.1 of the java-netcdf user manual).
   As I mentioned to Russ and Ed last week, I think that having support for
coordinate systems (I was calling them "multi-dimensional scales" at the time)
is an important feature to include.  I've printed the java-netcdf user
manual and will be using it for reference during further iterations on the
HDF5 dimension scale design to try to include this concept.  I imagine that I'll
associate them with the dataspace directly instead of hanging them off the
dimensions (since the dataspace can be multi-dimensional and the dimensions are
1-D by definition).

   Also, I was considering cutting the ability of dimensions to have multiple
scales associated with them (to simplify things), but glancing through the
java-netcdf information, it looks like that may be an important feature.
What's your opinion about how critical that is and how often it is used?

   Quincey

i think there are 2 interesting examples if you try to handle coordinate systems in a general way:

1. float lat(x,y) and float lon(x,y) assign latitude and longitude coordinates to points on a projection plane. this is the "multidimensional" case.

2. lat(sample), lon(sample), altitude(sample) might be a coordinate system for variable O3(sample). this is the "1D trajectory" case.

So, what i came up with is that a coordinate system for a variable/dataset is a collection of "coordinate axes" which can have any dimensionality, but whose dimensions must all appear in the set of dimensions used by the variable/dataset. Adding this info to the dataspace is exactly right.
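
The rule in the previous paragraph can be sketched as a short validity check. The function name and the dictionary layout are hypothetical; only the rule itself (every dimension of every coordinate axis must be one of the variable's dimensions) comes from the text. The `temperature(time, x, y)` variable is an invented example to go with lat(x,y) and lon(x,y).

```python
def is_valid_coordinate_system(variable_dims, axes):
    """Check that each coordinate axis uses only the variable's dimensions.

    variable_dims: names of the dimensions the variable/dataset is defined on.
    axes: maps a coordinate axis name to the names of its own dimensions.
    """
    allowed = set(variable_dims)
    return all(set(axis_dims) <= allowed for axis_dims in axes.values())


# Multidimensional case: lat(x,y) and lon(x,y) for a hypothetical temperature(time,x,y).
projection_axes = {"lat": ("x", "y"), "lon": ("x", "y")}

# 1D trajectory case: lat(sample), lon(sample), altitude(sample) for O3(sample).
trajectory_axes = {"lat": ("sample",), "lon": ("sample",), "altitude": ("sample",)}
```

Under this rule the projection axes are valid for `temperature(time, x, y)` and the trajectory axes are valid for `O3(sample)`, but the projection axes are not valid for `O3(sample)`, since `x` and `y` are not among its dimensions.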

Because the common case is that all or most of the variables/datasets in a file use the same coordinate system, it's nice to factor this information out. So if the dataspace can be shared and the coordinate system can be associated with the dataspace, that would be party time most excellent.

BTW, a mathematical formulation behind this (a little out of date but useful if you like formalisms) is at

   http://www.unidata.ucar.edu/staff/caron/papers/CoordMath.htm

there's still one piece that you *might* want to tackle. the above is a framework for general coordinate systems. our users generally want georeferencing coordinate systems. this involves identifying which of the coordinate axes correspond to the x, y, z, and t coordinates. this can be a big can of worms, e.g. if you've ever looked at GIS specs, they are complex. We have developed a set of very simple specs that so far have satisfied most of our datasets, using "attribute conventions" outside any explicit library support. I can understand if you don't want to add any more complications. However I will say that IMHO getting georeferencing coordinate systems clearly specified (i.e. not having to use attribute Conventions) would be a huge win for our communities, and one that's really doable.
