Hi Ed,
> Howdy all!
>
> Here's what we have in terms of requirements for Parallel I/O:
>
> Parallel I/O
>
> * Parallel I/O reading and writing to a netCDF file is supported.
> * The parallel I/O features require that the MPI library be
> installed.
>
> I think we can all agree that this is a model of terseness!
:-)
> What does it mean to support parallel I/O to a file for reads and
> writes? Feel free to lecture on this topic if anyone is feeling
> loquacious.
From HDF5's perspective, you have to call H5Pset_fapl_<foo>(params) to
select a particular file driver for accessing a file. Something like
this should probably be exposed through the netCDF-4 layer so that
users can choose which driver to access the file with.
Here's the current URL for the parallel HDF5 info:
http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/
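For example, picking the MPI-I/O driver at open time looks roughly
like this (just a sketch - "data.h5" is a placeholder name, and
MPI_Init() is assumed to have been called already):

    hid_t fapl, file;

    fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL); /* MPI-IO */
    file = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);
    /* ... read and write the file ... */
    H5Fclose(file);
    H5Pclose(fapl);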
> For reading, what does this mean to the API, if anything?
Well, I've appended a list of the HDF5 API functions that must be
called collectively at the bottom of this message (I can't find the
link on our web pages).
> Everyone gets to open the file read-only, and read from it to their
> heart's content, confident that they are getting the most recent data
> at that moment. That requires no API changes.
>
> Is that it for readers? Or do they get some special additional
> features, like notification of data arrival, etc?
Users would also need the option of choosing collective or independent
I/O when reading or writing data to the file (see the sketch below).
That reminds me - are y'all planning on adding any wrappers for the
H5P* routines in HDF5 that set/get various properties for objects?
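Concretely, that collective-vs-independent choice is made per transfer
through a dataset transfer property list, something like this (a
sketch - dset, memspace, filespace, and buf stand in for whatever
handles and buffer the caller already has):

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);

    /* collective transfer; H5FD_MPIO_INDEPENDENT is the other option */
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dread(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);
    H5Pclose(dxpl);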
Quincey
==============================================================
Collective functions:
H5Aclose (2)
H5Acreate
H5Adelete
H5Aiterate
H5Aopen_idx
H5Aopen_name
H5Aread (6)
H5Arename (A)
H5Awrite (3)
H5Dclose (2)
H5Dcreate
H5Dfill (6) (A)
H5Dopen
H5Dextend (5)
H5Dset_extent (5) (A)
H5Fclose (1)
H5Fcreate
H5Fflush
H5Fmount
H5Fopen
H5Funmount
H5Gclose (2)
H5Gcreate
H5Giterate
H5Glink
H5Glink2 (A)
H5Gmove
H5Gmove2 (A)
H5Gopen
H5Gset_comment
H5Gunlink
H5Idec_ref (7) (A)
H5Iget_file_id (B)
H5Iinc_ref (7) (A)
H5Pget_fill_value (6)
H5Rcreate
H5Rdereference
H5Tclose (4)
H5Tcommit
H5Topen
Additionally, these properties must be set to the same values when they
are used in a parallel program:
File Creation Properties:
H5Pset_userblock
H5Pset_sizes
H5Pset_sym_k
H5Pset_istore_k
File Access Properties:
H5Pset_fapl_mpio
H5Pset_meta_block_size
H5Pset_small_data_block_size
H5Pset_alignment
H5Pset_cache
H5Pset_gc_references
Dataset Creation Properties:
H5Pset_layout
H5Pset_chunk
H5Pset_fill_value
H5Pset_deflate
H5Pset_shuffle
Dataset Access Properties:
H5Pset_buffer
H5Pset_preserve
H5Pset_hyper_cache
H5Pset_btree_ratios
H5Pset_dxpl_mpio
Notes:
(1) - All the processes must participate only if this is the last
reference to the file ID.
(2) - All the processes must participate only if all the file IDs for
a file have been closed and this is the last outstanding object ID.
(3) - Because the raw data for an attribute is cached locally, all
processes must participate in order to guarantee that future
H5Aread calls return the correct results on all processes.
(4) - All processes must participate only if the datatype is a
committed datatype, all the file IDs for the file have been closed,
and this is the last outstanding object ID.
(5) - All processes must participate only if the number of chunks in
the dataset actually changes.
(6) - All processes must participate only if the datatype of the
attribute is a variable-length datatype (sequence or string).
(7) - This function may be called independently if the object ID does
not refer to an object that was collectively opened.
(A) - Available only in v1.6 or later versions of the library.
(B) - Available only in v1.7 or later versions of the library.
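To make the collective requirement concrete, here's a minimal sketch
of a parallel create sequence (my own illustration, not from the HDF5
docs - "example.h5" and the dataset shape are arbitrary). Every
process executes the same code and passes identical arguments to the
collective calls:

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

        /* H5Fcreate and H5Dcreate are collective: all ranks call them */
        hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, fapl);
        hsize_t dims[1] = {1000};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_INT, space,
                               H5P_DEFAULT);

        /* each rank would select its own hyperslab and write here */

        H5Dclose(dset);  /* collective if last outstanding object ID (2) */
        H5Sclose(space);
        H5Pclose(fapl);
        H5Fclose(file);  /* collective on last file ID reference (1) */

        MPI_Finalize();
        return 0;
    }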
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
Date: 16 Jul 2004 09:25:48 -0600
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: parallel I/O and netCDF-4
"Robert E. McGrath" <mcgrath@xxxxxxxxxxxxx> writes:
> Ed,
>
> You might want to look at our parallel tutorial. It gives an
> introduction
> to how HDF5 does parallel IO along with an example.
>
> http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/
OK, I've done that. Very interesting...
>
> As far as prerequisites go, parallel netCDF-4 will need parallel
> HDF5, i.e., HDF5 compiled with the parallel features enabled.
OK, I've added that to the requirements.
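For reference, building HDF5 that way usually looks something like
this (a sketch; the exact flags vary with the MPI installation):

    CC=mpicc ./configure --enable-parallel
    make && make install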
Is there an easy way for my program to find out if parallel HDF5 is
installed (with a function call, say)?
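One compile-time possibility, rather than a function call: the
installed HDF5 headers define the macro H5_HAVE_PARALLEL (in
H5pubconf.h) when the library was built with parallel support. A
minimal sketch of the check:

    #include <stdio.h>
    #include <hdf5.h>   /* pulls in H5pubconf.h and its feature macros */

    int main(void)
    {
    #ifdef H5_HAVE_PARALLEL
        printf("HDF5 was built with parallel support\n");
    #else
        printf("this HDF5 is a serial-only build\n");
    #endif
        return 0;
    }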
Thanks!
Ed