Hi Ed,
> Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:
>
> > From HDF5's perspective, you have to use H5Pset_fapl_<foo>(params) to
> > choose to use a particular file driver to access a file. Probably something
> > like this should be exported/translated out to the netCDF4 layer for users
> > to
> > choose which driver to access the file with.
> > Here's the URL for the parallel HDF5 info currently:
> > http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/
>
> I'm seeing three steps to parallel HDF5:
>
> 1 - Initialize MPI
> 2 - When opening/creating the file, set a property in file access
> properties.
> 3 - Every time reading or writing file, pass a correctly set transfer
> property.
I'm assuming you mean reading/writing "raw" data.
> Does that seem to sum it up?
That's some of it. You also have to make certain that the functions
listed below are called correctly.
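The three steps above can be sketched in C; this is a minimal illustration only, assuming an MPI-enabled build of HDF5 (the filename, dataset name, and buffer contents are made up, and the 5-argument H5Dcreate matches the 1.6-era API this thread discusses):

```c
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    /* Step 1: initialize MPI. */
    MPI_Init(&argc, &argv);

    /* Step 2: select the MPI-I/O file driver on a file access
     * property list before creating/opening the file.  H5Fcreate is
     * collective, so every process makes this call. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    hsize_t dims[1] = {4};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate(file, "data", H5T_NATIVE_INT, space,
                            H5P_DEFAULT);

    /* Step 3: pass an MPI-I/O transfer property list on each raw-data
     * read/write; here every process collectively writes the same
     * small buffer. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    int buf[4] = {1, 2, 3, 4};
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, dxpl, buf);

    H5Pclose(dxpl);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);   /* collective: this is the last file ID reference */
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and link against the parallel HDF5 library, then launch with mpirun/mpiexec.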
> But I see below that you are also asking that "these properties must
> be set to the same values when they are used in a parallel program."
>
> What do you mean by that?
You can't have half the processes set a property to one value and the other
half set the same property to a different value (e.g. everybody must agree
that the userblock is 512 bytes :-)
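A sketch of that consistency requirement, assuming an MPI build of HDF5 (the filename and 512-byte size are just for illustration):

```c
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* Every rank sets the same value: a 512-byte userblock.  Setting
     * a rank-dependent size here would violate the consistency
     * requirement described above. */
    hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
    H5Pset_userblock(fcpl, 512);

    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, fcpl, fapl);

    H5Fclose(file);
    H5Pclose(fcpl);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```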
> In parallel I/O do multiple processes try and create the file? Or does
> one create it, and the rest just open it? Sorry if that seems like a
> dumb question!
In MPI-I/O, file creation is a collective operation, so all the processes
participate in the create (from our perspective at least, I don't know how it
happens internally in the MPI-I/O library).
You are going to have fun learning how to do parallel programming with
MPI - think of it as multi-threaded programs with bad debugging support... :-/
Quincey
> > > For reading, what does this mean to the API, if anything?
> > Well, I've appended a list of HDF5 API functions that are required to be
> > performed collectively to the bottom of this document (I can't find the link
> > on our web-pages).
> >
> > > Everyone gets to open the file read-only, and read from it to their
> > > heart's content, confident that they are getting the most recent data
> > > at that moment. That requires no API changes.
> > >
> > > Is that it for readers? Or do they get some special additional
> > > features, like notification of data arrival, etc?
> > Users would also need the option to choose collective or
> > independent I/O when reading or writing data to the file. That reminds me -
> > are y'all planning on adding any wrappers to the H5P* routines in HDF5 which
> > set/get various properties for objects?
>
> This is truly an important question that I will treat in its own
> email thread...
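The collective-vs-independent choice mentioned above is made per transfer. A hedged sketch, assuming "dset" was opened from a file that uses the MPI-I/O driver (the helper name and buffer type are illustrative, not part of the HDF5 API):

```c
#include <mpi.h>
#include <hdf5.h>

/* Read a dataset using either collective or independent MPI-I/O,
 * selected at the transfer level via the data transfer property
 * list.  Collective transfers require all processes to call
 * H5Dread; independent transfers do not. */
static herr_t read_ints(hid_t dset, int *buf, int collective)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, collective ? H5FD_MPIO_COLLECTIVE
                                      : H5FD_MPIO_INDEPENDENT);
    herr_t status = H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                            dxpl, buf);
    H5Pclose(dxpl);
    return status;
}
```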
>
>
> >
> > Quincey
> >
> > ==============================================================
> >
> > Collective functions:
> > H5Aclose (2)
> > H5Acreate
> > H5Adelete
> > H5Aiterate
> > H5Aopen_idx
> > H5Aopen_name
> > H5Aread (6)
> > H5Arename (A)
> > H5Awrite (3)
> >
> > H5Dclose (2)
> > H5Dcreate
> > H5Dfill (6) (A)
> > H5Dopen
> > H5Dextend (5)
> > H5Dset_extent (5) (A)
> >
> > H5Fclose (1)
> > H5Fcreate
> > H5Fflush
> > H5Fmount
> > H5Fopen
> > H5Funmount
> >
> > H5Gclose (2)
> > H5Gcreate
> > H5Giterate
> > H5Glink
> > H5Glink2 (A)
> > H5Gmove
> > H5Gmove2 (A)
> > H5Gopen
> > H5Gset_comment
> > H5Gunlink
> >
> > H5Idec_ref (7) (A)
> > H5Iget_file_id (B)
> > H5Iinc_ref (7) (A)
> >
> > H5Pget_fill_value (6)
> >
> > H5Rcreate
> > H5Rdereference
> >
> > H5Tclose (4)
> > H5Tcommit
> > H5Topen
> >
> > Additionally, these properties must be set to the same values when they
> > are used in a parallel program:
> > File Creation Properties:
> > H5Pset_userblock
> > H5Pset_sizes
> > H5Pset_sym_k
> > H5Pset_istore_k
> >
> > File Access Properties:
> > H5Pset_fapl_mpio
> > H5Pset_meta_block_size
> > H5Pset_small_data_block_size
> > H5Pset_alignment
> > H5Pset_cache
> > H5Pset_gc_references
> >
> > Dataset Creation Properties:
> > H5Pset_layout
> > H5Pset_chunk
> > H5Pset_fill_value
> > H5Pset_deflate
> > H5Pset_shuffle
> >
> > Dataset Access Properties:
> > H5Pset_buffer
> > H5Pset_preserve
> > H5Pset_hyper_cache
> > H5Pset_btree_ratios
> > H5Pset_dxpl_mpio
> >
> > Notes:
> > (1) - All the processes must participate only if this is the last
> >       reference to the file ID.
> > (2) - All the processes must participate only if all the file IDs
> >       for a file have been closed and this is the last outstanding
> >       object ID.
> > (3) - Because the raw data for an attribute is cached locally, all
> >       processes must participate in order to guarantee that future
> >       H5Aread calls return the correct results on all processes.
> > (4) - All processes must participate only if the datatype is a
> >       committed datatype, all the file IDs for the file have been
> >       closed and this is the last outstanding object ID.
> > (5) - All processes must participate only if the number of chunks in
> >       the dataset actually changes.
> > (6) - All processes must participate only if the datatype of the
> >       attribute is a variable-length datatype (sequence or string).
> > (7) - This function may be called independently if the object ID
> >       does not refer to an object that was collectively opened.
> >
> > (A) - Available only in v1.6 or later versions of the library.
> > (B) - Available only in v1.7 or later versions of the library.
>