It seems that libSplash always creates resizable datasets. This is useful for data whose size may change, but not for data of fixed size (e.g. magnetic field data). Always allowing resizable datasets might cost performance.
For information on resizable datasets see [1].
As an example from PIConGPU, see the following h5ls -r *.h5 dump:
/ Group
/custom Group
/data Group
/data/2000 Group
/data/2000/fields Group
/data/2000/fields/Density_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/Density_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB Group
/data/2000/fields/FieldB/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE Group
/data/2000/fields/FieldE/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/particles Group
/data/2000/particles/e Group
/data/2000/particles/e/globalCellIdx Group
/data/2000/particles/e/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/e/momentum Group
/data/2000/particles/e/momentum/x Dataset {29491200/Inf}
/data/2000/particles/e/momentum/y Dataset {29491200/Inf}
/data/2000/particles/e/momentum/z Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1 Group
/data/2000/particles/e/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/e/particles_info Dataset {32/Inf}
/data/2000/particles/e/position Group
/data/2000/particles/e/position/x Dataset {29491200/Inf}
/data/2000/particles/e/position/y Dataset {29491200/Inf}
/data/2000/particles/e/position/z Dataset {29491200/Inf}
/data/2000/particles/e/weighting Dataset {29491200/Inf}
/data/2000/particles/i Group
/data/2000/particles/i/globalCellIdx Group
/data/2000/particles/i/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/i/momentum Group
/data/2000/particles/i/momentum/x Dataset {29491200/Inf}
/data/2000/particles/i/momentum/y Dataset {29491200/Inf}
/data/2000/particles/i/momentum/z Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1 Group
/data/2000/particles/i/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/i/particles_info Dataset {32/Inf}
/data/2000/particles/i/position Group
/data/2000/particles/i/position/x Dataset {29491200/Inf}
/data/2000/particles/i/position/y Dataset {29491200/Inf}
/data/2000/particles/i/position/z Dataset {29491200/Inf}
/data/2000/particles/i/weighting Dataset {29491200/Inf}
/header Group
All datasets have the option to become infinitely large (marked by .../Inf).
With (parallel) HDF5 it should be possible to create both fixed-size and arbitrarily sized datasets.
A python example to illustrate this is given here:
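from mpi4py import MPI
import h5py

rank = MPI.COMM_WORLD.rank
print("Hello from processor {}".format(rank))

# Open the file collectively (requires h5py built against parallel HDF5).
f = h5py.File('example_dataSize.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
# Fixed size: no maxshape given, so the dataset cannot be resized.
f.create_dataset('dataset_fixed', (10, 5), dtype='f')
# Resizable up to a bound: the second dimension may grow from 5 to 10.
f.create_dataset('dataset_variable1', (10, 5), maxshape=(10, 10), dtype='f')
# Unlimited: None in maxshape lets a dimension grow without bound.
f.create_dataset('dataset_variable2', (10, 5), maxshape=(None, None), dtype='f')
f.close()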
The corresponding HDF5 file looks like this when using h5ls -r example_dataSize.hdf5:
/ Group
/dataset_fixed Dataset {10, 5}
/dataset_variable1 Dataset {10, 5/10}
/dataset_variable2 Dataset {10/Inf, 5/Inf}
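One concrete difference behind the performance concern: HDF5 has to store resizable datasets in chunked layout, while a fixed-size dataset can be stored contiguously. The following is only a minimal sketch to make that visible from h5py. It runs serially on an in-memory file (the `core` driver with `backing_store=False`), and the helper `describe_layout` and the file name `layout_demo.hdf5` are made up for illustration; it says nothing about how libSplash itself creates its datasets.

```python
import h5py


def describe_layout(h5file):
    """Return {dataset name: (shape, maxshape, chunks)} for all datasets.

    Hypothetical helper for illustration; not part of h5py or libSplash.
    """
    layout = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset):
            layout[name] = (obj.shape, obj.maxshape, obj.chunks)

    h5file.visititems(visit)
    return layout


# In-memory file, so nothing is written to disk (assumption: serial h5py).
with h5py.File('layout_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('dataset_fixed', (10, 5), dtype='f')
    f.create_dataset('dataset_variable2', (10, 5), maxshape=(None, None), dtype='f')
    layout = describe_layout(f)

for name, (shape, maxshape, chunks) in sorted(layout.items()):
    # chunks is None for contiguous storage, a tuple for chunked storage.
    storage = 'contiguous' if chunks is None else 'chunked {}'.format(chunks)
    print(name, shape, maxshape, storage)
```

Running the same helper on a PIConGPU output file opened read-only should show whether the field datasets are chunked and with which chunk shape. Whether the chunking has a measurable cost there would still need to be benchmarked.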
Is there a reason to always use arbitrarily sized datasets?
[1] http://docs.h5py.org/en/latest/high/dataset.html#resizable-datasets