Resizable datasets used always #133

@PrometheusPi

It seems that libSplash uses resizable datasets in every case. This may be good for data whose size can change, but not for data of fixed size (e.g. magnetic field data). Always allowing datasets to be resized might cost performance.

For information on resizable datasets see [1].

As an example, see the following h5ls -r *.h5 dump of a PIConGPU output file:

/                        Group
/custom                  Group
/data                    Group
/data/2000               Group
/data/2000/fields        Group
/data/2000/fields/Density_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/Density_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_e Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/EnergyDensity_i Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB Group
/data/2000/fields/FieldB/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldB/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE Group
/data/2000/fields/FieldE/x Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/y Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/fields/FieldE/z Dataset {12/Inf, 512/Inf, 192/Inf}
/data/2000/particles     Group
/data/2000/particles/e   Group
/data/2000/particles/e/globalCellIdx Group
/data/2000/particles/e/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/e/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/e/momentum Group
/data/2000/particles/e/momentum/x Dataset {29491200/Inf}
/data/2000/particles/e/momentum/y Dataset {29491200/Inf}
/data/2000/particles/e/momentum/z Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1 Group
/data/2000/particles/e/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/e/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/e/particles_info Dataset {32/Inf}
/data/2000/particles/e/position Group
/data/2000/particles/e/position/x Dataset {29491200/Inf}
/data/2000/particles/e/position/y Dataset {29491200/Inf}
/data/2000/particles/e/position/z Dataset {29491200/Inf}
/data/2000/particles/e/weighting Dataset {29491200/Inf}
/data/2000/particles/i   Group
/data/2000/particles/i/globalCellIdx Group
/data/2000/particles/i/globalCellIdx/x Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/y Dataset {29491200/Inf}
/data/2000/particles/i/globalCellIdx/z Dataset {29491200/Inf}
/data/2000/particles/i/momentum Group
/data/2000/particles/i/momentum/x Dataset {29491200/Inf}
/data/2000/particles/i/momentum/y Dataset {29491200/Inf}
/data/2000/particles/i/momentum/z Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1 Group
/data/2000/particles/i/momentumPrev1/x Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/y Dataset {29491200/Inf}
/data/2000/particles/i/momentumPrev1/z Dataset {29491200/Inf}
/data/2000/particles/i/particles_info Dataset {32/Inf}
/data/2000/particles/i/position Group
/data/2000/particles/i/position/x Dataset {29491200/Inf}
/data/2000/particles/i/position/y Dataset {29491200/Inf}
/data/2000/particles/i/position/z Dataset {29491200/Inf}
/data/2000/particles/i/weighting Dataset {29491200/Inf}
/header                  Group

All datasets have the option to become infinitely large (marked by .../Inf).
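For larger files, the check can be done mechanically. The following is a minimal sketch that scans h5ls -r output for datasets with an unlimited dimension. The function name `unlimited_datasets` and the regular expression are illustrative assumptions, not part of libSplash or the HDF5 tools, and the pattern assumes dataset paths without whitespace, as in the dump above.

```python
import re

# Matches h5ls lines such as
# "/data/2000/fields/FieldB/x Dataset {12/Inf, 512/Inf, 192/Inf}".
# Assumption: dataset paths contain no whitespace.
DATASET_LINE = re.compile(r"^(?P<path>\S+)\s+Dataset\s+\{(?P<dims>[^}]*)\}\s*$")


def unlimited_datasets(h5ls_output):
    """Return the paths of datasets with at least one unlimited (Inf) dimension."""
    paths = []
    for line in h5ls_output.splitlines():
        match = DATASET_LINE.match(line.strip())
        if match is None:
            continue  # group lines and blank lines are skipped
        dims = [d.strip() for d in match.group("dims").split(",")]
        # A dimension shown as "current/Inf" has an unlimited maximum size.
        if any(d.endswith("/Inf") for d in dims):
            paths.append(match.group("path"))
    return paths


sample = """\
/                        Group
/dataset_fixed           Dataset {10, 5}
/dataset_variable1       Dataset {10, 5/10}
/dataset_variable2       Dataset {10/Inf, 5/Inf}
"""
print(unlimited_datasets(sample))  # ['/dataset_variable2']
```

Run against the PIConGPU dump above, this would list every dataset, since all of them show .../Inf in each dimension.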

With (parallel) HDF5 it should be possible to create both fixed-size and resizable datasets.
A Python example to illustrate this is given here:

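from mpi4py import MPI
import h5py

rank = MPI.COMM_WORLD.rank

print("Hello from processor {}".format(rank))

# open the file collectively on all ranks (requires h5py built with parallel HDF5)
f = h5py.File('example_dataSize.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

# fixed size: no maxshape given, so the dataset can never be resized
f.create_dataset('dataset_fixed', (10,5), dtype='f')
# resizable up to a finite limit in the second dimension
f.create_dataset('dataset_variable1', (10,5), maxshape=(10,10), dtype='f')
# resizable without limit in both dimensions
f.create_dataset('dataset_variable2', (10,5), maxshape=(None,None), dtype='f')

f.close()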

The corresponding HDF5 file looks like this when listed with h5ls -r example_dataSize.hdf5:

/                        Group
/dataset_fixed           Dataset {10, 5}
/dataset_variable1       Dataset {10, 5/10}
/dataset_variable2       Dataset {10/Inf, 5/Inf}
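The same distinction can be inspected from h5py itself. HDF5 requires a chunked layout for any dataset whose maximum shape exceeds its current shape, which is one plausible source of the overhead mentioned above. The sketch below is a serial variant of the example (no MPI), using an in-memory file via the `core` driver so nothing is written to disk. The `layout` dictionary and the variable names are illustrative only.

```python
import h5py

# In-memory file (core driver, no backing store) so the sketch runs without MPI.
with h5py.File("example_dataSize.hdf5", "w", driver="core", backing_store=False) as f:
    fixed = f.create_dataset("dataset_fixed", (10, 5), dtype="f")
    bounded = f.create_dataset("dataset_variable1", (10, 5), maxshape=(10, 10), dtype="f")
    unlimited = f.create_dataset("dataset_variable2", (10, 5), maxshape=(None, None), dtype="f")

    # Record shape, maximum shape, and whether the dataset uses chunked storage.
    layout = {}
    for dset in (fixed, bounded, unlimited):
        layout[dset.name] = (dset.shape, dset.maxshape, dset.chunks is not None)
        print(dset.name, dset.shape, dset.maxshape, dset.chunks)

    # Only datasets created with a larger maxshape can grow afterwards.
    unlimited.resize((20, 5))
    grown_shape = unlimited.shape
```

Here the fixed-size dataset reports `chunks` as `None` (contiguous storage), while both resizable datasets get a chunk shape chosen automatically by h5py.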

Is there a reason to always use arbitrarily sized datasets?

[1] http://docs.h5py.org/en/latest/high/dataset.html#resizable-datasets