pyop3#3318
Draft
connorjward wants to merge 878 commits into
Draft
Conversation
I've observed that the cell-node list is different for degree>=3 which might be causing issues. Therefore I'm trying to apply the orientations to every pack operation.
Key is that the heuristics were local and so the materialisation steps would diverge. We now have compute-rank-0-then-bcast strategy that works and is hopefully not too expensive.
This is tricksy because it means that we apply bizarre permutations for simplex and quad meshes that we don't strictly need to, but it's consistent with the current approach.
Sheesh that was difficult! Things still appear to be very slow.
Turns out that my orientations arrays were the wrong way around all along. That may well have been causing the issues.
Now we can discover the top and bottom boundary nodes much much more nicely.
The memory growth issue is now fixed. I had to redo how we instrument things because the previous approach led to an unresolvable reference cycle. Also the heavy cache stuff is now better than before.
Takeaway is to be careful materialising things.
* introduce device.py and set up branch * const parameter for host device * include gpu demo and update petsc version as 3.24.5 misaligned for PCPatch*ExteriorFacets * noting areas for change * introducing context variable and lazy cupy * lazy evaluation of arrays * passes basic script functionality * tofix: revised approach ensuring explicit choice of GPU device * implicit transfer and defaultdict implementation (pub/sub eager copy model) * explicit check if GPU available on init * cudagpu and fix incoming re: remove eager copying/register & dev syncing * move conversion logic to device.py * managing buffer duplicate * cleanup unnecessary todos/notes * removing notes and cleaning * fix: added copy to avoid weak reference - new state object (int -> dict) leads the reassignment being a weak reference. * test: data_wo access works in context * add flatten from prev logic * context function as global function and def state - defaultdict gives -1 as default value if device object does not exist * fix: maintaining constant array property - constant property was lost between cupy/numpy conversions - fixed by passing kwarg that is disregarded for cupy but used for numpy * pr review: removing unused variables * remove dispatch to allow no-import cupy * pr: fix property, duplicate, init - using @Property for last_updated_device, known by state, does not need to be variable. - duplicate method only copies most up-to-date copy and non-copy duplicate only copies for current device - initialisation adds None as optional for data input * fix: change petsc config version to v3.25.0 - v3.24.5 was failing to compile petsc4py due to PETSc no support for PCPatchSetComputeFunctionExteriorFacets * include cupy callable pointer * maintain CPU SF comms and cond exec with cupy - Wrappped all SF comms with on_host decorator to ensure happens on host - This increases number of copies but all MPI happening on host atm - Executable requires receiving a pointer to array. Providing a conditional singledispatch register - in case user does not have cupy. * basic GPU unit tests covering context * test cases for gpu - no fixtures due to bug * fixed bug by amending record_modified state bug regarding: if an array is modified as the first action when offloaded: 1. state for new device is updated and incremented (as modifying) 2. then buffer attempts to sync from most up-to-date device 3. this may be current device as state was incremented first 4. then device tries to copy non-existent array to itself - bad solution: adapt record-modified decorator to increment state after wrapped function - simple try-finally wrapped trick * tests and fix: cached properties on gpu basic unit tests introduced for gpu context as ground truth bug: if AxisTree for FunctionSpace (or sim. object) not cached until on device, it attempts to `compile` - this is not implemented for GPU yet. fix: this has been fixed with some patches for now, to be removed. - `firedrake/functionspaceimpl::make_dat` is wrapped in on_host - `pyop3/tree/axis_tree/tree.py::_buffer_indices` is wrapped in on_host - `pyop3/buffer.py` has necessary sync in duplicate (in scenario that user assigns and immediately copies) With current fix, this does mean that if the AxisTree properties are not cached, the assign operation will happen on CPU - consequent calls work on GPU. * cleaning comments * pr: resolving comments and adding docstrings * pr: clean tests and remove gpu demo * removing gpu demo * more descriptive docstring * pr: changes to remove _data for get_array --------- Co-authored-by: SamSJackson <s2020962@ed.ac.uk>
…ard/pyop3-develop
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replace PyOP2 with pyop3.
Thesis link
Summary
Data structures for unstructured meshes are much more complicated than N-dimensional arrays. Firedrake therefore does an awful lot of very involved bookkeeping to make sure that things work.
The key contribution of pyop3 is a new abstraction for describing data layouts, called axis trees. Axis trees are able to describe mesh data layouts completely from a high-level specification. This has the dual set of benefits of: (a) being able to delete a lot of complex Firedrake code, and (b) we can now express a whole suite of algorithms that would otherwise be very difficult to write.
You can think of pyop3 as 'a JIT-compiled numpy for unstructured mesh data'.
pyop3
Core concepts
Expected benefits
step towards:
One substantial benefit is the reduction in technical debt and code complexity. Many complex pieces of preexisting Firedrake code have been simplified and moved into a single place. This reduces special-casing throughout Firedrake, making everything more composable!
Key changes
API changes visible in Firedrake
Most of the PyOP2 API has changed
Some of the
DatandMatAPI is consistent with before (in particular.dat.data) but naturally a substantial amount of it is different. Also a PyOP2Globalis now a pyop3Scalarand it admits only a single scalar value. If one previously had anop2.Globalwith shape then anop3.Datshould be used.Drop support for variable layer extruded meshes
Users should use a cell submesh instead to get the right behaviour.
Explanation
The functionality appeared not to be used by the community (e.g. thetisproject/thetis#405) and porting the functionality would have been a large amount of work.
.sub()expects a tuple instead of integer for tensor spaces.For example, to get the
(1, 1)component of a 2x2 tensor function space one should now passf.sub((1, 1))instead off.sub(3). The latter will continue to work for now but a warning is raised.Ensemble API for nonblocking communication returns a single request instead of a list of requests
This is because we no longer need to emit a request per subfunction.
mesh.num_cellsis now a property not a methodmesh.interior_facetsandmesh.exterior_facetsare nowAxisTrees, not_Facets