Skip to content

pyop3#3318

Draft
connorjward wants to merge 878 commits into
mainfrom
connorjward/pyop3
Draft

pyop3#3318
connorjward wants to merge 878 commits into
mainfrom
connorjward/pyop3

Conversation

@connorjward
Copy link
Copy Markdown
Contributor

@connorjward connorjward commented Jan 11, 2024

It's like changing the engine of a car while it's still running

Replace PyOP2 with pyop3.

Thesis link

Summary

Data structures for unstructured meshes are much more complicated than N-dimensional arrays. Firedrake therefore does an awful lot of very involved bookkeeping to make sure that things work.

The key contribution of pyop3 is a new abstraction for describing data layouts, called axis trees. Axis trees are able to describe mesh data layouts completely from a high-level specification. This has the dual set of benefits of: (a) being able to delete a lot of complex Firedrake code, and (b) we can now express a whole suite of algorithms that would otherwise be very difficult to write.

You can think of pyop3 as 'a JIT-compiled numpy for unstructured mesh data'.

pyop3

Core concepts

Expected benefits

step towards:

  • vector real
  • unifying coordinates with PETSc
  • real + mataij
  • clever layouts

One substantial benefit is the reduction in technical debt and code complexity. Many complex pieces of preexisting Firedrake code have been simplified and moved into a single place. This reduces special-casing throughout Firedrake, making everything more composable!

Key changes

  • PyOP2 is gone
  • Extruded meshes are now just DMPlexs

API changes visible in Firedrake

Most of the PyOP2 API has changed

Some of the Dat and Mat API is consistent with before (in particular .dat.data) but naturally a substantial amount of it is different. Also a PyOP2 Global is now a pyop3 Scalar and it admits only a single scalar value. If one previously had an op2.Global with shape then an op3.Dat should be used.

Drop support for variable layer extruded meshes

Users should use a cell submesh instead to get the right behaviour.

Explanation

The functionality appeared not to be used by the community (e.g. thetisproject/thetis#405) and porting the functionality would have been a large amount of work.

.sub() expects a tuple instead of integer for tensor spaces.

For example, to get the (1, 1) component of a 2x2 tensor function space one should now pass f.sub((1, 1)) instead of f.sub(3). The latter will continue to work for now but a warning is raised.

Ensemble API for nonblocking communication returns a single request instead of a list of requests

This is because we no longer need to emit a request per subfunction.

mesh.num_cells is now a property not a method

mesh.interior_facets and mesh.exterior_facets are now AxisTrees, not _Facets

@connorjward connorjward changed the title pyop3 (WIP) pyop3 Mar 21, 2024
--global and others added 27 commits January 16, 2026 16:23
I've observed that the cell-node list is different for degree>=3 which
might be causing issues. Therefore I'm trying to apply the orientations
to every pack operation.
Key is that the heuristics were local and so the materialisation steps
would diverge. We now have compute-rank-0-then-bcast strategy that works
and is hopefully not too expensive.
This is tricksy because it means that we apply bizarre permutations for
simplex and quad meshes that we don't strictly need to, but it's
consistent with the current approach.
Sheesh that was difficult! Things still appear to be very slow.
Turns out that my orientations arrays were the wrong way around all
along. That may well have been causing the issues.
Now we can discover the top and bottom boundary nodes much much more
nicely.
The memory growth issue is now fixed. I had to redo how we instrument
things because the previous approach led to an unresolvable reference
cycle. Also the heavy cache stuff is now better than before.
connorjward and others added 30 commits May 6, 2026 17:22
Takeaway is to be careful materialising things.
* introduce device.py and set up branch

* const parameter for host device

* include gpu demo and update petsc version as 3.24.5 misaligned for PCPatch*ExteriorFacets

* noting areas for change

* introducing context variable and lazy cupy

* lazy evaluation of arrays

* passes basic script functionality

* tofix: revised approach ensuring explicit choice of GPU device

* implicit transfer and defaultdict implementation (pub/sub eager copy model)

* explicit check if GPU available on init

* cudagpu and fix incoming re: remove eager copying/register & dev syncing

* move conversion logic to device.py

* managing buffer duplicate

* cleanup unnecessary todos/notes

* removing notes and cleaning

* fix: added copy to avoid weak reference

- new state object (int -> dict) leads the reassignment being a weak
  reference.

* test: data_wo access works in context

* add flatten from prev logic

* context function as global function and def state

- defaultdict gives -1 as default value if device object does not exist

* fix: maintaining constant array property

- constant property was lost between cupy/numpy conversions
- fixed by passing kwarg that is disregarded for cupy but used for numpy

* pr review: removing unused variables

* remove dispatch to allow no-import cupy

* pr: fix property, duplicate, init

- using @Property for last_updated_device, known by state, does not need
  to be variable.
- duplicate method only copies most up-to-date copy and non-copy
  duplicate only copies for current device
- initialisation adds None as optional for data input

* fix: change petsc config version to v3.25.0

- v3.24.5 was failing to compile petsc4py due to PETSc no support for
  PCPatchSetComputeFunctionExteriorFacets

* include cupy callable pointer

* maintain CPU SF comms and cond exec with cupy

- Wrappped all SF comms with on_host decorator to ensure happens on host
    - This increases number of copies but all MPI happening on host atm
- Executable requires receiving a pointer to array. Providing a
  conditional singledispatch register - in case user does not have cupy.

* basic GPU unit tests covering context

* test cases for gpu - no fixtures due to bug

* fixed bug by amending record_modified state

bug regarding:
    if an array is modified as the first action when offloaded:
    1. state for new device is updated and incremented (as
       modifying)
    2. then buffer attempts to sync from most up-to-date device
    3. this may be current device as state was incremented first
    4. then device tries to copy non-existent array to itself - bad
solution:
    adapt record-modified decorator to increment state after wrapped
    function
    - simple try-finally wrapped trick

* tests and fix: cached properties on gpu

basic unit tests introduced for gpu context as ground truth

bug: if AxisTree for FunctionSpace (or sim. object) not cached
until on device, it attempts to `compile` - this is not implemented for
GPU yet.

fix: this has been fixed with some patches for now, to be removed.
- `firedrake/functionspaceimpl::make_dat` is wrapped in on_host
- `pyop3/tree/axis_tree/tree.py::_buffer_indices` is wrapped in on_host
- `pyop3/buffer.py` has necessary sync in duplicate (in scenario that
  user assigns and immediately copies)

With current fix, this does mean that if the AxisTree properties are not cached,
the assign operation will happen on CPU - consequent calls work on GPU.

* cleaning comments

* pr: resolving comments and adding docstrings

* pr: clean tests and remove gpu demo

* removing gpu demo

* more descriptive docstring

* pr: changes to remove _data for get_array

---------

Co-authored-by: SamSJackson <s2020962@ed.ac.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants