pyop3 by connorjward · Pull Request #3318 · firedrakeproject/firedrake

connorjward · 2024-01-11T15:51:19Z

It's like changing the engine of a car while it's still running

Replace PyOP2 with pyop3.

Summary

Data structures for unstructured meshes are much more complicated than N-dimensional arrays. Firedrake therefore does an awful lot of very involved bookkeeping to make sure that things work.

The key contribution of pyop3 is a new abstraction for describing data layouts, called axis trees. Axis trees are able to describe mesh data layouts completely from a high-level specification. This has the dual set of benefits of: (a) being able to delete a lot of complex Firedrake code, and (b) we can now express a whole suite of algorithms that would otherwise be very difficult to write.

You can think of pyop3 as 'a JIT-compiled numpy for unstructured mesh data'.

pyop3

Core concepts

Expected benefits

step towards:

vector real
unifying coordinates with PETSc
real + mataij
clever layouts

One substantial benefit is the reduction in technical debt and code complexity. Many complex pieces of preexisting Firedrake code have been simplified and moved into a single place. This reduces special-casing throughout Firedrake, making everything more composable!

Key changes

PyOP2 is gone
Extruded meshes are now just DMPlexs

API changes visible in Firedrake

Most of the PyOP2 API has changed

Some of the Dat and Mat API is consistent with before (in particular .dat.data) but naturally a substantial amount of it is different. Also a PyOP2 Global is now a pyop3 Scalar and it admits only a single scalar value. If one previously had an op2.Global with shape then an op3.Dat should be used.

Drop support for variable layer extruded meshes

Users should use a cell submesh instead to get the right behaviour.

Explanation

The functionality appeared not to be used by the community (e.g. thetisproject/thetis#405) and porting the functionality would have been a large amount of work.

`.sub()` expects a tuple instead of integer for tensor spaces.

For example, to get the (1, 1) component of a 2x2 tensor function space one should now pass f.sub((1, 1)) instead of f.sub(3). The latter will continue to work for now but a warning is raised.

Ensemble API for nonblocking communication returns a single request instead of a list of requests

This is because we no longer need to emit a request per subfunction.

`mesh.num_cells` is now a property not a method

`mesh.interior_facets` and `mesh.exterior_facets` are now `AxisTrees`, not `_Facets`

…elop

I've observed that the cell-node list is different for degree>=3 which might be causing issues. Therefore I'm trying to apply the orientations to every pack operation.

Key is that the heuristics were local and so the materialisation steps would diverge. We now have compute-rank-0-then-bcast strategy that works and is hopefully not too expensive.

This is tricksy because it means that we apply bizarre permutations for simplex and quad meshes that we don't strictly need to, but it's consistent with the current approach.

Sheesh that was difficult! Things still appear to be very slow.

Turns out that my orientations arrays were the wrong way around all along. That may well have been causing the issues.

Now we can discover the top and bottom boundary nodes much much more nicely.

The memory growth issue is now fixed. I had to redo how we instrument things because the previous approach led to an unresolvable reference cycle. Also the heavy cache stuff is now better than before.

Takeaway is to be careful materialising things.

* introduce device.py and set up branch * const parameter for host device * include gpu demo and update petsc version as 3.24.5 misaligned for PCPatch*ExteriorFacets * noting areas for change * introducing context variable and lazy cupy * lazy evaluation of arrays * passes basic script functionality * tofix: revised approach ensuring explicit choice of GPU device * implicit transfer and defaultdict implementation (pub/sub eager copy model) * explicit check if GPU available on init * cudagpu and fix incoming re: remove eager copying/register & dev syncing * move conversion logic to device.py * managing buffer duplicate * cleanup unnecessary todos/notes * removing notes and cleaning * fix: added copy to avoid weak reference - new state object (int -> dict) leads the reassignment being a weak reference. * test: data_wo access works in context * add flatten from prev logic * context function as global function and def state - defaultdict gives -1 as default value if device object does not exist * fix: maintaining constant array property - constant property was lost between cupy/numpy conversions - fixed by passing kwarg that is disregarded for cupy but used for numpy * pr review: removing unused variables * remove dispatch to allow no-import cupy * pr: fix property, duplicate, init - using @Property for last_updated_device, known by state, does not need to be variable. - duplicate method only copies most up-to-date copy and non-copy duplicate only copies for current device - initialisation adds None as optional for data input * fix: change petsc config version to v3.25.0 - v3.24.5 was failing to compile petsc4py due to PETSc no support for PCPatchSetComputeFunctionExteriorFacets * include cupy callable pointer * maintain CPU SF comms and cond exec with cupy - Wrappped all SF comms with on_host decorator to ensure happens on host - This increases number of copies but all MPI happening on host atm - Executable requires receiving a pointer to array. Providing a conditional singledispatch register - in case user does not have cupy. * basic GPU unit tests covering context * test cases for gpu - no fixtures due to bug * fixed bug by amending record_modified state bug regarding: if an array is modified as the first action when offloaded: 1. state for new device is updated and incremented (as modifying) 2. then buffer attempts to sync from most up-to-date device 3. this may be current device as state was incremented first 4. then device tries to copy non-existent array to itself - bad solution: adapt record-modified decorator to increment state after wrapped function - simple try-finally wrapped trick * tests and fix: cached properties on gpu basic unit tests introduced for gpu context as ground truth bug: if AxisTree for FunctionSpace (or sim. object) not cached until on device, it attempts to `compile` - this is not implemented for GPU yet. fix: this has been fixed with some patches for now, to be removed. - `firedrake/functionspaceimpl::make_dat` is wrapped in on_host - `pyop3/tree/axis_tree/tree.py::_buffer_indices` is wrapped in on_host - `pyop3/buffer.py` has necessary sync in duplicate (in scenario that user assigns and immediately copies) With current fix, this does mean that if the AxisTree properties are not cached, the assign operation will happen on CPU - consequent calls work on GPU. * cleaning comments * pr: resolving comments and adding docstrings * pr: clean tests and remove gpu demo * removing gpu demo * more descriptive docstring * pr: changes to remove _data for get_array --------- Co-authored-by: SamSJackson <s2020962@ed.ac.uk>

…ard/pyop3-develop

…elop

connorjward mentioned this pull request Jan 11, 2024

Simple fix for pyop3 firedrakeproject/tsfc#305

Merged

connorjward changed the title ~~pyop3 (WIP)~~ pyop3 Mar 21, 2024

connorjward added the pyop3 label May 8, 2024

--global and others added 27 commits January 16, 2026 16:23

Merge remote-tracking branch 'origin/main' into connorjward/pyop3-dev…

5e3a08f

…elop

Fix slate test

ef594cc

Small fixes for multigrid, patch failing

7cd87f2

Try to fix grid transfer

adc51f0

I've observed that the cell-node list is different for degree>=3 which might be causing issues. Therefore I'm trying to apply the orientations to every pack operation.

Fix entity orientations

6550607

Fix multigrid again, issue was temp sizes

a053140

Fix multigrid in parallel

d24c41a

Key is that the heuristics were local and so the materialisation steps would diverge. We now have compute-rank-0-then-bcast strategy that works and is hopefully not too expensive.

Fix extrusion tests

ad1ec3a

Traversal fixes in preparation for parallel

304ce92

Only permute DoFs if entity_permutations is defined

dc11a5b

This is tricksy because it means that we apply bizarre permutations for simplex and quad meshes that we don't strictly need to, but it's consistent with the current approach.

big comment

6c794c5

Parallel grid transfer now works

aa68065

Sheesh that was difficult! Things still appear to be very slow.

Fixup for orienting matrices

c0ee64e

Make debug mode sufficiently performant to be usable

39e1d64

A lot of caching improvements, still seeing a leak

fdc6269

Extrusion fixes

aa9ed9d

Move debug code

0ede752

extrusion wip

4a05b7d

Progress on getting things to work

c1d0012

Turns out that my orientations arrays were the wrong way around all along. That may well have been causing the issues.

working ish

af5b097

It appears to work

48d22ce

Big comment

894a926

Fix boundary condition application for extruded meshes

ed03bf4

Now we can discover the top and bottom boundary nodes much much more nicely.

Fix the memory leak, cleanup needed

7f8a52a

Fix caching

6d49580

The memory growth issue is now fixed. I had to redo how we instrument things because the previous approach led to an unresolvable reference cycle. Also the heavy cache stuff is now better than before.

Refactor config

9414085

Make heavy caches into context managers

5e736a6

connorjward and others added 30 commits May 6, 2026 17:22

It now runs but convergence is v bad

c10b43a

RFS fixes, now just failing assign and mixed

53b3363

Takeaway is to be careful materialising things.

fix for mixed bcs

6f541f0

Fix lgmap a little

cf3f83b

Redo assign

3d6c177

Fixes for RFS

59055d5

hacky fix for fieldsplit

a648ee7

only one rfs test failing

8934202

found the bug, bad node ordering for mg

7e65be8

rfs+mg fix

8ed3511

Move lgmaps code into generic location

b50dccd

pmg fix for lgmaps

30d2cbe

fixups, heading home

9406269

Caching fix

5dfaf75

Slate fixes, nothing I can't handle

83a0ee6

Fix slate errs, buffers need a shape

fa5fe59

SLATE fixes, more to come

a993af2

Fix for RT parloops, gah

947e840

Only a small handful of SLATE tests now failing

8fd8987

Fix for Real+PETSc

cb9e4ed

Real fixes

a86baef

Fix for .sub()

ab702f7

Merge remote-tracking branch 'origin/connorjward/pyop3' into connorjw…

62949d2

…ard/pyop3-develop

Merge remote-tracking branch 'origin/main' into connorjward/pyop3-dev…

0e58f19

…elop

Post merge fixes and start on MATIS

d567270

matis fix

eac26c8

wip

435885a

skip parallel tests for the moment

8723e58

fixup

c9b92a9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pyop3#3318

pyop3#3318
connorjward wants to merge 878 commits into
mainfrom
connorjward/pyop3

connorjward commented Jan 11, 2024 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

connorjward commented Jan 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

pyop3

Core concepts

Expected benefits

Key changes

API changes visible in Firedrake

Most of the PyOP2 API has changed

Drop support for variable layer extruded meshes

.sub() expects a tuple instead of integer for tensor spaces.

Ensemble API for nonblocking communication returns a single request instead of a list of requests

mesh.num_cells is now a property not a method

mesh.interior_facets and mesh.exterior_facets are now AxisTrees, not _Facets

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

connorjward commented Jan 11, 2024 •

edited

Loading

`.sub()` expects a tuple instead of integer for tensor spaces.

`mesh.num_cells` is now a property not a method

`mesh.interior_facets` and `mesh.exterior_facets` are now `AxisTrees`, not `_Facets`