Extend flytekit version hash calculation to be pluggable#2428
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2428 +/- ##
===========================================
+ Coverage 42.77% 72.09% +29.31%
===========================================
Files 185 181 -4
Lines 18677 18397 -280
Branches 2665 3601 +936
===========================================
+ Hits 7990 13264 +5274
+ Misses 10599 4508 -6091
- Partials 88 625 +537 ☔ View full report in Codecov by Sentry. |
8d0645c to
2951085
Compare
| if isinstance(entity, WorkflowBase): | ||
| default_inputs = entity.python_interface.default_inputs_as_kwargs | ||
|
|
||
| from flytekit.configuration.plugin import get_plugin |
There was a problem hiding this comment.
Probably not the ideal place to have an import.
It also seems like the additional_context (which could probably be named a bit more descriptively?) value should be passed into the register_script function, shouldn't it? The pattern of pulling in special state like this makes this code a bit harder to test / reason about I think.
There was a problem hiding this comment.
I use lazy import here to solve the circular import issue (remote imports plugin, plugin imports remote)
There was a problem hiding this comment.
Yeah, I've used this tactic to reduce startup time in Ruby -- but:
- It looks like there's already a
lazy_modulehelper in Flytekit - if this is a tactic for dealing with a potential circular dependency, that can be the sign of a flawed design
- Remote now takes a dependency on this plugin helper method -- it's cleaner if you can invert the behavior so that this function doesn't need to know anything about finding special mutation functions in plugins
There was a problem hiding this comment.
I could rename additional_context to version_hash_additional_context? I wouldn't rename it to task_configs because it does not necessarily contain the task configs, that only happens for a particular use case.
Moving it to a parameter of register_script seems sensible, but for the moment I'm thinking of keeping the PR as very easy to review/not changing too many methods. Will give it a try at a later point.
edit: will rename to version_hash_additional_context
| # For that add the hash of the compilation settings to hash of file | ||
| version = self._version_from_hash( | ||
| md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity) | ||
| md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity), *additional_context |
There was a problem hiding this comment.
Would it make more sense to have pluggable versioning instead? i.e. the plugin defines a custom version function that gets passed md5_bytes, serialization_settings, default_inputs, *_get_image_names(entity) if it exists? Would such a function need a dual?
There was a problem hiding this comment.
My feeling is that core logic i.e. the default version hash logic, should remain in its section, rather than moved to the plugin methods. This mostly follows the existing pattern I noticed in FlytekitPlugin, where core logic is not often moved to its methods – however there are some exceptions, as I suppose with FlytekitPlugin.get_remote
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
d4cda94 to
4992305
Compare
thomasjpfan
left a comment
There was a problem hiding this comment.
I prefer not to have a new extension point in FlytekitPluginProtocol, if there is an simpler alternative.
For your use case, can your external library provide a FlyteRemote subclass and override _version_from_hash? This FlyteRemote subclass can then be returned by your custom FlytekitPluginProtocol.get_remote.
|
@thomasjpfan Thanks, it's an interesting idea, and I never thought about that possibility! With a modest refactor of To explain more – the parent issue deals with adding |
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472
- This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472 [DOM-68737] Refactor version_hash_additional_context to use bytes (#7) `bytes` is more flexible. This flexibility is relevant when including the serialized input bindings (bytes) in the version hash calculations. This is necessary to fix a bug that can occur: There is a bug where Flyte rejects a valid workflow. This can happen when workflow inputs are read from a file, causing the workflow input bindings to be updated, without an update to the version. Refactor version_hash_additional_context, which currently uses `str`, to use `bytes`
- NOTE: the flytekit 1.16 release changed hash calculation a bit with flyteorg#3247, so this commit is a little different from what was done on the 1.15.4 version of the Domino branch (and the 2 commits for this were squashed into one) - This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472 [DOM-68737] Refactor version_hash_additional_context to use bytes (#7) `bytes` is more flexible. This flexibility is relevant when including the serialized input bindings (bytes) in the version hash calculations. This is necessary to fix a bug that can occur: There is a bug where Flyte rejects a valid workflow. This can happen when workflow inputs are read from a file, causing the workflow input bindings to be updated, without an update to the version. Refactor version_hash_additional_context, which currently uses `str`, to use `bytes`
- NOTE: the flytekit 1.16 release changed hash calculation a bit with flyteorg#3247, so this commit is a little different from what was done on the 1.15.4 version of the Domino branch (and the 2 commits for this were squashed into one) - This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472 [DOM-68737] Refactor version_hash_additional_context to use bytes (#7) `bytes` is more flexible. This flexibility is relevant when including the serialized input bindings (bytes) in the version hash calculations. This is necessary to fix a bug that can occur: There is a bug where Flyte rejects a valid workflow. This can happen when workflow inputs are read from a file, causing the workflow input bindings to be updated, without an update to the version. Refactor version_hash_additional_context, which currently uses `str`, to use `bytes`
- NOTE: the flytekit 1.16 release changed hash calculation a bit with flyteorg#3247, so this commit is a little different from what was done on the 1.15.4 version of the Domino branch (and the 2 commits for this were squashed into one) - This is necessary due to another pending PR to upstream flytekit: flyteorg#2428 In case this PR is not likely to be merged, we have a plan to move away from this change, see the linked Doc in DOM-57472 [DOM-68737] Refactor version_hash_additional_context to use bytes (flyteorg#7) `bytes` is more flexible. This flexibility is relevant when including the serialized input bindings (bytes) in the version hash calculations. This is necessary to fix a bug that can occur: There is a bug where Flyte rejects a valid workflow. This can happen when workflow inputs are read from a file, causing the workflow input bindings to be updated, without an update to the version. Refactor version_hash_additional_context, which currently uses `str`, to use `bytes`
Tracking issue
Closes flyteorg/flyte#5364
Why are the changes needed?
Extends https://github.com/flyteorg/flytekit/pull/2039/files – that PR gives the minimum API surface for configuring the API via
FlytekitPlugin.This PR increases how
pyflyteis pluggable by external libraries, specifically for controlling the additional context used to generate the version string hash.What changes were proposed in this pull request?
This can be used with a plugin like:
This will add the task config to the calculation of the version hash.
How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link