Skip to content

add support for custom merge strategy#104

Merged
costimuraru merged 10 commits intoadobe:masterfrom
oldgromit:feature/custom_merge_strategy
Aug 7, 2022
Merged

add support for custom merge strategy#104
costimuraru merged 10 commits intoadobe:masterfrom
oldgromit:feature/custom_merge_strategy

Conversation

@oldgromit
Copy link
Contributor

@oldgromit oldgromit commented Jul 19, 2022

Description

Adding custom merge strategy to himl so default behavior can be customized.

Currently ConfigProcessor builds a Merger using

merger = Merger([(list, ["append"]), (dict, ["merge"])], ["override"], ["override"])

We want to add support for custom merge strategies. i.e. for a list of items, we want to be able to override the default values, instead of default append behavior. Also we would like to be able to remove certain default values from lower nodes in the hierarchy.

We can change ConfigProcessor and ConfigGenerator init method signature so those optional merge strategies can be passed in, and send to deepmerge Merger object.

Related Issue

Motivation and Context

We want to define certain default values for overall hierarchy, but also allows flexibility to override or remove default values by lower nodes if necessary. Currently the merging policy/strategy is hardcoded in ConfigGenerator. We'd like to propose to change that as an optional parameter so user can pass in customized functions if they have that need.

How Has This Been Tested?

  1. Modified main.py so an example custom merge strategy as a function is defined, and this function is passed in as an optional parameter to ConfigProcessor
  2. Modify default.yaml and cluster.yaml to provide an example on how to use the custom merge strategy. User can define different unique keys for each record to distinguish records. Also a "remove" flag is provided in the example to showcase how to delete a default value.
  3. Run himl examples/complex/env=dev/region=us-east-1/cluster=cluster2 to test expected behavior:
  • records with same 'id' field are overridden
  • record with 'remove' flag is removed
  1. Run the same command against other folder such as himl examples/complex/env=dev/region=us-east-1/cluster=cluster1 to make sure default behavior is not changed if no 'id' field is defined.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have signed the Adobe Open Source CLA.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

himl/main.py Outdated
opts.output_format, opts.print_data, opts.output_file, opts.skip_interpolation_resolving,
opts.skip_interpolation_validation, opts.skip_secrets, opts.multi_line_string)
opts.skip_interpolation_validation, opts.skip_secrets, opts.multi_line_string,
type_strategies= [(list, [self.strategy_merge_override,"append"]), (dict, ["merge"])] )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually changing the current behaviour by setting the new strategy as default.
Please add a new argument "use-merge-list-strategy" and add this if that is set else keep the old behaviour

Copy link
Contributor Author

@oldgromit oldgromit Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feedback. I'm using 3 new optional arguments (type_strategies, fallback_strategies, type_conflict_strategies) to ConfigProcessor. If they are not set, we will default back the current behavior. which is defined in ConfigProcessor process method:

type_strategies = [(list, ["append"]), (dict, ["merge"])], fallback_strategies = ["override"], type_conflict_strategies = ["override"]):

Also I'm using main.py as an example on how to use ConfigProcessor for default behavior vs custom behavior. Please correct me if my understanding of usage of main.py is incorrect.
I've added back the previous default behavior invocation and added some comments to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oldgromit what I meant is that himl/main.py is not really just an example but the main entry point in the himl binary installed by the python package:
https://github.com/adobe/himl/blob/master/setup.py#L59

So to avoid breaking existing clients using this with the default strategy I suggested you add a new argument here: https://github.com/adobe/himl/blob/master/himl/main.py#L66-L68
--use-merge-list-strategy and when only that is set configure the new strategy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. my understanding of main.py as an example is misguided.
I can definitely add a new command line argument for the strategy. However, the new strategy would be a function that should be implemented and provided by the user, not himl. is that ok we dynamically load the user specified module and pass the function to Merger, if the optional argument is passed?

Copy link
Contributor

@costimuraru costimuraru Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oldgromit I'm wondering what would be the best way for you override the merging strategy.

  1. Are you using himl as a standalone binary? In that case, it might be useful for you to have an argument like @amuraru mentioned, where you can control the merging behavior. The way I see it, the argument could be a str and receive one of the 3 values:
    --merge-list-strategy=append/override/prepend (see docs), where append is the default:
parser.add_argument('--merge-list-strategy',
                    choices=['append', 'override', 'prepend'],
                    default="append",
                    help='...')

Then you can run himl like this to achieve your goal:

himl --merge-list-strategy=override ...

For this to work, you can add the argument and then update this line of code to something like:

type_strategies= [(list, [opts.merge_list_strategy]), (dict, ["merge"])] )

In this way, we pass the value received from the user (override/append/prepend) to the deepmerge lib.

PS: Not sure if it's worth adding an argument for the dict (--merge-dict-strategy=merge/override) as well, probably not atm.

  1. If you are using himl as a python module, it looks like you can pass in the desired strategies with the code you've written so far, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @costimuraru for your feedback. you brought up an interesting point I haven't considered. the 3 values (append/override/prepend) are 3 merge strategies deepmerge provides out of box, we should allow users to specify which one they want to use. I've updated the code to offer that flexibility.

The original goal of my change is to allow a user specified merge strategy that can be passed to himl. a custom function that user has total control of. the example I provided merges/overrides depends on an "id" attribute, user can specify any other combo of fields to uniquely identifying records, they have total control of merging behavior.

I've pushed a new PR to illustrate how user can build their own custom function and himl dynamically loads specified module and function and pass it as the new merge strategy. Currently it only works with list data type.

Regarding #2 you mentioned, yes, if user have access to ConfigProcessor directly, they have a lot of more freedom and can pass in more elaborate merge strategies.

@amuraru @costimuraru Please review the latest PR and provide your valuable feedback. I'll update the documentation once PR is approved. Thanks!

Copy link
Contributor

@amuraru amuraru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR - see my comments

@amuraru
Copy link
Contributor

amuraru commented Jul 21, 2022

@danielcoman would you please review this as well?

Copy link
Contributor

@costimuraru costimuraru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also update the README.md with some docs on this new capability you've added (what it does, how it can be used), @oldgromit.


@staticmethod
def merge_yamls(values, yaml_content):
def merge_yamls(values, yaml_content,type_strategies, fallback_strategies, type_conflict_strategies):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: comma missing between arguments

Suggested change
def merge_yamls(values, yaml_content,type_strategies, fallback_strategies, type_conflict_strategies):
def merge_yamls(values, yaml_content, type_strategies, fallback_strategies, type_conflict_strategies):

value: 5
metrics:
- cpu
- exec No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: please add a newline at the end of the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this one. will update for next PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in the latest PR. please review. @costimuraru

himl/main.py Outdated
opts.output_format, opts.print_data, opts.output_file, opts.skip_interpolation_resolving,
opts.skip_interpolation_validation, opts.skip_secrets, opts.multi_line_string)
opts.skip_interpolation_validation, opts.skip_secrets, opts.multi_line_string,
type_strategies= [(list, [self.strategy_merge_override,"append"]), (dict, ["merge"])] )
Copy link
Contributor

@costimuraru costimuraru Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oldgromit I'm wondering what would be the best way for you override the merging strategy.

  1. Are you using himl as a standalone binary? In that case, it might be useful for you to have an argument like @amuraru mentioned, where you can control the merging behavior. The way I see it, the argument could be a str and receive one of the 3 values:
    --merge-list-strategy=append/override/prepend (see docs), where append is the default:
parser.add_argument('--merge-list-strategy',
                    choices=['append', 'override', 'prepend'],
                    default="append",
                    help='...')

Then you can run himl like this to achieve your goal:

himl --merge-list-strategy=override ...

For this to work, you can add the argument and then update this line of code to something like:

type_strategies= [(list, [opts.merge_list_strategy]), (dict, ["merge"])] )

In this way, we pass the value received from the user (override/append/prepend) to the deepmerge lib.

PS: Not sure if it's worth adding an argument for the dict (--merge-dict-strategy=merge/override) as well, probably not atm.

  1. If you are using himl as a python module, it looks like you can pass in the desired strategies with the code you've written so far, correct?

	himl examples/complex/env=dev/region=us-east-1/cluster=cluster2

default strategy test: (append/override/prepend/append_unique. append_unique doesn't work with list of dicts, that's current deepmerge limitation)
	himl examples/complex/env=dev/region=us-east-1/cluster=cluster2 --merge-list-strategy default override

custom strategy test: pass in a custom module and function
	himl examples/complex/env=dev/region=us-east-1/cluster=cluster2 --merge-list-strategy himl.custom_merger_example strategy_merge_override

negative tests: (error msg should be self explanatory )
	himl examples/complex/env=dev/region=us-east-1/cluster=cluster2 --merge-list-strategy himl.custom_merger_example_blah strategy_merge_override_blah
opts.skip_interpolation_validation, opts.skip_secrets, opts.multi_line_string,
type_strategies= [(list, [strategy_merge_override,'append']), (dict, ["merge"])] )
"""
def strategy_merge_override(config, path, base, nxt):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is described very well in the README.md. Is there a reason for keeping here as well or we can remoce it?
I'm thinking this file will be bundled in the python package and my concern is that we are polluting it with code that's not actually used. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept it just in case people don't read README :)
I do agree we shouldn't keep it there to keep bundle clean and mean. Removed.

There's a test_config_generator.py in tests folder. As far as I see, it's not used. Should I remove that as well?

README.md Outdated

This will replace the value after interpolation with the value of the regionBrokers.VA7 found under the configs/env=int/region=va7/kafka-brokers.yaml path.

## custom merge strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: To be consistent with the other sections, let's start the section name with a capital letter: Custom merge strategy

Copy link
Contributor Author

@oldgromit oldgromit Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest PR.

@costimuraru
Copy link
Contributor

@oldgromit this is looking quite nice. Added just a couple more comments. Let me know what you think.

@oldgromit
Copy link
Contributor Author

@oldgromit this is looking quite nice. Added just a couple more comments. Let me know what you think.

thanks. I agree it's important to keep core lib clean, so I removed the example code, and simply will rely on the README for people to figure out the new feature.

@oldgromit
Copy link
Contributor Author

oldgromit commented Aug 3, 2022

@amuraru @costimuraru Thanks for all the great feedback. I've throughly enjoyed our discussions and have incorporated those feedback. This has been an interesting exchange of ideas. Would appreciate it if you guys can merge the new feature into master branch and release it so everyone can leverage the new feature. Thanks again for your great work on this very useful package!

@costimuraru costimuraru merged commit 1550057 into adobe:master Aug 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants