Redesign QueryContext class by paul-rogers · Pull Request #13071 · apache/druid

paul-rogers · 2022-09-11T20:49:42Z

Query context revision which builds on the work started in PR #13022.

Release Notes

Most of the work in this PR is transparent to users. This PR includes Issue #13120 which introduces a user-visible change. We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available:

druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of key to allow. All other keys go through the existing context key security checks.
druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks.

Both are set using JSON list format:

druid.auth.securedContextKeys=["secretKey1", "secretKey2"]

You generally set one or the other values. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys.

In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally:

sqlQueryId
sqlStringifyArrays

Backward Compatibility

When upgrading Druid, if query context security is disabled (druid.auth.authorizeQueryContextParams=false, the default) then you will see no change.

If query context security is enabled (druid.auth.authorizeQueryContextParams=true), then the two keys listed above will always be allowed, regardless of any security rules that may have been set. You can remove these two keys from your rules if you have added them.

The two new config values (unsecuredContextKeys, and securedContextKeys) will not be set in your configuration (or in the default configurations) and will thus behavior after an upgrade will not change. You can, however, shift context key security rules you have set for all users into the new unsecuredContextKeys config property.

Background

Recent work uncovered two issues with the way Druid handles query contexts:

Fix QueryContext race condition #13049 fixes a race condition with JDBC updates
Avoid ClassCastException when getting values from QueryContext #13022 fixes type-unsafe value conversions

During the investigation of the above it became clear that the semantics around the query context are a bit unclear and could be improved.

In particular, the current QueryContext class promises more than it can deliver. It groups context values into three groups: default, system values and user values. Splitting up the values was done to allow authorizing just the user-provided keys. However, this didn't quite work as intended for several reasons. First, most code does not recognize this format, and instead "merges" all three maps before making additional changes. Second, the only use of the user-provided values is to authorize them, something that can be done more simply via other means. Third, the concept of "user-set" keys is incorrect: keys set may come from the user, or may come from an application or the Router.

As a result, the QueryContext class creates confusion rather than tidying things up as was intended. The tidying up requires that we pull in Issue #13120 to resolve a context key security ambiguity.

Overview of Revisions

This PR achieves the desired goals in a simpler way, while unifying the many ways that the code currently works with the query context.

The original three-map QueryContext is replaced by a "facade" on top of a single map.
The original QueryContext was created to allow authorizing user context keys. This PR solves that issue by making a copy of the keys separate from the context, allowing the SQL engine to modify the context without creating conflicts with the security feature.
Uses of QueryContext to fetch values are replaced with a reference the new QueryContext class as a facade.
Modification to the context in the planning layer occurs directly on the context map; it is no longer done via the old QueryContext class since there is no real value in creating a layer on top of the map.
Avoid ClassCastException when getting values from QueryContext #13022 adds a number of type-safe method to the Query interface. However, doing so clutters up that interface, and we still need the QueryContexts methods to get at specific context values. See the discussion in Avoid ClassCastException when getting values from QueryContext #13022 for details. This PR, by contrast, puts all context access methods on the revised QueryContext facade, making for cleaner code.
All type-unsafe context value accesses were replaced in Avoid ClassCastException when getting values from QueryContext #13022. They are replaced again here by the new QueryContext type-safe methods.
The old type-unsafe methods are marked deprecated. No code in Druid-proper references them, though we must assume that extensions may continue to use them.
All specific methods in QueryContexts which take a Query or context map parameter are replaced by methods on the revised QueryContext class.
QueryContexts methods which rewrote a query moved to the Queries class.
Validation of a context value now reliably throws a BadQueryContextException with a clear explanation of the problem. Previously, we threw both BadQueryContextException and ISE for such errors, and the errors have variations in format and wording.
get methods are standardized. get<Type>() without a default returns a boxed value, or null if the value is not set. This allows code to implement logic of the form "if the value is set, do something with the value, else do something else."
get(defaultValue) methods return an unboxed value, since they will return the default if the value is unset.
DruidPlanner is refactored a bit to better handle context key security.
AuthConfig implements the enhanced context key security model.

Specifics

The original QueryContext class was first removed: all references to that class were modified to refer to the context map instead.

A new class of the same name was created. The only creator of the new class is Query.queryContext(). The new class holds onto the context map associated with the query, and provides the many type-safe and value-specific methods. Since QueryContext is simply a facade, it is cheap to create an instance as needed. This lets us convert code of the form:

long foo = QueryContexts.getAsLong(query, FOO_NAME, FOO_DEFAULT);
long bar = QueryContexts.getBar(query);

To

long foo = query.queryContext().getLong(FOO_NAME, FOO_DEFAULT);
long bar = query.queryContext().getBar();

When a bit of code makes multiple references to context values (as above), we can also do:

QueryContext queryContext = query.queryContext();
long foo = queryContext.getLong(FOO_NAME, FOO_DEFAULT);
long bar = queryContext.getBar();

The following methods in Query are marked deprecated:

interface Query... {
 @Deprecated
  <ContextType> ContextType getContextValue(String key);
  @Deprecated
  <ContextType> ContextType getContextValue(String key, ContextType defaultValue);
  @Deprecated
  default boolean getContextBoolean(String key, boolean defaultValue);

Although the above methods are deprecated, they are reimplemented as default methods. As a result, the methods of the same names are removed from subclasses since they are redundant.

The Query.getQueryContext() method is removed as a way of marking that the old QueryContext is no longer available. The old method didn't really even work: the different context types were lost when merging into a single map. A new queryContext() method returns the facade version.

Alternatives

The discussions in #13049 and #13022 suggested other alternatives:

Work only with the context map and the QueryContexts methods as shown above. This works fine, and is one of the steps taken in Fix QueryContext race condition #13049 that lead to this PR, but is a bit cumbersome.
Add methods to the Query method as in an early draft of Avoid ClassCastException when getting values from QueryContext #13022. Doing so clutters the Query interface, and does not help in the SQL layer where we work with contexts before the native Query is created. Unless we really want to clutter the Query interface, we'd still need the value-specific methods in QueryContexts.
Leave well enough alone: status quo. While the existing code works, as noted above, the semantics are a bit messy which creates confusion for readers. Tidying up the code has no benefit to the computer, but is helpful for us poor humans who have to understand the code.

Conclusion

The result is a solution that:

Solves the authorization issue that gave rise to the original QueryContext
Solves the synchronization issue with QueryContext by avoiding multiple maps.
Builds on the type safety solution of PR Avoid ClassCastException when getting values from QueryContext #13022 by shifting the methods to QueryContext.

This PR has:

been self-reviewed.
has been validated using existing tests.
has modified unit tests as needed to reflect changed methods, exceptions and semantics.
been tested in a test Druid cluster.

FrankChen021 · 2022-09-12T02:33:11Z

I agree with your proposal on the new API form as

long foo = query.queryContext().getLong(FOO_NAME, FOO_DEFAULT);
long bar = query.queryContext().getBar();

It makes the APIs much simpler and cleaner.

paul-rogers · 2022-09-29T01:43:15Z

@FrankChen021, thanks for the encouraging comment! Since this PR builds on the work you recently did, can you perhaps give it a review? This one touches many files and so is at risk of conflicts, so we might want to get it in if things look OK. Thanks!

FrankChen021 · 2022-09-29T01:44:55Z

@FrankChen021, thanks for the encouraging comment! Since this PR builds on the work you recently did, can you perhaps give it a review? This one touches many files and so is at risk of conflicts, so we might want to get it in if things look OK. Thanks!

Sure. I will review it on this weekends.

paul-rogers

@abhishekagarwal87, thanks for your review! Addressed your questions and comments. There is one backward-compatibility issue that one of your questions raised: please let me know what you think about the issue.

* Removes the QueryContext class and its attendant race conditions. * Replaces it with a light-weight facade to access the context. * Uses a simpler, alternative way to authorize user context values. * Moved query-related methods to Queries * Moved more context value methods to QueryContext * Revised the non-default "get" methods to return null if the value is not set. * Various build fixes

FrankChen021

Thank you @paul-rogers for the hard work.

Reviewed most of the files(excluding test cases). Here're some of suggestions besides above comments.

A force-push is not recommended to an open PR since it breaks the review state kept by the GitHub. See https://github.com/apache/druid/blob/master/CONTRIBUTING.md

Avoid rebasing and force pushes after submitting a pull request, since these make it difficult for reviewers to see what you've changed in response to their reviews. The Druid committer that merges your change will rebase and squash it into a single commit before committing it to master.
Seems like your IDE adds extra line between import javax and import java directives, these changes are unneccessary
Regarding the QueryContext, my opinion is that caller side should always see this type instead of seeing its underlying storage type which is Map<String, Object>. We can see we have getContext(), context(), and even queryContextMap(), these are a little confusion.
Separation of the QueryContext and Map<String, Object> also bring another problem. There're a lot of call of context().getXXX, each call of context() generates a new fascade object QueryContext, I think it's a little bit waste of memory and unnessary.

paul-rogers · 2022-10-03T17:29:32Z

@FrankChen021, thanks much for your detailed review! Very helpful. To answer your specific questions:

Sorry for the force-push. On many projects, the build is done on the PR itself. Any conflicts are resolved at the end of the review process. In Druid, each PR build is first rebased on master. Failure to rebase causes the build to fail and we get no feedback on the code. As a result, when there is a merge conflict, we are forced to rebase, which requires a force push. I wonder, is there some trick that others use to avoid this issue?
Sorry for the extra spaces in imports. That's just how Eclipse works. I'll make a note to manually remove such newlines in the future.
See below.
See below.

Your point 3 suggests that callers always see QueryContext and point 4 worries about the cost of creating QueryContext per call. Looking more carefully at the prior implementation, there is a path to make this work. That fix appears in the latest commit. The result is we eliminate the copies that were your concern in point 4.

Point 3 also suggests that callers always see QueryContext instead of the map. I agree that is the goal. The one caveat is that, for backward compatibility, the original map-based getContext() must exist and be a JSON property so that the JSON form of a query is unchanged.

Thank you for pointing out the potential bug with updating a query context in place in a query. The general rule, after this change, should be:

The query context is created, as a map, before the query is constructed.
Once the query is constructed, the context is accessed via QueryContext and is immutable. (Again, the one exception is that the map is presented to Jackson for JSON serialization.)
If the context must change (such as while processing a native query), we must make a copy of the context then make a copy of the entire query with the new context. This ensures that the query is immutable and avoids the kinds of race conditions which first brought my attention to this area.

Decimal parsing for long parameters Streamline context access in queries

FrankChen021 · 2022-10-04T02:39:17Z

Sorry for the force-push. On many projects, the build is done on the PR itself. Any conflicts are resolved at the end of the review process. In Druid, each PR build is first rebased on master. Failure to rebase causes the build to fail and we get no feedback on the code. As a result, when there is a merge conflict, we are forced to rebase, which requires a force push. I wonder, is there some trick that others use to avoid this issue?

To resolve merge conflict, another way is to merge the latest master back to our own development branch.

Fix a flaky test (race conditions)

paul-rogers

@FrankChen021, thanks again for your very thorough review. I'm impressed at the detailed items you identified: you somehow find all the tender bits in the code, which is wonderful.

I did a re-inspection of each of the areas you identified. I'm hoping that, this time, the story is consistent in each area.

Required to get a security IT to pass Revises context security in native, SQL query path Moves two SQL context keys to QueryContexts for visibility to AuthConfig

paul-rogers · 2022-10-12T18:26:54Z

@FrankChen021, thanks for the approval!

So, it turns out that I had to pull the work for Issue #13120 into this PR in order to get the security IT to pass. See below.

paul-rogers · 2022-10-12T18:45:36Z

PR is updated to pull the work for Issue #13120. That change is needed to get the security IT to pass. Basically, that IT does context key security checks using JDBC. JDBC inserts a key, which fails the security check. This worked previously when we separated system and user keys. (In fact, this may be the reason that the prior PR did the split.)

This latest PR adds two config values, defined in the "release notes" section in the revised description. To make this work:

The AuthConfig is not the one-stop shop to prepare the list of context keys to use for authorization checks.
Three new set operations are added to CollectionUtils because Java, in its infinite wisdom, doesn't provide them.
Two context keys moved from the SQL package to QueryContexts so that they are visible to AuthConfig.
These two keys are "out-of-the-box freebies" for context security: they are always allowed because they are set by Druid itself in either the Router (query ID) or JDBC path (stringify arrays.)
The code in the planner to gather the resources is tidied up a bit.

We'll want to review this final change before we merge.

abhishekagarwal87

Thank you, Paul. I don't see any change in the docs. will you add that in another PR?

abhishekagarwal87 · 2022-10-13T05:31:20Z

It will also make life easier for cluster admin to have a default list of unsecured context variables.

abhishekagarwal87 · 2022-10-14T11:21:40Z

There are legit CI failures.

abhishekagarwal87 · 2022-10-15T05:33:15Z

Thank you @paul-rogers. We can make the changes I suggested in a follow-up PR.

paul-rogers force-pushed the 220911-context branch from b2eaf15 to 5d2965a Compare September 11, 2022 20:52

This was referenced Sep 11, 2022

Fix QueryContext race condition #13049

Merged

Avoid ClassCastException when getting values from QueryContext #13022

Merged

FrankChen021 added Area - Querying Refactoring labels Sep 13, 2022

paul-rogers force-pushed the 220911-context branch 3 times, most recently from 5e4ee02 to 367f9c8 Compare September 23, 2022 00:46

paul-rogers force-pushed the 220911-context branch from 367f9c8 to fe939b2 Compare September 29, 2022 01:34

paul-rogers requested a review from FrankChen021 September 29, 2022 01:43

abhishekagarwal87 reviewed Sep 29, 2022

View reviewed changes

paul-rogers commented Sep 30, 2022

View reviewed changes

paul-rogers force-pushed the 220911-context branch from fe939b2 to b70bd1b Compare September 30, 2022 20:16

Revisions from review comments

07ad674

paul-rogers force-pushed the 220911-context branch from b70bd1b to 07ad674 Compare September 30, 2022 20:26

FrankChen021 reviewed Oct 1, 2022

View reviewed changes

Revisions from review comments

3c600ce

Decimal parsing for long parameters Streamline context access in queries

FrankChen021 reviewed Oct 4, 2022

View reviewed changes

Comment thread processing/src/main/java/org/apache/druid/query/QueryContexts.java Outdated

FrankChen021 reviewed Oct 4, 2022

View reviewed changes

Comment thread processing/src/main/java/org/apache/druid/query/QueryContexts.java Outdated

FrankChen021 reviewed Oct 4, 2022

View reviewed changes

Comment thread processing/src/main/java/org/apache/druid/query/QueryContexts.java

paul-rogers added 4 commits October 4, 2022 09:20

Revisions from review comments

70458c8

Merge branch 'master' into 220911-context

2576c1b

Build fixes

f95f35e

Fix a flaky test (race conditions)

Fix another flaky test

0525651