Skip to content

Core,AWS: Fix NPE in ResolvingFileIO when HadoopConf is not set#10872

Merged
amogh-jahagirdar merged 3 commits into
apache:mainfrom
munendrasn:resolving-file-io-npe
Aug 22, 2024
Merged

Core,AWS: Fix NPE in ResolvingFileIO when HadoopConf is not set#10872
amogh-jahagirdar merged 3 commits into
apache:mainfrom
munendrasn:resolving-file-io-npe

Conversation

@munendrasn
Copy link
Copy Markdown
Contributor

When ResolvingFileIo is invoked without conf set (which can happen when catalog or fileIo is explicitly created), it throws NPE.

java.lang.NullPointerException
	at org.apache.iceberg.io.ResolvingFileIO.lambda$io$1(ResolvingFileIO.java:179)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at org.apache.iceberg.io.ResolvingFileIO.io(ResolvingFileIO.java:176)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:62)
	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:74)
	at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:171)

The workaround is set to Configuration always.
For example, while creating the RESTCatalog like below but the order need to be maintained

RESTCatalog catalog = new RESTCatalog();
catalog.setConf(new Configuration());
catalog.initialize("rest", Map.of());

Exception HadoopFileIO, conf is not required or unused for other FileIo impl hence, handling with soft check, other option was to have strict check and fail if it is not set here

@munendrasn
Copy link
Copy Markdown
Contributor Author

@nastra FYI

impl,
key -> {
Configuration conf = hadoopConf.get();
Configuration conf = Optional.ofNullable(hadoopConf).map(Supplier::get).orElse(null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that ResolvingFileIO is implementing Configurable from Hadoop it actually requires the conf to be set via setConf but I agree that this is a bit problematic for FileIO instances that don't need this, like S3FileIO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel like it would be better to add this null handling to getConf() directly and then just call getConf() here

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment to move the null handling code. @amogh-jahagirdar can you double-check the changes here as well please?

@munendrasn
Copy link
Copy Markdown
Contributor Author

@nastra Thank you for the review, please let me know if I should moved the null check to getConf or should we create new Configuration if not set?

@nastra
Copy link
Copy Markdown
Contributor

nastra commented Aug 19, 2024

@nastra Thank you for the review, please let me know if I should moved the null check to getConf or should we create new Configuration if not set?

@munendrasn yes I think it makes sense to move the null handling code into getConf(see also my other comment on this)

@munendrasn
Copy link
Copy Markdown
Contributor Author

@nastra
Updated, reverted the last commit in the PR fe256b4

@amogh-jahagirdar amogh-jahagirdar changed the title Fix NPE in Resolving FileIo when conf is not set Core,AWS: Fix NPE in ResolvingFileIO when HadoopConf is not set Aug 22, 2024
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change, I think it's the right behavior to just set a null conf instead of a strict failure if a HadoopConf is not set. The fact that ResolvingFileIO implements HadoopConfigurable does not mean a HadoopConf must be set, it just means that option is available and will be applied if it's specified.

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor

THanks @munendrasn and thanks @nastra for the review.

impl,
key -> {
Configuration conf = hadoopConf.get();
Configuration conf = getConf();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the CatalogUtil.loadFileIO calls below are null protected as well, the s3fileIO test will exercise that API so that gives confidence too.

@amogh-jahagirdar amogh-jahagirdar merged commit cbd71eb into apache:main Aug 22, 2024
@munendrasn
Copy link
Copy Markdown
Contributor Author

Thank you @nastra and @amogh-jahagirdar for the review

@munendrasn munendrasn deleted the resolving-file-io-npe branch August 22, 2024 04:52
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants