Skip to content

Using FileIO and EncryptionManager for MR InputFiles#985

Closed
jerryshao wants to merge 1 commit into
apache:masterfrom
jerryshao:ISSUE-865
Closed

Using FileIO and EncryptionManager for MR InputFiles#985
jerryshao wants to merge 1 commit into
apache:masterfrom
jerryshao:ISSUE-865

Conversation

@jerryshao

Copy link
Copy Markdown
Contributor

This address the left issue in #865

CC @rdsr @rdblue

out.writeInt(data.length);
out.write(data);

byte[] ioData = SerializationUtil.serializeToBytes(io);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdsr, I just noticed this. Do we need custom Serializable implementations? If all of the fields are Serializable and this is Serializable then the default implementation should just work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to verify this, we can add a test for serializability, too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. I'll double check this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue, can you elaborate more on what is the default implementation here? I don't think there's a default implementation for Serializable in MR. There's a default impl. for Writables

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java serialization will work without implementing write and readFields as long as all of the fields in an object are Serializable. The only reason to implement this is to wrap a non-serializable object or to customize in some way how serialization happens. Because all 3 fields that need to be serialized are already Serializable (FileScanTask, FileIO, and EncryptionManager) I don't think we need these methods.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up on this, I don't think that we need to remove it in this PR. But we should be able to remove these in a follow-up.

@rdsr rdsr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me. Pending @rdblue questions

});
}

private FileIO fileIO(Table table) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:seems like this method can be made static?

out.writeInt(data.length);
out.write(data);

byte[] ioData = SerializationUtil.serializeToBytes(io);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue, can you elaborate more on what is the default implementation here? I don't think there's a default implementation for Serializable in MR. There's a default impl. for Writables


private FileIO fileIO(Table table) {
if (table.io() instanceof HadoopFileIO) {
return new HadoopFileIO(((HadoopFileIO) table.io()).conf());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this create a new HadoopFileIO? The table's FileIO should work fine.

I think this might be based on the handling in Spark, which creates a new one that wraps a broadcasted manifest. But there is no broadcast for MR so I don't think we need this method. It should work fine to use table.io().

@rdsr

rdsr commented May 28, 2020

Copy link
Copy Markdown
Contributor

This seems close to completion @jerryshao . Can we get this in, after addressing those minor comments?

@rdblue

rdblue commented Oct 23, 2020

Copy link
Copy Markdown
Contributor

This was fixed in #1532, so I'll close this one. Thanks for working on this!

@rdblue rdblue closed this Oct 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants