Skip to content

MR: use encryption manager for input files#1532

Merged
rdblue merged 3 commits into
apache:masterfrom
chenjunjiedada:use-encryption-manager
Oct 1, 2020
Merged

MR: use encryption manager for input files#1532
rdblue merged 3 commits into
apache:masterfrom
chenjunjiedada:use-encryption-manager

Conversation

@chenjunjiedada

Copy link
Copy Markdown
Contributor

This along with #1526 fixes #865.

out.writeInt(data.length);
out.write(data);

byte[] ioData = SerializationUtil.serializeToBytes(io);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue @rdsr , I saw your comments in #985 about remove customized serialization. Now the IcebergSplit implements org.apache.hadoop.mapred.InputSplit which extends writable. So I keep serialization implementation. Does this make sense to you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with that.

@rdblue rdblue requested a review from rdsr September 30, 2020 20:24
this.encryptionManager = SerializationUtil.deserializeFromBytes(encryptionManagerData);
}

public FileIO fileIO() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is named io in Table, so I don't think a longer name is needed here.

@rdblue

rdblue commented Sep 30, 2020

Copy link
Copy Markdown
Contributor

Looks good to me. Would be nice to rename that method to fit the convention used elsewhere.

@chenjunjiedada

chenjunjiedada commented Oct 1, 2020

Copy link
Copy Markdown
Contributor Author

@rdblue , The HadoopFileIO contains SerailiableSupplier<Configuration>, so it should be serialized by SerializationUtil and then Writable, therefore satisfy the serialization requirement for MR framework. Does that make sense to you?

@chenjunjiedada chenjunjiedada force-pushed the use-encryption-manager branch 2 times, most recently from 0c94771 to b9b4553 Compare October 1, 2020 01:20

@rdsr rdsr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

out.writeInt(data.length);
out.write(data);

byte[] ioData = SerializationUtil.serializeToBytes(io);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with that.

@rdblue

rdblue commented Oct 1, 2020

Copy link
Copy Markdown
Contributor

The HadoopFileIO contains SerailiableSupplier, so it should be serialized by SerializationUtil and then Writable, therefore satisfy the serialization requirement for MR framework.

Where is the HadoopFileIO being rebuilt with SerializableSupplier<Configuration>? In Spark, we have to check the FileIO and replace it with the one that references the SerializableSupplier.

@chenjunjiedada chenjunjiedada force-pushed the use-encryption-manager branch from ff6f4ee to 12b0906 Compare October 1, 2020 04:51
@chenjunjiedada

Copy link
Copy Markdown
Contributor Author

@rdblue , Just added this. I missed that the configuration used to build SerialableSupplier<Configuration> also need to be serializable.

@rdblue

rdblue commented Oct 1, 2020

Copy link
Copy Markdown
Contributor

Thanks for fixing it, @chenjunjiedada! I'll merge this.

@rdblue rdblue merged commit c7caa64 into apache:master Oct 1, 2020
@chenjunjiedada

Copy link
Copy Markdown
Contributor Author

Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[InputFormat Followup] InputFiles should be created using FileIO and Encryption manager

3 participants