Add support for a reducer step that aggregates per-user metrics#76
Merged
Conversation
xkrogen
requested changes
Feb 14, 2019
xkrogen
left a comment
Collaborator
There was a problem hiding this comment.
Hey @csgregorian , this is a great start! Besides the inline comments, I have two general comments:
- We should add a Combiner to reduce the data transfer volume between Mappers and Reducers.
- I'm wondering if it makes sense for the
WorkloadReducerto be user-configurable. I'm thinking it may be better for it to be specified by the mapper, like how aWorkloadMapperspecifies what input format it expects. Since the output keys/values of the mapper must be exactly matched to the input keys/values of the reducer, I don't think it makes sense to configure them independently. Let me know what you think.
757d342 to
67873d7
Compare
9b61ead to
dc3e754
Compare
dc3e754 to
75de589
Compare
Contributor
Author
|
Simplified a ton of stuff here.
Ready for more reviews! |
xkrogen
requested changes
Feb 15, 2019
xkrogen
left a comment
Collaborator
There was a problem hiding this comment.
Nice, I love all of the changes. I have mostly minor comments throughout, and my only general comment would be that we should update TestDynamometerInfra in some way. We're already asserting on the output within TestWorkloadGenerator, but maybe in the infra test we can at least assert that the -workload_output_path arg works as expected and puts the file in the right place?
xkrogen
reviewed
Feb 15, 2019
xkrogen
left a comment
Collaborator
There was a problem hiding this comment.
Few minor style nits and then it's all good to go!
xkrogen
approved these changes
Feb 16, 2019
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The goal of this PR is to let Dynamometer collect and emit per-user metrics, as opposed to the whole-workload measurements that it currently collects using Counters. This is done by first changing the
AuditReplayMapperto emit key-value pairs in the form of(username_type, latency)wheretypeis eitherREADorWRITE. Then, a reducer stepAuditReplayReduceris added to sum latencies per key. More generally, this change allows Dynamometer to support arbitrary reducers for stats aggregation along with the existing mappers.I modified
TestWorkloadGeneratorto run with and without a reducer, and confirm that the file was created. This still needs a test to ensure that the actual contents of the output file are correct. As well, it's possible that I changed some things that didn't need to be changed in the process of getting the tests to pass/output to appear.Mostly trying to see if CI will pass, but ready for some preliminary reviews.
EDIT: a couple other things that I'm not sure are the right approach and need to be looked at a bit closer:
WorkloadMapperto specifyKEYOUTandVALUEOUTas parameters is the right moveAuditReplayMapperwrites to the context without a reducer being specified? Shouldjob.setMapOutputKeyClassbe set toLongWritable.classinstead ofNullWritable.classregardless of whether it's actually outputting anything? It doesn't seem like this caused any errors, but if not, why?WorkloadReducerjust to what it needs to run, but it should probably havegetDescriptionandgetConfigDescriptionsas well.