[STORM-3045] Microsoft Azure EventHubs Storm Spout and Bolt improvements#2588
[STORM-3045] Microsoft Azure EventHubs Storm Spout and Bolt improvements#2588SreeramGarlapati wants to merge 30 commits into
Conversation
of events received per call. -Refactor the code to group classes more appropriately -Remove redundant types -Javadoc comments where applicable -Preftch config parameter to dictate EH prefetch count -config parameter to introduce sleep between spout's nexttuple calls -config parameter to retrieve a batched number of events per call to EH (opposed to single event) -New data scheme to group event payload and audit params into a single type, and expose the single type as the only tuple field to downstream bolts.
… when returning empty/null values
…upported schemes. Enable scheme based serialization in spout
# Conflicts: # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/bolt/EventHubBolt.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/core/FieldConstants.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/format/StringEventDataScheme.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventDataScheme.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubException.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubSpout.java # external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/trident/TransactionalTridentEventHubSpout.java # external/storm-eventhubs/src/test/java/org/apache/storm/eventhubs/spout/SpoutOutputCollectorMock.java # pom.xml
srdo
left a comment
There was a problem hiding this comment.
Partial review, got down to the start of the Trident code. Will finish up later.
In addition to the other comments, could you please look at target/checkstyle-violations.xml when you've built, and try to resolve any violations that point to your changes? I noticed a number of places with weird indentation, tabs and ifs with no braces, that I'd like to see fixed, since we're trying to stick to the checkstyle rules for new code.
| * @param password Password to use | ||
| * @param namespace target namespace for the service bus | ||
| * @param entityPath Name of the event hub | ||
| * @param partitionMode number of partitions |
There was a problem hiding this comment.
This description might need to be updated.
| public void prepare(Map<String, Object> config, TopologyContext context, | ||
| OutputCollector collector) { | ||
| this.collector = collector; | ||
| logger.info(String.format("Conn String: %s, PartitionMode: %s", boltConfig.getConnectionString(), |
There was a problem hiding this comment.
The logger can do String.format inherently. use {} for placeholders (e.g. logger.info("Connection String: {}", boltConfig.getConnectionString())
| * <p> | ||
| * The implementation has two modes of operation: | ||
| * <ul> | ||
| * <li>partitionmode = true, One bolt for per partition write.</li> |
There was a problem hiding this comment.
Nit: This seems like an enum would be a good fit.
| ehClient.close().whenCompleteAsync((voidargs, error) -> { | ||
| try { | ||
| if (error != null) { | ||
| logger.error("Exception during EventHubBolt cleanup phase" + error.toString()); |
There was a problem hiding this comment.
Consider logging the exception without toString so you don't lose the stack trace, i.e. logger.error("Some message", error). This looks like it's a problem in a few places.
| public void cleanup() { | ||
| logger.debug("EventHubBolt cleanup"); | ||
| try { | ||
| ehClient.close().whenCompleteAsync((voidargs, error) -> { |
There was a problem hiding this comment.
Nit: Any reason to use whenCompleteAsync over whenComplete?
| @Override | ||
| public void declareOutputFields(final OutputFieldsDeclarer declarer) { | ||
| List<String> fields = new LinkedList<String>(); | ||
| fields.add(FieldConstants.MESSAGE_FIELD); |
There was a problem hiding this comment.
You should use the fields declared by the scheme here, otherwise what's the point of having IEventDataScheme provide a declareOutputFields?
| import org.apache.storm.eventhubs.core.EventHubConfig; | ||
|
|
||
| /** | ||
| * EventHub configuration. This class remains in |
There was a problem hiding this comment.
Nit: Didn't the other class moves already break binary compatibility?
| private static final String ZK_LOCAL_URL = "localhost:2181"; | ||
|
|
||
| private final String zookeeperConnectionString; | ||
| private final CuratorFramework curatorFramework; |
There was a problem hiding this comment.
I don't think this is serializable
| return null; | ||
| } | ||
|
|
||
| String data = new String(curatorFramework.getData().forPath(statePath)); |
There was a problem hiding this comment.
Better if you specify a charset
| @Override | ||
| public void saveData(String statePath, String data) { | ||
| data = StringUtil.isNullOrWhiteSpace(data) ? "" : data; | ||
| byte[] bytes = data.getBytes(); |
|
thanks @srdo - i will take a shot at your suggestions by EOD monday PST. |
|
@srdo - when I try to merge my changes with |
|
@SreeramGarlapati Quick google suggests that you can try adding -Xrenormalize to the merge command. https://stackoverflow.com/a/12194759/8845188 |
|
And also, yes you can change the line ending to LF and that should work too |
|
By the way, when you get a chance, please go to https://issues.apache.org/jira and create an issue for tracking these changes, then rename the PR and commit(s) so they contain the issue number. It makes it easier for us to track which branches the changes are applied to, and helps us generate correct release notes. Thanks. |
This is continuation of work done by @raviperi.
-update to the latest version of eventhubs java client
-Introduce config params to use latest EH client, control request prefetch size, batch size of events received per call.
-Refactor the code to group classes more appropriately
-Remove redundant types
-Javadoc comments where applicable
-Preftch config parameter to dictate EH prefetch count
-config parameter to introduce sleep between spout's nexttuple calls
-config parameter to retrieve a batched number of events per call to EH
(opposed to single event)
-New data scheme to group event payload and audit params into a single
type, and expose the single type as the only tuple field to downstream
bolts.
Close #2322