Feature/extensible parsers#1
Currently, a lot of functionality is hidden behind private access modifiers, making it difficult to extend the parsers without copying large chunks of code. These small changes make reuse much easier.
I was able to reinstate some of the access modifiers by instead extracting the extensible code from those methods into new overridable methods. I also added some hierarchy to the parsers/readers so they can be switched out.
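For anyone skimming, here is a rough sketch of the kind of refactoring described above: the public entry point stays fixed, and the parts worth customising are pulled into overridable hooks. Every name here (`LineParser`, `splitFields`, `cleanField`, and the two subclasses) is hypothetical and made up for illustration; none of it is the actual spark-csv code in this PR.

```scala
// Hypothetical sketch only: these are not spark-csv classes.
abstract class LineParser {
  // Fixed entry point; subclasses customise behaviour through the hooks below.
  final def parseLine(line: String): Seq[String] =
    splitFields(line).map(cleanField)

  // Overridable hook, extracted from logic that would otherwise be private.
  protected def splitFields(line: String): Seq[String]

  // Default behaviour that a subclass may replace.
  protected def cleanField(field: String): String = field.trim
}

// Splits on a single delimiter character.
class DelimitedLineParser(delimiter: Char = ',') extends LineParser {
  override protected def splitFields(line: String): Seq[String] =
    line.split(delimiter).toSeq
}

// Cuts the line into columns of the given widths.
class FixedWidthLineParser(widths: Seq[Int]) extends LineParser {
  override protected def splitFields(line: String): Seq[String] = {
    // Running totals give the start offset of each column.
    val starts = widths.scanLeft(0)(_ + _)
    starts.zip(widths).map { case (start, width) =>
      line.slice(start, start + width)
    }
  }
}
```

Because both subclasses share the `LineParser` type, calling code can hold a `LineParser` and swap one implementation for the other without touching the parsing loop.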
Sorry, that was my fault for not explaining it well at all. I'm going to merge this in now since it is the same PR as the one for databricks-spark-csv: databricks#259. So if you guys have any comments, they'd be better off going on that PR.
@mustafashabib Please can you give me write access to this repo? Thanks. `remote: Permission to quartethealth/spark-csv.git denied to blrnw3.`
Sorry about that Ben - can you try again?
Works now. Thanks!
    }

    /**
     * Allows for greater extensibility
This comment and the one above don't really add anything IMO
Can you create tests for this?
    import org.apache.spark.rdd.RDD

    - private[csv] object TextFile {
    + object TextFile {
Is there a different PR that you're preparing for dealing with fixed-width data?
Thanks for the comments. I'll work on those changes now. The fixed-width parser PR: quartethealth/spark-fixedwidth#1
As per databricks#259