API: Deprecate ContentFile#path API and add location API which returns String#11092
Conversation
flyrain
left a comment
There was a problem hiding this comment.
It'd be nice to keep the name, but overloading could not apply to the return type.
| @Deprecated | ||
| CharSequence path(); | ||
|
|
||
| /** Return the fully qualified path to the file, suitable for constructing a Hadoop Path */ |
There was a problem hiding this comment.
I wonder whether we still want to keep the Hadoop Path reference here as it quite evolved over the years and we do not necessarily depend on Hadoop anymore.
There was a problem hiding this comment.
Good point, I was largely concerned if we didn't preserve this property of being suitable for a hadoop path at the API level, once path is deprecated, it may break the users that do rely on having this. But if we're saying that we don't really need to concern ourselves on maintaining this, we may not need to include it. cc @rdblue thoughts on this?
There was a problem hiding this comment.
Discussed with @rdblue @danielcweeks and here are some key points on why we shouldn't document anything related to hadoop:
- Hadoop pathing doesn't work with storage systems like GCS
- Locations are a super set of what can be represented in a hadoop path. Even today
pathAPI can return paths which simply cannot be represented in a hadoop Path. - The hadoop path normalization logic simply conflicts with absolute paths
So updated to remove this part of the comment. I'll leave the existing path API comment as is since it's being deprecated anyways.
aokolnychyi
left a comment
There was a problem hiding this comment.
I support this change. Internally, we always use String objects for serializability and most places that consume have explicit conversion to String already.
95b75b5 to
963d321
Compare
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
963d321 to
ea7d2ec
Compare
|
THanks for reviewing @flyrain @nastra @aokolnychyi , I'll go ahead and merge |
…s String (apache#11092) Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
ContentFile#path returns CharSequence instead of String. The original intention of this API returning CharSequence was to directly surface Avro UTF8 without any conversion, however for distributed cases the CharSequence needs to be converted into String since CharSequence itself isn't serializable. Furthermore, in terms of naming we tend to use location in other places like manifestListLocation or metadataLocation, also in FileIO. This change performs the following:
Over time up until 2.0 we can remove references to path and use location. After 2.0 implementations of ContentFile would need to implement location.