#34009 avro generic record to beam row conversion added support for all logical types and conversions #34024

Abacn merged 30 commits into apache:master from wollowizard:master
Conversation
Run Java PreCommit

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment

assign set of reviewers

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @robertwb for label java.

The PR bot will only process comments in the main thread (not review comments).
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
We don't need to include a license header here since we're not distributing this file (and it is under the RAT exclusion).
buildSrc/build.gradle.kts
  runtimeOnly("com.google.protobuf:protobuf-gradle-plugin:0.8.13") // Enable proto code generation
- runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1") // Enable Avro code generation
+ runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.1.0") // Enable Avro code generation. Version 1.1.0 is the last supporting avro 1.10.2
I'm not sure about downgrading this dependency. Why do we specifically need to target 1.1.0? It is a rather old dependency, and downgrading could introduce other problems.
Restored 1.9.1. My thinking is that since the earliest tested Avro version is 1.10.2 (see sdks/java/extensions/avro/build.gradle), it would be good to use a version of the Avro plugin that uses Avro 1.10.2.

In any case, the whole plugin seems to be no longer maintained, so maybe it would be a good idea to remove it altogether and just use the avro-tools jar, as is done elsewhere. That could be addressed in a separate task, though.
...tensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/schemas/utils/AvroUtils.java
Adding some folks who may have some more context here: @aromanenko-dev @Abacn
@aromanenko-dev that was indeed helpful to get some context. I believe the change is aligned, and I have added extra tests to specifically check Avro records for specific record classes created with 1.8.2 and 1.9.2.
Run PostCommit_Java_Avro_Versions

Run Java Avro Versions PostCommit

@wollowizard There is a dedicated workflow to run tests against different Avro versions:

Run Java Avro Versions PostCommit
@aromanenko-dev I am not sure how to proceed, would you mind giving another quick review? Thanks in advance.
@wollowizard Well, I haven't been on the project for about a year now, so tbh I didn't follow any recent Avro extension changes. Maybe ask another person who has more knowledge on this to make sure that it doesn't break things.

Though at first sight it LGTM, provided that all checks are passing, especially the multiple Avro versions checks. Then I think we are good.
Thanks, I see there is already an approval. I added a minor comment regarding formatting, and I can help merge after it is resolved.

@Abacn thanks, I've made the change and resolved your comment.
Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @Abacn for label java.
#34009 avro generic record to beam row conversion added support for all logical types and conversions
The PR modifies the Avro extension, specifically enhancing how Avro logical types are handled when converting between Avro records and Beam rows. It introduces support for a broader range of Avro logical types (e.g., decimal, uuid, timestamp-micros, etc.), improves compatibility with Avro’s SpecificRecord and GenericRecord, and ensures proper serialization/deserialization of these types.
The primary goal is to make AvroUtils more robust in converting Avro records (both GenericRecord and SpecificRecord) to Beam rows by fully supporting Avro's logical types. A new `convertLogicalType` method is introduced to handle logical type conversions dynamically using a GenericData instance. This method checks for registered conversions (e.g., DecimalConversion, UUIDConversion) and applies them to transform Avro data into Beam-compatible formats.

Instead of hardcoding conversions for specific logical types (e.g., TimestampMillis to ReadableInstant), the code now delegates to Avro's GenericData conversion system. This makes it extensible to any logical type with a registered conversion, including custom ones.
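The delegation described above — looking up a registered conversion at runtime instead of hardcoding one branch per logical type — can be sketched with a small stdlib-only analogy. Everything here is hypothetical (`ConversionRegistry`, `convert`, the string keys are illustrative names, not Beam's or Avro's real API); in the actual change this role is played by Avro's GenericData and its registered Conversion instances.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: in the real PR, Avro's GenericData holds the
// registered Conversion objects and AvroUtils asks it to convert values.
public class ConversionRegistry {
  private final Map<String, Function<Object, Object>> conversions = new HashMap<>();

  // Register a conversion for a logical-type name, including custom ones.
  public void register(String logicalType, Function<Object, Object> conversion) {
    conversions.put(logicalType, conversion);
  }

  // Apply the registered conversion; when none is registered, the raw value
  // passes through unchanged (mirroring the fallback behaviour described above).
  public Object convert(String logicalType, Object rawValue) {
    return conversions.getOrDefault(logicalType, Function.identity()).apply(rawValue);
  }

  public static void main(String[] args) {
    ConversionRegistry registry = new ConversionRegistry();
    // Avro's "decimal" logical type stores the unscaled value as bytes; a
    // registered conversion turns that raw form into a BigDecimal with the
    // schema's scale (here hardcoded to 2 for the sketch).
    registry.register("decimal",
        raw -> new BigDecimal(new BigInteger((byte[]) raw), 2));
    Object converted = registry.convert("decimal", new byte[] {0x04, (byte) 0xD2});
    System.out.println(converted); // prints 12.34 (unscaled 1234, scale 2)
  }
}
```

The design point is that the caller never special-cases a type: adding support for a new logical type means registering one conversion, not editing the conversion routine.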
Just FYI, my initial interest in this came from the fact that I have some `SpecificRecord`s (a subclass of `GenericRecord`) for which I have classes generated with the avro-maven-plugin. They have java.time fields, UUIDs, and BigDecimal fields, and the generated code also adds conversions to the GenericData of the generated class. So the generated code is nice and complete, but the conversion to a Beam row failed because Beam's AvroUtils code does not support all logical types and has a predefined list of accepted input types (for example, it does not accept a GenericRecord with a BigDecimal field; it needs to be a ByteBuffer, its raw type). I thought a more flexible approach would be to use the Avro conversions.

GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.