Fedora license audit #9704
The previous NOTICE.txt was inherited from the broader Apache Arrow project and listed third-party components (SFrame, DyND, LLVM, google-lint/cpplint, mman-win32, LevelDB, CMake, multibuild, Ibis, Dremio, Google Guava, Apache Kudu, Apache ORC) that only exist in the C++, Python, or Java implementations. None of these have any code incorporated in the Rust crates.

Meanwhile, the following actually-incorporated third-party code was not listed:

- chronoutil (MIT, Oliver Margetts): arrow-array/src/delta.rs is copied verbatim from the chronoutil crate with its MIT header.
- compact-thrift (Apache 2.0, Jörn Horstmann): parquet/src/parquet_macros.rs macros are adapted from this project.
- parity-common/uint (Apache 2.0 / MIT): arrow-buffer/src/bigint/div.rs division algorithm is heavily inspired by this crate.
- simdjson (Apache 2.0): arrow-json/src/reader/tape.rs JSON tape representation is inspired by simdjson's tape design.

The Cargo.toml license field (Apache-2.0) remains correct: MIT is compatible with Apache-2.0 inclusion (and the MIT notice is retained in delta.rs), and all other incorporated code is Apache-2.0 licensed.

Prompted by Fedora packaging review at https://bugzilla.redhat.com/show_bug.cgi?id=2456991 about NOTICE.txt over-declaring licenses for components not present in the distributed crate.

Generated-by: Claude Opus 4.6 (Anthropic)
arrow-array/src/delta.rs is copied verbatim from the chronoutil crate (Copyright 2020-2023 Oliver Margetts) under the MIT license. The file retains its MIT header, but the crate metadata did not reflect this.

- Add arrow-array/LICENSE-MIT from the upstream chronoutil project
- Update arrow-array license to "Apache-2.0 AND MIT"
- Override the workspace include to ship LICENSE-MIT with the crate

Generated-by: Claude Opus 4.6 (Anthropic)
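For illustration, the metadata change described in this commit could look roughly like the fragment below in `arrow-array/Cargo.toml`. This is a sketch, not the actual diff: the entries in `include` other than `LICENSE-MIT` are assumptions about what the workspace list contains, and all unrelated fields are omitted.

```toml
[package]
name = "arrow-array"

# SPDX expression: the crate ships Apache-2.0 code plus the
# MIT-licensed delta.rs copied from chronoutil.
license = "Apache-2.0 AND MIT"

# Explicit list instead of inheriting the workspace value, so that
# LICENSE-MIT is packaged with the crate.
# NOTE: every entry except LICENSE-MIT is an illustrative assumption.
include = [
    "src/**/*.rs",
    "Cargo.toml",
    "LICENSE.txt",
    "NOTICE.txt",
    "LICENSE-MIT",
]
```

Running `cargo package --list` inside the crate directory should show whether `LICENSE-MIT` ends up in the published archive.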
Note: LICENSE-MIT has the year in which the file was forked, so it ends in 2022 while the latest upstream repo now says 2023. This matches what's in the file header.
| This product includes software developed by Hewlett-Packard: | ||
| (c) Copyright [2014-2015] Hewlett-Packard Development Company, L.P | ||
| This product includes software inspired by the simdjson project (Apache 2.0) | ||
| * https://github.com/simdjson/simdjson |
Not sure whether this actually uses code from simdjson, or was only inspired by its approach?
yeah, this one can probably be dropped
| https://github.com/wesm/feather | ||
| This product includes software from the compact-thrift project (Apache 2.0) | ||
| * Copyright Jörn Horstmann | ||
| * https://github.com/jhorstmann/compact-thrift |
This was contributed by @jhorstmann so does it need this addition?
Technically I think @etseidl used the code from @jhorstmann 's repo as part of
So it probably doesn't hurt to have this in here 🤔
I included a heavily modified version of @jhorstmann's code, but it's still a derivative work, so I agree with @alamb that it doesn't hurt to include
My main contribution was probably the idea and prototype to use rust macros, with the goal to contribute that code. Then @etseidl did all the hard work of actually integrating that idea into the arrow-rs codebase.
I would be fine with leaving this out of the NOTICE file, since there is no code that could be considered a direct copy from that repo. I'm happy with the shout-out I got in the blog post about faster parquet parsing :)
I think it is ok to leave this shout-out in the notice. You can be forever famous (to a very select group of people)
I had gemini audit the new claims in
alamb
left a comment
Thank you @michel-slm -- this is pretty great
May I ask how you did the audit (what tool)? The list looks good to me
I also had Gemini double-check, and I was able to find the relevant PRs and issues for these new claims
The original NOTICE.txt, I think, is left over from when this code was split from the apache/arrow repo, so this is a nice cleanup
Thank you @michel-slm
I used Claude Opus (I declared it in the individual commits, but not in the PR itself; I hope that's enough). Should I address the two questions from @Dandandan too, @alamb? Thanks
Code that only served as inspiration probably should not be listed.

Also update LICENSE-MIT to actually match the years in the file header, not the latest from the upstream repo

Signed-off-by: Michel Lind <salimma@fedoraproject.org>
Yeah -- that is great. Nothing else is needed from my perspective.
I defer to @Dandandan. I don't have a strong opinion either way nor do I really know how much
The comment says "inspired by" and since simdjson is written in C++ we probably didn't use the code 🤔 But on the other hand giving credit on the safe side is probably ok too
It was strictly inspired by the approach of having a two-pass decoder using a tape; I don't think it needs a license attribution.
@Dandandan I think my last push addressed your feedback, please let me know if you would like any further changes. Thanks!
alamb
left a comment
I think it looks great -- thanks everyone
We can make a follow-on PR if we need to make additional changes
# Which issue does this PR close?

- Closes apache#9703

# Rationale for this change

I am going to ship `arrow` crates in Fedora as a dependency for another tool, and we did a license audit as part of the review process

# What changes are included in this PR?

- Updates NOTICE.txt to reflect third party code actually shipped in the repo
- Updates Cargo.toml in arrow-array because it actually ships MIT code as well

# Are these changes tested?

N/A, metadata change only

# Are there any user-facing changes?

No

---------

Signed-off-by: Michel Lind <salimma@fedoraproject.org>