refactor for Onda Format v0.5.0 #59

Merged: jrevels merged 81 commits into master from jr/arrow on Feb 20, 2021

Conversation


@jrevels jrevels commented Jan 4, 2021

ref beacon-biosignals/OndaFormat#28

This is a ridiculously breaking set of changes; I'm not even going to attempt backwards compatibility, except for a convenience function for upgrading old datasets.

I still need to refactor the examples/docs/tests, and potentially reintroduce some higher-level convenience functionality, depending on which manipulations end up being naturally replaced by DataFrames-y one-liners. (done)

This PR also requires that Arrow.jl have a tagged release, plus a corresponding compat bound bump here. (done)

One additional nice-to-have that is mostly orthogonal to this PR (i.e. this PR is mergeable without it): an upstream Arrow.jl PR to make its path handling a bit more type-agnostic, so that read_signals etc. would automagically work with S3Paths. (this would still be nice to have, but we already work around it here)

@jrevels jrevels marked this pull request as ready for review February 15, 2021 06:05
@jrevels jrevels requested a review from ararslan February 15, 2021 06:18
- 1.0
- 1.3
- 1.5
- nightly
jrevels (Member Author) commented:

This is really dumb, but it seems that if I only have a single stage listed here, Travis only performs the "Documentation" job and not the actual tests themselves. I'm not sure what the underlying problem is, but this works around it for now.


jrevels commented Feb 15, 2021

Hmmm. Codecov is claiming that certain lines in https://codecov.io/gh/beacon-biosignals/Onda.jl/pull/59/src/src/samples.jl are not covered, but I'm like 99.5% sure that a few of them are. Both of the cases I'm looking at are ones where the call to the function f is clearly covered, but happens through a call to broadcast!(f, ...)...
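A minimal illustration of the pattern in question (hypothetical code, not from this PR): the body of `f` clearly executes, but only ever via `broadcast!`, which line-based coverage tracking can miss.

```julia
# Hypothetical example: `f` is only ever invoked through broadcast!, yet its
# body definitely runs; line coverage may still mark the definition as
# unexecuted.
f(x) = x + 1
out = zeros(Int, 3)
broadcast!(f, out, [1, 2, 3])  # fills `out` in place with f applied elementwise
```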


jrevels commented Feb 15, 2021

This is now ready for review.

Given that this is essentially a package-wide rewrite, I strongly suggest any reviewers review the package fresh and only reference the diff where useful (rather than starting with the diff).

IMO examples/tour.jl is the best entrypoint for understanding the intended new direction here.

@ericphanson (Member) commented:

If we add push_preview = true to deploydocs, we should get per-PR docs builds to preview the new docs (https://juliadocs.github.io/Documenter.jl/stable/lib/public/#Documenter.deploydocs)


jrevels commented Feb 16, 2021

> If we add push_preview = true to deploydocs, we should get per-PR docs builds to preview the new docs

Done!
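For reference, the change being described is roughly the following tweak to `docs/make.jl` (a sketch: the `repo` value is the obvious one for this package, and the other `deploydocs` arguments are omitted):

```julia
using Documenter

# Enabling per-PR preview builds; Documenter pushes previews to the gh-pages
# branch under previews/PR<number>.
deploydocs(; repo="github.com/beacon-biosignals/Onda.jl.git",
           push_preview=true)
```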

#####

function zstd_compress(bytes::Vector{UInt8}, level=3)
compressor = ZstdCompressor(; level=level)
Reviewer (Member) commented:

This is like the one untouched bit of Onda, but I noticed Arrow.jl re-uses ZstdCompressors: https://github.com/JuliaData/Arrow.jl/blob/a113edd934a1efa667b3ffb3d11b135f746322ab/src/Arrow.jl#L94-L107; maybe this could have perf benefits for us here too?

jrevels (Member Author) commented:

Ah, probably not a bad idea. I wonder what the perf difference is between multithreading on the Julia side vs. zstd's built-in multithreading (not exposed via CodecZstd, IIRC; for all I know it might be the same thing under the hood anyway).

Reviewer (Member) commented:

One possible advantage of doing it on the Julia side is composing with other threaded Julia functions, if it's nested or such. (Though I'm not totally sure how well tuned that stuff is yet anyway.)

end
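A rough sketch of the reviewer's suggestion, i.e. reusing `ZstdCompressor` instances rather than constructing one per call, in the spirit of the linked Arrow.jl code. The names and structure here are illustrative assumptions, not this PR's actual implementation:

```julia
using CodecZstd

# One long-lived, pre-initialized compressor per thread, so repeated calls
# avoid re-allocating the underlying zstd context every time. (Assumes tasks
# don't migrate across threads mid-call.)
const ZSTD_COMPRESSORS = map(1:Threads.nthreads()) do _
    compressor = ZstdCompressor(; level=3)
    CodecZstd.TranscodingStreams.initialize(compressor)  # as Arrow.jl does
    return compressor
end

function zstd_compress_reusing(bytes::Vector{UInt8})
    return transcode(ZSTD_COMPRESSORS[Threads.threadid()], bytes)
end
```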

# It would be better if Arrow.jl supported a generic API for nonstandard path-like types so that
# we can avoid potential intermediate copies here, but its documentation is explicit that it only
Reviewer (Member) commented:

this is something I've been wondering about; I think read_arrow_table(read(path)) does not usually make extra copies, right? It reads the whole file into memory as a `Vector{UInt8}`, then Arrow.jl re-uses that byte buffer and basically exposes a lazy tabular view on top?

But this API choice of Arrow.jl means we can't handle larger-than-memory tables or use mmapping, since we always need to read the whole table into memory.
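The two read paths under discussion, sketched with Arrow.jl's own entry points (the file name is hypothetical, and `read_arrow_table` is presumably a thin wrapper over something like case 1):

```julia
using Arrow

# (1) Generic path handling: slurp the bytes, then wrap them. Arrow.Table
#     reuses the byte buffer as a lazy tabular view rather than copying
#     columns out of it, so the only full copy is the initial read.
bytes = read("dataset.arrow")
table_from_bytes = Arrow.Table(bytes)

# (2) Local filesystem path: Arrow.Table mmaps the file, so nothing is read
#     eagerly, but this only works for real files on disk.
table_mmapped = Arrow.Table("dataset.arrow")
```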

jrevels (Member Author) commented:

> I think read_arrow_table(read(path)) does not usually make extra copies, right? it reads the whole file into memory as a `Vector{UInt8}`, then Arrow.jl re-uses that byte buffer and basically exposes a lazy tabular view on top?

Yup. The part that's a bit more annoying IMO is the write_arrow_table workaround. I think it currently involves an unnecessary extra buffer...in theory it shouldn't need to, though. What I really want is a method that, given a Table, returns an <:IO object that directly references the table's memory and when read yields bytes in the caller's preferred Arrow file/IPC format (i.e. the file=true option for write).

There might already be a good way to achieve that with the Arrow.jl API, and I just missed it lol

> this API choice of Arrow.jl means we can't handle larger-than-memory tables or use mmapping, since we always need to read the whole table into memory.

Well, we still get mmapping for local filesystem paths, but it would be really cool if e.g. AWSS3.S3Paths supported an mmap-like method that used byte-range reads under the hood 😎

Generally, though, you'd split a larger-than-memory Arrow table into manageably chunked objects upon write, and then treat the objects as one logical table/dataset via an API like https://arrow.apache.org/docs/python/dataset.html. In Julia land, I wonder how much we can get away with just using e.g. Tables.partition for things lol
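The write_arrow_table workaround described above looks roughly like the following (a hedged sketch, not the package's actual code): serialize into an intermediate `IOBuffer`, then hand those bytes to whatever path type is being targeted. The `IOBuffer` is the "unnecessary extra buffer" in question, and it's exactly what an IO-view-over-the-table API would eliminate.

```julia
using Arrow

# Hypothetical helper: materialize a table's Arrow file-format bytes so they
# can be written to a non-filesystem path (e.g. an S3Path).
function arrow_file_bytes(table)
    io = IOBuffer()
    Arrow.write(io, table; file=true)  # file=true selects the Arrow file/IPC format
    return take!(io)
end

# e.g. write(some_s3_path, arrow_file_bytes(table))
```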

Reviewer (Member) commented:

> The part that's a bit more annoying IMO is the write_arrow_table workaround. I think it currently involves an unnecessary extra buffer...in theory it shouldn't need to, though.

I see, yeah I've been using the same workaround and have also been annoyed by the need for it. I think maybe one fix is rofinn/FilePathsBase.jl#113.

> I wonder how much we can get away with just using e.g. Tables.partition

That seems cool; I don't know how much has been figured out for using that kind of tool for out-of-memory data, as opposed to just lazily chaining tables that are already in memory. But I haven't used it much yet.
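On the chunking idea: the relevant hook appears to be `Tables.partitioner` (the comments above say "Tables.partition"), which makes `Arrow.write` emit one record batch per partition. A made-up illustration, with tiny in-memory chunks standing in for pieces of a table too big to materialize at once:

```julia
using Arrow, Tables

# Three small row-table "chunks"; contents are made up for illustration.
chunks = ([(x=i, y=2i) for i in lo:lo+2] for lo in (1, 4, 7))

# Each partition becomes its own record batch in the output file, so chunks
# can be produced lazily at write time...
Arrow.write("chunked.arrow", Tables.partitioner(chunks))

# ...and read back as one logical table spanning all the batches.
table = Arrow.Table("chunked.arrow")
```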

@ericphanson (Member) commented:

> > If we add push_preview = true to deploydocs, we should get per-PR docs builds to preview the new docs
>
> Done!

Cool! The preview docs are here: https://beacon-biosignals.github.io/Onda.jl/previews/PR59/

jrevels and others added 3 commits February 16, 2021 16:40
Co-authored-by: Eric Hanson <5846501+ericphanson@users.noreply.github.com>
@ararslan ararslan left a comment


Cursory review for now, but overall things are looking nice.

jrevels and others added 5 commits February 18, 2021 15:59
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Alex Arslan <ararslan@comcast.net>