Closed
94 commits
59dcbde
working implementation but lacks case-insensitivity and more unit tests
wmalpica Sep 1, 2021
4cb862b
different algorithm. Added more tests and benchmarks
wmalpica Sep 2, 2021
d03b7eb
uncommented tests
wmalpica Sep 2, 2021
69972dd
ARROW-13792 [Java]: The toString representation is incorrect for unsi…
liyafan82 Sep 2, 2021
b76caf4
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 2, 2021
111f0c7
ARROW-13823 [Java]: Exclude .factorypath
laurentgo Sep 2, 2021
09497a9
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 2, 2021
e380c1a
ARROW-13812: [C++] Fix Valgrind error in Grouper.BooleanKey test
lidavidm Sep 2, 2021
bbecb6a
ARROW-13067: [C++][Compute] Implement integer to decimal cast
cyb70289 Sep 2, 2021
495c734
ARROW-13846: [C++] Fix crashes on invalid IPC file
pitrou Sep 2, 2021
425b1cb
ARROW-13850: [C++] Fix crashes on invalid Parquet data
pitrou Sep 2, 2021
f0879a5
ARROW-13164: [R] altrep vectors from Array with nulls
romainfrancois Sep 2, 2021
8c70a5f
ARROW-13459: [C++][Docs]Missing param docs for RecordBatch::SetColumn
zhjwpku Sep 2, 2021
a1d207e
ARROW-13831: [GLib][Ruby] Add support for writing by Arrow Dataset
kou Sep 2, 2021
1440d5a
ARROW-13768: [R] Allow JSON to be an optional component
karldw Sep 3, 2021
a45fc3f
ARROW-13782: [C++] Add skip_nulls/min_count to tdigest/mode/quantile
lidavidm Sep 3, 2021
5ead375
ARROW-13855: [C++][Python] Implement C data interface support for ext…
pitrou Sep 3, 2021
e9251b0
ARROW-13740: [R] summarize() should not eagerly evaluate
nealrichardson Sep 3, 2021
858ac57
ARROW-13874: [R] Implement TrimOptions
thisisnic Sep 3, 2021
a49048b
ARROW-13543: [R] Handle summarize() with 0 arguments or no aggregate …
nealrichardson Sep 3, 2021
f12c18e
ARROW-13899: [Ruby] Implement slicer by compute kernels
kou Sep 4, 2021
882e8b4
MINOR: [Doc][Python] Fix a typo (#11085)
jjyao Sep 4, 2021
5d38723
ARROW-13909: [GLib] Add GArrowVarianceOptions
kou Sep 5, 2021
2588e17
ARROW-13909: [GLib] Add tests for GArrowVarianceOptions
kou Sep 5, 2021
c83db7e
ARROW-13793: [C++] Migrate ORCFileReader to Result<T>
zhjwpku Sep 6, 2021
5c5af6c
ARROW-13871: [C++] JSON reader can fail if a list array key is presen…
westonpace Sep 6, 2021
a8953de
ARROW-13845: [C++] Reconcile RandomArrayGenerator::ArrayOf implementa…
pitrou Sep 6, 2021
4390a64
ARROW-13857: [R][CI] Remove checkbashisms download
nealrichardson Sep 6, 2021
cf0e5e4
ARROW-13803: [C++] Don't read past end of buffer in BitUtil::SetBitmap
cyb70289 Sep 6, 2021
b1cfa7d
ARROW-13912: [R] TrimOptions implementation breaks test-r-minimal-bui…
nealrichardson Sep 6, 2021
303b7f4
ARROW-13915: [R][CI] R UCRT C++ bundles are incomplete
nealrichardson Sep 6, 2021
fd47183
ARROW-13913: [C++] Don't segfault if IndexOptions omitted
lidavidm Sep 6, 2021
02343c8
ARROW-13684: [C++][Compute] Strftime kernel follow-up
rok Sep 6, 2021
5876e3f
ARROW-13403: [R] Update developing.Rmd vignette
thisisnic Sep 6, 2021
4cb77a2
ARROW-13910: [Ruby] Arrow::Table#[]/Arrow::RecordBatch#[] accepts Ran…
kou Sep 6, 2021
67b5bd2
ARROW-13743: [CI] OSX job fails due to incompatible git and libcurl
kszucs Sep 7, 2021
6dc272a
ARROW-13810: [C++][Compute] Predicate IsAsciiCharacter allows invalid…
edponce Sep 7, 2021
6c7c4f0
ARROW-13671: [Dev] Fix conda recipe on Arm 64k page system
cyb70289 Sep 7, 2021
9064fa0
ARROW-12981: [R] Install source package from CRAN alone
karldw Sep 7, 2021
080a86b
Implemented review feedback and added more unit tests
wmalpica Sep 7, 2021
f40856a
ARROW-13925: [R] Remove system installation devdocs jobs
jonkeane Sep 7, 2021
85d8175
ARROW-13919: [GLib] Add GArrowFunctionDoc
kou Sep 7, 2021
e396d4f
ARROW-13872: [Java] ExtensionTypeVector does not work with RangeEqual…
BryanCutler Sep 8, 2021
57e76e8
ARROW-13921: [Python][Packaging] Pin minimum setuptools version for t…
kszucs Sep 8, 2021
97135bc
Docs + lintr fix (#11107)
jonkeane Sep 8, 2021
a081a05
checked for empty hex falues. added scalar tests
wmalpica Sep 8, 2021
170a24f
ARROW-13820: [R] Rename na.min_count to min_count and na.rm to skip_n…
nealrichardson Sep 8, 2021
7a23a07
fixed style with clang-format
wmalpica Sep 8, 2021
e5db0fc
MINOR: [R] Fix broken doc example (#11110)
nealrichardson Sep 8, 2021
9dd8b6a
implemented some improvements
wmalpica Sep 8, 2021
31f80e5
fixed clang format
wmalpica Sep 8, 2021
4666073
fixed unit test
wmalpica Sep 8, 2021
b0d89db
ARROW-13680: [C++] Create an asynchronous nursery to simplify capture…
westonpace Sep 9, 2021
4b5ed4e
ARROW-13138: [C++][R] Implement extract temporal components (year, mo…
aucahuasi Sep 9, 2021
bb1ef85
ARROW-13033: [C++] Kernel to localize naive timestamps to a timezone …
rok Sep 9, 2021
9aee524
ARROW-11885: [R] Turn off some capabilities when LIBARROW_MINIMAL=true
nealrichardson Sep 9, 2021
0c41e0b
ARROW-13842: [C++] Bump vendored date library
pitrou Sep 9, 2021
946bdcf
ARROW-13963: [Go] Minor: Add bitmap reader/writer impl from go Parque…
Sep 9, 2021
4fe6fae
ARROW-13961: [C++] Fix use of non-const references, declaration witho…
lidavidm Sep 9, 2021
66d7dd4
ARROW-13962: [R] Catch up on the NEWS
nealrichardson Sep 9, 2021
04515de
MINOR: [R] Exclude some paths from the cpp rsync
nealrichardson Sep 9, 2021
56411f5
ARROW-13940: [R] Turn on multithreading with Arrow engine queries
nealrichardson Sep 9, 2021
42d10c3
ARROW-13964: MINOR: [Go][Parquet] remove base bitmap reader/writer fr…
Sep 9, 2021
3bbec3f
ARROW-13942: [Dev] Update cmake_format usage in autotune comment bot
kou Sep 10, 2021
3db4854
ARROW-13778: [R] Handle complex summarize expressions
nealrichardson Sep 10, 2021
fa7cff6
ARROW-1565: [C++] Implement TopK/BottomK
aocsa Sep 10, 2021
bae7e2b
MINOR: [Doc][Python] Fix typo ParquetFileForma (#11137)
domoritz Sep 11, 2021
db5b848
ARROW-13979: [Go] Enable -race for go tests
Sep 12, 2021
c091e6d
ARROW-13859: [Java] Add code coverage support
laurentgo Sep 12, 2021
e8ab3ae
ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots
liyafan82 Sep 12, 2021
1049dde
ARROW-13544 [Java]: Remove APIs that have been deprecated for long (C…
liyafan82 Sep 12, 2021
74f020d
ARROW-13974: [C++] Resolve follow-up reviews for TopK/BottomK
aocsa Sep 13, 2021
293f856
ARROW-13966: [C++] Support decimals in comparisons
lidavidm Sep 13, 2021
9122149
ARROW-13937: [C++][Compute] Add explicit output values to sign functi…
edponce Sep 13, 2021
f2cb977
ARROW-13646: [Go][Parquet] adding the parquet metadata package
Sep 13, 2021
dfaa415
ARROW-13983: [C++] Avoid raising error if fadvise() isn't supported
pitrou Sep 13, 2021
0610998
ARROW-13978: [C++] Bump gtest to 1.11 to unbreak builds with recent c…
pitrou Sep 13, 2021
52904d6
ARROW-13958: [Python] Migrate Python ORC bindings to use new Result-b…
jorisvandenbossche Sep 13, 2021
376cb45
ARROW-12744: [C++][Compute] Add rounding kernel
edponce Sep 13, 2021
87b2fcd
ARROW-12087: [C++] Allow sorting durations, timestamps with timezones
lidavidm Sep 13, 2021
1cbc4a2
ARROW-13904: [R] Implement ModeOptions
thisisnic Sep 13, 2021
f3d3c68
ARROW-13905: [R] Implement ReplaceSliceOptions
thisisnic Sep 14, 2021
0b6f531
ARROW-13906: [R] Implement PartitionNthOptions
thisisnic Sep 14, 2021
672149b
ARROW-13869: [R] Implement options for non-bound MatchSubstringOption…
thisisnic Sep 14, 2021
8875d5c
ARROW-13908: [R] Implement ExtractRegexOptions
thisisnic Sep 14, 2021
f1d6811
working implementation but lacks case-insensitivity and more unit tests
wmalpica Sep 1, 2021
925b2a7
different algorithm. Added more tests and benchmarks
wmalpica Sep 2, 2021
68ec4db
Implemented review feedback and added more unit tests
wmalpica Sep 7, 2021
a538072
checked for empty hex falues. added scalar tests
wmalpica Sep 8, 2021
400d886
fixed style with clang-format
wmalpica Sep 8, 2021
9cac060
implemented some improvements
wmalpica Sep 8, 2021
6053936
fixed clang format
wmalpica Sep 8, 2021
68a6844
fixed unit test
wmalpica Sep 8, 2021
7aee4f0
Merge branch 'wmalpica/ARROW-12657' of github.com:wmalpica/arrow into…
wmalpica Sep 14, 2021
ARROW-13910: [Ruby] Arrow::Table#[]/Arrow::RecordBatch#[] accepts Range and selectors

Closes #11090 from kou/ruby-table-array-ref

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kou committed Sep 6, 2021
commit 4cb77a21ee3b68138c3e2bfcc8969234039ed24d
101 changes: 100 additions & 1 deletion ruby/red-arrow/lib/arrow/column-containable.rb
@@ -27,6 +27,17 @@ def each_column(&block)
       columns.each(&block)
     end

+    # @overload [](name)
+    #   Find a column that has the given name.
+    #
+    #   @param name [String, Symbol] The column name to be found.
+    #   @return [Column] The found column.
+    #
+    # @overload [](index)
+    #   Find the `index`-th column.
+    #
+    #   @param index [Integer] The index to be found.
+    #   @return [Column] The found column.
     def find_column(name_or_index)
       case name_or_index
       when String, Symbol
@@ -40,9 +51,97 @@ def find_column(name_or_index)
         return nil if index < 0 or index >= n_columns
         Column.new(self, index)
       else
-        message = "column name or index must be String, Symbol or Integer"
+        message = "column name or index must be String, Symbol or Integer: "
+        message << name_or_index.inspect
         raise ArgumentError, message
       end
     end
+
+    # Selects columns that are selected by `selectors` and/or `block`
+    # and creates a new container only with the selected columns.
+    #
+    # @param selectors [Array<String, Symbol, Integer, Range>]
+    #   If a selector is `String`, `Symbol` or `Integer`, the selector
+    #   selects a column by {#find_column}.
+    #
+    #   If a selector is `Range`, the selector selects columns by `::Array#[]`.
+    # @yield [column] Gives a column to the block to select columns.
+    #   This uses `::Array#select`.
+    # @yieldparam column [Column] A target column.
+    # @yieldreturn [Boolean] Whether the given column is selected or not.
+    # @return [self.class] The newly created container that only has selected
+    #   columns.
+    def select_columns(*selectors, &block)
+      if selectors.empty?
+        return to_enum(__method__) unless block_given?
+        selected_columns = columns.select(&block)
+      else
+        selected_columns = []
+        selectors.each do |selector|
+          case selector
+          when Range
+            selected_columns.concat(columns[selector])
+          else
+            column = find_column(selector)
+            if column.nil?
+              case selector
+              when String, Symbol
+                message = "unknown column: #{selector.inspect}: #{inspect}"
+                raise KeyError.new(message)
+              else
+                message = "out of index (0..#{n_columns - 1}): "
+                message << "#{selector.inspect}: #{inspect}"
+                raise IndexError.new(message)
+              end
+            end
+            selected_columns << column
+          end
+        end
+        selected_columns = selected_columns.select(&block) if block_given?
+      end
+      self.class.new(selected_columns)
+    end
+
+    # @overload [](name)
+    #   Find a column that has the given name.
+    #
+    #   @param name [String, Symbol] The column name to be found.
+    #   @return [Column] The found column.
+    #   @see #find_column
+    #
+    # @overload [](index)
+    #   Find the `index`-th column.
+    #
+    #   @param index [Integer] The index to be found.
+    #   @return [Column] The found column.
+    #   @see #find_column
+    #
+    # @overload [](range)
+    #   Selects columns that are in `range` and creates a new container
+    #   only with the selected columns.
+    #
+    #   @param range [Range] The range to be selected.
+    #   @return [self.class] The newly created container that only has selected
+    #     columns.
+    #   @see #select_columns
+    #
+    # @overload [](selectors)
+    #   Selects columns that are selected by `selectors` and creates a
+    #   new container only with the selected columns.
+    #
+    #   @param selectors [Array] The selectors that are used to select columns.
+    #   @return [self.class] The newly created container that only has selected
+    #     columns.
+    #   @see #select_columns
+    def [](selector)
+      case selector
+      when ::Array
+        select_columns(*selector)
+      when Range
+        select_columns(selector)
+      else
+        find_column(selector)
+      end
+    end
   end
 end
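To make the dispatch in `[]` concrete, here is a minimal, self-contained Ruby sketch. It does not use red-arrow: `MiniContainer` is a hypothetical stand-in that stores only column names, and its `find_column` relies on plain `::Array#[]` for index lookup, which is an assumption rather than the library's implementation. The branching in `select_columns` and `[]` follows the diff above.

```ruby
# Hypothetical stand-in for a column container (not part of red-arrow).
# A "column" here is just its name, so results are easy to compare.
class MiniContainer
  attr_reader :names

  def initialize(names)
    @names = names.map(&:to_s)
  end

  # One name or index -> one column, or nil when nothing matches.
  def find_column(name_or_index)
    case name_or_index
    when String, Symbol
      name = name_or_index.to_s
      @names.include?(name) ? name : nil
    when Integer
      # Assumption: plain Array#[] semantics (negative indexes count from the end).
      @names[name_or_index]
    else
      message = "column name or index must be String, Symbol or Integer: "
      message += name_or_index.inspect
      raise ArgumentError, message
    end
  end

  # Many selectors -> a new container holding only the selected columns.
  def select_columns(*selectors)
    selected = []
    selectors.each do |selector|
      case selector
      when Range
        # Array() turns a nil result (range starts past the end) into [].
        selected.concat(Array(@names[selector]))
      else
        column = find_column(selector)
        if column.nil?
          case selector
          when String, Symbol
            raise KeyError, "unknown column: #{selector.inspect}"
          else
            raise IndexError, "out of index (0..#{@names.size - 1}): #{selector.inspect}"
          end
        end
        selected << column
      end
    end
    self.class.new(selected)
  end

  # Array or Range -> new container; anything else -> single column.
  def [](selector)
    case selector
    when ::Array
      select_columns(*selector)
    when Range
      select_columns(selector)
    else
      find_column(selector)
    end
  end
end

container = MiniContainer.new(%w[a b c d e f g])
p container["a"]                          # single column looked up by name
p container[-1]                           # single column looked up by index
p container[3..4].names                   # new container built from a Range
p container[[:c, "a", -1, 3..4]].names    # new container built from mixed selectors
```

A single name, symbol or integer returns one column (or `nil`), while an Array or a Range always returns a new container. That is the same split the `#[]` tests in this commit exercise.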
4 changes: 2 additions & 2 deletions ruby/red-arrow/lib/arrow/map-data-type.rb
@@ -40,7 +40,7 @@ class MapDataType
     #     See {Arrow::DataType.resolve} how to specify data type
     #     description.
     #
-    #   @example Create a map data type for {0: "Hello", 1: "World"}
+    #   @example Create a map data type for `{0: "Hello", 1: "World"}`
     #     key = :int8
     #     item = :string
     #     Arrow::MapDataType.new(key, item)
@@ -66,7 +66,7 @@ class MapDataType
     #     See {Arrow::DataType.resolve} how to specify data type
     #     description.
     #
-    #   @example Create a maap data type for {0: "Hello", 1: "World"}
+    #   @example Create a map data type for `{0: "Hello", 1: "World"}`
     #     Arrow::MapDataType.new(key: :int8, item: :string)
     def initialize(*args)
       n_args = args.size
2 changes: 0 additions & 2 deletions ruby/red-arrow/lib/arrow/record-batch.rb
@@ -50,8 +50,6 @@ def new(*args)
     alias_method :size, :n_rows
     alias_method :length, :n_rows

-    alias_method :[], :find_column
-
     # Converts the record batch to {Arrow::Table}.
     #
     # @return [Arrow::Table]
37 changes: 0 additions & 37 deletions ruby/red-arrow/lib/arrow/table.rb
@@ -195,8 +195,6 @@ def each_record_batch
     alias_method :size, :n_rows
     alias_method :length, :n_rows

-    alias_method :[], :find_column
-
     alias_method :slice_raw, :slice

     # @overload slice(offset, length)
@@ -397,41 +395,6 @@ def remove_column(name_or_index)
       remove_column_raw(index)
     end

-    # TODO
-    #
-    # @return [Arrow::Table]
-    def select_columns(*selectors, &block)
-      if selectors.empty?
-        return to_enum(__method__) unless block_given?
-        selected_columns = columns.select(&block)
-      else
-        selected_columns = []
-        selectors.each do |selector|
-          case selector
-          when String, Symbol
-            column = find_column(selector)
-            if column.nil?
-              message = "unknown column: #{selector.inspect}: #{inspect}"
-              raise KeyError.new(message)
-            end
-            selected_columns << column
-          when Range
-            selected_columns.concat(columns[selector])
-          else
-            column = columns[selector]
-            if column.nil?
-              message = "out of index (0..#{n_columns - 1}): " +
-                "#{selector.inspect}: #{inspect}"
-              raise IndexError.new(message)
-            end
-            selected_columns << column
-          end
-        end
-        selected_columns = selected_columns.select(&block) if block_given?
-      end
-      self.class.new(selected_columns)
-    end
-
     # Experimental
     def group(*keys)
       Group.new(self, keys)
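The removal above works because `select_columns` now lives in `ColumnContainable`, so it is presumably shared by every class that includes that module rather than being specific to `Arrow::Table`. The following Ruby sketch illustrates that idea, along with the block filter and the `Enumerator` fallback. All names in it (`SketchColumn`, `SketchColumnSelectable`, `SketchTable`, `SketchBatch`) are hypothetical and do not exist in red-arrow, and selectors are simplified to names and Ranges only.

```ruby
# Hypothetical column type for this sketch only.
SketchColumn = Struct.new(:name, :values)

# Hypothetical mixin, loosely modelled on the shared select_columns.
# Simplification: selectors are names or Ranges only (no Integer handling).
module SketchColumnSelectable
  def select_columns(*selectors, &block)
    if selectors.empty?
      # No selectors and no block: hand back an Enumerator, as the diff does.
      return to_enum(__method__) unless block_given?
      selected = columns.select(&block)
    else
      selected = []
      selectors.each do |selector|
        if selector.is_a?(Range)
          selected.concat(Array(columns[selector]))
        else
          column = columns.find { |c| c.name == selector.to_s }
          raise KeyError, "unknown column: #{selector.inspect}" if column.nil?
          selected << column
        end
      end
      # Selectors narrow first; the block then filters what is left.
      selected = selected.select(&block) if block_given?
    end
    self.class.new(selected)
  end
end

# Two unrelated hypothetical containers share the behaviour via the mixin.
class SketchTable
  include SketchColumnSelectable
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end
end

class SketchBatch
  include SketchColumnSelectable
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end
end

columns = [
  SketchColumn.new("count", [1, 2]),
  SketchColumn.new("visible", [true, false]),
  SketchColumn.new("label", ["x", "y"]),
]
integer_only = ->(column) { column.values.all? { |v| v.is_a?(Integer) } }

# Block only: filter every column.
p SketchTable.new(columns).select_columns(&integer_only).columns.map(&:name)
# Selectors plus block: the result keeps the receiver's class.
p SketchBatch.new(columns).select_columns(0..1, &integer_only).class
```

Because the method ends with `self.class.new(...)`, each including class gets back an instance of its own type, which matches the `@return [self.class]` annotation in the new documentation.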
42 changes: 42 additions & 0 deletions ruby/red-arrow/test/test-record-batch.rb
@@ -136,5 +136,47 @@ def setup
         end
       end
     end
+
+    sub_test_case("#[]") do
+      def setup
+        @record_batch = Arrow::RecordBatch.new(a: [true],
+                                               b: [true],
+                                               c: [true],
+                                               d: [true],
+                                               e: [true],
+                                               f: [true],
+                                               g: [true])
+      end
+
+      test("[String]") do
+        assert_equal(Arrow::Column.new(@record_batch, 0),
+                     @record_batch["a"])
+      end
+
+      test("[Symbol]") do
+        assert_equal(Arrow::Column.new(@record_batch, 1),
+                     @record_batch[:b])
+      end
+
+      test("[Integer]") do
+        assert_equal(Arrow::Column.new(@record_batch, 6),
+                     @record_batch[-1])
+      end
+
+      test("[Range]") do
+        assert_equal(Arrow::RecordBatch.new(d: [true],
+                                            e: [true]),
+                     @record_batch[3..4])
+      end
+
+      test("[[Symbol, String, Integer, Range]]") do
+        assert_equal(Arrow::RecordBatch.new(c: [true],
+                                            a: [true],
+                                            g: [true],
+                                            d: [true],
+                                            e: [true]),
+                     @record_batch[[:c, "a", -1, 3..4]])
+      end
+    end
   end
 end
31 changes: 28 additions & 3 deletions ruby/red-arrow/test/test-table.rb
@@ -190,20 +190,45 @@ def setup
   end

   sub_test_case("#[]") do
+    def setup
+      @table = Arrow::Table.new(a: [true],
+                                b: [true],
+                                c: [true],
+                                d: [true],
+                                e: [true],
+                                f: [true],
+                                g: [true])
+    end
+
     test("[String]") do
       assert_equal(Arrow::Column.new(@table, 0),
-                   @table["count"])
+                   @table["a"])
     end

     test("[Symbol]") do
       assert_equal(Arrow::Column.new(@table, 1),
-                   @table[:visible])
+                   @table[:b])
     end

     test("[Integer]") do
-      assert_equal(Arrow::Column.new(@table, 1),
+      assert_equal(Arrow::Column.new(@table, 6),
                    @table[-1])
     end
+
+    test("[Range]") do
+      assert_equal(Arrow::Table.new(d: [true],
+                                    e: [true]),
+                   @table[3..4])
+    end
+
+    test("[[Symbol, String, Integer, Range]]") do
+      assert_equal(Arrow::Table.new(c: [true],
+                                    a: [true],
+                                    g: [true],
+                                    d: [true],
+                                    e: [true]),
+                   @table[[:c, "a", -1, 3..4]])
+    end
   end

   sub_test_case("#merge") do