Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions cpp/src/arrow/util/nameof.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include <string>

namespace arrow {
namespace util {
namespace detail {

#ifdef _MSC_VER
#define ARROW_PRETTY_FUNCTION __FUNCSIG__
#else
#define ARROW_PRETTY_FUNCTION __PRETTY_FUNCTION__
#endif

template <typename T>
const char* raw() {
return ARROW_PRETTY_FUNCTION;
}

template <typename T>
size_t raw_sizeof() {
return sizeof(ARROW_PRETTY_FUNCTION);
}

#undef ARROW_PRETTY_FUNCTION

constexpr bool starts_with(char const* haystack, char const* needle) {
return needle[0] == '\0' ||
(haystack[0] == needle[0] && starts_with(haystack + 1, needle + 1));
}

constexpr size_t search(char const* haystack, char const* needle) {
return haystack[0] == '\0' || starts_with(haystack, needle)
? 0
: search(haystack + 1, needle) + 1;
}

const size_t typename_prefix = search(raw<void>(), "void");

template <typename T>
size_t struct_class_prefix() {
#ifdef _MSC_VER
return starts_with(raw<T>() + typename_prefix, "struct ")
? 7
: starts_with(raw<T>() + typename_prefix, "class ") ? 6 : 0;
#else
return 0;
#endif
}

template <typename T>
size_t typename_length() {
// raw_sizeof<T>() - raw_sizeof<void>() == (length of T's name) - strlen("void")
// (length of T's name) == raw_sizeof<T>() - raw_sizeof<void>() + strlen("void")
return raw_sizeof<T>() - raw_sizeof<void>() + 4;
}

template <typename T>
const char* typename_begin() {
return raw<T>() + struct_class_prefix<T>() + typename_prefix;
}

} // namespace detail

template <typename T>
std::string nameof() {
return {detail::typename_begin<T>(), detail::typename_length<T>()};
}

} // namespace util
} // namespace arrow
15 changes: 13 additions & 2 deletions r/src/arrow_cpp11.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@
#include <vector>
#undef Free

#include "arrow/util/nameof.h"

namespace cpp11 {

template <typename T>
Expand Down Expand Up @@ -157,8 +159,17 @@ struct ns {

template <typename Pointer>
Pointer r6_to_pointer(SEXP self) {
return reinterpret_cast<Pointer>(
R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp)));
if (!Rf_inherits(self, "ArrowObject")) {
std::string type_name =
arrow::util::nameof<typename std::remove_pointer<Pointer>::type>();
cpp11::stop("Invalid R object for %s, must be an ArrowObject", type_name.c_str());
}
void* p = R_ExternalPtrAddr(Rf_findVarInFrame(self, arrow::r::symbols::xp));
if (p == nullptr) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're checking for nullptr here, can we remove the shared_ptr_is_null check inside the shared_ptr() constructor? Then we can just $new() to create the R6 objects.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really do that because of the implicit else NULL case:

shared_ptr <- function(class, xp) {
  if (!shared_ptr_is_null(xp)) class$new(xp)
}

There are cases where we want an R NULL when the internal C++ shared pointer holds a C++ null pointer, e.g. when we call:

std::shared_ptr<Array> RecordBatch::GetColumnByName(const std::string& name) const {
  auto i = schema_->GetFieldIndex(name);
  return i == -1 ? NULLPTR : column(i);
}

For example in this tests:

test_that("[[ and $ on RecordBatch", {
  [...]
  expect_null(batch$qwerty)
  [...]
})

having the extra layer with the R function shared_ptr(T, .) maybe calling T$new(.) gives the NULL.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to have the ability to directly create the R6 objects internally, either by having more code in

template <typename T>
SEXP as_sexp(const std::shared_ptr<T>& ptr) {
  return cpp11::external_pointer<std::shared_ptr<T>>(new std::shared_ptr<T>(ptr));
}

but we would need some sort of dispatch from T to the R6 object.

Or we would need to change the interface of many functions:

// [[arrow::export]]
std::shared_ptr<arrow::Array> StructArray__field(
    const std::shared_ptr<arrow::StructArray>& array, int i) {
  return array->field(i);
}

perhaps to:

// [[arrow::export]]
R6 StructArray__field(
    const std::shared_ptr<arrow::StructArray>& array, int i) {
  return R6(array->field(i), "Array");
}

but I don't think it's worth it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I missed that that check translated some responses into NULL--though I don't think that's something we do widely, and maybe it would be better to make that more explicit in order to simplify the rest of the cases.

Regarding directly returning R6 objects from the cpp code, my thought had been more like the first idea of modifying as_sexp to map T to R6 classes. Our convention is that R6 class names are generally the bare C++ class name (assuming namespaces), i.e. arrow::dataset::Dataset is Dataset R6 class, with some exceptions. So we could record a map of those exceptions (e.g. "arrow::csv::TableReader" maps to "CsvTableReader") in cpp. If I were writing this in R, it could look something like this:

get_r6_constructor <- function(cpp_class) {
  if (cpp_class %in% names(special_cases)) {
    r6_name <- special_cases[[cpp_class]]
  } else {
    r6_name <- sub("^.*::(.*)$", "\\1", cpp_class)
  }
  get(r6_name, asNamespace("arrow"))$new
}

cc @bkietz

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to specify a complete mapping T to R6 class names, rather than only listing special cases. This would be a pretty straightforward trait structure in c++

SEXP klass = Rf_getAttrib(self, R_ClassSymbol);
cpp11::stop("Invalid <%s>, external pointer to null", CHAR(STRING_ELT(klass, 0)));
}
return reinterpret_cast<Pointer>(p);
}

// T is either std::shared_ptr<U> or std::unique_ptr<U>
Expand Down
13 changes: 13 additions & 0 deletions r/tests/testthat/test-arrow.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,16 @@ r_only({
)
})
})

test_that("arrow gracefully fails to load objects from other sessions (ARROW-10071)", {
a <- Array$create(1:10)
tf <- tempfile(); on.exit(unlink(tf))
saveRDS(a, tf)

b <- readRDS(tf)
Comment thread
nealrichardson marked this conversation as resolved.
expect_error(b$length(), "Invalid <Array>")
})

test_that("check for an ArrowObject in functions use std::shared_ptr", {
expect_error(Array__length(1), "Invalid R object")
})