Skip to content

reducer assignment issues #76

@DavisVaughan

Description

@DavisVaughan

So, I've been debugging an issue all morning, and finally think I have a grasp of what's happening. It's not a fun one, so be warned.

The short version is:
I think when you assign the result of xt::reduce() back to an xt::rarray, something goes wrong, but if you assign it back to xt::xarray then coerce that to xt::rarray everything is fine. This only seems to show up when reducing over large-ish (>2500 elements) matrices.

The long version
I'm trying (and successfully!) to expose the "custom reducer" idea to the R user, so they can do something like rray_reducer(x, function(.x, .y) {.x + .y}) to get a reduced sum over a specified axis. It would be great because you can then use any R function as the reducer function (probably with some performance loss because you have to call back to R repeatedly from cpp, but im okay with that for now).

It works for small matrices (or maybe im getting lucky) but when I up the size to ~2500 elements or more, things fall apart.

The only way I can think to give you a reproducible example is to make a standalone cpp file that you can use Rcpp::sourceFile() on...so...I'm sorry about that.

This version goes to xarray first, then to rarray. It seems to work fine. In the rray_custom_reducer_impl function, with the comment // THIS IS THE PROBLEM LINE! is the section that is causing the issue. It works fine in this example because of the conversion first to xarray.

Details
// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// How does this work?
// 1) rray_custom_reducer_cpp() takes x, a function with 2 args, the axes, and the return type
// 2) Based on the types of x and the return type, it calls something along the lines of:
//    rray_custom_reducer_impl<TYPEOFX, RETURNTYPE>(x, f, axes);
// 3) rray_custom_reducer_impl
//    - coerces x to an rarray
//    - creates a binary_functor using f and all of the type info
//    - then makes an xreducer_functor out of that and calls reduce using it
// 4) binary_functor this is where the magic happens
//    it lets you use scoping so you can initialize it with "things" you'd like
//    it to have available when the function is called (by the overloaded () operator)
//    - So we give it the function and the SEXPTYPE return type in the constructor
//    - And then we template on the Cpp element and return type
//    - Each call to the overloaded () operator calls f and coerces the result
//      to RET_T if it can be, otherwise error


// Functor allows a kind of scoping so r_functor(left, right)
// internally knows about the R function f
template <typename ELEM_T, typename RET_T>
class binary_functor {

private:
  Rcpp::Function f;
  SEXPTYPE RTYPE;
  R_xlen_t good_length = 1;

public:
  binary_functor(Rcpp::Function f_, SEXPTYPE RTYPE_) : f(f_), RTYPE(RTYPE_) {  }

  // Also pulling the first element of the result, even if it had more than
  // one thing. Should do checking instead.
  RET_T operator () (const ELEM_T& x, const ELEM_T& y) const {

    //std::cout << "x: " << x << std::endl;
    //std::cout << "y: " << y << std::endl;

    SEXP f_res = f(x, y);

    //Rcpp::print(f_res);

    // Type check
    if (TYPEOF(f_res) != RTYPE) {
      const char* good_type = Rf_type2char(RTYPE);
      const char* bad_type = Rf_type2char(TYPEOF(f_res));
      Rf_errorcall(
        R_NilValue,
        "`.f` should return length 1 objects of class `%s`, not class `%s`.",
        good_type,
        bad_type
      );
    }

    // Length check
    if (Rf_xlength(f_res) != good_length) {
      Rf_errorcall(
        R_NilValue,
        "`.f` should return objects of length 1, not %i.",
        Rf_xlength(f_res)
      );
    }

    // Clever conversion to replace explicit use of
    // REAL() or INTEGER()
    RET_T res = ((RET_T *)DATAPTR(f_res))[0];

    //std::cout << "res: " << res << std::endl;

    return(res);
  }
};


template <typename ELEM_R_T, typename RET_R_T>
SEXP rray_custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {

  // Underlying Cpp type. This helper is necessary because it converts
  // rlogical->int
  using elem_t = xt::r_detail::get_underlying_value_type_r<ELEM_R_T>;
  using ret_t  = xt::r_detail::get_underlying_value_type_r<RET_R_T>;

  // Pull the SEXPTYPE of the return value from the cpp type
  // The helper is necessary because rlogical->LGLSXP has been specially registered
  static constexpr int ret_sexptype = Rcpp::traits::r_sexptype_traits<RET_R_T>::rtype;

  // Create the rarray with the type matching x
  const xt::rarray<ELEM_R_T>& x_rray = xt::rarray<ELEM_R_T>(x);

  // Create the functor
  auto r_functor = binary_functor<typename elem_t::type, typename ret_t::type>(f, ret_sexptype);
  auto xr_functor = xt::make_xreducer_functor(r_functor);

  // THIS IS THE PROBLEM LINE!
  xt::xarray<typename ret_t::type> res = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<RET_R_T> res2 = res;

  std::cout << "xarray: " << res << std::endl;
  std::cout << "rarray: " << res2 << std::endl;

  return(res2);

}

// Helper for switching on the string op
constexpr unsigned int str2int(const char* str, int h = 0) {
  return !str[h] ? 5381 : (str2int(str, h+1) * 33) ^ str[h];
}

// [[Rcpp::export]]
SEXP rray_custom_reducer_cpp(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes, SEXP type_) {

  // Collect the char from the string type ("double", "integer", "logical")
  const char* type = CHAR(Rf_asChar(type_));

  // Switch on X
  switch(TYPEOF(x)) {

    // This means ELEM_T = double
    case REALSXP: {

      // Switch on return type:
      // "double" = REALSXP
      // "integer" = INTSXP
      // "logical" = LGLSXP
      switch(str2int(type)) {

        case str2int("double"): {
          return rray_custom_reducer_impl<double, double>(x, f, axes);
        }

        case str2int("integer"): {
          return rray_custom_reducer_impl<double, int>(x, f, axes);
        }

        case str2int("logical"): {
          return rray_custom_reducer_impl<double, rlogical>(x, f, axes);
        }

        default: {
          Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
        }

      }

    }

    // This means ELEM_T = int
  case INTSXP: {
    switch(str2int(type)) {

      case str2int("double"): {
        return rray_custom_reducer_impl<int, double>(x, f, axes);
      }

      case str2int("integer"): {
        return rray_custom_reducer_impl<int, int>(x, f, axes);
      }

      case str2int("logical"): {
        return rray_custom_reducer_impl<int, rlogical>(x, f, axes);
      }

      default: {
        Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
      }

    }
  }

    // This means ELEM_T = int
  case LGLSXP: {
    switch(str2int(type)) {

      case str2int("double"): {
        return rray_custom_reducer_impl<rlogical, double>(x, f, axes);
      }

      case str2int("integer"): {
        return rray_custom_reducer_impl<rlogical, int>(x, f, axes);
      }

      case str2int("logical"): {
        return rray_custom_reducer_impl<rlogical, rlogical>(x, f, axes);
      }

      default: {
        Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
      }

    }
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
}

Correct results:

Rcpp::sourceCpp("~/Desktop/test.cpp")

x <- matrix(as.double(1:5000), ncol = 2)

fn <- function(.x, .y) {.x + .y}

rray_custom_reducer_cpp(x, fn, 0L, "double")
#> [1] 3126250 9376250

standard output and standard error

xarray: { 3126250.,  9376250.}
rarray: { 3126250.,  9376250.}

Created on 2018-12-09 by the reprex package (v0.2.1.9000)


This is the not working version, even though I think it should work. It's more...unstable...than actually not working as the rarray result has variable output. The only thing I changed is that problem line, and now I just return 1 because I don't trust the output. But a key thing to look at here is the std output, where I've printed resulting rarray objects from the two calls to the same function. I had to try and run it multiple times because sometimes it just crashes...

Details
// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// How does this work?
// 1) rray_custom_reducer_cpp() takes x, a function with 2 args, the axes, and the return type
// 2) Based on the types of x and the return type, it calls something along the lines of:
//    rray_custom_reducer_impl<TYPEOFX, RETURNTYPE>(x, f, axes);
// 3) rray_custom_reducer_impl
//    - coerces x to an rarray
//    - creates a binary_functor using f and all of the type info
//    - then makes an xreducer_functor out of that and calls reduce using it
// 4) binary_functor this is where the magic happens
//    it lets you use scoping so you can initialize it with "things" you'd like
//    it to have available when the function is called (by the overloaded () operator)
//    - So we give it the function and the SEXPTYPE return type in the constructor
//    - And then we template on the Cpp element and return type
//    - Each call to the overloaded () operator calls f and coerces the result
//      to RET_T if it can be, otherwise error


// Functor allows a kind of scoping so r_functor(left, right)
// internally knows about the R function f
template <typename ELEM_T, typename RET_T>
class binary_functor {

private:
  Rcpp::Function f;
  SEXPTYPE RTYPE;
  R_xlen_t good_length = 1;

public:
  binary_functor(Rcpp::Function f_, SEXPTYPE RTYPE_) : f(f_), RTYPE(RTYPE_) {  }

  // Also pulling the first element of the result, even if it had more than
  // one thing. Should do checking instead.
  RET_T operator () (const ELEM_T& x, const ELEM_T& y) const {

    //std::cout << "x: " << x << std::endl;
    //std::cout << "y: " << y << std::endl;

    SEXP f_res = f(x, y);

    //Rcpp::print(f_res);

    // Type check
    if (TYPEOF(f_res) != RTYPE) {
      const char* good_type = Rf_type2char(RTYPE);
      const char* bad_type = Rf_type2char(TYPEOF(f_res));
      Rf_errorcall(
        R_NilValue,
        "`.f` should return length 1 objects of class `%s`, not class `%s`.",
        good_type,
        bad_type
      );
    }

    // Length check
    if (Rf_xlength(f_res) != good_length) {
      Rf_errorcall(
        R_NilValue,
        "`.f` should return objects of length 1, not %i.",
        Rf_xlength(f_res)
      );
    }

    // Clever conversion to replace explicit use of
    // REAL() or INTEGER()
    RET_T res = ((RET_T *)DATAPTR(f_res))[0];

    //std::cout << "res: " << res << std::endl;

    return(res);
  }
};


template <typename ELEM_R_T, typename RET_R_T>
SEXP rray_custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {

  // Underlying Cpp type. This helper is necessary because it converts
  // rlogical->int
  using elem_t = xt::r_detail::get_underlying_value_type_r<ELEM_R_T>;
  using ret_t  = xt::r_detail::get_underlying_value_type_r<RET_R_T>;

  // Pull the SEXPTYPE of the return value from the cpp type
  // The helper is necessary because rlogical->LGLSXP has been specially registered
  static constexpr int ret_sexptype = Rcpp::traits::r_sexptype_traits<RET_R_T>::rtype;

  // Create the rarray with the type matching x
  const xt::rarray<ELEM_R_T>& x_rray = xt::rarray<ELEM_R_T>(x);

  // Create the functor
  auto r_functor = binary_functor<typename elem_t::type, typename ret_t::type>(f, ret_sexptype);
  auto xr_functor = xt::make_xreducer_functor(r_functor);

  // THIS IS THE PROBLEM LINE!
  xt::rarray<RET_R_T> res = xt::reduce(xr_functor, x_rray, axes);

  std::cout << "rarray: " << res << std::endl;

  return(Rf_ScalarInteger(1));

}

// Helper for switching on the string op
constexpr unsigned int str2int(const char* str, int h = 0) {
  return !str[h] ? 5381 : (str2int(str, h+1) * 33) ^ str[h];
}

// [[Rcpp::export]]
SEXP rray_custom_reducer_cpp2(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes, SEXP type_) {

  // Collect the char from the string type ("double", "integer", "logical")
  const char* type = CHAR(Rf_asChar(type_));

  // Switch on X
  switch(TYPEOF(x)) {

  // This means ELEM_T = double
  case REALSXP: {

    // Switch on return type:
    // "double" = REALSXP
    // "integer" = INTSXP
    // "logical" = LGLSXP
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<double, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<double, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<double, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }

  }

    // This means ELEM_T = int
  case INTSXP: {
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<int, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<int, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<int, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
  }

    // This means ELEM_T = int
  case LGLSXP: {
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<rlogical, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<rlogical, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<rlogical, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
}

Sometimes this will crash on you, sometimes it will just give odd results...

Rcpp::sourceCpp("~/Desktop/test2.cpp")

x <- matrix(as.double(1:5000), ncol = 2)

fn <- function(.x, .y) {.x + .y}

rray_custom_reducer_cpp2(x, fn, 0L, "double")
#> [1] 1

rray_custom_reducer_cpp2(x, fn, 0L, "double")
#> [1] 1

standard output and standard error

rarray: { 3126250.}
rarray: { 6.938981e-310}

Created on 2018-12-09 by the reprex package (v0.2.1.9000)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions