ARROW-10399: [R] Fix performance regression from cpp11::r_string #8536
nealrichardson wants to merge 2 commits into
bkietz left a comment
cpp11::r_string(const std::string&) does not take advantage of knowing the length of the string (so the terminating null must be searched for), adds the resulting CHARSXP to the preserved list, and guards against stack unwinding originating in Rf_mkCharCE.
Rf_mkCharLenCE should probably be used in r_string constructors when appropriate.
Adding each CHARSXP to the preserved list is unnecessary since we're immediately passing ownership to the constructed vector, but we probably should guard against stack unwinding. We can do so outside the loop with much less overhead.
The only other usage of
Side note: it's not always safe to assume nuls are terminating, and R and Rcpp don't assume that. cf. #8365
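To make the nul point concrete, here is a minimal standalone sketch in plain C++ (no R or cpp11 headers). The helper names `length_via_nul_scan`, `length_via_known_size`, and `embedded_nul_example` are hypothetical and exist only for illustration. It shows why a pointer-only API has to scan for a nul and can silently truncate, while a pointer-plus-length API sees every byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by an API that only receives a `const char*`:
// it has to scan for the first nul byte.
std::size_t length_via_nul_scan(const std::string& s) {
  return std::strlen(s.c_str());
}

// Length as seen by an API that receives the pointer and the size together.
std::size_t length_via_known_size(const std::string& s) {
  return s.size();
}

// Hypothetical usage check: a 7-byte string with an embedded nul.
void embedded_nul_example() {
  const std::string s("abc\0def", 7);
  assert(length_via_known_size(s) == 7);  // all 7 bytes are visible
  assert(length_via_nul_scan(s) == 3);    // the scan stops at the embedded nul
}
```

Passing the known length avoids the extra scan and leaves the decision about embedded nuls to the callee instead of truncating before it ever sees the data.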
Confirmed that Ben's suggestions don't hurt the performance that was recovered.
Indeed, with this change, the test on that PR now errors again rather than silently truncating strings.
@bkietz and I identified the performance regression I observed today, and by more-or-less reverting 55defbf#diff-090c5cff4eadd62a121e85babd186c9838055d2757670971204817eb2d96211aR297-R313 (Ben suggested renaming the function and calling GetView instead of GetString), the performance regression went away.
I haven't investigated why the cpp11::r_string way is so significantly slower, but it's worth exploring. We should also make sure we aren't inefficiently calling that elsewhere.
cc @kszucs