- All relevant stringr functions now preserve names (@jonovik, #575).
- `str_like(ignore_case)` is deprecated, with `str_like()` now always case sensitive to better follow the conventions of the SQL LIKE operator (@edward-burn, #543).
- In `str_replace_all()`, a `replacement` function now receives all values in a single vector. This radically improves performance at the cost of breaking some existing uses (#462).
- New `vignette("locale-sensitive")` about locale sensitive functions (@kylieainslie, #404).
- New `str_ilike()` that follows the conventions of the SQL ILIKE operator (@edward-burn, #543).
- New `str_to_camel()`, `str_to_snake()`, and `str_to_kebab()` for changing "programming" case (@librill, #573 + @arnaudgallou, #593).
- `str_*` now errors if `pattern` includes any `NA`s (@nash-delcamp-slp, #546).
- `str_dup()` gains a `sep` argument so you can add a separator between every repeated value (@edward-burn, #564).
- `str_sub<-` now gives a more informative error if `value` is not the correct length.
- `str_view()` displays a message when called with a zero-length character vector (@LouisMPenrod, #497).
- New `[[.stringr_pattern` method to match existing `[.stringr_pattern` (@edward-burn, #569).
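Because a `replacement` function passed to `str_replace_all()` now receives every match in one vector, it needs to be vectorised. The following is a minimal sketch, assuming stringr is installed; `shout` is a hypothetical helper name used only for illustration.

```r
library(stringr)

# `shout` is a hypothetical helper: it accepts a character vector of matches
# and returns a vector of the same length, so it behaves the same whether it
# is called once with all matches or once per match.
shout <- function(matches) toupper(matches)

x <- c("one fish", "two fish")
result <- str_replace_all(x, "[aeiou]", shout)
print(result)
```

A replacement function that assumes a single string (for example, one that branches with `if` on its input) may need to be rewritten with vectorised tools such as `ifelse()`.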
`R CMD check` fixes
- Some minor documentation improvements.
- `str_trunc()` now correctly truncates strings when `side` is `"left"` or `"center"` (@UchidaMizuki, #512).
- stringr functions now consistently implement the tidyverse recycling rules (#372). There are two main changes:
  - Only vectors of length 1 are recycled. Previously, (e.g.) `str_detect(letters, c("x", "y"))` worked, but it now errors.
  - `str_c()` ignores `NULL`s, rather than treating them as length 0 vectors.

  Additionally, many more arguments now throw errors, rather than warnings, if supplied the wrong type of input.
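The two recycling changes can be illustrated with a short sketch, assuming stringr 1.5.0 or later is installed. The variable names are illustrative only.

```r
library(stringr)

# A length-1 pattern is still recycled against a longer string vector.
recycled <- str_detect(c("apple", "banana"), "an")

# Two patterns against 26 strings is no longer recycled; capture the error
# instead of letting it stop the script.
outcome <- tryCatch(
  str_detect(letters, c("x", "y")),
  error = function(e) "error"
)

# NULL is ignored by str_c(), which is convenient for optional pieces.
suffix <- NULL
joined <- str_c("file", suffix, ".txt")

print(recycled)
print(outcome)
print(joined)
```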
- `regex()` and friends now generate class names with a `stringr_` prefix (#384).
- `str_detect()`, `str_starts()`, `str_ends()` and `str_subset()` now error when used with either an empty string (`""`) or a `boundary()`. These operations didn't really make sense (`str_detect(x, "")` returned `TRUE` for all non-empty strings) and made it easy to make mistakes when programming.
- Many tweaks to the documentation to make it more useful and consistent.
- New `vignette("from-base")` by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It's designed to help you move to stringr if you're already familiar with base R string functions (#266).
- New `str_escape()` escapes regular expression metacharacters, providing an alternative to `fixed()` if you want to compose a pattern from user supplied strings (#408).
- New `str_equal()` compares two character vectors using unicode rules, optionally ignoring case (#381).
- `str_extract()` can now optionally extract a capturing group instead of the complete match (#420).
- New `str_flatten_comma()` is a special case of `str_flatten()` designed for comma separated flattening and can correctly apply the Oxford comma when there are only two elements (#444).
- New `str_split_1()` is tailored for the special case of splitting up a single string (#409).
- New `str_split_i()` extracts a single piece from a string (#278, @bfgray3).
- New `str_like()` allows the use of SQL wildcards (#280, @rjpat).
- New `str_rank()` to complete the set of order/rank/sort functions (#353).
- New `str_sub_all()` to extract multiple substrings from each string.
- New `str_unique()` is a wrapper around `stri_unique()` and returns unique string values in a character vector (#249, @seasmith).
- `str_view()` uses ANSI colouring rather than an HTML widget (#370). This works in more places and requires fewer dependencies. It includes a number of other small improvements:
  - It no longer requires a pattern so you can use it to display strings with special characters.
  - It highlights unusual whitespace characters.
  - It's vectorised over both `string` and `pattern` (#407).
  - It defaults to displaying all matches, making `str_view_all()` redundant (and hence deprecated) (#455).
- New `str_width()` returns the display width of a string (#380).
- stringr is now licensed as MIT (#351).
- Better error message if you supply a non-string pattern (#378).
- A new data source for `sentences` has fixed many small errors.
- `str_extract()` and `str_extract_all()` now work correctly when `pattern` is a `boundary()`.
- `str_flatten()` gains a `last` argument that optionally overrides the final separator (#377). It gains a `na.rm` argument to remove missing values (since it's a summary function) (#439).
- `str_pad()` gains a `use_width` argument to control whether to use the total code point width or the number of code points as the "width" of a string (#190).
- `str_replace()` and `str_replace_all()` can use standard tidyverse formula shorthand for the `replacement` function (#331).
- `str_starts()` and `str_ends()` now correctly respect regex operator precedence (@carlganz).
- `str_wrap()` breaks only at whitespace by default; set `whitespace_only = FALSE` to return to the previous behaviour (#335, @rjpat).
- `word()` now returns the whole sentence when using a negative `start` whose absolute value is greater than or equal to the number of words (@pdelboca, #245).
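Several of the helpers listed above can be tried together. The following is a small sketch, assuming stringr 1.5.0 or later is installed; the input strings and variable names are made up for illustration.

```r
library(stringr)

# Escape user-supplied text before embedding it in a regular expression,
# so that "." matches a literal dot rather than any character.
user_input <- "a.b"
pattern <- str_c("^", str_escape(user_input), "$")
matched <- str_detect(c("a.b", "axb"), pattern)

# Extract only the second capturing group instead of the complete match.
value <- str_extract("key=value", "(\\w+)=(\\w+)", group = 2)

# Split a single string, or pull out one piece by position.
pieces <- str_split_1("a-b-c", "-")
second <- str_split_i("a-b-c", "-", 2)

# The `last` separator is adjusted when there are only two elements.
two <- str_flatten_comma(c("a", "b"), last = ", and ")
three <- str_flatten_comma(c("a", "b", "c"), last = ", and ")

print(list(matched, value, pieces, second, two, three))
```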
Hot patch release to resolve R CMD check failures.
- `str_interp()` now renders lists consistently independent of the presence of additional placeholders (@amhrasmussen).
- New `str_starts()` and `str_ends()` functions to detect patterns at the beginning or end of strings (@jonthegeek, #258).
- `str_subset()`, `str_detect()`, and `str_which()` gain a `negate` argument, which is useful when you want the elements that do NOT match (#259, @yutannihilation).
- New `str_to_sentence()` function to capitalize with sentence case (@jonthegeek, #202).
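As a quick illustration of these additions, here is a minimal sketch, assuming stringr is installed; the `fruits` vector is example data chosen for this snippet.

```r
library(stringr)

fruits <- c("apple", "banana", "pear")

# Detect a pattern only at the start or only at the end of each string.
starts_with_a <- str_starts(fruits, "a")
ends_with_r <- str_ends(fruits, "r")

# negate = TRUE keeps the elements that do NOT match.
without_an <- str_subset(fruits, "an", negate = TRUE)

# Sentence case: first letter upper case, the rest lower case.
sentence <- str_to_sentence("hello WORLD")

print(list(starts_with_a, ends_with_r, without_an, sentence))
```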
- `str_replace_all()` with a named vector now respects modifier functions (#207).
- `str_trunc()` is once again vectorised correctly (#203, @austin3dickey).
- `str_view()` handles `NA` values more gracefully (#217). I've also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232).
- During package build, you may see `Error : object ‘ignore.case’ is not exported by 'namespace:stringr'`. This is because the long deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed.
- `str_glue()` and `str_glue_data()` provide convenient wrappers around `glue()` and `glue_data()` from the glue package (#157).
- `str_flatten()` is a wrapper around `stri_flatten()` and clearly conveys flattening a character vector into a single string (#186).
- `str_remove()` and `str_remove_all()` functions. These wrap `str_replace()` and `str_replace_all()` to remove patterns from strings (@Shians, #178).
- `str_squish()` removes spaces from both the left and right side of strings, and also converts multiple spaces (or space-like characters) to a single space within strings (@stephlocke, #197).
- `str_sub()` gains an `omit_na` argument for ignoring `NA`. Accordingly, `str_replace()` now ignores `NA`s and keeps the original strings (@yutannihilation, #164).
- `str_trunc()` now preserves `NA`s (@ClaytonJY, #162).
- `str_trunc()` now throws an error when `width` is shorter than `ellipsis` (@ClaytonJY, #163).
- Long deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed.
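The new wrappers from this release are easiest to understand side by side. A minimal sketch, assuming stringr (and the glue package it wraps) is installed; the inputs are invented for illustration.

```r
library(stringr)

# Trim both ends and collapse internal runs of whitespace.
messy <- "  too   many    spaces  "
squished <- str_squish(messy)

# Remove the first match, or every match, of a pattern.
first_only <- str_remove("a1b22c333", "[0-9]+")
no_digits <- str_remove_all("a1b22c333", "[0-9]+")

# Flatten a character vector into a single string.
flat <- str_flatten(c("a", "b", "c"), collapse = "-")

# Interpolate a variable from the calling environment.
name <- "stringr"
greeting <- str_glue("Hello from {name}")

print(list(squished, first_only, no_digits, flat, greeting))
```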
- `str_match_all()` now returns `NA` if an optional group doesn't match (previously it returned `""`). This is more consistent with `str_match()` and other match failures (#134).
- In `str_replace()`, `replacement` can now be a function that is called once for each match and whose return value is used to replace the match.
- New `str_which()` mimics `grep()` (#129).
- A new vignette (`vignette("regular-expressions")`) describes the details of the regular expressions supported by stringr. The main vignette (`vignette("stringr")`) has been updated to give a high-level overview of the package.
- `str_order()` and `str_sort()` gain an explicit `numeric` argument for sorting mixed numbers and strings.
- `str_replace_all()` now throws an error if `replacement` is not a character vector. If `replacement` is `NA_character_` it replaces the complete string with `NA` (#124).
- All functions that take a locale (e.g. `str_to_lower()` and `str_sort()`) default to "en" (English) to ensure that the default is consistent across platforms.
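The behaviours described in this group of changes can be sketched as follows, assuming stringr is installed. The anonymous replacement function and the sample vectors are illustrative, not taken from the package documentation.

```r
library(stringr)

fruits <- c("apple", "banana", "pear")

# str_which() returns the positions of matching elements, like grep().
idx <- str_which(fruits, "an")

# A replacement function receives the matched text and returns its replacement.
doubled <- str_replace("a1b2", "[0-9]", function(m) as.character(as.integer(m) * 2))

# numeric = TRUE sorts embedded numbers by value rather than character by character.
sorted <- str_sort(c("x10", "x2", "x1"), numeric = TRUE)

# An NA replacement turns the whole string into NA.
whole_na <- str_replace_all("abc", "b", NA_character_)

print(list(idx, doubled, sorted, whole_na))
```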
- Add sample datasets: `fruit`, `words` and `sentences`.
- `fixed()`, `regex()`, and `coll()` now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for `perl()` is `regex()` not `regexp()` (#61). `boundary()` has improved defaults when splitting on non-word boundaries (#58, @lmullen).
- `str_detect()` now can detect boundaries (by checking for a `str_count()` > 0) (#120). `str_subset()` works similarly.
- `str_extract()` and `str_extract_all()` now work with `boundary()`. This is particularly useful if you want to extract logical constructs like words or sentences. `str_extract_all()` respects the `simplify` argument when used with `fixed()` matches.
- `str_subset()` now respects custom options for `fixed()` patterns (#79, @gagolews).
- `str_replace()` and `str_replace_all()` now behave correctly when a replacement string contains `$`s, `\\\\1`, etc. (#83, #99).
- `str_split()` gains a `simplify` argument to match `str_extract_all()` etc.
- `str_view()` and `str_view_all()` create HTML widgets that display regular expression matches (#96).
- `word()` returns `NA` for indexes greater than number of words (#112).
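To make the `boundary()` and `simplify` changes concrete, here is a minimal sketch, assuming stringr is installed; the sample text is invented for illustration.

```r
library(stringr)

text <- "The quick brown fox. It jumps."

# Extract words and count sentences using boundary() instead of a regex.
words_found <- str_extract_all(text, boundary("word"))[[1]]
n_sentences <- str_count(text, boundary("sentence"))

# simplify = TRUE returns a character matrix, padding short rows with "".
parts <- str_split(c("a-b", "c-d-e"), "-", simplify = TRUE)

# Asking for a word beyond the end of the string gives NA.
third <- word("two words", 3)

print(words_found)
print(n_sentences)
print(parts)
print(third)
```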
- stringr is now powered by stringi instead of base R regular expressions. This improves unicode support, and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.
- stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal.
- `str_c()` now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using `str_c("x", NA)` now yields `NA`. If you want `"xNA"`, use `str_replace_na()` on the inputs.
- `str_replace_all()` gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:

      input <- c("abc", "def")
      str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
- `str_match()` now returns `NA` if an optional group doesn't match (previously it returned `""`). This is more consistent with `str_extract()` and other match failures.
- New `str_subset()` keeps values that match a pattern. It's a convenient wrapper for `x[str_detect(x, pattern)]` (#21, @jiho).
- New `str_order()` and `str_sort()` allow you to sort and order strings in a specified locale.
- New `str_conv()` to convert strings from specified encoding to UTF-8.
- New modifier `boundary()` allows you to count, locate and split by character, word, line and sentence boundaries.
- The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.
- `ignore.case(x)` has been deprecated in favour of `fixed|regex|coll(x, ignore.case = TRUE)`; `perl(x)` has been deprecated in favour of `regex(x)`.
- `str_join()` is deprecated, please use `str_c()` instead.
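The missing-value and zero-length rules for `str_c()` described above, along with `str_subset()` and `boundary()`, can be checked with a short sketch, assuming stringr is installed; the variable names are illustrative.

```r
library(stringr)

# NA is contagious in str_c(); convert it to the string "NA" first if needed.
with_na <- str_c("x", NA)
as_text <- str_c("x", str_replace_na(NA))

# A zero-length input produces a zero-length result.
empty <- str_c("x", character(0))

# str_subset() keeps the values that match a pattern.
kept <- str_subset(c("apple", "banana", "pear"), "p")

# boundary("word") counts words without writing a regular expression.
n_words <- str_count("one two three", boundary("word"))

print(list(with_na, as_text, empty, kept, n_words))
```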
- fixed path in `str_wrap` example so it works for more R installations.
- remove dependency on plyr
- Zero input to `str_split_fixed` returns 0 row matrix with `n` columns
- Export `str_join`
- new modifier `perl` that switches to Perl regular expressions
- `str_match` now uses new base function `regmatches` to extract matches - this should hopefully be faster than my previous pure R algorithm
- new `str_wrap` function which gives `strwrap` output in a more convenient format
- new `word` function extracts words from a string given user defined separator (thanks to suggestion by David Cooper)
- `str_locate` now returns consistent type when matching empty string (thanks to Stavros Macrakis)
- new `str_count` counts number of matches in a string.
- `str_pad` and `str_trim` receive performance tweaks - for large vectors this should give at least a two order of magnitude speed up
- `str_length` returns `NA` for invalid multibyte strings
- fix small bug in internal `recyclable` function
- all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters
- fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors.
- new ignore.case() modifier tells stringr functions to ignore case of pattern.
- str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions.
- new str_sub<- function (analogous to substring<-) for substring replacement
- str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end.
- str_pad side argument can be left, right, or both (instead of center)
- str_trim gains side argument to better match str_pad
- stringr now has a namespace and imports plyr (rather than requiring it)
- fixed() now also escapes |
- str_join() renamed to str_c()
- all functions more carefully check input and return informative error messages if not as expected.
- add invert_match() function to convert a matrix of location of matches to locations of non-matches
- add fixed() function to allow matching of fixed strings.
- str_length now returns correct results when used with factors
- str_sub now correctly replaces Inf in end argument with length of string
- new function str_split_fixed returns fixed number of splits in a character matrix
- str_split no longer uses strsplit to preserve trailing breaks
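
Several of the early changes listed above (negative positions in `str_sub`, `str_sub<-`, the `side` argument, `fixed()` and `str_split_fixed`) still apply in current releases. A minimal sketch, assuming a current stringr is installed; the inputs are made up for illustration.

```r
library(stringr)

x <- "hello world"

# Negative positions count from the end of the string.
tail3 <- str_sub(x, -3, -1)

# str_sub<- replaces a substring in place.
str_sub(x, 1, 5) <- "HELLO"

# side controls where padding is added or whitespace is removed.
padded <- str_pad("a", 5, side = "both")
trimmed <- str_trim("  a  ", side = "left")

# fixed() matches the pattern literally rather than as a regular expression.
literal <- str_detect("a.b", fixed("."))

# str_split_fixed() returns a character matrix with a fixed number of columns.
fixed_cols <- str_split_fixed("a-b-c", "-", 2)

print(list(tail3, x, padded, trimmed, literal, fixed_cols))
```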