Skip to content

(core::str) Boyer-Moore string searching#1932

Closed
killerswan wants to merge 14 commits into
rust-lang:masterfrom
killerswan:misc2
Closed

(core::str) Boyer-Moore string searching#1932
killerswan wants to merge 14 commits into
rust-lang:masterfrom
killerswan:misc2

Conversation

@killerswan

Copy link
Copy Markdown
Contributor

Here, I've added a Boyer-Moore string search. It computes a pair of tables based on the "needle" being searched for, and then uses them to go faster through the "haystack"...

I've used it within str::find_str_between and str::iter_matches, and also added these functions:

  • str::findn_str_between
  • str::findn_str
  • vec::ends_with
  • str::chars_iteri

@marijnh

marijnh commented Mar 6, 2012

Copy link
Copy Markdown
Contributor

Did you benchmark on small strings? I would expect that for anything shorter than than a hundred characters, a naive search (which doesn't allocate anything) is much faster than Boyer-Moore. Might be worthwhile to conditionalize the finding functions to pick an algorithm based on the haystack size (after informed benchmarking, of course).

@killerswan

Copy link
Copy Markdown
Contributor Author

Yeah, that definitely needs doing. (I've benchmarked enough to know it will be worth it.)

@ghost ghost assigned brson Mar 8, 2012
@catamorphism

Copy link
Copy Markdown
Contributor

@brson -- hopefully you can do any remaining review/merging since you've merged other pull requests for core::str.

@brson

brson commented Mar 8, 2012

Copy link
Copy Markdown
Contributor

I'll go ahead and merge this and find a reasonable cutoff point to switch from naive search to boyer-moore.

@brson

brson commented Mar 8, 2012

Copy link
Copy Markdown
Contributor

This needs some tuning.

In my cursory testing the performance is significantly worse than the naive implementation. I think we need some measurements that show when it makes sense to switch to boyer-moore.

This test case doesn't terminate before I get bored and kill it: https://gist.github.com/2003993

@killerswan

Copy link
Copy Markdown
Contributor Author

Hmm, I expected large needles in small haystacks to be unimpressive, but that's worse than I expected! I bet a zero-copy slice would fix everything, but in the meantime I have several other ideas...

I've just been busy at work this week, though.

@killerswan

Copy link
Copy Markdown
Contributor Author

In progress. Would you all rather I close this and re-request later, or let this sit in the queue while we fiddle with this, or what?

@marijnh

marijnh commented Mar 9, 2012

Copy link
Copy Markdown
Contributor

Feel free to leave the pull req open -- unless you're planning to let this sit for a month or so.

@brson

brson commented Apr 1, 2012

Copy link
Copy Markdown
Contributor

@killerswan what's the status? it looks like you've made some improvements

@killerswan

Copy link
Copy Markdown
Contributor Author

@brson Yeah... but I've now done enough testing to know that the average "needle" and "haystack" in a make check for Rust, for example, are so small (8.2 and 18.2 bytes, with the very largest haystack at only 1604 bytes), that I think maybe no variation of this is worth including in the library right now...

I'm going to put this and my test code in another repository, and close the pull request. (Edit, for reference: https://github.com/killerswan/boyer-moore-search).

@killerswan killerswan closed this Apr 1, 2012
@brson

brson commented Apr 1, 2012

Copy link
Copy Markdown
Contributor

@killerswan That's a bummer, but it happens. I have many branches that will never go anywhere because they didn't pay off like I expected. Thanks for the effort though.

@huonw huonw mentioned this pull request May 11, 2014
celinval added a commit to celinval/rust-dev that referenced this pull request Jun 4, 2024
Kobzol pushed a commit to Kobzol/rust that referenced this pull request Nov 2, 2025
Use SIMD intrinsics for `vfmaddsubph` and `vfmsubaddph`
github-actions Bot pushed a commit to model-checking/verify-rust-std that referenced this pull request Nov 30, 2025
Use SIMD intrinsics for `vfmaddsubph` and `vfmsubaddph`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants