Performance regressions porting Jetscii from inline assembly to intrinsics #401

@shepmaster

I ported Jetscii to use stdsimd with the belief that it will be stabilized sooner 😜.

There's a stdsimd branch in case you are interested in following along at home.

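The initial port runs at roughly 60% of the original speed on the affected benchmarks (the asciichars variants and most of the substring ones); the remaining benchmarks are within about 7% either way: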

 name                                      inline-asm ns/iter     intrinsics ns/iter     diff ns/iter  diff %  speedup
 bench::space_asciichars                   1,023,795 (5121 MB/s)  1,643,905 (3189 MB/s)       620,110  60.57%   x 0.62
 bench::space_asciichars_as_pattern        1,044,517 (5019 MB/s)  1,716,374 (3054 MB/s)       671,857  64.32%   x 0.61
 bench::space_asciichars_macro             993,105 (5279 MB/s)    1,658,466 (3161 MB/s)       665,361  67.00%   x 0.60
 bench::space_find_byte                    3,610,758 (1452 MB/s)  3,526,808 (1486 MB/s)       -83,950  -2.32%   x 1.02
 bench::space_find_char                    633,608 (8274 MB/s)    636,607 (8235 MB/s)           2,999   0.47%   x 1.00
 bench::space_find_char_set                10,600,525 (494 MB/s)  10,561,106 (496 MB/s)       -39,419  -0.37%   x 1.00
 bench::space_find_closure                 10,156,759 (516 MB/s)  10,072,882 (520 MB/s)       -83,877  -0.83%   x 1.01
 bench::space_find_string                  7,506,830 (698 MB/s)   7,507,111 (698 MB/s)            281   0.00%   x 1.00
 bench::substring_as_pattern               1,082,652 (4842 MB/s)  1,496,699 (3502 MB/s)       414,047  38.24%   x 0.72
 bench::substring_find                     1,670,638 (3138 MB/s)  1,687,034 (3107 MB/s)        16,396   0.98%   x 0.99
 bench::substring_with_cached_searcher     997,570 (5255 MB/s)    1,520,424 (3448 MB/s)       522,854  52.41%   x 0.66
 bench::substring_with_created_searcher    1,007,291 (5204 MB/s)  1,533,745 (3418 MB/s)       526,454  52.26%   x 0.66
 bench::xml_delim_3_asciichars             1,014,110 (5169 MB/s)  1,637,181 (3202 MB/s)       623,071  61.44%   x 0.62
 bench::xml_delim_3_asciichars_as_pattern  984,594 (5324 MB/s)    1,628,740 (3218 MB/s)       644,146  65.42%   x 0.60
 bench::xml_delim_3_asciichars_macro       1,023,173 (5124 MB/s)  1,623,991 (3228 MB/s)       600,818  58.72%   x 0.63
 bench::xml_delim_3_find_byte_closure      2,237,287 (2343 MB/s)  2,211,426 (2370 MB/s)       -25,861  -1.16%   x 1.01
 bench::xml_delim_3_find_char_closure      14,359,362 (365 MB/s)  14,204,971 (369 MB/s)      -154,391  -1.08%   x 1.01
 bench::xml_delim_3_find_char_set          17,588,694 (298 MB/s)  17,769,736 (295 MB/s)       181,042   1.03%   x 0.99
 bench::xml_delim_5_asciichars             1,032,586 (5077 MB/s)  1,790,343 (2928 MB/s)       757,757  73.38%   x 0.58
 bench::xml_delim_5_asciichars_as_pattern  1,034,084 (5070 MB/s)  1,612,350 (3251 MB/s)       578,266  55.92%   x 0.64
 bench::xml_delim_5_asciichars_macro       986,644 (5313 MB/s)    1,666,725 (3145 MB/s)       680,081  68.93%   x 0.59
 bench::xml_delim_5_find_byte_closure      2,257,573 (2322 MB/s)  2,408,606 (2176 MB/s)       151,033   6.69%   x 0.94
 bench::xml_delim_5_find_char_closure      8,009,474 (654 MB/s)   7,453,402 (703 MB/s)       -556,072  -6.94%   x 1.07
 bench::xml_delim_5_find_char_set          23,184,513 (226 MB/s)  23,272,996 (225 MB/s)        88,483   0.38%   x 1.00

Takeaways

  • Make sure to use #[target_feature] (and/or -C target-feature)?
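
To make the takeaway concrete, here is a minimal sketch of what using #[target_feature] around SSE4.2 string intrinsics can look like. It is not Jetscii's actual code: the names find_any, find_any_sse42 and find_any_scalar are hypothetical, and copying each chunk into a temporary buffer is a simplification to keep the loads in bounds. The likely reason the attribute matters is that the std::arch intrinsics are themselves compiled with a target feature enabled, so they can only be inlined into callers that enable the same feature; without it, each intrinsic may end up as a real function call.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_cmpestri, _mm_loadu_si128, _SIDD_CMP_EQUAL_ANY};

/// Portable fallback: index of the first haystack byte contained in `needles`.
fn find_any_scalar(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| needles.contains(b))
}

/// SSE4.2 version (hypothetical helper). Enabling the feature on this function
/// lets the compiler inline the intrinsics used in its body.
///
/// Safety: the caller must ensure the CPU supports SSE4.2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn find_any_sse42(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    assert!(needles.len() <= 16, "PCMPESTRI compares at most 16 bytes");

    // Pad the needle set to a full 16-byte register; the explicit length
    // passed to the instruction makes it ignore the padding.
    let mut needle_buf = [0u8; 16];
    needle_buf[..needles.len()].copy_from_slice(needles);
    let needle_len = needles.len() as i32;

    for (chunk_index, chunk) in haystack.chunks(16).enumerate() {
        // Simplification: copy the chunk so the 16-byte load never reads
        // past the end of the haystack.
        let mut chunk_buf = [0u8; 16];
        chunk_buf[..chunk.len()].copy_from_slice(chunk);

        let index = unsafe {
            let needle_vec = _mm_loadu_si128(needle_buf.as_ptr() as *const __m128i);
            let chunk_vec = _mm_loadu_si128(chunk_buf.as_ptr() as *const __m128i);
            // Returns the index of the first chunk byte that equals any
            // needle byte, or 16 if there is no match.
            _mm_cmpestri::<_SIDD_CMP_EQUAL_ANY>(
                needle_vec,
                needle_len,
                chunk_vec,
                chunk.len() as i32,
            )
        };

        if index < 16 {
            return Some(chunk_index * 16 + index as usize);
        }
    }
    None
}

/// Safe entry point: checks for SSE4.2 at runtime and falls back otherwise.
pub fn find_any(haystack: &[u8], needles: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if needles.len() <= 16 && is_x86_feature_detected!("sse4.2") {
            // Sound because the feature was just detected.
            return unsafe { find_any_sse42(haystack, needles) };
        }
    }
    find_any_scalar(haystack, needles)
}
```

Calling find_any(b"hello world", b" ") returns Some(5) on either path. The alternative mentioned in the takeaway is to enable the feature for the whole crate at build time, for example with RUSTFLAGS="-C target-feature=+sse4.2" cargo bench, which removes the need for the runtime check but produces a binary that assumes SSE4.2 is present.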
