Skip to content

Lexer improvements#99884

Merged
bors merged 7 commits into
rust-lang:masterfrom
nnethercote:lexer-improvements
Aug 1, 2022
Merged

Lexer improvements#99884
bors merged 7 commits into
rust-lang:masterfrom
nnethercote:lexer-improvements

Conversation

@nnethercote

Copy link
Copy Markdown
Contributor

Some cleanups and small speed improvements.

r? @matklad

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jul 29, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 29, 2022
@rustbot

rustbot commented Jul 29, 2022

Copy link
Copy Markdown
Collaborator

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@nnethercote

Copy link
Copy Markdown
Contributor Author

Here are some instruction count results I got locally, for some rustc-perf benchmarks and some other benchmarks.

Primary benchmarks

Benchmark Profile Scenario % Change Significance Factor?
libc-0.2.124 check full -0.38% 1.91x
stm32f4-0.14.0 check full -0.34% 1.70x
unicode-normalization-0.1.19 check full -0.31% 1.57x

Secondary and external benchmarks

Benchmark Profile Scenario % Change Significance Factor?
web-sys-0.3.56 check full -2.91% 14.57x
externs check full -1.16% 5.80x
unicode_categories-0.1.1 check full -0.75% 3.75x
pest-2.1.3 check full -0.70% 3.50x
unused-warnings check full -0.56% 2.82x
deep-vector check full -0.54% 2.71x
coercions check full -0.54% 2.69x
tuple-stress check full -0.47% 2.35x
unicode-xid-0.2.3 check full -0.41% 2.05x
structopt-0.3.26-2 check full -0.38% 1.90x
ucd check full -0.38% 1.89x
log-0.4.16 check full -0.35% 1.76x

@bors try @rust-timer queue

@rust-timer

Copy link
Copy Markdown
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 29, 2022
@bors

bors commented Jul 29, 2022

Copy link
Copy Markdown
Collaborator

⌛ Trying commit 37ec8903bd0fffe545458aaa819bd1749ca8f4bb with merge 12221390288e64d968f906cf54b356a5e3d010fe...

@bors

bors commented Jul 29, 2022

Copy link
Copy Markdown
Collaborator

💥 Test timed out

Comment thread src/librustdoc/html/highlight.rs Outdated
@rust-log-analyzer

This comment has been minimized.

@matklad matklad left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically looks good to me, love the usize -> u32 change.

I am slightly less sure about boxing errors -- I understand that this shrinks the size_of::<Token>, but this makes the type !Copy, so call-sites would suffer from some code-bloat due to drop-glue. The ultimate decision here should probably be dictated by the benchmark, but I wish we could have find some more elegant solution here.

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

@nnethercote

Copy link
Copy Markdown
Contributor Author

Perhaps, rather than returning an error outright, we can just bool signal that there's an error, and provide a separate helper function to re-lex the offending token for error-reporting purposes?

Definitely possible, there is already some partial re-lexing when the tokens are "cooked" in rustc_parser, which is also when the errors are reported. It's possible to get size of Token down to one byte if you push hard enough, though that's probably excessive. I did a test run earlier with a size of 8 bytes by just disabling the string errors and it shaved off another 0.5% in the best case. I'll do something in this direction on Monday.

@nnethercote

Copy link
Copy Markdown
Contributor Author

Ok, I have done the re-lexing thing, getting the size of Token down to 12 bytes.

@bors try @rust-timer queue

@rust-timer

Copy link
Copy Markdown
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors

bors commented Jul 29, 2022

Copy link
Copy Markdown
Collaborator

⌛ Trying commit c9907b53dd331ee60d14aab76b22ee85fe4f39f9 with merge 6ea3a126c47fe2d95f163775720572ac11dc8846...

@bors

bors commented Jul 30, 2022

Copy link
Copy Markdown
Collaborator

☀️ Try build successful - checks-actions
Build commit: 6ea3a126c47fe2d95f163775720572ac11dc8846 (6ea3a126c47fe2d95f163775720572ac11dc8846)

@rust-timer

Copy link
Copy Markdown
Collaborator

Queued 6ea3a126c47fe2d95f163775720572ac11dc8846 with parent 3924dac, future comparison URL.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (6ea3a126c47fe2d95f163775720572ac11dc8846): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.3% -0.7% 48
Improvements 🎉
(secondary)
-0.5% -1.2% 49
All 😿🎉 (primary) -0.3% -0.7% 48

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.1% 2.1% 1
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
2.2% 2.2% 1
Regressions 😿
(secondary)
3.3% 4.5% 6
Improvements 🎉
(primary)
-3.8% -3.8% 1
Improvements 🎉
(secondary)
-2.6% -2.9% 2
All 😿🎉 (primary) -0.8% -3.8% 2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 30, 2022
Comment thread compiler/rustc_lexer/src/lib.rs Outdated
Avoid doing stuff until it's necessary.
It not needed, always being set to the end of the text.
Because it's tiny and hot.
Because it's small and hot.
From 72 bytes to 12 bytes (on x86-64).

There are two parts to this:
- Changing various source code offsets from 64-bit to 32-bit. This is
  not a problem because the rest of rustc also uses 32-bit source code
  offsets. This means `Token` is no longer `Copy` but this causes no
  problems.
- Removing the `RawStrError` from `LiteralKind`. Raw string literal
  invalidity is now indicated by a `None` value within
  `RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be
  used to re-lex an invalid raw string literal to get the `RawStrError`.

There is one very small change in behaviour. Previously, if a raw string
literal matched both the `InvalidStarter` and `TooManyHashes` cases,
the latter would override the former. This has now changed, because
`raw_double_quoted_string` now uses `?` and so returns immediately upon
detecting the `InvalidStarter` case. I think this is a slight
improvement to report the earlier-detected error, and it explains the
change in the `test_too_many_hashes` test.

The commit also removes a couple of comments that refer to rust-lang#77629 and
say that the size of these types don't affect performance. These
comments are wrong, though the performance effect is small.
@nnethercote

Copy link
Copy Markdown
Contributor Author

I have rebased to fix the conflicts.

@matklad matklad left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, with validate_raw_str this looks very much nicer, thanks!

@bors r+

// This assertion is in this crate, rather than in `rustc_lexer`, because that
// crate cannot depend on `rustc_data_structures`.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
rustc_data_structures::static_assert_size!(rustc_lexer::Token, 12);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to open-code this in the lexer crate, it's not too annoying these days:

#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
const _: () = if std::mem::size_of::<Token>() != 12 { panic!() } else { () };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const _: () = assert!(std::mem::size_of::<Token> == 12);

Can't use assert_eq!() unfortunately.

@matklad

matklad commented Aug 1, 2022

Copy link
Copy Markdown
Contributor

(I am also not sure if I have bors these days)

@lqd

lqd commented Aug 1, 2022

Copy link
Copy Markdown
Member

I don't think it works in GH review comments though

@bors r=matklad

@bors

bors commented Aug 1, 2022

Copy link
Copy Markdown
Collaborator

📌 Commit 99f5c79 has been approved by matklad

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 1, 2022
@bors

bors commented Aug 1, 2022

Copy link
Copy Markdown
Collaborator

⌛ Testing commit 99f5c79 with merge dcb444a...

@bors

bors commented Aug 1, 2022

Copy link
Copy Markdown
Collaborator

☀️ Test successful - checks-actions
Approved by: matklad
Pushing dcb444a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 1, 2022
@bors bors merged commit dcb444a into rust-lang:master Aug 1, 2022
@rustbot rustbot added this to the 1.64.0 milestone Aug 1, 2022
@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (dcb444a): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-0.3% -0.6% 42
Improvements 🎉
(secondary)
-0.5% -1.2% 48
All 😿🎉 (primary) -0.3% -0.6% 42

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
5.4% 5.4% 1
Regressions 😿
(secondary)
4.8% 8.1% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 5.4% 5.4% 1

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-2.8% -3.3% 3
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@nnethercote nnethercote deleted the lexer-improvements branch August 9, 2022 03:44
flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 11, 2022
…klad

Lexer improvements

Some cleanups and small speed improvements.

r? `@matklad`
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 28, 2022
… r=matklad

More lexer improvements

A follow-up to rust-lang#99884.

r? `@matklad`
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
… r=matklad

More lexer improvements

A follow-up to rust-lang#99884.

r? `@matklad`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.