Skip to content

Fix broken internal links.#3

Merged
ckennelly merged 1 commit into
google:masterfrom
mlimber:master
Feb 17, 2020
Merged

Fix broken internal links.#3
ckennelly merged 1 commit into
google:masterfrom
mlimber:master

Conversation

@mlimber

@mlimber mlimber commented Feb 12, 2020

Copy link
Copy Markdown
Contributor

No description provided.

Comment thread docs/quickstart.md
@manshreck manshreck self-assigned this Feb 12, 2020
Comment thread docs/quickstart.md

@manshreck manshreck left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR so quickly!

@ckennelly

Copy link
Copy Markdown
Collaborator

Thanks for the pull request, @mlimber!

Can you squash/rebase this PR?

@mlimber

mlimber commented Feb 14, 2020

Copy link
Copy Markdown
Contributor Author

@ckennelly: Done.

Note: I have not resolved @manshreck's comment above re: whether to .md or not to .md. If it's ok as-is, Bob's your uncle, tally ho, and whatnot! If not, please recommend your preferred fix.

@ckennelly ckennelly merged commit 1676100 into google:master Feb 17, 2020
copybara-service Bot pushed a commit that referenced this pull request Feb 20, 2020
We can potentially race initialization of a new core's cache with accesses to
that cache.
* One thread (t1) can access its core's (i) cache, realize it is not
  initialized, and migrate to another core (j != i) during the initialization
  process.
* Another thread (t2) can now try to access core i's cache and see
  begin = 0 and current > 0.  This resembles an initialized cache condition, so
  the fallthrough path (where the thread would block on LowLevelOnceInit as t1
  executes) is not taken.

When storing the headers, we must ensure that no restartable sequence ever sees
"begin < current".  This is done following the pattern in
TcmallocSlab<>::Drain.
1. We set begin = 0xFFFF.  We FenceCpu() to ensure all restartable sequences on
   that core are complete.
2. We set current to the desired offset.  We FenceCpu() again, to ensure any
   newly in-flight restartable sequences see begin > current at all times and
   can never see step #3's store to begin.
3. We set begin to the desired offset.  begin = current > 0 and the cache can
   operate normally.

This inserts two new TODOs for:
* Consolidating both the implementation of Init and InitCPU for choosing
  offsets, to avoid non-lazy Init needing to do multiple iterations of
  FenceCpu when no races are possible.
* Consolidating both the implementation of Drain and InitCPU, which are
  similar, but drain exposes a non-mutating interface to its callback function.

PiperOrigin-RevId: 296195886
Change-Id: Iaf3a580ac8b31198b3dfab3f67819e5faf24c7e2
copybara-service Bot pushed a commit that referenced this pull request Jun 10, 2024
…om Filter

The Guarded Page Allocator (aka. GWP-ASan), currently implements a relatively
complex policy to filter allocations based on previously covered stack traces.
The current implementation of StackTraceFilter is a fixed-size cache, where the
hash of a stack trace is used to look up the entry count since the last
eviction; evictions occur if different stack trace hashes map to the same cache
entry, but otherwise explicit removals are not possible.

The properties of this policy are difficult to reason about, and it is unclear
if some of these requirements are met:

1. Prevent the pool from filling up with the same frequent or long-lived
   allocations: Due to the unpredictability of evictions on stack trace entry
   collisions, there is no guarantee that the pool will not fill up with
   allocations of the same source. This problem is most likely to manifest on
   large long-running services.

2. Ensure that GWP-ASan keeps sampling allocations (depends on #1): Since the
   cache has no explicit removals, once the StackTraceFilter contains all
   allocation stack traces possible for a program, the current policy will
   simply stop sampling altogether (see the "max per stack" case of the old
   policy). The current policy crucially relies on evictions to take place to
   keep sampling. This problem is most likely to manifest on small programs,
   such as tests.

While the above are necessary properties to keep GWP-ASan working, the policy
was introduced primarily to achieve this requirement:

3. Achieve diverse set of covered allocations: The current solution overfits
   towards rare allocations, although it is unclear to what extent due to the
   unpredictability of when evictions may occur.

We argue that achieving requirement #3 is satisfied when both #1 and #2 are
satisfied: one of GWP-ASan's main design principles is that with increasing
total allocations sampled (across a fleet of machines), allocation coverage
increases sufficiently to detect previously undetected bugs.

In general, we know that coverage (esp. at the level available to us here) is a
bad metric [1] to predict if we covered enough: we are looking for an unknown
population of bugs, and recent coverage of an allocation of a given source does
not provide any strong indicators that subsequent sampling of an allocation of
the same source finds fewer bugs.

[1] "Coverage is not strongly correlated with test suite effectiveness",
    https://dl.acm.org/doi/10.1145/2568225.2568271

This change simplifies the allocation filtering logic to achieve #1 and #2 with
more predictable runtime properties:

A. StackTraceFilter is changed to implement a thread-safe Counting Bloom Filter.
   A major benefit is that it allows to (a) choose parameters to achieve
   desirable false positive probabilities, and (b) allow for removals (by
   decrementing). The parameters for GuardedPageAllocator are chosen to give us
   relatively low false positive probabilities (10-20%) even in cases with a very
   diverse set of allocations in the GPA pool. See code comments for details.

   On top of that, DecayingStackTraceFilter adds ring-buffer based decaying of
   the Bloom Filter, which allows to further filter recent allocations, but
   will allow GWP-ASan to keep allocating if it only sees the same allocations
   over and over (they always eventually decay).

   On deallocation, an entry is now removed from the filter, which will help
   satisfy requirement #2: after deallocation and decaying of all same-source
   allocations, subsequent allocations of the same source are allowed to happen
   again.

B. GuardedPageAllocator will filter an allocation that is currently or recently
   covered (i.e. currently in the pool or not yet decayed) with a probability
   that increases proportionally with pool utilization. Once the pool reaches
   more than 50% utilization, all currently or recently covered allocations
   will always be filtered on repeated allocation attempts. This will help
   satisfy requirements #1.

Overall, the new implementation is simpler and easier to reason about.

The implementation is based on what is implemented in Linux's KFENCE:
https://lore.kernel.org/all/20210923104803.2620285-4-elver@google.com/

PiperOrigin-RevId: 641968555
Change-Id: Ie73bf5b06ce65fb42667e28c69f09daca0ea395e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants