Fix broken internal links. by mlimber · Pull Request #3 · google/tcmalloc

mlimber · 2020-02-12T17:37:38Z

No description provided.

manshreck

Thanks for the PR so quickly!

ckennelly · 2020-02-14T14:33:00Z

Thanks for the pull request, @mlimber!

Can you squash/rebase this PR?

mlimber · 2020-02-14T22:25:29Z

@ckennelly: Done.

Note: I have not resolved @manshreck's comment above re: whether to .md or not to ~~.md~~. If it's ok as-is, Bob's your uncle, tally ho, and whatnot! If not, please recommend your preferred fix.

We can potentially race initialization of a new core's cache with accesses to that cache.

* One thread (t1) can access its core's (i) cache, realize it is not initialized, and migrate to another core (j != i) during the initialization process.
* Another thread (t2) can now try to access core i's cache and see begin = 0 and current > 0. This resembles an initialized cache condition, so the fallthrough path (where the thread would block on LowLevelOnceInit as t1 executes) is not taken.

When storing the headers, we must ensure that no restartable sequence ever sees "begin < current". This is done following the pattern in TcmallocSlab<>::Drain.

1. We set begin = 0xFFFF. We FenceCpu() to ensure all restartable sequences on that core are complete.
2. We set current to the desired offset. We FenceCpu() again, to ensure any newly in-flight restartable sequences see begin > current at all times and can never see step #3's store to begin.
3. We set begin to the desired offset. begin = current > 0 and the cache can operate normally.

This inserts two new TODOs for:

* Consolidating both the implementation of Init and InitCPU for choosing offsets, to avoid non-lazy Init needing to do multiple iterations of FenceCpu when no races are possible.
* Consolidating both the implementation of Drain and InitCPU, which are similar, but drain exposes a non-mutating interface to its callback function.

PiperOrigin-RevId: 296195886
Change-Id: Iaf3a580ac8b31198b3dfab3f67819e5faf24c7e2
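The store ordering described in this commit message can be illustrated with a small single-threaded sketch. It is not tcmalloc's code: `HeaderSketch`, `InitHeaderOrdered`, and `InitHeaderNaive` are hypothetical names, the fields are plain integers rather than the real slab header layout, and the points where `FenceCpu()` would run are only marked by comments. The sketch checks, after every store, the invariant that an observer never sees begin < current.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a per-CPU header; not tcmalloc's real layout.
struct HeaderSketch {
  uint16_t begin = 0;
  uint16_t current = 0;
};

// True if the state is safe for a concurrent reader, i.e. NOT begin < current.
inline bool Safe(const HeaderSketch& h) { return h.begin >= h.current; }

// Three-step ordering from the commit message. Returns true if every
// intermediate state kept the invariant.
inline bool InitHeaderOrdered(HeaderSketch& h, uint16_t offset) {
  bool ok = Safe(h);
  h.begin = 0xFFFF;    // Step 1: make the cache look unusable.
  ok = ok && Safe(h);  // (The real code would call FenceCpu() here.)
  h.current = offset;  // Step 2: begin is still larger than current.
  ok = ok && Safe(h);  // (The real code would call FenceCpu() again here.)
  h.begin = offset;    // Step 3: begin == current, cache is ready.
  ok = ok && Safe(h);
  return ok;
}

// Naive ordering for contrast: storing current first exposes
// begin = 0 and current > 0, the state described as the race.
inline bool InitHeaderNaive(HeaderSketch& h, uint16_t offset) {
  bool ok = Safe(h);
  h.current = offset;
  ok = ok && Safe(h);  // Violated here whenever offset > 0.
  h.begin = offset;
  ok = ok && Safe(h);
  return ok;
}
```

With an offset of 64, `InitHeaderOrdered` keeps the invariant at every step, while `InitHeaderNaive` passes through the begin = 0, current = 64 state that another thread could mistake for an initialized cache.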

…om Filter

The Guarded Page Allocator (a.k.a. GWP-ASan) currently implements a relatively complex policy to filter allocations based on previously covered stack traces. The current implementation of StackTraceFilter is a fixed-size cache, where the hash of a stack trace is used to look up the entry count since the last eviction; evictions occur if different stack trace hashes map to the same cache entry, but otherwise explicit removals are not possible.

The properties of this policy are difficult to reason about, and it is unclear whether some of these requirements are met:

1. Prevent the pool from filling up with the same frequent or long-lived allocations: Due to the unpredictability of evictions on stack trace entry collisions, there is no guarantee that the pool will not fill up with allocations of the same source. This problem is most likely to manifest on large long-running services.
2. Ensure that GWP-ASan keeps sampling allocations (depends on #1): Since the cache has no explicit removals, once the StackTraceFilter contains all allocation stack traces possible for a program, the current policy will simply stop sampling altogether (see the "max per stack" case of the old policy). The current policy crucially relies on evictions taking place to keep sampling. This problem is most likely to manifest on small programs, such as tests.

While the above are necessary properties to keep GWP-ASan working, the policy was introduced primarily to achieve this requirement:

3. Achieve a diverse set of covered allocations: The current solution overfits towards rare allocations, although it is unclear to what extent, due to the unpredictability of when evictions may occur.

We argue that requirement #3 is satisfied when both #1 and #2 are satisfied: one of GWP-ASan's main design principles is that with increasing total allocations sampled (across a fleet of machines), allocation coverage increases sufficiently to detect previously undetected bugs.

In general, we know that coverage (esp. at the level available to us here) is a bad metric [1] for predicting whether we have covered enough: we are looking for an unknown population of bugs, and recent coverage of an allocation of a given source does not provide any strong indication that subsequent sampling of an allocation of the same source finds fewer bugs.

[1] "Coverage is not strongly correlated with test suite effectiveness", https://dl.acm.org/doi/10.1145/2568225.2568271

This change simplifies the allocation filtering logic to achieve #1 and #2 with more predictable runtime properties:

A. StackTraceFilter is changed to implement a thread-safe Counting Bloom Filter. A major benefit is that it allows us to (a) choose parameters to achieve desirable false positive probabilities, and (b) support removals (by decrementing). The parameters for GuardedPageAllocator are chosen to give us relatively low false positive probabilities (10-20%) even in cases with a very diverse set of allocations in the GPA pool. See code comments for details. On top of that, DecayingStackTraceFilter adds ring-buffer based decaying of the Bloom Filter, which further filters recent allocations but still allows GWP-ASan to keep allocating if it only sees the same allocations over and over (they always eventually decay). On deallocation, an entry is now removed from the filter, which helps satisfy requirement #2: after deallocation and decaying of all same-source allocations, subsequent allocations of the same source are allowed to happen again.

B. GuardedPageAllocator will filter an allocation that is currently or recently covered (i.e. currently in the pool or not yet decayed) with a probability that increases proportionally with pool utilization. Once the pool reaches more than 50% utilization, all currently or recently covered allocations will always be filtered on repeated allocation attempts. This helps satisfy requirement #1.

Overall, the new implementation is simpler and easier to reason about. The implementation is based on what is implemented in Linux's KFENCE: https://lore.kernel.org/all/20210923104803.2620285-4-elver@google.com/

PiperOrigin-RevId: 641968555
Change-Id: Ie73bf5b06ce65fb42667e28c69f09daca0ea395e
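As a rough illustration of the two ideas in this commit message, the following is a minimal sketch of a counting Bloom filter with atomic counters, plus a helper for a utilization-based filter probability. It is not the StackTraceFilter or GuardedPageAllocator code: the class name, table size, number of hash probes, mixing function, and the linear ramp `min(1, 2 * utilization)` are all assumptions chosen for the example, and the ring-buffer decay is omitted.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counting Bloom filter; sizes and hashing are illustrative only.
class CountingBloomFilterSketch {
 public:
  CountingBloomFilterSketch() {
    for (auto& c : counters_) c.store(0, std::memory_order_relaxed);
  }

  // Records one occurrence of a stack-trace hash.
  void Add(uint64_t hash) {
    for (size_t k = 0; k < kProbes; ++k)
      counters_[Index(hash, k)].fetch_add(1, std::memory_order_relaxed);
  }

  // Removes one occurrence. The caller must only remove hashes it added.
  void Remove(uint64_t hash) {
    for (size_t k = 0; k < kProbes; ++k)
      counters_[Index(hash, k)].fetch_sub(1, std::memory_order_relaxed);
  }

  // May return false positives, never false negatives.
  bool Contains(uint64_t hash) const {
    for (size_t k = 0; k < kProbes; ++k)
      if (counters_[Index(hash, k)].load(std::memory_order_relaxed) == 0)
        return false;
    return true;
  }

 private:
  static constexpr size_t kSize = 256;  // assumed table size
  static constexpr size_t kProbes = 2;  // assumed number of probes

  // Derives the k-th counter index by mixing the hash with the probe number.
  static size_t Index(uint64_t hash, size_t k) {
    uint64_t x = hash + (k + 1) * 0x9E3779B97F4A7C15ull;
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDull;
    x ^= x >> 33;
    return static_cast<size_t>(x % kSize);
  }

  std::array<std::atomic<uint16_t>, kSize> counters_;
};

// Assumed linear ramp: probability of filtering a covered allocation grows
// with pool utilization and reaches 1.0 at 50% utilization.
inline double FilterProbabilitySketch(size_t used, size_t total) {
  if (total == 0) return 1.0;
  return std::min(1.0, 2.0 * static_cast<double>(used) / total);
}
```

Because removal is a decrement, an entry that is added on allocation and removed on deallocation leaves the filter exactly as it was, which is the property the commit message relies on to keep sampling going.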

manshreck reviewed Feb 12, 2020

Comment thread docs/quickstart.md

manshreck self-assigned this Feb 12, 2020

manshreck reviewed Feb 12, 2020

Comment thread docs/quickstart.md

manshreck reviewed Feb 12, 2020

Fix internal links

c0c8bcc

ckennelly merged commit 1676100 into google:master Feb 17, 2020

ckennelly mentioned this pull request Feb 17, 2020

Fix a broken link in quickstart.md #14

Closed

avieira-arm mentioned this pull request Jan 25, 2021

Migrate AArch64's RSEQ Push and Pop implementations to inline assembly #65

Merged

Richard-Feng mentioned this pull request Jun 3, 2021

unexpected crash when upgrade from 2.0 to 2.6.3 #84

Closed

YanaiEliyahu mentioned this pull request Mar 1, 2022

deadlock with dl_iterate_phdr #120

Closed

lhsoft mentioned this pull request Feb 24, 2025

Segmentation fault when sample small allocation #275

Closed

ckennelly merged 1 commit into google:master from mlimber:master
