Skip to content

[DSC] Add a Strings table to the shared cache triage view#8277

Open
bdash wants to merge 3 commits into
devfrom
test_dsc_strings
Open

[DSC] Add a Strings table to the shared cache triage view#8277
bdash wants to merge 3 commits into
devfrom
test_dsc_strings

Conversation

@bdash

@bdash bdash commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

The table displays strings from every image in the shared cache. Double-clicking on a string loads its corresponding image or region and navigates to it.

The entire shared cache is scanned asynchronously on the worker pool to discover strings when the Strings tab is first selected. Table filtering and sorting is performed on background threads due to the sheer amount of data this table contains (over 15 millions of rows). Data is discarded shortly after the strings table is hidden.

This change introduces a BackgroundSortFilterRows helper class that handles sorting and filtering of flat UI models on background threads, and a TriageTablePanel that encapsulates a lot of the logic that is common across DSC's different panes. These are both used by the new strings table, and I updated the DSC's symbols table to use them as well.

bdash added 3 commits June 17, 2026 16:15
The table displays strings from every image in the shared cache.
Double-clicking on a string loads its corresponding image or region and
navigates to it.

The entire shared cache is scanned asynchronously on the worker pool to
discover strings when the Strings tab is first selected. Table filtering
and sorting is performed on background threads due to the sheer amount
of data this table contains (over 15 millions of rows). Data is
discarded shortly after the strings table is hidden.
This gives it asynchronous loading, filtering, and sorting, along with
lazy loading and releasing the table data after the view is hidden.
@bdash bdash requested a review from fuzyll June 18, 2026 16:08

@fuzyll fuzyll left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of questions and a few potential bugs. Ironically, the two things I am most concerned about aren't even in these code changes and are just things I saw when scrolling around and reading. 🙃

Good work regardless, thanks for taking this on. 🙂

size_t comparisons = 0;
try
{
std::sort(display.begin() + chunk.first, display.begin() + chunk.second, guarded(comparisons));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to have non-deterministic ordering here? If I'm following this correctly, results will arrive in the order in which workers complete. Since there's no tie-break on address/length/something else, wouldn't this have ties ordered by worker completion?

(Even if I'm right, I don't know how often this would occur, whether this would be noticeable, etc. Just an observation.)

// leaves the shared result empty, which commitSortJob never reads.
auto future = QtConcurrent::run(
[this, filterGeneration, sortGeneration, comparator, result, display = m_display]() mutable {
if (auto sorted = parallelSort(std::move(display), m_rows, comparator, filterGeneration,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because parallelSort uses blockingMap, if we're thread-starved, I think this could wind up in a state where this run would occupy one thread and then not be able to start the inner thread for parallelSort? Does Qt handle this situation for us? I genuinely don't know how QtConcurrent works and https://doc.qt.io/qt-6/qtconcurrentmap.html wasn't super helpful when trying to figure this out.

const uint64_t filterGeneration = m_filterGeneration.load();
PredicateFactory factory = m_filterFactory;

constexpr size_t chunkSize = 1024 * 1024;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be defined somewhere more visible/configurable?

raw_length: int
text: str
region_start: int
image_start: int

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sharedcache.cpp, you state that "A zeroed imageStart means the region is not associated with an image". Should this detail be surfaced to Python as well in some capacity (even if just a doc comment)?

case Utf16String:
{
char* converted = BNUnicodeUTF16ToUTF8(data, std::min(ref.length, maxLength * 2));
std::string text(converted);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fail to convert here, we'd try to construct a std::string from a nullptr. This seems like it might only happen if we fail to allocate the string? Which, is probably not likely to happen, but we still might want to check.

This caused me to feel like a fraud because it turns out I have no idea what actually happens if you try and construct a string from nullptr. Fortunately, it appears C++ also had no idea what should happen, but C++23 appears to be fixing this.

if (!region.has_value())
return;

QPointer<QFutureWatcher<bool>> watcher = new QFutureWatcher<bool>(this);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we leak this watcher here?

}

dialog->exec();
promptToLoadImage(image->name, string.address, string.address);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this works just fine. But, I was surprised it wasn't a bug. I think this could be clearer/more correct if we passed *string.imageStart or image->headerAddress since we're loading the image.

{
case Qt::DisplayRole:
{
const auto& string = stringAt(index.row());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking .isValid() enough to prevent issues here? https://doc.qt.io/qt-6/qmodelindex.html#isValid suggests this just means it's non-negative, not that it actually fits within the correct upper bound. I don't know that this could ever crash, but it might be possible to return wrong results if the model is ever stale?

{
auto symbol = symbolAt(index.row());
auto symbolType = GetSymbolTypeAsString(symbol.type);
const auto& symbol = symbolAt(index.row());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here regarding .isValid.

QString ImageNameLookup::widestImageName() const
{
QString widest;
for (const auto& [_, name] : m_state->imageNames)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image column can display region names for non-image rows, but we only consider image names for sizing. In practice, this is probably fine..?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants