win: improve PATH search worst-case performance#5120
Conversation
vtjnash
left a comment
There was a problem hiding this comment.
I'm not too keen on reimplementing this lookup manually in libuv. I expect that FindNextFileW can often end up much slower since it always has to read the whole directory, instead of using the btree for guaranteed performance.
It should be possible to check this assumption.
Test codefindfirst.c #include <windows.h>
#include <stdio.h>
int main() {
HANDLE find;
WIN32_FIND_DATAW find_data;
int num_iterations = 0;
find = FindFirstFileExW(L"foo*", FindExInfoBasic, &find_data, FindExSearchNameMatch, NULL, 0);
if (find == INVALID_HANDLE_VALUE) {
goto end;
}
do {
if (find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) continue;
num_iterations++;
} while (FindNextFileW(find, &find_data));
FindClose(find);
end:
printf("%d\n", num_iterations);
return 0;
}getattr.c populate.zig (to generate the 1 million files) Results: The two approaches perform the same, so it appears that there are likely some optimizations for trailing wildcards in the However, this does introduce the possibility for a new worst-case whenever there are a ton of files that match the wildcard search.
Test codepopulate-foo.zig const std = @import("std");
pub fn main(init: std.process.Init) !void {
const io = init.io;
const random_bytes_count = 12;
const sub_path_len = comptime std.base64.url_safe.Encoder.calcSize(random_bytes_count);
var sub_path: [sub_path_len + 3]u8 = undefined;
@memcpy(sub_path[0..3], "foo");
for (0..100_000) |_| {
var random_bytes: [random_bytes_count]u8 = undefined;
io.random(&random_bytes);
_ = std.base64.url_safe.Encoder.encode(sub_path[3..], &random_bytes);
const file = try std.Io.Dir.cwd().createFile(io, sub_path[0..], .{});
defer file.close(io);
}
}Now there's an extreme effect: For context, this whole approach was inspired from using test.bat Running for every entry in Something I missed, though, is that it doesn't fall into this worst-case trap when there are a ton of files that match the wildcard--it only uses the wildcard to determine if any match exists, and then it switches to non-wildcard searching: The use of Modified test.bat to only check the CWD rather than all of PATH: Ultimately, I think it should be possible to get a best-of-both-worlds out of this. Here's what I'm thinking:
Will try that out and report back, unless you still prefer the "N |
|
People canonically assume it will be PATHEXT (https://gitlab.kitware.com/cmake/cmake/-/work_items/23992), which is around 11 items or more: https://superuser.com/questions/1027078/what-is-the-default-value-of-the-pathext-environment-variable-for-windows. I like that option, though I don't think we should necessarily try to use the result either, since it would only be sufficient if it matches the first entry in the list, but the first entry in PATHEXT is |
|
Ah, right, without full iteration we can't fully know if a higher priority entry exists (
EDIT: Getting |
Take advantage of FindFirstFileExW and wildcards to reduce the number of Win32 API calls used per-directory when searching the PATH. Instead of <number of allowed extensions> GetFileAttributesW calls per directory in the PATH, each entry is now first checked for viability using a FindFirstFileExW call, and then only if matches are found will it check each possibility using GetFileAttributesW. This does not affect average case performance at all, but it does make an impact when both (a) many extensions are allowed, and (b) the file is not found early in the PATH search. The impact also scales with the size of the PATH (the more entries, the larger the potential impact).
498d3d8 to
362caa9
Compare
|
Went with the simpler approach (just use FindFirstFileExW to check for viability, ignore the result otherwise). All the info (and benchmark results) in the OP is still accurate (just made some minor updates to the description). (btw in terms of order of operations, I think it'd make the most sense to merge #5096 first since this PR kind of assumes those changes are going to be made [support for finding EDIT: For completeness, here's some confirmation that this approach avoids the worst case: Test codehybrid.c #include <windows.h>
#include <stdio.h>
int main() {
HANDLE find;
WIN32_FIND_DATAW find_data;
DWORD attrs;
int num_iterations = 0;
find = FindFirstFileExW(L"foo*", FindExInfoBasic, &find_data, FindExSearchNameMatch, NULL, 0);
if (find == INVALID_HANDLE_VALUE) {
goto end;
}
attrs = GetFileAttributesW(L"foo");
attrs = GetFileAttributesW(L"foo.com");
attrs = GetFileAttributesW(L"foo.exe");
attrs = GetFileAttributesW(L"foo.bat");
attrs = GetFileAttributesW(L"foo.cmd");
end:
printf("6\n");
return 0;
} |
Take advantage of FindFirstFileExW and wildcards to reduce the number of Win32 API calls used per-directory when searching the PATH. Instead of GetFileAttributesW calls per directory in the PATH, each entry is now first checked for viability using a FindFirstFileExW call, and then only if matches are found will it check each possibility using GetFileAttributesW.
This does not affect average case performance at all, but it does make an impact when both (a) many extensions are allowed, and (b) the file is not found early in the PATH search. The impact also scales with the size of the PATH (the more entries, the larger the potential impact).
benchmark code
For my testing,
hello.exeis found in the 25th entry inPATH.No difference with default spawn flags:
With the changes in #5096 and
UV_PROCESS_WINDOWS_RESOLVE_BATCHset, though, there is a difference since the extra 2 possible extensions add2 * <number of path entries searched>extraGetFileAttributesWcalls:The motivation for this is twofold:
options->filehas trailing.and space character(s)..and space safely ziglang/zig#23363 for contextI'm fairly certain the implementation could be further improved by swapping out
FindFirstFileExW/etc for aCreateFileW+ the lower-levelNtQueryDirectoryFilesince it allows for more control over the buffers used, memory allocated, etc. (and this is what Libuv'sscandiralready does). I'll probably try that out, but I don't think it needs to be a blocker.EDIT: I tried this and it didn't affect performance at all