Improve Duplicate File Insight Coverage#519
Conversation
| for f in files: | ||
| for f in all_files: | ||
| # Skip synthetic "/Other" nodes (residual bytes from .car files) | ||
| if f.path.endswith("/Other"): |
There was a problem hiding this comment.
super noisy case i found after flattening the files. It will report every Assets.car and Assets.car/Other as duplicates of each other
There was a problem hiding this comment.
Can you confirm that if the entire Assets.car is duplicated, we only show the asset catalog as duplicated and not all of the image entries inside it?
There was a problem hiding this comment.
good catch, i've updated the code to handle this better
| for f in files: | ||
| for f in all_files: | ||
| # Skip synthetic "/Other" nodes (residual bytes from .car files) | ||
| if f.path.endswith("/Other"): |
There was a problem hiding this comment.
Can you confirm that if the entire Assets.car is duplicated, we only show the asset catalog as duplicated and not all of the image entries inside it?
| privacy_group = next((g for g in result.groups if "xcprivacy" in g.name), None) | ||
| assert privacy_group is None | ||
|
|
||
| def test_nested_children_are_flattened_for_duplicate_detection(self): |
There was a problem hiding this comment.
Can you add a test case for what I mentioned above: have two duplicate Assets.car's with duplicate children so we can confirm only the Assets.car gets flagged as the duplicate.
| parts = path.split("/") | ||
| for depth in range(1, len(parts)): | ||
| parent_path = "/".join(parts[:depth]) | ||
| if parent_path in containers: |
There was a problem hiding this comment.
made this method just generally more efficient
In conjunction with getsentry/sentry-cli#3024 , addresses https://linear.app/getsentry/issue/EME-550/ios-insights-duplicate-files
This PR improves duplicate file detection for iOS assets by flattening nested children. Previously, assets inside .car files were invisible to duplicate detection.
The primary app I was testing on went from 120 duplicate files to 193. Those 73 new files are exclusively png, pdf and svg files