
fix(anonymizer): truncate output files on open to prevent stale JSON#8248

Closed
Retr0-XD wants to merge 1 commit into jaegertracing:main from Retr0-XD:fix/anonymizer-output-file-truncation

Conversation

@Retr0-XD

What does this PR do?

When you run the anonymizer tool more than once and point it to the same output files, the old content isn't cleared before writing starts. The files are opened with O_CREATE|O_WRONLY but without O_TRUNC, so if a previous run wrote more data than the current one, the leftover bytes stick around at the end and produce invalid JSON.

This adds os.O_TRUNC to all three os.OpenFile calls — two in writer.go (for the captured and anonymized output files) and one in extractor.go (for the UI converter output). That ensures the file is always empty when we start writing.

Which issue(s) does this PR fix?

Fixes #8231

Checklist

  • Code follows the project style guidelines
  • Existing tests pass (uses t.TempDir() which always starts with an empty dir, so no test assertions are affected)
  • Change is minimal and focused

Signed-off-by: Sakthi Harish <sakthi.harish@edgeverve.com>

@Retr0-XD Retr0-XD requested a review from a team as a code owner March 26, 2026 13:11
Copilot AI review requested due to automatic review settings March 26, 2026 13:11
@dosubot dosubot bot added the bug label Mar 26, 2026
Contributor

Copilot AI left a comment


Pull request overview

Fixes the anonymizer/uiconv tooling so repeated runs targeting the same output path don’t leave trailing bytes from previous runs (which can corrupt JSON output).

Changes:

  • Add os.O_TRUNC when opening the captured spans output file.
  • Add os.O_TRUNC when opening the anonymized spans output file.
  • Add os.O_TRUNC when opening the UI converter output file.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
cmd/anonymizer/app/writer/writer.go | Truncate captured/anonymized output files on open to prevent stale trailing JSON.
cmd/anonymizer/app/uiconv/extractor.go | Truncate UI output file on open to prevent stale trailing JSON.


Comment on lines 51 to 52
cf, err := os.OpenFile(config.CapturedFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
if err != nil {

Copilot AI Mar 26, 2026


os.OpenFile(..., os.ModePerm) creates the output JSON files with mode 0777 (subject to umask), which can unintentionally set executable bits and make captured/anonymized trace data world-readable/writable. Consider using a more appropriate default like 0o600 (safer for trace data) or 0o644 (typical for JSON output).

Copilot uses AI. Check for mistakes.
Comment on lines 51 to 54
cf, err := os.OpenFile(config.CapturedFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
if err != nil {
	return nil, fmt.Errorf("cannot create output file: %w", err)
}

Copilot AI Mar 26, 2026


The error returned on failure to open the captured file is always cannot create output file, which makes it hard to tell which path failed when multiple files are opened here. Include config.CapturedFile (and similarly for the anonymized file) in the error message to simplify troubleshooting.

// newExtractor creates extractor.
func newExtractor(uiFile string, traceID string, reader *spanReader, logger *zap.Logger) (*extractor, error) {
- f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY, os.ModePerm)
+ f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)

Copilot AI Mar 26, 2026


os.OpenFile(..., os.ModePerm) creates the UI JSON output file with mode 0777 (subject to umask), which can set executable bits and expose trace data more broadly than intended. Consider using a tighter/default mode such as 0o600 (safer) or 0o644.

Suggested change
- f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
+ f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)

- f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY, os.ModePerm)
+ f, err := os.OpenFile(uiFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
  if err != nil {
  	return nil, fmt.Errorf("cannot create output file: %w", err)

Copilot AI Mar 26, 2026


The error message cannot create output file doesn't include the uiFile path, which can make diagnosing failures harder when this tool is wired into scripts. Consider including the filename in the error message.

Suggested change
- return nil, fmt.Errorf("cannot create output file: %w", err)
+ return nil, fmt.Errorf("cannot create output file %q: %w", uiFile, err)

When running the anonymizer tool multiple times pointing to the same
output paths, the files were opened with O_CREATE|O_WRONLY but without
O_TRUNC. This means any previous content is kept in place and new data
is written on top. If the new output happens to be shorter than the old
one, the leftover bytes at the end produce malformed JSON that cannot
be parsed.

Added os.O_TRUNC to all three os.OpenFile calls across writer.go and
extractor.go so the file is always truncated to zero length before
writing begins.

Also tightened the file creation mode from os.ModePerm (0777) to 0o600
so output files containing potentially sensitive trace data are not
world-readable, and included the file path in each error message to
make failures easier to diagnose.

Fixes: jaegertracing#8231

Signed-off-by: Sakthi Harish <sakthi.harish@edgeverve.com>
@Retr0-XD Retr0-XD force-pushed the fix/anonymizer-output-file-truncation branch from e6be8ea to a72cc6f Compare March 26, 2026 13:33
@yurishkuro
Copy link
Copy Markdown
Member

there is already an open PR, closing as a dupe.

@yurishkuro yurishkuro closed this Mar 27, 2026
@github-actions github-actions bot added the waiting-for-author PR is waiting for author to respond to maintainer's comments label Mar 27, 2026

Labels

bug waiting-for-author PR is waiting for author to respond to maintainer's comments

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: output file not cleared before writing, causing broken JSON

3 participants