Copilot AI commented Dec 18, 2025

Executable.waitForExitAsync and getProcessInfoByIdAsync were incorrectly decoding multi-byte UTF-8 characters when character boundaries split across stream chunks, producing replacement characters (�) instead of the original text.

Changes

  • Executable.waitForExitAsync: Call stream.setEncoding() on the stdout/stderr streams before attaching data handlers. This delegates decoding to Node.js's StringDecoder, which keeps state across chunks so that multi-byte sequences split at a chunk boundary are decoded correctly.

  • Executable.getProcessInfoByIdAsync: Set UTF-8 encoding on stdout before parsing process list output to handle process names with multi-byte characters.

  • Test coverage: Added test that writes multi-byte characters (Chinese text + emoji) byte-by-byte to verify decoding across chunk boundaries.
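
The chunk-boundary behavior described in the list above can be reproduced without a child process. The following is a minimal sketch, not the code from this PR: `decodeChunks` is a hypothetical helper that drives Node.js's `string_decoder` module directly, on the assumption that it behaves like the decoder `setEncoding()` installs on a stream.

```javascript
const { StringDecoder } = require('node:string_decoder');

// Hypothetical helper (not part of Executable): decodes a list of Buffer
// chunks with a single stateful decoder, so a character whose bytes are
// split across chunks is only emitted once all of its bytes have arrived.
function decodeChunks(chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  // Flush any trailing bytes still held by the decoder.
  text += decoder.end();
  return text;
}

// The three UTF-8 bytes of "世", delivered as three separate chunks.
const chunks = [Buffer.from([0xe4]), Buffer.from([0xb8]), Buffer.from([0x96])];

// Decoding each chunk on its own yields one replacement character per byte.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Decoding through one stateful decoder yields the original character.
const stateful = decodeChunks(chunks);

console.log(JSON.stringify(naive), JSON.stringify(stateful));
```

The naive path mirrors what happens when each `data` chunk is converted with `toString()` independently; the stateful path is the behavior the fix relies on.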

Example

Before fix:

// Output received as separate byte chunks: <Buffer e4>, <Buffer b8>, <Buffer 96>
// Result: "���" (replacement characters)

After fix:

childProcess.stdout.setEncoding('utf8');  // StringDecoder handles boundaries
// Result: "世" (correct character)
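
A slightly fuller sketch of the "after" case, using a `PassThrough` stream as a stand-in for `childProcess.stdout` so that it runs without spawning anything. `collectTextAsync` and `fakeStdout` are hypothetical names for illustration and are not the actual `Executable` implementation.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical helper: sets the encoding BEFORE attaching the 'data'
// handler, so every chunk the handler sees is an already-decoded string.
function collectTextAsync(stream, encoding = 'utf8') {
  return new Promise((resolve, reject) => {
    stream.setEncoding(encoding);
    let text = '';
    stream.on('data', (chunk) => {
      text += chunk;
    });
    stream.on('end', () => resolve(text));
    stream.on('error', reject);
  });
}

// Stand-in for childProcess.stdout (an assumption made for this sketch).
const fakeStdout = new PassThrough();
const resultPromise = collectTextAsync(fakeStdout);

// Write the payload one byte at a time, like inner.js in the linked issue.
for (const byte of Buffer.from('Hello, 世界', 'utf8')) {
  fakeStdout.write(Buffer.from([byte]));
}
fakeStdout.end();

resultPromise.then((text) => console.log('Collected:', text));
```

With a real child process, the same helper would presumably be called once for `stdout` and once for `stderr`.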
Original prompt

This section details the original issue you should resolve

<issue_title>[node-core-library] Executable: Incorrect decoding of multi-byte characters depending on buffering</issue_title>
<issue_description>## Summary
Executable.waitForExitAsync currently assumes that the raw chunks received from the stdout and stderr streams always end at a character boundary, which is not guaranteed when the output contains multi-byte characters.

Repro steps

inner.js

const { setTimeout } = require('node:timers/promises');

const unicodeString = "Hello, 世界"; // "Hello, World" in Chinese
const encoded = Buffer.from(unicodeString, 'utf8');
async function writeChars() {
    for (let i = 0; i < encoded.length; i++) {
        process.stdout.write(encoded.subarray(i, i + 1));
        await setTimeout(0); // wait for a single async tick
    }
}
writeChars();

outer.js

const child_process = require('child_process');

const decoder = new TextDecoder('utf-8');

let decodedOutput = '';
let byChunkOutput = [];
// Execute ./inner.js using Node.js and log the raw binary content of each chunk of stdout
const child = child_process.spawn('node', ['./inner.js'], {
    stdio: ['ignore', 'pipe', 'inherit']
});
child.stdout.on('data', (chunk) => {
    console.log('Received chunk:', chunk);
    byChunkOutput.push(chunk.toString());
    decodedOutput += decoder.decode(chunk, { stream: true });
});
child.on('close', (code) => {
    console.log(`Child process exited with code ${code}`);
    decodedOutput += decoder.decode(); // flush the decoder
    console.log('Decoded output:', decodedOutput);
    console.log('By-chunk output:', byChunkOutput.join(''));
});

This prints:

Received chunk: <Buffer 48>
Received chunk: <Buffer 65>
Received chunk: <Buffer 6c>
Received chunk: <Buffer 6c>
Received chunk: <Buffer 6f>
Received chunk: <Buffer 2c>
Received chunk: <Buffer 20>
Received chunk: <Buffer e4>
Received chunk: <Buffer b8>
Received chunk: <Buffer 96>
Received chunk: <Buffer e7>
Received chunk: <Buffer 95>
Received chunk: <Buffer 8c>
Child process exited with code 0
Decoded output: Hello, 世界
By-chunk output: Hello, ������

Details

Executable needs to either concatenate the chunks and then decode, or use one of the multi-byte-aware stateful decoders (TextDecoder or the legacy string_decoder).
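
Both options can be sketched in a few lines. This is an illustration only: `decodeAfterConcat` and `decodeWithTextDecoder` are hypothetical helper names, not existing Executable APIs, and the input simulates the one-byte-per-chunk output of inner.js.

```javascript
// Option 1 (hypothetical helper): buffer all raw chunks, then decode once.
function decodeAfterConcat(chunks) {
  return Buffer.concat(chunks).toString('utf8');
}

// Option 2 (hypothetical helper): decode incrementally with a stateful
// decoder that holds incomplete byte sequences until the next chunk.
function decodeWithTextDecoder(chunks) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  text += decoder.decode(); // flush the decoder
  return text;
}

// Simulate receiving the output one byte per chunk.
const encoded = Buffer.from('Hello, 世界', 'utf8');
const singleByteChunks = Array.from(encoded, (byte) => Buffer.from([byte]));

console.log(decodeAfterConcat(singleByteChunks));
console.log(decodeWithTextDecoder(singleByteChunks));
```

Concatenating is simpler but holds all raw bytes in memory until the process exits; the stateful decoder allows text to be handled as it arrives.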

Standard questions

Please answer these questions to help us investigate your issue more quickly:

| Question | Answer |
| --- | --- |
| Package name: | @rushstack/node-core-library |
| Package version? | 5.19.1 |
| Operating system? | Linux |
| Would you consider contributing a PR? | Yes |
| Node.js version (node -v)? | 20.13.1 |
</issue_description>

<agent_instructions>Please update Executable.waitForExitAsync (and any other Executable APIs that merge input from multiple chunks of unknown encoding) to explicitly set the stream encoding for stdout and stderr before attaching handlers.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@dmichon-msft Only an issue if we don't explicitly set the stream encoding to `utf-8`.

Copilot AI and others added 2 commits December 18, 2025 01:00
Co-authored-by: dmichon-msft <26827560+dmichon-msft@users.noreply.github.com>
Co-authored-by: dmichon-msft <26827560+dmichon-msft@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix incorrect decoding of multi-byte characters in executable" to "Fix multi-byte UTF-8 decoding in Executable stream handling" Dec 18, 2025
Copilot AI requested a review from dmichon-msft December 18, 2025 01:09

Development

Successfully merging this pull request may close these issues.

[node-core-library] Executable: Incorrect decoding of multi-byte characters depending on buffering
