
[node-core-library] Executable: Incorrect decoding of multi-byte characters depending on buffering #5509

@dmichon-msft

Description


Summary

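Executable.waitForExitAsync currently assumes that each raw chunk received from the child's stdout and stderr streams ends on a character boundary. This is not guaranteed when the child writes multi-byte characters: a chunk can end in the middle of a character, and decoding each chunk independently then produces U+FFFD replacement characters instead of the original text.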
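The failure can be reproduced without a child process. The sketch below assumes the bytes of "Hello, 世界" arrive one byte per chunk, as in the repro steps, and compares decoding each chunk on its own with Node's stateful StringDecoder, which holds back an incomplete multi-byte sequence until the remaining bytes arrive. The variable names are illustrative only.

```javascript
const { StringDecoder } = require('node:string_decoder');

// Simulate a stream that delivers the UTF-8 bytes of "Hello, 世界" one byte per chunk.
const bytes = Buffer.from('Hello, \u4e16\u754c', 'utf8');
const chunks = [];
for (let i = 0; i < bytes.length; i++) {
  chunks.push(bytes.subarray(i, i + 1));
}

// Naive: decode every chunk independently. Each lone byte of a 3-byte
// character is invalid on its own and becomes U+FFFD.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Stateful: StringDecoder buffers incomplete sequences between write() calls
// and emits a character only once all of its bytes have been seen.
const decoder = new StringDecoder('utf8');
let stateful = '';
for (const chunk of chunks) {
  stateful += decoder.write(chunk);
}
stateful += decoder.end(); // flush any trailing partial sequence

console.log(naive);    // Hello, ������
console.log(stateful); // Hello, 世界
```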

Repro steps

inner.js

const { setTimeout } = require('node:timers/promises');

const unicodeString = "Hello, 世界"; // "Hello, World" in Chinese
const encoded = Buffer.from(unicodeString, 'utf8');
async function writeChars() {
    for (let i = 0; i < encoded.length; i++) {
        process.stdout.write(encoded.subarray(i, i + 1));
        await setTimeout(0); // wait for a single async tick
    }
}
writeChars();

outer.js

const child_process = require('child_process');

const decoder = new TextDecoder('utf-8');

let decodedOutput = '';
const byChunkOutput = [];
// Run ./inner.js with Node.js, log each raw stdout chunk, and decode the output two ways:
// per chunk (byChunkOutput) and with a streaming TextDecoder (decodedOutput)
const child = child_process.spawn('node', ['./inner.js'], {
    stdio: ['ignore', 'pipe', 'inherit']
});
child.stdout.on('data', (chunk) => {
    console.log('Received chunk:', chunk);
    byChunkOutput.push(chunk.toString());
    decodedOutput += decoder.decode(chunk, { stream: true });
});
child.on('close', (code) => {
    console.log(`Child process exited with code ${code}`);
    decodedOutput += decoder.decode(); // flush the decoder
    console.log('Decoded output:', decodedOutput);
    console.log('By-chunk output:', byChunkOutput.join(''));
});

This prints:

Received chunk: <Buffer 48>
Received chunk: <Buffer 65>
Received chunk: <Buffer 6c>
Received chunk: <Buffer 6c>
Received chunk: <Buffer 6f>
Received chunk: <Buffer 2c>
Received chunk: <Buffer 20>
Received chunk: <Buffer e4>
Received chunk: <Buffer b8>
Received chunk: <Buffer 96>
Received chunk: <Buffer e7>
Received chunk: <Buffer 95>
Received chunk: <Buffer 8c>
Child process exited with code 0
Decoded output: Hello, 世界
By-chunk output: Hello, ������

Details

Executable needs to either concatenate the raw chunks and decode the result once, or decode incrementally with a multi-byte-aware stateful decoder (TextDecoder with { stream: true }, or the legacy string_decoder module).
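A minimal sketch of the concatenate-then-decode option, assuming UTF-8 output. `runAndCollectAsync` is a hypothetical helper, not part of @rushstack/node-core-library's API. It keeps every raw Buffer chunk per stream and decodes only after the child's stdio has closed, so a character split across chunks is reassembled first. The child script mirrors inner.js and is passed via `node -e`.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: spawn a command and resolve with its exit code and the
// fully decoded stdout/stderr. Chunks are stored raw and decoded once at the end.
function runAndCollectAsync(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    const stdoutChunks = [];
    const stderrChunks = [];
    child.stdout.on('data', (chunk) => stdoutChunks.push(chunk));
    child.stderr.on('data', (chunk) => stderrChunks.push(chunk));
    child.on('error', reject);
    // 'close' fires after the stdio streams have ended, so every chunk has arrived.
    child.on('close', (code) => {
      resolve({
        code,
        stdout: Buffer.concat(stdoutChunks).toString('utf8'),
        stderr: Buffer.concat(stderrChunks).toString('utf8')
      });
    });
  });
}

// Equivalent of inner.js: write "Hello, 世界" one byte at a time.
// The \\u escapes keep the -e argument ASCII-only.
const childScript = `
const { setTimeout } = require('node:timers/promises');
const encoded = Buffer.from('Hello, \\u4e16\\u754c', 'utf8');
(async () => {
  for (let i = 0; i < encoded.length; i++) {
    process.stdout.write(encoded.subarray(i, i + 1));
    await setTimeout(0);
  }
})();
`;

runAndCollectAsync(process.execPath, ['-e', childScript]).then(({ code, stdout }) => {
  console.log(code, stdout); // 0 Hello, 世界
});
```

Concatenating is simplest when the full output is only needed after exit. If output must be processed as it streams, one stateful decoder per stream avoids holding every chunk. Calling `child.stdout.setEncoding('utf8')` achieves this too, because Node's Readable streams decode through a StringDecoder once an encoding is set.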

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Package name: @rushstack/node-core-library
Package version: 5.19.1
Operating system: Linux
Would you consider contributing a PR? Yes
Node.js version (node -v): 20.13.1

Metadata

Assignees: none
Labels: none
Type: none
Projects: Status: Needs triage
Milestone: none
Relationships: none yet
Development: no branches or pull requests