
fix: cloudwatch control message makes consumelogs crash with sigsegv#44231

Merged
ChrsMark merged 8 commits into open-telemetry:main from johnvox:fix/sigsegv-awsformatter
Dec 1, 2025

Conversation

johnvox commented Nov 13, 2025

Description

With the following config:

receivers:
  awss3:
    s3downloader:
        region: "ca-central-1"
        s3_bucket: "<redacted>"
    encodings:
      - extension: awslogs_encoding/cloudwatch
    sqs:
      queue_url: "<redacted>"
      region: "ca-central-1"

exporters:
  nop:

service:
  extensions: [awslogs_encoding/cloudwatch]
  pipelines:
    logs/aws:
      receivers: [awss3]
      processors: []
      exporters: [ nop ]

extensions:
  awslogs_encoding/cloudwatch:
    format: cloudwatch

And this kind of message in the bucket (gzipped in the bucket, by the way):

{
    "messageType": "CONTROL_MESSAGE",
    "owner": "CloudwatchLogs",
    "logGroup": "",
    "logStream": "",
    "subscriptionFilters": [],
    "logEvents": [
        {
            "id": "",
            "timestamp": 1763029869851,
            "message": "CWL CONTROL MESSAGE: Checking health of destination Firehose."
        }
    ]
}{
    "messageType": "DATA_MESSAGE",
    "owner": "0000000000000",
    "logGroup": "/this/is/a/log/group",
    "logStream": "this-is-a-log-stream",
    "subscriptionFilters": [
        "EksToS3"
    ],
    "logEvents": [
        {
            "id": "000000000000011111111111111111122222222222222222233333333333333333",
            "timestamp": 1763029869386,
            "message": "dummy message",
            "extractedFields": {
                "@aws.account": "0000000000000",
                "@aws.region": "ca-central-1"
            }
        }
    ]
}

I got the following crash:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x4 pc=0xbbac4a]

goroutine 125 [running]:
go.opentelemetry.io/collector/pdata/internal.(*State).IsReadOnly(...)
        go.opentelemetry.io/collector/pdata@v1.44.0/internal/state.go:53
go.opentelemetry.io/collector/pdata/plog.Logs.IsReadOnly(...)
        go.opentelemetry.io/collector/pdata@v1.44.0/plog/logs.go:13
go.opentelemetry.io/collector/internal/fanoutconsumer.(*logsConsumer).ConsumeLogs(0xc000645260, {0x2d23670, 0xc0005760a0}, {0x0?, 0x0?})
        go.opentelemetry.io/collector/internal/fanoutconsumer@v0.138.0/logs.go:61 +0x1ea
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver.(*logsReceiver).processReceivedData(0xc000446590, {0x2d23670, 0xc0005760a0}, 0xc000280a20, {0xc0001d7860, 0x57}, {0xc001578000, 0x9b07c, 0xac000})
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver@v0.138.0/receiver.go:262 +0x6c7
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver.(*awss3Receiver).receiveBytes(0xc000280a20, {0x2d23670, 0xc0005760a0}, {0xc0001d7860, 0x57}, {0xc001578000, 0x9b07c, 0xac000})
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver@v0.138.0/receiver.go:138 +0x253
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver.(*s3SQSNotificationReader).readAll(0xc000572ae0, {0x2d23670, 0xc0005760a0}, {0x2d70d98?, 0x23233c0?}, 0xc000518000)
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver@v0.138.0/s3sqsreader.go:208 +0x1939
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver.(*awss3Receiver).Start.func1()
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver@v0.138.0/receiver.go:106 +0x8a
created by github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver.(*awss3Receiver).Start in goroutine 1
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awss3receiver@v0.138.0/receiver.go:105 +0x14c

Changing to proper logs initialization fixes the issue.
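The stack trace above comes from calling a method on a zero-value `plog.Logs`, whose internal state pointer is nil. The following is a stdlib-only sketch that mimics that pattern (these are not the real pdata types, just a minimal illustration of why a constructor must be used instead of the zero value):

```go
package main

import "fmt"

// Minimal mimic of the pdata pattern: Logs wraps a pointer to shared
// state, and methods dereference it. Not the real plog API.
type state struct{ readOnly bool }

type Logs struct{ s *state }

// NewLogs mirrors plog.NewLogs: it allocates the internal state.
func NewLogs() Logs { return Logs{s: &state{}} }

// IsReadOnly dereferences the state pointer, like the frame in the
// stack trace; on a zero-value Logs, s is nil and this panics.
func (l Logs) IsReadOnly() bool { return l.s.readOnly }

// safeIsReadOnly reports whether calling IsReadOnly panicked.
func safeIsReadOnly(l Logs) (ro bool, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return l.IsReadOnly(), false
}

func main() {
	var zero Logs // what the decoder effectively returned for a control message
	_, p := safeIsReadOnly(zero)
	fmt.Println("zero value panics:", p) // prints: zero value panics: true

	ok := NewLogs() // the fix: return a properly initialised, empty Logs
	_, p = safeIsReadOnly(ok)
	fmt.Println("initialised value panics:", p) // prints: initialised value panics: false
}
```

In the real fix, the decoder returns `plog.NewLogs()` (an initialised, empty value) for control messages rather than an uninitialised `plog.Logs{}`.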

Signed-off-by: Jean-Yves NOLEN <jynolen@gmail.com>
johnvox commented Nov 13, 2025

/tag bug

constanca-m left a comment

I am not convinced this is something we should change in the encoding extension. If there is an error, we shouldn't initialize any logs. This looks like a bug in the receiver that is throwing that error.

johnvox commented Nov 13, 2025

I kind of disagree.

I agree that it is the caller's responsibility to check whether the returned object is nil or not.
But if you choose to return a non-nil object, you have to ensure it's correctly initialized.

constanca-m commented

We can't return a nil object for this use case; we would have to make it a pointer, @jynolen, which breaks the interface for encodings. So we return a zero value. I am going to let @axw see if he agrees with the change, and then we can decide whether to move forward with it.


axw commented Nov 17, 2025

I agree with both of you to some degree :)

The stacktrace in the description is related to the control message: we're returning an uninitialised/zero-value Logs with no error. We should return an initialised, empty value to avoid the panic.

It is standard practice in Go programs to ignore all other function results when a non-nil error is returned. The S3 receiver already ignores the results in error cases, so there's no need to change those.


johnvox commented Nov 17, 2025

Will do the change.

Another thing I found.

CloudWatch => Firehose => S3 produces inline concatenated JSON messages like:

{"owner":"123456789012","logGroup":"CloudTrail","logStream":"123456789012_CloudTrail_us-east-1","subscriptionFilters":["Destination"],"messageType":"DATA_MESSAGE","logEvents":[{"id":"31953106606966983378809025079804211143289615424298221568","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221569","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221570","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"}]}{"owner":"123456789012","logGroup":"CloudTrail","logStream":"123456789012_CloudTrail_us-east-1","subscriptionFilters":["Destination"],"messageType":"DATA_MESSAGE","logEvents":[{"id":"31953106606966983378809025079804211143289615424298221568","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221569","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221570","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"}]}{"owner":"123456789012","logGroup":"CloudTrail","logStream":"123456789012_CloudTrail_us-east-1","subscriptionFilters":["Destination"],"messageType":"DATA_MESSAGE","logEvents":[{"id":"31953106606966983378809025079804211143289615424298221568","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221569","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"},{"id":"31953106606966983378809025079804211143289615424298221570","timestamp":1432826855000,"message":"{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"}]}

I can add a piece of code to handle this kind of format.
Would you prefer it in another PR or inline with this one?


axw commented Nov 17, 2025

@jynolen thanks. For the concatenated JSON: can you please open a new issue? I would rather we handle that in a separate PR too. I thought we had some code to handle that, but I may be thinking of the CloudWatch Metrics encoding:

// Multiple metrics in each record separated by newline character


johnvox commented Nov 28, 2025

The change has been made and a test added, as requested.

axw left a comment

Thank you @jynolen!

@axw axw added the ready to merge Code review completed; ready to merge by maintainers label Nov 28, 2025

johnvox commented Nov 28, 2025

> @jynolen thanks. For the concatenated JSON: can you please open a new issue? I would rather we handle that in a separate PR too. I thought we had some code to handle that, but I may be thinking of the CloudWatch Metrics encoding:
>
> // Multiple metrics in each record separated by newline character

If I've got time, I'll try to have a look and implement it before the next release.

@axw axw removed the ready to merge Code review completed; ready to merge by maintainers label Nov 28, 2025

axw commented Nov 28, 2025

@jynolen sorry I didn't notice earlier: can you please add a changelog entry?

@axw axw added the ready to merge Code review completed; ready to merge by maintainers label Nov 28, 2025
ChrsMark left a comment

LGTM


ChrsMark commented Dec 1, 2025

@constanca-m mind taking a look too, as 2nd code-owner of the component?

constanca-m left a comment

Thanks @jynolen

@ChrsMark ChrsMark merged commit b984794 into open-telemetry:main Dec 1, 2025
189 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 1, 2025

otelbot bot commented Dec 1, 2025

Thank you for your contribution @jynolen! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.

Labels: extension/encoding/awslogsencoding, ready to merge