
Use String#bytesize instead of String#length in #format_event (#112)

Merged
albertvaka merged 1 commit into DataDog:master from shakrmedia:unicode-length on Jun 14, 2019

Conversation

@yosangwon
Contributor

I found an issue that causes event text to be truncated in the Datadog Event Stream when it contains characters outside the ASCII range.

For example, a Korean letter (Hangul) takes 3 bytes in UTF-8, but the current implementation counts it as 1 byte, so each Korean letter causes two bytes to be truncated.

text = "한글 한 자당 세 바이트abcdefghijklmnopqr"
text.length # => 31
text.bytesize # => 49

# the last 18 bytes (49 - 31) will be ignored by dd-agent (or Datadog)
Datadog.dogstatsd.event("info", text, tags: ["test"], alert_type: "info")

Replacing String#length with String#bytesize would fix this issue.
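As a rough illustration of the change, the sketch below builds an event datagram whose header carries the title and text lengths, once measured in characters and once in bytes. `event_payload` and its `measure:` parameter are hypothetical names used only for this example; this is not the library's actual `#format_event`, and the datagram layout is simplified.

```ruby
# Hypothetical helper for illustration only, not the library's #format_event.
# It builds a simplified event datagram: _e{title_len,text_len}:title|text
def event_payload(title, text, measure: :bytesize)
  title_len = title.public_send(measure)
  text_len  = text.public_send(measure)
  "_e{#{title_len},#{text_len}}:#{title}|#{text}"
end

text = "한글 한 자당 세 바이트abcdefghijklmnopqr"

# Old behaviour: lengths counted in characters (String#length).
by_chars = event_payload("info", text, measure: :length)

# Proposed behaviour: lengths counted in bytes (String#bytesize).
by_bytes = event_payload("info", text)

puts by_chars[0, 14] # header and title when counting characters
puts by_bytes[0, 14] # header and title when counting bytes
```

With character counting the header declares 31 for the text, while the encoded text actually occupies 49 bytes; with byte counting the declared and actual sizes agree.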

@albertvaka
Contributor

Hi @devleoper 😄 Thanks for your contribution!

I took a look at the code that receives these packets, and it does indeed count bytes rather than characters: https://github.com/DataDog/datadog-agent/blob/master/pkg/dogstatsd/parser.go#L173
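To make the mismatch concrete, here is a minimal Ruby simulation of a receiver that keeps only as many bytes as the sender declared. It is a sketch of the behaviour described in this thread, not a port of the agent's Go parser; `read_declared` is a hypothetical name.

```ruby
# Hypothetical receiver for illustration: keeps only the declared number of
# bytes and ignores the rest. This is not the agent's actual parsing code.
def read_declared(text, declared_len)
  text.byteslice(0, declared_len)
end

text = "한글 한 자당 세 바이트abcdefghijklmnopqr"

# Sender declares the character count, receiver interprets it as bytes.
seen_with_length   = read_declared(text, text.length)
# Sender declares the byte count, receiver sees the whole text.
seen_with_bytesize = read_declared(text, text.bytesize)

puts seen_with_length
puts seen_with_bytesize
```

In this example the 31 declared bytes happen to cover exactly the Korean portion (9 Hangul letters at 3 bytes each plus 4 spaces), so the 18 ASCII letters at the end are the part that gets lost.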

Merging!

@albertvaka albertvaka merged commit c95ae54 into DataDog:master Jun 14, 2019