This repository was archived by the owner on Jun 30, 2022. It is now read-only.

Commit 3e657e9

Log potential encoding issues (#8)
Also gracefully handle any encoding issues in request bodies, while logging a warning. Bump to v0.2.5.
1 parent fbd6753 commit 3e657e9

4 files changed: +36 -9 lines changed

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    akita-har_logger (0.2.4)
+    akita-har_logger (0.2.5)
 
 GEM
   remote: https://rubygems.org/

lib/akita/har_logger/har_utils.rb

Lines changed: 12 additions & 4 deletions
@@ -5,12 +5,20 @@ module HarLogger
     class HarUtils
       # Rack apparently uses 8-bit ASCII for everything, even when the string
       # is not 8-bit ASCII. This reinterprets 8-bit ASCII strings as UTF-8.
+      #
+      # If we are unable to do this reinterpretation, return the string
+      # unchanged, but log a warning that points to the caller.
       def self.fixEncoding(v)
-        if v == nil || v.encoding != Encoding::ASCII_8BIT then
-          v
-        else
-          String.new(v).force_encoding(Encoding::UTF_8)
+        if v != nil && v.encoding == Encoding::ASCII_8BIT then
+          forced = String.new(v).force_encoding(Encoding::UTF_8)
+          if forced.valid_encoding? then
+            v = forced
+          else
+            Rails.logger.warn "[#{caller_locations(1, 1)}] Unable to fix encoding: not a valid UTF-8 string. This will likely cause JSON serialization to fail."
+          end
         end
+
+        v
       end
 
       # Converts a Hash into a list of Hash objects. Each entry in the given
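
To make the new behavior of fixEncoding concrete, here is a minimal standalone sketch of the same idea. It is not the gem's code: the name fix_encoding and the LOGGER constant are hypothetical, and a stdlib Logger stands in for Rails.logger so the snippet runs outside Rails.

```ruby
require "logger"

# Hypothetical stand-in for Rails.logger so the sketch runs without Rails.
LOGGER = Logger.new($stderr)

# Reinterprets a binary (ASCII-8BIT) string as UTF-8 when its bytes form
# valid UTF-8. Otherwise returns the input unchanged and logs a warning.
def fix_encoding(v, logger: LOGGER)
  return v if v.nil? || v.encoding != Encoding::ASCII_8BIT

  # Copy first so the caller's string is never mutated.
  forced = String.new(v).force_encoding(Encoding::UTF_8)
  return forced if forced.valid_encoding?

  logger.warn("[#{caller_locations(1, 1)&.first}] Unable to fix encoding: not a valid UTF-8 string.")
  v
end

valid   = "caf\xC3\xA9".b  # UTF-8 bytes for "café", tagged as binary
invalid = "caf\xE9".b      # a lone 0xE9 byte is not valid UTF-8

fixed     = fix_encoding(valid)    # retagged as UTF-8
unchanged = fix_encoding(invalid)  # still binary; a warning is logged
```

The point of the valid_encoding? check is that force_encoding only relabels bytes and never validates them, so without the check an invalid string would be handed on as "UTF-8" and fail later, for example during JSON serialization.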

lib/akita/har_logger/http_request.rb

Lines changed: 22 additions & 3 deletions
@@ -85,7 +85,7 @@ def getPostDataCharSet(env)
           return Encoding::ISO_8859_1
         end
 
-        Encoding.default_external
+        Encoding::ASCII_8BIT
       end
 
       # Obtains the posted data from an HTTP environment.
@@ -117,8 +117,27 @@ def getPostData(env)
           # body when the request specifies UTF-8. Reinterpret the content
           # body according to what the request says it is, and re-encode into
           # UTF-8.
-          result[:text] = req.body.string.encode(Encoding::UTF_8,
-            getPostDataCharSet(env))
+          #
+          # Gracefully handle any characters that are invalid in the source
+          # encoding and characters that have no UTF-8 representation by
+          # replacing with '?'. Log a warning when this happens.
+          source = req.body.string.force_encoding(getPostDataCharSet(env))
+          utf8EncodingSuccessful = false
+          if source.valid_encoding? then
+            begin
+              result[:text] = source.encode(Encoding::UTF_8)
+              utf8EncodingSuccessful = true
+            rescue Encoding::UndefinedConversionError
+              Rails.logger.warn "[#{caller_locations(0, 1)}] Unable to losslessly convert request body from #{source.encoding} to UTF-8. Characters undefined in UTF-8 will be replaced with '?'."
+            end
+          else
+            Rails.logger.warn "[#{caller_locations(0, 1)}] Request body is not valid #{source.encoding}. Invalid characters and characters undefined in UTF-8 will be replaced with '?'."
+          end
+
+          if !utf8EncodingSuccessful then
+            result[:text] = source.encode(Encoding::UTF_8,
+              invalid: :replace, undef: :replace, replace: '?')
+          end
         end
 
         result
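
The two-stage conversion in getPostData can be tried in isolation with the sketch below. The helper name to_utf8 and its return shape (text plus a lossless flag) are hypothetical and chosen for illustration. Logging is omitted, and the sketch copies the input with dup rather than mutating it as the diff does.

```ruby
# Converts a raw request body to UTF-8, given the charset the request claims.
# Returns [text, lossless], where lossless is false if any byte was replaced.
def to_utf8(raw, charset)
  # Relabel a copy of the bytes with the claimed charset; no bytes change here.
  source = raw.dup.force_encoding(charset)

  if source.valid_encoding?
    begin
      return [source.encode(Encoding::UTF_8), true]
    rescue Encoding::UndefinedConversionError
      # Some character has no UTF-8 counterpart; fall through to lossy mode.
    end
  end

  lossy = source.encode(Encoding::UTF_8,
                        invalid: :replace, undef: :replace, replace: "?")
  [lossy, false]
end

body = "caf\xE9".b  # 0xE9 is "é" in ISO-8859-1

# Charset declared as Latin-1: every byte maps to a UTF-8 character.
latin1_text, latin1_lossless = to_utf8(body, Encoding::ISO_8859_1)

# No usable charset, so the ASCII_8BIT default applies: bytes above 0x7F
# have no defined conversion and are replaced with '?'.
binary_text, binary_lossless = to_utf8(body, Encoding::ASCII_8BIT)
```

This also illustrates the switch of the default in getPostDataCharSet to Encoding::ASCII_8BIT: a binary string always passes valid_encoding?, so an undeclared charset reaches the rescue branch and degrades to '?' replacement instead of raising out of the logger.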

lib/akita/har_logger/version.rb

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@
 
 module Akita
   module HarLogger
-    VERSION = "0.2.4"
+    VERSION = "0.2.5"
   end
 end
