Replica: fix(smtp-server): encode subject to UTF-8 before database insertion #77

Open
lucaforni wants to merge 16 commits into main-modalsource from DamienLGRCA-postal-fix/encode-subject-utf8

@lucaforni

This PR replicates the original PR: postalserver#3594

Original author: @DamienLGRCA
Original branch: fix/encode-subject-utf8
Original repository: DamienLGRCA/postal


Problem

Emails with subjects containing raw non-UTF-8 characters (e.g. ISO-8859-1 \xE9 for "é") without RFC 2047 MIME encoding cause Mysql2::Error: Incorrect string value during insertion into the messages table. The email is silently dropped — never delivered to the endpoint.

This issue has existed since at least 2021 (see postalserver#1636). In environments receiving mail from legacy systems (ERPs, invoicing software), it can affect hundreds of emails per day.

Root cause

In lib/postal/message_db/message.rb, the subject is extracted via headers["subject"]&.last.to_s[0, 200] and inserted as-is. The Mail gem's field.decoded correctly handles RFC 2047 encoded headers (=?ISO-8859-1?Q?...?=), but when the sender does not encode the subject per RFC 2047 (raw ISO-8859-1 bytes), the string is passed through unchanged. MySQL in strict mode then rejects the invalid UTF-8.
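
The failure mode can be seen in plain Ruby, without Postal or MySQL. The snippet below is a minimal sketch, assuming the unencoded header value ends up tagged as UTF-8 before insertion; the variable names are illustrative and not taken from Postal.

```ruby
# Raw ISO-8859-1 bytes, as they would arrive in an unencoded Subject header.
raw_subject = "Facture/note de cr\xE9dit Nr 12345".b

# Labelling the bytes as UTF-8 does not validate or convert them.
labelled = raw_subject.dup.force_encoding(Encoding::UTF_8)

# Truncating to 200 characters, as the extraction does, repairs nothing.
truncated = labelled.to_s[0, 200]

puts truncated.encoding         # the string claims to be UTF-8
puts truncated.valid_encoding?  # but \xE9 followed by "d" is not a valid UTF-8 sequence
```

A string in this state is what MySQL in strict mode rejects with "Incorrect string value".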

Fix

Added a private encode_utf8 method that safely converts the subject to valid UTF-8:

  1. No-op if already valid UTF-8
  2. Re-interprets bytes as UTF-8 if possible
  3. Falls back to ISO-8859-1 → UTF-8 conversion (most common case for European accented characters)
  4. Last resort: replaces invalid characters with ?
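
The four steps above could look roughly like the following. This is a sketch reconstructed from the description only, not the method body from the PR; it is written as a standalone method so it can be run in isolation.

```ruby
# Hypothetical reconstruction of the helper described above.
def encode_utf8(value)
  str = value.to_s.dup

  # 1. No-op if the string is already valid UTF-8.
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # 2. Re-interpret the same bytes as UTF-8 and keep them if they are valid.
  as_utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  begin
    # 3. Treat the bytes as ISO-8859-1 and transcode them to UTF-8.
    str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  rescue EncodingError
    # 4. Last resort: replace every invalid byte with "?".
    as_utf8.scrub("?")
  end
end

puts encode_utf8("Facture/note de cr\xE9dit Nr 12345".b)
```

Every byte value is defined in ISO-8859-1, so in this sketch step 3 is not expected to fail and step 4 acts only as a safety net.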

Reproduction

# Send an email with raw ISO-8859-1 subject (\xE9 = é)
printf 'Subject: Facture/note de cr\xe9dit Nr 12345\r\n' | # via SMTP to Postal

Results in:

Mysql2::Error: Incorrect string value: '\xE9dit N...' for column `postal-server-1`.`messages`.`subject` at row 1

Closes postalserver#1636

adamcooke and others added 16 commits February 1, 2026 14:48
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

The app-wide CSP already blocks inline script execution, but the HTML
preview iframe for a stored email was same-origin and un-sandboxed, and
the html_raw response had no per-action hardening. Add a sandbox on the
iframe and tighten the CSP on html_raw to script-src 'none' with
nosniff and no-referrer so the preview has defence in depth against a
future CSP bypass or regression.

Relates to GHSA-f6g9-8555-cw28.

The /img/<server>/<message> endpoint accepted a src=<url> query
parameter and proxied the body of that URL back to the caller. Nothing
in the codebase ever produces a src= parameter — the parser only
inserts a plain tracking pixel and rewrites href links — so this branch
is dead code inherited from the original AppMail import.

Drop the src branch: requests with src now return 400. The no-src path
that serves the tracking pixel and records loads is unchanged, and a
spec covers both the pixel-serving path and the removed branch.

The endpoint and domain option helpers interpolated model attributes
straight into an HTML string before marking the whole buffer html_safe.
Wrap the interpolations in h() so untrusted attributes can't break out
of the surrounding tag.

Also stop the helpers glob in rails_helper from eagerly requiring
_spec.rb files so helper specs can live under spec/helpers/, and add a
small application helper spec covering the escape behaviour.
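
The escaping change in this commit follows a common pattern. The sketch below is illustrative only: `option_tag` is a hypothetical helper, and `ERB::Util.html_escape` stands in for the `h()` view helper so the snippet runs outside Rails.

```ruby
require "erb"

# Hypothetical helper: escape each interpolated attribute before it is
# placed inside the surrounding HTML string.
def option_tag(label, value)
  escaped_value = ERB::Util.html_escape(value)
  escaped_label = ERB::Util.html_escape(label)
  "<option value=\"#{escaped_value}\">#{escaped_label}</option>"
end

puts option_tag("A & B", "x\"y")
```
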
url_with_return_to only checked that return_to started with a forward
slash, which also allowed protocol-relative values like //host and
/\host. Rails 7.1 already refuses to follow those via redirect_to, so
the user just saw a 500. Reject the same shapes in the helper instead
so we fall back to the default URL cleanly.

Adds a sessions request spec covering the rejected shapes plus the
happy-path relative redirect.
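
The stricter return_to check described in this commit could be sketched as below. `safe_return_to?` is a hypothetical name and the code is an assumption based on the commit message, not the helper from the repository.

```ruby
# Hypothetical predicate: accept only same-site relative paths.
def safe_return_to?(value)
  return false unless value.is_a?(String)
  # Must be a relative path on this site.
  return false unless value.start_with?("/")
  # Reject protocol-relative shapes such as "//host" and "/\host".
  return false if value.start_with?("//", "/\\")

  true
end

puts safe_return_to?("/servers/1")      # relative path
puts safe_return_to?("//evil.example")  # protocol-relative value
```
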
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

…rfpg-3xr5)

The Legacy API message lookup endpoints parsed the request body as JSON and
passed the `id` parameter straight through to the message database. A JSON
object supplied for `id` arrived as a Ruby Hash and was used as a raw set of
SQL `WHERE` conditions. `hash_to_sql` interpolated each Hash key directly
inside backtick identifier quoting while escaping only the value, so a key
containing a backtick could break out of the identifier and inject arbitrary
SQL into the SELECT (blind, time-based) against the message database.

Fixes:

- Escape all identifiers (columns, tables, database names) through a new
  `escape_identifier` helper that wraps in backticks and doubles embedded
  backticks. Applied across hash_to_sql, select, insert, insert_multi,
  update and delete so no caller can inject via an identifier.
- Validate the Legacy API `id` parameter at the controller boundary: reject
  any non-scalar value before it reaches the database and coerce it to an
  integer. Internal Hash-based lookups (e.g. tracking middleware) are
  unaffected.

Adds regression tests at the unit (hash_to_sql / escape_identifier) and
request (legacy messages/deliveries) levels.
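
The two fixes listed in this commit can be illustrated as follows. Only the name `escape_identifier` comes from the commit message; its body and the `coerce_id` helper are assumptions made for illustration.

```ruby
# Wrap an identifier in backticks and double any embedded backtick, so a
# key cannot break out of the identifier quoting.
def escape_identifier(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Hypothetical boundary check: accept only scalar ids and coerce to Integer.
def coerce_id(value)
  return value if value.is_a?(Integer)
  raise ArgumentError, "id must be a scalar" unless value.is_a?(String)

  Integer(value, 10)
end

puts escape_identifier("id")
puts escape_identifier("id` OR SLEEP(5) -- ")
puts coerce_id("42")
```

With this shape, a JSON object supplied for `id` raises before it can reach the database layer.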
Webhook and HTTP message endpoint deliveries both flow through
Postal::HTTP, which parsed the user-supplied URL and connected to its
host with no address validation. An authenticated user could point a
webhook or endpoint at a private, loopback or link-local address (e.g.
127.0.0.1, 169.254.169.254 cloud metadata, RFC1918 hosts) and make the
server issue requests into its own internal network.

Add Postal::HTTP::AddressGuard, which resolves the destination host and
rejects private/loopback/link-local/reserved/multicast IPv4 and IPv6
addresses, then pins the connection to the validated address so it cannot
be redirected via a DNS-rebinding race. Administrators can permit specific
destinations via the new postal.allowed_request_destinations config option
(hostnames or IP/CIDR ranges).

Address selection only uses families this server can actually reach so we
do not pin to an IPv6 address on a host without IPv6 connectivity; IPv4 is
preferred for predictability. HTTPEndpoint now validates that its URL is a
well-formed HTTP(S) URL with a host.
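
The address check at the core of such a guard can be sketched with Ruby's `ipaddr` standard library. This is not the Postal::HTTP::AddressGuard implementation: the range list is a non-exhaustive assumption, `blocked_destination?` is a hypothetical name, and DNS resolution, connection pinning and the allow-list option are not shown.

```ruby
require "ipaddr"

# Assumed, non-exhaustive list of ranges treated as internal.
BLOCKED_RANGES = %w[
  0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 172.16.0.0/12
  192.168.0.0/16 224.0.0.0/4 240.0.0.0/4
  ::1/128 fc00::/7 fe80::/10 ff00::/8
].map { |cidr| IPAddr.new(cidr) }.freeze

# Returns true when an already-resolved address falls in a blocked range.
def blocked_destination?(address)
  ip = IPAddr.new(address)
  BLOCKED_RANGES.any? { |range| range.family == ip.family && range.include?(ip) }
end

puts blocked_destination?("169.254.169.254")  # cloud metadata address
puts blocked_destination?("8.8.8.8")          # public address
```
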
The spec relied on the test machine having real IPv6 connectivity,
which GitHub Actions runners do not have.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Emails with subjects containing non-UTF-8 characters (e.g. ISO-8859-1
encoded accented characters like 'é' as raw \xE9) cause a
Mysql2::Error 'Incorrect string value' when Postal attempts to insert
the message into the database. This results in the email being
silently dropped.

This commit adds an encode_utf8 helper that:
1. Returns the string as-is if already valid UTF-8
2. Attempts to re-interpret bytes as UTF-8
3. Falls back to ISO-8859-1 to UTF-8 conversion (most common case
   for European accented characters)
4. As a last resort, replaces invalid characters with '?'

This fixes the issue reported in postalserver#1636 where emails sent with
ISO-8859-1 encoded subjects without RFC 2047 MIME encoding are
rejected by MySQL.

Successfully merging this pull request may close these issues.

Mysql2::Error: Incorrect string value: '\xE9..' for column postal-server-4.messages.subject at row 1
