devall-org/feistel_cipher

FeistelCipher

Encrypted integer IDs using Feistel cipher

Database Support: PostgreSQL only (uses PostgreSQL triggers and functions)

Why?

Problem: Sequential IDs (1, 2, 3...) leak business information:

  • Competitors can estimate your growth rate
  • Users can enumerate resources (/posts/1, /posts/2...)
  • Total record counts are exposed

Common Solutions & Issues:

  • UUIDs/Random integers: Generate different IDs on every seed run, making dev/staging environments inconsistent
  • UUIDs: Fixed 16 bytes (36 characters) - too long for URLs (/posts/550e8400-e29b-41d4-a716-446655440000)
  • Random integers: Collision risks, complex generation logic

This Library's Approach:

  • Store sequential integers internally
  • Expose encrypted integers externally (non-sequential, unpredictable)
  • Deterministic encryption: same insertion order always produces same encrypted ID (consistent seed data)
  • Automatic encryption via database trigger
  • Adjustable bit size per column
  • Time-based prefix for PostgreSQL incremental backup optimization

Installation

Using Ash Framework?

If you're using Ash Framework, use ash_feistel_cipher instead! It provides a declarative DSL to configure Feistel cipher encryption directly in your Ash resources.

For plain Ecto users, continue below.

Using igniter (Recommended)

mix igniter.install feistel_cipher

Manual Installation

# mix.exs
def deps do
  [{:feistel_cipher, "~> 1.0"}]
end

Then run:

mix deps.get
mix feistel_cipher.install

Installation Options

Both methods support the following options:

  • --repo or -r: Specify an Ecto repo (required for manual installation)
  • --functions-prefix or -p: PostgreSQL schema prefix (default: public)
  • --functions-salt or -s: Cipher salt constant, max 2^31-1 (default: randomly generated)

⚠️ Security Note: A cryptographically random salt is generated by default for each project. This ensures that encryption patterns cannot be analyzed across different projects. Never use the same salt across multiple production projects.

Fun Fact: Notice the timestamp 19730501000000 in the migration file generated during installation? That's May 1, 1973 - the day Horst Feistel published his groundbreaking paper at IBM, introducing the cipher structure that powers this library. We thought it deserved a permanent timestamp in your database history! 🎂

Upgrading from v0.x

See UPGRADE.md for the migration guide.

Usage Example

1. Create Migration

defmodule MyApp.Repo.Migrations.CreatePosts do
  use Ecto.Migration

  def up do
    create table(:posts) do
      add :seq, :bigserial
      add :title, :string
    end

    # 1 day buckets
    execute FeistelCipher.up_for_trigger("public", "posts", "seq", "id",
      time_bucket: 86400
    )
  end

  def down do
    execute FeistelCipher.down_for_trigger("public", "posts", "seq", "id")
    drop table(:posts)
  end
end

2. Define Schema

defmodule MyApp.Post do
  use Ecto.Schema

  # Hide seq in API responses
  @derive {Jason.Encoder, except: [:seq]}

  schema "posts" do
    field :seq, :id, read_after_writes: true
    field :title, :string
  end
end

The read_after_writes: true option tells Ecto to fetch the seq value after INSERT (since it's generated by the database).

Now when you insert a record, seq auto-increments and the trigger automatically sets id = [time_prefix | feistel_cipher(seq)]:

%Post{title: "Hello"} |> Repo.insert!()
# => %Post{id: 8234567890123, seq: 1, title: "Hello"}

# In API responses, only id is exposed (seq is hidden)

Security Note: Keep seq internal. Only expose id in APIs to prevent enumeration attacks.

ID Structure

The generated ID has the structure [time_bits | data_bits]. The widths shown are the defaults (time_bits: 15, data_bits: 38):

┌─────────────────┬──────────────────────────────────────────┐
│   time_bits     │              data_bits                   │
│   (15 bits)     │              (38 bits)                   │
│   time prefix   │     feistel_cipher(seq)                  │
└─────────────────┴──────────────────────────────────────────┘
  • time_bits (upper): Derived from current time. Rows created in the same time bucket share the same prefix, clustering them on nearby PostgreSQL pages.
  • data_bits (lower): The sequential value encrypted with Feistel cipher.
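
As a minimal sketch of this layout, the following Elixir module packs and unpacks the two fields with bit operations. IdLayoutSketch and its functions are hypothetical names used only for illustration; they are not part of feistel_cipher, and the data component is assumed to be already encrypted.

```elixir
defmodule IdLayoutSketch do
  import Bitwise

  # Hypothetical helper (not a feistel_cipher API): place time_value in the
  # upper bits and the already-encrypted data component in the lower data_bits.
  def pack(time_value, data_component, data_bits) do
    (time_value <<< data_bits) ||| data_component
  end

  # Split an ID back into {time_value, data_component}.
  def unpack(id, data_bits) do
    mask = (1 <<< data_bits) - 1
    {id >>> data_bits, id &&& mask}
  end
end

id = IdLayoutSketch.pack(5, 123, 38)
IO.inspect(IdLayoutSketch.unpack(id, 38))
# prints {5, 123}
```

Because the data component occupies exactly the lower data_bits, rows that share a time bucket differ only in those lower bits.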

Why Time Prefix?

PostgreSQL incremental backups (e.g., pg_basebackup with WAL, pgBackRest) back up entire pages (8KB blocks). Without a time prefix, Feistel cipher distributes IDs uniformly across all pages — meaning each new row touches a different page, and incremental backups become as large as full backups.

With a time prefix, rows from the same time bucket land on nearby pages, so incremental backups only need to capture the recently-modified pages.

When to Use Time Prefix (time_bits > 0)

Use a time prefix when you want write locality and smaller incremental backups on large/high-write tables.

  • Example: events, logs, orders, messages tables that receive continuous inserts.
  • Typical config: time_bits: 15, time_bucket: 86400 (daily, default) or 3600 (hourly for tighter locality windows).
  • With time_bits: 15, time_bucket: 86400, and encrypt_time: false, the time prefix wraps after about 89 years 9 months.
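
The wrap period quoted above follows from time_bucket * 2^time_bits. A small sketch of that arithmetic (WrapSketch is a hypothetical module, and the 365.25-day year is an assumption made only for the conversion):

```elixir
defmodule WrapSketch do
  # Seconds until an unencrypted time prefix repeats: time_bucket * 2^time_bits.
  def wrap_seconds(time_bits, time_bucket) do
    time_bucket * Integer.pow(2, time_bits)
  end

  # Same period in years, assuming 365.25 days per year.
  def wrap_years(time_bits, time_bucket) do
    wrap_seconds(time_bits, time_bucket) / (365.25 * 86_400)
  end
end

IO.inspect(WrapSketch.wrap_seconds(15, 86_400))
# prints 2831155200
IO.inspect(Float.round(WrapSketch.wrap_years(15, 86_400), 1))
# prints 89.7
```

With the same time_bits: 15 but hourly buckets (time_bucket: 3600), the same calculation gives roughly 3.7 years, so tighter buckets trade locality for a shorter wrap period.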

When NOT to Use Time Prefix (time_bits: 0)

Disable time prefix when you only need opaque IDs and don't need backup/page-locality optimization.

  • Example: small reference tables (countries, roles, currencies) or low-write admin/config tables.
  • Also useful when you want the simplest mode: id = feistel_cipher(seq) with no time component.

Trigger Options

The up_for_trigger/5 function accepts these options:

⚠️ Important: Parameter changes should be handled as explicit migrations. Some options (like time_bits/time_bucket/encrypt_time) can be changed technically, but old/new IDs will use different semantics. Core cipher options (data_bits/key/rounds) should be treated as immutable in-place.

  • prefix, table, from, to: Table and column names (required)
  • time_bits: Time prefix bits (default: 15). Set to 0 for no time prefix
  • time_bucket: Time bucket size in seconds (default: 86400)
    • Example: 86400 for 1 day (default), 3600 for 1 hour
    • Rows inserted within the same bucket share the same time prefix
  • time_offset: Time offset in seconds applied before bucket calculation (default: 0)
    • Formula: time_value = floor((epoch + time_offset) / time_bucket)
    • Sign convention: positive values move the boundary earlier in local time; negative values move it later
    • Example: time_bucket: 86400, time_offset: 21600 shifts daily boundary from 00:00 UTC to 18:00 UTC (03:00 KST)
    • Use this when business day boundaries differ from UTC midnight, or when multiple countries need a stable operational cutover time
  • encrypt_time: Whether to encrypt the time prefix with Feistel cipher (default: false)
    • false: Time prefix may reflect recent bucket progression, but it is not a globally orderable timestamp
    • true: Time prefix is encrypted (hides time patterns, but same-bucket rows still share prefix). time_bits must be even
  • data_bits: Data cipher bits (default: 38, must be even)
    • Choose different sizes per column: Unlike UUIDs (fixed 16 bytes), tailor each column's ID length
    • Example: User ID = 32 bits (~4B values), Post ID = 40 bits (~1T values)
  • rounds: Number of Feistel rounds (default: 16, min: 1, max: 32)
    • Default 16 provides good security/performance balance
    • Note: Diagrams and proofs in this README use 2 rounds for simplicity
    • More rounds = more secure but slower
    • Odd rounds (1, 3, 5...) and even rounds (2, 4, 6...) are both supported
  • key: Encryption key (auto-generated if not specified)
  • functions_prefix: Schema where cipher functions reside (default: public)

Constraints:

  • time_bits + data_bits must be ≤ 63 when encrypt_time: false, and ≤ 62 when encrypt_time: true
  • time_bits must be even when encrypt_time: true
  • data_bits must be even
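
The constraints above can be expressed as a small pre-flight check. This is only a sketch: OptionCheckSketch and its error atoms are hypothetical, and it is not the library's own validation. The defaults are taken from the option list above.

```elixir
defmodule OptionCheckSketch do
  # Hypothetical check of the documented constraints (not a feistel_cipher API).
  def validate(opts) do
    time_bits = Keyword.get(opts, :time_bits, 15)
    data_bits = Keyword.get(opts, :data_bits, 38)
    encrypt_time = Keyword.get(opts, :encrypt_time, false)
    max_total = if encrypt_time, do: 62, else: 63

    cond do
      rem(data_bits, 2) != 0 -> {:error, :data_bits_must_be_even}
      encrypt_time and rem(time_bits, 2) != 0 -> {:error, :time_bits_must_be_even}
      time_bits + data_bits > max_total -> {:error, :too_many_bits}
      true -> :ok
    end
  end
end

IO.inspect(OptionCheckSketch.validate(time_bits: 8, data_bits: 32))
# prints :ok
IO.inspect(OptionCheckSketch.validate(encrypt_time: true))
# prints {:error, :time_bits_must_be_even}
```

The second call fails because the default time_bits of 15 is odd, so enabling encrypt_time requires choosing an even time_bits explicitly.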

⚠️ You cannot reliably compare IDs by time_bits alone to determine temporal order. Because time_value = floor((epoch + time_offset) / time_bucket) mod 2^time_bits, the prefix wraps after time_bucket * 2^time_bits seconds. This feature is intended to improve PostgreSQL incremental backup locality, not to provide UUIDv7-style global time ordering.

Why time_offset Exists

time_bucket alone uses UTC-based boundaries. For daily buckets, that means bucket changes at UTC midnight, which may split a local business day at awkward local times (for example, evening in the Americas or early morning in Europe).

time_offset lets you align bucket boundaries to your operational day (for example, 03:00 local cutover) without changing time_bucket size. This improves practical continuity for time-prefix clustering, especially when encrypt_time: true is enabled and the prefix itself is not human-readable.

In this library, time_offset is added to epoch before bucketing. That is why +21600 (not -21600) gives a 03:00 KST boundary for daily buckets.
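
To see the sign convention in action, here is a sketch of the bucket formula in Elixir. TimeBucketSketch is a hypothetical module that mirrors the documented formula; it is not the trigger's actual implementation, and the sample date is arbitrary.

```elixir
defmodule TimeBucketSketch do
  # Follows the README formula floor((epoch + time_offset) / time_bucket),
  # reduced mod 2^time_bits as described in the wrap-around note.
  def time_value(epoch, time_bucket, time_offset, time_bits) do
    bucket = Integer.floor_div(epoch + time_offset, time_bucket)
    Integer.mod(bucket, Integer.pow(2, time_bits))
  end
end

before_cutover = DateTime.to_unix(~U[2024-01-01 17:59:59Z])
at_cutover = DateTime.to_unix(~U[2024-01-01 18:00:00Z])

IO.inspect(TimeBucketSketch.time_value(before_cutover, 86_400, 21_600, 15))
# prints 19723
IO.inspect(TimeBucketSketch.time_value(at_cutover, 86_400, 21_600, 15))
# prints 19724
```

The bucket changes exactly at 18:00 UTC, which is 03:00 KST. With time_offset: 0, both timestamps would fall in bucket 19723.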

Example with custom options:

execute FeistelCipher.up_for_trigger(
  "public", "posts", "seq", "id",
  time_bits: 8,
  time_bucket: 86400,
  time_offset: 21600,
  data_bits: 32,
  key: 123456789,
  rounds: 8,
  functions_prefix: "crypto"
)

Example without time prefix:

execute FeistelCipher.up_for_trigger(
  "public", "posts", "seq", "id",
  time_bits: 0
)

Advanced Usage

Column Rename

When renaming columns that have triggers, use force_down_for_trigger/4 to safely drop and recreate the trigger:

defmodule MyApp.Repo.Migrations.RenamePostsColumns do
  use Ecto.Migration

  def change do
    # 1. Drop the old trigger
    execute FeistelCipher.force_down_for_trigger("public", "posts", "seq", "id")

    # 2. Rename columns
    rename table(:posts), :seq, to: :sequence
    rename table(:posts), :id, to: :external_id

    # 3. Recreate trigger with SAME encryption parameters
    # IMPORTANT: Generate key using OLD column names (seq, id)
    old_key = FeistelCipher.generate_key("public", "posts", "seq", "id")

    execute FeistelCipher.up_for_trigger("public", "posts", "sequence", "external_id",
      time_bits: 15,               # Must match original
      time_bucket: 86400,          # Must match original
      data_bits: 38,               # Must match original
      key: old_key,                # Key from OLD column names
      rounds: 16,                  # Must match original
      functions_prefix: "public"   # Must match original
    )
  end
end

⚠️ Critical: When recreating triggers, ALL encryption parameters (time_bits, time_bucket, data_bits, key, rounds, functions_prefix) MUST match the original values. Otherwise:

  • Updates will fail with exceptions
  • 1:1 mapping breaks (new inserts may produce duplicate encrypted values)

Note: down_for_trigger/4 includes a safety guard (RAISE EXCEPTION) to prevent accidental deletion. For legitimate use cases like column rename, use force_down_for_trigger/4 which bypasses the guard.

Alternative: Display-Only IDs

If you prefer to keep your sequential id as the primary key, you can use Feistel cipher for display-only columns. This approach is similar to using Hashids or other ID obfuscation libraries, but with database-native encryption.

# Migration
create table(:posts) do
  add :disp_id, :bigint    # Encrypted, for public APIs
  add :title, :string
end

create unique_index(:posts, [:disp_id])

execute FeistelCipher.up_for_trigger("public", "posts", "id", "disp_id",
  time_bucket: 86400
)

# Schema
defmodule MyApp.Post do
  use Ecto.Schema

  # Hide internal id in API responses
  @derive {Jason.Encoder, except: [:id]}

  schema "posts" do
    field :disp_id, :id, read_after_writes: true
    field :title, :string
  end
end

Then only expose disp_id in your APIs while keeping id internal.

Advantage over Hashids: Database-native. The database computes and stores disp_id, so the application needs no encoding/decoding step.

Performance

Encrypting 100,000 sequential values:

Rounds   Total Time   Per Encryption
1        119 ms       ~1.2 μs
2        197 ms       ~2.0 μs
4        349 ms       ~3.5 μs
8        609 ms       ~6.1 μs
16       1190 ms      ~11.9 μs
32       2246 ms      ~22.5 μs

The default of 16 rounds provides a good security/performance balance with cryptographic HMAC-SHA256. At roughly 12 μs per encryption in this benchmark, the overhead per INSERT/UPDATE is negligible for most applications.

Benchmark Environment

  • CPU: Apple M3 Pro (12 cores)
  • Database: PostgreSQL 17 (Postgres.app)
  • OS: macOS 15.6
  • Elixir: 1.18.3 / OTP 27

Running Benchmarks

MIX_ENV=test mix run benchmark/rounds_benchmark.exs

The benchmark encrypts 100,000 sequential values (1 to 100,000) using a SQL batch function to minimize overhead and measure pure encryption performance.

How It Works

The Feistel cipher is a symmetric structure used in the construction of block ciphers. This library implements a configurable Feistel network that transforms sequential integers into non-sequential encrypted integers with one-to-one mapping.

[Figure: Feistel Cipher Diagram]

Note: The diagram above illustrates a 2-round Feistel cipher for simplicity. By default, this library uses 16 rounds for better security. The number of rounds is configurable (see Trigger Options).

Self-Inverse Property

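The Feistel cipher as used here is self-inverse: every round applies the same round function and the two halves are swapped at the end, so applying the same function twice returns the original value. This means encryption and decryption use the exact same algorithm.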

Mathematical Proof:

Let's denote the input as $(L_1, R_1)$ and the round function as $F(x)$.

First application (Encryption):

$$
\begin{aligned}
L_2 &= R_1, & R_2 &= L_1 \oplus F(R_1) \\
L_3 &= R_2, & R_3 &= L_2 \oplus F(R_2) \\
\text{Output} &= (R_3, L_3)
\end{aligned}
$$

Second application (Decryption) - Starting with $(R_3, L_3)$:

$$
\begin{aligned}
L_2' &= L_3 \\
R_2' &= R_3 \oplus F(L_3) \\
     &= R_3 \oplus F(R_2) \\
     &= (L_2 \oplus F(R_2)) \oplus F(R_2) \\
     &= L_2 = R_1 \quad \text{(XOR cancellation)} \\
\\
L_3' &= R_2' = R_1 \\
R_3' &= L_2' \oplus F(R_2') \\
     &= L_3 \oplus F(R_1) \\
     &= R_2 \oplus F(R_1) \\
     &= (L_1 \oplus F(R_1)) \oplus F(R_1) \\
     &= L_1 \quad \text{(XOR cancellation)} \\
\\
\text{Output} &= (R_3', L_3') = (L_1, R_1) \quad \checkmark
\end{aligned}
$$

Key Insight: The XOR operation's property $a \oplus b \oplus b = a$ ensures that each transformation is reversed when applied twice.
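
The proof can be checked numerically with a toy 2-round network. This is a sketch under stated assumptions: FeistelSketch is a hypothetical module, and its round function is a simple arithmetic stand-in chosen for readability, not the HMAC-SHA256 function mentioned in the Performance section. It is not the library's PostgreSQL implementation.

```elixir
defmodule FeistelSketch do
  import Bitwise

  # Toy round function (assumption for illustration only, not cryptographic).
  defp f(x, key, half_mask), do: (x * 31 + key) &&& half_mask

  # Balanced Feistel network over `bits` bits (bits must be even).
  # Every round uses the same function F, as in the proof above.
  def cipher(value, bits, key, rounds \\ 2) do
    half = div(bits, 2)
    mask = (1 <<< half) - 1
    l = (value >>> half) &&& mask
    r = value &&& mask

    {l, r} =
      Enum.reduce(1..rounds, {l, r}, fn _round, {l, r} ->
        {r, bxor(l, f(r, key, mask))}
      end)

    # Final swap: the output is (R, L), matching Output = (R_3, L_3).
    (r <<< half) ||| l
  end
end

encrypted = FeistelSketch.cipher(1, 8, 5)
IO.inspect(FeistelSketch.cipher(encrypted, 8, 5))
# prints 1

# Every 8-bit value round-trips, so the mapping is one-to-one on 0..255.
outputs = Enum.map(0..255, &FeistelSketch.cipher(&1, 8, 5))
IO.inspect(length(Enum.uniq(outputs)))
# prints 256
```

The round function itself does not need to be invertible; the XOR structure and the final swap are what make the whole network reversible.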

Database Implementation:

In the database trigger implementation, this means:

-- Encryption: seq → data part of id
data_component = feistel_cipher(seq, data_bits, key)

-- Decryption: data part of id → seq (using the same function!)
seq = feistel_cipher(data_component, data_bits, key)

Key Properties

  • Deterministic: Same input always produces same output
  • Non-sequential: Sequential inputs produce seemingly random outputs
  • Collision-free: One-to-one mapping within the bit range

License

MIT
