Skip to content

Proposal: CTE-Based Shard Routing #265

@niclaflamme

Description

@niclaflamme

Overview

We currently support two approaches for manually forcing shard routing:

  • Comment: /* pgdog_shard: 1234 */ SELECT * from x
  • Session SET: SET pgdog.shard = 5; + SELECT * from x (two round trips).

My understanding is that Session SET is effectively a hack for ORM users who don't have an easy way to inject comments. I say this because any application developer using raw SQL would obviously opt for the comment approach and save a round trip and connection management mental overhead.

I propose we add a new, better hack to circumvent the commenting limitations of ORMs.

  • Single-statement CTE

The CTE hack is still a hack, but it's a better one than SETs IMO. To enable the CTE strategy, we plan to reserve the name pgdog_overrides (or an alternate) for this purpose.

It would look like this:

WITH pgdog_overrides AS (
  SELECT $1::INTEGER AS shard
)
SELECT *
FROM comments
WHERE user_id = $2
LIMIT 10;

Clients bind one parameter ($1), the integer shard index.

Scenario

Imagine an internal feedback app where users comment on their company’s wall.

  • The database shards by organization_id
  • The comments table only records user_id and body (ie, no shard key).
  • Every company’s comments live on and only on its organization’s shard.
  • The User struct contains organization_id, so the application can derive the correct shard for users.

Implementation Approaches

Session SET

Pros: Simple, uses standard SQL.

Cons: Requires two round trips per query (SET, then query). Not atomic with the query.

Single-statement CTE

Pros: One round trip. Shard context travels with the statement. Atomic.

Cons: Requires proxy support for parsing the reserved CTE.

Rust Example (sqlx)

async fn create_comment(
    pool: &sqlx::PgPool,
    user: &User,
    text: &str,
) -> Result<Comment, sqlx::Error> {
    sqlx::query_as!(Comment,
        r#"
        WITH pgdog_overrides AS (
          SELECT $1::integer AS shard
        )
        INSERT INTO comments (user_id, body)
        VALUES ($2, $3)
        RETURNING *
        "#,
        user.organization_id, // binds to $1
        user.id,              // binds to $2
        text                  // binds to $3
    )
    .fetch_one(pool)
    .await
}

ActiveRecord Example

The whole purpose of the SET hack is to support active record. This is how you would do it with the override CTE.

Define a one-time helper in ApplicationRecord:

# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  scope :with_shard, ->(shard) {
    cte_sql = "SELECT #{connection.quote(shard.to_i)}::integer AS shard"
    with(pgdog_overrides: Arel.sql(cte_sql))
  }
end

Usage:

comments = Comment
  .with_shard(user.organization_id) #  🎉 TADA!
  .where(user_id: user.id)
  .limit(10)

Emits:

WITH pgdog_overrides AS (
  SELECT 12::integer AS shard
)
SELECT "comments".*
FROM "comments"
WHERE "comments"."user_id" = 12345
LIMIT 10;

Implementation Steps

  • Reserve CTE name pgdog_overrides in the proxy parser.
  • In QueryParser, detect the pgdog_overrides CTE at the start of the query.
  • Extract the integer from the shard column. Mark routed = true.
  • Route the query to the specified shard index.
  • Strip the CTE before forwarding the statement to the backend. (100% optional, it's more to have undecorated usage logs in our users' database dashboards)
  • Deprecate SET?

Metadata

Metadata

Assignees

Labels

No fields configured for Feature.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions