
Allow custom schema name in data Connect config #10059

Merged
mtr002 merged 1 commit into main from mtr on Mar 10, 2026

Conversation


@mtr002 mtr002 commented Mar 10, 2026

CLI TEST: [Screenshot 2026-03-10 at 1:20:30 PM]

@mtr002 mtr002 requested a review from fredzqm March 10, 2026 18:41
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant flexibility for Data Connect users by allowing them to define and utilize a custom PostgreSQL schema name. This change enables better organization of database objects and improved integration with existing PostgreSQL environments where a non-public schema is preferred. The modifications span configuration, core logic for identifying database components, and command-line tools, ensuring that the specified schema is consistently applied across all relevant Data Connect operations.

Highlights

  • Custom PostgreSQL Schema Support: Added the ability to specify a custom PostgreSQL schema name within the Data Connect configuration, moving beyond the default 'public' schema.
  • Schema Name Integration: Updated core functions like getIdentifiers, ensureServiceIsConnectedToCloudSql, and schema migration processes to correctly retrieve and utilize the configured schema name for database operations.
  • SQL Shell Functionality: Modified the Data Connect SQL shell to automatically set the search_path to the specified custom schema, ensuring unqualified table names resolve correctly.
  • Comprehensive Testing: Introduced new unit tests for getIdentifiers to validate the correct extraction of custom schema names, the default 'public' behavior, and robust error handling for missing database or CloudSQL instance configurations.
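
As a rough illustration of the behavior these highlights describe (custom schema name, fallback to 'public', errors for missing database or CloudSQL instance), here is a minimal sketch. It is not the code from this PR: the config shape, `resolveIdentifiersSketch`, and `DEFAULT_SCHEMA_SKETCH` are hypothetical stand-ins for the real `getIdentifiers` and `DEFAULT_SCHEMA`, and the error messages are invented for the example.

```typescript
// Hypothetical, simplified config shape; the real types live in src/dataconnect/types.ts.
interface PostgresConfigSketch {
  database?: string;
  schema?: string;
  cloudSql?: { instanceId?: string };
}

interface IdentifiersSketch {
  databaseId: string;
  instanceId: string;
  schemaName: string;
}

// Stand-in for DEFAULT_SCHEMA; the PR summary says the default is 'public'.
const DEFAULT_SCHEMA_SKETCH = "public";

function resolveIdentifiersSketch(pg: PostgresConfigSketch | undefined): IdentifiersSketch {
  const databaseId = pg?.database;
  if (!databaseId) {
    throw new Error("postgresql.database is required");
  }
  const instanceId = pg?.cloudSql?.instanceId;
  if (!instanceId) {
    throw new Error("postgresql.cloudSql.instanceId is required");
  }
  // `||` also falls back to the default when the schema is an empty string.
  const schemaName = pg?.schema || DEFAULT_SCHEMA_SKETCH;
  return { databaseId, instanceId, schemaName };
}
```

Under these assumptions, a config without a `schema` key resolves to `public`, while one with `schema: "app"` resolves to `app`.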


Changelog
  • schema/dataconnect-yaml.json
    • Added a new schema property to the PostgreSQL configuration, allowing users to specify a custom schema name.
  • src/commands/dataconnect-sql-setup.ts
    • Removed the import of DEFAULT_SCHEMA as schema name is now dynamically determined.
    • Updated getIdentifiers to also return the schemaName.
    • Modified ensureServiceIsConnectedToCloudSql and getSchemaMetadata calls to pass the newly retrieved schemaName.
  • src/commands/dataconnect-sql-shell.ts
    • Updated getIdentifiers to extract the schemaName.
    • Added a SQL command to set the search_path to the configured schemaName when connecting to the database shell.
  • src/dataconnect/schemaMigration.spec.ts
    • Imported getIdentifiers for testing.
    • Added new test cases for getIdentifiers to verify custom schema name extraction, default 'public' schema behavior, and error handling for missing database or CloudSQL instance configurations.
  • src/dataconnect/schemaMigration.ts
    • Modified setupSchemaIfNecessary to accept schemaName as a parameter and use it for schema metadata retrieval.
    • Updated diffSchema, migrateSchema, and grantRoleToUserInSchema functions to retrieve schemaName from getIdentifiers and pass it to subsequent calls.
    • Extended the return type of getIdentifiers to include schemaName and implemented logic to extract it from the datasource configuration, defaulting to DEFAULT_SCHEMA.
    • Modified handleIncompatibleSchemaError to accept and utilize schemaName for schema metadata and role management.
    • Updated ensureServiceIsConnectedToCloudSql to accept an optional schemaName parameter and include it when updating the schema configuration.
  • src/dataconnect/types.ts
    • Added an optional schema property to the PostgreSql interface.
    • Added an optional schema property to the DatasourceYaml interface within the postgresql object.
    • Updated the toDatasource function to map the new schema property from DatasourceYaml to PostgreSql.
  • src/gcp/cloudsql/permissionsSetup.ts
    • Imported DEFAULT_SCHEMA.
    • Modified greenFieldSchemaSetup to use firebaseowner(databaseId, schema) when revoking cloudsqlsuperuser.
    • Updated grantRoleTo to accept an optional schema parameter (defaulting to DEFAULT_SCHEMA) and use it when constructing the fdcSqlRole.
Activity
  • No specific reviewer activity has been recorded for this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request introduces the ability to specify a custom PostgreSQL schema name in the Data Connect configuration, enhancing its flexibility. However, a critical security vulnerability has been identified: the implementation is susceptible to SQL injection. The schemaName is directly interpolated into several SQL queries without proper escaping or sanitization, posing a risk where arbitrary SQL could be executed with the privileges of the database user if the dataconnect.yaml configuration is influenced by external contributors. To address this, it is essential to properly escape all identifiers using double quotes (and doubling any existing double quotes) or to use parameterized queries where supported.

const conn: pg.PoolClient = await pool.connect();

// Set search_path to the configured PostgreSQL schema so unqualified table names resolve correctly.
await conn.query(`SET search_path TO "${schemaName}"`);

security-high high

The schemaName variable is interpolated directly into a SQL query without proper escaping. Since schemaName is derived from the user-controlled dataconnect.yaml file, this allows for SQL injection. An attacker could provide a malicious schema name like public"; DROP TABLE users; -- to execute arbitrary SQL commands when a developer runs the SQL shell.

Suggested change
- await conn.query(`SET search_path TO "${schemaName}"`);
+ await conn.query('SET search_path TO "' + schemaName.replace(/"/g, '""') + '"');
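
To make the difference concrete, the sketch below builds both statement strings for the malicious schema name quoted in the comment above. `quoteIdentifier` is a hypothetical helper written for this example (it is not part of the PR); it wraps the same double-quote-doubling idea as the suggested change. No database connection is involved, only string construction.

```typescript
// Hypothetical helper: quote a PostgreSQL identifier by doubling embedded double quotes.
function quoteIdentifier(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// The example payload from the review comment.
const malicious = 'public"; DROP TABLE users; --';

// Direct interpolation: the embedded quote closes the identifier early,
// so the text after it is parsed as a separate statement.
const unsafe = `SET search_path TO "${malicious}"`;

// Escaped: the whole payload stays inside one quoted identifier.
const safe = `SET search_path TO ${quoteIdentifier(malicious)}`;

console.log(unsafe); // SET search_path TO "public"; DROP TABLE users; --"
console.log(safe);   // SET search_path TO "public""; DROP TABLE users; --"
```

With escaping, PostgreSQL would treat the payload as one (odd) schema name rather than as extra SQL.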

  databaseId,
- [`SET ROLE "${firebaseowner(databaseId)}"`, ...commandsToExecuteByOwner.map((d) => d.sql)],
+ [
+ `SET ROLE "${firebaseowner(databaseId, schemaName)}"`,

security-high high

The schemaName variable is used to construct a SET ROLE SQL command. Neither the schemaName nor the resulting role name are escaped, leading to a SQL injection vulnerability. A malicious schema name could be used to break out of the double quotes and execute arbitrary SQL during the migration process.

Suggested change
- `SET ROLE "${firebaseowner(databaseId, schemaName)}"`,
+ 'SET ROLE "' + firebaseowner(databaseId, schemaName).replace(/"/g, '""') + '"',

await cloudSqlAdminClient.createUser(projectId, instanceId, mode, user);

- const fdcSqlRole = fdcSqlRoleMap[role as keyof typeof fdcSqlRoleMap](databaseId);
+ const fdcSqlRole = fdcSqlRoleMap[role as keyof typeof fdcSqlRoleMap](databaseId, schema);

security-high high

The schema parameter, which is now user-controlled via the configuration, is used to construct the fdcSqlRole. This role name is later interpolated into SQL commands like GRANT without proper escaping. This allows for SQL injection if a malicious schema name is provided in the configuration file.
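
One way to address this, sketched under assumptions: escape the derived role name (and the grantee) at the point where the GRANT statement is built. The role-naming scheme in `hypotheticalRoleName` is invented for illustration and does not reflect the actual names produced by `fdcSqlRoleMap`; `quoteIdentifier` and `buildGrantStatement` are likewise hypothetical helpers.

```typescript
// Hypothetical helper: quote a PostgreSQL identifier by doubling embedded double quotes.
function quoteIdentifier(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Invented naming scheme, used only so the example has a role name that depends on the schema.
function hypotheticalRoleName(databaseId: string, schema: string): string {
  return `example_role_${databaseId}_${schema}`;
}

// Build a GRANT statement in which both identifiers are escaped,
// so a quote inside the schema name cannot terminate the identifier.
function buildGrantStatement(databaseId: string, schema: string, user: string): string {
  const role = hypotheticalRoleName(databaseId, schema);
  return `GRANT ${quoteIdentifier(role)} TO ${quoteIdentifier(user)}`;
}
```

Escaping at statement-construction time means every caller that passes a schema-derived role gets the same protection, regardless of how the role name was assembled.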


@fredzqm fredzqm left a comment


Looks great! Great work!!

Can you add a CHANGELOG to announce this feature?

Have you verified that it works with a prod instance?

@mtr002 mtr002 force-pushed the mtr branch 5 times, most recently from b47578b to 803dda2 Compare March 10, 2026 20:31
@mtr002 mtr002 enabled auto-merge (squash) March 10, 2026 21:20
- Added support for specifying a custom PostgreSQL schema name in the configuration.
- Updated the `getIdentifiers` function to return the schema name, defaulting to 'public' if not specified.
- Modified relevant functions to utilize the schema name for database operations.
- Added tests to validate schema name handling in various scenarios.
@mtr002 mtr002 merged commit e4764b7 into main Mar 10, 2026
46 of 47 checks passed
@mtr002 mtr002 deleted the mtr branch March 10, 2026 21:31
andrewbrook pushed a commit that referenced this pull request Mar 25, 2026
3 participants