Skip to content

[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding#1417

Merged
Kanishkavijay39 merged 9 commits intomainfrom
feat/plg-onboarding-model
Mar 12, 2026
Merged

[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding#1417
Kanishkavijay39 merged 9 commits intomainfrom
feat/plg-onboarding-model

Conversation

@Kanishkavijay39
Copy link
Copy Markdown
Contributor

@Kanishkavijay39 Kanishkavijay39 commented Mar 9, 2026

Wiki: https://wiki.corp.adobe.com/pages/viewpage.action?spaceKey=AEMSites&title=ASO+PLG+Onboarding+Backend+Design

Summary

  • Add new PlgOnboarding model and collection to spacecat-shared-data-access to track the self-service onboarding lifecycle for PLG ASO customers
  • Define schema with attributes: imsOrgId, domain, baseURL, status, siteId, organizationId, steps, error, botBlocker, waitlistReason, completedAt
  • Add indexes for querying by imsOrgId, imsOrgId+domain, status, and baseURL
  • Support statuses: IN_PROGRESS, ONBOARDED, ERROR, WAITING_FOR_IP_WHITELISTING, WAITLISTED
  • Register entity in the entity registry and export types
  • Add unit tests for model and collection

Related Issue

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

Related Issues

Thanks for contributing!

Kanishka added 2 commits March 9, 2026 22:47
Add model, collection, schema, and TypeScript types for the
PlgOnboarding entity that tracks self-service ASO onboarding lifecycle.

Indexes: byImsOrgId, byImsOrgIdAndDomain, byStatus, byBaseURL.
@Kanishkavijay39 Kanishkavijay39 changed the title [WIP]Feat/PLG onboarding model [SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding Mar 10, 2026
Copy link
Copy Markdown
Member

@solaris007 solaris007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Kanishkavijay39,

Good work standing up the PlgOnboarding entity - the model follows the established patterns cleanly and the test coverage is solid. A few things to address before merging.

Strengths

  • Follows the established model/collection/schema/index pattern precisely (file structure, entity registry, barrel exports)
  • Good index strategy covering the key query patterns (by org, by org+domain, by status, by baseURL)
  • Schema-level enum enforcement on status, isObject validation on JSONB fields, isIsoDate on completedAt
  • Tests cover all getters/setters, constructor, and STATUSES constant
  • TypeScript types are thorough with getter/setter declarations and collection methods

Important (should fix before merge)

1. imsOrgId must be readOnly with format validation

plg-onboarding.schema.js - The imsOrgId is the tenant identity anchor but is currently mutable via setImsOrgId(). The Consumer model (consumer.schema.js) is the direct precedent and explicitly marks imsOrgId as readOnly: true with regex validation:

.addAttribute('imsOrgId', {
  type: 'string',
  required: true,
  readOnly: true,
  validate: (value) => /^[a-z0-9]{24}@AdobeOrg$/i.test(value),
})

Without readOnly, a caller can reassign an onboarding record to a different org via record.setImsOrgId('OTHER@AdobeOrg').save() - this is a tenant isolation gap. Also remove setImsOrgId from index.d.ts and update the test that exercises it.

2. STATUSES key/value mismatch - WAITING_FOR_IP_ALLOWLISTING vs WAITING_FOR_IP_WHITELISTING

plg-onboarding.model.js:30 - The key says ALLOWLISTING but the stored value says WHITELISTING:

WAITING_FOR_IP_ALLOWLISTING: 'WAITING_FOR_IP_WHITELISTING',

Every developer who references PlgOnboarding.STATUSES.WAITING_FOR_IP_ALLOWLISTING will expect the persisted value to match the key name. Anyone filtering by status string in SQL, logs, or Coralogix dashboards has to know the "real" value differs from the constant name. Since this is a new entity with no existing data, align both key and value to WAITING_FOR_IP_ALLOWLISTING (the Adobe inclusive language standard) and coordinate with mysticat-data-service PR #125 to update the Postgres enum value too.

3. Missing baseURL format validation

plg-onboarding.schema.js - Every other baseURL attribute in this repo validates with isValidUrl (site.schema.js, site-candidate.schema.js, import-job.schema.js, scrape-job.schema.js). This schema accepts any string. Since baseURL is also an index partition key, invalid values would corrupt the index. Add:

.addAttribute('baseURL', {
  type: 'string',
  required: true,
  validate: (value) => isValidUrl(value),
})

4. siteId and organizationId - consider addReference or at minimum add UUID validation

Every other entity that holds FK references to Site/Organization uses addReference('belongs_to', ...) which auto-generates UUID validation and navigational methods (getOrganization(), getSite()). Plain string attributes with no validation allow garbage values to be stored. At minimum add:

.addAttribute('siteId', {
  type: 'string',
  required: false,
  validate: (value) => !value || isValidUUID(value),
})

Note: using addReference would consume 2 of the 5 GSI slots (4 already used), so evaluate which indexes are essential if you go that route. If you intentionally keep these as plain attributes (because the onboarding record creates Site/Org rather than belonging to pre-existing ones), add a code comment explaining why.

5. TypeScript collection methods are incomplete

plg-onboarding/index.d.ts - The compound accessor methods generated by the indexes are missing from the type declarations:

  • allByImsOrgIdAndUpdatedAt / findByImsOrgIdAndUpdatedAt (from index 1)
  • allByBaseURLAndStatus / findByBaseURLAndStatus (from index 4)
  • allByStatusAndUpdatedAt / findByStatusAndUpdatedAt (from index 3)

TypeScript consumers calling these will get type errors even though the methods exist at runtime.

Minor (nice to have)

  • domain should be readOnly: It forms part of the unique identity (imsOrgId + domain index). Mutating it post-creation breaks lifecycle semantics.
  • domain has no format validation: A hostname regex or length cap would prevent garbage data in the index sort key.
  • Copyright year: Source files use Copyright 2025, test files use Copyright 2026. Use 2026 consistently for new code.
  • Consider type: 'map' with properties for steps and botBlocker since the shapes are known - provides schema documentation and prevents structural drift (see import-job.schema.js:118 for the initiatedBy pattern).
  • Consider a default for status: default: 'IN_PROGRESS' would reduce boilerplate in callers, matching how opportunity.schema.js uses default: 'NEW'.

Recommendations (non-blocking)

  • Add schema-level validation tests (invalid status rejected, completedAt rejects non-ISO, required fields enforced)
  • The allByStatus and allByBaseURL collection methods return cross-tenant results. If these are intended for internal/admin use only, add a code comment clarifying that. Access control enforcement belongs in the API layer but the data model should signal intent.

- Make imsOrgId readOnly with regex validation to prevent tenant reassignment
- Align STATUSES key/value: WAITING_FOR_IP_ALLOWLISTING (inclusive language)
- Add baseURL validation with isValidUrl
- Add UUID validation for siteId and organizationId
- Add missing compound collection methods to TypeScript types
- Remove setImsOrgId from types and tests
- Update copyright year to 2026

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

This PR will trigger a minor release when merged.

- Make domain readOnly with hostname format validation
- Add default status IN_PROGRESS to reduce caller boilerplate
- Use type: 'map' with properties for steps and botBlocker for schema docs
- Remove setDomain from TypeScript types and tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kanishkavijay39
Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review! Addressed all feedback.

Important (all 5 fixed):

  1. imsOrgId — now readOnly: true with /^[a-z0-9]{24}@AdobeOrg$/i regex validation. Removed setImsOrgId from TypeScript types and tests.
  2. STATUSES — aligned key and value to WAITING_FOR_IP_ALLOWLISTING. Will coordinate with mysticat-data-service PR chore(deps): update dependency @adobe/eslint-config-helix to v2.0.6 #125 for the Postgres enum update.
  3. baseURL — added isValidUrl validation, consistent with site.schema.js and other schemas.
  4. siteId/organizationId — added isValidUUID validation. Kept as plain attributes (not addReference) since the onboarding record creates Site/Org during the flow rather than belonging to pre-existing ones.
  5. TypeScript types — added all 6 missing compound collection methods (allByImsOrgIdAndUpdatedAt, findByImsOrgIdAndUpdatedAt, allByStatusAndUpdatedAt, findByStatusAndUpdatedAt, allByBaseURLAndStatus,
    findByBaseURLAndStatus).

Minor (all addressed):

  • domain — now readOnly: true with hostname format validation (regex + 253 char length cap)
  • status — added default: PlgOnboarding.STATUSES.IN_PROGRESS
  • steps and botBlocker — changed from type: 'any' to type: 'map' with defined properties
  • Copyright year — updated to 2026 consistently across all files

Copy link
Copy Markdown
Member

@solaris007 solaris007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Kanishkavijay39,

Thanks for the updates - all 9 issues from the previous review have been addressed, and several findings from our cross-PR architectural review (posted on SITES-41461) have been resolved too.

Cross-PR Findings Progress

This update addresses several items from the cross-PR architectural review that evaluated all three PLG onboarding PRs as a system:

Cross-PR Finding Status
#1 (partial): readOnly + format validation on imsOrgId in shared model Fixed - readOnly: true + 24-char hex regex
#3: Align WHITELISTING -> ALLOWLISTING terminology Fixed - key and value both WAITING_FOR_IP_ALLOWLISTING
#4: No validation defense-in-depth (baseURL, domain, siteId, orgId) Mostly fixed - baseURL has isValidUrl, domain has hostname regex, siteId/organizationId have isValidUUID. Only gap: baseURL not readOnly to match PG immutability
#6: Steps shape mismatch between layers Worse - typed map with wrong properties now causes silent data loss (see Critical #1 below)

Good progress - 3 of 4 shared-model findings resolved. The steps shape is the remaining blocker.

Strengths

  • imsOrgId and domain are correctly readOnly with regex validation, preventing tenant reassignment and identity mutation (plg-onboarding.schema.js:28-42)
  • siteId and organizationId now validate with isValidUUID, baseURL with isValidUrl (plg-onboarding.schema.js:43-57)
  • steps and botBlocker use typed map with explicit properties instead of bare any (plg-onboarding.schema.js:58-76)
  • Status enum alignment - key and value both WAITING_FOR_IP_ALLOWLISTING, with IN_PROGRESS as default (plg-onboarding.model.js:26-32)
  • TypeScript declarations are comprehensive - all compound methods from all 4 indexes are present, setters correctly omit readOnly fields (index.d.ts)

Issues

Critical (Must Fix)

1. steps map properties will silently strip 5 of 7 keys written by the API controller - plg-onboarding.schema.js:58-65

ElectroDB's type: 'map' with properties acts as a whitelist - keys not listed in properties are silently removed on save (tywalch/electrodb#389). The schema defines { orgCreated, siteCreated, entitlementCreated, cdnSetup }, but the API controller (PR #1918) writes:

Controller writes Schema allows Preserved?
orgResolved - DROPPED
rumVerified - DROPPED
siteCreated siteCreated Yes
siteResolved - DROPPED
configUpdated - DROPPED
auditsEnabled - DROPPED
entitlementCreated entitlementCreated Yes

This was flagged as cross-PR finding #6 (steps shape mismatch), but the move from type: 'any' to type: 'map' with incorrect properties makes it worse - it's now a data loss bug rather than just a documentation issue.

Fix - either:

(a) Align properties with the controller:

.addAttribute('steps', {
  type: 'map',
  properties: {
    orgResolved: { type: 'boolean' },
    rumVerified: { type: 'boolean' },
    siteCreated: { type: 'boolean' },
    siteResolved: { type: 'boolean' },
    configUpdated: { type: 'boolean' },
    auditsEnabled: { type: 'boolean' },
    entitlementCreated: { type: 'boolean' },
  },
})

(b) Or use type: 'any' with isObject if step keys are still evolving:

.addAttribute('steps', {
  type: 'any',
  validate: (value) => !value || isObject(value),
})

Coordinate with API service PR #1918 before merging either PR.

Important (Should Fix)

2. Domain regex allows uppercase but PG enforces lowercase - plg-onboarding.schema.js:39

The domain validator uses /i (case-insensitive), accepting Example.COM. The PostgreSQL migration (PR #125) has CHECK (domain = lower(domain)), which would reject that value. Remove the /i flag:

validate: (value) => /^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/.test(value) && value.length <= 253,

3. Test mock data uses values that fail the new validators - plg-onboarding.model.test.js:34, plg-onboarding.collection.test.js:38

imsOrgId: 'ABC123@AdobeOrg' has only 6 hex characters (validator requires 24). organizationId: 'org-uuid-1' is not a valid UUID. Tests pass because mocks bypass ElectroDB validation, but they never exercise the validation logic.

Fix:

imsOrgId: '1234567890abcdef12345678@AdobeOrg',
organizationId: 'e0491f53-0688-40f7-a443-7d585d79b471',

Minor (Nice to Have)

4. botBlocker.ipsToAllowlist should be type: 'list', not type: 'any' - plg-onboarding.schema.js:72

The controller writes an array of IP strings. type: 'list' with items: { type: 'string' } would preserve type safety.

5. baseURL could be marked readOnly - plg-onboarding.schema.js:43-46

The PG migration revokes UPDATE on base_url. Adding readOnly: true would enforce the same at the application layer (this would fully resolve cross-PR finding #4).

6. Inline regex patterns could be extracted to model constants - plg-onboarding.schema.js:33,39

Extracting to named constants (e.g., PlgOnboarding.IMS_ORG_ID_PATTERN) would make them reusable and testable.

Recommendations

  • Coordinate steps alignment with API service PR #1918 before merging either PR - merging this schema first with mismatched keys creates a window where production writes lose data.
  • Add schema-level validation tests that create entities through the actual schema (not mocks) to catch validator issues like the imsOrgId length requirement.

Assessment

Ready to merge? No
Reasoning: The steps map property mismatch is a data loss bug - 5 of 7 step flags will be silently dropped on every save. This must be fixed and coordinated with the API service PR. The remaining issues (domain case, test data) are straightforward fixes. Good progress on the cross-PR findings - 3 of 4 shared-model items resolved.

@solaris007 solaris007 added the enhancement New feature or request label Mar 11, 2026
Kanishka and others added 2 commits March 11, 2026 22:27
…nboarding

- Align steps map properties with API controller (7 keys: orgResolved,
  rumVerified, siteCreated, siteResolved, configUpdated, auditsEnabled,
  entitlementCreated) to prevent silent data loss
- Remove /i flag from domain regex to match PG lowercase constraint
- Mark baseURL as readOnly to match PG immutability, remove setBaseURL
  from TypeScript types and tests
- Change botBlocker.ipsToAllowlist from type 'any' to type 'list'
- Fix test mock data: use 24-char hex imsOrgId and valid UUIDs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kanishkavijay39
Copy link
Copy Markdown
Contributor Author

Hi @solaris007 , Thanks for the thorough review! All issues addressed.

Critical:

  • steps map properties aligned with API controller — now has all 7 keys (orgResolved, rumVerified, siteCreated, siteResolved, configUpdated, auditsEnabled,
    entitlementCreated). Data loss bug resolved.

Important:

  • Domain regex — removed /i flag to match PG lower(domain) constraint
  • Test mock data — imsOrgId now uses 24-char hex, organizationId uses valid UUIDs

Minor:

  • botBlocker.ipsToAllowlist — changed to type: 'list' with items: { type: 'string' }
  • baseURL — marked readOnly: true to match PG immutability, removed setBaseURL from TypeScript types

Copy link
Copy Markdown
Member

@solaris007 solaris007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Kanishkavijay39,

All findings from the previous two reviews are addressed.

Previous Findings - All Resolved

Previous Finding Status
Critical #1: Steps map properties cause silent data loss Fixed - now orgResolved, rumVerified, siteCreated, siteResolved, configUpdated, auditsEnabled, entitlementCreated matching controller
Important #2: Domain regex /i allows uppercase, PG enforces lowercase Fixed - /i flag removed
Important #3: Test mock data fails validators Fixed - realistic 24-char hex imsOrgId, valid UUID organizationId
Minor #4: botBlocker.ipsToAllowlist type: 'any' Fixed - type: 'list', items: { type: 'string' }
Minor #5: baseURL not readOnly Fixed - readOnly: true, setBaseURL removed from TypeScript types

Strengths

  • Steps map now aligned across all three PRs - the data loss bug is resolved
  • Domain regex enforces lowercase only, matching the PG CHECK (domain = lower(domain)) constraint
  • All three identity fields (imsOrgId, domain, baseURL) are readOnly with validation - strong defense-in-depth
  • Test mock data uses realistic values that would pass schema validation
  • TypeScript types correctly omit setters for all readOnly fields
  • Clean model, schema, collection, and index design following established patterns

Issues

None blocking.

Minor (Nice to Have)

1. Inline regex patterns could be extracted to model constants - plg-onboarding.schema.js:33,39

The imsOrgId and domain validation patterns are still defined inline. Extracting to named constants would make them reusable and testable. Very low priority.

Assessment

Ready to merge? Yes
Reasoning: All Critical, Important, and Minor findings from the previous two reviews are resolved. The steps map alignment eliminates the data loss risk. The schema is well-designed with proper readOnly enforcement, validation, and typed map properties. This is the second PR in the merge order (after mysticat-data-service PR #125 which is already approved) and is ready to land.

Extract imsOrgId and domain validation patterns to named static
constants (IMS_ORG_ID_PATTERN, DOMAIN_PATTERN) on PlgOnboarding model
for reusability and testability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Kanishkavijay39
Copy link
Copy Markdown
Contributor Author

Thank you @solaris007 for the review! Addressed the minor suggestion as well to extract inline regex patterns to model constants (PlgOnboarding.IMS_ORG_ID_PATTERN, PlgOnboarding.DOMAIN_PATTERN)
in commit 231026e. As I can reuse these same patterns in the API service for consistent validation across layers.

@Kanishkavijay39 Kanishkavijay39 merged commit 1086795 into main Mar 12, 2026
7 checks passed
@Kanishkavijay39 Kanishkavijay39 deleted the feat/plg-onboarding-model branch March 12, 2026 04:33
@solaris007
Copy link
Copy Markdown
Member

🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.16.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants