[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding#1417
[SITES-41461] feat: add PlgOnboarding model for PLG ASO onboarding#1417Kanishkavijay39 merged 9 commits intomainfrom
Conversation
Add model, collection, schema, and TypeScript types for the PlgOnboarding entity that tracks self-service ASO onboarding lifecycle. Indexes: byImsOrgId, byImsOrgIdAndDomain, byStatus, byBaseURL.
solaris007
left a comment
There was a problem hiding this comment.
Hey @Kanishkavijay39,
Good work standing up the PlgOnboarding entity - the model follows the established patterns cleanly and the test coverage is solid. A few things to address before merging.
Strengths
- Follows the established model/collection/schema/index pattern precisely (file structure, entity registry, barrel exports)
- Good index strategy covering the key query patterns (by org, by org+domain, by status, by baseURL)
- Schema-level enum enforcement on status,
isObjectvalidation on JSONB fields,isIsoDateoncompletedAt - Tests cover all getters/setters, constructor, and STATUSES constant
- TypeScript types are thorough with getter/setter declarations and collection methods
Important (should fix before merge)
1. imsOrgId must be readOnly with format validation
plg-onboarding.schema.js - The imsOrgId is the tenant identity anchor but is currently mutable via setImsOrgId(). The Consumer model (consumer.schema.js) is the direct precedent and explicitly marks imsOrgId as readOnly: true with regex validation:
.addAttribute('imsOrgId', {
type: 'string',
required: true,
readOnly: true,
validate: (value) => /^[a-z0-9]{24}@AdobeOrg$/i.test(value),
})Without readOnly, a caller can reassign an onboarding record to a different org via record.setImsOrgId('OTHER@AdobeOrg').save() - this is a tenant isolation gap. Also remove setImsOrgId from index.d.ts and update the test that exercises it.
2. STATUSES key/value mismatch - WAITING_FOR_IP_ALLOWLISTING vs WAITING_FOR_IP_WHITELISTING
plg-onboarding.model.js:30 - The key says ALLOWLISTING but the stored value says WHITELISTING:
WAITING_FOR_IP_ALLOWLISTING: 'WAITING_FOR_IP_WHITELISTING',Every developer who references PlgOnboarding.STATUSES.WAITING_FOR_IP_ALLOWLISTING will expect the persisted value to match the key name. Anyone filtering by status string in SQL, logs, or Coralogix dashboards has to know the "real" value differs from the constant name. Since this is a new entity with no existing data, align both key and value to WAITING_FOR_IP_ALLOWLISTING (the Adobe inclusive language standard) and coordinate with mysticat-data-service PR #125 to update the Postgres enum value too.
3. Missing baseURL format validation
plg-onboarding.schema.js - Every other baseURL attribute in this repo validates with isValidUrl (site.schema.js, site-candidate.schema.js, import-job.schema.js, scrape-job.schema.js). This schema accepts any string. Since baseURL is also an index partition key, invalid values would corrupt the index. Add:
.addAttribute('baseURL', {
type: 'string',
required: true,
validate: (value) => isValidUrl(value),
})4. siteId and organizationId - consider addReference or at minimum add UUID validation
Every other entity that holds FK references to Site/Organization uses addReference('belongs_to', ...) which auto-generates UUID validation and navigational methods (getOrganization(), getSite()). Plain string attributes with no validation allow garbage values to be stored. At minimum add:
.addAttribute('siteId', {
type: 'string',
required: false,
validate: (value) => !value || isValidUUID(value),
})Note: using addReference would consume 2 of the 5 GSI slots (4 already used), so evaluate which indexes are essential if you go that route. If you intentionally keep these as plain attributes (because the onboarding record creates Site/Org rather than belonging to pre-existing ones), add a code comment explaining why.
5. TypeScript collection methods are incomplete
plg-onboarding/index.d.ts - The compound accessor methods generated by the indexes are missing from the type declarations:
allByImsOrgIdAndUpdatedAt/findByImsOrgIdAndUpdatedAt(from index 1)allByBaseURLAndStatus/findByBaseURLAndStatus(from index 4)allByStatusAndUpdatedAt/findByStatusAndUpdatedAt(from index 3)
TypeScript consumers calling these will get type errors even though the methods exist at runtime.
Minor (nice to have)
domainshould bereadOnly: It forms part of the unique identity (imsOrgId+domainindex). Mutating it post-creation breaks lifecycle semantics.domainhas no format validation: A hostname regex or length cap would prevent garbage data in the index sort key.- Copyright year: Source files use
Copyright 2025, test files useCopyright 2026. Use 2026 consistently for new code. - Consider
type: 'map'withpropertiesforstepsandbotBlockersince the shapes are known - provides schema documentation and prevents structural drift (seeimport-job.schema.js:118for theinitiatedBypattern). - Consider a
defaultforstatus:default: 'IN_PROGRESS'would reduce boilerplate in callers, matching howopportunity.schema.jsusesdefault: 'NEW'.
Recommendations (non-blocking)
- Add schema-level validation tests (invalid status rejected,
completedAtrejects non-ISO, required fields enforced) - The
allByStatusandallByBaseURLcollection methods return cross-tenant results. If these are intended for internal/admin use only, add a code comment clarifying that. Access control enforcement belongs in the API layer but the data model should signal intent.
- Make imsOrgId readOnly with regex validation to prevent tenant reassignment - Align STATUSES key/value: WAITING_FOR_IP_ALLOWLISTING (inclusive language) - Add baseURL validation with isValidUrl - Add UUID validation for siteId and organizationId - Add missing compound collection methods to TypeScript types - Remove setImsOrgId from types and tests - Update copyright year to 2026 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
- Make domain readOnly with hostname format validation - Add default status IN_PROGRESS to reduce caller boilerplate - Use type: 'map' with properties for steps and botBlocker for schema docs - Remove setDomain from TypeScript types and tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for the detailed review! Addressed all feedback. Important (all 5 fixed):
Minor (all addressed):
|
solaris007
left a comment
There was a problem hiding this comment.
Hey @Kanishkavijay39,
Thanks for the updates - all 9 issues from the previous review have been addressed, and several findings from our cross-PR architectural review (posted on SITES-41461) have been resolved too.
Cross-PR Findings Progress
This update addresses several items from the cross-PR architectural review that evaluated all three PLG onboarding PRs as a system:
| Cross-PR Finding | Status |
|---|---|
| #1 (partial): readOnly + format validation on imsOrgId in shared model | Fixed - readOnly: true + 24-char hex regex |
| #3: Align WHITELISTING -> ALLOWLISTING terminology | Fixed - key and value both WAITING_FOR_IP_ALLOWLISTING |
| #4: No validation defense-in-depth (baseURL, domain, siteId, orgId) | Mostly fixed - baseURL has isValidUrl, domain has hostname regex, siteId/organizationId have isValidUUID. Only gap: baseURL not readOnly to match PG immutability |
| #6: Steps shape mismatch between layers | Worse - typed map with wrong properties now causes silent data loss (see Critical #1 below) |
Good progress - 3 of 4 shared-model findings resolved. The steps shape is the remaining blocker.
Strengths
imsOrgIdanddomainare correctlyreadOnlywith regex validation, preventing tenant reassignment and identity mutation (plg-onboarding.schema.js:28-42)siteIdandorganizationIdnow validate withisValidUUID,baseURLwithisValidUrl(plg-onboarding.schema.js:43-57)stepsandbotBlockeruse typedmapwith explicit properties instead of bareany(plg-onboarding.schema.js:58-76)- Status enum alignment - key and value both
WAITING_FOR_IP_ALLOWLISTING, withIN_PROGRESSas default (plg-onboarding.model.js:26-32) - TypeScript declarations are comprehensive - all compound methods from all 4 indexes are present, setters correctly omit readOnly fields (
index.d.ts)
Issues
Critical (Must Fix)
1. steps map properties will silently strip 5 of 7 keys written by the API controller - plg-onboarding.schema.js:58-65
ElectroDB's type: 'map' with properties acts as a whitelist - keys not listed in properties are silently removed on save (tywalch/electrodb#389). The schema defines { orgCreated, siteCreated, entitlementCreated, cdnSetup }, but the API controller (PR #1918) writes:
| Controller writes | Schema allows | Preserved? |
|---|---|---|
orgResolved |
- | DROPPED |
rumVerified |
- | DROPPED |
siteCreated |
siteCreated |
Yes |
siteResolved |
- | DROPPED |
configUpdated |
- | DROPPED |
auditsEnabled |
- | DROPPED |
entitlementCreated |
entitlementCreated |
Yes |
This was flagged as cross-PR finding #6 (steps shape mismatch), but the move from type: 'any' to type: 'map' with incorrect properties makes it worse - it's now a data loss bug rather than just a documentation issue.
Fix - either:
(a) Align properties with the controller:
.addAttribute('steps', {
type: 'map',
properties: {
orgResolved: { type: 'boolean' },
rumVerified: { type: 'boolean' },
siteCreated: { type: 'boolean' },
siteResolved: { type: 'boolean' },
configUpdated: { type: 'boolean' },
auditsEnabled: { type: 'boolean' },
entitlementCreated: { type: 'boolean' },
},
})(b) Or use type: 'any' with isObject if step keys are still evolving:
.addAttribute('steps', {
type: 'any',
validate: (value) => !value || isObject(value),
})Coordinate with API service PR #1918 before merging either PR.
Important (Should Fix)
2. Domain regex allows uppercase but PG enforces lowercase - plg-onboarding.schema.js:39
The domain validator uses /i (case-insensitive), accepting Example.COM. The PostgreSQL migration (PR #125) has CHECK (domain = lower(domain)), which would reject that value. Remove the /i flag:
validate: (value) => /^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/.test(value) && value.length <= 253,3. Test mock data uses values that fail the new validators - plg-onboarding.model.test.js:34, plg-onboarding.collection.test.js:38
imsOrgId: 'ABC123@AdobeOrg' has only 6 hex characters (validator requires 24). organizationId: 'org-uuid-1' is not a valid UUID. Tests pass because mocks bypass ElectroDB validation, but they never exercise the validation logic.
Fix:
imsOrgId: '1234567890abcdef12345678@AdobeOrg',
organizationId: 'e0491f53-0688-40f7-a443-7d585d79b471',Minor (Nice to Have)
4. botBlocker.ipsToAllowlist should be type: 'list', not type: 'any' - plg-onboarding.schema.js:72
The controller writes an array of IP strings. type: 'list' with items: { type: 'string' } would preserve type safety.
5. baseURL could be marked readOnly - plg-onboarding.schema.js:43-46
The PG migration revokes UPDATE on base_url. Adding readOnly: true would enforce the same at the application layer (this would fully resolve cross-PR finding #4).
6. Inline regex patterns could be extracted to model constants - plg-onboarding.schema.js:33,39
Extracting to named constants (e.g., PlgOnboarding.IMS_ORG_ID_PATTERN) would make them reusable and testable.
Recommendations
- Coordinate
stepsalignment with API service PR #1918 before merging either PR - merging this schema first with mismatched keys creates a window where production writes lose data. - Add schema-level validation tests that create entities through the actual schema (not mocks) to catch validator issues like the imsOrgId length requirement.
Assessment
Ready to merge? No
Reasoning: The steps map property mismatch is a data loss bug - 5 of 7 step flags will be silently dropped on every save. This must be fixed and coordinated with the API service PR. The remaining issues (domain case, test data) are straightforward fixes. Good progress on the cross-PR findings - 3 of 4 shared-model items resolved.
…nboarding - Align steps map properties with API controller (7 keys: orgResolved, rumVerified, siteCreated, siteResolved, configUpdated, auditsEnabled, entitlementCreated) to prevent silent data loss - Remove /i flag from domain regex to match PG lowercase constraint - Mark baseURL as readOnly to match PG immutability, remove setBaseURL from TypeScript types and tests - Change botBlocker.ipsToAllowlist from type 'any' to type 'list' - Fix test mock data: use 24-char hex imsOrgId and valid UUIDs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Hi @solaris007 , Thanks for the thorough review! All issues addressed. Critical:
Important:
Minor:
|
solaris007
left a comment
There was a problem hiding this comment.
Hey @Kanishkavijay39,
All findings from the previous two reviews are addressed.
Previous Findings - All Resolved
| Previous Finding | Status |
|---|---|
| Critical #1: Steps map properties cause silent data loss | Fixed - now orgResolved, rumVerified, siteCreated, siteResolved, configUpdated, auditsEnabled, entitlementCreated matching controller |
Important #2: Domain regex /i allows uppercase, PG enforces lowercase |
Fixed - /i flag removed |
| Important #3: Test mock data fails validators | Fixed - realistic 24-char hex imsOrgId, valid UUID organizationId |
Minor #4: botBlocker.ipsToAllowlist type: 'any' |
Fixed - type: 'list', items: { type: 'string' } |
| Minor #5: baseURL not readOnly | Fixed - readOnly: true, setBaseURL removed from TypeScript types |
Strengths
- Steps map now aligned across all three PRs - the data loss bug is resolved
- Domain regex enforces lowercase only, matching the PG
CHECK (domain = lower(domain))constraint - All three identity fields (
imsOrgId,domain,baseURL) arereadOnlywith validation - strong defense-in-depth - Test mock data uses realistic values that would pass schema validation
- TypeScript types correctly omit setters for all readOnly fields
- Clean model, schema, collection, and index design following established patterns
Issues
None blocking.
Minor (Nice to Have)
1. Inline regex patterns could be extracted to model constants - plg-onboarding.schema.js:33,39
The imsOrgId and domain validation patterns are still defined inline. Extracting to named constants would make them reusable and testable. Very low priority.
Assessment
Ready to merge? Yes
Reasoning: All Critical, Important, and Minor findings from the previous two reviews are resolved. The steps map alignment eliminates the data loss risk. The schema is well-designed with proper readOnly enforcement, validation, and typed map properties. This is the second PR in the merge order (after mysticat-data-service PR #125 which is already approved) and is ready to land.
Extract imsOrgId and domain validation patterns to named static constants (IMS_ORG_ID_PATTERN, DOMAIN_PATTERN) on PlgOnboarding model for reusability and testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thank you @solaris007 for the review! Addressed the minor suggestion as well to extract inline regex patterns to model constants ( |
|
🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.16.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Wiki: https://wiki.corp.adobe.com/pages/viewpage.action?spaceKey=AEMSites&title=ASO+PLG+Onboarding+Backend+Design
Summary
PlgOnboardingmodel and collection tospacecat-shared-data-accessto track the self-service onboarding lifecycle for PLG ASO customersimsOrgId,domain,baseURL,status,siteId,organizationId,steps,error,botBlocker,waitlistReason,completedAtimsOrgId,imsOrgId+domain,status, andbaseURLIN_PROGRESS,ONBOARDED,ERROR,WAITING_FOR_IP_WHITELISTING,WAITLISTEDRelated Issue
Please ensure your pull request adheres to the following guidelines:
Related Issues
Thanks for contributing!