
Refactor structure_type to block_type and align canonical vocabulary#677

Merged
jfrench9 merged 4 commits into main from refactor/info-block-alignment
May 16, 2026

@jfrench9

Summary

Large-scale vocabulary alignment refactor that renames structure_type to block_type across the entire codebase and updates concept arrangement patterns in taxonomy definitions to use canonical terminology. This touches 101 files spanning models, operations, migrations, taxonomy packages, GraphQL types, adapters, and tests.

Key Accomplishments

Terminology Alignment: structure_type → block_type

  • Renamed the structure_type field/column to block_type across all layers of the stack: database models, API models, GraphQL types/resolvers, operations, and test fixtures
  • Updated all references in information block, taxonomy block, and library operations
  • Migrated database schema definitions and migration scripts to reflect the new column name
  • Ensured consistent usage in SEC adapters, QuickBooks pipeline, Arelle context, and MCP middleware tools

Canonical Vocabulary Updates for Taxonomy Packages

  • Updated concept arrangement patterns across all taxonomy JSON-LD packages (FAC, RS-GAAP hierarchy, presentations, calculations, disclosures, reporting styles, bridges, etc.) to align with the canonical vocabulary
  • Refined structure models (robosystems/models/extensions/structure.py) and rule models with expanded type definitions and updated enum values
  • Updated classification models to match the new vocabulary
  • Aligned taxonomy seed data and JSON-LD loaders with the revised terminology

Migration Updates

  • Modified initial schema migration (0001), taxonomy library migration (0002), frameworks/bridges migration (0007), and reporting style migration (0008) to use block_type and updated arrangement patterns
  • Updated extension database helpers to match the new schema

Breaking Changes

  • Database schema change: structure_type column renamed to block_type — requires migration execution before deployment
  • API contract change: Any external consumers referencing structure_type in GraphQL queries, API responses, or model serializations will need to update to block_type
  • Taxonomy package changes: Updated arrangement pattern values in JSON-LD taxonomy files may affect downstream taxonomy consumers or cached taxonomy data
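
For consumers that cannot switch over in lockstep with the deployment, one option is a small client-side shim that renames the key in deserialized payloads. The following is a minimal sketch, assuming payloads are plain dicts and lists; `rename_key` is a hypothetical helper that does not exist in the codebase, and the `statement` / `schedule` values are illustrative only.

```python
from typing import Any

OLD_KEY = "structure_type"
NEW_KEY = "block_type"


def rename_key(payload: Any, old: str = OLD_KEY, new: str = NEW_KEY) -> Any:
    """Return a copy of payload with every `old` dict key renamed to `new`."""
    if isinstance(payload, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [rename_key(item, old, new) for item in payload]
    # Scalars (str, int, None, ...) are returned unchanged.
    return payload


# Illustrative legacy payload; field values are assumptions.
legacy = {
    "id": "s1",
    "structure_type": "statement",
    "children": [{"structure_type": "schedule"}],
}
migrated = rename_key(legacy)
```

The helper builds a new structure rather than mutating its input, so the legacy payload stays available for logging or comparison during the transition.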

Testing Notes

  • All 28 affected test files have been updated to use the new block_type terminology
  • Tests cover the full scope of changes: information block operations (envelope, metric, statement, schedule, reads, registry, rules), taxonomy block operations (auto rules, library creator, validators), event block handlers, reporting style commands, SEC structure classification, GraphQL extensions, roboledger reports/schedules, and taxonomy seed/presentation/calculation tests
  • Verify that migration scripts apply cleanly on existing databases and that no residual structure_type references remain
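
The residual-reference check above could be automated along these lines. This is a sketch only: `find_residual_references` is a hypothetical helper, the file suffixes are assumptions, and the camelCase spelling `structureType` is assumed to be the former JSON-LD key (the source confirms only the new `blockType`).

```python
import tempfile
from pathlib import Path

# Spellings to flag; the camelCase variant is an assumption about the old JSON-LD key.
LEGACY_NAMES = ("structure_type", "structureType")


def find_residual_references(
    root: Path, suffixes: tuple[str, ...] = (".py", ".json", ".jsonld")
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every line mentioning a legacy name."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            # Substring match on purpose: also catches names like `old_structure_type`.
            if any(name in line for name in LEGACY_NAMES):
                hits.append((str(path.relative_to(root)), line_no, line.strip()))
    return hits


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clean.py").write_text("block_type = 'statement'\n", encoding="utf-8")
    (root / "stale.py").write_text(
        "x = 1\nrow.structure_type = 'statement'\n", encoding="utf-8"
    )
    residual = find_residual_references(root)
```

An empty result from a check like this could serve as a gate in CI so the old name cannot be reintroduced.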

Infrastructure Considerations

  • Database migrations must be run as part of deployment to rename the column and update seeded taxonomy data
  • Any cached taxonomy data or materialized views referencing structure_type or old arrangement pattern values should be invalidated/rebuilt
  • Downstream services or integrations consuming the GraphQL API or taxonomy packages should be coordinated for the vocabulary change

🤖 Generated with Claude Code

Branch Info:

  • Source: refactor/info-block-alignment
  • Target: main
  • Type: refactor

Co-Authored-By: Claude noreply@anthropic.com

jfrench9 added 2 commits May 15, 2026 22:39
- Updated all instances of `structure_type` to `block_type` in test files to maintain consistency in terminology.
- Adjusted assertions and mock structures to reflect the change from `structure_type` to `block_type`.
- Modified relevant test descriptions and comments to align with the new terminology.
- Ensured that member arrangements and concept arrangements are updated where necessary, specifically changing "aggregation" to "whole_part" in relevant tests.
@jfrench9

@claude please review this PR

@claude

claude Bot commented May 16, 2026

Claude finished @jfrench9's task in 5m 24s


PR Review: structure_type → block_type Vocabulary Alignment

Tasks:

  • Understand request (code review)
  • Examine PR diff and key changed files
  • Review database migrations
  • Review model changes
  • Review taxonomy/JSON-LD changes
  • Review operation and adapter changes
  • Review test coverage
  • Post comprehensive review feedback

Summary

This is a large, well-executed vocabulary alignment refactor. The rename is complete — no residual structure_type references remain in Python or JSON-LD files. The semantic improvements (rule_pattern/rule_check_kind XOR split, regulatory_disclosure, canonical CAP vocabulary) are all sound decisions. A few issues below need attention before merging.


Issues

🔴 Critical: Missing data migration for block_type='disclosure' → 'regulatory_disclosure'

The CHECK constraints in all migrations and the SQLAlchemy model now require 'regulatory_disclosure' instead of 'disclosure'. The JSON-LD packages correctly emit blockType: "regulatory_disclosure" for new seeds. However, there is no UPDATE statement anywhere in the migrations to rename existing rows that have block_type='disclosure'.

Since the library immutability trigger blocks tenant-scope UPDATEs, and the library seed uses INSERT (not upsert), any existing structures with block_type='disclosure' in public or tenant schemas will silently coexist alongside new 'regulatory_disclosure' rows after deployment. Any filtering logic that uses block_type = 'regulatory_disclosure' will miss the old rows.

A data migration step is needed, e.g. in migration 0007 or 0008's upgrade:

UPDATE public.structures SET block_type = 'regulatory_disclosure'
WHERE block_type = 'disclosure';
-- and repeat for_each_tenant_schema
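
To illustrate why the backfill matters, the sketch below reproduces the problem and the fix against an in-memory SQLite database, which stands in for a single PostgreSQL schema. The `structures` table and `block_type` column come from the review; `backfill_block_type` is a hypothetical helper, and a real migration would run the same statement once for the public schema and once per tenant schema.

```python
import sqlite3

# Legacy value -> canonical value, per the review comment.
RENAMES = {"disclosure": "regulatory_disclosure"}


def backfill_block_type(conn: sqlite3.Connection, table: str = "structures") -> int:
    """Rewrite legacy block_type values in place; return the number of rows changed."""
    changed = 0
    for old, new in RENAMES.items():
        cursor = conn.execute(
            f"UPDATE {table} SET block_type = ? WHERE block_type = ?", (new, old)
        )
        changed += cursor.rowcount
    return changed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structures (id INTEGER PRIMARY KEY, block_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO structures (block_type) VALUES (?)",
    [("disclosure",), ("regulatory_disclosure",), ("statement",)],
)

query = "SELECT COUNT(*) FROM structures WHERE block_type = 'regulatory_disclosure'"
before = conn.execute(query).fetchone()[0]  # the legacy row is invisible to this filter
changed = backfill_block_type(conn)
after = conn.execute(query).fetchone()[0]   # the legacy row is now included
```

Here `before` counts only the row seeded with the new value, while `after` also includes the row that was backfilled.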


🔴 Critical: Downgrade paths in 0007 and 0008 restore wrong prior state

_PRIOR_BLOCK_TYPE_CHECK in both 0007_frameworks_bridges.py and 0008_reporting_style.py — which is used for the downgrade to restore the pre-migration CHECK — now contains 'regulatory_disclosure' instead of 'disclosure'. This is semantically wrong: the "prior" constraint (pre-0007) had 'disclosure', not 'regulatory_disclosure'.

If a downgrade is run on a database that still has rows with block_type='disclosure', the restored CHECK will reject future INSERTs of the old value while leaving existing rows unconstrained — an inconsistent state.

The _PRIOR_BLOCK_TYPE_CHECK in both files should retain 'disclosure' (not the new name) since it represents what the constraint looked like before these migrations ran.


🟡 Moderate: member_arrangement='aggregation' rows orphaned if 0002 was previously applied

migrations/0002 line ~1383 changed the backfill default from COALESCE(member_arrangement, 'aggregation') to COALESCE(member_arrangement, 'whole_part'). The new CHECK constraint no longer allows 'aggregation'. Any database where 0002 was already applied (e.g., dev/staging) will have existing rows with member_arrangement='aggregation' that won't be updated — they won't fail the existing CHECK (constraints only enforce new writes), but queries filtering on member_arrangement = 'whole_part' will silently miss them.

A backfill UPDATE should be included:

UPDATE public.structures SET member_arrangement = 'whole_part'
WHERE member_arrangement = 'aggregation';
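
Before deploying, it may help to audit how many rows still carry values outside the new vocabulary. The sketch below uses in-memory SQLite as a stand-in for PostgreSQL; the five allowed values are the ones listed in this review, while `legacy_value_counts` is a hypothetical helper and the sample rows are invented for the demonstration.

```python
import sqlite3

# The 5-value member_arrangement vocabulary named in the review.
ALLOWED_MEMBER_ARRANGEMENTS = (
    "is_a",
    "whole_part",
    "nested_whole_part",
    "two_dimension_aggregation",
    "complex_aggregating_whole_part",
)


def legacy_value_counts(
    conn: sqlite3.Connection, table: str, column: str, allowed: tuple[str, ...]
) -> dict[str, int]:
    """Count non-NULL values of `column` that fall outside `allowed`."""
    placeholders = ", ".join("?" for _ in allowed)
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL AND {column} NOT IN ({placeholders}) "
        f"GROUP BY {column} ORDER BY {column}",
        allowed,
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structures (id INTEGER PRIMARY KEY, member_arrangement TEXT)"
)
conn.executemany(
    "INSERT INTO structures (member_arrangement) VALUES (?)",
    [("aggregation",), ("aggregation",), ("whole_part",), (None,)],
)
orphans = legacy_value_counts(
    conn, "structures", "member_arrangement", ALLOWED_MEMBER_ARRANGEMENTS
)
```

A non-empty result indicates rows that the backfill UPDATE still needs to rewrite.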

🟡 Moderate: TaxonomyBlockRuleRequest still requires rule_pattern, no rule_check_kind field

robosystems/models/api/taxonomy_block.py:179 — The user-facing API for creating taxonomy blocks via TaxonomyBlockRuleRequest requires rule_pattern as a mandatory Literal[...] and has no rule_check_kind field. This means the API can only create arithmetic pattern rules; structural check-kind rules (NoCycles, LeafHasClassification, etc.) are silently restricted to auto-rule emission.

This may be intentional (structural checks are system-managed), but the docstring doesn't say so. At minimum, add a note explaining that structural check kinds are injected via emit_auto_rules and cannot be created via the API.


🟢 Minor: _load_reporting_structure internal parameter mismatch

robosystems/operations/roboledger/reports/fact_grid.py:1340: The function signature still names the parameter report_type: str, while the call site now passes block_type positionally. The function's docstring also still says "CAP declared on the Disclosure (arithmetic / roll_up / roll_forward / hierarchy)"; note the stale hierarchy value.

🟢 Minor: test_auto_rules.py doesn't assert rule_pattern is None for structural auto-rules

tests/operations/taxonomy_block/test_auto_rules.py — Tests verify {r.rule_check_kind for r in added} but don't assert all(r.rule_pattern is None for r in added). The XOR contract (check_rule_pattern_kind_xor) isn't covered in unit tests for the auto-rule path.
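
The missing assertion could look roughly like the sketch below. `Rule` is a hypothetical stand-in that models only the two XOR fields; the real rule model and the output of `emit_auto_rules` are not reproduced here. The check-kind names come from this review, and `roll_up` is used purely as an illustrative pattern value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for the rule model; only the XOR fields are modeled."""

    rule_pattern: Optional[str] = None
    rule_check_kind: Optional[str] = None

    def __post_init__(self) -> None:
        # XOR contract: exactly one of the two fields must be set.
        if (self.rule_pattern is None) == (self.rule_check_kind is None):
            raise ValueError(
                "exactly one of rule_pattern / rule_check_kind must be set"
            )


# Stand-in for the rules an auto-rule emitter would add.
added = [
    Rule(rule_check_kind="NoCycles"),
    Rule(rule_check_kind="LeafHasClassification"),
]

# Existing style of assertion: which check kinds were emitted.
kinds = {r.rule_check_kind for r in added}

# The additional assertion suggested in the review.
patterns_all_none = all(r.rule_pattern is None for r in added)
```

In the real test file, the final expression would simply be wrapped in an `assert` next to the existing check on `rule_check_kind`.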



Strengths

  • Complete rename: Zero residual structure_type references remain in Python or JSON-LD — thorough grep confirms a clean cut.
  • rule_pattern / rule_check_kind XOR split: Excellent design. Structural checks (graph-walking) were wrongly lumped with arithmetic patterns (fact-value evaluation). The new XOR CHECK constraint + separate columns makes this distinction explicit and DB-enforced.
  • CAP vocabulary alignment: component and hierarchy → set correctly maps to Charlie Hoffman's canonical enumeration; level1_textblock retained as the cm.xsd specialization. The 15-value closed vocabulary is now enforced by a CHECK constraint.
  • member_arrangement 5-value spectrum: is_a | whole_part | nested_whole_part | two_dimension_aggregation | complex_aggregating_whole_part properly captures the aggregation continuum rather than a binary flag.
  • parenthetical_note → renderer_note: The new comment correctly distinguishes the renderer hint from XBRL footnote parentheticals (Charlie's parenthetical is fact-level, not structure-level).
  • auto_rules.py refactor: All structural check kinds are correctly moved to rule_check_kind, with rule_pattern=None explicitly set — implementation is correct even if the unit test assertion is missing.
  • Test coverage breadth: 28 test files updated consistently across all layers.
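
As a rough illustration of the DB-enforced XOR mentioned above, the sketch below declares such a CHECK constraint on an in-memory SQLite table. The constraint name `check_rule_pattern_kind_xor` appears in this review; the `rules` table, its columns beyond the two XOR fields, the exact SQL expression, and the `try_insert` helper are all assumptions rather than the project's actual schema.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rules (
        id INTEGER PRIMARY KEY,
        rule_pattern TEXT,
        rule_check_kind TEXT,
        -- Exactly one of the two columns may be non-NULL.
        CONSTRAINT check_rule_pattern_kind_xor
            CHECK ((rule_pattern IS NULL) <> (rule_check_kind IS NULL))
    )
    """
)


def try_insert(pattern: Optional[str], kind: Optional[str]) -> bool:
    """Attempt an insert; return True if the row was accepted by the constraint."""
    try:
        conn.execute(
            "INSERT INTO rules (rule_pattern, rule_check_kind) VALUES (?, ?)",
            (pattern, kind),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "pattern_only": try_insert("roll_up", None),
    "check_kind_only": try_insert(None, "NoCycles"),
    "both": try_insert("roll_up", "NoCycles"),
    "neither": try_insert(None, None),
}
```

Only the two single-field rows are accepted, so a rule can never be both an arithmetic pattern and a structural check, nor neither.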

@jfrench9 jfrench9 merged commit 814ec20 into main May 16, 2026
7 checks passed
@jfrench9 jfrench9 deleted the refactor/info-block-alignment branch May 16, 2026 05:57