
Refactor: Reorganize storage configurations into dedicated module #131

Merged
jfrench9 merged 2 commits into main from refactor/storage-configs
Jan 4, 2026

Conversation

@jfrench9
Member

jfrench9 commented Jan 4, 2026

Summary

This PR refactors the storage configuration management by creating a dedicated storage module and reorganizing shared data components. The primary goal is to improve code organization and maintainability of storage-related configurations across the system.

Key Changes

  • New Storage Module: Created robosystems/config/storage/ package with proper module structure
  • Configuration Consolidation: Moved DATA_SOURCES and DataSourceType definitions to the new storage module
  • Import Updates: Updated all import statements across 8 files to reference the new module structure
  • Graph Configuration: Added comprehensive graph configuration management (322 lines) in storage/graph.py
  • File Reorganization: Renamed and relocated shared_data.py to storage/shared.py for better organization

Accomplishments

  • ✅ Centralized storage configuration management in a dedicated module
  • ✅ Improved code organization and separation of concerns
  • ✅ Maintained consistency in S3 bucket configurations and data source handling
  • ✅ Updated all dependent modules to use the new import structure
  • ✅ Enhanced maintainability of storage-related configurations

Breaking Changes

⚠️ Import Path Changes: Any external code importing from robosystems.config.shared_data will need to update imports to use the new robosystems.config.storage module structure.
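For external callers that need to run against both layouts during a transition, a fallback import is one possible approach. The sketch below is an illustration and not part of this PR: `import_first` is a hypothetical helper, and the only things it takes from the description above are the two module paths.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that imports cleanly.

    Hypothetical helper for illustration; it is not part of robosystems.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))
```

A caller could then write `storage_config = import_first(["robosystems.config.storage", "robosystems.config.shared_data"])`, preferring the new path and falling back to the old one. Whether a given name is reachable at the package level depends on what `storage/__init__.py` actually exports.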

Testing Notes

  • All existing functionality should work unchanged due to proper import updates
  • Test files have been updated to reflect the new module structure
  • Integration tests maintain coverage of subgraph operations
  • Backup service tests updated to use new configuration paths

Infrastructure Considerations

This refactoring improves the system's configuration management architecture by:

  • Creating a more scalable structure for storage-related configurations
  • Establishing clearer boundaries between different types of system configurations
  • Facilitating future enhancements to storage configuration capabilities
  • Reducing coupling between storage configurations and other system components

🤖 Generated with Claude Code

Branch Info:

  • Source: refactor/storage-configs
  • Target: main
  • Type: refactor

Co-Authored-By: Claude <noreply@anthropic.com>

…Type to a new storage module. Update imports across the codebase to reflect this change, ensuring consistency in S3 bucket configurations and data source handling.
@jfrench9
Member Author

jfrench9 commented Jan 4, 2026

@claude please review this PR

@claude

claude Bot commented Jan 4, 2026

Claude finished @jfrench9's task in 1m 44s


PR Review Complete ✅

Todo List

  • Read and understand the PR changes
  • Review new storage module structure
  • Analyze import path changes
  • Check test updates
  • Provide comprehensive review feedback

Summary

This refactoring successfully reorganizes storage configurations into a dedicated module with good separation of concerns. The changes are well-structured and improve code organization. I've identified a few areas for improvement below.

Strengths

  1. Clear Module Organization - The new robosystems/config/storage/ structure creates a logical separation between shared data sources and graph-specific storage configurations.

  2. Comprehensive Graph Storage Helpers - The new storage/graph.py file (322 lines) provides excellent helper functions for S3 path construction across different graph storage types (staging, backups, databases).

  3. Complete Import Updates - All 8 dependent files have been correctly updated to use the new import paths.

  4. No Breaking Functionality - The refactoring is purely organizational; the actual code in shared.py is identical to the original shared_data.py.

  5. Good Documentation - The module docstrings clearly explain the S3 path structures and usage patterns.

Issues & Recommendations

🔴 Critical

1. Inconsistent Import Usage in backup.py (robosystems/operations/lbug/backup.py:76-77)

The backup service uses env.USER_DATA_BUCKET but the new storage/graph.py module doesn't seem to be imported or used. The backup service manually constructs S3 paths like:

self.s3_prefix = f"graph-databases/{environment}"

Recommendation: The backup service should use the new graph storage helpers:

from robosystems.config.storage import graph

# In __init__:
self.s3_prefix = graph.get_instance_backup_prefix(environment).rstrip('/')

# In backup_graph_database:
s3_key = graph.get_instance_backup_key(self.environment, graph_id, start_time)

This would eliminate hardcoded path construction and ensure consistency with the centralized configuration.
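To make the recommendation concrete, here is a minimal sketch of what such helpers might look like. The two function names and the `graph-databases/{environment}` layout come from the snippets above; the function bodies, the `BACKUP_ROOT` constant, the timestamp format, and the `.backup` suffix are assumptions and may not match what `storage/graph.py` actually does.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins: the real helpers live in storage/graph.py and may differ.
BACKUP_ROOT = "graph-databases"  # taken from the hardcoded prefix quoted above


def get_instance_backup_prefix(environment: str) -> str:
    """Prefix shared by all backups in an environment, with a trailing slash."""
    return f"{BACKUP_ROOT}/{environment}/"


def get_instance_backup_key(environment: str, graph_id: str, timestamp: datetime) -> str:
    """Full object key for one backup. The filename layout is an assumption."""
    # Normalize to UTC so keys sort chronologically regardless of caller timezone.
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{get_instance_backup_prefix(environment)}{graph_id}/{stamp}.backup"
```

Because the key is built from the prefix, the two cannot drift apart, which is the consistency property the hardcoded f-string in the backup service does not guarantee.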

🟡 Minor

2. Test Import Changes Missing Context (tests/operations/lbug/test_backup_service.py)

The test file imports were updated from:

from robosystems.config.shared_data import ...

However, I don't see these imports being used in the test file. The test file appears to only import from robosystems.operations.lbug.backup. Could you verify whether these storage imports are actually needed in the test file?

3. Removed Test Code Without Explanation (tests/integration/test_subgraph_operations.py)

25 lines were removed from the integration test file. While this is noted in the PR description, there's no explanation of what was removed or why. Looking at the remaining code, it appears to be setup/fixture code, but it would be helpful to document:

  • What was the removed code testing?
  • Why was it safe to remove?
  • Is the functionality still tested elsewhere?

4. Missing __all__ Export in storage/graph.py

The storage/__init__.py exports key items with __all__, but storage/graph.py itself doesn't define __all__. While not critical, adding it would make the public API explicit:

__all__ = [
    "GraphStorageType",
    "GraphStorageConfig", 
    "GRAPH_STORAGE",
    "get_staging_key",
    "get_staging_prefix",
    "get_backup_key",
    # ... etc
]
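If `__all__` is added, a small check can guard against it drifting out of sync with the module's contents. The helper below is a hypothetical sketch (the name `unresolved_exports` is not from the codebase); it relies only on standard module attributes.

```python
import types
from typing import List


def unresolved_exports(module: types.ModuleType) -> List[str]:
    """Return names listed in module.__all__ that the module does not define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]
```

A test could import the storage `graph` module and assert that `unresolved_exports(graph)` returns an empty list, so a renamed or deleted helper is caught before `from ... import *` fails at runtime.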

💡 Suggestions

5. Consider Adding Integration Tests

The new graph storage helpers have excellent docstrings with examples, but I don't see dedicated tests for them. Consider adding tests to verify:

  • S3 key construction for all storage types
  • URI building functions
  • Prefix generation with various filter combinations
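As a rough illustration of the kind of tests meant here, the sketch below checks two structural properties: a filtered prefix stays inside the unfiltered one, and a built URI has no empty path segments. `example_prefix` and `example_uri` are hypothetical stand-ins with invented signatures so the block runs on its own; the real helpers in `storage/graph.py` would be substituted in.

```python
from typing import Optional


# Hypothetical stand-ins so the sketch is self-contained; swap in the real helpers.
def example_prefix(graph_id: str, table: Optional[str] = None) -> str:
    parts = ["staging", graph_id] + ([table] if table else [])
    return "/".join(parts) + "/"


def example_uri(bucket: str, key: str) -> str:
    return f"s3://{bucket}/{key}"


def test_prefix_narrows_with_each_filter() -> None:
    broad = example_prefix("kg123")
    narrow = example_prefix("kg123", table="Entity")
    # Adding a filter should only ever extend the broader prefix.
    assert narrow.startswith(broad)
    assert broad.endswith("/") and narrow.endswith("/")


def test_uri_has_no_empty_segments() -> None:
    uri = example_uri("my-bucket", example_prefix("kg123") + "file.parquet")
    assert uri.startswith("s3://my-bucket/")
    # A doubled slash after the scheme would indicate a stray or missing separator.
    assert "//" not in uri[len("s3://"):]
```

Property-style assertions like these avoid pinning the exact path layout, so they keep passing if the layout changes but the helpers remain internally consistent.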

6. CLAUDE.md Update Needed

The CLAUDE.md file mentions configuration modules but doesn't reference the new storage/ module. Consider adding a note about storage configuration patterns, especially since this is now a substantial module (350+ lines total).

Architecture Review

The refactoring aligns well with the stated goals in CLAUDE.md:

Configuration System (/robosystems/config/): All configuration is centralized and config-as-code

The new structure provides:

  • ✅ Clear separation between shared data sources and graph-specific storage
  • ✅ Centralized S3 path construction
  • ✅ Better scalability for future storage types
  • ✅ Reduced coupling between storage configs and other components

Breaking Changes

The PR correctly notes breaking changes for external imports. However, since this is an internal refactoring and all internal imports have been updated, the impact should be minimal. The old robosystems.config.shared_data module no longer exists, so any external code using it will fail with a clear import error.

Verdict

Approve with Recommendations

This is a solid refactoring that improves code organization. The critical issue (#1) should be addressed to ensure the backup service uses the new centralized path construction helpers. The minor issues are nice-to-haves that would further improve the quality.

Recommendation: Address issue #1 (backup service should use graph storage helpers) before merging. The other issues can be addressed in follow-up work if preferred.


Files Reviewed

  • ✅ robosystems/config/storage/__init__.py (new)
  • ✅ robosystems/config/storage/graph.py (new, 322 lines)
  • ✅ robosystems/config/storage/shared.py (renamed, unchanged content)
  • ✅ robosystems/operations/lbug/backup.py (imports updated)
  • ✅ robosystems/adapters/sec/processors/ingestion.py (imports updated)
  • ✅ robosystems/dagster/assets/sec.py (imports updated)
  • ✅ robosystems/dagster/sensors/sec.py (imports updated)
  • ✅ robosystems/scripts/sec_pipeline.py (imports updated)
  • ✅ tests/integration/test_subgraph_operations.py (25 lines removed)
  • ✅ tests/operations/lbug/test_backup_service.py (imports updated)

…s for S3 backup operations. Updated S3 prefix and key generation methods to improve consistency and maintainability in backup processes.
jfrench9 merged commit 4fbd618 into main Jan 4, 2026
6 checks passed
jfrench9 linked an issue Jan 4, 2026 that may be closed by this pull request
19 tasks
jfrench9 deleted the refactor/storage-configs branch January 4, 2026 09:05
Development

Successfully merging this pull request may close these issues.

SEC Shared Repository Production Launch
