Fix Dagster deployment configuration and streamline deployment scripts#94
Conversation
- Added `run_job_cpu` and `run_job_memory` inputs to the deployment workflows for better resource management in Dagster run jobs. - Updated the CloudFormation template to reference the task definition ARN for the run job. - Modified the tunnels script to reflect the correct port for the Dagster webserver tunnel.
|
@claude please review this PR |
|
Claude finished @jfrench9's task in 1m 29s —— View job Code Review CompleteTasks
SummaryThis PR successfully addresses critical Dagster deployment configuration issues and significantly reduces maintenance overhead by removing ~400 lines of obsolete bastion operation scripts. The changes are well-structured and improve the deployment pipeline. Key Findings✅ Strengths
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| echo 'alias bastion="sudo /usr/local/bin/run-bastion-operation.sh"' >> /home/ec2-user/.bashrc | ||
|
|
||
| # Create log directory for infrastructure operations | ||
| # Create log directory for operations |
There was a problem hiding this comment.
Restore bastion operation entrypoint for SSM admin commands
This userdata no longer installs /usr/local/bin/run-bastion-operation.sh, but the admin CLI still invokes that path via SSM for non-dev SEC commands (see robosystems/admin/cli.py:1488-1507). In staging/prod, admin sec load/health will now fail with “file not found” on the bastion, so those operational workflows break unless another script provides the same entrypoint.
Useful? React with 👍 / 👎.
- Deleted the GHA Runner bootstrap workflow file, user data script, and CloudFormation template, streamlining the CI infrastructure. - Updated the package scripts to reflect the removal of the GHA Runner user data script from the upload list.
…lopment instructions - Refactored `sec_load` and `sec_health` commands to clarify that SEC operations in staging/prod are managed by Dagster, directing users to the Dagster UI. - Updated local development instructions to reflect the correct port for accessing the Dagster webserver. - Removed legacy command handling for non-dev environments, streamlining the CLI experience.
|
@claude can you review the latest changes and do one final review |
|
Claude finished @jfrench9's task in 2m 51s —— View job Final PR ReviewTasks
SummaryThis PR successfully addresses deployment configuration issues and significantly reduces maintenance overhead by removing ~1,700 lines of obsolete code. The changes improve the deployment pipeline while maintaining backward compatibility. Status of Previous Review Issues✅ Fixed Issues
🔍 New Findings1. Network Configuration for ECS Run Launcher (IMPORTANT - Need Clarification)Location: dagster_home/dagster_prod.yaml:58-60, cloudformation/dagster.yaml:477-482 Current State: The comment in # Note: Network configuration (subnets, security groups) is inherited from the
# ECS cluster configuration. The EcsRunLauncher will use the same network
# settings as the daemon/webserver services.Analysis:
Question: Is Dagster's EcsRunLauncher smart enough to automatically copy the network configuration from the daemon service, or does this need to be explicitly configured? Recommendation: Either:
Impact: If network config is missing and not inherited, run jobs will fail to launch with network configuration errors. 2. Bastion Operation Script Removal (CONFIRMED ISSUE from Codex)Location: bin/userdata/bastion.sh:392, robosystems/admin/cli.py:1474-1507 Issue: The Codex review correctly identified that:
Resolution: This is properly handled. The PR author correctly:
This is the right architectural decision - SEC operations should be managed through Dagster in production environments. 📋 Additional Observations✅ Good Improvements
🔍 Minor Observations
🎯 Testing RecommendationsBefore deploying to production:
🔒 Security Review
📝 Final RecommendationStatus: ✅ Ready to merge with one verification Action Required:
Rationale:
Overall Assessment: This is an excellent cleanup PR that addresses the original deployment issues while significantly improving the codebase. The removal of 1,700+ lines of obsolete code is a major win for maintainability. The only uncertainty is the ECS network configuration inheritance behavior, which should be verified in staging before production deployment. |
…tfile - Removed deprecated GitHub Actions Runner bootstrap command from the justfile. - Enhanced the runner availability action to support organization and repository-level checks, improving compatibility for forks. - Updated workflow files to utilize dynamic runner labels and scope variables for better configuration management. - Added CPU and memory settings for Dagster jobs in the setup script, optimizing resource allocation for production and staging environments.
- Enhanced the runner availability action to default to "github-hosted" if no labels are specified, improving usability. - Updated multiple workflow files to ensure they use "github-hosted" as the default runner label. - Added comprehensive Dagster configuration settings for webserver and run worker, optimizing resource allocation for production and staging environments.
|
@claude can you review the latest changes |
|
Claude finished @jfrench9's task in 2m 38s —— View job Final PR ReviewTasks
SummaryThis PR successfully addresses deployment configuration issues and significantly reduces maintenance overhead by removing ~1,900 lines of obsolete code. The changes improve the deployment pipeline while maintaining backward compatibility. Status of Previous Review Issues✅ Fixed Issues
🔍 New Findings1. Network Configuration for ECS Run Launcher (IMPORTANT - Need Clarification)Location: dagster_home/dagster_prod.yaml:58-60, cloudformation/dagster.yaml:477-482 Current State: The comment in # Note: Network configuration (subnets, security groups) is inherited from the
# ECS cluster configuration. The EcsRunLauncher will use the same network
# settings as the daemon/webserver services.Analysis:
Question: Is Dagster's EcsRunLauncher smart enough to automatically copy the network configuration from the daemon service's task, or does this need to be explicitly configured? Recommendation: Either:
Impact: If network config is missing and not inherited, run jobs will fail to launch with network configuration errors. 2. Bastion Operation Script Removal (RESOLVED - Properly Handled)Location: bin/userdata/bastion.sh:392, robosystems/admin/cli.py:1474-1507 Codex Review Finding: The Codex review flagged that Resolution: This is properly handled. The PR author correctly:
This is the right architectural decision - SEC operations should be managed through Dagster in production environments, not via bastion scripts. 📋 Additional Observations✅ Good Improvements
🎯 Testing RecommendationsBefore deploying to production:
🔒 Security Review
📝 Final RecommendationStatus: ✅ Ready to merge with one verification Action Required:
Rationale:
Overall Assessment: This is an excellent cleanup PR that addresses the original deployment issues while significantly improving the codebase. The removal of 1,900+ lines of obsolete code is a major win for maintainability. The Codex finding about bastion operations was actually a non-issue - the admin CLI properly redirects users to Dagster. The only uncertainty is the ECS network configuration inheritance behavior, which should be verified in staging before production deployment. |
Summary
This PR addresses critical issues in the Dagster deployment pipeline by enhancing the deployment configuration with proper run job parameters and significantly streamlining deployment-related scripts.
Key Changes
Key Accomplishments
Breaking Changes
None. This is a bug fix that maintains backward compatibility while improving deployment reliability.
Testing Notes
Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
bugfix/dagster-deploy-fixesmainCo-Authored-By: Claude noreply@anthropic.com