Refactor SEC daily schedule with dynamic quarter scanning #211

Merged
jfrench9 merged 1 commit into main from refactor/improve-sec-daily-schedule
Jan 18, 2026
@jfrench9
Member

Summary

This PR enhances the SEC data processing system by implementing a more robust daily download schedule with dynamic quarter scanning capabilities. The refactoring improves the efficiency and reliability of SEC data ingestion workflows.

Key Accomplishments

  • Dynamic Quarter Detection: Implemented intelligent quarter scanning logic that automatically determines relevant quarters for data processing
  • Enhanced Daily Schedule: Improved the daily download schedule to better handle SEC data availability patterns and timing
  • Workflow Optimization: Streamlined the SEC job processing pipeline with more efficient data scanning and retrieval mechanisms
  • Code Architecture: Refactored existing SEC processing logic for better maintainability and extensibility

Breaking Changes

None - this is a refactoring effort that maintains existing API compatibility while improving internal implementation.

Testing Notes

  • Verify that daily SEC downloads continue to function as expected
  • Confirm that quarter detection logic correctly identifies relevant time periods
  • Test schedule execution during different market conditions and quarter transitions
  • Validate that existing downstream consumers of SEC data remain unaffected

Infrastructure Considerations

  • The enhanced quarter scanning may result in different processing patterns during quarter boundaries
  • Monitor job execution times as dynamic scanning could impact performance characteristics
  • Consider resource allocation for periods with high SEC filing activity
  • Ensure logging and monitoring capture the new dynamic behavior for operational visibility

🤖 Generated with Claude Code

Branch Info:

  • Source: refactor/improve-sec-daily-schedule
  • Target: main
  • Type: refactor

Co-Authored-By: Claude <noreply@anthropic.com>

- Added a new function to determine quarters to scan for SEC filings, ensuring late-indexed filings are captured.
- Refactored the SEC daily download schedule to utilize the new quarter scanning logic, improving the scheduling process.
- Enhanced logging to provide visibility into the scheduled quarters for downloads.
@jfrench9
Member Author

@claude please review this PR

@claude

claude Bot commented Jan 18, 2026

Claude finished @jfrench9's task in 1m 13s


PR Review: SEC Daily Schedule Refactoring

Summary

This PR successfully implements dynamic quarter scanning for the SEC daily download schedule. The changes are well-architected, properly documented, and align with RoboSystems patterns. I've identified a few minor suggestions for improvement, but overall this is high-quality work.


✅ Strengths

  1. Smart Quarter Boundary Logic: The _get_quarters_to_scan() function elegantly handles the edge case of late-indexed filings at quarter boundaries by scanning the previous quarter for the first 3 days of a new quarter.

  2. Improved Schedule Pattern: Converting from ScheduleDefinition to the @schedule decorator with a generator pattern allows multiple partition runs per schedule tick, which is the idiomatic Dagster approach.

  3. Excellent Documentation: The docstrings clearly explain the business logic (why we scan the previous quarter) and technical details (partition key format).

  4. Deduplication via run_key: Using run_key with a date suffix prevents duplicate runs for the same quarter on the same day.
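
To illustrate points 2 and 4 together, here is a minimal plain-Python model of the idea: one planned run per quarter, each tagged with a key that combines the quarter and the date. It does not import Dagster and is not the code in this PR. `PlannedRun`, `plan_runs`, `launch_new`, and the `sec-download-...` key format are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedRun:
    """Stand-in for a run request: a dedup key plus the partition to run."""

    run_key: str
    partition_key: str


def plan_runs(quarters: list[str], today: date) -> list[PlannedRun]:
    """Build one planned run per quarter, keyed by quarter and date.

    The key format is an assumption for illustration only.
    """
    return [
        PlannedRun(
            run_key=f"sec-download-{quarter}-{today.isoformat()}",
            partition_key=quarter,
        )
        for quarter in quarters
    ]


def launch_new(planned: list[PlannedRun], seen_run_keys: set[str]) -> list[PlannedRun]:
    """Return only the runs whose key has not been seen, recording them as seen."""
    launched = []
    for run in planned:
        if run.run_key in seen_run_keys:
            continue  # same quarter, same day: already launched
        seen_run_keys.add(run.run_key)
        launched.append(run)
    return launched
```

Because the date is part of the key, re-evaluating the same day yields nothing new, while the next day's evaluation produces fresh keys for the same quarters.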


💡 Suggestions for Improvement

1. Edge Case: Year Boundary Quarter Calculation

Location: robosystems/dagster/jobs/sec.py:147

The quarter calculation logic looks correct, but consider adding a unit test to verify year boundary scenarios:

  • January 1st, 2025 (Q1 start) → should scan 2025-Q1 and 2024-Q4
  • January 4th, 2025 (after 3-day window) → should only scan 2025-Q1

Suggested test cases:

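# Imports assumed from the file location above; datetime.UTC requires Python 3.11+
from datetime import UTC, datetime

from robosystems.dagster.jobs.sec import _get_quarters_to_scan

# Test Q1 start (should include previous year Q4)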
assert _get_quarters_to_scan(datetime(2025, 1, 1, tzinfo=UTC)) == ["2025-Q1", "2024-Q4"]
assert _get_quarters_to_scan(datetime(2025, 1, 3, tzinfo=UTC)) == ["2025-Q1", "2024-Q4"]

# Test after 3-day window
assert _get_quarters_to_scan(datetime(2025, 1, 4, tzinfo=UTC)) == ["2025-Q1"]

# Test other quarters
assert _get_quarters_to_scan(datetime(2025, 4, 1, tzinfo=UTC)) == ["2025-Q2", "2025-Q1"]
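
For reference, a minimal sketch of a function that would satisfy these cases is shown below. It is reconstructed from the behavior described in this review, not copied from the PR, so the name `get_quarters_to_scan` (no leading underscore) and the `overlap_days` parameter are assumptions.

```python
from datetime import datetime, timezone


def get_quarters_to_scan(now: datetime, overlap_days: int = 3) -> list[str]:
    """Return partition keys ("YYYY-QN") to scan, current quarter first.

    Hypothetical reconstruction: during the first `overlap_days` days of a
    quarter, the previous quarter is included to catch late-indexed filings.
    """
    quarter = (now.month - 1) // 3 + 1           # 1..4
    quarter_start_month = (quarter - 1) * 3 + 1  # 1, 4, 7, or 10
    quarters = [f"{now.year}-Q{quarter}"]

    if now.month == quarter_start_month and now.day <= overlap_days:
        if quarter == 1:
            # Year boundary: the quarter before Q1 is Q4 of the previous year
            prev_year, prev_quarter = now.year - 1, 4
        else:
            prev_year, prev_quarter = now.year, quarter - 1
        quarters.append(f"{prev_year}-Q{prev_quarter}")

    return quarters


if __name__ == "__main__":
    print(get_quarters_to_scan(datetime(2025, 1, 2, tzinfo=timezone.utc)))
```

The year rollback is the only branch that differs between Q1 and the other quarters, which is why the January cases deserve their own assertions.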

2. Configuration for 3-Day Window

Location: robosystems/dagster/jobs/sec.py:157

The 3-day window (now.day <= 3) is hardcoded. Consider making this configurable via environment variable for flexibility:

# In robosystems/config/env.py
SEC_QUARTER_OVERLAP_DAYS: int = Field(default=3, ge=0, le=7)

# In sec.py
if now.month == quarter_start_month and now.day <= env.SEC_QUARTER_OVERLAP_DAYS:

This would allow operators to tune the overlap window based on observed EFTS indexing delays without code changes.
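
If the project's config layer is not a convenient place for this, the same bounds can be enforced with the standard library alone. The following is a sketch under that assumption. `get_quarter_overlap_days` is a hypothetical helper, and only the `SEC_QUARTER_OVERLAP_DAYS` name and the 0 to 7 range come from the suggestion above.

```python
import os


def get_quarter_overlap_days(default: int = 3, minimum: int = 0, maximum: int = 7) -> int:
    """Read SEC_QUARTER_OVERLAP_DAYS from the environment with bounds checking.

    Hypothetical helper: falls back to `default` when the variable is unset
    or empty, and raises ValueError for non-integer or out-of-range values.
    """
    raw = os.environ.get("SEC_QUARTER_OVERLAP_DAYS")
    if raw is None or raw.strip() == "":
        return default

    value = int(raw)  # raises ValueError for non-numeric input
    if not minimum <= value <= maximum:
        raise ValueError(
            f"SEC_QUARTER_OVERLAP_DAYS must be between {minimum} and {maximum}, got {value}"
        )
    return value
```

Failing loudly on a bad value is a deliberate choice here: silently clamping could hide a misconfiguration that widens or removes the overlap window.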

3. Logging Enhancement

Location: robosystems/dagster/jobs/sec.py:178

The current log message is good, but consider adding more context for operational visibility:

context.log.info(
    f"Scheduling SEC download for quarters: {quarters} "
    f"(current_date={now.date()}, is_quarter_start={now.month == quarter_start_month and now.day <= 3})"
)

This makes it immediately clear in logs why we're scanning multiple quarters.

4. Type Hint for Context Parameter

Location: robosystems/dagster/jobs/sec.py:171

Add type hint for better IDE support and clarity:

from dagster import ScheduleEvaluationContext

def sec_daily_download_schedule(context: ScheduleEvaluationContext):

🔍 Code Quality Assessment

Architecture: ✅ Excellent

  • Clean separation of concerns (_get_quarters_to_scan is a pure function)
  • Follows Dagster best practices (generator pattern for multi-partition schedules)
  • Maintains backward compatibility

Security: ✅ No issues

  • No user input processing
  • No SQL injection vectors
  • No secret exposure

Performance: ✅ Good

  • Minimal computation (date math only)
  • skip_existing=True prevents redundant downloads
  • Generator pattern prevents loading all quarters into memory

Testing: ⚠️ No tests included

Maintainability: ✅ Excellent

  • Clear variable names and logic flow
  • Good comments explaining business rationale
  • Self-documenting code structure

📋 Verification Checklist

Before merging, verify:

  • Test the schedule behavior on quarter boundaries (e.g., Jan 1-3, Apr 1-3)
  • Confirm logs show expected quarter scanning behavior
  • Validate that skip_existing=True prevents duplicate downloads when re-running same partition
  • Monitor first production run to ensure no unexpected EFTS API errors

🎯 Conclusion

This is a well-executed refactoring that delivers on the PR's promises. The dynamic quarter scanning is implemented correctly with appropriate safeguards. The suggestions above are minor enhancements, not blockers.

Recommendation: ✅ Approve with minor suggestions

The code is production-ready as-is. Consider addressing the testing and configurability suggestions in a follow-up PR if desired.


@jfrench9 jfrench9 merged commit 9f214f9 into main Jan 18, 2026
7 checks passed
@jfrench9 jfrench9 deleted the refactor/improve-sec-daily-schedule branch January 20, 2026 05:27