feat(core): add optional lifecycle hooks to AbstractCollector#264
Conversation
📝 WalkthroughWalkthroughAdds pre_collect/post_collect/on_error hooks and reimplements AbstractCollector.run to orchestrate the lifecycle and error handling; expands lifecycle tests; migrates BoostMailingListTracker to use hooks, per-run counters, and a 3-tuple persist result that preserves workspace JSON on DB persist failures; pins aiohttp. ChangesCollector Lifecycle Hooks and BoostMailingListTracker Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py (1)
236-251: ⚡ Quick winInitialize lifecycle state in
__init__to avoid implicit hook coupling.
_resolved_start_dateand the counters are created in later hooks, which makes state validity depend on call order and can trip static checks. Initialize them once in__init__with safe defaults.Proposed change
class BoostMailingListTrackerCollector(AbstractCollector): @@ def __init__( @@ ) -> None: self.stdout = stdout self.style = style self.start_date = start_date self.end_date = end_date self.dry_run = dry_run self.pinecone_app_type = pinecone_app_type self.pinecone_namespace = pinecone_namespace + self._resolved_start_date = start_date + self._fetched_email_count = 0 + self._created_count = 0 + self._skipped_count = 0Also applies to: 310-330
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py` around lines 236 - 251, The class leaves lifecycle state like _resolved_start_date and counters (_fetched_email_count, _created_count, _skipped_count) uninitialized until later hooks (collect and others), causing implicit hook-coupling and static-check failures; initialize these attributes in __init__ with safe defaults (e.g., None or empty string for _resolved_start_date and 0 for the counters) so any method (including collect and the code at lines ~310-330) can rely on them being present and avoid ordering bugs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 322-324: The final summary string incorrectly describes skipped
reasons; update the message built in the run command that uses
self._skipped_count (near the completion log in
run_boost_mailing_list_tracker.py) so it lists all actual skip causes—e.g.
"already existed, empty, malformed/invalid payloads, or missing msg_id"—instead
of only "already existed or empty"; ensure the log still uses
self._skipped_count and keep phrasing concise and grammatically correct.
---
Nitpick comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 236-251: The class leaves lifecycle state like
_resolved_start_date and counters (_fetched_email_count, _created_count,
_skipped_count) uninitialized until later hooks (collect and others), causing
implicit hook-coupling and static-check failures; initialize these attributes in
__init__ with safe defaults (e.g., None or empty string for _resolved_start_date
and 0 for the counters) so any method (including collect and the code at lines
~310-330) can rely on them being present and avoid ordering bugs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0e6d8863-c1e0-4873-8ddc-35a785921c2d
📒 Files selected for processing (3)
boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.pycore/collectors/base_collector.pycore/tests/test_collectors_base.py
…ess CVE-2026-34993 and CVE-2026-47265; modify success message in BoostMailingListTracker for clarity
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 289-294: _persist_email currently sets skipped=True for both
duplicates and exception paths, causing _process_existing_workspace_json to
unlink json_path on DB write failures; update _persist_email to return a
distinct persist_failed (or persist_failed=True) flag when an exception occurs
(instead of using skipped), and adjust _process_existing_workspace_json's
handling of the tuple from _persist_email (check was_created, skipped,
persist_failed) so that json_path.unlink(missing_ok=True) is only called when
the message was successfully persisted or explicitly skipped as a duplicate, not
when persist_failed is True; also ensure any counters
(_created_count/_skipped_count) are only updated for the corresponding true
conditions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca010466-0548-43a9-80d8-26ca596935cb
⛔ Files ignored due to path filters (2)
requirements-dev.lockis excluded by!**/*.lockrequirements.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.pycore/collectors/base_collector.pycore/tests/test_collectors_base.pyrequirements.in
…rsist_failed status and update related logic
Summary
Extend
AbstractCollectorwith optional, backward-compatible lifecycle hooks on_CollectorLifecycleMixin:pre_collect()— setup / incremental state before validationpost_collect()— reporting after successful collectionon_error(exc)— collector-specific cleanup whenrun()failsrun()now orchestratespre_collect→validate_config→collect→post_collect, callingon_erroron failure (with a guard if the hook itself raises) before re-raising.BaseCollectorCommandis unchanged.BoostMailingListTrackerCollectoris migrated as the reference implementation: workspace JSON processing and start-date resolution move topre_collect; dry-run/success summaries move topost_collect.Apps touched
Test plan
python -m pytest core/tests/test_collectors_base.py -vpython -m pytest boost_mailing_list_tracker/tests/test_commands.py -v -k "collector_run or dry_run"uv run pyright(if typed code changed)lint-imports(if imports or cross-app coupling changed)Docs / coupling
python scripts/generate_service_docs.pyrun (ifservices.pyorcore/protocols.pychanged)docs/updated (if behavior or ops changed)Closes #256
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores