
feat(core): add optional lifecycle hooks to AbstractCollector #264

Merged
wpak-ai merged 4 commits into cppalliance:develop from snowfox1003:feat/core-collector-lifecycle-hooks
Jun 4, 2026

Conversation

@snowfox1003

@snowfox1003 snowfox1003 commented Jun 4, 2026

Collaborator

Summary

Extend AbstractCollector with optional, backward-compatible lifecycle hooks on _CollectorLifecycleMixin:

  • pre_collect() — setup / incremental state before validation
  • post_collect() — reporting after successful collection
  • on_error(exc) — collector-specific cleanup when run() fails

run() now orchestrates pre_collect → validate_config → collect → post_collect. On failure it calls on_error (with a guard in case the hook itself raises) and then re-raises the original exception. BaseCollectorCommand is unchanged.

BoostMailingListTrackerCollector is migrated as the reference implementation: workspace JSON processing and start-date resolution move to pre_collect; dry-run/success summaries move to post_collect.
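
The lifecycle described in this summary can be sketched roughly as follows. This is a minimal illustration, not the merged code: the names `_CollectorLifecycleMixin`, `AbstractCollector`, `pre_collect`, `validate_config`, `collect`, `post_collect`, `on_error`, and `run` come from the description, while the method bodies, the module-level `logger`, and the `DemoCollector` subclass are assumptions made for the example.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)  # assumed logger; the real module may differ


class _CollectorLifecycleMixin:
    """Optional hooks. Defaults are no-ops so existing collectors keep working."""

    def pre_collect(self) -> None:
        pass

    def post_collect(self) -> None:
        pass

    def on_error(self, exc: Exception) -> None:
        pass


class AbstractCollector(_CollectorLifecycleMixin, ABC):
    def validate_config(self) -> None:  # assumed to be overridable, not abstract
        pass

    @abstractmethod
    def collect(self) -> None: ...

    def run(self) -> None:
        try:
            self.pre_collect()
            self.validate_config()
            self.collect()
            self.post_collect()
        except Exception as exc:
            try:
                self.on_error(exc)
            except Exception:
                # Guard: a failing hook is logged and must not mask the original error.
                logger.exception("on_error hook failed")
            raise  # re-raises the original exception from the failed phase


class DemoCollector(AbstractCollector):  # hypothetical subclass for illustration
    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.calls: list[str] = []

    def pre_collect(self) -> None:
        self.calls.append("pre_collect")

    def validate_config(self) -> None:
        self.calls.append("validate_config")

    def collect(self) -> None:
        self.calls.append("collect")
        if self.fail:
            raise RuntimeError("boom")

    def post_collect(self) -> None:
        self.calls.append("post_collect")

    def on_error(self, exc: Exception) -> None:
        self.calls.append(f"on_error:{exc}")
        raise ValueError("hook failed too")  # exercises the guard in run()
```

In this sketch, a collector that overrides none of the hooks behaves exactly as before, which is what makes the change backward-compatible. When `collect` fails, `post_collect` is skipped and the caller still sees the original `RuntimeError`, even though `on_error` raised as well.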

Apps touched

  • core
  • boost_mailing_list_tracker

Test plan

  • python -m pytest core/tests/test_collectors_base.py -v
  • python -m pytest boost_mailing_list_tracker/tests/test_commands.py -v -k "collector_run or dry_run"
  • uv run pyright (if typed code changed)
  • lint-imports (if imports or cross-app coupling changed)
  • App command smoke-tested (if collector/command changed):
python manage.py run_boost_mailing_list_tracker --dry-run

Docs / coupling

  • cross-app-dependencies.md updated (if FKs or cross-app imports changed)
  • python scripts/generate_service_docs.py run (if services.py or core/protocols.py changed)
  • App README or docs/ updated (if behavior or ops changed)

Closes #256

Summary by CodeRabbit

  • New Features

    • Added pre-run and post-run lifecycle hooks that emit per-run summaries and counters.
  • Bug Fixes

    • Improved lifecycle error handling so on-error hooks run and original exceptions propagate.
    • Collector now preserves workspace message files when persistence fails; skipped/malformed items are tracked and reported.
  • Tests

    • Expanded tests for hook ordering, error paths, and persistence-failure behavior.
  • Chores

    • Added security-related dependency constraint.

@snowfox1003 snowfox1003 self-assigned this Jun 4, 2026
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Walkthrough

Adds pre_collect/post_collect/on_error hooks and reimplements AbstractCollector.run to orchestrate the lifecycle and error handling; expands lifecycle tests; migrates BoostMailingListTracker to use hooks, per-run counters, and a 3-tuple persist result that preserves workspace JSON on DB persist failures; pins aiohttp.

Changes

Collector Lifecycle Hooks and BoostMailingListTracker Migration

| Layer / File(s) | Summary |
|---|---|
| Lifecycle Hook Contract (core/collectors/base_collector.py) | Adds _CollectorLifecycleMixin default hooks pre_collect(), post_collect(), on_error(exc) and updates documentation. |
| AbstractCollector Lifecycle Orchestration (core/collectors/base_collector.py) | Reimplements AbstractCollector.run to execute pre_collect → validate_config → collect → post_collect; calls on_error(exc) on failures and logs if on_error itself fails. |
| Comprehensive lifecycle and error-path tests (core/tests/test_collectors_base.py) | Adds tests asserting hook ordering, default no-ops, phase failure handling, on_error semantics, and command error interactions. |
| BoostMailingListTracker lifecycle migration (boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py) | Moves workspace backfill and start-date resolution into pre_collect(), initializes per-run counters in collect(), emits dry-run/final summaries in post_collect(), and changes _persist_email() to return (created, skipped, persist_failed) and retain JSON on persist failures. |
| BoostMailingListTracker tests (boost_mailing_list_tracker/tests/test_commands.py) | Updates tests to unpack the new _persist_email 3-tuple and adds tests verifying JSON retention when persistence fails. |
| Dependency constraint (requirements.in) | Pins aiohttp>=3.14.0,<4 with a CVE comment. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • wpak-ai

Poem

🐰 I hopped through hooks and logs with glee,
pre, collect, post — a tidy spree,
Counters tallied, JSON kept in place,
Errors caught with a careful trace,
I nibble tests and dance, then flee 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.98% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding optional lifecycle hooks to AbstractCollector in the core app. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | PR description covers the summary, apps touched, test plan (with most items checked), and documentation checklist, but some test and documentation items remain pending. |


@snowfox1003

Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py (1)

236-251: ⚡ Quick win

Initialize lifecycle state in __init__ to avoid implicit hook coupling.

_resolved_start_date and the counters are created in later hooks, which makes state validity depend on call order and can trip static checks. Initialize them once in __init__ with safe defaults.

Proposed change
 class BoostMailingListTrackerCollector(AbstractCollector):
@@
     def __init__(
@@
     ) -> None:
         self.stdout = stdout
         self.style = style
         self.start_date = start_date
         self.end_date = end_date
         self.dry_run = dry_run
         self.pinecone_app_type = pinecone_app_type
         self.pinecone_namespace = pinecone_namespace
+        self._resolved_start_date = start_date
+        self._fetched_email_count = 0
+        self._created_count = 0
+        self._skipped_count = 0

Also applies to: 310-330
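
To illustrate the ordering dependency this nitpick describes, here is a minimal sketch using two hypothetical stand-in classes, `LazyState` and `EagerState`. Neither is the real collector; only the `_created_count` attribute name and the `collect`/`post_collect` hook names are taken from the review.

```python
class LazyState:
    """Counter is created only inside collect(), so other hooks depend on call order."""

    def collect(self) -> None:
        self._created_count = 0

    def post_collect(self) -> str:
        # Raises AttributeError if collect() never ran (e.g. an earlier phase failed).
        return f"created={self._created_count}"


class EagerState:
    """Counter gets a safe default in __init__, as the review suggests."""

    def __init__(self) -> None:
        self._created_count = 0

    def collect(self) -> None:
        self._created_count += 1

    def post_collect(self) -> str:
        return f"created={self._created_count}"
```

With `EagerState`, any hook can read the counter regardless of which phases have run, and static checkers can see the attribute declared in one place.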

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`
around lines 236 - 251, The class leaves lifecycle state like
_resolved_start_date and counters (_fetched_email_count, _created_count,
_skipped_count) uninitialized until later hooks (collect and others), causing
implicit hook-coupling and static-check failures; initialize these attributes in
__init__ with safe defaults (e.g., None or empty string for _resolved_start_date
and 0 for the counters) so any method (including collect and the code at lines
~310-330) can rely on them being present and avoid ordering bugs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 322-324: The final summary string incorrectly describes skipped
reasons; update the message built in the run command that uses
self._skipped_count (near the completion log in
run_boost_mailing_list_tracker.py) so it lists all actual skip causes—e.g.
"already existed, empty, malformed/invalid payloads, or missing msg_id"—instead
of only "already existed or empty"; ensure the log still uses
self._skipped_count and keep phrasing concise and grammatically correct.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e6d8863-c1e0-4873-8ddc-35a785921c2d

📥 Commits

Reviewing files that changed from the base of the PR and between c9b6bbf and 0e2b508.

📒 Files selected for processing (3)
  • boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
  • core/collectors/base_collector.py
  • core/tests/test_collectors_base.py

…ess CVE-2026-34993 and CVE-2026-47265; modify success message in BoostMailingListTracker for clarity
@snowfox1003

Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 289-294: _persist_email currently sets skipped=True for both
duplicates and exception paths, causing _process_existing_workspace_json to
unlink json_path on DB write failures; update _persist_email to return a
distinct persist_failed (or persist_failed=True) flag when an exception occurs
(instead of using skipped), and adjust _process_existing_workspace_json's
handling of the tuple from _persist_email (check was_created, skipped,
persist_failed) so that json_path.unlink(missing_ok=True) is only called when
the message was successfully persisted or explicitly skipped as a duplicate, not
when persist_failed is True; also ensure any counters
(_created_count/_skipped_count) are only updated for the corresponding true
conditions.
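
The distinction this comment asks for can be sketched as below. It is an illustration under stated assumptions rather than the project's code: `PersistResult`, `persist_email`, `process_workspace_json`, the in-memory `seen` set, and the `save` callback are hypothetical stand-ins for `_persist_email`, `_process_existing_workspace_json`, and the real database layer. Only the (created, skipped, persist_failed) shape and the `json_path.unlink(missing_ok=True)` call come from the review.

```python
from pathlib import Path
from typing import Callable, NamedTuple


class PersistResult(NamedTuple):  # hypothetical name; the PR returns a plain 3-tuple
    created: bool
    skipped: bool
    persist_failed: bool


def persist_email(
    msg_id: str, seen: set[str], save: Callable[[str], None]
) -> PersistResult:
    """Stand-in for _persist_email; `seen` and `save` replace the real DB layer."""
    if msg_id in seen:
        return PersistResult(False, True, False)  # duplicate: safe to discard
    try:
        save(msg_id)
    except Exception:
        return PersistResult(False, False, True)  # write failed: not a skip
    seen.add(msg_id)
    return PersistResult(True, False, False)


def process_workspace_json(
    json_path: Path, msg_id: str, seen: set[str], save: Callable[[str], None]
) -> PersistResult:
    """Delete the workspace file only when the message was persisted or was a duplicate."""
    result = persist_email(msg_id, seen, save)
    if not result.persist_failed:
        json_path.unlink(missing_ok=True)
    return result
```

Keeping the file when `persist_failed` is true means a later run can retry the same message, which is the behaviour the comment is trying to protect.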

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca010466-0548-43a9-80d8-26ca596935cb

📥 Commits

Reviewing files that changed from the base of the PR and between c9b6bbf and fa33edc.

⛔ Files ignored due to path filters (2)
  • requirements-dev.lock is excluded by !**/*.lock
  • requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
  • core/collectors/base_collector.py
  • core/tests/test_collectors_base.py
  • requirements.in

Comment thread boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py Outdated
@wpak-ai wpak-ai merged commit 7cc0ed6 into cppalliance:develop Jun 4, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

Multi-phase orchestration hooks for AbstractCollector

3 participants