feat(core): add optional lifecycle hooks to AbstractCollector by snowfox1003 · Pull Request #264 · cppalliance/boost-data-collector

snowfox1003 · 2026-06-04T15:15:24Z

Summary

Extend AbstractCollector with optional, backward-compatible lifecycle hooks on _CollectorLifecycleMixin:

- pre_collect(): setup / incremental state before validation
- post_collect(): reporting after successful collection
- on_error(exc): collector-specific cleanup when run() fails

run() now orchestrates pre_collect → validate_config → collect → post_collect. On failure it calls on_error (guarded in case the hook itself raises) and then re-raises the original exception. BaseCollectorCommand is unchanged.

BoostMailingListTrackerCollector is migrated as the reference implementation: workspace JSON processing and start-date resolution move to pre_collect; dry-run/success summaries move to post_collect.
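
For readers who want the control flow at a glance, the orchestration described above might look roughly like the sketch below. It is an illustration, not the code in core/collectors/base_collector.py: only the hook names and their order come from this description, while the abstract-method layout, the logger, and the `DemoCollector` subclass are assumptions made for the example.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


class _CollectorLifecycleMixin:
    """Default no-op hooks, so existing collectors keep working unchanged."""

    def pre_collect(self) -> None:
        pass

    def post_collect(self) -> None:
        pass

    def on_error(self, exc: BaseException) -> None:
        pass


class AbstractCollector(_CollectorLifecycleMixin, ABC):
    @abstractmethod
    def validate_config(self) -> None: ...

    @abstractmethod
    def collect(self) -> None: ...

    def run(self) -> None:
        try:
            self.pre_collect()
            self.validate_config()
            self.collect()
            self.post_collect()
        except Exception as exc:
            try:
                self.on_error(exc)
            except Exception:
                # Guard: a failing hook is logged and must not mask the original error.
                logger.exception("on_error hook failed")
            raise  # re-raise the original exception


class DemoCollector(AbstractCollector):
    """Hypothetical collector that records the order in which phases run."""

    def __init__(self, fail: bool = False) -> None:
        self.calls = []
        self.fail = fail

    def pre_collect(self) -> None:
        self.calls.append("pre_collect")

    def validate_config(self) -> None:
        self.calls.append("validate_config")

    def collect(self) -> None:
        self.calls.append("collect")
        if self.fail:
            raise RuntimeError("boom")

    def post_collect(self) -> None:
        self.calls.append("post_collect")

    def on_error(self, exc: BaseException) -> None:
        self.calls.append(f"on_error:{exc}")
```

In this sketch a failure in any phase skips the remaining phases, so post_collect only runs after a fully successful collection.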

Apps touched

core
boost_mailing_list_tracker

Test plan

python -m pytest core/tests/test_collectors_base.py -v
python -m pytest boost_mailing_list_tracker/tests/test_commands.py -v -k "collector_run or dry_run"
uv run pyright (if typed code changed)
lint-imports (if imports or cross-app coupling changed)
App command smoke-tested (if collector/command changed):

python manage.py run_boost_mailing_list_tracker --dry-run

Docs / coupling

cross-app-dependencies.md updated (if FKs or cross-app imports changed)
python scripts/generate_service_docs.py run (if services.py or core/protocols.py changed)
App README or docs/ updated (if behavior or ops changed)

Closes #256

Summary by CodeRabbit

New Features
- Added pre-run and post-run lifecycle hooks that emit per-run summaries and counters.
Bug Fixes
- Improved lifecycle error handling so on-error hooks run and original exceptions propagate.
- Collector now preserves workspace message files when persistence fails; skipped/malformed items are tracked and reported.
Tests
- Expanded tests for hook ordering, error paths, and persistence-failure behavior.
Chores
- Added security-related dependency constraint.

coderabbitai · 2026-06-04T15:15:37Z

📝 Walkthrough

Adds pre_collect/post_collect/on_error hooks and reimplements AbstractCollector.run to orchestrate the lifecycle and error handling; expands lifecycle tests; migrates BoostMailingListTracker to use hooks, per-run counters, and a 3-tuple persist result that preserves workspace JSON on DB persist failures; pins aiohttp.

Changes

Collector Lifecycle Hooks and BoostMailingListTracker Migration

| Layer / File(s) | Summary |
| --- | --- |
| Lifecycle Hook Contract: `core/collectors/base_collector.py` | Adds `_CollectorLifecycleMixin` default hooks `pre_collect()`, `post_collect()`, `on_error(exc)` and updates documentation. |
| AbstractCollector Lifecycle Orchestration: `core/collectors/base_collector.py` | Reimplements `AbstractCollector.run` to execute `pre_collect → validate_config → collect → post_collect`; calls `on_error(exc)` on failures and logs if `on_error` itself fails. |
| Comprehensive lifecycle and error-path tests: `core/tests/test_collectors_base.py` | Adds tests asserting hook ordering, default no-ops, phase failure handling, `on_error` semantics, and command error interactions. |
| BoostMailingListTracker lifecycle migration: `boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py` | Moves workspace backfill and start-date resolution into `pre_collect()`, initializes per-run counters in `collect()`, emits dry-run/final summaries in `post_collect()`, and changes `_persist_email()` to return `(created, skipped, persist_failed)` and retain JSON on persist failures. |
| BoostMailingListTracker tests: `boost_mailing_list_tracker/tests/test_commands.py` | Updates tests to unpack the new `_persist_email` 3-tuple and adds tests verifying JSON retention when persistence fails. |
| Dependency constraint: `requirements.in` | Pins `aiohttp>=3.14.0,<4` with a CVE comment. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

cppalliance/boost-data-collector#169: Tests and behavior updates touching _persist_email and workspace JSON retention, closely related to this change.

Suggested reviewers

wpak-ai

Poem

🐰 I hopped through hooks and logs with glee,
pre, collect, post — a tidy spree,
Counters tallied, JSON kept in place,
Errors caught with a careful trace,
I nibble tests and dance, then flee 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.98% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding optional lifecycle hooks to AbstractCollector in the core app. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | PR description covers the summary, apps touched, test plan (with most items checked), and documentation checklist, but some test and documentation items remain pending. |

snowfox1003 · 2026-06-04T15:23:10Z

@coderabbitai full review

coderabbitai · 2026-06-04T15:23:16Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py (1)

236-251: ⚡ Quick win

Initialize lifecycle state in __init__ to avoid implicit hook coupling.

_resolved_start_date and the counters are created in later hooks, which makes state validity depend on call order and can trip static checks. Initialize them once in __init__ with safe defaults.

Proposed change

 class BoostMailingListTrackerCollector(AbstractCollector):
@@
     def __init__(
@@
     ) -> None:
         self.stdout = stdout
         self.style = style
         self.start_date = start_date
         self.end_date = end_date
         self.dry_run = dry_run
         self.pinecone_app_type = pinecone_app_type
         self.pinecone_namespace = pinecone_namespace
+        self._resolved_start_date = start_date
+        self._fetched_email_count = 0
+        self._created_count = 0
+        self._skipped_count = 0

Also applies to: 310-330
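
To make the ordering concern concrete, here is a small standalone sketch of the suggested pattern. `TrackerState` is a hypothetical stand-in for the collector, and its `pre_collect` fallback date and `summary` helper are invented for illustration; only the four attribute names come from the proposed change.

```python
from typing import Optional


class TrackerState:
    """Hypothetical stand-in: all lifecycle state exists from construction."""

    def __init__(self, start_date: Optional[str] = None) -> None:
        self.start_date = start_date
        # Safe defaults, so no method depends on another hook having run first.
        self._resolved_start_date = start_date
        self._fetched_email_count = 0
        self._created_count = 0
        self._skipped_count = 0

    def pre_collect(self) -> None:
        # Assumed behaviour: fall back to a placeholder date when none was given.
        if self._resolved_start_date is None:
            self._resolved_start_date = "1970-01-01"

    def summary(self) -> str:
        return (
            f"start={self._resolved_start_date} "
            f"fetched={self._fetched_email_count} "
            f"created={self._created_count} "
            f"skipped={self._skipped_count}"
        )
```

With the defaults in place, `summary()` is safe to call even if `pre_collect()` never ran, which is the situation the nitpick is guarding against.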

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`
around lines 236 - 251, The class leaves lifecycle state like
_resolved_start_date and counters (_fetched_email_count, _created_count,
_skipped_count) uninitialized until later hooks (collect and others), causing
implicit hook-coupling and static-check failures; initialize these attributes in
__init__ with safe defaults (e.g., None or empty string for _resolved_start_date
and 0 for the counters) so any method (including collect and the code at lines
~310-330) can rely on them being present and avoid ordering bugs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 322-324: The final summary string incorrectly describes skipped
reasons; update the message built in the run command that uses
self._skipped_count (near the completion log in
run_boost_mailing_list_tracker.py) so it lists all actual skip causes—e.g.
"already existed, empty, malformed/invalid payloads, or missing msg_id"—instead
of only "already existed or empty"; ensure the log still uses
self._skipped_count and keep phrasing concise and grammatically correct.
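
As an illustration of the kind of message this comment asks for, a summary builder could look like the sketch below. The function name and exact wording are assumptions, not the text that was merged; only the list of skip causes is taken from the comment.

```python
def format_completion_summary(created: int, skipped: int) -> str:
    """Hypothetical helper: name every skip cause, not just duplicates and empties."""
    return (
        f"Done: {created} created, {skipped} skipped "
        "(already existed, empty, malformed/invalid payload, or missing msg_id)."
    )
```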


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e6d8863-c1e0-4873-8ddc-35a785921c2d

📥 Commits

Reviewing files that changed from the base of the PR and between c9b6bbf and 0e2b508.

📒 Files selected for processing (3)

boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
core/collectors/base_collector.py
core/tests/test_collectors_base.py

snowfox1003 · 2026-06-04T15:50:33Z

@coderabbitai full review

coderabbitai · 2026-06-04T15:50:39Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 289-294: _persist_email currently sets skipped=True for both
duplicates and exception paths, causing _process_existing_workspace_json to
unlink json_path on DB write failures; update _persist_email to return a
distinct persist_failed (or persist_failed=True) flag when an exception occurs
(instead of using skipped), and adjust _process_existing_workspace_json's
handling of the tuple from _persist_email (check was_created, skipped,
persist_failed) so that json_path.unlink(missing_ok=True) is only called when
the message was successfully persisted or explicitly skipped as a duplicate, not
when persist_failed is True; also ensure any counters
(_created_count/_skipped_count) are only updated for the corresponding true
conditions.
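
The distinction requested here can be sketched as follows. This is a simplified model under stated assumptions, not the project's `_persist_email`: the `save` callable stands in for the database write (returning True for a new row, False for a duplicate, and raising on failure), and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Tuple

# (created, skipped, persist_failed)
PersistResult = Tuple[bool, bool, bool]


def persist_email(payload: dict, save: Callable[[dict], bool]) -> PersistResult:
    """Report a DB failure separately instead of folding it into 'skipped'."""
    try:
        created = save(payload)
    except Exception:
        return (False, False, True)
    return (created, not created, False)


def process_workspace_json(json_path: Path, save: Callable[[dict], bool]) -> PersistResult:
    payload = json.loads(json_path.read_text(encoding="utf-8"))
    result = persist_email(payload, save)
    _created, _skipped, persist_failed = result
    if not persist_failed:
        # Remove the file only once the message is stored or is a known duplicate.
        json_path.unlink(missing_ok=True)
    return result
```

Keeping the file on `persist_failed` means a later run can retry the same message rather than losing it.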

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca010466-0548-43a9-80d8-26ca596935cb

📥 Commits

Reviewing files that changed from the base of the PR and between c9b6bbf and fa33edc.

⛔ Files ignored due to path filters (2)

requirements-dev.lock is excluded by !**/*.lock
requirements.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
core/collectors/base_collector.py
core/tests/test_collectors_base.py
requirements.in

snowfox1003 self-assigned this Jun 4, 2026

snowfox1003 requested review from jonathanMLDev and wpak-ai as code owners June 4, 2026 15:15

refactor(collector): implement lifecycle hooks for pre- and post-collection processes

0e2b508

coderabbitai Bot reviewed Jun 4, 2026

Comment thread boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py

chore: update aiohttp to version 3.14.0 in requirements files to address CVE-2026-34993 and CVE-2026-47265; modify success message in BoostMailingListTracker for clarity

fa33edc

coderabbitai Bot reviewed Jun 4, 2026

Comment thread boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py Outdated

snowfox1003 and others added 2 commits June 4, 2026 12:07

refactor(persist_email): enhance _persist_email function to return persist_failed status and update related logic

5dd0b0d

Merge branch 'develop' into feat/core-collector-lifecycle-hooks

14756a7

jonathanMLDev approved these changes Jun 4, 2026

Comment thread boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py

wpak-ai approved these changes Jun 4, 2026

wpak-ai merged commit 7cc0ed6 into cppalliance:develop Jun 4, 2026
6 checks passed

wpak-ai merged 4 commits into cppalliance:develop from snowfox1003:feat/core-collector-lifecycle-hooks

3 participants
