feat(core): require TrackerResult from collectors and migrate in-repo trackers#279
Conversation
…ibrariesResult DTOs
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughCollectors and commands now return protocol-conformant TrackerResult (and optional IncrementalState) objects; core lifecycle enforces result validation, backfills duration, persists incremental state, app-specific frozen DTOs added/expanded, tests updated for runtime conformance, and docs/scaffold adjusted. ChangesCore Protocol & Framework Integration
Tracker App Protocol Implementations
Test Infrastructure & Documentation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
…tor_protocol_conformance.py
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
discord_activity_tracker/management/commands/run_discord_activity_tracker.py (1)
170-180:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
channelscount is wrong in full-sync mode.When no
--channelsallowlist is set, the command can process many channels but returns"channels": 0. This makesTrackerResult.countsinaccurate.Suggested fix
def task_discord_sync( @@ -) -> int: +) -> tuple[int, int]: @@ - processed_total = 0 + processed_total = 0 + processed_channels = 0 @@ count = asyncio.run( collector._persist_channel(guild_info, channel_info, messages) ) processed_total += count + processed_channels += 1 @@ - return processed_total + return processed_total, processed_channels @@ - messages_synced = task_discord_sync( + messages_synced, channels_synced = task_discord_sync( @@ counts={ "messages": messages_synced, - "channels": ( - len(collector.channel_ids) if collector.channel_ids else 0 - ), + "channels": channels_synced, }, )Also applies to: 224-277, 641-672
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/management/commands/run_discord_activity_tracker.py` around lines 170 - 180, The reported bug is that the "channels" count in the result is set to 0 when no --channels allowlist is provided (full-sync); inside task_discord_sync you must compute and return the actual number of channels processed rather than using len(channel_ids) which is 0 for full-sync. Fix by deriving the count from the actual channels iterated/processed (e.g. use the collector's resolved channel list or the processed_channels collection produced by collector.run) and assign that value to the "channels" field returned in TrackerResult.counts; update the same logic in the other affected blocks (the other task_discord_sync usages at the noted ranges) so they all use the real processed channel list instead of channel_ids.discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py (1)
136-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate per-file failures into
TrackerResultinstead of always returning success.
collect()catches file-level exceptions, but the final DTO still reportssuccess=Truewith noerrors. That misreports failed runs as successful and weakens the new structured outcome contract.💡 Suggested fix
@@ - processed_total = 0 + processed_total = 0 + failures: list[str] = [] @@ except Exception as exc: rel = _json_display_path(import_dir, json_path) logger.error("Failed to process %s: %s", rel, exc) self.stdout.write(self.style.ERROR(f" Failed {rel}: {exc}")) + failures.append(f"{rel}: {exc}") @@ - return DiscordCollectionTrackerResult( - success=True, - counts={"messages": processed_total, "files": len(json_files)}, - ) + return DiscordCollectionTrackerResult( + success=not failures, + counts={ + "messages": processed_total, + "files": len(json_files), + "failed_files": len(failures), + }, + errors=tuple(failures), + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py` around lines 136 - 150, The current collect() implementation logs per-file exceptions but always returns DiscordCollectionTrackerResult(success=True, counts=...), which hides failures; modify collect() to accumulate per-file errors (e.g., create an errors = [] list and append a structured entry containing rel (from _json_display_path(import_dir, json_path)) and exc inside the except block where logger.error and stdout.write are called), then set success = len(errors) == 0 and include errors in the returned DiscordCollectionTrackerResult (e.g., DiscordCollectionTrackerResult(success=success, counts={"messages": processed_total, "files": len(json_files)}, errors=errors)); keep existing logging/stdout behavior but ensure the DTO reflects any failures.
🧹 Nitpick comments (1)
cppa_slack_tracker/management/commands/run_cppa_slack_tracker.py (1)
219-234: 💤 Low valueMissing error count tracking for channel memberships.
sync_channel_usersreturns(success_count, error_count)(line 223), but unlike_sync_users,_sync_channels, and_sync_messages, theerror_countis not accumulated intoself._counts. This results in inconsistent error tracking in the finalSlackTrackerResult.Suggested fix
self._counts["channel_memberships"] = ( self._counts.get("channel_memberships", 0) + success_count ) + self._counts["channel_membership_errors"] = ( + self._counts.get("channel_membership_errors", 0) + error_count + ) logger.info(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cppa_slack_tracker/management/commands/run_cppa_slack_tracker.py` around lines 219 - 234, The _sync_channel_users method is not accumulating error_count into self._counts for "channel_memberships"; update _sync_channel_users (which calls sync_channel_users) to increment self._counts["channel_memberships_errors"] (or follow the existing error key pattern used elsewhere) by error_count after the call, mirroring how success_count is added to self._counts so the final SlackTrackerResult includes channel membership errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@boost_library_docs_tracker/protocol_impl.py`:
- Around line 13-16: LibraryDocsTrackerResult is frozen but its counts field is
a plain dict created in from_run, allowing mutation; update the from_run
constructor to defensively copy counts and wrap them in an immutable mapping
(e.g., types.MappingProxyType(dict(copy)) or an equivalent immutable/frozendict)
before assigning to the counts field so the frozen dataclass is deep-immutable
for that attribute, and keep the declared type Mapping[str,int] unchanged;
reference LibraryDocsTrackerResult and its from_run factory to locate and modify
the creation/assignment of counts.
In `@boost_library_tracker/protocol_impl.py`:
- Around line 13-17: CollectBoostLibrariesResult is frozen but still accepts
mutable dicts for counts via from_totals() and empty(), allowing external
mutation; fix by making counts truly immutable by wrapping any dict passed into
counts with types.MappingProxyType. Update from_totals() and empty() to import
types.MappingProxyType and construct counts as MappingProxyType(totals_dict) (or
MappingProxyType({}) for empty) so the frozen dataclass cannot be mutated at
runtime; adjust any other places that construct CollectBoostLibrariesResult to
perform the same wrapping.
In `@core/collectors/base_collector.py`:
- Around line 94-97: The code currently trusts the value in
self._incremental_state_out and calls self.persist_incremental_state(state_out)
without validation; update the lifecycle boundary after load_incremental_state
(the place where _incremental_state_out is read) to validate the
incremental-state shape/type before persisting: call the same validation logic
you use for TrackerResult (or implement a small validator) to ensure required
keys/types/serializability, and if validation fails either raise a clear error
or log and skip persisting. Specifically modify the block reading
self._incremental_state_out (and the analogous spot around line 237) to run
validate_incremental_state(state_out) (or inline checks) and only call
persist_incremental_state(state_out) when valid, otherwise handle the error path
consistently.
- Around line 34-36: The run method on CollectorRunnable currently has a
nullable return (def run(self) -> TrackerResult | None) which allows
implementations to return None and undermines the new contract; change the
signature to def run(self) -> TrackerResult (remove | None) in base_collector.py
(CollectorRunnable / AbstractCollector), update any subclass implementations of
CollectorRunnable.run (and helper functions that construct or call run) to
always return a TrackerResult instead of None, and update import/type hints
accordingly so callers and implementations type-check against the non-nullable
TrackerResult contract.
In `@core/incremental_state.py`:
- Around line 9-15: GenericIncrementalState currently accepts a plain dict via
the field(default_factory=dict) so extras remains mutable; change it to an
immutable mapping by wrapping the provided mapping in types.MappingProxyType in
a __post_init__ (use object.__setattr__ because the dataclass is frozen).
Specifically, add import MappingProxyType, implement
GenericIncrementalState.__post_init__ that replaces self.extras with
MappingProxyType(dict(self.extras)) (or MappingProxyType(self.extras) after
copying) to ensure extras is read-only after construction and preserve the
frozen DTO contract.
In `@core/tests/test_collector_protocol_conformance.py`:
- Around line 34-37: The test module unconditionally imports
Wg21ReflectorIncrementalState and Wg21ReflectorTrackerResult causing pytest
collection to fail when wg21_reflector_collector is absent; wrap the import in a
try/except ImportError and set a boolean flag (e.g. HAS_WG21_REFLECTOR) so the
rest of the file still loads, then make the reflector-specific parametrized
cases conditional (either skip them with pytest.mark.skipif(not
HAS_WG21_REFLECTOR) or only append those params when HAS_WG21_REFLECTOR is True)
so references to Wg21ReflectorIncrementalState and Wg21ReflectorTrackerResult
are only used when the package is available.
In `@core/tracker_result.py`:
- Around line 12-19: GenericTrackerResult is frozen but its counts field can
still hold a mutable Mapping; make counts truly immutable by converting/copying
it into an immutable mapping in the dataclass post-init. Add a __post_init__ on
GenericTrackerResult that uses types.MappingProxyType(dict(self.counts)) (or
similar defensive copy-to-immutable) and assign it with object.__setattr__(self,
"counts", <mapping-proxy>) so callers passing mutable dicts cannot mutate the
stored counts; ensure you import types.MappingProxyType (or chosen immutable
wrapper). This change preserves the frozen dataclass semantics and fixes the
mutability for GenericTrackerResult.counts (also apply same pattern where the
same pattern is used).
- Around line 46-47: with_duration_if_missing currently unconditionally calls
dataclasses.replace(result, duration_seconds=...), which fails for dataclass
fields that are not init=True or are property-backed; update the function to
first inspect dataclasses.fields(result) for a field named "duration_seconds"
and only call replace if that field exists and field.init is True, otherwise
attempt to set the attribute directly via setattr(result, "duration_seconds",
duration_seconds) if the instance is mutable (i.e., not frozen), and if neither
approach works return the original result unchanged; reference the function
with_duration_if_missing, the variable result, and
dataclasses.replace/dataclasses.fields in your change.
In `@cppa_pinecone_sync/management/commands/run_cppa_pinecone_sync.py`:
- Around line 89-93: The completion log is using the wrong metric key: the
logging call that formats "CPPA Pinecone Sync completed: upserted=%s, total=%s,
failed_count=%s" is reading result.counts.get("errors", 0) instead of the
intended failed_count metric; update the logging call to use
result.counts.get("failed_count", 0) (locate the logger call that references
result.counts and change the key), ensuring the upserted/total/failed_count
values reflect the correct keys.
In `@cppa_slack_tracker/protocol_impl.py`:
- Around line 13-16: SlackTrackerResult.counts and SlackIncrementalState.extras
are currently mutable dicts despite dataclass(frozen=True); update construction
sites (SlackTrackerResult.from_counts, SlackTrackerResult.dry_run,
SlackIncrementalState.from_team and any default_factory=dict) to wrap/replace
dicts with an immutable mapping (e.g., types.MappingProxyType or a
frozendict-equivalent) before assigning so their contents cannot be mutated
post-instantiation; ensure the dataclass fields remain typed as Mapping[str,int]
/ Mapping[str,Any] and replace any default_factory=dict with an immutable
default (or None plus a wrapper) to preserve frozen semantics.
In `@cppa_youtube_script_tracker/protocol_impl.py`:
- Around line 13-17: YoutubeScriptTrackerResult is frozen but its counts field
can be a mutable dict; wrap counts in an immutable mapping to prevent mutation
by callers. In the YoutubeScriptTrackerResult dataclass add a __post_init__ that
imports types.MappingProxyType and uses object.__setattr__(self, "counts",
MappingProxyType(dict(self.counts))) (or MappingProxyType(self.counts) after
copying) so counts becomes an immutable mapping; reference the
YoutubeScriptTrackerResult class and the counts attribute when making this
change.
In `@wg21_paper_tracker/protocol_impl.py`:
- Around line 16-19: Wg21PaperTrackerResult's counts is currently a mutable dict
allowing post-creation mutation; wrap all dict literals assigned to counts in an
immutable mapping (e.g., types.MappingProxyType) and ensure the default factory
produces an immutable mapping as well. Concretely, import types and replace any
assignments in from_pipeline() and dry_run() like {"new_papers": n} or {} with
types.MappingProxyType({"new_papers": n}) / types.MappingProxyType({}), and
change the dataclass field default_factory for counts to a lambda returning
types.MappingProxyType({}) so counts is deeply immutable after construction.
---
Outside diff comments:
In
`@discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py`:
- Around line 136-150: The current collect() implementation logs per-file
exceptions but always returns DiscordCollectionTrackerResult(success=True,
counts=...), which hides failures; modify collect() to accumulate per-file
errors (e.g., create an errors = [] list and append a structured entry
containing rel (from _json_display_path(import_dir, json_path)) and exc inside
the except block where logger.error and stdout.write are called), then set
success = len(errors) == 0 and include errors in the returned
DiscordCollectionTrackerResult (e.g.,
DiscordCollectionTrackerResult(success=success, counts={"messages":
processed_total, "files": len(json_files)}, errors=errors)); keep existing
logging/stdout behavior but ensure the DTO reflects any failures.
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 170-180: The reported bug is that the "channels" count in the
result is set to 0 when no --channels allowlist is provided (full-sync); inside
task_discord_sync you must compute and return the actual number of channels
processed rather than using len(channel_ids) which is 0 for full-sync. Fix by
deriving the count from the actual channels iterated/processed (e.g. use the
collector's resolved channel list or the processed_channels collection produced
by collector.run) and assign that value to the "channels" field returned in
TrackerResult.counts; update the same logic in the other affected blocks (the
other task_discord_sync usages at the noted ranges) so they all use the real
processed channel list instead of channel_ids.
---
Nitpick comments:
In `@cppa_slack_tracker/management/commands/run_cppa_slack_tracker.py`:
- Around line 219-234: The _sync_channel_users method is not accumulating
error_count into self._counts for "channel_memberships"; update
_sync_channel_users (which calls sync_channel_users) to increment
self._counts["channel_memberships_errors"] (or follow the existing error key
pattern used elsewhere) by error_count after the call, mirroring how
success_count is added to self._counts so the final SlackTrackerResult includes
channel membership errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f9f2033b-5987-49ac-85ed-8f3fcb6b3274
📒 Files selected for processing (50)
boost_library_docs_tracker/management/commands/run_boost_library_docs_tracker.pyboost_library_docs_tracker/protocol_impl.pyboost_library_docs_tracker/tests/test_run_boost_library_docs_tracker_command.pyboost_library_tracker/management/commands/collect_boost_libraries.pyboost_library_tracker/management/commands/run_boost_github_activity_tracker.pyboost_library_tracker/protocol_impl.pyboost_library_tracker/tests/test_collect_boost_libraries_command.pyboost_library_tracker/tests/test_protocol_impl.pyboost_library_usage_dashboard/collectors.pyboost_library_usage_dashboard/protocol_impl.pyboost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.pyboost_mailing_list_tracker/protocol_impl.pyboost_mailing_list_tracker/tests/test_protocol_impl.pyboost_usage_tracker/management/commands/run_boost_usage_tracker.pyclang_github_tracker/collectors.pyclang_github_tracker/protocol_impl.pyclang_github_tracker/tests/test_protocol_impl.pycore/collectors/README.mdcore/collectors/__init__.pycore/collectors/base_collector.pycore/collectors/command_base.pycore/incremental_state.pycore/management/commands/startcollector.pycore/protocols.pycore/pyright_samples/protocol_assignment_positive.pycore/tests/test_collector_protocol_conformance.pycore/tests/test_collectors_base.pycore/tests/test_protocols.pycore/tracker_result.pycppa_pinecone_sync/management/commands/run_cppa_pinecone_sync.pycppa_pinecone_sync/protocol_impl.pycppa_pinecone_sync/tests/test_protocol_impl.pycppa_slack_tracker/management/commands/run_cppa_slack_tracker.pycppa_slack_tracker/protocol_impl.pycppa_slack_tracker/tests/test_protocol_impl.pycppa_user_tracker/management/commands/run_cppa_user_tracker.pycppa_youtube_script_tracker/management/commands/run_cppa_youtube_script_tracker.pycppa_youtube_script_tracker/protocol_impl.pydiscord_activity_tracker/management/commands/backfill_discord_activity_tracker.pydiscord_activity_tracker/management/commands/run_discord_activity_tracker.pydiscord_activity_tracker/protocol_impl.pydocs/Core_public_API.mddocs/How_to_add_a_collector.mddocs/Tutorial_building_a_collector.mddocs/service_api/core_protocols.mdgithub_activity_tracker/protocol_impl.pygithub_activity_tracker/tests/test_protocol_impl.pywg21_paper_tracker/collectors.pywg21_paper_tracker/protocol_impl.pywg21_paper_tracker/tests/test_protocol_impl.py
…s tracker result classes
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/collectors/base_collector.py (1)
250-253:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefer committing
last_resultuntil the run fully succeeds.If
post_collect()orpersist_incremental_state()raises,run()fails butlast_resulthas already been updated to the new result. That contradicts the property docstring and can expose a failed run as the “most recent successful” one.💡 Minimal fix
def run(self) -> TrackerResult: + previous_result = getattr(self, "_last_result", None) self._incremental_state_in = None self._incremental_state_out = None started = time.monotonic() try: self.pre_collect() @@ raw_result = self.collect() result = require_tracker_result(raw_result) elapsed = time.monotonic() - started result = with_duration_if_missing(result, elapsed) self._last_result = result self.post_collect() return result except Exception as exc: + self._last_result = previous_result try: self.on_error(exc)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/collectors/base_collector.py` around lines 250 - 253, The code currently assigns self._last_result before calling post_collect() (and before any persist_incremental_state() inside run()), which can mark a failed run as “most recent successful”; change the sequence in run()/collect flow so that result = with_duration_if_missing(...) is computed but self._last_result is only updated after post_collect() and after any call to persist_incremental_state() completes without raising—i.e., move the assignment to self._last_result to the end of the successful run path (after post_collect() and persistence), leaving elapsed/with_duration_if_missing intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@core/collectors/base_collector.py`:
- Around line 250-253: The code currently assigns self._last_result before
calling post_collect() (and before any persist_incremental_state() inside
run()), which can mark a failed run as “most recent successful”; change the
sequence in run()/collect flow so that result = with_duration_if_missing(...) is
computed but self._last_result is only updated after post_collect() and after
any call to persist_incremental_state() completes without raising—i.e., move the
assignment to self._last_result to the end of the successful run path (after
post_collect() and persistence), leaving elapsed/with_duration_if_missing
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 684e1b60-7f1b-48e4-83ee-ad771c0beeb9
📒 Files selected for processing (12)
boost_library_docs_tracker/protocol_impl.pyboost_library_tracker/protocol_impl.pycore/collectors/base_collector.pycore/collectors/command_base.pycore/incremental_state.pycore/tracker_result.pycppa_pinecone_sync/management/commands/run_cppa_pinecone_sync.pycppa_pinecone_sync/protocol_impl.pycppa_pinecone_sync/tests/test_protocol_impl.pycppa_slack_tracker/protocol_impl.pycppa_youtube_script_tracker/protocol_impl.pywg21_paper_tracker/protocol_impl.py
🚧 Files skipped from review as they are similar to previous changes (11)
- cppa_youtube_script_tracker/protocol_impl.py
- cppa_slack_tracker/protocol_impl.py
- core/incremental_state.py
- wg21_paper_tracker/protocol_impl.py
- boost_library_tracker/protocol_impl.py
- core/tracker_result.py
- core/collectors/command_base.py
- boost_library_docs_tracker/protocol_impl.py
- cppa_pinecone_sync/tests/test_protocol_impl.py
- cppa_pinecone_sync/management/commands/run_cppa_pinecone_sync.py
- cppa_pinecone_sync/protocol_impl.py
jonathanMLDev
left a comment
There was a problem hiding this comment.
Should also update CHANGELOG.md, STABILITY.md
Summary
Extend
AbstractCollector.collect()to returnTrackerResultinstead ofNone.run()validates the result, backfillsduration_seconds, and exposeslast_result.Add shared DTOs (
GenericTrackerResult,GenericIncrementalState) and optional incremental checkpoint hooks (load_incremental_state/persist_incremental_state).Migrate all in-monorepo collectors to frozen
protocol_impl.pyDTOs and log structured outcomes fromBaseCollectorCommand.Update collector docs (
Core_public_API.md, tutorial, how-to) and add protocol conformance tests.Blocks the companion PR in
wg21-reflector-collector(overlay CI depends on this core contract).Apps touched
Test plan
python -m pytest core/tests/test_collectors_base.py core/tests/test_collector_protocol_conformance.py core/tests/test_protocols.pypython -m pytest */tests/test_protocol_impl.pyuv run pyright(if typed code changed)lint-imports(if imports or cross-app coupling changed)Docs / coupling
python scripts/generate_service_docs.pyrun (ifservices.pyorcore/protocols.pychanged)docs/updated (if behavior or ops changed)Closes #275
Summary by CodeRabbit
New Features
Improvements
Tests & Documentation