feat(discord): Update discord_activity_tracker app: per-channel incremental export#283
Conversation
…incremental exports and improved staging management
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds workspace-configured Discord session extraction and JSON persistence; refactors exporter to per-channel-per-UTC-day descriptors; merges exported JSON into per-day archives by message id; updates orchestration, Make/Docker/session tooling, tests, and docs. ChangesDiscord Internal Token Extraction and Per-Channel Daily Export
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/service_api/discord_activity_tracker.md (1)
85-97:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument the per-channel incremental lower bound.
This still reads like a guild-wide “latest DB message” resume path, but
run_discord_activity_trackernow resumes each channel from its own latest stored message (or today UTC when empty). The current wording can send operators back to the quiet-channel skip behavior this PR is fixing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/service_api/discord_activity_tracker.md` around lines 85 - 97, Update the docs text around run_discord_activity_tracker/DiscordChatExporter to clarify that the resume lower bound is per-channel, not guild-wide: replace the phrase "the lower bound is the latest stored message time for this guild (and channel allowlist)" with wording that each channel resumes from its own latest stored message time (and if a channel has no stored rows, only today UTC for that channel is exported), and also ensure the note about merging raw archives (`YYYY-MM-DD.json`) and the behavior when both --since and --until are set still applies per-channel; reference run_discord_activity_tracker and DiscordChatExporter in the doc so readers know where the behavior is implemented.
🧹 Nitpick comments (1)
discord_activity_tracker/sync/raw_archive.py (1)
92-98: ⚡ Quick winFix docstring indentation.
The closing line of the docstring (line 98) is indented with 2 spaces instead of 4, inconsistent with Python docstring conventions.
📝 Proposed fix
Returns the number of messages written to the merged file. - """ + """🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/sync/raw_archive.py` around lines 92 - 98, The docstring block in discord_activity_tracker/sync/raw_archive.py that documents merging exporter JSON has its closing triple-quote indented by 2 spaces; adjust the closing triple-quote to use 4-space indentation so it lines up with the other lines of the docstring (making the entire docstring consistently indented and PEP-style), i.e., move the closing """ to the same indentation level as the opening docstring in that function/module.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@discord_activity_tracker/sync/chat_exporter.py`:
- Around line 546-553: The current logic sets explicit_after = after_date if not
per_channel_incremental else None which drops a caller-supplied --since when
per_channel_incremental is True; change this so explicit_after always carries
the provided after_date (i.e., do not null it based on per_channel_incremental)
and pass that explicit_after through to resolve_channel_export_after(guild_id,
ch_id, explicit_after=explicit_after) so per-channel incremental uses each
channel's checkpoint only when no explicit --since was supplied. Ensure
resolve_channel_export_after still prefers explicit_after when present.
In `@discord_activity_tracker/sync/exporter_window.py`:
- Around line 104-108: Normalize naive `before` to UTC the same way `after` and
`now` are handled: when computing `upper` in exporter_window.py, treat a naive
`before` as UTC instead of local time by checking `before.tzinfo` and using
`before.replace(tzinfo=timezone.utc)` for naive datetimes, otherwise call
`before.astimezone(timezone.utc)`; keep the existing fallback to `now`. This
ensures `upper`, `before`, `after`, and `now` are consistently UTC-aware.
---
Outside diff comments:
In `@docs/service_api/discord_activity_tracker.md`:
- Around line 85-97: Update the docs text around
run_discord_activity_tracker/DiscordChatExporter to clarify that the resume
lower bound is per-channel, not guild-wide: replace the phrase "the lower bound
is the latest stored message time for this guild (and channel allowlist)" with
wording that each channel resumes from its own latest stored message time (and
if a channel has no stored rows, only today UTC for that channel is exported),
and also ensure the note about merging raw archives (`YYYY-MM-DD.json`) and the
behavior when both --since and --until are set still applies per-channel;
reference run_discord_activity_tracker and DiscordChatExporter in the doc so
readers know where the behavior is implemented.
---
Nitpick comments:
In `@discord_activity_tracker/sync/raw_archive.py`:
- Around line 92-98: The docstring block in
discord_activity_tracker/sync/raw_archive.py that documents merging exporter
JSON has its closing triple-quote indented by 2 spaces; adjust the closing
triple-quote to use 4-space indentation so it lines up with the other lines of
the docstring (making the entire docstring consistently indented and PEP-style),
i.e., move the closing """ to the same indentation level as the opening
docstring in that function/module.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3abf2f92-aaff-4575-b53a-9d81ca93cdf9
📒 Files selected for processing (32)
.env.exampleMakefileSECURITY.mdconfig/settings.pydiscord_activity_tracker/README.mddiscord_activity_tracker/management/commands/extract_discord_tokens.pydiscord_activity_tracker/management/commands/run_discord_activity_tracker.pydiscord_activity_tracker/sync/chat_exporter.pydiscord_activity_tracker/sync/exporter_window.pydiscord_activity_tracker/sync/raw_archive.pydiscord_activity_tracker/tests/test_chat_exporter_branch_coverage.pydiscord_activity_tracker/tests/test_discord_internal_tokens_store.pydiscord_activity_tracker/tests/test_discord_tokens.pydiscord_activity_tracker/tests/test_exporter_window.pydiscord_activity_tracker/tests/test_extract_discord_tokens_command.pydiscord_activity_tracker/tests/test_raw_archive.pydiscord_activity_tracker/tests/test_run_command_coverage.pydiscord_activity_tracker/tests/test_run_discord_activity_tracker_command.pydiscord_activity_tracker/tests/test_sync_chat_exporter.pydiscord_activity_tracker/tests/test_task_discord_sync_coverage.pydiscord_activity_tracker/tests/test_workspace.pydiscord_activity_tracker/utils/__init__.pydiscord_activity_tracker/utils/discord_internal_tokens_store.pydiscord_activity_tracker/utils/discord_tokens.pydiscord_activity_tracker/workspace.pydocker-compose.ymldocs/Docker.mddocs/Workspace.mddocs/operations/discord_chat_exporter.mddocs/service_api/discord_activity_tracker.mdscripts/clean-macos.shscripts/wait_discord_chrome_profile.sh
…t and enhance date handling in exports
…for empty databases
…edentials and improve documentation clarity
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
discord_activity_tracker/utils/discord_internal_tokens_store.py (1)
102-114: ⚡ Quick winClarify the docstring to distinguish Chrome profile extraction from JSON persistence.
The docstring says "Load credentials from workspace storage" which could be misread as loading from the workspace JSON file itself. The actual flow is: extract from Chrome profile LevelDB (workspace storage) → persist to workspace JSON. Consider rewording to make the source and destination clearer.
📝 Suggested docstring improvement
- """Load credentials from workspace storage and persist to workspace JSON.""" + """Extract credentials from Chrome profile (workspace storage) and persist to workspace JSON."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/utils/discord_internal_tokens_store.py` around lines 102 - 114, The docstring for extract_and_save_discord_internal_tokens is ambiguous about sources/destinations; update it to clearly state this function extracts tokens from the Chrome profile LevelDB (via extract_discord_token_auto) and then persists them into the workspace JSON using save_discord_internal_tokens, e.g., "Extract tokens from Chrome profile LevelDB and persist to workspace JSON"; keep function name references (extract_and_save_discord_internal_tokens, extract_discord_token_auto, save_discord_internal_tokens) to locate the change.discord_activity_tracker/management/commands/run_discord_activity_tracker.py (1)
528-530: ⚡ Quick winClarify help text to reflect per-channel incremental behavior.
The help text says "or today (UTC) only when the DB has no rows" which could be read as guild-wide behavior. However, when
--sinceis omitted, the command uses per-channel incremental mode where each channel resumes from its own latest message (or today if that channel is empty). Consider clarifying that this is per-channel behavior to match the docstring at lines 109-111.📝 Suggested help text improvement
help="Exporter lower bound (--after): YYYY-MM-DD or ISO-8601 (UTC). " - "If omitted, uses the latest message time already in the DB for this guild " - "(and channel allowlist), or today (UTC) only when the DB has no rows.", + "If omitted, each channel resumes from the UTC day start of its own latest " + "stored message (per-channel incremental); channels with no rows export today (UTC) only.",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/management/commands/run_discord_activity_tracker.py` around lines 528 - 530, Update the help string for the --since/--after argument to state that the incremental resume behavior is applied per channel (each channel resumes from its own latest message, or today UTC if that channel has no rows), not guild-wide; edit the help text passed to the ArgumentParser (the add_argument call that defines "--since"/"--after") to mirror the docstring behavior described in run_discord_activity_tracker.py (see the module docstring around lines 109-111) so the help explicitly mentions "per-channel" incremental resume.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@SECURITY.md`:
- Around line 87-88: Update the rotation guidance to explicitly include
revocation and cleanup of persisted internal session credentials in addition to
bot/API tokens: mention that after rotating SLACK_BOT_TOKEN_<team_id>,
SLACK_APP_TOKEN_<team_id>, and DISCORD_TOKEN you must also revoke any workspace
JSON/session material and internal session tokens, delete or invalidate stored
session files, and re-establish fresh sessions for impacted services; reference
the persisted "workspace JSON/session material" and any internal session
credentials when describing steps to revoke, purge, and verify cleanup so
rotations truly invalidate access.
---
Nitpick comments:
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 528-530: Update the help string for the --since/--after argument
to state that the incremental resume behavior is applied per channel (each
channel resumes from its own latest message, or today UTC if that channel has no
rows), not guild-wide; edit the help text passed to the ArgumentParser (the
add_argument call that defines "--since"/"--after") to mirror the docstring
behavior described in run_discord_activity_tracker.py (see the module docstring
around lines 109-111) so the help explicitly mentions "per-channel" incremental
resume.
In `@discord_activity_tracker/utils/discord_internal_tokens_store.py`:
- Around line 102-114: The docstring for
extract_and_save_discord_internal_tokens is ambiguous about
sources/destinations; update it to clearly state this function extracts tokens
from the Chrome profile LevelDB (via extract_discord_token_auto) and then
persists them into the workspace JSON using save_discord_internal_tokens, e.g.,
"Extract tokens from Chrome profile LevelDB and persist to workspace JSON"; keep
function name references (extract_and_save_discord_internal_tokens,
extract_discord_token_auto, save_discord_internal_tokens) to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5eae28d2-234d-4d26-8aae-8b7c3c6d6bb7
📒 Files selected for processing (26)
CHANGELOG.mdSECURITY.mdcore/operations/slack_ops/fetcher.pycore/tests/operations/test_slack_fetcher.pydiscord_activity_tracker/README.mddiscord_activity_tracker/management/commands/extract_discord_tokens.pydiscord_activity_tracker/management/commands/run_discord_activity_tracker.pydiscord_activity_tracker/sync/chat_exporter.pydiscord_activity_tracker/tests/test_discord_internal_tokens_store.pydiscord_activity_tracker/tests/test_extract_discord_tokens_command.pydiscord_activity_tracker/tests/test_run_discord_activity_tracker_command.pydiscord_activity_tracker/utils/discord_internal_tokens_store.pydiscord_activity_tracker/utils/discord_tokens.pydiscord_activity_tracker/workspace.pydocs/Architecture_overview.mddocs/Deployment.mddocs/Docker.mddocs/GCP_Production_Checklist.mddocs/operations/discord_chat_exporter.mddocs/service_api/discord_activity_tracker.mdslack_event_handler/management/commands/extract_slack_tokens.pyslack_event_handler/tests/test_extract_slack_tokens_command.pyslack_event_handler/tests/test_slack_internal_tokens_store.pyslack_event_handler/utils/slack_internal_tokens_store.pyslack_event_handler/utils/slack_tokens.pyslack_event_handler/workspace.py
💤 Files with no reviewable changes (2)
- docs/Docker.md
- docs/Deployment.md
✅ Files skipped from review due to trivial changes (11)
- slack_event_handler/workspace.py
- docs/Architecture_overview.md
- core/tests/operations/test_slack_fetcher.py
- CHANGELOG.md
- docs/GCP_Production_Checklist.md
- discord_activity_tracker/README.md
- core/operations/slack_ops/fetcher.py
- slack_event_handler/utils/slack_internal_tokens_store.py
- docs/operations/discord_chat_exporter.md
- slack_event_handler/utils/slack_tokens.py
- docs/service_api/discord_activity_tracker.md
🚧 Files skipped from review as they are similar to previous changes (6)
- discord_activity_tracker/tests/test_extract_discord_tokens_command.py
- discord_activity_tracker/workspace.py
- discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py
- discord_activity_tracker/utils/discord_tokens.py
- discord_activity_tracker/sync/chat_exporter.py
- discord_activity_tracker/management/commands/extract_discord_tokens.py
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
SECURITY.md (1)
80-92:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd guidance on revoking persisted session credentials after a suspected compromise.
This PR introduces persisted internal Discord and Slack session credentials (workspace JSON files:
slack_internal_tokens.json, per-channel Discord exports). The rotation table (lines 87–88) only lists environment variables. After rotating bot/API tokens, deployments must also revoke and delete workspace-scoped session credential files to invalidate leaked sessions.Suggested addition to the table or as a follow-up bullet:
- Explicitly state that after rotating credentials, operators must delete or invalidate
{WORKSPACE_DIR}/slack_event_handler/slack_internal_tokens.jsonand any persisted Discord session/token files.- Reference the workspace JSON locations and re-establish fresh sessions afterward (matching the pattern described for Slack/Discord token extraction).
This aligns with your prior review comment and ensures rotations are complete.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SECURITY.md` around lines 80 - 92, The rotation guidance is missing revocation of persisted workspace session files; update SECURITY.md near the "Credential rotation after a suspected compromise" section to instruct operators that after rotating bot/API tokens they must also revoke and delete persisted session credential files (for example {WORKSPACE_DIR}/slack_event_handler/slack_internal_tokens.json and any per-channel Discord session/token exports) and then re-establish fresh sessions; mention both the Slack internal tokens JSON and the persisted Discord session files by name and include a short note to recreate sessions after deletion.
🧹 Nitpick comments (1)
discord_activity_tracker/sync/exporter_window.py (1)
110-136: ⚡ Quick winEliminate duplicate UTC conversions for
afterandbefore.The code converts
aftertoafter_utcat lines 114-118, then recomputes the same conversion at lines 122-128. Similarly,beforeis converted to UTC forupperat lines 102-108, then the identical conversion is repeated forbefore_utcat lines 130-136. This duplication makes the code harder to maintain and unnecessarily verbose.Refactor by computing
after_utcandbefore_utconce at the top of the function, then reuse those values when computingfirst_day,last_day, and the clipping logic.♻️ Suggested refactor
+ # Normalize after and before to UTC once + after_utc: datetime | None = None + if after is not None: + after_utc = ( + after.astimezone(timezone.utc) + if after.tzinfo is not None + else after.replace(tzinfo=timezone.utc) + ) + + before_utc: datetime | None = None + if before is not None: + before_utc = ( + before.astimezone(timezone.utc) + if before.tzinfo is not None + else before.replace(tzinfo=timezone.utc) + ) + - upper = now - if before is not None: - upper = ( - before.astimezone(timezone.utc) - if before.tzinfo is not None - else before.replace(tzinfo=timezone.utc) - ) + upper = before_utc if before_utc is not None else now - if after is None: + if after_utc is None: first_day = utc_day_start(now) last_day = first_day else: - after_utc = ( - after.astimezone(timezone.utc) - if after.tzinfo is not None - else after.replace(tzinfo=timezone.utc) - ) first_day = utc_day_start(after_utc) last_day = utc_day_start(upper) - after_utc: datetime | None = None - if after is not None: - after_utc = ( - after.astimezone(timezone.utc) - if after.tzinfo is not None - else after.replace(tzinfo=timezone.utc) - ) - - before_utc: datetime | None = None - if before is not None: - before_utc = ( - before.astimezone(timezone.utc) - if before.tzinfo is not None - else before.replace(tzinfo=timezone.utc) - ) - result: list[tuple[str, datetime, datetime]] = []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/sync/exporter_window.py` around lines 110 - 136, Compute after_utc and before_utc once at the start (convert timezone-aware datetimes to UTC or set tzinfo=UTC when missing) and reuse those variables when calculating first_day = utc_day_start(after_utc), last_day = utc_day_start(upper), and any clipping logic; remove the duplicated conversion blocks and any redundant reassignments so only the single initial after_utc and before_utc conversions are referenced throughout the function (look for symbols after_utc, before_utc, utc_day_start, first_day, last_day, upper).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@discord_activity_tracker/utils/discord_tokens.py`:
- Around line 135-149: The current exception handlers inside the for loop over
DISCORD_TOKEN_KEY and DISCORD_TOKEN_KEY_LEGACY return None on any error,
preventing fallback to the legacy key; change the except blocks in the loop that
catch ValueError and Exception so they log the error via logger.warning
(including the exception) but use continue to proceed to the next key instead of
returning, and only return None after the loop completes (e.g., if neither
_read_leveldb_value/_parse_discord_token_raw succeeded); update references to
_read_leveldb_value, _parse_discord_token_raw, DISCORD_TOKEN_KEY and
DISCORD_TOKEN_KEY_LEGACY accordingly.
---
Outside diff comments:
In `@SECURITY.md`:
- Around line 80-92: The rotation guidance is missing revocation of persisted
workspace session files; update SECURITY.md near the "Credential rotation after
a suspected compromise" section to instruct operators that after rotating
bot/API tokens they must also revoke and delete persisted session credential
files (for example
{WORKSPACE_DIR}/slack_event_handler/slack_internal_tokens.json and any
per-channel Discord session/token exports) and then re-establish fresh sessions;
mention both the Slack internal tokens JSON and the persisted Discord session
files by name and include a short note to recreate sessions after deletion.
---
Nitpick comments:
In `@discord_activity_tracker/sync/exporter_window.py`:
- Around line 110-136: Compute after_utc and before_utc once at the start
(convert timezone-aware datetimes to UTC or set tzinfo=UTC when missing) and
reuse those variables when calculating first_day = utc_day_start(after_utc),
last_day = utc_day_start(upper), and any clipping logic; remove the duplicated
conversion blocks and any redundant reassignments so only the single initial
after_utc and before_utc conversions are referenced throughout the function
(look for symbols after_utc, before_utc, utc_day_start, first_day, last_day,
upper).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ef93b794-ffab-42da-9326-077a00b50172
📒 Files selected for processing (45)
.env.exampleCHANGELOG.mdMakefileSECURITY.mdconfig/settings.pyconfig/test_settings.pycore/operations/slack_ops/fetcher.pycore/tests/operations/test_slack_fetcher.pydiscord_activity_tracker/README.mddiscord_activity_tracker/management/commands/extract_discord_tokens.pydiscord_activity_tracker/management/commands/run_discord_activity_tracker.pydiscord_activity_tracker/sync/chat_exporter.pydiscord_activity_tracker/sync/exporter_window.pydiscord_activity_tracker/sync/raw_archive.pydiscord_activity_tracker/tests/test_chat_exporter_branch_coverage.pydiscord_activity_tracker/tests/test_discord_internal_tokens_store.pydiscord_activity_tracker/tests/test_discord_tokens.pydiscord_activity_tracker/tests/test_exporter_window.pydiscord_activity_tracker/tests/test_extract_discord_tokens_command.pydiscord_activity_tracker/tests/test_raw_archive.pydiscord_activity_tracker/tests/test_run_command_coverage.pydiscord_activity_tracker/tests/test_run_discord_activity_tracker_command.pydiscord_activity_tracker/tests/test_sync_chat_exporter.pydiscord_activity_tracker/tests/test_task_discord_sync_coverage.pydiscord_activity_tracker/tests/test_workspace.pydiscord_activity_tracker/utils/__init__.pydiscord_activity_tracker/utils/discord_internal_tokens_store.pydiscord_activity_tracker/utils/discord_tokens.pydiscord_activity_tracker/workspace.pydocker-compose.ymldocs/Architecture_overview.mddocs/Deployment.mddocs/Docker.mddocs/GCP_Production_Checklist.mddocs/Workspace.mddocs/operations/discord_chat_exporter.mddocs/service_api/discord_activity_tracker.mdscripts/clean-macos.shscripts/wait_discord_chrome_profile.shslack_event_handler/management/commands/extract_slack_tokens.pyslack_event_handler/tests/test_extract_slack_tokens_command.pyslack_event_handler/tests/test_slack_internal_tokens_store.pyslack_event_handler/utils/slack_internal_tokens_store.pyslack_event_handler/utils/slack_tokens.pyslack_event_handler/workspace.py
💤 Files with no reviewable changes (2)
- docs/Deployment.md
- docs/Docker.md
Summary
Improves scheduled Discord export reliability for
run_discord_activity_tracker:raw/discord_activity_tracker/<server_id>/<channel_id>/YYYY-MM-DD.jsonmerges by message id.Apps touched
Test plan
python -m pytest discord_activity_tracker/tests/ -vuv run pyright(if typed code changed)lint-imports(if imports or cross-app coupling changed)Docs / coupling
python scripts/generate_service_docs.pyrun (ifservices.pyorcore/protocols.pychanged)docs/updated (if behavior or ops changed)Closes #278
Summary by CodeRabbit