12 changes: 12 additions & 0 deletions .env.example
@@ -263,6 +263,18 @@ DATABASE_URL=postgres://user:password@localhost:5432/boost_dashboard
# See docs/operations/discord_chat_exporter.md (Tyrrrz upstream: Token and IDs, CLI guide).
# DISCORD_USER_TOKEN=your.user.token
#
# --- Internal Discord user token (compliance-gated) ---
# Do not put the user token in .env when using workspace JSON. When enabled, tokens live in
# workspace JSON and are loaded at runtime (not at Django startup). Export can re-extract
# from the Chrome profile when JSON tokens are stale but the browser session is still valid.
# ALLOW_INTERNAL_DISCORD_TOKENS=false
# DISCORD_INTERNAL_TOKENS_JSON=
# Default path: workspace/discord_activity_tracker/discord_internal_tokens.json
#
# Chrome user-data directory (logged-in Discord session on disk):
# DISCORD_CHROME_PROFILE_PATH=
# Default: workspace/discord_activity_tracker/chrome_profile
#
# DISCORD_SERVER_ID=987654321098765432
# DISCORD_CONTEXT_REPO_PATH=/absolute/path/to/discord-cplusplus-together-context
# DISCORD_CONTEXT_AUTO_COMMIT=false
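The two path settings above fall back to defaults under the workspace directory when unset. If `env` in `config/settings.py` is django-environ's `Env` (the call style suggests it, but that is an assumption here), a variable that is present but blank, for example `DISCORD_CHROME_PROFILE_PATH=` uncommented without a value, comes back as an empty string instead of the default. A minimal sketch of a resolver that treats blank values as unset; `resolve_workspace_path` is a hypothetical helper, not the project's `discord_internal_tokens_json_path`.

```python
from __future__ import annotations

from pathlib import Path


def resolve_workspace_path(env_value: str | None, workspace_dir: Path, default_rel: str) -> Path:
    """Return the configured path, or workspace_dir / default_rel when unset.

    Hypothetical helper: None, "" and whitespace-only values all count as unset,
    so a blank line in .env does not silently replace the default with "".
    """
    value = (env_value or "").strip()
    if value:
        return Path(value).expanduser()
    return workspace_dir / default_rel


# Usage with the documented default location of the tokens file.
tokens_path = resolve_workspace_path(
    "", Path("workspace"), "discord_activity_tracker/discord_internal_tokens.json"
)
```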
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -30,7 +30,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Resolved five cross-app import tech-debt edges: Pinecone via `cppa_pinecone_sync.sync_api`, dashboard model shim removed, CSV owner lookup via `cppa_user_tracker.services`, clang imports via `github_activity_tracker.sync_api`.
- Added **import-linter** contracts and pre-commit hook to prevent regressions.
- Enforced **service-layer-only ORM writes** with `scripts/check_service_layer_writes.py` and pre-commit; moved remaining direct writes (repo metadata sync, star bulk-update, GitHub file backfill, BoostVersion import, commit file-change backfill) into `github_activity_tracker.services` / `boost_library_tracker.services`. Allowlist [`.service-layer-write-allowlist.json`](.service-layer-write-allowlist.json) is empty by default for new debt only.
- **slack_event_handler:** Workspace under `workspace/slack_event_handler/`; replace Selenium with `plyvel` + `browser-cookie3` extraction from `CHROME_PROFILE_PATH` (optional Compose `slack-session` / `slack-chromium` noVNC on port 7900 and `manage.py extract_slack_tokens`), store xoxc/xoxd in `slack_internal_tokens.json` with runtime load and automatic re-extract when stale, and remove `slack_session_refresh`, `refresh_slack_tokens`, and the `slack-profile-refresh` compose service.
- **slack_event_handler:** Workspace under `workspace/slack_event_handler/`; huddle support configuration moved to workspace paths.
- Pydantic boundary schemas at GitHub, Slack, and Discord ingestion (`api_schemas.py` per app; Discord ChatExporter uses `staging_schema.py`); fetchers validate with `model_validate()`; services accept typed payloads; `classify_failure` maps validation errors to `VALIDATION`.

## [0.1.0] - 2026-05-22
38 changes: 38 additions & 0 deletions Makefile
@@ -60,6 +60,14 @@ help:
@echo " slack-tokens-reextract Stop chromium → extract JSON"
@echo " slack-tokens-refresh Login (noVNC) → wait → extract JSON"
@echo ""
@echo " Discord session (user token extraction)"
@echo " discord-login Start discord-chromium (noVNC http://127.0.0.1:7901)"
@echo " discord-wait-profile Wait until Discord login wrote Cookies + LevelDB"
@echo " discord-login-stop Stop discord-chromium before extract"
@echo " extract-discord-tokens Extract token to workspace JSON (one-shot)"
@echo " discord-tokens-reextract Stop chromium → extract JSON"
@echo " discord-tokens-refresh Login (noVNC) → wait → extract JSON"
@echo ""
@echo " Utilities"
@echo " clean-mac Remove macOS ._* resource-fork files"
@echo " clean-pyc Remove compiled Python files"
@@ -203,6 +211,36 @@ slack-tokens-reextract: extract-slack-tokens
# Login in noVNC, wait for profile files, then extract JSON.
slack-tokens-refresh: slack-login slack-wait-profile extract-slack-tokens

# ── Discord session ───────────────────────────────────────────────────────────

.PHONY: discord-login discord-wait-profile discord-login-stop extract-discord-tokens \
discord-tokens-reextract discord-tokens-refresh

discord-login:
@mkdir -p workspace/discord_activity_tracker/chrome_profile
@rm -f workspace/discord_activity_tracker/chrome_profile/SingletonLock \
workspace/discord_activity_tracker/chrome_profile/SingletonCookie \
workspace/discord_activity_tracker/chrome_profile/SingletonSocket
$(COMPOSE) --profile discord-session up -d --force-recreate discord-chromium
@echo "noVNC (password: secret) — Chrome does NOT open automatically:"
@echo " http://127.0.0.1:7901/?autoconnect=1&resize=scale&password=secret"
@echo "Right-click desktop → Web Browsing → Google Chrome → https://discord.com"
@command -v open >/dev/null 2>&1 && open "http://127.0.0.1:7901/?autoconnect=1&resize=scale&password=secret" || true

discord-wait-profile:
@chmod +x scripts/wait_discord_chrome_profile.sh
@./scripts/wait_discord_chrome_profile.sh

discord-login-stop:
$(COMPOSE) --profile discord-session stop discord-chromium

extract-discord-tokens: discord-login-stop
$(MANAGE) extract_discord_tokens

discord-tokens-reextract: extract-discord-tokens

discord-tokens-refresh: discord-login discord-wait-profile extract-discord-tokens

# ── Utilities ─────────────────────────────────────────────────────────────────

.PHONY: clean-mac
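`discord-wait-profile` runs `scripts/wait_discord_chrome_profile.sh`, whose body is not part of this diff; the help text only says it waits until the login has written Cookies and LevelDB data. A minimal Python sketch of that poll-until-present-or-timeout pattern, assuming the caller supplies the relative paths to watch. `wait_for_paths`, its defaults, and any paths passed to it are hypothetical; the real script may check other files or use a different timeout.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Iterable


def wait_for_paths(
    root: Path, rel_paths: Iterable[str], timeout_s: float = 300.0, poll_s: float = 2.0
) -> bool:
    """Poll until every root / rel_path exists; return False if timeout_s elapses first.

    Hypothetical helper: the timeout and poll interval are illustrative defaults.
    """
    targets = [root / p for p in rel_paths]
    deadline = time.monotonic() + timeout_s  # monotonic clock ignores wall-clock jumps
    while True:
        if all(t.exists() for t in targets):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Polling with a monotonic deadline keeps the wait bounded even if the login is abandoned, which matters when `discord-tokens-refresh` chains straight into extraction.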
7 changes: 3 additions & 4 deletions SECURITY.md
@@ -58,7 +58,7 @@ We consider reports for security weaknesses in **this repository** in the follow
- **Django application** — web views, authentication and authorization, sessions, CSRF, admin, settings in [`config/settings.py`](config/settings.py), and deployment-related toggles documented in [`.env.example`](.env.example) (for example `USE_X_FORWARDED_HOST`, `USE_TLS_PROXY_HEADERS`, `CSRF_TRUSTED_ORIGINS`, `ALLOWED_HOSTS`).
- **Management commands and scheduled work** — collectors and related commands, including behavior under Celery/Celery Beat when used as documented (for example [`docs/Workflow.md`](docs/Workflow.md), `config/boost_collector_schedule.yaml`).
- **Credential and secret handling** — how tokens, keys, cookies, and workspace files are read, stored, logged, and passed to subprocesses or external APIs.
- **Integrations** — GitHub API usage; Slack and Discord connectors; Pinecone sync; YouTube API usage; **Chrome profile / session token** flows for Slack huddles (see [`.env.example`](.env.example)).
- **Integrations** — GitHub API usage; Slack and Discord connectors; Pinecone sync; YouTube API usage.
- **Workspace and filesystem** — paths under `WORKSPACE_DIR` / `RAW_DIR` and related processing, when failure could lead to arbitrary file access, data leaks, or unsafe deserialization.

### Out of scope
@@ -84,11 +84,10 @@ If you operate a deployment and suspect a leak or breach, **rotate** at least th
| Category | Examples / environment variables |
| --- | --- |
| **GitHub** | `GITHUB_TOKEN`, `GITHUB_TOKENS_SCRAPING` (multi-token pool), `GITHUB_TOKEN_WRITE`; PAT-style tokens used by integrations (for example `SLACK_PR_BOT_GITHUB_TOKEN` if it is a PAT) |
| **Slack** | `SLACK_BOT_TOKEN_<team_id>`, `SLACK_APP_TOKEN_<team_id>`; if enabled: internal session tokens in `workspace/slack_event_handler/slack_internal_tokens.json` (see `ALLOW_INTERNAL_SLACK_TOKENS` in `.env.example`) |
| **Discord** | `DISCORD_TOKEN` (bot token — supported path); **`DISCORD_USER_TOKEN`** for automation **conflicts with Discord’s Terms of Service** and may result in **account termination** — **rotate and discontinue** use; migrate to bot-based flows where applicable (see [`.env.example`](.env.example) and project docs) |
| **Slack** | `SLACK_BOT_TOKEN_<team_id>`, `SLACK_APP_TOKEN_<team_id>` |
| **Discord** | `DISCORD_TOKEN` |
| **Pinecone** | `PINECONE_API_KEY`, `PINECONE_PRIVATE_API_KEY`, and any host/index settings that grant write access |
| **YouTube** | `YOUTUBE_API_KEY` |
| **Browser session material** | Data derived from **Chrome profiles or cookies** (`CHROME_PROFILE_PATH`, `slack_internal_tokens.json`, and related flows) — treat as secrets; clear or rotate sessions and profiles as appropriate |

Also rotate **Django** `SECRET_KEY` and **database** credentials (`DATABASE_URL` or `DB_*`) if there is any chance the application or its configuration was exposed.

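The in-scope list covers how tokens and workspace files are stored. Since the internal-session flow keeps credentials in workspace JSON, one hardening option is to write that file atomically with owner-only permissions, so a crash never leaves a half-written file and other local users cannot read it. A sketch assuming a POSIX host (it relies on `os.fchmod` and Unix permission bits); `write_secret_json` is a hypothetical helper, not the project's token store.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_secret_json(path: Path, data: Any) -> None:
    """Atomically replace `path` with JSON `data`, readable only by its owner."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace() is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            os.fchmod(fh.fileno(), 0o600)  # mkstemp already uses 0600; keep it explicit
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # readers see either the old file or the new one
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # drop the partial temp file, then re-raise
        raise
```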
13 changes: 13 additions & 0 deletions config/settings.py
@@ -455,6 +455,19 @@ def _slack_team_scope_from_env():
# Discord configuration (for discord_activity_tracker)
DISCORD_TOKEN = (env("DISCORD_TOKEN", default="") or "").strip()
DISCORD_USER_TOKEN = (env("DISCORD_USER_TOKEN", default="") or "").strip()
ALLOW_INTERNAL_DISCORD_TOKENS = (
env("ALLOW_INTERNAL_DISCORD_TOKENS", default="") or ""
).strip().lower() == "true"
DISCORD_INTERNAL_TOKENS_JSON = (
env("DISCORD_INTERNAL_TOKENS_JSON", default="") or ""
).strip()
# Chrome user-data dir for Discord user token extraction (logged-in session on disk)
_DEFAULT_DISCORD_CHROME_PROFILE = str(
WORKSPACE_DIR / "discord_activity_tracker" / "chrome_profile"
)
DISCORD_CHROME_PROFILE_PATH = (
env("DISCORD_CHROME_PROFILE_PATH", default=_DEFAULT_DISCORD_CHROME_PROFILE) or ""
).strip()
_discord_server_id_str = (env("DISCORD_SERVER_ID", default="") or "").strip()
DISCORD_SERVER_ID: int | None = (
int(_discord_server_id_str) if _discord_server_id_str.isdigit() else None
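`DISCORD_SERVER_ID` is parsed with `str.isdigit()`, which also accepts non-ASCII digit characters such as `"²"` that `int()` rejects, so such a value raises `ValueError` when the settings module is imported instead of falling back to `None`. A stricter sketch that accepts only ASCII digits within the unsigned 64-bit range that Discord snowflake IDs occupy; `parse_snowflake` is a hypothetical helper, not part of `config/settings.py`.

```python
from __future__ import annotations

_MAX_SNOWFLAKE = 2**64 - 1  # Discord IDs are unsigned 64-bit integers


def parse_snowflake(raw: str | None) -> int | None:
    """Return the ID as an int, or None if raw is blank or not a plain ASCII integer."""
    value = (raw or "").strip()
    # isascii() rules out Unicode digits like "²" that isdigit() accepts but int() rejects.
    if not (value.isascii() and value.isdigit()):
        return None
    parsed = int(value)
    return parsed if 0 < parsed <= _MAX_SNOWFLAKE else None
```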
5 changes: 5 additions & 0 deletions config/test_settings.py
@@ -108,3 +108,8 @@

# Tests patch a single subprocess.Popen for DiscordChatExporter.
DISCORD_CHAT_EXPORTER_SEQUENTIAL_EXPORT = False

# Tests set DISCORD_USER_TOKEN via monkeypatch; do not inherit internal-token mode
# from developer .env (get_or_load_discord_user_token would ignore env token).
ALLOW_INTERNAL_DISCORD_TOKENS = False
DISCORD_USER_TOKEN = ""
23 changes: 11 additions & 12 deletions core/operations/slack_ops/fetcher.py
@@ -1,6 +1,6 @@
"""
Slack Fetcher: file download, user/channel info, huddle transcript.
Uses SlackAPIClient for API calls; file download and xoxc/xoxd transcript here.
Uses SlackAPIClient for API calls; huddle transcript uses workspace session credentials.
"""

import os
@@ -213,11 +213,10 @@ def download_file(file_url, save_path=None, filename=None, bot_token=None):

def fetch_huddle_transcript(file_id):
"""
Fetch huddle transcript/file info using xoxc/xoxd from workspace JSON.
Fetch huddle transcript/file info using session credentials from workspace JSON.

Stale JSON tokens with a valid Chrome profile are refreshed automatically via
get_or_load_slack_internal_token_pair (probe + re-extract). On auth errors,
re-extract is attempted once more before giving up.
Stale credentials are refreshed automatically. On auth errors, one refresh retry
is attempted before giving up.
"""
from slack_event_handler.utils.slack_internal_tokens_store import (
SLACK_TOKENS_RELOGIN_HINT,
@@ -234,16 +233,16 @@ def fetch_huddle_transcript(file_id):
if not pair:
if team_id:
logger.error(
"Cannot fetch huddle transcript for file %s: no valid Slack internal "
"tokens for team %s. %s",
"Cannot fetch huddle transcript for file %s: no valid session "
"credentials for team %s. %s",
file_id,
team_id,
SLACK_TOKENS_RELOGIN_HINT,
)
else:
logger.error(
"Cannot fetch huddle transcript for file %s: no Slack team id "
"(set SLACK_TEAM_IDS) and no valid internal tokens. %s",
"(set SLACK_TEAM_IDS) and no valid session credentials. %s",
file_id,
SLACK_TOKENS_RELOGIN_HINT,
)
@@ -270,7 +269,7 @@ def fetch_huddle_transcript(file_id):
if team_id and is_slack_internal_token_auth_error(err) and not reextracted:
reextracted = True
logger.info(
"Slack auth error (%s); re-extracting tokens from Chrome profile",
"Slack auth error (%s); refreshing session credentials",
err,
)
new_pair = _extract_validate_and_return(team_id)
@@ -280,8 +279,8 @@ def fetch_huddle_transcript(file_id):
cookies = {"d": xoxd_token}
continue
logger.error(
"Cannot fetch huddle transcript for file %s: re-extract from Chrome "
"profile did not yield valid tokens for team %s. %s",
"Cannot fetch huddle transcript for file %s: credential refresh did not "
"yield valid session for team %s. %s",
file_id,
team_id,
SLACK_TOKENS_RELOGIN_HINT,
@@ -291,7 +290,7 @@ def fetch_huddle_transcript(file_id):
log_slack_internal_tokens_still_invalid(team_id)
logger.error(
"Cannot fetch huddle transcript for file %s: Slack auth error (%s) "
"after re-extract. %s",
"after credential refresh. %s",
file_id,
err,
SLACK_TOKENS_RELOGIN_HINT,
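The updated `fetch_huddle_transcript` docstring describes a bounded retry: stale credentials are refreshed, and on an auth error one more refresh is attempted before giving up. A generic sketch of that retry-at-most-once control flow, assuming auth failures surface as exceptions (the real fetcher instead inspects Slack API error strings with `is_slack_internal_token_auth_error`). `call_with_one_refresh` and its parameters are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional, TypeVar

C = TypeVar("C")  # credential type
T = TypeVar("T")  # result type


def call_with_one_refresh(
    fetch: Callable[[C], T],
    creds: C,
    refresh: Callable[[], Optional[C]],
    is_auth_error: Callable[[Exception], bool],
) -> Optional[T]:
    """Call fetch(creds); on an auth error, refresh once and retry.

    Returns None when the refresh yields no credentials or the retry still
    fails with an auth error. Non-auth errors propagate unchanged.
    """
    refreshed = False
    while True:
        try:
            return fetch(creds)
        except Exception as err:
            if not is_auth_error(err):
                raise
            if refreshed:
                return None  # still rejected after one refresh: give up
            refreshed = True
            new_creds = refresh()
            if new_creds is None:
                return None  # refresh produced nothing usable
            creds = new_creds
```

The single `refreshed` flag plays the same role as `reextracted` in the fetcher: it caps the loop at two attempts, so a persistently invalid session cannot trigger repeated refreshes.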
2 changes: 1 addition & 1 deletion core/tests/operations/test_slack_fetcher.py
@@ -474,7 +474,7 @@ def test_fetch_huddle_auth_error_when_reextract_fails(
with caplog.at_level(logging.ERROR):
assert fetch_huddle_transcript("Fx") is None
_mock_reextract.assert_called_once_with("T1")
assert "slack-tokens-refresh" in caplog.text
assert ".env.example" in caplog.text


@override_settings(ALLOW_INTERNAL_SLACK_TOKENS=True)
6 changes: 3 additions & 3 deletions discord_activity_tracker/README.md
@@ -10,7 +10,7 @@ Ingests **Discord server activity** (messages, threads, exports) into PostgreSQL

### Where we fetch data

**Discord** via **DiscordChatExporter** (bot/user token + server/channel configuration) within the `--since`/`--until` window, honoring resume semantics documented in the command help.
**Discord** via **DiscordChatExporter** (configured credentials + server/channel configuration) within the `--since`/`--until` window, honoring resume semantics documented in the command help.

### How data is saved to the database

@@ -32,7 +32,7 @@ Unless `--skip-pinecone` (or deprecated `--ignore-pinecone`) is set, the run inv

## Main command: `run_discord_activity_tracker`

Orchestrates exporter fetch → DB upsert + raw JSON → Markdown export to `DISCORD_CONTEXT_REPO_PATH` → optional Pinecone via `run_cppa_pinecone_sync`. Requires `DISCORD_USER_TOKEN`, `DISCORD_SERVER_ID`; channel scope from `DISCORD_CHANNEL_IDS` unless `--channels` is set.
Orchestrates exporter fetch → DB upsert + raw JSON → Markdown export to `DISCORD_CONTEXT_REPO_PATH` → optional Pinecone via `run_cppa_pinecone_sync`. Requires configured Discord credentials (see `.env.example`), plus `DISCORD_SERVER_ID`; channel scope from `DISCORD_CHANNEL_IDS` unless `--channels` is set.

| Option | Description |
| --- | --- |
@@ -41,7 +41,7 @@ Orchestrates exporter fetch → DB upsert + raw JSON → Markdown export to `DIS
| `--skip-markdown-export` | Skip writing Markdown from the DB to `DISCORD_CONTEXT_REPO_PATH`. |
| `--skip-remote-push` | Skip git commit/push after Markdown export (when auto-commit is enabled). |
| `--skip-pinecone` / `--ignore-pinecone` | Skip Pinecone upsert for Discord messages (`--ignore-pinecone` is a deprecated alias). |
| `--since`, `--from-date`, `--start-time` | Exporter lower bound (`--after`): `YYYY-MM-DD` or ISO-8601 UTC. If omitted, resumes from latest DB message for the guild (or full history if empty). |
| `--since`, `--from-date`, `--start-time` | Exporter lower bound (`--after`): `YYYY-MM-DD` or ISO-8601 UTC. If omitted, resumes from latest DB message for the guild (or today UTC only if empty). |
| `--until`, `--to-date`, `--end-time` | Exporter upper bound (`--before`); same formats. Omitted = through present. |
| `--channels` | Comma-separated channel IDs (overrides `DISCORD_CHANNEL_IDS`). |
| `--task` | **Deprecated.** `sync` \| `export` \| `all` — prefer `--skip-*` flags. |
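The table says an omitted `--since` resumes from the newest stored message for the guild, or covers only today (UTC) when the guild has no messages yet. A minimal sketch of that resolution plus the two accepted input formats, assuming naive inputs are UTC and that the resume point is the newest message timestamp itself. `parse_bound` and `resolve_since` are hypothetical names, not the command's internals.

```python
from __future__ import annotations

from datetime import datetime, time, timezone
from typing import Optional


def parse_bound(value: str) -> datetime:
    """Parse YYYY-MM-DD or ISO-8601 into an aware UTC datetime."""
    value = value.strip()
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)  # a bare date parses as midnight
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)  # assumption: naive input is UTC
    return dt.astimezone(timezone.utc)


def resolve_since(
    arg: Optional[str], latest_db_message: Optional[datetime], now: datetime
) -> datetime:
    """Explicit --since wins; else resume from the DB; else start of today (UTC)."""
    if arg:
        return parse_bound(arg)
    if latest_db_message is not None:
        return latest_db_message
    today = now.astimezone(timezone.utc).date()
    return datetime.combine(today, time.min, tzinfo=timezone.utc)
```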
@@ -0,0 +1,63 @@
"""
Management command: extract_discord_tokens

Persist Discord session credentials to workspace JSON.
"""

import logging

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError

from discord_activity_tracker.utils.discord_internal_tokens_store import (
discord_internal_tokens_json_path,
extract_and_save_discord_internal_tokens,
)
from discord_activity_tracker.utils.discord_tokens import (
_resolve_discord_chrome_profile_root,
)
from discord_activity_tracker.workspace import get_chrome_profile_path

logger = logging.getLogger(__name__)


class Command(BaseCommand):
help = (
"Persist Discord session credentials to "
"workspace/discord_activity_tracker/discord_internal_tokens.json."
)

def handle(self, *args, **options):
allow_raw = getattr(settings, "ALLOW_INTERNAL_DISCORD_TOKENS", "") or ""
if isinstance(allow_raw, bool):
allow = allow_raw
else:
allow = str(allow_raw).strip().lower() == "true"
if not allow:
self.stderr.write(
self.style.WARNING(
"Internal Discord session mode is not enabled: credentials will be saved to "
"workspace JSON but ignored by Django until enabled. "
"Restart web/celery after enabling. See .env.example."
)
)

try:
profile = _resolve_discord_chrome_profile_root()
except ValueError as e:
raise CommandError(str(e)) from e
profile_path = str(profile)
if not profile.is_dir():
raise CommandError(
"Session storage not found "
f"({profile_path}). Expected: {get_chrome_profile_path()}. "
"See .env.example."
)

token = extract_and_save_discord_internal_tokens()
if not token:
raise CommandError("Failed to load session credentials. See .env.example.")
out_path = discord_internal_tokens_json_path()
self.stdout.write(
self.style.SUCCESS(f"Saved Discord session credentials to {out_path}.")
)
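`handle()` accepts `ALLOW_INTERNAL_DISCORD_TOKENS` either as the bool produced by `config/settings.py` or as a raw string (for example, one supplied through `override_settings`). Factoring that check into one helper would keep the command and settings in agreement. A sketch under that assumption; `as_flag` is a hypothetical name, and it deliberately keeps the exact-`"true"` rule rather than also accepting `"1"` or `"yes"`.

```python
from __future__ import annotations

from typing import Any


def as_flag(raw: Any) -> bool:
    """Interpret a settings value that may already be a bool or still a string.

    Mirrors the rule in settings.py and extract_discord_tokens: only "true"
    (case-insensitive, surrounding whitespace ignored) enables the flag.
    """
    if isinstance(raw, bool):
        return raw
    return str(raw or "").strip().lower() == "true"
```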