Intake Draft - OpenBrain Playwright fallback retrieval (OB-0MNHT5HTC0070EL7)
Headline summary
Produce a concise feature request document (docs/feature-requests/openbrain-playwright-fallback-retrieval.md) that specifies requirements, acceptance criteria, CI/test strategies, and telemetry for adding a Playwright-based retrieval fallback in OpenBrain. SourceBase will author and hand this document to PM/Engineering; implementation belongs in the OpenBrain repo.
Problem statement
Some web pages render their primary content with client-side JavaScript. The current OpenBrain retrieval path (fast HTML extraction) can fail to return usable content for these pages, causing ingestion to fail. A documented fallback that uses a headless browser (Playwright) is needed so OpenBrain can ingest JavaScript-heavy pages reliably when the primary extractor is insufficient.
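One way to decide when the primary extractor is "insufficient" is to measure how much visible text the fetched HTML contains. The following is a minimal sketch of that idea, not OpenBrain's actual logic: the function names and the 200-character threshold are hypothetical and would need tuning against real pages.

```typescript
// Hypothetical heuristic: names and the default threshold are assumptions,
// not part of the OpenBrain codebase.

// Roughly estimate how much human-readable text a raw HTML string contains.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                    // drop remaining tags
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
  return text.length;
}

// A page that is mostly an empty application shell yields very little text,
// which suggests its content is rendered client-side.
function needsBrowserFallback(html: string, minChars: number = 200): boolean {
  return visibleTextLength(html) < minChars;
}
```

A check like this would run on the output of the fast HTML path; only when it returns true would the slower headless-browser retrieval be attempted.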
Users
- Discord community operators who post links and expect them to be indexed by OpenBrain (example: as a community operator, when someone posts a JS-heavy article I want the link ingested so the knowledge is searchable).
- OpenBrain operators and automation authors who run ob add in varied environments (example: as an automation author I want a configurable fallback so I can control resource and CI behaviour).
- OpenBrain engineers who will implement the fallback (example: as an engineer I need a clear set of technical acceptance criteria and test strategies to implement Playwright retrieval safely).
Success criteria
- The feature request document exists at docs/feature-requests/openbrain-playwright-fallback-retrieval.md and contains: problem statement, users and user stories, technical acceptance criteria, CI/testing strategy (record/playback + mock option), telemetry/diagnostics requirements, and an implementation sketch.
- The work item OB-0MNHT5HTC0070EL7 references the doc (link present in description) and the work item stage is set to intake_complete.
- The document references the following related code and work items so implementers can start without additional discovery: OB-0MN9HWGAL001452N, OB-0MNFXR3E4005TGYX, src/cli/commands/add.ts, src/lib/ingestion/service.ts, and src/lib/ingestion/extractor.ts.
Constraints
- This repository (SourceBase) is the Discord bot integration layer; the retrieval fallback implementation belongs in the OpenBrain repo. SourceBase's scope is limited to producing the feature request doc and making any necessary documentation/behavior changes to the bot if explicitly requested.
- Playwright introduces platform/runtime dependencies and resource costs; prefer an opt-in configuration flag and record/playback fixtures for public CI runs.
- Telemetry and diagnostics must be non-sensitive and must record only metadata (fallback used, provider, duration, error notes); do not persist user secrets.
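The opt-in flag and the metadata-only telemetry constraint could look roughly like the sketch below. Everything here is illustrative: the environment variable name OPENBRAIN_PLAYWRIGHT_FALLBACK, the RetrievalTelemetry shape, and the redaction rules are assumptions, chosen only to mirror the four fields listed above (fallback used, provider, duration, error notes).

```typescript
// Hypothetical telemetry shape: field names are assumptions that mirror
// the metadata listed in the constraints.
interface RetrievalTelemetry {
  fallbackUsed: boolean;
  provider: string; // e.g. "html" or "playwright"
  durationMs: number;
  errorNote?: string;
}

// Opt-in check. The variable name is an assumption; the fallback stays off
// unless it is explicitly set to "1".
function isFallbackEnabled(env: Record<string, string | undefined>): boolean {
  return env["OPENBRAIN_PLAYWRIGHT_FALLBACK"] === "1";
}

// Strip things that commonly carry secrets before a note is persisted:
// user:password@ credentials and query strings. Then cap the length.
function redactNote(note: string, maxLen: number = 200): string {
  return note
    .replace(/\/\/[^\/\s@]+@/g, "//")
    .replace(/\?[^\s]*/g, "?[redacted]")
    .slice(0, maxLen);
}

function buildTelemetry(input: {
  provider: string;
  fallbackUsed: boolean;
  startedAtMs: number;
  finishedAtMs: number;
  error?: string;
}): RetrievalTelemetry {
  const event: RetrievalTelemetry = {
    fallbackUsed: input.fallbackUsed,
    provider: input.provider,
    durationMs: Math.max(0, input.finishedAtMs - input.startedAtMs),
  };
  if (input.error !== undefined) {
    event.errorNote = redactNote(input.error);
  }
  return event;
}
```

Passing the environment as a plain object keeps the check easy to test; a CLI would typically pass process.env. The redaction is deliberately aggressive and also rewrites any text after a literal question mark, which seems an acceptable trade-off for diagnostics.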
Existing state
- Work item exists: OB-0MNHT5HTC0070EL7 (current stage: idea; assignee: Map).
- No file found at docs/feature-requests/openbrain-playwright-fallback-retrieval.md (agent search: file not present).
- The repo already uses Playwright in test-related tooling (dev deps include @vitest/browser-playwright) and the ingestion pipeline is exercised by ob add (entrypoints: src/cli/commands/add.ts and src/lib/ingestion/service.ts).
- Related prior work items exist addressing ingestion and provider-specific fallbacks (see Related work below).
Desired change
- Create docs/feature-requests/openbrain-playwright-fallback-retrieval.md containing the sections described under Success criteria. The document should be review-ready for PM/Engineering handoff and include suggested telemetry fields and a CI/test strategy.
- Update OB-0MNHT5HTC0070EL7 description to link the doc and set stage to intake_complete.
- Optionally: create follow-up child work items in OpenBrain for implementation (PlaywrightExtractor, CI test harness, telemetry), but do not implement code changes in SourceBase as part of this item.
Related work
- OB-0MN9HWGAL001452N - Ingest CLI: file and URL ingestion
  Relevance: Primary ingestion flow (src/cli/commands/add.ts -> src/lib/ingestion/service.ts). Playwright output should be compatible with this ingestion pipeline.
- OB-0MNFXR3E4005TGYX - Fix YouTube ingestion for ob add
  Relevance: Example of a provider-specific fallback and associated tests/diagnostics.
- OB-0MN9CZ48N0053L9Q - Create a full PRD for OpenBrain
  Relevance: Product-level guidance: local-first preferences and documented fallback policies.
- OB-0MNGPYRSR00472F3 - CLI: Add "ob summary " command
  Relevance: Demonstrates how CLI triggers ingestion/summarization and where Playwright-extracted content would be consumed.
- OB-0MNK32JBQ008T8ND - Add CI workflow to run benchmark and publish results
  Relevance: CI/record-playback strategy and notes about mocking heavy dependencies for public CI.
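The record/playback strategy mentioned for public CI can be sketched as a thin wrapper around whatever function performs live retrieval. This is an illustration under assumptions, not an existing OpenBrain API: Retriever, FixtureStore, and withFixtures are hypothetical names, and fixtures are keyed by the exact URL string.

```typescript
// Hypothetical types: a retriever is any async function from URL to HTML.
type Retriever = (url: string) => Promise<string>;
type FixtureMode = "record" | "playback";

// In-memory fixture store that can be serialized to a JSON file and
// committed alongside the tests.
class FixtureStore {
  private pages: Record<string, string> = {};

  record(url: string, html: string): void {
    this.pages[url] = html;
  }

  playback(url: string): string {
    if (!Object.prototype.hasOwnProperty.call(this.pages, url)) {
      throw new Error("No fixture recorded for " + url);
    }
    return this.pages[url];
  }

  serialize(): string {
    return JSON.stringify(this.pages);
  }

  static deserialize(json: string): FixtureStore {
    const store = new FixtureStore();
    const entries = JSON.parse(json) as Record<string, string>;
    for (const url of Object.keys(entries)) {
      store.record(url, entries[url]);
    }
    return store;
  }
}

// In "record" mode the live retriever runs and its output is saved.
// In "playback" mode the live retriever is never called, so CI needs
// neither network access nor a browser binary.
function withFixtures(mode: FixtureMode, store: FixtureStore, live: Retriever): Retriever {
  return async (url: string): Promise<string> => {
    if (mode === "playback") {
      return store.playback(url);
    }
    const html = await live(url);
    store.record(url, html);
    return html;
  };
}
```

A developer would run the suite once in record mode on a machine with a browser installed, commit the serialized fixtures, and let public CI run in playback mode. A missing fixture fails loudly rather than silently reaching the network.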
Relevant files (starting points)
- src/cli/commands/add.ts - CLI entrypoint for URL ingestion
- src/lib/ingestion/service.ts - ingestion pipeline (extraction -> summarization -> persist)
- src/lib/ingestion/extractor.ts - extractor interface and plugin points
- src/lib/ingestion/youtube.ts - provider-specific pattern (YouTube handler)
- tests/acceptance/ingest-e2e.test.ts - acceptance harness to validate retrieval -> summarization -> DB
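To give implementers a feel for how a Playwright extractor might slot in next to the existing provider-specific handlers, here is a minimal sketch of an ordered extractor chain. The real interface in src/lib/ingestion/extractor.ts has not been inspected for this draft, so the Extractor shape, ExtractedContent, and both function names below are assumptions.

```typescript
// Hypothetical shapes: the actual interface in extractor.ts may differ.
interface ExtractedContent {
  title: string;
  text: string;
  provider: string;
}

interface Extractor {
  readonly name: string;
  canHandle(url: string): boolean;
  // Resolves to null when the extractor ran but found no usable content.
  extract(url: string): Promise<ExtractedContent | null>;
}

// Keep registry order: provider-specific handlers first, the fast HTML
// extractor next, and the expensive browser-based fallback last.
function candidateExtractors(url: string, registry: Extractor[]): Extractor[] {
  return registry.filter((extractor) => extractor.canHandle(url));
}

// Try each candidate in order and return the first usable result.
async function extractWithFallback(
  url: string,
  registry: Extractor[],
): Promise<ExtractedContent> {
  const errors: string[] = [];
  for (const extractor of candidateExtractors(url, registry)) {
    try {
      const content = await extractor.extract(url);
      if (content !== null) {
        return content;
      }
      errors.push(extractor.name + ": insufficient content");
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(extractor.name + ": " + message);
    }
  }
  throw new Error("All extractors failed for " + url + ": " + errors.join("; "));
}
```

Because the fallback is just the last entry in the registry, making it opt-in amounts to leaving it out of the registry when the flag is off, and the output type stays identical for the rest of the ingestion pipeline.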
Appendix: Clarifying questions & answers
- Q: "Should SourceBase implement the Playwright fallback code or produce a feature request document?" - Answer (work item OB-0MNHT5HTC0070EL7): "Produce a feature request document in this repo; the retrieval fallback implementation belongs in OpenBrain." Source: work item description. Final: yes.
- Q: "Does the feature request document already exist at docs/feature-requests/openbrain-playwright-fallback-retrieval.md?" - Answer (agent search): No. Evidence: an attempt to read the file returned "File not found", and a repository search returned no matches for that path.
- Q: "What related work items and files should be referenced in the doc?" - Answer (agent inference using wl search and repo grep): See the Related work and Relevant files sections above. Evidence: wl search results and repo file matches (src/cli/commands/add.ts, src/lib/ingestion/service.ts, tests/acceptance/ingest-e2e.test.ts). Final: included.
- Q: "Are Playwright dependencies already present in the repo tooling?" - Answer (agent search): Yes; dev dependencies include @vitest/browser-playwright (package-lock.json) and tests reference Playwright in test harness files. Evidence: package-lock.json and test files.