
feat: Add CRW document loader and scrape tool nodes #6066

Open

us wants to merge 2 commits into FlowiseAI:main from us:feat/add-crw-node

Conversation


@us us commented Mar 26, 2026

Summary

  • Add CRW document loader node with scrape, crawl, and map modes for web content extraction
  • Add CRW Scrape tool node for agent-based web scraping via DynamicStructuredTool
  • Add CRW API credential supporting both self-hosted instances and fastcrw.com cloud

Details

Document Loader (CRW)

  • Scrape: Extract content from a single URL with JS rendering, CSS/XPath selectors, stealth mode, proxy support
  • Crawl: BFS crawl of a site with configurable max depth and page limits
  • Map: Discover all URLs on a site via link following and optional sitemap parsing
  • Outputs a Document array or concatenated text, with optional text splitter support (see the sketch after this list)
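
A rough sketch of the output handling described above: scraped pages become LangChain Documents, optionally run through a text splitter, and are returned either as an array or as concatenated text. The CRWPage shape and helper names are illustrative assumptions, not the PR's actual code.

import { Document } from '@langchain/core/documents'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'

// Hypothetical shape of one scraped page; the real CRW response may differ.
interface CRWPage {
    url: string
    markdown: string
}

// Convert scraped pages into Documents, optionally running a text splitter.
async function toDocuments(pages: CRWPage[], splitter?: RecursiveCharacterTextSplitter): Promise<Document[]> {
    const docs = pages.map((p) => new Document({ pageContent: p.markdown, metadata: { source: p.url } }))
    return splitter ? splitter.splitDocuments(docs) : docs
}

// 'documents' output returns the array; 'text' output concatenates page content.
function toOutput(docs: Document[], output: 'documents' | 'text') {
    return output === 'text' ? docs.map((d) => d.pageContent).join('\n\n') : docs
}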

Tool (CRW Scrape)

  • Exposes CRW scrape as a tool for LLM agents
  • Configurable tool name/description, output format, JS rendering, and main content extraction (sketched below)
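
A minimal sketch of how the scrape could be exposed to agents via DynamicStructuredTool. The tool name, zod schema, and crwScrape stand-in are assumptions for illustration, not the node's exact implementation.

import { z } from 'zod'
import { DynamicStructuredTool } from '@langchain/core/tools'

// Stand-in for the node's real CRW HTTP client (hypothetical signature).
declare function crwScrape(url: string, opts: { onlyMainContent?: boolean }): Promise<string>

const crwScrapeTool = new DynamicStructuredTool({
    name: 'crw_scrape', // overridable via the node's tool name input
    description: 'Scrape a web page with CRW and return its main content as markdown.',
    schema: z.object({
        url: z.string().describe('URL of the page to scrape')
    }),
    func: async ({ url }) => crwScrape(url, { onlyMainContent: true })
})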

Credential (CRW API)

  • API key (optional for self-hosted, required for cloud)
  • Configurable API URL (defaults to http://localhost:3000); a sketch of the credential definition follows
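
Roughly what the credential definition could look like, following the pattern of existing Flowise credential files; the label and input names below are guesses rather than the PR's actual identifiers.

import { INodeParams, INodeCredential } from '../src/Interface'

class CRWApi implements INodeCredential {
    label: string
    name: string
    version: number
    description: string
    inputs: INodeParams[]

    constructor() {
        this.label = 'CRW API'
        this.name = 'crwApi'
        this.version = 1.0
        this.description = 'API key is optional for self-hosted CRW instances and required for fastcrw.com cloud'
        this.inputs = [
            {
                label: 'API Key',
                name: 'crwApiKey',
                type: 'password',
                optional: true
            },
            {
                label: 'API URL',
                name: 'crwApiUrl',
                type: 'string',
                default: 'http://localhost:3000'
            }
        ]
    }
}

module.exports = { credClass: CRWApi }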

Test plan

  • Verify document loader node appears in Flowise UI under "Document Loaders"
  • Verify scrape tool node appears under "Tools"
  • Test scrape mode with a sample URL
  • Test crawl mode with a small site
  • Test map mode for URL discovery
  • Test CRW Scrape tool within an agent flow
  • Verify credential creation and connection

Add CRW integration with:
- Document loader node supporting scrape, crawl, and map modes
- Scrape tool node for agent-based web scraping
- CRW API credential with support for self-hosted and cloud instances
Copilot AI review requested due to automatic review settings March 26, 2026 12:18
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances web content extraction capabilities by integrating CRW, an AI-native web scraper. It introduces a versatile document loader that can scrape, crawl, or map websites, and a dedicated tool node for agents to leverage CRW's scraping features. A new credential type facilitates secure and flexible connections to CRW services, making it easier to incorporate dynamic web data into flows.

Highlights

  • CRW Document Loader Node: Introduced a new document loader node for CRW, supporting 'scrape' for single URLs, 'crawl' for multi-page BFS traversal, and 'map' for URL discovery via link following and sitemap parsing. It includes options for JS rendering, CSS/XPath selectors, stealth mode, proxy support, and text splitting.
  • CRW Scrape Tool Node: Added a CRW Scrape tool node, enabling LLM agents to perform web scraping. This tool is configurable with a custom name, description, output format, JS rendering options, and main content extraction.
  • CRW API Credential: Implemented a new credential type for CRW API, allowing connection to both self-hosted CRW instances and the fastcrw.com cloud service. It supports optional API keys and configurable API URLs.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces integration with the CRW web scraping service, adding a new credential type, a document loader for scraping, crawling, and mapping websites, and a dedicated tool for CRW scrape functionality. Feedback includes making the 5-minute crawl timeout configurable for flexibility and removing an unused stealth parameter from the scrape method signature in the CRWScrape tool.


// Poll until completed or failed
const jobId = startData.id
const maxWaitMs = 5 * 60 * 1000 // 5 minutes

Severity: medium

A hardcoded 5-minute timeout for crawling can be problematic. It might be too long for some use cases, potentially causing gateway timeouts in a web server environment, or too short for very large sites. Consider making this timeout configurable as a node input to provide more flexibility.
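
One way to do this, sketched with a hypothetical crawlTimeout node input (in seconds) that falls back to the current 5-minute default when unset:

// Hypothetical node input definition; name, label, and default are illustrative.
const crawlTimeoutInput = {
    label: 'Crawl Timeout (seconds)',
    name: 'crawlTimeout',
    type: 'number',
    default: 300,
    optional: true,
    additionalParams: true
}

// Resolve the timeout from node inputs, defaulting to 5 minutes.
function resolveCrawlTimeoutMs(inputs: Record<string, unknown>): number {
    const seconds = Number(inputs.crawlTimeout)
    return (Number.isFinite(seconds) && seconds > 0 ? seconds : 300) * 1000
}

// In the polling loop: const maxWaitMs = resolveCrawlTimeoutMs(nodeData.inputs ?? {})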


async scrape(
url: string,
params: { onlyMainContent?: boolean; renderJs?: string; formats?: string[]; stealth?: boolean }

Severity: medium

The stealth parameter is defined in the scrape method's signature but is never used, as there is no corresponding input in the tool's configuration. It can be removed to simplify the code and avoid confusion.

Suggested change:
- params: { onlyMainContent?: boolean; renderJs?: string; formats?: string[]; stealth?: boolean }
+ params: { onlyMainContent?: boolean; renderJs?: string; formats?: string[] }


Copilot AI left a comment


Pull request overview

Adds CRW integrations to Flowise by introducing a new CRW document loader (scrape/crawl/map) and a CRW Scrape tool node, along with a credential for configuring CRW API access (self-hosted or fastcrw.com).

Changes:

  • Added CRW document loader node supporting scrape, crawl, and map modes with configurable extraction options.
  • Added CRW Scrape tool node exposing CRW scraping as a DynamicStructuredTool for agent use.
  • Added CRW API credential (API key + base URL) and node icons.

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:
  • packages/components/nodes/tools/CRWScrape/crw.svg: Adds CRW icon for the tool node.
  • packages/components/nodes/tools/CRWScrape/CRWScrape.ts: Adds CRW Scrape tool node implementation and a minimal CRW HTTP client.
  • packages/components/nodes/documentloaders/CRW/crw.svg: Adds CRW icon for the document loader node.
  • packages/components/nodes/documentloaders/CRW/CRW.ts: Adds CRW document loader node (scrape/crawl/map) with optional text splitting and metadata merge.
  • packages/components/credentials/CRWApi.credential.ts: Adds CRW API credential definition (key optional + configurable API base URL).


Comment on lines +533 to +535
if (output === 'text') {
return docs.map((doc) => doc.pageContent).join('\n\n')
}

Copilot AI Mar 26, 2026


When returning the text output, this loader returns the raw concatenated string without applying handleEscapeCharacters. Most other document loaders wrap their final text output with handleEscapeCharacters(..., false) to keep escaping consistent across nodes and avoid downstream parsing/display issues with special characters.
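
A sketch of that fix applied to the excerpt above, assuming handleEscapeCharacters is imported from the components src utils the way other loaders do it:

import { handleEscapeCharacters } from '../../../src/utils'

// Keep escaping consistent with other loaders when returning concatenated text.
if (output === 'text') {
    return handleEscapeCharacters(docs.map((doc) => doc.pageContent).join('\n\n'), false)
}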

onlyMainContent: onlyMainContent ?? true
}
if (renderJs && renderJs !== 'auto') params.renderJs = renderJs
if (waitFor) params.waitFor = waitFor

Copilot AI Mar 26, 2026


waitFor is only forwarded when it is truthy (if (waitFor)), so a valid value of 0 ms cannot be passed through to the CRW API. Use an explicit undefined/null check instead (e.g., waitFor !== undefined) so zero is respected.

Suggested change:
- if (waitFor) params.waitFor = waitFor
+ if (waitFor !== undefined && waitFor !== null) params.waitFor = waitFor

- Make crawl timeout configurable via node input parameter (default 5min)
- Remove unused stealth param from CRWScrape tool's scrape method signature
- Apply handleEscapeCharacters to text output for consistency with other loaders
- Use explicit null/undefined check for waitFor to allow waitFor=0
