docs: add Crawl4AI guide by vdusek · Pull Request #942 · apify/apify-sdk-python

vdusek · 2026-06-05T14:50:37Z

Adds a guide for using the Crawl4AI LLM-friendly web crawler in Apify Actors, following the structure of the existing scraping-library guides (BeautifulSoup, Playwright, Scrapy).

What's included

docs/03_guides/10_crawl4ai.mdx — the guide: introduction & features, a runnable example Actor that crawls pages into LLM-ready markdown, Apify Proxy integration, and the Dockerfile / base-image setup for running the browser on the platform.
docs/03_guides/code/crawl4ai_project/ — the example Actor: a recursive markdown scraper using Crawl4AI's AsyncWebCrawler through Apify Proxy, plus its Dockerfile.
Link to the new guide added to the quick-start "Guides" list.
pyproject.toml — a ruff I001 per-file-ignore for the new code project (matches the existing Scrapy ignore).
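
The recursive part of the example Actor can be sketched independently of Crawl4AI. The following is a minimal illustration, not the code from this PR: `Page`, `crawl_recursively`, and the injected `fetch` callable are hypothetical names, and `fetch` stands in for whatever call renders a page to markdown and returns its links (in the real Actor that role would be played by Crawl4AI's AsyncWebCrawler). The output items use the url / title / markdown shape mentioned under Verification.

```python
import asyncio
from collections import deque
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical result of fetching one page (not a Crawl4AI type)."""

    url: str
    title: str
    markdown: str
    links: list[str] = field(default_factory=list)


async def crawl_recursively(
    start_urls: list[str],
    fetch: Callable[[str], Awaitable[Page]],
    max_depth: int = 1,
) -> list[dict[str, str]]:
    """Breadth-first crawl that follows links up to max_depth hops from the start URLs."""
    unique_starts = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    queue = deque((url, 0) for url in unique_starts)
    seen = set(unique_starts)
    items: list[dict[str, str]] = []

    while queue:
        url, depth = queue.popleft()
        page = await fetch(url)
        items.append({"url": page.url, "title": page.title, "markdown": page.markdown})

        if depth >= max_depth:
            continue  # do not enqueue links found on pages at the depth limit
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return items


# In-memory stand-in for a real crawler, so the sketch runs offline.
HOME = "https://example.com/"
FAKE_SITE = {
    HOME: Page(url=HOME, title="Home", markdown="# Home", links=[HOME + "a", HOME + "b"]),
    HOME + "a": Page(url=HOME + "a", title="A", markdown="# A", links=[HOME + "c", HOME]),
    HOME + "b": Page(url=HOME + "b", title="B", markdown="# B"),
    HOME + "c": Page(url=HOME + "c", title="C", markdown="# C"),
}


async def fake_fetch(url: str) -> Page:
    return FAKE_SITE[url]


items = asyncio.run(crawl_recursively([HOME], fake_fetch, max_depth=1))
for item in items:
    print(item["url"], item["title"])
```

Passing the fetch step in as a callable keeps the depth and de-duplication logic testable without a browser. In an Actor, each item would be pushed to the dataset rather than collected in a list.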

Notes

Built on the Python 3.13 Playwright base image: Crawl4AI pins lxml < 6, and those lxml releases have no cp314 wheels yet, so the 3.14 image would force a slow source build. Crawl4AI reuses the base image's pre-installed browser, so no extra browser-install step is needed in the Dockerfile.

Verification

The example Actor was run locally (apify run) and on the Apify platform (build + run SUCCEEDED), producing dataset items with url / title / markdown.
Apify Proxy confirmed: requests egress through an Apify Proxy IP, distinct from the direct IP.
ruff format --check, ruff check, and ty check pass.
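
The proxy check above can be reproduced with a short standard-library script. This is a sketch under assumptions, not the procedure used in this PR: `make_opener`, `egress_ip`, and `check_proxy_egress` are hypothetical helpers, the IP echo service `https://api.ipify.org` is one arbitrary choice, and the proxy URL is a placeholder to be replaced with the URL produced by your own Apify Proxy configuration.

```python
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that routes through proxy_url, or connects directly when it is None."""
    # An empty mapping disables proxies, including any set via environment variables.
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def egress_ip(
    opener: urllib.request.OpenerDirector,
    echo_url: str = "https://api.ipify.org",  # assumption: any plain-text IP echo service works
    timeout: float = 30.0,
) -> str:
    """Return the public IP address the echo service sees for this opener."""
    with opener.open(echo_url, timeout=timeout) as response:
        return response.read().decode().strip()


def check_proxy_egress(proxy_url: str) -> bool:
    """Return True when the proxied egress IP differs from the direct one."""
    direct_ip = egress_ip(make_opener())
    proxied_ip = egress_ip(make_opener(proxy_url))
    return direct_ip != proxied_ip
```

Calling `check_proxy_egress("http://<user>:<password>@<proxy-host>:<port>")` with real credentials performs two network requests. A `True` result only shows that the two paths exit from different addresses, not that the proxied address belongs to Apify.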

Numbered 10 to sit after the in-progress uv (#932 → 08) and Scrapling (#938 → 09) guides.

Part of #836.

codecov · 2026-06-05T14:51:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.98%. Comparing base (3f25d4a) to head (a62a06b).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #942      +/-   ##
==========================================
+ Coverage   86.87%   86.98%   +0.10%     
==========================================
  Files          48       48              
  Lines        2942     2942              
==========================================
+ Hits         2556     2559       +3     
+ Misses        386      383       -3

Flag          Coverage Δ
e2e           37.76% <ø> (ø)
integration   59.14% <ø> (+0.03%) ⬆️
unit          75.69% <ø> (+0.06%) ⬆️


docs: add Crawl4AI guide

a62a06b

vdusek added the t-tooling label (issues with this label are in the ownership of the tooling team) Jun 5, 2026

vdusek self-assigned this Jun 5, 2026

github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 5, 2026

vdusek wants to merge 1 commit into master from docs/crawl4ai-guide