apify · vdusek · Jun 5, 2026
@@ -106,3 +106,4 @@ To see how you can integrate the Apify SDK with popular web scraping libraries,
 - [Crawlee](../guides/crawlee)
 - [Scrapy](../guides/scrapy)
 - [Running webserver](../guides/running-webserver)
+- [Validate Actor input with Pydantic](../guides/input-validation)
@@ -20,6 +20,10 @@ For example, if an Actor received a JSON input with two fields, `{ "firstNumber"
     {InputExample}
 </RunnableCodeBlock>
 
+## Validating input
+
+Reading values straight out of the raw input dictionary works for simple cases, but it gives you no type guarantees, no constraint checks, and no clear error when the input is malformed. For anything beyond a couple of fields, validate the input with [Pydantic](https://docs.pydantic.dev/) so your code works with a typed, guaranteed-valid object instead. See the [Validate Actor input with Pydantic](../guides/input-validation) guide for the recommended approach.
+
 ## Loading URLs from Actor input
 
 Actors commonly receive a list of URLs to process via their input. The <ApiLink to="class/ApifyRequestList">`ApifyRequestList`</ApiLink> class (from `apify.request_loaders`) can parse the standard Apify input format for URL sources. It supports both direct URL objects (`{"url": "https://example.com"}`) and remote URL lists (`{"requestsFromUrl": "https://example.com/urls.txt"}`), where the remote file contains one URL per line.

@@ -0,0 +1,107 @@
+---
+id: input-validation
+title: Input validation with Pydantic
+description: Parse, validate, and type your Actor's input with Pydantic models instead of reaching into a raw dictionary.
+---
+
+import CodeBlock from '@theme/CodeBlock';
+import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+import ApiLink from '@theme/ApiLink';
+
+import RawInputExample from '!!raw-loader!roa-loader!./code/11_raw_input.py';
+import PydanticExample from '!!raw-loader!roa-loader!./code/11_pydantic.py';
+import HttpUrlExample from '!!raw-loader!./code/11_http_url.py';
+import ModelValidatorExample from '!!raw-loader!./code/11_model_validator.py';
+
+In this guide, you'll learn how to validate your Apify Actor's input with [Pydantic](https://docs.pydantic.dev/), so that your code works with a typed, guaranteed-valid object instead of a raw dictionary.
+
+## Introduction
+
+An Actor reads its input with <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink>, which returns the input record as a plain `dict` (or `None` when there's no input). Working with that dictionary directly is fragile:
+
+<RunnableCodeBlock className="language-python" language="python">
+    {RawInputExample}
+</RunnableCodeBlock>
+
+- There are no type guarantees - `max_results` could just as easily arrive as the string `"10"` or `None`, and you'd only find out when something blows up later.
+- There's no validation - nothing stops `max_results` from being `0` or `-5`, or `search_terms` from being empty.
+- A typo in a key (`maxResult` instead of `maxResults`) silently falls back to the default instead of failing.
+- Defaults are scattered across the codebase, and your editor can't autocomplete the fields or catch mistakes.
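
The failure modes listed above can be reproduced without the SDK. The sketch below uses a hypothetical `raw_input` dictionary standing in for whatever `Actor.get_input` might return; the misspelled key and the string value are made up for illustration.

```python
# Hypothetical raw input: a typo in the key and a string instead of an int.
raw_input = {'searchTerms': [], 'maxResult': '5'}

# The misspelled key is never read, so the default silently wins.
max_results = raw_input.get('maxResults', 10)

# An empty list passes through unchecked.
search_terms = raw_input.get('searchTerms', [])

print(max_results)   # 10, not the 5 the user intended
print(search_terms)  # []
```

Nothing here raises, which is exactly the problem: the run continues with values the user never asked for.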
+
+[Pydantic](https://docs.pydantic.dev/) solves all of this. You declare the shape of your input once as a model, and Pydantic parses the raw dictionary into a typed object, applying defaults, enforcing constraints, and producing clear error messages when the input doesn't match. Pydantic is already a dependency of the Apify SDK, so there's nothing extra to install.
+
+## Example Actor
+
+The following Actor declares its input as a Pydantic `BaseModel`, validates the raw input against it, and then works with a fully typed object. On invalid input it fails fast with a readable error; on valid input it logs the normalized values and stores them as the Actor's output.
+
+<RunnableCodeBlock className="language-python" language="python">
+    {PydanticExample}
+</RunnableCodeBlock>
+
+A few things worth pointing out about the **model**:
+
+- **Aliases bridge the naming conventions.** Apify input fields are conventionally `camelCase` (`maxResults`), while Python attributes are `snake_case` (`max_results`). `Field(alias='maxResults')` maps one to the other, and `populate_by_name=True` lets the model accept either spelling - handy in tests.
+- **Defaults and `required` fields are explicit.** A field without a default (`search_terms`) is required; one with a default (`max_results`) is optional. There's a single, obvious place where every default lives.
+- **Constraints are declarative.** `ge=1, le=100` enforces a numeric range, `min_length=1` rejects an empty list, and `Literal['json', 'csv']` restricts a field to a fixed set of choices - mirroring an `enum` in the input schema.
+- **Custom validators handle the rest.** The `field_validator` normalizes the search terms (trimming whitespace, dropping empties) and rejects input that has nothing left, so the rest of your code never has to repeat those checks.
+- **Unknown fields are ignored.** `extra='ignore'` means adding a new field to your input schema won't break an older Actor build that doesn't know about it yet. Use `extra='forbid'` instead if you'd rather reject anything unexpected.
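
To see the alias, default, and extra-field behaviour described above in isolation, here is a minimal sketch. It uses a trimmed-down copy of the guide's model (without the custom validator), and the `debug` key is a hypothetical unknown field added only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class ActorInput(BaseModel):
    # Trimmed-down copy of the guide's model, without the custom validator.
    model_config = ConfigDict(populate_by_name=True, extra='ignore')

    search_terms: list[str] = Field(alias='searchTerms', min_length=1)
    max_results: int = Field(alias='maxResults', default=10, ge=1, le=100)
    output_format: Literal['json', 'csv'] = Field(alias='outputFormat', default='json')


# camelCase keys, as they appear in the Actor input; `debug` is an unknown field.
from_alias = ActorInput.model_validate({'searchTerms': ['apify'], 'debug': True})

# snake_case keyword arguments work too, thanks to populate_by_name=True.
from_name = ActorInput(search_terms=['apify'], max_results=5)

print(from_alias.max_results)         # the default was applied
print(from_name.max_results)          # the explicit value was kept
print(hasattr(from_alias, 'debug'))   # the extra field was dropped
```

Constructing the model by field name is mostly useful in unit tests, where you would rather not spell out the camelCase keys.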
+
+And about the **validation** itself:
+
+- `model_validate` parses the raw dictionary into a typed `ActorInput` instance, filling in defaults and guaranteeing every field is valid - or raising a `ValidationError` describing every problem at once.
+- Catching that error, logging a readable summary, and re-raising makes the Actor **fail fast** with a clear explanation right at the start, rather than crashing with an obscure error somewhere deep in the run. Because the body runs inside `async with Actor:`, the re-raised exception automatically marks the run as `FAILED`.
+- The error messages refer to the fields by their input-schema aliases. For invalid input like `{"searchTerms": [], "maxResults": 999, "outputFormat": "xml"}`, the log shows exactly what's wrong:
+
+  ```text
+  The Actor input is invalid:
+  3 validation errors for ActorInput
+  searchTerms
+    List should have at least 1 item after validation, not 0 ...
+  maxResults
+    Input should be less than or equal to 100 ...
+  outputFormat
+    Input should be 'json' or 'csv' ...
+  ```
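
If you would rather log one line per field than the multi-line string form, `ValidationError.errors()` exposes the same information as a list of dictionaries. The sketch below is one possible way to flatten it; `summarize_errors` is a hypothetical helper, not part of the Apify SDK or Pydantic, and the model is a trimmed-down copy of the guide's.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ActorInput(BaseModel):
    search_terms: list[str] = Field(alias='searchTerms', min_length=1)
    max_results: int = Field(alias='maxResults', default=10, ge=1, le=100)
    output_format: Literal['json', 'csv'] = Field(alias='outputFormat', default='json')


def summarize_errors(exc: ValidationError) -> dict[str, str]:
    """Map each failing location (dotted path) to its error message."""
    return {'.'.join(str(part) for part in err['loc']): err['msg'] for err in exc.errors()}


invalid_input = {'searchTerms': [], 'maxResults': 999, 'outputFormat': 'xml'}

summary: dict[str, str] = {}
try:
    ActorInput.model_validate(invalid_input)
except ValidationError as exc:
    summary = summarize_errors(exc)

for field, message in summary.items():
    print(f'{field}: {message}')
```

The keys of `summary` are the input-schema aliases, so the messages can be shown to users as they are.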
+
+Once validation passes, the rest of `main` works with `actor_input.search_terms`, `actor_input.max_results`, and `actor_input.output_format` - all correctly typed, with editor autocompletion and static type checking.
+
+## Relationship to the input schema
+
+Pydantic validation **complements** the Actor's [input schema](https://docs.apify.com/platform/actors/development/input-schema) (`.actor/input_schema.json`) - it doesn't replace it. The two serve different layers:
+
+- The **input schema** drives the Apify Console form, documents the fields for your users, and lets the platform validate input before the run even starts. Keep declaring your fields there.
+- The **Pydantic model** validates the input again *inside your Python code*, where it gives you a typed object, IDE support, and richer rules (normalization, cross-field checks, custom formats) that the input schema can't express. It's also your safety net for runs started programmatically by [another Actor](../concepts/interacting-with-other-actors) or executed [locally](https://docs.apify.com/cli/docs/reference#apify-run), and it surfaces drift between the two definitions as a validation error instead of a silent mismatch.
+
+Keep the model's aliases in sync with the field keys in `input_schema.json`, so that the two definitions describe the same input from both sides.
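
One way to catch drift early is a small test that compares the model's aliases with the schema's property keys. The sketch below is an assumption about how such a check could look, not an SDK feature: `input_schema` is a hypothetical, heavily trimmed stand-in for the parsed contents of `.actor/input_schema.json`, and the model is a trimmed-down copy of the guide's.

```python
from typing import Literal

from pydantic import BaseModel, Field


class ActorInput(BaseModel):
    search_terms: list[str] = Field(alias='searchTerms', min_length=1)
    max_results: int = Field(alias='maxResults', default=10, ge=1, le=100)
    output_format: Literal['json', 'csv'] = Field(alias='outputFormat', default='json')


# Hypothetical, trimmed stand-in for the parsed .actor/input_schema.json.
input_schema = {
    'title': 'Example input',
    'type': 'object',
    'schemaVersion': 1,
    'properties': {
        'searchTerms': {'title': 'Search terms', 'type': 'array'},
        'maxResults': {'title': 'Max results', 'type': 'integer'},
        'outputFormat': {'title': 'Output format', 'type': 'string'},
    },
    'required': ['searchTerms'],
}

# The JSON schema generated by Pydantic is keyed by alias when by_alias=True.
model_schema = ActorInput.model_json_schema(by_alias=True)

missing_in_model = set(input_schema['properties']) - set(model_schema['properties'])
missing_in_schema = set(model_schema['properties']) - set(input_schema['properties'])
required_matches = set(input_schema['required']) == set(model_schema.get('required', []))

print(missing_in_model, missing_in_schema, required_matches)
```

In a real project you would load the schema with `json.loads` from the file instead of inlining it, and fail the test when either set is non-empty.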
+
+## Useful validation features
+
+Pydantic offers much more than the example uses. A few features that come up often when validating Actor input:
+
+**Format-validated types** for common string formats, for example `HttpUrl` for URLs or `EmailStr` for e-mail addresses (the latter needs the `pydantic[email]` extra):
+
+<CodeBlock className="language-python">
+    {HttpUrlExample}
+</CodeBlock>
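
A short usage sketch for the model above, with made-up URLs: a well-formed `https` URL is parsed into a URL object with accessible parts, while free text and a non-HTTP scheme are both rejected.

```python
from pydantic import BaseModel, HttpUrl, ValidationError


class ActorInput(BaseModel):
    target_url: HttpUrl


valid = ActorInput.model_validate({'target_url': 'https://example.com/path'})
print(valid.target_url.scheme, valid.target_url.host)

# Collect the candidates that fail validation.
rejected = []
for candidate in ('not a url', 'ftp://example.com'):
    try:
        ActorInput.model_validate({'target_url': candidate})
    except ValidationError:
        rejected.append(candidate)

print(rejected)
```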
+
+**Cross-field validation** with `model_validator`, when one field's validity depends on another:
+
+<CodeBlock className="language-python">
+    {ModelValidatorExample}
+</CodeBlock>
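
A short usage sketch for the cross-field check above, with made-up prices. It imports `Self` from `typing_extensions` (already installed as a Pydantic dependency) so that it also runs on Python versions older than 3.11; that import is a choice made for this sketch.

```python
from pydantic import BaseModel, ValidationError, model_validator
from typing_extensions import Self


class ActorInput(BaseModel):
    min_price: int = 0
    max_price: int = 100

    @model_validator(mode='after')
    def _check_range(self) -> Self:
        if self.min_price > self.max_price:
            raise ValueError('min_price must not exceed max_price')
        return self


# A consistent range passes.
ok = ActorInput.model_validate({'min_price': 10, 'max_price': 20})

# An inverted range fails, even though each field is valid on its own.
error_message = ''
try:
    ActorInput.model_validate({'min_price': 50, 'max_price': 10})
except ValidationError as exc:
    error_message = exc.errors()[0]['msg']

print(error_message)
```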
+
+**Secret input fields.** [Secret input fields](https://docs.apify.com/platform/actors/development/secret-input) are decrypted for you before <ApiLink to="class/Actor#get_input">`Actor.get_input`</ApiLink> returns, so you receive plaintext. Wrap such fields in Pydantic's `SecretStr` to keep their values masked in logs, in the model's `repr()`, and in serialized `model_dump_json()` output; call `get_secret_value()` only where you actually need the plaintext.
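
A minimal sketch of `SecretStr` in action. The `apiToken` field and its value are hypothetical, used only for illustration.

```python
from pydantic import BaseModel, Field, SecretStr


class ActorInput(BaseModel):
    # `apiToken` is a hypothetical secret field, used only for illustration.
    api_token: SecretStr = Field(alias='apiToken')


actor_input = ActorInput.model_validate({'apiToken': 'my-secret-token'})

# The value is masked in the model's repr and in JSON output.
masked_repr = repr(actor_input)
masked_json = actor_input.model_dump_json()

# The plaintext is available only through an explicit call.
plaintext = actor_input.api_token.get_secret_value()

print(masked_repr)
print(masked_json)
```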
+
+For the full set of types, constraints, and validators, see the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/models/).
+
+## Conclusion
+
+In this guide, you learned how to validate Actor input with Pydantic: declaring the input as a model with aliases, defaults, and constraints; parsing the raw input with `model_validate`; failing fast with a readable error when the input is invalid; and working with a typed object for the rest of the run. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own Actors. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy validating!
+
+## Additional resources
+
+- [Pydantic: Official documentation](https://docs.pydantic.dev/)
+- [Pydantic: Models](https://docs.pydantic.dev/latest/concepts/models/)
+- [Pydantic: Validators](https://docs.pydantic.dev/latest/concepts/validators/)
+- [Apify: Actor input](https://docs.apify.com/platform/actors/running/input)
+- [Apify: Input schema specification](https://docs.apify.com/platform/actors/development/input-schema)
diff --git a/docs/03_guides/code/11_http_url.py b/docs/03_guides/code/11_http_url.py
@@ -0,0 +1,5 @@
+from pydantic import BaseModel, HttpUrl
+
+
+class ActorInput(BaseModel):
+    target_url: HttpUrl
diff --git a/docs/03_guides/code/11_model_validator.py b/docs/03_guides/code/11_model_validator.py
@@ -0,0 +1,14 @@
+from typing import Self  # Python 3.11+; on older versions use typing_extensions.Self
+
+from pydantic import BaseModel, model_validator
+
+
+class ActorInput(BaseModel):
+    min_price: int = 0
+    max_price: int = 100
+
+    @model_validator(mode='after')
+    def _check_range(self) -> Self:
+        if self.min_price > self.max_price:
+            raise ValueError('min_price must not exceed max_price')
+        return self
diff --git a/docs/03_guides/code/11_pydantic.py b/docs/03_guides/code/11_pydantic.py
@@ -0,0 +1,59 @@
+import asyncio
+from typing import Literal
+
+from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator
+
+from apify import Actor
+
+
+class ActorInput(BaseModel):
+    """Typed and validated representation of the Actor input."""
+
+    # Accept both snake_case and the input schema's camelCase; ignore extras.
+    model_config = ConfigDict(populate_by_name=True, extra='ignore')
+
+    # Required: non-empty list of search terms (normalized below).
+    search_terms: list[str] = Field(alias='searchTerms', min_length=1)
+
+    # Optional: 1-100, defaults to 10.
+    max_results: int = Field(alias='maxResults', default=10, ge=1, le=100)
+
+    # Optional: restricted to a fixed set of choices.
+    output_format: Literal['json', 'csv'] = Field(alias='outputFormat', default='json')
+
+    @field_validator('search_terms')
+    @classmethod
+    def _normalize_terms(cls, value: list[str]) -> list[str]:
+        # Trim whitespace and drop empty terms.
+        cleaned = [stripped for term in value if (stripped := term.strip())]
+        if not cleaned:
+            raise ValueError('searchTerms must contain at least one non-empty term')
+        return cleaned
+
+
+async def main() -> None:
+    async with Actor:
+        # Read the raw input (a plain dict, not yet validated).
+        raw_input = await Actor.get_input() or {}
+
+        # Validate the raw input against the model.
+        try:
+            actor_input = ActorInput.model_validate(raw_input)
+        except ValidationError as exc:
+            # Log a per-field summary, then re-raise to fail the run.
+            Actor.log.error('The Actor input is invalid:\n%s', exc)
+            raise
+
+        # Work with typed attributes from here on.
+        Actor.log.info('Input passed validation: %s', actor_input.model_dump())
+
+        max_results = actor_input.max_results
+        for term in actor_input.search_terms:
+            Actor.log.info('Processing %r (max %d results)', term, max_results)
+
+        # Store the normalized input as output.
+        await Actor.set_value('OUTPUT', actor_input.model_dump())
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
diff --git a/docs/03_guides/code/11_raw_input.py b/docs/03_guides/code/11_raw_input.py
@@ -0,0 +1,18 @@
+import asyncio
+
+from apify import Actor
+
+
+async def main() -> None:
+    # Enter the context of the Actor.
+    async with Actor:
+        # Read the input and reach into the raw dict.
+        actor_input = await Actor.get_input() or {}
+        search_terms = actor_input.get('searchTerms', [])
+        max_results = actor_input.get('maxResults', 10)
+
+        Actor.log.info('search_terms=%s, max_results=%s', search_terms, max_results)
+
+
+if __name__ == '__main__':
+    asyncio.run(main())