- Autonomous Agent Node: Added the `OllamaAgent` node that can perform iterative reasoning (ReAct) and call tools until it finds the answer.
- Web Search Tool: New node that allows models to search the internet (DuckDuckGo or the Ollama API).
- Thinking Support: Added a 'Think' toggle for models like Qwen 3.5, DeepSeek-R1, and GPT-OSS.
- UI Enhancements: Added password-style masking for API keys and improved model selection lists.
This extension for ComfyUI enables the use of Ollama LLM models, such as Qwen 3.5, DeepSeek-R1, Llama 3.1/3.2, and Mistral.
- Autonomous Agent: An intelligent agent that can use tools, think, and search the web to answer complex queries.
- Web Search Tool: Connect your Agent to the internet via DuckDuckGo (free) or the Ollama Search API.
- Support for 'Thinking' Models: Full support for reasoning chains in models like Qwen 3.5 and DeepSeek.
- Ollama Image Describer: Generate structured descriptions of images.
- Ollama Text Describer: Extract meaningful insights from text.
- Ollama Image Captioner: Create automatic captions for images.
- Text Transformer: Prepend, append, or modify text dynamically.
- JSON Property Extractor: Extract specific values from structured outputs.
Follow the official Ollama installation guide.
The easiest way to install this extension is through ComfyUI Manager:
- Open ComfyUI Manager.
- Search for `ComfyUI-Ollama-Describer`.
- Click Install and restart ComfyUI.
```
git clone https://github.com/alisson-anjos/ComfyUI-Ollama-Describer.git
```

The path should be `custom_nodes\ComfyUI-Ollama-Describer`.

Run `install.bat`, or install the dependencies manually:

```
pip install -r requirements.txt
```

The Ollama Agent is an autonomous node that can use connected tools to answer questions. It doesn't just generate text; it enters a reasoning loop (ReAct) where it can call tools, analyze results, and "think" before giving a final answer.
- `model`: Select models optimized for tool calling (e.g., Llama 3.1, Qwen 3.5).
- `tools`: Connect `OLLAMA_TOOL` nodes (like Web Search).
- `think`: Enable reasoning chains for compatible models (Qwen 3.5, DeepSeek-R1).
- `system_context`: Default instructions that force the model to use tools for real-time data.
- `max_tokens`: Limit the response length (default 2048).
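The ReAct loop behind the Agent can be sketched in plain Python. This is a minimal illustration, not the node's actual implementation: `fake_model` and `web_search` are stubs standing in for the Ollama chat model and the connected tool node.

```python
# Sketch of a ReAct-style loop: the model either requests a tool call or
# gives a final answer; tool results are fed back until it answers.
def web_search(query):
    # Stand-in for the Web Search tool node.
    return f"results for: {query}"

TOOLS = {"web_search": web_search}

def fake_model(messages):
    # Pretend model: asks for a search once, then produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": "ollama"}}
    return {"answer": "Ollama is a local LLM runtime."}

def react_loop(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "answer" in reply:  # final answer: leave the loop
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})
    return "no answer"

print(react_loop("What is Ollama?"))  # -> Ollama is a local LLM runtime.
```

The `max_steps` cap mirrors why an agent loop needs a bound: a model that keeps requesting tools would otherwise never terminate.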
Similar to the Image Describer, but optimized for processing video frames or sequences. It allows for detailed temporal analysis using vision-enabled models.
- `num_ctx`: Context window size (default 4096; increase for longer descriptions).
- `max_tokens`: Maximum length of the video description.
- `keep_model_alive`: Manage VRAM by deciding how long to keep the model loaded.
- `tool_name`: Custom name for the tool (e.g., "google_search"). This is how the Agent will refer to it in its thinking process.
- DuckDuckGo (free): No setup needed; searches the web for free.
- Ollama API: Highly accurate search results; requires a free API key from ollama.com.
- Max Results: Control how many snippets are fed back to the Agent.
- Extracts structured descriptions from images using vision-enabled LLMs.
- Useful for analyzing images and generating detailed captions, including objects, actions, and surroundings.
- `model`: Select LLaVA models (7B, 13B, etc.).
- `custom_model`: Specify a custom model from Ollama's library.
- `api_host`: Define the API address (e.g., `http://localhost:11434`).
- `timeout`: Max response time before canceling the request.
- `temperature`: Controls randomness (0 = factual, 1 = creative).
- `top_k`, `top_p`, `repeat_penalty`: Fine-tune text generation.
- `max_tokens`: Maximum response length in tokens.
- `seed_number`: Set a seed for reproducibility (-1 for random).
- `keep_model_alive`: Defines how long the model stays loaded after execution.
- `prompt`: The main instruction for the model.
- `system_context`: Provide additional context for better responses.
- `structured_output_format`: Accepts either a Python dictionary or a valid JSON string to define the expected response structure.
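As a sketch of what `structured_output_format` accepts, the snippet below shows a schema in both forms the parameter takes; the field names (`objects`, `actions`, `setting`) are illustrative, not a schema the node requires.

```python
import json

# Hypothetical schema describing the structure we want the model to return.
schema = {
    "objects": ["list of objects visible in the image"],
    "actions": ["list of actions taking place"],
    "setting": "one-sentence description of the surroundings",
}

# The node also accepts the same schema as a JSON string.
schema_str = json.dumps(schema)

# Both forms describe the same structure.
assert json.loads(schema_str) == schema
print(schema_str)
```

Passing a dict or its `json.dumps` equivalent is interchangeable; the JSON-string form is handy when the schema is stored in a text widget.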
- Used to extract specific values from structured JSON outputs returned by Ollama Image Describer or Ollama Text Describer.
- Works by selecting a key (or path) inside a JSON structure and outputting only the requested data.
- Useful for filtering, extracting key insights, or formatting responses for further processing.
- Compatible with `structured_output_format`, which allows defining structured outputs via a Python dictionary or a valid JSON string.
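The idea of selecting a key or path inside a JSON structure can be sketched as a small dotted-path lookup; the node's actual path syntax may differ, and the sample payload below is hypothetical.

```python
import json

# Minimal dotted-path extraction over a parsed JSON structure.
def extract(data, path):
    for key in path.split("."):
        data = data[key]  # descend one level per path segment
    return data

# Example structured output, as a Describer node might return it.
output = json.loads('{"description": {"objects": ["cat", "sofa"], "mood": "calm"}}')
print(extract(output, "description.mood"))     # -> calm
print(extract(output, "description.objects"))  # -> ['cat', 'sofa']
```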
- Processes text inputs to generate structured descriptions or summaries.
- Ideal for refining text-based outputs and enhancing context understanding.
- Automatically generates concise and relevant captions for images.
- Processes images from a specified folder, iterates through each file, and generates `.txt` caption files saved in the output directory.
- Useful for bulk image captioning, dataset preparation, and AI-assisted annotation.
- Useful for image-to-text applications, content tagging, and accessibility.
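The one-`.txt`-per-image bulk loop can be sketched as follows; `caption()` is a stub for the vision-model call, and the file layout is an assumption rather than the node's exact behavior.

```python
import tempfile
from pathlib import Path

def caption(image_path):
    # Stub for the vision model; the real node queries Ollama here.
    return f"caption for {image_path.name}"

def caption_folder(input_dir, output_dir):
    output_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(input_dir.glob("*.png")):
        # One .txt file per image, same stem, written to the output directory.
        (output_dir / f"{image.stem}.txt").write_text(caption(image))

# Demo on a throwaway directory.
root = Path(tempfile.mkdtemp())
(root / "in").mkdir()
(root / "in" / "cat.png").write_bytes(b"")
caption_folder(root / "in", root / "out")
print((root / "out" / "cat.txt").read_text())  # -> caption for cat.png
```

Keeping the caption file's stem identical to the image's is the convention most dataset-preparation tools expect.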
- Works in conjunction with Ollama Image Captioner to provide additional customization for captions.
- Allows fine-tuning of captions by enabling or disabling specific details like lighting, camera angle, composition, and aesthetic quality.
- Useful for controlling caption verbosity, accuracy, and inclusion of metadata like camera settings or image quality.
- Helps tailor the output for different applications such as dataset labeling, content creation, and accessibility enhancements.
- Provides additional customization settings for generated captions.
- Helps refine style, verbosity, and accuracy based on user preferences.
- Allows users to modify, append, prepend, or replace text dynamically.
- Useful for formatting, restructuring, and enhancing text-based outputs.
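These operations amount to simple string manipulation; a hypothetical equivalent (the node's actual parameter names may differ) looks like this:

```python
# Illustrative version of the Text Transformer's operations:
# replace a substring, then wrap the text with prefix/suffix.
def transform(text, prepend="", append="", replace=None):
    if replace:  # replace is an (old, new) pair
        text = text.replace(*replace)
    return f"{prepend}{text}{append}"

caption = "a cat on a sofa"
print(transform(caption, prepend="Photo of ", replace=("sofa", "red sofa")))
# -> Photo of a cat on a red sofa
```

Chaining such nodes lets a caption be reshaped for a prompt template without re-running the model.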
| Suffix | Meaning |
|---|---|
| Q | Quantized model (smaller, faster) |
| 4, 8, etc. | Number of bits used (lower = smaller & faster) |
| K | K-means quantization (more efficient) |
| M | Medium quantization variant (balances size and quality) |
| F16 / F32 | Floating-point precision (higher = more accurate) |
More details on quantization: Medium Article.
- Measures how well a model predicts text.
- Lower perplexity = better predictions.
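Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to the true tokens. A small sketch of the arithmetic (the probabilities are made-up):

```python
import math

# Perplexity = exp of the mean negative log-probability of the true tokens.
def perplexity(token_probs):
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])  # model predicts well -> low perplexity
uncertain = perplexity([0.2, 0.1, 0.3])   # poor predictions -> high perplexity
print(confident < uncertain)  # -> True
```

A model that assigned probability 1.0 to every token would score a perplexity of exactly 1, the theoretical floor.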