Skip to content

perceptron-ai-inc/fiftyone-isaac-0_2

Repository files navigation

Isaac-0.2 FiftyOne Model Zoo Integration

A FiftyOne Model Zoo integration for Isaac-0.2 by Perceptron AI - hybrid-reasoning vision-language models designed for real-world visual understanding tasks.

image

Overview

Isaac-0.2 extends the efficient frontier of perception — small models that outperform systems 10× larger on visual reasoning and perception tasks, all running on commodity GPUs or edge devices. From robotics to media search to industrial inspection, Isaac 0.2 delivers high-accuracy perception without the heavy compute footprint.

Read the full announcement.

Available Models:

  • Isaac-0.2-2B-Preview - 2B parameter hybrid-reasoning model for maximum accuracy
  • Isaac-0.2-1B - 1B parameter model for faster inference and edge deployment

What's New in Isaac 0.2

  • Reasoning via Thinking Traces: Short, structured reasoning traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks
  • Tool Calling + Focus (Zoom & Crop): Isaac 0.2 can trigger tool calls to focus (zoom + crop) and re-query on smaller regions — improving fine-grained perception
  • Structured Outputs: More reliable structured output generation for consistent JSON and predictable downstream integration
  • Complex OCR: Improved text recognition across cluttered, low-resolution, or distorted regions — enabling accurate extraction from documents, diagrams, labels, screens, and dense real-world scenes
  • Desktop Use: Better performance on everyday desktop and mobile workflows such as UI understanding and navigation, making Isaac faster and more capable for agentic use cases

Features

  • Object Detection: Detect and localize objects with bounding boxes
  • Keypoint Detection: Identify key points in images with spatial awareness
  • Complex OCR: Extract and detect text from documents, diagrams, labels, screens, and cluttered scenes
  • Classification: Classify images into categories with reliable JSON output
  • Visual Question Answering: Answer questions about image content
  • Segmentation: Generate polygon masks for instance segmentation
  • Desktop/UI Understanding: Navigate and understand desktop and mobile interfaces

Installation

Prerequisites

pip install fiftyone
pip install perceptron
pip install transformers
pip install accelerate
pip install torch torchvision
pip install huggingface-hub

Register the Model Zoo

import fiftyone.zoo as foz

# Register this model zoo source
foz.register_zoo_model_source(
    "https://github.com/perceptron-ai-inc/fiftyone-isaac-0_2",
    overwrite=True
)

Usage

Loading the Model

import fiftyone.zoo as foz

# Load the Isaac-0.2 2B model
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-2B-Preview")

# Or load the 1B model for faster inference
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-1B")

Object Detection

import fiftyone as fo
import fiftyone.zoo as foz

# Load a dataset
dataset = foz.load_zoo_dataset("quickstart", max_samples=10)

# Load model and set operation
model = foz.load_zoo_model("PerceptronAI/Isaac-0.2-2B-Preview")
model.operation = "detect"

# Set detection prompt
model.prompt = "Animals, Humans, Vehicles, Objects"

# Apply model to dataset
dataset.apply_model(model, label_field="detections")

Visual Question Answering (VQA)

model.operation = "vqa"
model.prompt = "Describe the spatial relationships between objects in this scene"

dataset.apply_model(model, label_field="vqa_response")

OCR Text Detection

model.operation = "ocr_detection"
model.prompt = "Detect all text in this image"

dataset.apply_model(model, label_field="text_detections")

OCR Text Extraction

model.operation = "ocr"
model.prompt = "Extract all text from this image"

dataset.apply_model(model, label_field="extracted_text")

Keypoint Detection

model.operation = "point"
model.prompt = "Identify key features: eyes, nose, corners"

dataset.apply_model(model, label_field="keypoints")

Classification

model.operation = "classify"
model.prompt = "Classify the weather: sunny, rainy, snowy, cloudy, indoor"

dataset.apply_model(model, label_field="weather")

Segmentation (Polygons)

model.operation = "segment"
model.prompt = "Draw polygons around the following objects: person, car, animal"

dataset.apply_model(model, label_field="polygons")

Advanced Usage

Thinking Mode

Enable structured reasoning traces for improved accuracy on complex scenes. Thinking traces improve multi-step decisions, small-object understanding, and ambiguous spatial tasks:

model.operation = "detect"
model.enable_thinking = True

dataset.apply_model(model, label_field="detections_with_reasoning")

# Disable when done
model.enable_thinking = False

Focus Tool Call

Enable the Focus system (zoom + crop) for fine-grained perception. Isaac 0.2 can trigger tool calls to focus on smaller regions and re-query — improving detection of small objects and dense scenes. Only works with BOX operations (detect, ocr_detection):

model.operation = "detect"
model.enable_focus_tool_call = True

dataset.apply_model(model, label_field="focused_detections")

# Disable when done
model.enable_focus_tool_call = False

Combining Advanced Options

You can combine both options for maximum precision:

model.operation = "detect"
model.enable_thinking = True
model.enable_focus_tool_call = True

dataset.apply_model(model, label_field="enhanced_detections")

Using Sample-Level Prompts

You can use different prompts for each sample in your dataset:

# Apply model using the prompt field
model.operation = "detect"
dataset.apply_model(
    model,
    label_field="custom_detections",
    prompt_field="sample_prompt"
)

Custom System Prompts

You can customize the system prompt for specific use cases:

model.system_prompt = """
You are a specialized assistant for medical image analysis.
Focus on identifying anatomical structures and abnormalities.
"""

Model Operations

Operation Description Output Type Example Prompt
detect Object detection with bounding boxes fo.Detections "Cars, pedestrians, traffic signs"
point Keypoint detection fo.Keypoints "Eyes, nose, mouth corners"
classify Image classification fo.Classifications "Indoor or outdoor scene"
ocr Text extraction String "Extract all text from the image"
ocr_detection Text detection with boxes fo.Detections "Detect text regions"
ocr_polygon Text detection with polygons fo.Polylines "Detect text regions"
segment Instance segmentation fo.Polylines "Segment all objects"
vqa Visual question answering String "What is the main subject?"

Example Notebook

Open In Colab

See isaac_0_2_demo.ipynb for a complete interactive notebook demonstrating all operations. You can run it directly in Google Colab or download it for local execution.

Model Details

  • Parameters: 2B (Preview) / 1B
  • Architecture: Based on Qwen with custom vision encoder
  • Vision Resolution: Dynamic, up to 60 megapixels
  • Context Length: 16,384 tokens
  • Training: Perceptive-language pretraining on multimodal data

License

  • Code: Apache 2.0 License
  • Model Weights: Creative Commons Attribution-NonCommercial 4.0 International License

Resources

Citation

If you use Isaac-0.2 in your research or applications, please cite:

@software{isaac2025fiftyone,
  title = {Isaac-0.2 FiftyOne Model Zoo Integration},
  author = {Perceptron AI},
  year = {2025},
  url = {https://github.com/perceptron-ai-inc/fiftyone-isaac-0_2},
  note = {FiftyOne integration for Isaac-0.2 perceptive-language model}
}

@misc{perceptronai2025isaac,
  title = {Isaac-0.2: A Perceptive-Language Model},
  author = {{Perceptron AI}},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PerceptronAI/Isaac-0.2-2B-Preview},
  note = {Open-source multimodal model for real-world visual understanding}
}

Contact

Acknowledgments

About

Integrating Isaac-0.2 model series by Perceptron as Remote Source Zoo Models in FiftyOne

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors