feat(perception): AprilTag 3D detector with solvePnP TF transforms #2092

Open
Null-Phnix wants to merge 8 commits into dimensionalOS:main from Null-Phnix:feat/apriltag-3d-detector-2036

@Null-Phnix

Summary

Implements the AprilTag/ArUco fiducial marker 3D detector requested in #2036.
Consumes color_image + camera_info, detects markers via OpenCV ArUco, and
estimates 6-DOF pose with solvePnP using the full camera intrinsics and
distortion model. Publishes TF transforms so downstream modules (navigation,
manipulation, multi-robot identification) can use detected tags as landmarks.

What changed

dimos/perception/detection/detectors/apriltag.py

  • AprilTagDetector — reusable detector class.
    • Supports 20 predefined ArUco/AprilTag families (tag36h11, aruco_4x4_50, …).
    • Uses OpenCV cv2.aruco.ArucoDetector + solvePnP (SOLVEPNP_ITERATIVE).
    • Handles full CameraInfo → cameraMatrix + distCoeffs conversion.
    • Returns AprilTagDetection dataclass with pose, corners, and canonical
      marker/{family}_{tag_id} frame ID.
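
As a rough illustration of the inputs solvePnP needs for a square marker, the sketch below builds the four 3D corner points and the canonical frame ID. The helper names `marker_object_points` and `marker_frame_id` are hypothetical, and the corner layout is an assumption: it is the convention OpenCV documents for square planar targets (top-left, top-right, bottom-right, bottom-left, marker centred at the origin), not necessarily the exact array used in this PR.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def marker_object_points(tag_size_m: float) -> List[Point3]:
    """3D corners of a square marker lying in the z = 0 plane.

    Assumed order: top-left, top-right, bottom-right, bottom-left,
    which matches the order in which OpenCV's ArUco detector reports
    image corners.
    """
    if tag_size_m <= 0:
        raise ValueError("tag_size_m must be positive")
    h = tag_size_m / 2.0
    return [(-h, h, 0.0), (h, h, 0.0), (h, -h, 0.0), (-h, -h, 0.0)]


def marker_frame_id(family: str, tag_id: int) -> str:
    """Frame ID in the marker/{family}_{tag_id} form described above."""
    return f"marker/{family}_{tag_id}"


points = marker_object_points(0.165)
frame = marker_frame_id("tag36h11", 7)
```

These object points, paired with the four detected image corners, are what a PnP solver would consume to recover rvec and tvec.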

dimos/perception/detection/apriltag_detector.py

  • AprilTagDetectionModule — full DimOS Module with typed streams.
    • In[Image] / In[CameraInfo] inputs.
    • Publishes tf (Transform) for each detected tag.
    • Optional annotated_image (Out[Image]) with overlays.
    • Optional tags_json (Out[str]) with serialised pose + corners.
    • Uses existing sharpness_barrier + backpressure reactive pipeline.
    • Configurable: family, tag_size_m, max_freq.
    • Matches the deploy() pattern used by Detection2DModule.

dimos/perception/detection/detectors/test_apriltag.py

  • Pure synthetic tests — no external fixtures or hardware required.
  • Tests: detect known tag, empty image, multiple tags, wrong-family rejection.
  • Pose validation — depth accuracy within 10%, rotation near-identity for
    face-on tags via quaternion conversion.
  • Parametrised depth consistency (0.5–3.0 m).
  • Input validation tests (invalid family, negative tag size).

Architecture notes

  • Follows the existing detector pattern: detectors/base.py ABC, separate
    detector class from Module wrapper, In/Out typed streams via autoconnect.
  • TF publishing matches Detection2DModule.track() pattern: computes
    camera_optical → marker/... transforms directly from solvePnP tvec/R.
  • No additional runtime dependencies — dimos[apriltag] already pulls
    opencv-contrib-python.

How to test (replay / sim)

1. Synthetic unit tests

uv sync --extra apriltag
pytest dimos/perception/detection/detectors/test_apriltag.py -v

2. Deploy in a blueprint

from dimos.perception.detection.apriltag_detector import AprilTagDetectionModule, deploy

# In a blueprint — wire camera In/Out as usual
tag_mod = deploy(dimos, camera, family="tag36h11", tag_size_m=0.165)

3. Inspect TF output

dimos run unitree-go2-agentic  # with tag module added to blueprint
# TFs published on the TF tree as marker/tag36h11_{id}

Known limitations / future work

  • solvePnP pose is in camera_optical frame. Converting to world
    requires the caller to have the current base_link → camera_optical
    TF available (standard in all blueprints that have a camera).
  • Only supports single-family dictionaries per module instance. Multi-family
    is a natural next extension.
  • Marker corner ordering is the OpenCV convention; pose accuracy degrades at
    extreme angles (> 70°) or heavy motion blur.
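
To make the first limitation concrete, here is a minimal sketch of chaining homogeneous transforms to express a marker pose in a world frame. All numeric values are invented, `world_T_base` is an assumed extra transform (for example from odometry), and the rotation in `base_T_camera` is the usual optical-to-body axis swap (optical z forward becomes body x forward); none of this is taken from the PR's code.

```python
from typing import List

Matrix4 = List[List[float]]


def compose(a: Matrix4, b: Matrix4) -> Matrix4:
    """Return the matrix product a @ b for 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x: float, y: float, z: float) -> Matrix4:
    """Pure translation with identity rotation."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


# Hypothetical robot pose in the world.
world_T_base = translation(2.0, 1.0, 0.0)

# Hypothetical camera mount: optical axes (x right, y down, z forward)
# expressed in body axes (x forward, y left, z up), offset 0.3 m forward
# and 0.4 m up.
base_T_camera = [[0.0, 0.0, 1.0, 0.3],
                 [-1.0, 0.0, 0.0, 0.0],
                 [0.0, -1.0, 0.0, 0.4],
                 [0.0, 0.0, 0.0, 1.0]]

# Marker 1.5 m straight ahead of the camera, as solvePnP would report it.
camera_T_marker = translation(0.0, 0.0, 1.5)

world_T_marker = compose(compose(world_T_base, base_T_camera), camera_T_marker)
marker_position_world = [row[3] for row in world_T_marker[:3]]
```

In a running system the TF tree performs this chaining; the sketch only shows why the camera-frame pose alone is not enough.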

Verification

  • py_compile passes on all new files.
  • No new runtime dependencies beyond existing dimos[apriltag].
  • CLA signed per repository policy.

Closes #2036

@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

Adds AprilTagDetector and AprilTagDetectionModule for 6-DOF fiducial marker pose estimation via OpenCV ArUco + solvePnP, publishing TF transforms from the camera optical frame to each detected tag. Previously flagged issues (blocking get_next, shared DetectorParameters singleton, family_dict_cv NameError, discarded detector_params, cv2.polylines shape, and the depth-3.0 test failure) are all addressed in this revision.

  • detectors/apriltag.py: AprilTagDetector correctly creates per-instance DetectorParameters, resolves family lookup via _FAMILIES, and uses a standard Shepperd quaternion conversion; the solvePnP object-point convention is consistent with OpenCV ArUco corner ordering.
  • apriltag_detector.py: AprilTagDetectionModule stores the latest CameraInfo via a subscribe callback, avoiding the blocking get_next pattern, and self.tf auto-starts through PubSubTFConfig.autostart=True without requiring an explicit call.
  • all_blueprints.py: Registers april-tag-detection-module correctly, but also removes 8 blueprint entries and 11 module entries unrelated to AprilTag (source files still present), which appears to be an accidental carry-over from another branch.
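
For readers unfamiliar with the quaternion conversion mentioned in the first bullet, the following is a minimal pure-Python sketch of a Shepperd-style rotation-matrix-to-quaternion conversion. It is an illustration under assumptions: the function name and the (x, y, z, w) return order are hypothetical, and the PR's own implementation may differ in detail.

```python
import math
from typing import Sequence, Tuple

Quaternion = Tuple[float, float, float, float]


def rotation_matrix_to_quaternion(R: Sequence[Sequence[float]]) -> Quaternion:
    """Convert a proper 3x3 rotation matrix to a unit quaternion (x, y, z, w).

    Each branch divides by the largest available component, which keeps
    the computation numerically stable near 180 degree rotations.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(1.0 + trace)  # s = 4w
        w = 0.25 * s
        x = (R[2][1] - R[1][2]) / s
        y = (R[0][2] - R[2][0]) / s
        z = (R[1][0] - R[0][1]) / s
    elif R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])  # s = 4x
        w = (R[2][1] - R[1][2]) / s
        x = 0.25 * s
        y = (R[0][1] + R[1][0]) / s
        z = (R[0][2] + R[2][0]) / s
    elif R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])  # s = 4y
        w = (R[0][2] - R[2][0]) / s
        x = (R[0][1] + R[1][0]) / s
        y = 0.25 * s
        z = (R[1][2] + R[2][1]) / s
    else:
        s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])  # s = 4z
        w = (R[1][0] - R[0][1]) / s
        x = (R[0][2] + R[2][0]) / s
        y = (R[1][2] + R[2][1]) / s
        z = 0.25 * s
    return (x, y, z, w)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half_turn_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
q_identity = rotation_matrix_to_quaternion(identity)
q_half_turn = rotation_matrix_to_quaternion(half_turn_z)
```

A face-on tag should yield a quaternion close to `q_identity` up to the fixed axis convention between the marker and camera frames.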

Confidence Score: 4/5

The AprilTag detection code itself is ready to merge, but all_blueprints.py carries unrelated deletions of working module entries that should be separated out first.

The three new AprilTag files are clean and all previously flagged problems have been fixed. The only finding is in all_blueprints.py, where 19 entries for existing, still-present modules are deleted without explanation — any dimos run invocation or module-by-name lookup that used those keys (e.g. teleop-quest-go2, relocalization-module, keyboard-teleop-a750) will silently break.

dimos/robot/all_blueprints.py — the unrelated blueprint/module removals need to be verified as intentional or reverted before merging.

Important Files Changed

Filename | Overview
dimos/perception/detection/detectors/apriltag.py | Core detector: properly uses per-instance DetectorParameters, correct _FAMILIES lookup, valid solvePnP object-point convention, and a correct Shepperd quaternion conversion. Previously flagged issues (NameError on family_dict_cv, discarded detector_params, shared mutable singleton) are all resolved in this revision.
dimos/perception/detection/apriltag_detector.py | Module wrapper: correctly caches latest CameraInfo in an instance attribute (the subscribe/setattr pattern avoids the blocking get_next issue), reshape for cv2.polylines is present, and self.tf auto-starts via PubSubTFConfig.autostart=True. No new issues found.
dimos/perception/detection/detectors/test_apriltag.py | Test suite uses only depths [0.5, 1.0, 2.0] (3.0 m excluded per prior feedback), synthetic image generation looks correct, and the 15% tolerance is reasonable for ideal pinhole geometry. No new issues found.
dimos/robot/all_blueprints.py | Adds april-tag-detection-module to all_modules correctly, but also silently removes 8 blueprint entries and 11 module entries unrelated to AprilTag while their source files still exist; likely an accidental carry-over from another branch.

Sequence Diagram

sequenceDiagram
    participant Camera
    participant AprilTagDetectionModule
    participant AprilTagDetector
    participant LCMTF

    Camera->>AprilTagDetectionModule: color_image (Image)
    Camera->>AprilTagDetectionModule: camera_info (CameraInfo)

    Note over AprilTagDetectionModule: camera_info cached via subscribe callback

    AprilTagDetectionModule->>AprilTagDetectionModule: sharpness_barrier + backpressure
    AprilTagDetectionModule->>AprilTagDetector: detect(image, camera_info)

    AprilTagDetector->>AprilTagDetector: ArucoDetector.detectMarkers(grey)
    AprilTagDetector->>AprilTagDetector: solvePnP → rvec, tvec
    AprilTagDetector->>AprilTagDetector: Rodrigues → R_mat → Quaternion
    AprilTagDetector-->>AprilTagDetectionModule: list[AprilTagDetection]

    AprilTagDetectionModule->>LCMTF: "tf.publish(Transform: camera_optical → marker/{family}_{id})"
    AprilTagDetectionModule-->>AprilTagDetectionModule: annotated_image.publish(Image)
    AprilTagDetectionModule-->>AprilTagDetectionModule: tags_json.publish(str)

Reviews (9): Last reviewed commit: "chore: remove dead code — unused module ..."

Comment on lines +109 to +112
camera_info = self.camera_info.get_next()
if camera_info is None:
    logger.debug("AprilTag module: no camera_info available yet")
    return

P1 get_next() blocks and never returns None

get_next() either returns a value or raises an Exception (after a 10-second timeout) — it never returns None. The if camera_info is None guard on line 110 is therefore dead code. In practice, every frame processed before the first camera_info arrives will block the reactive worker thread for 10 seconds, then the raised exception will propagate through the subscribe callback and silently kill the subscription. The correct pattern used elsewhere (e.g. ObjectTracking.start()) is to subscribe to camera_info as a separate stream, store the latest value in an instance attribute with a lock, and check if self._latest_camera_info is None instead.
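
The pattern described here (store the latest value under a lock, then check for None at use time) can be sketched without any DimOS APIs. `LatestValue` and `process_frame` are hypothetical names used only for illustration; the real module would wire the setter to the camera_info stream's subscribe callback.

```python
import threading
from typing import Any, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class LatestValue(Generic[T]):
    """Thread-safe holder for the most recent message on a stream."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def set(self, value: T) -> None:
        # Intended to be called from the stream's subscribe callback.
        with self._lock:
            self._value = value

    def get(self) -> Optional[T]:
        # Never blocks waiting for data; returns None until the first set().
        with self._lock:
            return self._value


def process_frame(frame: Any, camera_info_cache: LatestValue) -> Optional[Tuple[Any, Any]]:
    """Skip the frame when no camera_info has arrived, instead of blocking."""
    info = camera_info_cache.get()
    if info is None:
        return None
    return (frame, info)


cache: LatestValue = LatestValue()
skipped = process_frame("frame-0", cache)    # no camera_info yet
cache.set({"fx": 500.0})                     # stand-in for a CameraInfo message
handled = process_frame("frame-1", cache)
```

With this shape the None check is live code, and early frames are dropped cheaply rather than stalling the worker thread.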

Comment on lines +44 to +45
from dimos.msgs.geometry_msgs.Pose import Pose
from dimos.msgs.geometry_msgs.PoseStamped import PoseStamped

P2 Unused import Pose

Pose is imported but never referenced in this file. Only PoseStamped is used.

Suggested change
-from dimos.msgs.geometry_msgs.Pose import Pose
 from dimos.msgs.geometry_msgs.PoseStamped import PoseStamped


dict_id = _FAMILIES[family]
self._dictionary = cv2.aruco.getPredefinedDictionary(dict_id)
self._params = detector_params or _DEFAULT_DETECTOR_PARAMS

P2 Shared mutable DetectorParameters singleton

_DEFAULT_DETECTOR_PARAMS is instantiated once at module load time and shared across every AprilTagDetector that does not supply a custom detector_params. cv2.aruco.DetectorParameters is a mutable object; if any caller mutates the shared instance, all detectors sharing it will be affected. Creating a fresh instance per detector avoids this class of subtle bug.

Suggested change
-self._params = detector_params or _DEFAULT_DETECTOR_PARAMS
+self._params = detector_params if detector_params is not None else cv2.aruco.DetectorParameters()
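
The hazard is easy to reproduce without OpenCV. In the sketch below, `Params` is a stand-in for a mutable settings object such as cv2.aruco.DetectorParameters, and its `threshold` attribute and both detector classes are hypothetical.

```python
from typing import Optional


class Params:
    """Stand-in for a mutable settings object (hypothetical attribute)."""

    def __init__(self) -> None:
        self.threshold = 7.0


_SHARED_DEFAULT = Params()  # created once at import time


class SharedDefaultDetector:
    def __init__(self, params: Optional[Params] = None) -> None:
        # Every detector without custom params aliases the same object.
        self.params = params or _SHARED_DEFAULT


class FreshDefaultDetector:
    def __init__(self, params: Optional[Params] = None) -> None:
        # Each detector gets its own default instance.
        self.params = params if params is not None else Params()


a, b = SharedDefaultDetector(), SharedDefaultDetector()
a.params.threshold = 99.0          # also changes b.params.threshold

c, d = FreshDefaultDetector(), FreshDefaultDetector()
c.params.threshold = 99.0          # d keeps its own default
```

The explicit `is not None` test also avoids surprises if a parameters object ever evaluates as falsy.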

Comment on lines +162 to +163
corners = det.corners_image.astype(np.int32)
cv2.polylines(cv_image, [corners], True, (0, 255, 0), 2)

P2 cv2.polylines corner array shape

cv2.polylines expects each contour to have shape (N, 1, 2). Passing a (4, 2) array works in some OpenCV versions but raises an error in others. Reshaping is the safe, portable approach.

Suggested change
-corners = det.corners_image.astype(np.int32)
+corners = det.corners_image.astype(np.int32).reshape(-1, 1, 2)
 cv2.polylines(cv_image, [corners], True, (0, 255, 0), 2)

Comment on lines +173 to +174
dict_id = _FAMILIES[family]
self._dictionary = cv2.aruco.getPredefinedDictionary(family_dict_cv[family])

P0 family_dict_cv is never defined anywhere in this file — the correct mapping is _FAMILIES, which is declared at module scope. This NameError will crash every call to AprilTagDetector(...) before the object is constructed. Additionally, dict_id computed on the previous line is immediately orphaned.

Suggested change
-dict_id = _FAMILIES[family]
-self._dictionary = cv2.aruco.getPredefinedDictionary(family_dict_cv[family])
+self._dictionary = cv2.aruco.getPredefinedDictionary(_FAMILIES[family])

Comment on lines +175 to +176
self._params: cv2.aruco.DetectorParameters = cv2.aruco.DetectorParameters()
self._refinement: cv2.aruco.RefineParameters = cv2.aruco.RefineParameters()

P1 The detector_params argument is accepted in the public __init__ signature but is always discarded — a fresh cv2.aruco.DetectorParameters() is created unconditionally. Any caller who passes a custom DetectorParameters (e.g. to tune corner-refinement or thresholding) will silently get the default parameters instead.

Suggested change
-self._params: cv2.aruco.DetectorParameters = cv2.aruco.DetectorParameters()
+self._params: cv2.aruco.DetectorParameters = (
+    detector_params if detector_params is not None else cv2.aruco.DetectorParameters()
+)
 self._refinement: cv2.aruco.RefineParameters = cv2.aruco.RefineParameters()

Comment on lines +201 to +214
@pytest.mark.parametrize("depth", [0.5, 1.0, 2.0, 3.0])
def test_depth_scale(self, detector_small: AprilTagDetector, camera_info: CameraInfo, depth: float) -> None:
    """Estimated Z should scale approximately linearly with true depth."""
    img = _synthetic_tag_image(
        tag_id=99,
        tag_size_m=0.08,
        depth_m=depth,
        tag_shift_px=(0, 0),
    )
    results = detector_small.detect(img, camera_info)
    assert len(results) == 1
    z = results[0].pose.position.z
    # Tolerate 15 % error due to synthetic / ideal nature.
    assert abs(z - depth) < 0.15 * depth, f"depth {depth} got Z={z}"

P1 test_depth_scale will fail at depth=3.0 m due to pixel-size clamping

In _synthetic_tag_image, apparent_size_px = int(500 * 0.08 / 3.0) = 13, which is less than 20 and gets clamped to 20. A 20 px marker with tag_size_m=0.08 causes solvePnP to recover a depth of fx * tag_size_m / pixel_width ≈ 500 * 0.08 / 20 = 2.0 m, but the assertion demands |z - 3.0| < 0.45 m. The 1.0 m error guarantees a failure at this parametrize value. The fix is to skip the depth-accuracy assertion (or the whole iteration) whenever clamping has occurred — the clamping only ensures the marker is detectable; it intentionally breaks the geometric depth correspondence.
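
The arithmetic above can be checked with a few lines of Python. This is a sketch of the idealised pinhole relation only: the function names are hypothetical, and the focal length (500 px), minimum size (20 px), and tolerance (15%) are taken from the comment and the test, not from the helper's actual source.

```python
FX_PX = 500.0        # focal length in pixels, as used in the comment above
MIN_SIZE_PX = 20     # clamp that keeps the rendered marker detectable
TOLERANCE = 0.15     # relative depth tolerance used by test_depth_scale


def rendered_size_px(tag_size_m: float, depth_m: float) -> int:
    """Marker side length in pixels after the minimum-size clamp."""
    apparent = int(FX_PX * tag_size_m / depth_m)
    return max(apparent, MIN_SIZE_PX)


def recovered_depth_m(tag_size_m: float, size_px: float) -> float:
    """Depth implied by a face-on marker of the given pixel width."""
    return FX_PX * tag_size_m / size_px


def within_tolerance(true_depth_m: float, estimated_m: float) -> bool:
    return abs(estimated_m - true_depth_m) < TOLERANCE * true_depth_m


size_at_3m = rendered_size_px(0.08, 3.0)          # clamped from 13 px up to 20 px
depth_at_3m = recovered_depth_m(0.08, size_at_3m)
passes_at_3m = within_tolerance(3.0, depth_at_3m)
```

Once the clamp engages, the rendered marker no longer corresponds to the requested depth, so any depth assertion for that case checks the clamp rather than the detector.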

@codecov

codecov Bot commented May 15, 2026

❌ 2 Tests Failed:

Tests completed | Failed | Passed | Skipped
1750 | 2 | 1748 | 27
View the top 2 failed test(s) by shortest run time
dimos.perception.detection.detectors.test_apriltag.TestAprilTagDetectorPoseConsistency::test_depth_scale[3.0]
Stack Traces | 0.005s run time
self = <dimos.perception.detection.detectors.test_apriltag.TestAprilTagDetectorPoseConsistency object at 0x7f0ba994d910>
detector_small = <dimos.perception.detection.detectors.apriltag.AprilTagDetector object at 0x7f0ba4ff7080>
camera_info = CameraInfo(height=480, width=640, distortion_model='', frame_id='camera_optical', ts=1778854995.5262084)
depth = 3.0

    @pytest.mark.parametrize("depth", [0.5, 1.0, 2.0, 3.0])
    def test_depth_scale(
        self, detector_small: AprilTagDetector, camera_info: CameraInfo, depth: float
    ) -> None:
        """Estimated Z should scale approximately linearly with true depth."""
        img = _synthetic_tag_image(
            tag_id=99,
            tag_size_m=0.08,
            depth_m=depth,
            tag_shift_px=(0, 0),
        )
        results = detector_small.detect(img, camera_info)
        assert len(results) == 1
        z = results[0].pose.position.z
        # Tolerate 15 % error due to synthetic / ideal nature.
>       assert abs(z - depth) < 0.15 * depth, f"depth {depth} got Z={z}"
E       AssertionError: depth 3.0 got Z=2.1052631579154757
E       assert 0.8947368420845243 < (0.15 * 3.0)
E        +  where 0.8947368420845243 = abs((2.1052631579154757 - 3.0))

camera_info = CameraInfo(height=480, width=640, distortion_model='', frame_id='camera_optical', ts=1778854995.5262084)
depth      = 3.0
detector_small = <dimos.perception.detection.detectors.apriltag.AprilTagDetector object at 0x7f0ba4ff7080>
img        = Image(shape=(480, 640, 3), format=BGR, dtype=uint8, frame_id='camera_optical', ts=1.0)
results    = [AprilTagDetection(tag_id=99, family='tag36h11', pose=Pose(position=Vector([ -0.0021053  -0.0021053      2.1053]), ori...   [        329,         230],
       [        329,         249],
       [        310,         249]]), corner_count=4)]
self       = <dimos.perception.detection.detectors.test_apriltag.TestAprilTagDetectorPoseConsistency object at 0x7f0ba994d910>
z          = 2.1052631579154757

.../detection/detectors/test_apriltag.py:219: AssertionError
dimos.robot.test_all_blueprints_generation::test_all_blueprints_is_current
Stack Traces | 2.95s run time
def test_all_blueprints_is_current() -> None:
        root = DIMOS_PROJECT_ROOT / "dimos"
        all_blueprints, all_modules = _scan_for_blueprints(root)
    
        common = set(all_blueprints.keys()) & set(all_modules.keys())
        assert not common, (
            f"Names must be unique across blueprints and modules, "
            f"but these appear in both: {sorted(common)}"
        )
    
        generated_content = _generate_all_blueprints_content(all_blueprints, all_modules)
    
        file_path = root / "robot" / "all_blueprints.py"
    
        if "CI" in os.environ:
            if not file_path.exists():
                pytest.fail(f"all_blueprints.py does not exist at {file_path}")
    
            current_content = file_path.read_text()
            if current_content != generated_content:
                diff = difflib.unified_diff(
                    current_content.splitlines(keepends=True),
                    generated_content.splitlines(keepends=True),
                    fromfile="all_blueprints.py (current)",
                    tofile="all_blueprints.py (generated)",
                )
                diff_str = "".join(diff)
>               pytest.fail(
                    f"all_blueprints.py is out of date. Run "
                    f"`pytest dimos/robot/test_all_blueprints_generation.py` locally to update.\n\n"
                    f"Diff:\n{diff_str}"
                )
E               Failed: all_blueprints.py is out of date. Run `pytest dimos/robot/test_all_blueprints_generation.py` locally to update.
E               
E               Diff:
E               --- all_blueprints.py (current)
E               +++ all_blueprints.py (generated)
E               @@ -118,6 +118,7 @@
E                
E                
E                all_modules = {
E               +    "april-tag-detection-module": "dimos.perception.detection.apriltag_detector.AprilTagDetectionModule",
E                    "arm-teleop-module": "dimos.teleop.quest.quest_extensions.ArmTeleopModule",
E                    "b-box-navigation-module": "dimos.navigation.bbox_navigation.BBoxNavigationModule",
E                    "b1-connection-module": "dimos.robot.unitree.b1.connection.B1ConnectionModule",

all_blueprints = {'coordinator-basic': 'dimos.control.blueprints.basic:coordinator_basic', 'coordinator-cartesian-ik-mock': 'dimos.cont...r_cartesian_ik_piper', 'coordinator-combined-xarm6': 'dimos.control.blueprints.teleop:coordinator_combined_xarm6', ...}
all_modules = {'april-tag-detection-module': 'dimos.perception.detection.apriltag_detector.AprilTagDetectionModule', 'arm-teleop-mod..._navigation.BBoxNavigationModule', 'b1-connection-module': 'dimos.robot.unitree.b1.connection.B1ConnectionModule', ...}
common     = set()
current_content = '# Copyright 2025-2026 Dimensional Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the "License");\n# you m...ebsocket_vis_module.WebsocketVisModule",\n    "zed-camera": "dimos.hardware.sensors.camera.zed.camera.ZEDCamera",\n}\n'
diff       = <generator object unified_diff at 0x7f8273aa6dc0>
diff_str   = '--- all_blueprints.py (current)\n+++ all_blueprints.py (generated)\n@@ -118,6 +118,7 @@\n \n \n all_modules = {\n+   ...igation.BBoxNavigationModule",\n     "b1-connection-module": "dimos.robot.unitree.b1.connection.B1ConnectionModule",\n'
file_path  = PosixPath('.../dimos/robot/all_blueprints.py')
generated_content = '# Copyright 2025-2026 Dimensional Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the "License");\n# you m...ebsocket_vis_module.WebsocketVisModule",\n    "zed-camera": "dimos.hardware.sensors.camera.zed.camera.ZEDCamera",\n}\n'
root       = PosixPath('.../dimos/dimos/dimos')

dimos/robot/test_all_blueprints_generation.py:66: Failed
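
The failing check compares the committed registry file against freshly generated content and reports a unified diff when they differ. A minimal standalone sketch of that idea, using only the standard library, is shown below; `staleness_diff` and the sample file contents are hypothetical and much shorter than the real all_blueprints.py.

```python
import difflib


def staleness_diff(current: str, generated: str, name: str = "all_blueprints.py") -> str:
    """Return a unified diff if the committed file is out of date, else ''."""
    if current == generated:
        return ""
    return "".join(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            generated.splitlines(keepends=True),
            fromfile=f"{name} (current)",
            tofile=f"{name} (generated)",
        )
    )


# Hypothetical, shortened file contents.
current_content = (
    "all_modules = {\n"
    '    "arm-teleop-module": "dimos.teleop.quest.quest_extensions.ArmTeleopModule",\n'
    "}\n"
)
generated_content = (
    "all_modules = {\n"
    '    "april-tag-detection-module": '
    '"dimos.perception.detection.apriltag_detector.AprilTagDetectionModule",\n'
    '    "arm-teleop-module": "dimos.teleop.quest.quest_extensions.ArmTeleopModule",\n'
    "}\n"
)

diff_text = staleness_diff(current_content, generated_content)
```

A non-empty diff means the registry should be regenerated and committed, which in this PR means adding the april-tag-detection-module entry.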


@Null-Phnix Null-Phnix force-pushed the feat/apriltag-3d-detector-2036 branch from 97cc5fc to f9a2ebc Compare May 18, 2026 03:57
Null-Phnix and others added 8 commits June 7, 2026 15:37
- Add AprilTagDetector class using OpenCV ArUco + solvePnP for 6-DOF marker pose
- Add AprilTagDetectionModule dimos Module with In[Image]/In[CameraInfo] -> TF
- Publishes camera_optical -> marker/{family}_{tag_id} transforms
- Emits annotated images and JSON-serialised detections
- Full synthetic test suite with depth and rotation validation
- Configurable family, tag_size_m, max_freq

Closes dimensionalOS#2036
@Null-Phnix Null-Phnix force-pushed the feat/apriltag-3d-detector-2036 branch from 2f36b10 to 1d83a8d Compare June 7, 2026 20:22

Development

Successfully merging this pull request may close these issues.

AprilTag marker 3D detector

1 participant