feat(maru): add pool_id config for dax device selection #6

Draft
hyunyul-XCENA wants to merge 9 commits into xcena-dev:feat/maru-backend from hyunyul-XCENA:feat/maru-backend-pool-id

hyunyul-XCENA commented Mar 17, 2026

Summary

  • Read maru_pool_id from extra_config and pass it to MaruConfig.pool_id
  • Enables pool selection (dax device pinning) from maru-public in the new MaruBackend path
  • Supports int, list[int], and comma-separated string formats

Usage

extra_config:
  # Set maru_pool_id once, using one of the following forms:
  maru_pool_id: 0            # single pool
  # maru_pool_id: [0, 1]     # fallback chain (try pool 0 first, then pool 1)
  # maru_pool_id: "0,1,2"    # comma-separated string
  # omit the key → ANY_POOL_ID (maru_resourced decides)

Changes

  • maru_backend.py: _create_handler() reads extra_config.maru_pool_id and passes it to MaruConfig kwargs
  • No changes to config.py — follows existing extra_config pattern used by other maru options
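
The conditional-kwargs approach listed under Changes can be sketched roughly as below. This is an illustration under assumptions, not the actual maru_backend.py code: `MaruConfigStub` and `build_config` are hypothetical stand-ins, and `None` is used here to model the "no preference" default that the PR calls ANY_POOL_ID. Only the keys `maru_pool_id` and `maru_eager_map` and the field name `pool_id` come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class MaruConfigStub:
    """Hypothetical stand-in for MaruConfig; the real class may differ."""

    eager_map: bool = True
    # None models "let maru_resourced decide" in this sketch.
    pool_id: Optional[Union[int, list[int]]] = None


def build_config(extra: dict[str, Any]) -> MaruConfigStub:
    # Always-present options go straight into the kwargs dict.
    maru_kwargs: dict[str, Any] = {
        "eager_map": extra.get("maru_eager_map", True),
    }
    # pool_id is only passed when the user set it, so the config's own
    # default applies otherwise.
    pool_id = extra.get("maru_pool_id")
    if pool_id is not None:
        maru_kwargs["pool_id"] = pool_id
    return MaruConfigStub(**maru_kwargs)
```

Building the dict first keeps the optional argument out of the constructor call entirely when it is not configured, which avoids duplicating the default value in the backend.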

youngrok-XCENA and others added 8 commits March 16, 2026 06:30
- Add use_layerwise guard (NotImplementedError)
- Change zip strict=False to strict=True in batched_submit_put_task
- Add warning log when contains(pin=True) is called
- Add warning log for in-flight put_tasks on close()
…or MaruBackend

Enable MaruBackend to participate in StorageManager.async_lookup_and_prefetch()
by implementing the two required async lookup APIs. Both use asyncio.to_thread
to wrap sync RPC calls (handler.exists / handler.retrieve) without blocking the
event loop.
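
The `asyncio.to_thread` wrapping described above might look something like this. It is a sketch under assumptions: `FakeHandler`, `async_exists`, and `lookup_many` are hypothetical names, and the sleep only simulates a blocking RPC. The real handler and lookup APIs in MaruBackend may have different signatures.

```python
import asyncio
import time


class FakeHandler:
    """Hypothetical stand-in for the Maru handler with a blocking exists()."""

    def __init__(self, keys: set[str]) -> None:
        self._keys = keys

    def exists(self, key: str) -> bool:
        time.sleep(0.01)  # simulate a synchronous RPC round trip
        return key in self._keys


async def async_exists(handler: FakeHandler, key: str) -> bool:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(handler.exists, key)


async def lookup_many(handler: FakeHandler, keys: list[str]) -> list[bool]:
    # Issue all lookups concurrently; results keep the order of `keys`.
    return list(await asyncio.gather(*(async_exists(handler, k) for k in keys)))
```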
- Convert maru:// to tcp:// in _create_handler for ZMQ compatibility
- Call memory_obj.pin() in get_blocking to match cleanup unpin,
  fixing pin_count=-1 warning on retrieve path
Previously _async_store called on_complete_callback in the finally
block regardless of success/failure, which could signal false success
to callers and mask CXL page leaks on store failure.

Aligns with LocalCPUBackend and NixlDynamic which only call callback
on success.
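
The difference between signalling in `finally` and signalling only on success can be illustrated with a simplified, synchronous sketch. `store_with_callback` and `do_store` are hypothetical names; the real `_async_store` is asynchronous and is not reproduced here.

```python
from typing import Callable, Optional


def store_with_callback(
    do_store: Callable[[], None],
    on_complete_callback: Optional[Callable[[], None]] = None,
) -> bool:
    """Run do_store and fire the callback only if it did not raise."""
    try:
        do_store()
    except Exception:
        # Failure path: skip the callback so callers are never told that
        # a failed store succeeded.
        return False
    if on_complete_callback is not None:
        on_complete_callback()
    return True
```

Had the callback been placed in a `finally` block, it would run on both paths, which is the behaviour this commit removes.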
Read maru_pool_id from extra_config and pass to MaruConfig.
Supports int, list[int], and comma-separated string formats.

Usage in LMCache YAML:
  extra_config:
    # Set maru_pool_id once, using one of the following forms:
    maru_pool_id: 0            # single pool
    # maru_pool_id: [0, 1]     # fallback chain
    # maru_pool_id: "0,1,2"    # comma-separated

seohui-XCENA left a comment

Clean and minimal change. The dict + **kwargs pattern for conditionally passing pool_id is appropriate, and it follows the existing extra_config conventions.

if isinstance(pool_id, list):
    maru_kwargs["pool_id"] = [int(p) for p in pool_id]
elif isinstance(pool_id, str) and "," in pool_id:
    maru_kwargs["pool_id"] = [int(p.strip()) for p in pool_id.split(",")]

[Minor] No error handling on int() conversion.

Malformed input (e.g. "abc", [1, "x"]) will raise a bare ValueError and crash MaruBackend initialization. _parse_pool_size in the same file uses try/except ValueError with a fallback, so this is inconsistent.

That said, for pool_id a silent fallback could be dangerous (mapping to the wrong dax device), so fail-fast is arguably better — but with a clear error message:

try:
    maru_kwargs["pool_id"] = int(pool_id)
except (ValueError, TypeError) as e:
    raise ValueError(f"Invalid maru_pool_id={pool_id!r}: {e}") from e

hyunyul-XCENA (Author) replied:

Fixed in 9897fce. Wrapped all int() conversions in try/except (ValueError, TypeError) and re-raised with a clear message:

raise ValueError(f"Invalid maru_pool_id={pool_id!r}: {e}") from e

Agreed that fail-fast is better than silent fallback for pool_id — wrong dax device mapping would be worse than a crash.

    eager_map=extra.get("maru_eager_map", True),
)
pool_id = extra.get("maru_pool_id")
if pool_id is not None:

[Nit] Empty list / empty string edge cases.

  • maru_pool_id: [] → isinstance(pool_id, list) is true → maru_kwargs["pool_id"] = []
  • maru_pool_id: "" → falls to else: int("") → ValueError
  • maru_pool_id: "," → split(",") → ["", ""] → int("") → ValueError

Empty values could be treated the same as None (i.e. skip and let maru_resourced decide).

hyunyul-XCENA (Author) replied:

Fixed in 9897fce. Empty values are now treated as None (skip):

  • [] → if pool_id: guard, skips
  • "" → if stripped: guard, skips
  • "," → if p.strip() filter in list comprehension, produces empty → skips

All result in maru_resourced deciding the pool.
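
Putting both review fixes together, the normalization might look roughly like the sketch below. It is not the code from 9897fce: `normalize_pool_id` is a hypothetical helper written from the behaviour described in this thread (int, list, and comma-separated string inputs; empty values skipped; malformed values failing fast with a clear message).

```python
from typing import Any, Optional, Union


def normalize_pool_id(raw: Any) -> Optional[Union[int, list[int]]]:
    """Return an int, a list of ints, or None when no pool is requested."""
    if raw is None:
        return None
    try:
        if isinstance(raw, list):
            # An empty list is treated like an omitted key.
            return [int(p) for p in raw] or None
        if isinstance(raw, str):
            stripped = raw.strip()
            if not stripped:
                return None
            if "," in stripped:
                # Drop empty entries so "," or "0,,1" never reach int("").
                ids = [int(p.strip()) for p in stripped.split(",") if p.strip()]
                return ids or None
            return int(stripped)
        return int(raw)
    except (ValueError, TypeError) as e:
        # Fail fast: a silent fallback could pin the wrong dax device.
        raise ValueError(f"Invalid maru_pool_id={raw!r}: {e}") from e
```

Returning `None` for every empty form gives the caller a single "skip" signal, so the existing `if pool_id is not None` style of guard can decide whether to pass `pool_id` at all.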

- Wrap int() conversions in try/except with clear ValueError message
- Handle empty list, empty string, and comma-only string as None (skip)
- Filter out empty entries in comma-separated format

hyunyul-XCENA (Author) left a comment

Addressed both review comments in 9897fce:

[Minor] int() error handling:
Wrapped all int() conversions in try/except (ValueError, TypeError) and re-raised as ValueError with a clear message. Agreed that fail-fast is better than silent fallback for pool_id.

[Nit] Empty edge cases:
Empty values are now treated as None (skip):

  • [] → if pool_id: guard, skips
  • "" → if stripped: guard, skips
  • "," → if p.strip() filter in list comprehension, produces empty → skips

seohui-XCENA left a comment

LGTM 👍

hyunyul-XCENA marked this pull request as draft March 18, 2026 07:46
@youngrok-XCENA

We could probably support this later by letting users put something like a separate device path in extra config. Putting this on hold for now.

3 participants