Python: fix(mem0): isolate entity retrieval and correct app_id payload #6242

Open
VedantSonani wants to merge 1 commit into microsoft:main from VedantSonani:fix-mem0-retrieval

Conversation

@VedantSonani

Motivation and Context

This change is required because the current Mem0ContextProvider fails to retrieve any stored memories during the before_run phase. It solves two critical bugs in how the provider interacts with the Mem0 API:

  1. API Parameter Mismatch: The provider saved the application ID inside a custom metadata dictionary but searched for it using Mem0's native top-level app_id parameter, so the search filter never matched the stored records.
  2. The Entity Isolation "AND" Trap: Mem0 stores extracted facts in isolated entity partitions (e.g., a memory is assigned strictly to user_1 or to agent_1). By passing both user_id and agent_id in a single bundled filters dictionary, the provider forced a logical AND (user == X AND agent == Y). Since no single memory row carries both tags, the search always returned zero results.
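
The AND trap described in item 2 can be reproduced without Mem0 at all. The sketch below uses a toy in-memory list as a stand-in for the memory store; `ROWS` and `search_and` are hypothetical names for illustration and are not part of the Mem0 API.

```python
from typing import Any

# Toy stand-in for a memory store (NOT Mem0): each row is tagged with
# exactly one entity key, mirroring the isolated partitions described above.
ROWS: list[dict[str, Any]] = [
    {"id": "m1", "user_id": "user_1", "memory": "Prefers metric units"},
    {"id": "m2", "agent_id": "agent_1", "memory": "Summaries should be short"},
]


def search_and(filters: dict[str, str]) -> list[dict[str, Any]]:
    """Return rows that match ALL filter keys (strict AND semantics)."""
    return [row for row in ROWS if all(row.get(k) == v for k, v in filters.items())]


# Bundled filter: no single row carries both tags, so nothing matches.
bundled = search_and({"user_id": "user_1", "agent_id": "agent_1"})

# Per-entity queries: each partition is searched on its own, then combined.
per_entity = search_and({"user_id": "user_1"}) + search_and({"agent_id": "agent_1"})

print(len(bundled))                    # 0
print([row["id"] for row in per_entity])  # ['m1', 'm2']
```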

Fixes #6237

Description

Changes Implemented:

  • after_run (Ingestion Fix): Modified the mem0_client.add payload to pass self.application_id to the native app_id parameter instead of trapping it inside the metadata dictionary. This aligns the insertion schema with the retrieval schema.
  • before_run (Retrieval Fix): Removed the bundled _build_filters logic and replaced it with concurrent queries via asyncio.gather, so the User partition and the Agent partition are searched independently.
  • Result Merging & Deduplication: Added logic to merge the results from the parallel queries and deduplicate them by memory id.
  • Client Compatibility: Introduced a build_search_kwargs helper function inside before_run. It builds a fresh query dictionary per search, which avoids shallow-copy side effects, and handles the differing payload requirements of AsyncMemory (OSS) and AsyncMemoryClient (Platform).
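
As a rough illustration of the retrieval changes listed above, the sketch below builds one kwargs dictionary per entity and runs the searches concurrently. It is not the code from this PR: `FakeClient`, `retrieve`, and the `flat_kwargs` flag are hypothetical stand-ins (the PR itself distinguishes the two clients with an isinstance check), and the two payload shapes are taken from the diff excerpts in this conversation.

```python
import asyncio
from typing import Any, Optional


def build_search_kwargs(query: str, entity_key: str, entity_value: str,
                        app_id: Optional[str], flat_kwargs: bool) -> dict[str, Any]:
    """Build a fresh kwargs dict per entity so the queries share no state."""
    kwargs: dict[str, Any] = {"query": query}
    scope: dict[str, Any] = {entity_key: entity_value}
    if app_id:
        scope["app_id"] = app_id
    if flat_kwargs:
        kwargs.update(scope)        # OSS-style: direct keyword arguments
    else:
        kwargs["filters"] = scope   # Platform-style: nested filters dict
    return kwargs


class FakeClient:
    """Hypothetical stand-in for a Mem0 client; records each search call."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    async def search(self, **kwargs: Any) -> list[dict[str, Any]]:
        self.calls.append(kwargs)
        await asyncio.sleep(0)  # yield control, as a real network call would
        scope = kwargs.get("filters", kwargs)
        owner = scope.get("user_id") or scope.get("agent_id")
        return [{"id": f"mem-{owner}", "memory": f"fact about {owner}"}]


async def retrieve(client: FakeClient, query: str, user_id: Optional[str],
                   agent_id: Optional[str], app_id: Optional[str],
                   flat_kwargs: bool) -> list[dict[str, Any]]:
    """Query each configured entity partition on its own, then flatten."""
    tasks = []
    if user_id:
        tasks.append(client.search(
            **build_search_kwargs(query, "user_id", user_id, app_id, flat_kwargs)))
    if agent_id:
        tasks.append(client.search(
            **build_search_kwargs(query, "agent_id", agent_id, app_id, flat_kwargs)))
    responses = await asyncio.gather(*tasks)
    return [mem for response in responses for mem in response]


client = FakeClient()
memories = asyncio.run(
    retrieve(client, "units?", "user_1", "agent_1", "app_1", flat_kwargs=False))
print([mem["id"] for mem in memories])
```

Building a new dictionary inside the helper on every call is what keeps the user query and the agent query from mutating a shared filters object.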

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR. (No, non-breaking bug fix).

Copilot AI review requested due to automatic review settings June 1, 2026 15:55
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates the Mem0 context provider to retrieve memories by querying entity “partitions” (user/agent) independently and merging results, avoiding strict AND-filter limitations, and aligns application ID usage with an app_id parameter.

Changes:

  • Run parallel Mem0 searches for user_id and agent_id, then merge/deduplicate results before injecting into the session context.
  • Refactor filter construction into a per-entity search-kwargs builder that supports OSS vs Platform client differences.
  • Update memory creation call to pass app_id instead of metadata.application_id.

Comment on lines +113 to +124
# 1. Query User partition independently
if self.user_id:
    user_kwargs = self._build_filters(input_text, "user_id", self.user_id)
    search_tasks.append(self.mem0_client.search(**user_kwargs))

# 2. Query Agent partition independently
if self.agent_id:
    agent_kwargs = self._build_filters(input_text, "agent_id", self.agent_id)
    search_tasks.append(self.mem0_client.search(**agent_kwargs))

if not search_tasks:
    return
Comment on lines +104 to +108
"""Search Mem0 for relevant memories and add to the session context."""
self._validate_filters()
input_text = "\n".join(msg.text for msg in context.input_messages if msg and msg.text and msg.text.strip())
if not input_text.strip():
    return
Comment on lines +126 to +135
results = await asyncio.gather(*search_tasks, return_exceptions=True)

# Merge and deduplicate results
memories = []
seen_memory_ids = set()

for search_response in results:
    if isinstance(search_response, Exception):
        continue

Comment on lines +196 to 210
def _build_filters(self, input_text: str, entity_key: str, entity_value: str) -> dict[str, Any]:
    filters: dict[str, Any] = {"query": input_text}

    if isinstance(self.mem0_client, AsyncMemory):
        # AsyncMemory (OSS) expects direct kwargs
        filters[entity_key] = entity_value
        if self.application_id:
            filters["app_id"] = self.application_id
    else:
        # AsyncMemoryClient (Platform) expects a filters dict
        filters["filters"] = {entity_key: entity_value}
        if self.application_id:
            filters["filters"]["app_id"] = self.application_id

    return filters
@VedantSonani
Author

@microsoft-github-policy-service agree

moonbox3 added the python label Jun 1, 2026
github-actions bot changed the title from "fix(mem0): isolate entity retrieval and correct app_id payload" to "Python: fix(mem0): isolate entity retrieval and correct app_id payload" Jun 1, 2026
Contributor

@moonbox3 moonbox3 left a comment

Please also have a look at the failing CI/CD items.

agent_kwargs = self._build_filters(input_text, "agent_id", self.agent_id)
search_tasks.append(self.mem0_client.search(**agent_kwargs))

if not search_tasks:

What happens when only application_id is configured? _validate_filters allows that (app-only is a valid config), but search_tasks only gets populated for user_id/agent_id, so we hit this guard and return without ever searching. App-scoped setups now retrieve zero memories, silently, even though after_run keeps writing them. Regression vs the old single-search path that always included app_id.

One option might be an app-only fallback before the guard:

Suggested change
- if not search_tasks:
+ # Fall back to an app-scoped search when only application_id is configured
+ if not search_tasks and self.application_id:
+     app_kwargs: dict[str, Any] = {"query": input_text}
+     if isinstance(self.mem0_client, AsyncMemory):
+         app_kwargs["app_id"] = self.application_id
+     else:
+         app_kwargs["filters"] = {"app_id": self.application_id}
+     search_tasks.append(self.mem0_client.search(**app_kwargs))
+ if not search_tasks:
+     return

seen_memory_ids = set()

for search_response in results:
if isinstance(search_response, Exception):

Should we be swallowing every search error here? gather(return_exceptions=True) + continue turns auth failures, bad config, rate limits, network/5xx all into an empty result with no log (module has no logger). A fully misconfigured provider becomes indistinguishable from one that legitimately found nothing, and the caller sees success. Could we at least log each exception arm, and maybe distinguish all-tasks-failed from genuinely-empty rather than returning silently?
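
One possible shape for the behavior asked about here, as a sketch only: log every failed search and raise when all of them failed, so a misconfigured provider no longer looks like an empty result. The logger name, `AllSearchesFailedError`, and `gather_searches` are hypothetical names and do not exist in the module under review.

```python
import asyncio
import logging
from typing import Any, Awaitable

logger = logging.getLogger("mem0_provider_sketch")  # hypothetical logger name


class AllSearchesFailedError(RuntimeError):
    """Hypothetical error raised when every search task failed."""


async def gather_searches(tasks: list[Awaitable[Any]]) -> list[Any]:
    """Run searches concurrently; log failures instead of dropping them silently."""
    results = await asyncio.gather(*tasks, return_exceptions=True)
    successes: list[Any] = []
    failures: list[BaseException] = []
    for result in results:
        # BaseException also covers CancelledError, which gather can return here.
        if isinstance(result, BaseException):
            logger.warning("Mem0 search failed: %r", result)
            failures.append(result)
        else:
            successes.append(result)
    if failures and not successes:
        # Nothing succeeded: surface the problem rather than returning "empty".
        raise AllSearchesFailedError(
            f"all {len(failures)} searches failed") from failures[0]
    return successes


async def ok() -> list[dict[str, Any]]:
    return [{"id": "m1"}]


async def boom() -> list[dict[str, Any]]:
    raise ConnectionError("network down")


# One failure out of two: the failure is logged, the success is kept.
partial = asyncio.run(gather_searches([ok(), boom()]))

# Every task failed: the caller gets an error instead of an empty result.
try:
    asyncio.run(gather_searches([boom(), boom()]))
    all_failed = False
except AllSearchesFailedError:
    all_failed = True
```

Whether a total failure should raise or only log at a higher severity is a design choice for the maintainers; the sketch raises only to make the two cases distinguishable.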

current_memories = [search_response]

for mem in current_memories:
mem_id = mem.get("id")

Is gating inclusion on mem_id being truthy intended? Any memory with a missing or falsy id (None, "", 0) gets dropped entirely, not just deduped. Old code included every memory regardless of id. Dedup should skip repeats, not discard id-less entries. Could we keep id-less memories in?

Suggested change
- mem_id = mem.get("id")
+ mem_id = mem.get("id")
+ if mem_id is not None and mem_id in seen_memory_ids:
+     continue
+ if mem_id is not None:
+     seen_memory_ids.add(mem_id)
+ memories.append(mem)
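
The suggested behavior can be checked in isolation with a small standalone function. This is a sketch under the assumption that each search response has already been normalized to a list of dicts; `merge_memories` is a hypothetical name, not a function in the provider.

```python
from typing import Any


def merge_memories(responses: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge search responses, skipping repeated ids but keeping id-less entries."""
    merged: list[dict[str, Any]] = []
    seen_ids: set[Any] = set()
    for response in responses:
        for mem in response:
            mem_id = mem.get("id")
            if mem_id is not None:
                if mem_id in seen_ids:
                    continue  # same memory already returned by another query
                seen_ids.add(mem_id)
            # Entries without an id cannot be deduplicated, so they are kept.
            merged.append(mem)
    return merged


user_hits = [{"id": "a", "memory": "x"}, {"memory": "no id"}]
agent_hits = [{"id": "a", "memory": "x"}, {"id": 0, "memory": "zero id"}, {"memory": "no id"}]
merged = merge_memories([user_hits, agent_hits])
print(len(merged))  # 4
```

Comparing against None rather than testing truthiness is what lets a falsy id such as 0 or "" take part in deduplication instead of being dropped.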

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: [Bug] Mem0ContextProvider always returns empty results due to broken filter logic and API parameter mismatch

3 participants