Hi, thanks for releasing this benchmark suite — really useful unification of the memory systems.
I'm trying to reproduce the "LongMemEval" numbers in the paper, but the only data path I can find in the code is benchmark/memoryagentbench/hf_datasets.py pulling ai-hyz/MemoryAgentBench — there's no standalone LongMemEval loader, and the context chunking, prompts, and gold answers all come from that MemoryAgentBench split rather than the original LongMemEval release.
Could you clarify whether the reported numbers come entirely from this MemoryAgentBench-reconstructed subset and, if so, whether the workload should instead be labeled "MemoryAgentBench (longmemeval_s*)"?