Python client does not support SOLR deep paging cursors #356

@stevegaron

I've noticed poor performance with some SOLR queries that page deep into the result set (see http://solr.pl/en/2014/03/10/solr-4-7-efficient-deep-paging/).

Here is the use case:
I need to pull all keys from a bucket that match a given filter.

Right now I do something like this:

def list_keys(bucket, my_filter):
    out = []
    start = 0
    rows = 1000
    done = False

    while not done:
        # Fetch the next page of matching keys, offset by `start`.
        results = bucket.search(my_filter, fl="_yz_rk", start=start, rows=rows)
        out.extend([x["_yz_rk"] for x in results['docs']])
        start += rows
        # A short page means the result set is exhausted.
        if len(results['docs']) < rows:
            done = True

    return out

The problem with this is that the deeper I go into the index, the slower bucket.search gets. This is especially true when a sort is added to the search.
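To give a rough sense of why the slowdown compounds, here is a small back-of-the-envelope model. It assumes that a request with start/rows makes Solr collect and rank about start + rows entries, while a cursor request only has to collect rows entries. The function names are made up for this illustration, and real costs depend on sharding, caching and the sort used.

```python
import math


def docs_ranked_with_start(num_docs, rows):
    """Approximate entries ranked over a full scan using start/rows paging.

    Assumption: each request ranks roughly start + rows entries.
    """
    pages = math.ceil(num_docs / rows)
    return sum(page * rows + rows for page in range(pages))


def docs_ranked_with_cursor(num_docs, rows):
    """Approximate entries ranked over a full scan using a cursor.

    Assumption: each request ranks roughly `rows` entries.
    """
    pages = math.ceil(num_docs / rows)
    return pages * rows


print(docs_ranked_with_start(1_000_000, 1000))   # 500500000
print(docs_ranked_with_cursor(1_000_000, 1000))  # 1000000
```

Under these assumptions, pulling one million keys in pages of 1000 ranks about 500 times more entries with start than with a cursor.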

SOLR fixed the issue in 4.7 by accepting a cursor (cursorMark) in place of the 'start' parameter. Therefore I would expect to be able to do something like this:

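def list_keys(bucket, my_filter):
    out = []
    cursorMark = "*"  # "*" asks SOLR to start a new cursor
    rows = 1000
    done = False

    # Note: SOLR also requires a sort that includes the uniqueKey field
    # whenever cursorMark is used.
    while not done:
        results = bucket.search(my_filter, fl="_yz_rk", rows=rows, cursorMark=cursorMark)
        # Resume the next request from where this page ended.
        cursorMark = results['nextCursorMark']
        out.extend([x["_yz_rk"] for x in results['docs']])
        if len(results['docs']) < rows:
            done = True

    return out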

As it turns out, the Python client currently does not pass back 'nextCursorMark'; only docs, max_score and num_found are returned in the results object.
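For reference, here is a minimal sketch of the loop I have in mind, written against a generic `search` callable instead of the real client so it can be run without a cluster. Everything in it is an assumption for illustration: `iter_keys` and `fake_search` are made-up names, the result is assumed to carry a 'nextCursorMark' entry (which the client does not return today), and the `_yz_id asc` sort assumes `_yz_id` is the uniqueKey of the index schema. It stops when SOLR hands back the same cursor that was sent, which is how SOLR signals the end of the results.

```python
def iter_keys(search, query, rows=1000, key_field="_yz_rk", sort="_yz_id asc"):
    """Yield every matching key by following SOLR cursor marks.

    `search` is a hypothetical callable that returns a dict with 'docs'
    and 'nextCursorMark'; it stands in for a cursor-aware client method.
    """
    cursor_mark = "*"  # "*" starts a new cursor
    while True:
        results = search(query, fl=key_field, rows=rows,
                         sort=sort, cursorMark=cursor_mark)
        for doc in results["docs"]:
            yield doc[key_field]
        next_mark = results.get("nextCursorMark")
        if next_mark is None:
            # Without the mark we cannot continue; fail loudly instead of
            # silently returning a truncated key list.
            raise RuntimeError("search result has no nextCursorMark")
        if next_mark == cursor_mark:
            return  # same mark returned: no more results
        cursor_mark = next_mark


def fake_search(query, fl, rows, sort, cursorMark):
    """In-memory stand-in for SOLR, used only to exercise the loop."""
    keys = ["k%d" % i for i in range(5)]
    offset = 0 if cursorMark == "*" else int(cursorMark)
    page = keys[offset:offset + rows]
    # Like SOLR, return the incoming mark unchanged once results run out.
    next_mark = str(offset + len(page)) if page else cursorMark
    return {"docs": [{fl: k} for k in page], "nextCursorMark": next_mark}


print(list(iter_keys(fake_search, "*:*", rows=2)))
# ['k0', 'k1', 'k2', 'k3', 'k4']
```

Comparing cursor marks instead of checking for a short page avoids relying on the page size, at the cost of one extra request at the end.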

Thank you,
Steve
