ARROW-13832: [Doc] Improve compute documentation #11830

Closed
amol- wants to merge 11 commits into apache:master from amol-:ARROW-13832

@amol- amol- commented Dec 1, 2021

Member

Improve the documentation of the compute functions and add a section about grouped aggregations. Also generate the list of available aggregation functions automatically.

github-actions Bot commented Dec 1, 2021

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@pitrou pitrou left a comment

Member

Nice improvement! Here are some comments.

Comment thread docs/source/python/api/compute.rst Outdated
Comment thread docs/source/conf.py Outdated
# This will also rebuild appropriately when the value changes.
app.add_config_value('cuda_enabled', cuda_enabled, 'env')
app.add_config_value('flight_enabled', flight_enabled, 'env')
app.add_directive('computefuncs', ComputeFunctionsTableDirective)

Member

For disambiguation and clarity, can we prefix our own directives with "arrow-"?

Member Author

👍

Comment thread docs/source/conf.py Outdated
Comment thread docs/source/conf.py
Comment thread docs/source/conf.py Outdated
Comment thread docs/source/python/compute.rst Outdated
keys: [["a","b","c"]]

The ``"sum"`` aggregation passed to the ``aggregate`` method in the previous
example is the :func:`hash_sum` compute function.

Member

Is "hash_sum" actually exposed in the docs? Otherwise, I suppose the :func: tag will not link to anything?

Member Author

It currently is, but #11803 removes it, so I'll remove the :func: role from here.

@jorisvandenbossche jorisvandenbossche left a comment

Member

Indeed a nice improvement!

:toctree: ../generated/

ArraySortOptions
AssumeTimezoneOptions

Member

I am not sure it has much value to list them here, as long as they have no docstring (this will create a lot of new doc pages, which will basically be empty).

Member Author

I list them here because that provides the signature, and thus which arguments they support. As you said, there won't be any docstring, but given that in many cases you can guess what the arguments do from their names, it's better than nothing.

Member

Yeah, it's true that the signatures show something. It just looks kind of "bad" to have an empty docstring page.

Actually, what happens if you leave out the :toctree: ../generated/ (to only have the table)? Although that will just make Sphinx complain about nonexistent references.

Member Author

Removing it also removes the reference page where you can see the signature

Member

> Removing it also removes the reference page where you can see the signature

The table doesn't show the signature?

Member Author

Nope

Comment thread docs/source/python/compute.rst Outdated
Comment thread docs/source/python/compute.rst
You can use them with or without the ``"hash_"`` prefix.

.. arrow-computefuncs::
   :kind: hash_aggregate

Member

Nice!

Comment thread docs/source/conf.py
result = ViewList()
function_kind = self.options.get('kind', None)

result.append(".. csv-table::", "<computefuncs>")

Member

Out of curiosity, what's the "" for?

Member Author

Not sure I got the question, which "" are you referring to?

Member

Sorry, annoying github rendering that hides things between angle brackets :) I meant "<computefuncs>"

@amol- amol- Dec 9, 2021

Member Author

In practice, it's the name of the source file where the rst code was written. In this case I use <computefuncs> so that, if there is a syntax error in the generated rst code, the error will say "line blahblah in <computefuncs>" and we know it comes from this directive. This mimics Python's style a bit, which uses things like File "<stdin>", line 1, in <module>.

Member

OK, I see!
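
The Python convention mentioned in this thread can be seen with the built-in `compile`, which accepts an arbitrary source name and reports it in error messages. Below is a small sketch using only the standard library; the helper name `describe_syntax_error` and the sample source are made up for illustration, and it shows the Python analogy rather than the Sphinx directive itself.

```python
def describe_syntax_error(source: str, source_name: str) -> str:
    """Compile `source` under a pseudo-filename and report where it failed."""
    try:
        compile(source, source_name, "exec")
    except SyntaxError as exc:
        # The pseudo-filename passed to compile() is echoed back in the error.
        return f'File "{exc.filename}", line {exc.lineno}'
    return "no error"


# Hypothetical generated source with a deliberate syntax error on line 2.
message = describe_syntax_error("x = 1\ny = = 2\n", "<computefuncs>")
print(message)
```

Because the name is wrapped in angle brackets, it is easy to tell from the message that the failing text was generated rather than read from a real file.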

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

ursabot commented Dec 9, 2021

Benchmark runs are scheduled for baseline = 3f6773a and contender = 5b805d7. 5b805d7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.09% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
