This demo shows how to plug a real-world API (HackerNews) into trustfall
and execute queries against it.
The key idea it demonstrates is composition in schemas and queries.
Even without using trustfall capabilities, a motivated programmer could write
a purpose-built tool that directly uses the HackerNews API to perform the operations
examined here. However, each such tool would need to be written, maintained, and
optimized separately. Tools that do related-but-different operations, especially
if implemented by different people or at different points in time, are unlikely
to share code. For tools performing queries of significant complexity, the large state space
means the tool is also unlikely to be thoroughly tested, and may be buggy and difficult to extend.
In contrast, querying via the trustfall engine allows all functionality to be decomposed
into smaller components (vertices, edges, vertex properties), each of which can be implemented
and tested independently of the others. Each component represents one conceptual operation,
for example the "get a User vertex's name" property operation, or the "for a Comment vertex,
find the User that authored that comment" edge operation. Each such component can be added
to the schema one at a time, and implemented and tested individually.
It's much easier to implement and test multiple small components than to build and test
all possible compositions. If each component is correctly implemented, the trustfall query engine
guarantees that the composition of such components is also correct.
This makes it possible to confidently expose larger schemas and execute more complex queries
than was previously practical, without worrying about bugs in how the components are combined.
Similarly, the composition-based implementation approach allows individual operations
to be optimized as necessary, making it easier to achieve performance gains above and beyond
the already very good performance of the iterator-style execution model.
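To make the composition argument concrete, here is a minimal std-only Rust sketch that uses plain iterators rather than trustfall itself. The User and Comment types and the user_name and comment_author functions are hypothetical stand-ins for one property operation and one edge operation; they are not the demo's actual code. Each piece can be tested on its own, and the pipeline that combines them adds no new logic.

```rust
use std::collections::HashMap;

// Hypothetical, simplified vertex types; not the demo's real Vertex enum.
struct User {
    id: String,
}

struct Comment {
    author_id: String,
}

/// Property component: "get a User vertex's name" (here, the username is its id).
fn user_name(user: &User) -> &str {
    &user.id
}

/// Edge component: "for a Comment vertex, find the User that authored it".
/// Returns None when the author can't be found.
fn comment_author<'a>(comment: &Comment, users: &'a HashMap<String, User>) -> Option<&'a User> {
    users.get(&comment.author_id)
}

/// Composition of the two components above. No new logic lives here, so if each
/// component is correct, so is this pipeline. Like a default (INNER JOIN) edge
/// expansion, comments whose author can't be resolved produce no output.
fn comment_author_names<'a>(
    comments: impl Iterator<Item = &'a Comment> + 'a,
    users: &'a HashMap<String, User>,
) -> impl Iterator<Item = &'a str> + 'a {
    comments
        .filter_map(move |comment| comment_author(comment, users))
        .map(user_name)
}
```

Because the pipeline is built from iterator adaptors, it is also lazy: no author lookup happens until a name is requested.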
The project consists of the following components:
- vertex.rs defines the Vertex enum, which trustfall uses to represent vertices in the query graph.
- adapter.rs defines the HackerNewsAdapter struct, which implements the trustfall::provider::BasicAdapter trait and connects the query engine to the HackerNews API.
  - The resolve_starting_vertices method produces the initial iterator of Vertex values corresponding to the root edge at which querying starts (e.g. FrontPage).
  - The resolve_property method gets property values for each Vertex in an iterator.
  - The resolve_neighbors method gets the neighboring vertices (Vertex values) across a particular edge, for each Vertex in an iterator.
  - The resolve_coercion method is similar to Python's isinstance() function: for each Vertex in an iterator, it checks whether that Vertex's type can be narrowed to a more derived type than it previously represented. For example, if the Vertex originally represented interface Animal, resolve_coercion may be used to check whether the Vertex is actually of type Dog implements Animal.
- main.rs is a simple CLI app that can execute query files in RON format.
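The following is a hypothetical, heavily simplified Rust sketch of the roles these four methods play. It does not use trustfall: the ToyAdapter trait, the Vertex and Value enums, and the InMemory struct are illustrative assumptions. In particular, the real BasicAdapter methods operate on iterators of vertices (as described above) and have different signatures; this sketch handles one vertex at a time for readability.

```rust
/// Hypothetical stand-in for the demo's Vertex enum.
#[derive(Debug, Clone, PartialEq)]
enum Vertex {
    Story { title: String, by: String },
    Job { title: String },
    User { id: String, karma: u64 },
}

/// Hypothetical property value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
    U64(u64),
}

/// Hypothetical trait mirroring the *roles* of the four resolve_* methods.
/// Not the real trustfall::provider::BasicAdapter signatures.
trait ToyAdapter {
    fn resolve_starting_vertices(&self, edge: &str) -> Box<dyn Iterator<Item = Vertex> + '_>;
    fn resolve_property(&self, vertex: &Vertex, property: &str) -> Value;
    fn resolve_neighbors(&self, vertex: &Vertex, edge: &str) -> Box<dyn Iterator<Item = Vertex> + '_>;
    fn resolve_coercion(&self, vertex: &Vertex, coerce_to: &str) -> bool;
}

/// In-memory data instead of HTTP calls to the HackerNews API.
struct InMemory {
    items: Vec<Vertex>,
    users: Vec<Vertex>,
}

impl ToyAdapter for InMemory {
    fn resolve_starting_vertices(&self, edge: &str) -> Box<dyn Iterator<Item = Vertex> + '_> {
        match edge {
            "FrontPage" => Box::new(self.items.iter().cloned()),
            _ => Box::new(std::iter::empty()),
        }
    }

    fn resolve_property(&self, vertex: &Vertex, property: &str) -> Value {
        match (vertex, property) {
            (Vertex::Story { title, .. }, "title") | (Vertex::Job { title }, "title") => {
                Value::Str(title.clone())
            }
            (Vertex::User { id, .. }, "id") => Value::Str(id.clone()),
            (Vertex::User { karma, .. }, "karma") => Value::U64(*karma),
            _ => Value::Null,
        }
    }

    fn resolve_neighbors(&self, vertex: &Vertex, edge: &str) -> Box<dyn Iterator<Item = Vertex> + '_> {
        match (vertex, edge) {
            // Story -> byUser: find the User whose id matches the story's author.
            (Vertex::Story { by, .. }, "byUser") => {
                let by = by.clone();
                Box::new(
                    self.users
                        .iter()
                        .filter(move |u| matches!(u, Vertex::User { id, .. } if *id == by))
                        .cloned(),
                )
            }
            _ => Box::new(std::iter::empty()),
        }
    }

    // Like isinstance(): can this vertex be narrowed to the requested type?
    fn resolve_coercion(&self, vertex: &Vertex, coerce_to: &str) -> bool {
        matches!(
            (vertex, coerce_to),
            (Vertex::Story { .. }, "Story") | (Vertex::Job { .. }, "Job")
        )
    }
}
```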
This demo requires the Rust toolchain. To install Rust, follow the
official instructions. On UNIX-like operating systems,
that's usually as simple as curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh.
To confirm everything is set up correctly, cd into this directory and run cargo check.
After compiling for a minute or two, it should find no errors.
This demo contains several example query files in the example_queries directory.
Each file represents a single query (conforming to the schema in hackernews.graphql)
together with any arguments necessary to run the query.
To execute a query, run cargo run --example hackernews query path/to/query/file.ron. The execution is
lazy and incremental (iterator-style), so you'll see results stream onto the screen
continuously as they are received.
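As a rough illustration of why results can stream, here is a std-only Rust sketch of lazy iterator evaluation. fetch_item is a hypothetical stand-in for a HackerNews API request, and the Cell counter only records how many "requests" were made. Nothing is fetched until the consumer asks for the next result, and stopping early means the remaining items are never requested.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for one HackerNews API request per item id.
fn fetch_item(id: u32, requests: &Cell<u32>) -> String {
    requests.set(requests.get() + 1);
    format!("item {id}")
}

/// Lazily maps ids to fetched items: building the iterator performs no requests;
/// each call to next() performs exactly one.
fn lazy_results<'a>(ids: &'a [u32], requests: &'a Cell<u32>) -> impl Iterator<Item = String> + 'a {
    ids.iter().map(move |&id| fetch_item(id, requests))
}
```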
Reminder: While the schema and query syntax here can be parsed by a GraphQL parser,
you are not actually using a GraphQL API, and the query semantics are not the same as GraphQL's.
GraphQL's query capabilities are a strict subset of trustfall's.
Here are the most notable aspects in which the trustfall query language differs from GraphQL:
- @output is used to mark fields for output; in SQL, this would correspond to a term in the SELECT clause.
- @filter denotes that a field must match a predicate; in SQL, this would correspond to a term in the WHERE clause.
- Expanding an edge by default has semantics equivalent to a SQL INNER JOIN. Directives like @optional / @recurse / @fold may be applied to edges to change their behavior.
- When an edge points to an interface type, it is possible to select only a specific subtype (interface or type) of that interface using a type coercion: ... on Foo selects only Foo-typed vertices and discards all others.
- @optional denotes that an edge is optional and is allowed to not exist. This is semantically equivalent to a SQL LEFT JOIN.
- @recurse denotes that an edge is to be traversed recursively between 0 and the number of times specified in the directive's depth field.
- @fold requests that the data output on the other side of the edge be "folded" into lists for each output field. In PostgreSQL terminology, this is like a GROUP BY with array_agg() applied to all folded outputs.
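To make the SQL analogies concrete, this std-only Rust sketch evaluates the same story-to-comment edge three ways over hypothetical in-memory data: default expansion (INNER JOIN), @optional (LEFT JOIN), and @fold (one row per story, with the comments collected into a list). It models only the row-level semantics listed above, not how trustfall implements them; the Story struct and function names are illustrative assumptions.

```rust
/// Hypothetical in-memory story with the text of its comments.
struct Story {
    title: &'static str,
    comments: Vec<&'static str>,
}

/// Default edge expansion (INNER JOIN): one row per (story, comment) pair;
/// a story with no comments produces no rows.
fn expand_default(stories: &[Story]) -> Vec<(&'static str, &'static str)> {
    stories
        .iter()
        .flat_map(|story| story.comments.iter().map(move |c| (story.title, *c)))
        .collect()
}

/// @optional (LEFT JOIN): like the default, but a story with no comments
/// still produces one row, with None standing in for the missing comment.
fn expand_optional(stories: &[Story]) -> Vec<(&'static str, Option<&'static str>)> {
    stories
        .iter()
        .flat_map(|story| {
            let rows: Vec<(&'static str, Option<&'static str>)> = if story.comments.is_empty() {
                vec![(story.title, None)]
            } else {
                story.comments.iter().map(|c| (story.title, Some(*c))).collect()
            };
            rows
        })
        .collect()
}

/// @fold (GROUP BY with array_agg()): exactly one row per story, with all
/// comments folded into a list (empty if there are none).
fn expand_fold(stories: &[Story]) -> Vec<(&'static str, Vec<&'static str>)> {
    stories
        .iter()
        .map(|story| (story.title, story.comments.clone()))
        .collect()
}
```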
Let's describe and explain the queries in the example_queries directory.
cargo run --example hackernews query example_queries/front_page_stories_with_links.ron gets the HackerNews
items on the front page that are stories with links (as opposed to job postings, or submissions
like "Show HN" that contain a message instead of a link). For each match, the query outputs
its title, link, and current score, as well as its submitter's username and current karma.
This is what the query looks like:
{
FrontPage {
... on Story {
title @output
url @filter(op: "is_not_null") @output
score @output
byUser {
submitter: id @output
submitter_karma: karma @output
}
}
}
}
Here's what running it looks like:
$ cargo run --example hackernews query example_queries/front_page_stories_with_links.ron
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `/.../hackernews query example_queries/front_page_stories_with_links.ron`
{
"submitter_karma": 13731,
"submitter": "0xedb",
"title": "New Year, New CEO",
"url": "https://signal.org/blog/new-year-new-ceo/",
"score": 474
}
{
"submitter": "danso",
"score": 71,
"title": "In first, US surgeons transplant pig heart into human patient",
"submitter_karma": 142723,
"url": "https://apnews.com/article/pig-heart-transplant-6651614cb9d73bada8eea2ecb6449aef"
}
<... many more results ...>
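A rough model of what ... on Story and url @filter(op: "is_not_null") do to the front-page items, as a std-only Rust sketch. The Item enum is a hypothetical simplification of the schema in hackernews.graphql. For brevity, the submitter is read from a plain field instead of by traversing the byUser edge, so submitter_karma is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified HackerNews item.
enum Item {
    Story { title: String, url: Option<String>, score: u64, by: String },
    Job { title: String, url: Option<String>, score: u64 },
}

/// Keeps only Story items whose url is present, then emits the @output fields
/// as one map per result row.
fn front_page_links(items: &[Item]) -> Vec<BTreeMap<&'static str, String>> {
    items
        .iter()
        .filter_map(|item| match item {
            // Type coercion (... on Story) plus the is_not_null filter on url.
            Item::Story { title, url: Some(url), score, by } => {
                let mut row = BTreeMap::new();
                row.insert("title", title.clone());
                row.insert("url", url.clone());
                row.insert("score", score.to_string());
                row.insert("submitter", by.clone());
                Some(row)
            }
            // Stories without a url (e.g. "Show HN" text posts) and all
            // non-Story items are discarded.
            _ => None,
        })
        .collect()
}
```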
cargo run --example hackernews query example_queries/jobs_in_top_50.ron gets the jobs that are currently shown in the
top 50 items on HackerNews. For each match, it returns the posting's title, URL, and current score.
This is the query:
{
Top(max: 50) {
... on Job {
title @output
url @filter(op: "is_not_null") @output
score @output
}
}
}
Here's its output:
$ cargo run --example hackernews query example_queries/jobs_in_top_50.ron
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `/.../hackernews query example_queries/jobs_in_top_50.ron`
{
"title": "Flow Club (YC S21) is hiring our first marketer",
"score": 1,
"url": "https://flowclub.notion.site/Work-at-Flow-Club-1e6cc84bfc0d4463ab333ee9bc02c46a"
}
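One subtlety worth noting: max: 50 limits the starting vertices before the ... on Job coercion is applied, so the query returns only the jobs found among the top 50 items (a single one in the run above), not 50 jobs. A hypothetical std-only Rust sketch of that ordering, where Kind and jobs_in_top are illustrative names:

```rust
/// Hypothetical item kind; the real schema has more item types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Story,
    Job,
}

/// Models Top(max) { ... on Job { ... } }: take the first `max` ranked items,
/// then keep only the jobs among them.
fn jobs_in_top(ranked: &[(u32, Kind)], max: usize) -> Vec<u32> {
    ranked
        .iter()
        .take(max) // max caps the starting vertices...
        .filter(|(_, kind)| *kind == Kind::Job) // ...before the coercion filters them
        .map(|(id, _)| *id)
        .collect()
}
```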
cargo run --example hackernews query example_queries/latest_links_by_high_karma_users.ron gets the latest links
(i.e. links on the "new" tab) that were submitted by users with karma of 10,000 or more.
For each match, it returns the submission's title, URL, current score, and the submitter's username
and current karma.
This is the query:
{
LatestStory(max: 100) {
title @output
url @filter(op: "is_not_null") @output
score @output
byUser {
submitter: id @output
submitter_karma: karma @filter(op: ">=", value: ["$min_karma"]) @output
}
}
}
It is executed with the following arguments, shown here in RON serialization format:
{
"min_karma": Uint64(10000),
}
Here's what running it looks like:
$ cargo run --example hackernews query example_queries/latest_links_by_high_karma_users.ron
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `/.../hackernews query example_queries/latest_links_by_high_karma_users.ron`
{
"submitter_karma": 23927,
"score": 2,
"url": "https://github.com/snapview/sunrise",
"title": "Sunrise: Spreadsheet-like dataflow programming in TypeScript",
"submitter": "wslh"
}
{
"title": "The case for Rust as the future of JavaScript infrastructure",
"submitter_karma": 30051,
"score": 1,
"submitter": "feross",
"url": "https://thenewstack.io/the-case-for-rust-as-the-future-of-javascript-infrastructure/"
}
<... many more results ...>
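The $min_karma placeholder in the @filter directive is bound from the query's arguments. The following std-only Rust sketch shows one way such a binding could be resolved for a >= filter. ArgValue and passes_gte are hypothetical names (ArgValue::Uint64 merely echoes the Uint64(10000) spelling in the RON arguments); trustfall's real argument handling is not modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical query-argument value.
#[derive(Debug, Clone, PartialEq)]
enum ArgValue {
    Uint64(u64),
    Str(String),
}

/// Evaluates `field >= $arg`, looking the argument up by name (without the `$`).
/// Returns an error if the argument is missing or is not an unsigned integer.
fn passes_gte(field: u64, arg_name: &str, args: &HashMap<String, ArgValue>) -> Result<bool, String> {
    let name = arg_name.strip_prefix('$').unwrap_or(arg_name);
    match args.get(name) {
        Some(ArgValue::Uint64(min)) => Ok(field >= *min),
        Some(other) => Err(format!("argument {name} has unexpected value {other:?}")),
        None => Err(format!("missing argument {name}")),
    }
}
```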
cargo run --example hackernews query example_queries/links_with_high_karma_commenters.ron looks at the latest
100 story submissions (i.e. HN's "new" tab items), and selects those that have links
and also have comments (looking up to 5 reply levels deep) made by users with at least 10,000 karma.
For each match, it outputs the submission's title, current score, URL, as well as
the matching comment's content, author, and the author's current karma.
This is the query:
{
LatestStory(max: 100) {
title @output
url @filter(op: "is_not_null") @output
score @output
comment {
reply @recurse(depth: 5) {
comment: text @output
byUser {
commenter: id @output
commenter_karma: karma @filter(op: ">=", value: ["$min_karma"]) @output
}
}
}
}
}
It is executed with the following arguments, shown here in RON serialization format:
{
"min_karma": Uint64(10000),
}
Here's what running it looks like:
$ cargo run --example hackernews query example_queries/links_with_high_karma_commenters.ron
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
Running `/.../hackernews query example_queries/links_with_high_karma_commenters.ron`
{
"commenter_karma": 22774,
"url": "https://www.phoronix.com/scan.php?page=news_item&px=Intel-New-CCG-Leader",
"comment": ">Holthaus replaces EVP Gregory Bryant (“GB”), who will leave the company at the end of January for a new opportunity.<p>This is strange because Gregory Bryant was still presenting at CES [1] .<p>[1] <a href=\"https://www.anandtech.com/show/17171/intel-keynote-and-svp-greg-bryant-at-ces-2022-live-blog-10am-pt-1800-utc\" rel=\"nofollow\">https://www.anandtech.com/show/17171/intel-keynote-and-svp-g...</a>",
"title": "Intel Announces New Leader of Client Computing Group",
"commenter": "ksec",
"score": 1
}
{
"commenter": "scrollaway",
"commenter_karma": 25466,
"comment": "4 petabytes eh. Bonus points for the first article in several years to use CD-ROMs as a unit of comparison.<p>> <i>"drawn from crime reports, hacked from encrypted phone services and sampled from asylum seekers never involved in any crime"</i>",
"url": "https://www.theguardian.com/world/2022/jan/10/a-data-black-hole-europol-ordered-to-delete-vast-store-of-personal-data",
"title": "A data ‘black hole’: Europol ordered to delete vast store of personal data",
"score": 31
}
<... many more results ...>
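The reply @recurse(depth: 5) clause can be modeled as a bounded traversal that includes the starting comment itself (0 hops) plus every reply up to 5 hops away. In this hypothetical std-only Rust sketch, Comment is a simplified tree node and the traversal is depth-first; the order in which trustfall actually produces results may differ.

```rust
/// Hypothetical comment tree node.
struct Comment {
    author: &'static str,
    replies: Vec<Comment>,
}

/// Models `reply @recurse(depth: N)`: pushes the starting comment (0 hops)
/// and every reply reachable in 1..=N hops, visiting the tree depth-first.
fn recurse_replies<'a>(start: &'a Comment, depth: usize, out: &mut Vec<&'a Comment>) {
    out.push(start);
    if depth == 0 {
        return; // depth budget exhausted: don't follow the reply edge further
    }
    for reply in &start.replies {
        recurse_replies(reply, depth - 1, out);
    }
}
```

The @filter on commenter_karma would then be applied to each comment produced by this traversal.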
The easiest way to write and run your own query is to:
- copy the content of one of the example queries,
- edit the query string and/or arguments as necessary,
- save it to a new file,
- then run it with cargo run --example hackernews query <your_query_file>.
The query must use properties, types, and edges from the schema in the hackernews.graphql file.