Background
Currently Cursorless uses a custom pattern-definition DSL alongside a set of helper functions in nodeMatcher.ts to match various scope types, such as "item" within a list or "argue" within a function definition or function invocation.
The tree-sitter project also provides a query DSL, written as Scheme-like S-expressions, which allows a user to query for patterns within syntax trees. Here's a link to the docs and an example usage in JS via the web-tree-sitter project. Each query can assign a name to a node, such as @comment or @punctuation.bracket. The name can then be read or asserted against.
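To make "reading the name" concrete, here is a minimal sketch that groups captures by their name. The QueryCapture interface is a hand-written stand-in (an assumption, loosely modelled on the name-plus-node pairs that query captures carry), and the sample captures are made up so the snippet runs without a parser.

```typescript
// Hypothetical stand-in for one capture produced by a tree-sitter query:
// the capture name (without the leading "@") plus the text it covers.
interface QueryCapture {
  name: string;
  text: string;
}

// The kind of query source a language file might contain (illustrative only).
const querySource = `
(comment) @comment
["(" ")" "[" "]" "{" "}"] @punctuation.bracket
`;

// Group captured text by capture name so callers can read or assert on it.
function groupByName(captures: QueryCapture[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { name, text } of captures) {
    const existing = groups.get(name) ?? [];
    existing.push(text);
    groups.set(name, existing);
  }
  return groups;
}

// Made-up captures, as if the query above had run over `foo(bar) // hi`.
const sampleCaptures: QueryCapture[] = [
  { name: "punctuation.bracket", text: "(" },
  { name: "punctuation.bracket", text: ")" },
  { name: "comment", text: "// hi" },
];

const grouped = groupByName(sampleCaptures);
```

In a real integration the captures would come from running the query against a parsed tree rather than from a literal array.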
The thought is that moving towards this approach will be more expressive out of the box. Additionally, many other projects, including Neovim and Helix, rely on queries for syntax highlighting as well as indentation, which might make the incremental work of adding a new language a little simpler, since there are already partial or full definitions to work from. In particular, Helix already uses these queries for its textobjects, which are a simplified version of Cursorless scope types. Here's an example of a set of textobject definitions; they exist for several other languages as well.
The Work
Create a queries directory with a subdirectory for each language, eg queries/python, etc
Create a query file for a language that provides queries for some or all of these ScopeTypes, placing the file in queries/<language>/scopeTypes.scm
Add the ability to load queries on a per-language basis
Queries run at the tree level (SyntaxNode.Tree.Language), so they work top down rather than bottom up, which is how Cursorless node matchers currently work.
Should a query have successful matches, return the smallest range containing the input selection
Note that for some of the auxiliary definitions listed below (eg @<scopeType>.searchScope) we first find a match, and then search within that range
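The "smallest range containing the input selection" step can be sketched as follows. This is only an illustration: ranges are modelled as end-exclusive character offsets (an assumption; editors and tree-sitter use line/column positions), and OffsetRange and smallestContaining are hypothetical names.

```typescript
// Character-offset range, end-exclusive. A simplification for illustration.
interface OffsetRange {
  start: number;
  end: number;
}

function contains(outer: OffsetRange, inner: OffsetRange): boolean {
  return outer.start <= inner.start && inner.end <= outer.end;
}

// Given every match a query produced for one scope type, return the
// smallest match that contains the selection, or undefined if none does.
function smallestContaining(
  matches: OffsetRange[],
  selection: OffsetRange,
): OffsetRange | undefined {
  let best: OffsetRange | undefined;
  for (const match of matches) {
    if (!contains(match, selection)) continue;
    if (best === undefined || match.end - match.start < best.end - best.start) {
      best = match;
    }
  }
  return best;
}

// Two nested functions: the outer spans 0..100, the inner spans 20..60.
const functionMatches: OffsetRange[] = [
  { start: 0, end: 100 },
  { start: 20, end: 60 },
];

// A cursor (empty selection) at offset 30 is inside both; the inner one wins.
const innerHit = smallestContaining(functionMatches, { start: 30, end: 30 });
// A cursor at offset 70 is only inside the outer function.
const outerHit = smallestContaining(functionMatches, { start: 70, end: 70 });
// A cursor at offset 150 is outside every match.
const noHit = smallestContaining(functionMatches, { start: 150, end: 150 });
```

Because the query yields all matches in the tree up front, this filtering step is what recovers the "expand from the cursor" behaviour that the bottom-up matchers provide today.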
The definitions
Default query tag is just @<scopeType>, so eg @namedFunction
In addition, we support a few other queries:
@<scopeType>.removalRange indicates a different range that should be used for removal
@<scopeType>.domain indicates that we should first expand to the smallest containing match for this tag and then search for a rooted instance of @<scopeType> within this region. The canonical example for this one is enabling "take value" from within the key in a map: we'd set @collectionItem.domain to be the containing pair
@<scopeType>.iterationScope indicates that when the user says "every <scopeType>", we should first expand to the smallest instance of this tag, and then yield all top-level instances of @<scopeType> within this range. Here, top-level means not contained by any other match within the search range. Also, note that when finding the instances in the range, we should use @<scopeType>.domain if it exists. See below for an explanation
@<scopeType>.interior is used by excludeInterior and interiorOnly stages (see update inside / outside #254)
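One way to consume these tags is to split each capture name into a scope type and a kind. The sketch below is an assumption about how that might look, not a settled design: parseCaptureName, CaptureKind, and the "content" label for a bare @<scopeType> are hypothetical names introduced here.

```typescript
// The auxiliary tags named in this issue; "content" stands for a bare
// @<scopeType> capture (a label made up for this sketch).
type CaptureKind =
  | "content"
  | "removalRange"
  | "domain"
  | "iterationScope"
  | "interior";

interface ParsedCapture {
  scopeType: string;
  kind: CaptureKind;
}

const AUXILIARY_KINDS: readonly string[] = [
  "removalRange",
  "domain",
  "iterationScope",
  "interior",
];

// Split a capture name such as "collectionItem.domain" into its scope type
// and kind. Returns undefined for names that do not follow the convention.
function parseCaptureName(captureName: string): ParsedCapture | undefined {
  const [scopeType, kind, ...rest] = captureName.split(".");
  if (scopeType === undefined || scopeType === "" || rest.length > 0) {
    return undefined;
  }
  if (kind === undefined) {
    return { scopeType, kind: "content" };
  }
  if (AUXILIARY_KINDS.indexOf(kind) !== -1) {
    return { scopeType, kind: kind as CaptureKind };
  }
  return undefined;
}

const plain = parseCaptureName("namedFunction");
const withDomain = parseCaptureName("collectionItem.domain");
const unknownKind = parseCaptureName("key.bogus");
```

Rejecting unknown suffixes early would surface typos in a scopeTypes.scm file instead of silently ignoring the capture.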
Migration notes
This will require replacing each of the language matcher files with a scopeTypes.scm definition. For this reason, we will want to support both paths while the migration occurs. We can keep doing continuous delivery during migration because every language other than C# is well tested.
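Supporting both paths could be as simple as a per-language switch. The following is a sketch under assumptions: migratedLanguages, queryPathFor, and resolveMatcherSource are hypothetical names, and the set of migrated languages is invented for the example. Only the queries/<language>/scopeTypes.scm layout comes from the proposal above.

```typescript
// Hypothetical registry of languages that already have a scopeTypes.scm file.
const migratedLanguages = new Set<string>(["python"]);

// Build the query file path using the queries/<language>/scopeTypes.scm layout.
function queryPathFor(languageId: string): string {
  return `queries/${languageId}/scopeTypes.scm`;
}

type MatcherSource =
  | { kind: "query"; path: string }
  | { kind: "legacy"; languageId: string };

// Prefer the query-based definition when one exists; otherwise fall back
// to the existing hand-written matcher file for that language.
function resolveMatcherSource(languageId: string): MatcherSource {
  if (migratedLanguages.has(languageId)) {
    return { kind: "query", path: queryPathFor(languageId) };
  }
  return { kind: "legacy", languageId };
}

const pythonSource = resolveMatcherSource("python");
const csharpSource = resolveMatcherSource("csharp");
```

With a switch like this, a language moves to the new path by adding its query file and registering it, and the old matcher can be deleted once its tests pass against the queries.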
Questions
Do we want to change our term scopeType to textObject? That is the term used by nvim tree-sitter, helix, and redstart voice
Better term for @<scopeType>.iterationScope?
@<scopeType>.parent?
How to handle argument lists and collection items? I have a feeling we'll be repeating "," stuff for removal ranges a lot. I wonder if we want to add TOML configuration for languages where we can indicate scopes that should be handled as comma-separated lists. Along this direction, it's worth thinking about the connection to Support generic comma-separated lists #357
Do we want to support custom queries? Here's how neovim does it
Do we still want to support "every" in cases where a scope doesn't explicitly specify a @<scopeType>.domain? We do that today by just iterating the parent. Might be useful to keep this one as a fallback 🤷‍♂️
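For the comma-separated-list question, a generic removal-range rule might look like the sketch below. It is purely illustrative and not something the proposal specifies: the rule (remove up to the start of the next item, or back to the end of the previous item when removing the last one) and the names listRemovalRange and removeRange are assumptions, and ranges are end-exclusive character offsets.

```typescript
// Character-offset range, end-exclusive. A simplification for illustration.
interface OffsetRange {
  start: number;
  end: number;
}

// Removal range for the item at `index` in a delimiter-separated list:
// swallow the delimiter after the item, or the one before it when the
// item is the last in the list.
function listRemovalRange(items: OffsetRange[], index: number): OffsetRange {
  const item = items[index];
  if (item === undefined) {
    throw new RangeError(`no item at index ${index}`);
  }
  const next = items[index + 1];
  if (next !== undefined) {
    return { start: item.start, end: next.start };
  }
  const previous = index > 0 ? items[index - 1] : undefined;
  if (previous !== undefined) {
    return { start: previous.end, end: item.end };
  }
  return item;
}

function removeRange(text: string, range: OffsetRange): string {
  return text.slice(0, range.start) + text.slice(range.end);
}

// Arguments a, b, c at offsets 2, 5, and 8 of the call below.
const callText = "f(a, b, c)";
const callArgs: OffsetRange[] = [
  { start: 2, end: 3 },
  { start: 5, end: 6 },
  { start: 8, end: 9 },
];

const withoutFirst = removeRange(callText, listRemovalRange(callArgs, 0));
const withoutLast = removeRange(callText, listRemovalRange(callArgs, 2));
```

If a rule like this lived in shared code, a language would only need to mark which scopes are comma-separated lists rather than spelling out a removalRange for each of them.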
Challenging cases
Why we need to use @<scopeType>.domain when searching within @<scopeType>.iterationScope
Consider the following case:
{foo: {bar: "baz"}}
If the user says "take every key fine", we want to just return foo, excluding the nested key bar. In this case key.iterationScope is object and key.domain is pair. If we just looked for instances of key within the object, we'd get the nested key as well. However, if we search for top-level pair objects we won't, as desired
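The difference between the two searches can be sketched on this exact input. All ranges below are hand-computed, end-exclusive character offsets into the example string, and the names (KeyMatch, everyKey, naiveEveryKey) are hypothetical. Each KeyMatch pairs a key with the pair that serves as its domain.

```typescript
interface OffsetRange {
  start: number;
  end: number;
}

// A key together with the pair acting as its domain.
interface KeyMatch {
  key: OffsetRange;
  pair: OffsetRange;
}

const objectSource = '{foo: {bar: "baz"}}';

// Iteration scopes: the outer object and the nested object.
const objectRanges: OffsetRange[] = [
  { start: 0, end: 19 },
  { start: 6, end: 18 },
];

const keyMatches: KeyMatch[] = [
  { key: { start: 1, end: 4 }, pair: { start: 1, end: 18 } }, // foo
  { key: { start: 7, end: 10 }, pair: { start: 7, end: 17 } }, // bar
];

const contains = (outer: OffsetRange, inner: OffsetRange): boolean =>
  outer.start <= inner.start && inner.end <= outer.end;
const size = (r: OffsetRange): number => r.end - r.start;
const textOf = (r: OffsetRange): string => objectSource.slice(r.start, r.end);

// Smallest object containing the cursor, or undefined if there is none.
function iterationScopeAt(cursor: number): OffsetRange | undefined {
  const point = { start: cursor, end: cursor };
  return objectRanges
    .filter((o) => contains(o, point))
    .sort((a, b) => size(a) - size(b))[0];
}

// Naive: every key inside the iteration scope, nested ones included.
function naiveEveryKey(cursor: number): string[] {
  const scope = iterationScopeAt(cursor);
  if (scope === undefined) return [];
  return keyMatches.filter((m) => contains(scope, m.key)).map((m) => textOf(m.key));
}

// Domain-based: keep only matches whose pair is top-level within the scope,
// that is, not contained by any other pair found in the scope.
function everyKey(cursor: number): string[] {
  const scope = iterationScopeAt(cursor);
  if (scope === undefined) return [];
  const inScope = keyMatches.filter((m) => contains(scope, m.pair));
  return inScope
    .filter((m) => !inScope.some((o) => o !== m && contains(o.pair, m.pair)))
    .map((m) => textOf(m.key));
}
```

With the cursor inside foo (offset 2), the naive search returns both keys while the domain-based search returns only foo.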
Why @<scopeType> must be rooted within @<scopeType>.domain
We can actually use the same code example as above:
{
foo: {|bar: "baz"}}
If the user says "take key" with the cursor at the indicated position (after second opening bracket), we want to select foo. We first expand to the containing pair, as that is the definition of key.domain. Then we need to find the key. If we just look for top-level keys (ie not contained by other keys), we'll end up with both foo and bar. If we require that the key be rooted within the pair, that won't happen
Fwiw, we could possibly instead exclude any @<scopeType> matches which are contained within a lower @<scopeType>.domain
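Both options can be compared in a small sketch. It uses a slightly spaced-out variant of the example so that the cursor after the second opening brace does not touch the inner pair (an assumption about the intended cursor position). "Rooted" is modelled here as "captured in the same match as the domain", which is an interpretation rather than something the proposal pins down. Offsets are hand-computed and end-exclusive, and all names are hypothetical.

```typescript
interface OffsetRange {
  start: number;
  end: number;
}

// A key and its pair, as if captured together by one query match.
interface KeyMatch {
  key: OffsetRange;
  pair: OffsetRange;
}

const nestedSource = '{foo: { bar: "baz" }}';
const pairMatches: KeyMatch[] = [
  { key: { start: 1, end: 4 }, pair: { start: 1, end: 20 } }, // foo
  { key: { start: 8, end: 11 }, pair: { start: 8, end: 18 } }, // bar
];

const contains = (outer: OffsetRange, inner: OffsetRange): boolean =>
  outer.start <= inner.start && inner.end <= outer.end;
const size = (r: OffsetRange): number => r.end - r.start;
const textOf = (r: OffsetRange): string => nestedSource.slice(r.start, r.end);

// The match whose pair (the domain) is the smallest one containing the cursor.
function domainAt(cursor: number): KeyMatch | undefined {
  const point = { start: cursor, end: cursor };
  return pairMatches
    .filter((m) => contains(m.pair, point))
    .sort((a, b) => size(a.pair) - size(b.pair))[0];
}

// Naive: all top-level keys inside the domain (not contained by other keys).
function naiveKeys(cursor: number): string[] {
  const domain = domainAt(cursor);
  if (domain === undefined) return [];
  const keys = pairMatches.map((m) => m.key).filter((k) => contains(domain.pair, k));
  return keys
    .filter((k) => !keys.some((o) => o !== k && contains(o, k)))
    .map(textOf);
}

// Rooted: only the key captured in the same match as the domain.
function rootedKey(cursor: number): string | undefined {
  const domain = domainAt(cursor);
  return domain === undefined ? undefined : textOf(domain.key);
}

// Alternative: drop keys that sit inside a lower domain within the chosen one.
function keysOutsideLowerDomains(cursor: number): string[] {
  const domain = domainAt(cursor);
  if (domain === undefined) return [];
  const lower = pairMatches.filter((m) => m !== domain && contains(domain.pair, m.pair));
  return pairMatches
    .map((m) => m.key)
    .filter((k) => contains(domain.pair, k))
    .filter((k) => !lower.some((l) => contains(l.pair, k)))
    .map(textOf);
}
```

With the cursor at offset 7 (just after the second opening brace), the naive search yields foo and bar, while both the rooted lookup and the lower-domain exclusion yield only foo.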
Resources
Addenda:
name of a node and using this DSL will likely be the approach used to support multi-language documents.
Attributions
👋 Big H/T 🎩 to @wenkokke for the original idea