Options for disambiguating the \k backreferences

Backreferences to named capture groups have the surface syntax `\k<foo>`, but that already has semantics in non-Unicode RegExps. A few options have been discussed for the issue:

1. *Named capture groups only usable in Unicode mode*
This was the original proposal, and it's what V8 implements behind a flag. This is one reason why we reserved extra sequences like `\k` to be a syntax error in Unicode mode--so we could add new features this way. It's the simplest option, and it would give people a carrot to upgrade to using the Unicode flag. Going from 1, we could "always" add 2 or 3 "later". @mathiasbynens and @hashseed have argued for this minimal option.
2. *Named capture groups can be used outside of Unicode mode, but named backreferences are only available with Unicode mode on*
There seemed to be some concern from the committee that this is something of an unexpected cliff in the middle of the feature. Another argument against it is that we shouldn't add new things to non-Unicode RegExps at all, so that people are encouraged to flip the flag on. This was my funny idea.
3. *Disambiguate by making `\k` have the new semantics if there are any named capture groups*
This is definitely possible, but more complicated than one might think at first. If there are no named capture groups, then `\k` can be anywhere, but otherwise, it needs to be followed by `<` *IdentifierName* `>`; this complicates the grammar. Another piece of complexity is that an implementation can't determine on-line whether there are named capture groups if lookbehind is in play (because lookbehind semantics execute the RegExp backwards, and this affects captured groups). For example, `/(?<=\k<a>(?<a>.))/` matches a zero-length sequence which is preceded by the same character twice: the backreference appears before the group it refers to. It's definitely unambiguous, just complicated. This was @bakkot's suggestion.
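
To make the ambiguity concrete, here is a small sketch of the three behaviours discussed above. It assumes an engine that supports lookbehind and allows named groups outside Unicode mode (that is, option 3 semantics); the variable names are only illustrative. Patterns are built with the `RegExp` constructor so that an engine which rejects one of them throws at run time rather than failing to parse the whole script.

```javascript
// Existing non-Unicode semantics: with no named groups in the pattern,
// `\k` is just an identity escape for the letter "k".
const legacy = new RegExp(String.raw`\k<a>`);
const legacyMatchesLiteral = legacy.test("k<a>");

// In Unicode mode, a bare `\k` is a syntax error.
let unicodeRejects = false;
try {
  new RegExp(String.raw`\k`, "u");
} catch (e) {
  unicodeRejects = e instanceof SyntaxError;
}

// The lookbehind example from option 3: the backreference is written
// before the group, but lookbehind evaluates right to left, so the group
// captures first. The match is the empty string right after a doubled character.
const twice = new RegExp(String.raw`(?<=\k<a>(?<a>.))`);
const hit = twice.exec("xaab");
const hitIndex = hit ? hit.index : -1;

console.log(legacyMatchesLiteral, unicodeRejects, hitIndex);
```

Under those assumptions, the last pattern matches at the position immediately after `aa`, which is why a parser cannot settle the meaning of `\k` until it has seen the whole pattern.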

At the September TC39 meeting, we seemed to come to consensus on 3; however, this was without incorporating some feedback from people not present in the room, and without a full understanding of the complexity of 3. With the complexity of 3, and the weird cliff of 2, I'm personally leaning back towards 1. OTOH, 3 feels the most "1JS-y" to me. Any thoughts? 
