[AIT-311] Add Claude skill for translating examples to Swift, and an example translation #3283
Conversation
Force-pushed from dcf84c4 to 2366368
Outdated review threads (resolved):
- .claude/skills/translate-examples-to-swift/prompts/translation-subagent.md
- .claude/skills/translate-examples-to-swift/prompts/verification-subagent.md
- .claude/skills/translate-examples-to-swift/prompts/verification-subagent.md
Thanks for the feedback on the skill, Marat. I'll take a look, but as mentioned the skill is very much a "Claude wrote this and I've not reviewed the individual words in detail" thing. Did you have any thoughts on the translations themselves?
```swift
// Publish initial message and capture the serial for appending tokens
let publishResult = try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<ARTPublishResult, Error>) in
    channel.publish([.init(name: "response", data: "")]) { result, error in
```
Inconsistent publish API call style. This uses channel.publish([.init(name: "response", data: "")]) (array-of-messages form), but the skill's translation prompt examples at prompts/translation-subagent.md:236 use channel.publish("response", data: "") (name+data form). Both are valid ably-cocoa APIs but the inconsistency between what the skill teaches and what was actually produced could cause confusion for future translation runs. Should pick one and be consistent. The array form is arguably more correct since the JS passes an object { name: 'response', data: '' }, but then the skill prompt examples should be updated to match.
Same issue at lines 403 and 1154.
The subagent isn't meant to be prescriptive about which variant to use; it's just meant to use the variant that most closely matches that being used by the JS example. Will add something to the skill
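For illustration, the two variants discussed look roughly like this side by side (a sketch against the ably-cocoa callback API; the exact callback signatures should be checked against the SDK):

```swift
// Name+data form, closest to the JS `channel.publish('response', '')`
channel.publish("response", data: "") { error in
    // error is nil on success
}

// Array-of-messages form, closest to the JS `channel.publish({ name: 'response', data: '' })`
channel.publish([ARTMessage(name: "response", data: "")]) { error in
    // error is nil on success
}
```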
```swift
let options = ARTClientOptions(key: "your-api-key")
options.transportParams = ["appendRollupWindow": .withString("100")] // 10 messages/s
```
transportParams value .withString("100") -- is this the right API? The JS/Python/Java examples all pass a plain string "100", but the Swift uses .withString("100") which implies ARTStringifiable or similar. Worth verifying this compiles against ably-cocoa and that .withString is the correct way to set string values in transportParams. If transportParams is just [String: String], this would be wrong.
Everything has been verified to compile. transportParams is [String: ARTStringifiable] and this factory method is one way to create ARTStringifiable. Cf., e.g., these tests (they use the initializer form instead): https://github.com/ably/ably-cocoa/blob/460c1d11333dd58bbc924a3d153d1364b39bbdf0/Test/AblyTests/Tests/RealtimeClientTests.swift#L200-L210
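As a sketch of the pattern (relying only on the `[String: ARTStringifiable]` dictionary type and `.withString` factory mentioned above):

```swift
let options = ARTClientOptions(key: "your-api-key")

// transportParams is [String: ARTStringifiable], so a plain String value
// must be wrapped; .withString is one factory that does this.
options.transportParams = ["appendRollupWindow": .withString("100")]
```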
```swift
@MainActor
func example_anthropic_message_per_response_1() async throws {
    // --- example code starts here ---
    func example() async throws {
```
CheckedContinuation<Void, any Error> vs CheckedContinuation<Void, Error>. The guide examples (line 657 and 958) use any Error (existential) while every example in message-per-response.mdx uses plain Error. Both compile, but Error (without any) is the standard pattern and what the skill prompt teaches. These guide files should use the same form for consistency.
sure, will update the guides
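A minimal plain-Swift sketch of the two spellings; both compile because `Error` in this position is already an existential type:

```swift
func demoPlainError() async throws {
    // The form the skill standardises on
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, Error>) in
        continuation.resume(returning: ())
    }
}

func demoAnyError() async throws {
    // Equivalent, with the explicit existential spelling
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, any Error>) in
        continuation.resume(returning: ())
    }
}
```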
```swift
// Example: stream returns events like { type: 'token', text: 'Hello' }
for try await event in stream {
```
for try await vs for await -- inconsistent with skill prompt and possibly unnecessary. This line and lines 244, 423, 1169 use for try await event in stream, but the skill's translation prompt examples (lines 250, 534) use for await event in stream (no try). The difference matters: for try await is needed when the AsyncSequence's Failure type is non-Never, but all harness signatures declare Never as the failure type (e.g., any AsyncSequence<..., Never>). With a Never failure type, for await (without try) should suffice. The try is harmless but misleading to readers.
I think that the correct way to model an LLM token stream is as a throwing sequence, so the Failure type should be non-Never, necessitating for try await — will update guidance and translations
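A plain-Swift sketch of the distinction (the token-event tuple shape is hypothetical; no SDK types are involved):

```swift
func consumeStreams() async throws {
    // Failure == Error: consumption requires `for try await`
    let throwingStream = AsyncThrowingStream<(type: String, text: String), Error> { continuation in
        continuation.yield((type: "token", text: "Hello"))
        continuation.finish()
    }
    for try await event in throwingStream {
        print(event.text)
    }

    // Failure == Never: plain `for await` suffices
    let plainStream = AsyncStream<(type: String, text: String)> { continuation in
        continuation.yield((type: "token", text: "Hello"))
        continuation.finish()
    }
    for await event in plainStream {
        print(event.text)
    }
}
```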
```yaml
---
name: translate-examples-to-swift
description: Translates inline JavaScript example code to Swift
```
Description is too terse for reliable skill triggering. "Translates inline JavaScript example code to Swift" is generic enough that Claude may not invoke this skill when the user says things like "add Swift examples" or "translate to ably-cocoa". Consider expanding to something like: "Translates inline JavaScript example code to Swift in Ably documentation MDX files. Use this skill whenever adding Swift code blocks to docs, translating JS examples to Swift/ably-cocoa, or when the user mentions Swift translations, even if they don't explicitly say 'translate'."
I mean, the intention is that this be invoked explicitly as /translate-examples-to-swift. I haven't tried a workflow in which you just hope that it magically gets invoked. Can we leave this one until (if) it turns out to be a problem? I don't want to make it more verbose for the sake of it
- `consolidate.sh` - Merges translation and verification JSONs, validates, generates review HTML
- `generate-translation-stubs.sh` - Generates stub translation JSONs from verification data (for verify-only mode)

Scripts are in `.claude/skills/translate-examples-to-swift/review-app/`:
Minor copy-paste error. This repeats "Scripts are in" from line 294 above. Should be "Review app scripts are in" or similar to distinguish the two locations.
Sure, will fix
- `nonisolated(unsafe)` (forbidden by C9)
- Force-unwraps beyond the `(result, error)` callback convention (i.e. force-unwrapping a result that isn't being used)
- Logic inside continuation callbacks beyond resuming the continuation (values should be extracted after the `await`, not inside the callback)
- Fire-and-forget SDK calls using bare callbacks instead of `Task { }` with a continuation inside
Verification checklist contradicts translation guidance. This says to check for "Fire-and-forget SDK calls using bare callbacks instead of Task { } with a continuation inside", implying fire-and-forget calls should use Task{} with continuations. But the translation prompt explicitly says fire-and-forget calls should be called directly without a callback or Task wrapper (translation-subagent.md lines 217-219). This checklist item reads as the opposite of the intended rule. Should be reworded to check for fire-and-forget calls that are unnecessarily wrapped in Task{} or continuations when they should just be called directly.
Ah yes, this is a leftover from an earlier iteration, will fix
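With that fix, the checklist item would flag roughly the opposite of what it currently describes (a sketch; the name+data `publish` overload without a callback is assumed from ably-cocoa):

```swift
// Correct: a fire-and-forget call is made directly, with no callback or wrapper
channel.publish("token", data: text)

// Flagged: the same call needlessly wrapped in a continuation
try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, Error>) in
    channel.publish("token", data: text) { error in
        if let error {
            continuation.resume(throwing: error)
        } else {
            continuation.resume(returning: ())
        }
    }
}
```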
```swift
// The body of this function is the translation of the example.
// Function name includes the example ID
func example_Kx9mQ3(channel: ARTRealtimeChannel, stream: any AsyncSequence<(type: String, text: String), Never>) async throws {
```
Harness function signature is missing & Sendable. This example shows any AsyncSequence<(type: String, text: String), Never> without & Sendable, but every actual harness comment in the MDX files includes & Sendable (e.g., message-per-response.mdx:141). Under Swift 6 strict concurrency this matters for passing across isolation boundaries. The prompt examples should include & Sendable to match what is actually produced.
I don't think we actually need & Sendable anywhere, will remove and can add back in if it turns out to be needed (will run another verification pass)
The JS example on this page uses
For full transparency, I've reviewed this by pulling the branch down and probing Claude with questions about it. I posted comments using Claude which are points that I think should be looked at, but generally speaking my approach with skills is that you'll know if these are actual issues once you use the skill a few times - still worth having a look at points that might look obviously wrong and fixing them as needed. More general feedback would be to run /skill-creator, which is a skill auditor/creator/validator made by Anthropic themselves. My own personal opinion is that the skill.md seems long? Maybe we can break it up into more resources? But again, it doesn't concern me if you think the skill works well as is, and we can evolve this over time. Specifically to this skill and where it should live... who owns this skill? How do we keep it up to date? Do you think it is better off in Ably OS?
Also, not in scope for this PR, but how do you go about validating the outputted translation? How do you ascertain confidence that it followed the skill correctly? Can we have another skill to validate? Or are we okay with shifting the burden from writing the translation to reviewing it in PRs instead?
Thanks for the comments Umair, will take a look when I have some time. But re your overall comments: The skill was created by Claude, and has gone through many iterations as I've identified various things it was doing wrong. I did not, however, use the
Unless I've misunderstood what it is that you're suggesting, I think that it very much is in scope for this PR:
Right now this skill requires you to run it on a Mac and have Xcode installed, so realistically the only people who are going to be running it are you, Marat, or me. I'd be tempted to say "I own it". And it's a skill that is only useful within the context of the docs repo so I think here is fine?
```
Tool: Task
subagent_type: "general-purpose"
```
Is this how subagents are being spawned, or why is this snippet here? wdyt
🤷 it's something that Claude wrote when writing the skill — I asked it about it and it said that "Launch a general-purpose subagent with the resulting prompt" should work fine, so will replace it. (general-purpose subagents are described here)
Whatever inconsistencies in channel naming exist between those two pages in the JS have just been copied across — I think this is out of scope for this PR.
@lawrence-forooghian |
Sure, will change
Force-pushed from 2366368 to 2cdab26
@maratal @umair-ably I've addressed all the feedback (except for using the
Given it's only us three to realistically use this, we (devex team) think it should belong in Ably OS instead. There seem to be clearer lines of ownership for skills in Ably OS (as well as analytics and eventually pruning, if this isn't already in there too). There's a much higher chance it'd be orphaned or fall into the remit of DevEx (and then orphaned/stale anyway) if it stays in this repo. Definitely agree with the sentiment that something is better than nothing and we don't need to optimise the skill before it even goes in - this can happen over time.
As in, let's not merge this PR and instead let's move all this stuff over to ably-os? Or merge it and move it later? Some things I'd like to understand:
maratal left a comment:
LGTM (I haven't tried it tho, but some inconsistencies in translations would not affect this decision).
umair-ably left a comment:
> As in, let's not merge this PR and instead let's move all this stuff over to ably-os? Or merge it and move it later?
> Some things I'd like to understand:
> - how is this skill different to https://github.com/ably/docs/blob/main/.claude/commands/generate-guide.md, which we were (seemingly) happy to merge here?
> - does it make sense for a skill that is very tied to the structure of the docs repo (mdx files with multiple code blocks side by side) to exist in somewhere that isn't this repo?
They're both fair points and need to feed into wider conversations on how we handle skills that are clearly internal but still tied to specific repos (which then also cross into different teams/ownership)...
The main argument for putting it in Ably OS, is that it shouldn't matter what repo it is tied to... i.e. if we're using Ably OS skills, there's nothing that stops it from referencing other repos and interacting with them as you see fit.
Granted, this sits closer to Docs as a home, but then it loses niceties that the Ably OS gives us whilst also raising questions around ownership... with this in mind, I would say the generate-guide should also live in Ably OS (even if both of these are in a docs-skills plugin served by the OS)
This conversation is beyond the scope of this PR and I don't want to be the blocker to getting this in, so I'm happy to approve, but we should pick this up with Mark and potentially the SLT to see what the advice is here. We can table this conversation until people are back after the easter break
Tweak the change made in 1c4e6fe: instead of allow-listing different subdirectories (e.g. .claude/skills) ad-hoc as we need to, make use of Claude's scope system (i.e. ignore local scope, under the definitions given in [1]). [1] https://code.claude.com/docs/en/settings#configuration-scopes
When invoked as, for example:
> /translate-examples-to-swift translate all the example code in @src/pages/docs/ai-transport
it will translate all of the referenced examples to Swift, making sure
to produce code which has been verified to compile. It also runs an
independent verification subagent which reviews the correctness of the
translation and performs a second compilation attempt.
As part of the verification, it also generates a single-page app with a
UI that makes it easy for a human to review the translations; you can then
export a Markdown or JSON file with the feedback (for passing to Claude
for it to iterate on this feedback).
I don't yet have a great process for getting it to apply review feedback
when starting from a fresh context; I've just been telling it something
like "translations x were generated using this skill; now apply feedback
y", but it doesn't do a great job of updating the translation JSON files
(and thus the data displayed in the review app) to reflect the changes
it's made without wiping out any unrelated notes from the original
translation.
The skill also gives Claude the ability to review Swift translations in
isolation (i.e. not as part of a translation run and thus without the
supporting artifacts). For this to work properly, we need to keep the
context comments (added by the translation process) in the MDX files. I
think that we should keep these _anyway_, because I think we should at
some point consider setting up tooling to ensure that _all_ of our code
examples in the docs repo actually are valid and compile. And this would
be a stepping stone to that. Note that these harness comments contain
random IDs, which look a bit useless in isolation; however, they are
definitely useful during the translation-and-verification processes
(which are independent and thus need some sort of ID for correlation;
we previously tried using a sequential counter but this can easily get
out of sync due to merging new or reordered examples, or the two
different subagents counting examples differently), but I believe will
also continue to be useful as a simple way of referring to an example
when working with Claude ("fix example Kx9mQ3").
I wrote the original version of this skill and then got Claude to do
some heavy iteration of it based on my feedback when testing. I haven't
reviewed any of the skill's supporting files — i.e. the scripts or HTML
or schemas — in any detail.
As part of this change — the first shared addition to the .claude
directory — I've changed the gitignore rules to only ignore local scope
(definitions given in [1]).
A few things that could be improved in the future (I had to draw a line
under this task at some point):
- the review app for some reason requires that you click twice on the
"Flag" or "Approve" button before it collapses the element
- the review app's exported Markdown file's references are done by line
number, which is a slightly meaningless value given that we're
inserting new code into the file as part of translation; switch it to
use IDs like the JSON example
- make the review app accept multi-line comments
- we may be able to simplify the test harness by instead using Swift's
"MainActor isolation by default" mode
- thinking about how to restructure the skill so that it can be extended
to translating other languages (see PR comment [2])
Note that I've chosen to favour an `async` / `await` approach (bridged
with continuations) instead of nested callbacks. Having experimented
with both approaches, I concluded that this is the better of the two.
The bridging boilerplate is repetitive but local — each continuation is
a self-contained block that's easy to skim past. The structural benefit
is that the overall control flow becomes linear and readable, matching
the JS; this makes things easier for users to understand and for us to
review, in particular in more complicated examples that do things like
loading multiple history pages in a loop.
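The contrast described above can be sketched as follows (the SDK calls and `processPage` helper are hypothetical; the point is the shape of the control flow, not the exact API):

```swift
// Nested-callback shape: control flow drifts rightward with each step
channel.history { firstPage, _ in
    processPage(firstPage)
    firstPage?.next { secondPage, _ in
        processPage(secondPage)
    }
}

// Bridged shape: each continuation is a self-contained block,
// and the overall flow reads top to bottom like the JS original
let firstPage = try await withCheckedThrowingContinuation { continuation in
    channel.history { page, error in
        if let error {
            continuation.resume(throwing: error)
        } else {
            continuation.resume(returning: page)
        }
    }
}
processPage(firstPage)
```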
(Note: An earlier version of this skill was already used in b49924d —
that commit is a bit messed up by a botched rebase, it seems — before
being properly introduced here. I've updated the harness comments
introduced there to be in line with the format used here, and switched
from using `any Error` to `Error` to be consistent with what we've done
here.)
[1] https://code.claude.com/docs/en/settings#configuration-scopes
[2] #3192 (comment)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This demonstrates the translation skill added in 1ab2a37. I've reviewed the translations.
Force-pushed from 2cdab26 to d8f4754
OK, cool. I'm going to merge this PR now because it's something that I don't have further time to work on at the moment but which will be useful for upcoming, higher-priority work — let's pick the conversation up again after Easter |
Description
Replaces #3192.
Adds a `/translate-examples-to-swift` Claude skill. When invoked as, for example:

> /translate-examples-to-swift translate all the example code in @src/pages/docs/ai-transport

it will translate all of the referenced examples to Swift, making sure to produce code which has been verified to compile. It also runs an independent verification subagent which reviews the correctness of the translation and performs a second compilation attempt. It then produces a webpage with a UI for a human to review the translations.
There's probably plenty of improvement that can be done to this skill still, but I need to draw a line under it and move on to other stuff; we can iterate in the future.
I've then used this skill to translate one of the AI Transport example files. I decided not to translate them all in one go in order to reduce review burden and make sure everyone is happy with the approach.
See commit messages for more (many, many more 😅) details, if you're so inclined. I'm not expecting — or, to be honest, hoping for — a large amount of feedback on the skill itself; I think it's largely a "something is better than nothing" thing, and it's been through quite a lot of iteration already.