Add voice input + async transcription (#547) #597
|
Thanks for the PR. Tried it out and read through it. Overall it's in good shape and does what the issue asks. A few things I'd like to see changed before merge: Main thing: the voice route is doing too much. (Same suggestion as in #595 (comment)) Right now Smaller things:
The tests are thorough, nice job there. Nothing here is a big deal except the first point about moving logic out of the route. |
Thanks, all fair. Agree on moving the logic out of the route. Two quick checks before I
The rest I'll fix: single source for the allowed formats, clean up the orphaned audio |
|
On the service, yeah, let the voice method take the session and own the saving, job creation and dispatch. That's still fine with the layering: the route stays thin and the service does the work through the repos. Keep build_voice_input pure like the text one, just add the session-aware method on top of it. Personally I'd prefer to move the text input route logic into the service too, so both text and voice are handled the same way instead of one in the route and one in the service. But that's text input, which is already merged, so it's a bit outside this PR. Up to you if you'd rather keep this PR focused on voice and leave text as is, that's completely fine, just leaving it as an option. On the error code, yes, go with STT_UNAVAILABLE and update the contract too. Only change it for the voice path: the contract, the voice route and the task. Leave the extraction one as LLM_UNAVAILABLE since that one really is the LLM. forms.py is already on STT_UNAVAILABLE so it all lines up after that. Rest of the list sounds good. |
Closes #547.
Adds voice input with async transcription — POST /api/v1/input/voice. The endpoint saves
the audio, creates a queued Input and a transcription Job, and dispatches a Celery task
that runs Whisper and moves the records through their states. The client polls
GET /input/{input_id} (from #546) for the transcript. Built to the spec settled on the
issue thread.
Key points, following the issue discussion:
Whisper reuse, not copy-paste: extracted the Whisper /asr call into a new shared
module app/services/whisper.py (call_whisper_asr). Both /forms/transcribe and the new
transcription task call it — /forms/transcribe is otherwise unchanged (same signature,
same 503/502 codes), and its existing tests still pass. /forms/transcribe is left in
place per the discussion (to be replaced properly in a later PR).
Job: Option A — reuses the existing Job model with job_type="transcription" and the
form-fill columns (template_id/input_text/model) left null. The Job is linked to the
Input by passing input_id as a task argument (no new column). Job.result_url is the
input poll URL. Input is the primary record; Job is secondary tracking. Form.job_id/
Report.job_id are untouched; the proper contract Job stays a separate concern (models for v1 API contract, #544).
No migration.
Ordering / race fix: the Job is created before dispatch (placeholder celery_task_id),
then the task is dispatched with input_id + job_id, then celery_task_id is backfilled.
The task finds itself by job_id, not celery_task_id. (The same race exists on the fill
path — left alone, as chetanr25 is handling that separately.)
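For reviewers, the ordering above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `JobStore`, `enqueue_transcription`, the `dispatch` callable and the `"pending"` placeholder value are hypothetical stand-ins for the real repository, service method and Celery call.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional

PLACEHOLDER_TASK_ID = "pending"  # hypothetical placeholder value


@dataclass
class Job:
    id: str
    job_type: str
    status: str = "queued"
    celery_task_id: str = PLACEHOLDER_TASK_ID


class JobStore:
    """In-memory stand-in for the Job repository (assumption for this sketch)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def add(self, job: Job) -> None:
        self._jobs[job.id] = job

    def get(self, job_id: str) -> Optional[Job]:
        return self._jobs.get(job_id)


def enqueue_transcription(
    store: JobStore,
    input_id: str,
    dispatch: Callable[[str, str], str],
) -> Job:
    # 1. Persist the Job before dispatch, with a placeholder task id,
    #    so the row exists even if a worker starts immediately.
    job = Job(id=str(uuid.uuid4()), job_type="transcription")
    store.add(job)

    # 2. Dispatch with input_id and job_id; the task looks itself up by
    #    job_id, so it never depends on celery_task_id being set yet.
    task_id = dispatch(input_id, job.id)

    # 3. Backfill the real task id once the broker has returned it.
    job.celery_task_id = task_id
    return job
```

The point of the sketch is that a worker which starts before step 3 still finds its Job, because the lookup key is `job_id` and not `celery_task_id`.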
Status transitions: start → Input=transcribing, Job=processing; success →
Input=ready + transcript/counts, Job=completed + result_url; failure → Input=failed +
error_detail, Job=failed + error{error_code, message}, distinguishing ConnectionError
(LLM_UNAVAILABLE) from RuntimeError (TRANSCRIPTION_FAILED).
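A rough sketch of those transitions, for illustration only. `InputRecord`, `JobRecord` and `run_transcription` are hypothetical names, the counts are omitted, and the unavailable code is kept as written in this description (the review thread asks for STT_UNAVAILABLE on the voice path).

```python
from dataclasses import dataclass
from typing import Callable, Optional

# As written in this description; the review asks to rename it to
# STT_UNAVAILABLE for the voice path.
UNAVAILABLE_CODE = "LLM_UNAVAILABLE"
FAILED_CODE = "TRANSCRIPTION_FAILED"


@dataclass
class InputRecord:  # hypothetical stand-in for the Input model
    status: str = "queued"
    transcript: Optional[str] = None
    error_detail: Optional[str] = None


@dataclass
class JobRecord:  # hypothetical stand-in for the Job model
    status: str = "queued"
    result_url: Optional[str] = None
    error: Optional[dict] = None


def _fail(inp: InputRecord, job: JobRecord, code: str, message: str) -> None:
    inp.status = "failed"
    inp.error_detail = message
    job.status = "failed"
    job.error = {"error_code": code, "message": message}


def run_transcription(
    inp: InputRecord,
    job: JobRecord,
    transcribe: Callable[[], str],
    result_url: str,
) -> None:
    # start: both records move out of "queued" before Whisper is called
    inp.status = "transcribing"
    job.status = "processing"
    try:
        text = transcribe()
    except ConnectionError as exc:  # Whisper could not be reached
        _fail(inp, job, UNAVAILABLE_CODE, str(exc))
    except RuntimeError as exc:  # Whisper answered, but transcription failed
        _fail(inp, job, FAILED_CODE, str(exc))
    else:
        # success: the Input carries the transcript, the Job points at the poll URL
        inp.status = "ready"
        inp.transcript = text
        job.status = "completed"
        job.result_url = result_url
```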
Storage: audio saved to {DATA_DIR}/audio/{input_id}.{ext}, path passed to the task,
not stored in the DB. No disposal logic (a later discussion).
Validation: 415 UNSUPPORTED_FORMAT (extension check), 413 FILE_TOO_LARGE (>500MB), 503
LLM_UNAVAILABLE (Whisper availability check before queuing).
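The two file checks could look roughly like the sketch below. It is illustrative only: `validate_upload` is a hypothetical helper, the extension set is a made-up example (the real allowed formats live in the PR), and the limit assumes 500MB means 500 * 1024 * 1024 bytes. The 503 availability check is left out because it depends on the Whisper service.

```python
import os
from typing import Optional, Tuple

# Hypothetical example set; the actual allowed formats are defined in the PR.
ALLOWED_EXTENSIONS = frozenset({"mp3", "wav", "m4a"})
# Assumption: "500MB" is read as binary megabytes.
MAX_BYTES = 500 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> Optional[Tuple[int, str]]:
    """Return (http_status, error_code) for a rejected upload, or None if it passes."""
    # Extension check only, case-insensitive; file contents are not inspected.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        return 415, "UNSUPPORTED_FORMAT"
    # Strictly greater than the limit, matching ">500MB".
    if size_bytes > MAX_BYTES:
        return 413, "FILE_TOO_LARGE"
    return None
```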
Tests: 145 passing — 11 endpoint tests (dispatch mocked, no broker/Whisper/filesystem)
and 7 task unit tests (task called directly with an injected test session and mocked
Whisper, covering success, both failure types, and the transcribing→ready ordering).
No migration (Option A).