@ziuchen/doubaoime-asr

中文 | English

Node.js client for Doubao IME (豆包输入法) Automatic Speech Recognition (ASR) service.

A Node.js rewrite of the Python doubaoime-asr project, implemented with the native capabilities of Node.js >= 24.

Requires Node.js >= 24

Disclaimer

This project is based on protocol analysis of, and reference to, the Android Doubao IME client. It is NOT an official API.

For learning and research purposes only
No guarantee of future availability or stability
Server-side protocols may change at any time

Features

Speech Recognition — File-based and real-time streaming ASR
Named Entity Recognition (NER) — Extract entities from text via Wave-encrypted API
Device Registration — Automatic device credential management
Wave Encryption — ECDH key exchange + ChaCha20 stream cipher (fully native crypto)
Protobuf-es — Type-safe message encoding/decoding via @bufbuild/protobuf
CLI Tool — Command-line interface for quick transcription
Minimal Dependencies — Only ws, cac, @bufbuild/protobuf, and @evan/opus at runtime

Installation

pnpm add @ziuchen/doubaoime-asr

After a global installation, the CLI is available as the doubaoime-asr command. You can also run it directly via npx:

npx @ziuchen/doubaoime-asr transcribe audio.wav -c credentials.json

Quick Start

Library Usage

import { DoubaoASR, ASRConfig } from '@ziuchen/doubaoime-asr'
import { Encoder } from '@evan/opus'

const encoder = new Encoder({ sample_rate: 16000, channels: 1, application: 'voip' })
const config = new ASRConfig({
  credentialPath: './credentials.json', // Auto-register and cache
  opusEncoder: { encode: (pcm) => Buffer.from(encoder.encode(pcm)) },
})

const asr = new DoubaoASR(config)

// Simple transcription
const text = await asr.transcribe('audio.wav')
console.log(text)

// Streaming with interim results
for await (const resp of asr.transcribeStream('audio.wav')) {
  console.log(resp.type, resp.text)
}

// Real-time recognition: audioSource is an AsyncIterable<Buffer> of PCM audio that you provide (e.g. from a microphone)
for await (const resp of asr.transcribeRealtime(audioSource)) {
  console.log(resp.type, resp.text)
}
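The real-time call above leaves `audioSource` to the caller. The following is a minimal sketch of one way to build such a source from raw 16-bit PCM already in memory. The helper names (`frameBytes`, `framePcm`, `pcmSource`) are hypothetical and not part of the package, and the README does not state whether `transcribeRealtime` requires frame-aligned chunks, so the 20 ms framing here is an assumption based on the default `frameDurationMs`.

```typescript
// Bytes per frame for 16-bit PCM: samples per frame × 2 bytes × channels.
function frameBytes(sampleRate = 16000, channels = 1, frameDurationMs = 20): number {
  return Math.floor((sampleRate * frameDurationMs) / 1000) * 2 * channels
}

// Split a raw PCM buffer into fixed-size frames; a short final frame is zero-padded.
function* framePcm(pcm: Buffer, size: number = frameBytes()): Generator<Buffer> {
  for (let offset = 0; offset < pcm.length; offset += size) {
    const frame = Buffer.alloc(size) // zero-filled, so the padded tail is silence
    pcm.copy(frame, 0, offset, Math.min(offset + size, pcm.length))
    yield frame
  }
}

// Expose the frames as an AsyncIterable<Buffer>, optionally pacing them.
// (Hypothetical helper: a real microphone source would yield frames as they arrive.)
async function* pcmSource(pcm: Buffer, paceMs = 0): AsyncGenerator<Buffer> {
  for (const frame of framePcm(pcm)) {
    if (paceMs > 0) await new Promise((resolve) => setTimeout(resolve, paceMs))
    yield frame
  }
}
```

With the defaults (16000 Hz, mono, 20 ms) each frame is 640 bytes, and `pcmSource(pcm, 20)` could be passed where `audioSource` appears above.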

Passing Credentials as Object

When used as a library, you can pass credentials directly as a JS object instead of a file path:

import { ASRConfig, registerDevice, getAsrToken } from '@ziuchen/doubaoime-asr'
import { Encoder } from '@evan/opus'

// Obtain credentials programmatically
const creds = await registerDevice()
const token = await getAsrToken(creds.deviceId!, creds.cdid)

const encoder = new Encoder({ sample_rate: 16000, channels: 1, application: 'voip' })
const config = new ASRConfig({
  credentials: { ...creds, token },
  opusEncoder: { encode: (pcm) => Buffer.from(encoder.encode(pcm)) },
})

CLI Usage

# Register device & save credentials
doubaoime-asr register -o credentials.json

# Transcribe audio file
doubaoime-asr transcribe audio.wav -c credentials.json

# Named Entity Recognition
doubaoime-asr ner "明天北京天气怎么样" -c credentials.json

# Real-time recognition (from microphone)
doubaoime-asr listen --list-devices

# Help
doubaoime-asr --help

Environment Variables

| Variable | Description |
| --- | --- |
| `DOUBAO_CREDENTIAL_PATH` | Credential file path |
| `DOUBAO_DEVICE_ID` | Device ID (skip registration) |
| `DOUBAO_TOKEN` | ASR token (skip token fetch) |
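The README does not say whether these variables are read by the library itself or only by the CLI. If you need to map them onto `ASRConfig` options yourself, a minimal sketch might look like this; `optionsFromEnv` is a hypothetical helper, and the option names are taken from the `ASRConfig` table.

```typescript
// Subset of ASRConfig options that the environment variables appear to correspond to.
interface EnvCredentialOptions {
  credentialPath?: string
  deviceId?: string
  token?: string
}

// Hypothetical helper: copy only the variables that are set and non-empty.
function optionsFromEnv(env: Record<string, string | undefined>): EnvCredentialOptions {
  const options: EnvCredentialOptions = {}
  if (env.DOUBAO_CREDENTIAL_PATH) options.credentialPath = env.DOUBAO_CREDENTIAL_PATH
  if (env.DOUBAO_DEVICE_ID) options.deviceId = env.DOUBAO_DEVICE_ID
  if (env.DOUBAO_TOKEN) options.token = env.DOUBAO_TOKEN
  return options
}
```

The result could then be spread into the constructor, for example `new ASRConfig({ ...optionsFromEnv(process.env), opusEncoder })`.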

API Reference

`DoubaoASR`

Main client class.

| Method | Description |
| --- | --- |
| `transcribe(audio, options?)` | Transcribe audio file or PCM buffer → text |
| `transcribeStream(audio, options?)` | Streaming transcription → `AsyncGenerator<ASRResponse>` |
| `transcribeRealtime(source)` | Real-time streaming from `AsyncIterable<Buffer>` |
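Because the streaming methods return async iterables, a caller that only wants the end result has to reduce the stream. Here is a small sketch under stated assumptions: `ResponseLike` models only the `type` and `text` fields used in the Quick Start, the `type` values in `fakeStream` are invented for illustration, and it is assumed (not confirmed by this README) that each non-empty `text` supersedes the previous one.

```typescript
// Only the two fields the Quick Start reads; the real ASRResponse may have more.
interface ResponseLike {
  type: string
  text: string
}

// Keep the most recent non-empty text seen on the stream.
async function lastText(stream: AsyncIterable<ResponseLike>): Promise<string> {
  let text = ''
  for await (const resp of stream) {
    if (resp.text) text = resp.text
  }
  return text
}

// Stand-in stream for demonstration only; the type values are hypothetical.
async function* fakeStream(): AsyncGenerator<ResponseLike> {
  yield { type: 'interim', text: '明天' }
  yield { type: 'final', text: '明天北京天气怎么样' }
}
```

In real use, `lastText(asr.transcribeStream('audio.wav'))` would replace the stand-in stream.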

`ASRConfig`

Configuration class with automatic credential management.

| Option | Default | Description |
| --- | --- | --- |
| `credentialPath` | — | Path to credential JSON file |
| `credentials` | — | `DeviceCredentials` object (takes precedence over file) |
| `deviceId` | — | Device ID (auto-register if empty) |
| `token` | — | ASR token (auto-fetch if empty) |
| `opusEncoder` | — | Opus encoder instance (required) |
| `sampleRate` | `16000` | Audio sample rate (Hz) |
| `channels` | `1` | Audio channels |
| `frameDurationMs` | `20` | Frame duration (ms) |
| `enablePunctuation` | `true` | Enable punctuation |
| `connectTimeout` | `10000` | Connection timeout (ms) |
| `recvTimeout` | `10000` | Receive timeout (ms) |

Convenience Functions

import { transcribe, transcribeStream, transcribeRealtime, ner } from '@ziuchen/doubaoime-asr'

Other Exports

registerDevice() — Manual device registration
getAsrToken(deviceId, cdid?) — Manual token retrieval
getSamiToken(cdid?) — SAMI token for NER service
WaveClient — Wave encryption protocol client
parseWavFile(path) / parseWavBuffer(buf) — WAV parsing utilities
parseResponse(data) — Parse protobuf ASR response
isJwtExpired(token) — JWT expiry check
chacha20Crypt(key, nonce, data) — ChaCha20 encrypt/decrypt
md5Hex(data) — MD5 hash (uppercase hex)
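To give a sense of what two of these utilities do, here is an independent sketch using only `node:crypto` and `Buffer`. These are not the package's implementations: the `Sketch` names are hypothetical, and treating a token with no numeric `exp` claim as expired is a design choice made here, not documented behavior of `isJwtExpired`.

```typescript
import { createHash } from 'node:crypto'

// MD5 digest rendered as uppercase hex, matching the description of md5Hex.
function md5HexSketch(data: string | Buffer): string {
  return createHash('md5').update(data).digest('hex').toUpperCase()
}

// Decode the JWT payload (second dot-separated part, base64url) and compare its
// `exp` claim, in seconds since the epoch, against the current time.
// The signature is NOT verified; this only inspects the expiry claim.
function isJwtExpiredSketch(
  token: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const payloadPart = token.split('.')[1]
  if (!payloadPart) return true // not a dot-separated JWT: treat as unusable
  try {
    const payload = JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8'))
    return typeof payload.exp !== 'number' || payload.exp <= nowSeconds
  } catch {
    return true // payload is not valid JSON
  }
}
```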

Audio Requirements

Format: WAV (PCM, 16-bit)
Sample rate: 16000 Hz (default, configurable)
Channels: Mono (default, stereo auto-converted)

ffmpeg -i input.mp3 -ar 16000 -ac 1 -f wav output.wav
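Before sending a file, it can help to confirm that it matches the requirements above. The sketch below reads the standard RIFF/WAVE `fmt ` chunk directly from a buffer. It is independent of the package's own `parseWavFile` / `parseWavBuffer`, whose return shapes are not documented here, and the names `readWavFormat` and `isDefaultFormat` are hypothetical.

```typescript
interface WavFormat {
  audioFormat: number // 1 = uncompressed PCM
  channels: number
  sampleRate: number
  bitsPerSample: number
}

// Walk the RIFF chunks until the "fmt " chunk is found, then read its fields.
function readWavFormat(buf: Buffer): WavFormat {
  if (
    buf.length < 12 ||
    buf.toString('ascii', 0, 4) !== 'RIFF' ||
    buf.toString('ascii', 8, 12) !== 'WAVE'
  ) {
    throw new Error('not a RIFF/WAVE file')
  }
  let offset = 12
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4)
    const size = buf.readUInt32LE(offset + 4)
    if (id === 'fmt ') {
      if (size < 16 || offset + 24 > buf.length) throw new Error('truncated fmt chunk')
      return {
        audioFormat: buf.readUInt16LE(offset + 8),
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      }
    }
    offset += 8 + size + (size % 2) // chunk data is padded to an even length
  }
  throw new Error('fmt chunk not found')
}

// True when the file already matches the default settings (16-bit PCM, 16 kHz, mono).
function isDefaultFormat(fmt: WavFormat): boolean {
  return (
    fmt.audioFormat === 1 &&
    fmt.bitsPerSample === 16 &&
    fmt.sampleRate === 16000 &&
    fmt.channels === 1
  )
}
```

A stereo file fails `isDefaultFormat` even though the README says stereo is auto-converted, so treat the check as a hint for when to run the ffmpeg command rather than a hard gate.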

Native Capabilities

This project maximizes use of Node.js built-in APIs:

| Capability | Implementation |
| --- | --- |
| HTTP requests | Native `fetch` |
| Crypto (ECDH/HKDF/ChaCha20/ECDSA/MD5) | Native `crypto` |
| UUID generation | `crypto.randomUUID()` |
| WAV parsing | Manual implementation |
| File system | Native `fs` |

Runtime dependencies: ws (WebSocket with custom headers), cac (CLI), @bufbuild/protobuf (protobuf encoding), @evan/opus (Opus audio encoding, Wasm).
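As an illustration of the native crypto primitives listed above, the following sketch performs an ECDH exchange and derives a 32-byte key with HKDF using only `node:crypto`. It is not the Wave protocol: the curve (`prime256v1`), the empty salt, the `'example-info'` label, and the `deriveSharedKey` helper are all assumptions chosen for the example, since this README does not specify those parameters.

```typescript
import { createECDH, hkdfSync } from 'node:crypto'

type EcdhParty = ReturnType<typeof createECDH>

// Compute the ECDH shared secret, then stretch it into a 32-byte key with HKDF-SHA256.
// Salt and info are placeholders; a real protocol would fix them in its specification.
function deriveSharedKey(own: EcdhParty, peerPublicKey: Buffer): Buffer {
  const secret = own.computeSecret(peerPublicKey)
  return Buffer.from(hkdfSync('sha256', secret, Buffer.alloc(0), 'example-info', 32))
}

// Two parties generate key pairs and exchange only their public keys.
const clientSide = createECDH('prime256v1')
const clientPublic = clientSide.generateKeys()
const serverSide = createECDH('prime256v1')
const serverPublic = serverSide.generateKeys()

// Each side combines its own private key with the peer's public key.
const clientKey = deriveSharedKey(clientSide, serverPublic)
const serverKey = deriveSharedKey(serverSide, clientPublic)
```

Both sides arrive at the same 32-byte value, which is the size of key a ChaCha20 stream cipher takes.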

Examples

See the examples/ directory for runnable scripts:

| Example | Description |
| --- | --- |
| `examples/file-transcribe.ts` | File-based transcription (auto-downloads sample audio) |
| `examples/ner.ts` | Named Entity Recognition |
| `examples/credentials.ts` | Three ways to manage credentials |

npx tsx examples/file-transcribe.ts          # uses sample audio
npx tsx examples/file-transcribe.ts my.wav   # custom file
npx tsx examples/ner.ts "明天北京天气怎么样"

Development

pnpm install
pnpm test           # Run tests
pnpm run typecheck  # Type check
pnpm run build      # Build

# Regenerate protobuf (requires protoc + protoc-gen-es)
protoc --es_out=src/gen --es_opt=target=ts --plugin=protoc-gen-es=node_modules/.bin/protoc-gen-es proto/asr.proto

Reference Implementation

The Python reference implementation is maintained as a git submodule under the refs/@ziuchen/doubaoime-asr directory.

git submodule update --init

License

MIT
