Merged

Dev #24

Changes from 1 commit
Commits
86 commits
87744d2
[update] README.md add model list
dianjixz May 12, 2025
7d392a3
Merge branch 'dev' of github.com:m5stack/StackFlow into dev
dianjixz May 12, 2025
10e4bdf
Refactor SOLA component code
yuyun2000 May 15, 2025
ebf908a
Merge branch 'dev' into opt/melotts
yuyun2000 May 15, 2025
6a96f35
Merge pull request #1 from yuyun2000/opt/melotts
yuyun2000 May 15, 2025
74c41a3
Add text normalization for Chinese, Japanese, and English
yuyun2000 May 16, 2025
0619178
Merge pull request #2 from yuyun2000/opt/melotts
yuyun2000 May 16, 2025
e479b19
Merge pull request #18 from yuyun2000/dev
Abandon-ht May 16, 2025
9a20dd0
[update] update melotts, update static_lib verison
May 16, 2025
daeaf4b
[update] update lib-llm version, update melotts model version.
May 16, 2025
00c0533
[update] update libonnxruntime.so
May 16, 2025
f775786
[update] add en-au, en-br, en-india, en-us model. Format code.
May 20, 2025
8acb179
[fix] Handles the situation where Either tagger or verbalizer file do…
May 20, 2025
f67506c
[update] update melotts-es-es model
May 20, 2025
764bca1
[update] update model list
May 20, 2025
aa10381
add trigger method to llm_kws
nyasu3w Jun 3, 2025
2f63527
Merge pull request #21 from nyasu3w/pr/trigger_kws
dianjixz Jun 5, 2025
b9401b2
[update] llm trigger Standardization.
dianjixz Jun 5, 2025
61c69a3
[update] update docs
Jun 10, 2025
2d0cd69
[update] vlm add task_camera_data
Jun 18, 2025
357a6f1
[update] llm-camera axera camera add custom_config
dianjixz Jun 23, 2025
43906d9
Merge branch 'dev' of github.com:m5stack/StackFlow into dev
dianjixz Jun 23, 2025
57b1437
[update] depth_anything use async inference, move ax_engine init.fix …
Jun 24, 2025
b7e62dd
[update] update llm-depth-anything version, llm-yolo version. fix lib…
Jun 25, 2025
592fd9e
[update] update llm-camera version
Jun 25, 2025
cccddd2
[update] update llm-vlm version
Jun 26, 2025
b0743f0
[update] update model list & add npu1 model.
Jun 27, 2025
629e822
[update] update docs
Jun 27, 2025
d29e074
[update] update ax650 model config, melotts model.
Jun 27, 2025
90fae78
[update] main_audio add 630c kit default param && StackFlow add send_…
dianjixz Jul 1, 2025
5995886
[update] main_audio add tinyalsa API cap function.
Jul 1, 2025
9abe069
[update] KWS sets multiple keywords, fix melotts
Jul 18, 2025
c25b4f0
[fix] Fix caching causing audio issues
Jul 23, 2025
cfbfd62
[update] update docs
Aug 14, 2025
a916ca0
[update] Reduce buffer frames
Aug 21, 2025
73c4a49
[update] ModuleLLM support ctx model, add HomeAssistant model, add mo…
Aug 22, 2025
9167b6e
[update] update llm_vlm encoder. update audio cache.
Aug 26, 2025
9a14d45
[update] support ax650. add ax650 model.
Aug 27, 2025
9d816fe
[update] ensure that a frame is written
Aug 28, 2025
92b10ac
[update] add internvl3-1B-ax630c model update main_vlm
Aug 29, 2025
57404bc
[update] add internvl3-1B config file, update postprocess.
Aug 29, 2025
2de874c
[update] update llm & vlm
Sep 3, 2025
b6d6e95
[update] move public include into static_lib, update llm & vlm
Sep 4, 2025
1df8ab9
[update] update model list
Sep 4, 2025
e628093
[update] update asr kws llm vlm vad whisper melotts version
Sep 4, 2025
a7d82af
[fix] fix alsa audio cap
Sep 8, 2025
bb48236
[update] add cosyvioce2
Sep 15, 2025
2d064fd
[update] update cosy_voice
Sep 15, 2025
52a09b6
[update] add new kws unit
Sep 17, 2025
01d6715
[update] update cosy_voice & new kws
Sep 23, 2025
2a5c139
update static version
Abandon-ht Sep 23, 2025
d489723
[update] clean code
Sep 25, 2025
27a16a4
[update] llm-openai-version fix kws
Sep 28, 2025
7a97143
[update] update sdk version & chip name
Sep 29, 2025
0e14999
[fix] Fix inference issues caused by memory synchronization
Oct 16, 2025
f9de469
[update] update CosyVoice2
Nov 3, 2025
4e3d7f3
[update] fix llm generate bug
Nov 4, 2025
a3d0913
[update] update model config
Nov 5, 2025
423427d
[update] update model ctx len
Nov 5, 2025
3a16259
[fix] pzmq close wait
dianjixz Nov 7, 2025
07c964c
[update] update software version
Nov 7, 2025
bd14152
[update] update ax_msp kconfig bsp version,pzmq add NotAction dec,sta…
dianjixz Nov 10, 2025
6d7ae90
[add] ax650_ec_proxy
dianjixz Nov 10, 2025
0d3e36f
[update] vlm support qwen3-vl model, add qwen3-vl-2b model. update pz…
Nov 14, 2025
324f04d
[update] update llm-vlm version & model config
Nov 21, 2025
bd4c03e
[fix] Fix cosyvoice Deinit bug
Nov 21, 2025
9e70ce8
[update] update llm-llm llm-cosyvoice version
Nov 21, 2025
8712328
[update] Add qwen3-vl-2B-Init4-ax630c model
Nov 26, 2025
cc1087f
[update] fix postprocess Div zero bug, update llm-openai-api, update …
Nov 27, 2025
4bc10a7
[fix] pzmq creat error
dianjixz Dec 3, 2025
f12314e
[update] del ec_prox
dianjixz Dec 3, 2025
e96fcf4
[update] llm_asr suooported sensevoice, update llm_audio supported al…
Dec 9, 2025
a674665
[update] kws supported custom 'hi m5' keywords
Dec 9, 2025
cc9d1bc
[update] perf llm backend & add c tokenizer
Dec 18, 2025
cde5921
[update] add legacy llm backend
Dec 18, 2025
50e3609
[update] Reduce model loading time. Optimize model loading method
Dec 18, 2025
ea7ddd0
[update] Add CosyVoice tokenizer server timeout
Dec 18, 2025
d5685d4
[update] kws support axmodel
Dec 18, 2025
bea45ab
[update] llm_asr supported zipformer stream model.
Dec 18, 2025
3f608d2
[update] add asr, kws model config
Dec 19, 2025
eff3a47
[update] perf llm-asr, kws add buttons control.
Dec 19, 2025
7b671f2
[update] update melotts play stop cap
Dec 22, 2025
3ab3d87
[update] update package version
Dec 26, 2025
9c7ba31
[update] update llm-asr & model config
Dec 26, 2025
312791e
[update] update llm-openai-api version
Dec 26, 2025
9f34887
[update] llm-model-audio version
Jan 8, 2026
[fix] Handles the situation where Either tagger or verbalizer file does not exist.
LittleMouse committed May 20, 2025
commit 8acb179b634afc26c1a723c8dd948524918e65bb
12 changes: 8 additions & 4 deletions projects/llm_framework/main_melotts/src/main.cpp
@@ -183,10 +183,14 @@ class llm_task {
                 awake_delay_ = config_body["awake_delay"].get<int>();
             else if (file_body["mode_param"].contains("awake_delay"))
                 awake_delay_ = file_body["mode_param"]["awake_delay"];
-            // Load lexicon
-            lexicon_ = std::make_unique<Lexicon>(mode_config_.lexicon, mode_config_.tokens, mode_config_.tagger,
-                                                 mode_config_.verbalizer);
-            // Read g.bin
+
+            if (!std::filesystem::exists(mode_config_.tagger) || !std::filesystem::exists(mode_config_.verbalizer)) {
+                SLOGW("Either tagger or verbalizer file does not exist, using alternative lexicon.");
+                lexicon_ = std::make_unique<Lexicon>(mode_config_.lexicon, mode_config_.tokens);
+            } else {
+                lexicon_ = std::make_unique<Lexicon>(mode_config_.lexicon, mode_config_.tokens, mode_config_.tagger,
+                                                     mode_config_.verbalizer);
+            }
             g_matrix.resize(256, 0);
             FILE *fp = fopen(mode_config_.gbin.c_str(), "rb");
             if (!fp) {
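The fallback introduced in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: `LexiconMode` and `choose_lexicon_mode` are hypothetical names, and it uses the non-throwing `std::filesystem::exists` overload (an assumption; the commit calls the throwing one), so a path that cannot be checked is treated the same as a missing file.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical names for illustration; not part of the StackFlow sources.
enum class LexiconMode { WithNormalizer, Plain };

// Returns WithNormalizer only when both optional files are present.
// Any missing or uncheckable path selects the plain lexicon.
inline LexiconMode choose_lexicon_mode(const std::string& tagger, const std::string& verbalizer)
{
    namespace fs = std::filesystem;
    std::error_code ec;  // non-throwing overload: errors are reported as "does not exist"
    const bool has_tagger     = !tagger.empty() && fs::exists(tagger, ec);
    const bool has_verbalizer = !verbalizer.empty() && fs::exists(verbalizer, ec);
    return (has_tagger && has_verbalizer) ? LexiconMode::WithNormalizer : LexiconMode::Plain;
}
```

A caller would switch on the returned mode to pick the two-argument or four-argument `Lexicon` constructor, which keeps the existence check separate from object construction.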
78 changes: 70 additions & 8 deletions projects/llm_framework/main_melotts/src/runner/Lexicon.hpp
@@ -30,7 +30,7 @@ class Lexicon {
     std::pair<std::vector<int>, std::vector<int>> unknown_token;
     std::unordered_map<int, std::string> reverse_tokens;

-    wetext::Processor* m_processor;
+    wetext::Processor* m_processor = nullptr;

 public:
     Lexicon(const std::string& lexicon_filename, const std::string& tokens_filename, const std::string& tagger_filename,
@@ -96,6 +96,65 @@ class Lexicon {
               max_phrase_length);
     }

+    Lexicon(const std::string& lexicon_filename, const std::string& tokens_filename) : max_phrase_length(0)
+    {
+        SLOGD("Dictionary loading: %s Pronunciation table loading: %s", tokens_filename.c_str(),
+              lexicon_filename.c_str());
+
+        std::unordered_map<std::string, int> tokens;
+        std::ifstream ifs(tokens_filename);
+        assert(ifs.is_open());
+        std::string line;
+        while (std::getline(ifs, line)) {
+            auto splitted_line = split(line, ' ');
+            if (splitted_line.size() >= 2) {
+                int token_id = std::stoi(splitted_line[1]);
+                tokens.insert({splitted_line[0], token_id});
+                reverse_tokens[token_id] = splitted_line[0];
+            }
+        }
+        ifs.close();
+        ifs.open(lexicon_filename);
+        assert(ifs.is_open());
+        while (std::getline(ifs, line)) {
+            auto splitted_line = split(line, ' ');
+            if (splitted_line.empty()) continue;
+            std::string word_or_phrase = splitted_line[0];
+            auto chars                 = splitEachChar(word_or_phrase);
+            max_phrase_length          = std::max(max_phrase_length, chars.size());
+            size_t phone_tone_len      = splitted_line.size() - 1;
+            size_t half_len            = phone_tone_len / 2;
+            std::vector<int> phones, tones;
+            for (size_t i = 0; i < phone_tone_len; i++) {
+                auto phone_or_tone = splitted_line[i + 1];
+                if (i < half_len) {
+                    if (tokens.find(phone_or_tone) != tokens.end()) {
+                        phones.push_back(tokens[phone_or_tone]);
+                    }
+                } else {
+                    tones.push_back(std::stoi(phone_or_tone));
+                }
+            }
+            lexicon[word_or_phrase] = std::make_pair(phones, tones);
+        }
+        const std::vector<std::string> punctuation{"!", "?", "…", ",", ".", "'", "-"};
+        for (const auto& p : punctuation) {
+            if (tokens.find(p) != tokens.end()) {
+                int i      = tokens[p];
+                lexicon[p] = std::make_pair(std::vector<int>{i}, std::vector<int>{0});
+            }
+        }
+        assert(tokens.find("_") != tokens.end());
+        unknown_token = std::make_pair(std::vector<int>{tokens["_"]}, std::vector<int>{0});
+        lexicon[" "]  = unknown_token;
+        lexicon[","] = lexicon[","];
+        lexicon["。"] = lexicon["."];
+        lexicon["!"] = lexicon["!"];
+        lexicon["?"] = lexicon["?"];
+        SLOGD("Dictionary loading complete, containing %zu entries, longest phrase length: %zu", lexicon.size(),
+              max_phrase_length);
+    }
+
     std::vector<std::string> splitEachChar(const std::string& text)
     {
         std::vector<std::string> words;
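The per-line parsing in the new constructor treats the fields after the word as two halves: the first half as phone symbols looked up in the token table, the second half as integer tones. The following is a minimal standalone sketch of that split, under stated assumptions: `LexiconEntry` and `parse_lexicon_line` are hypothetical names, the sample line layout is inferred from the loop above rather than from a documented file format, and fields are split on any whitespace with `std::istringstream` instead of the project's `split(line, ' ')` helper.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for one parsed lexicon line.
struct LexiconEntry {
    std::string word;
    std::vector<int> phones;
    std::vector<int> tones;
};

// Parses "<word> <phone>... <tone>..." where the first half of the trailing
// fields are phone symbols and the second half are integer tones.
// Returns false for an empty line. Unknown phone symbols are skipped, as in
// the diff, so phones may end up shorter than tones.
// std::stoi throws if a tone field is not numeric.
inline bool parse_lexicon_line(const std::string& line,
                               const std::unordered_map<std::string, int>& tokens,
                               LexiconEntry& out)
{
    std::istringstream iss(line);
    std::vector<std::string> fields;
    for (std::string field; iss >> field;) fields.push_back(field);
    if (fields.empty()) return false;

    out.word = fields[0];
    out.phones.clear();
    out.tones.clear();
    const std::size_t count = fields.size() - 1;
    const std::size_t half  = count / 2;
    for (std::size_t i = 0; i < count; ++i) {
        const std::string& field = fields[i + 1];
        if (i < half) {
            auto it = tokens.find(field);
            if (it != tokens.end()) out.phones.push_back(it->second);
        } else {
            out.tones.push_back(std::stoi(field));
        }
    }
    return true;
}
```

Because unknown phones are dropped silently, a caller that needs phones and tones to stay aligned may want to compare the two vector sizes after parsing.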
@@ -195,14 +254,17 @@ class Lexicon {
     {
         SLOGD("\nStarting text processing: \"%s\"", text.c_str());

-        std::string taggedText = m_processor->Tag(text);
-        SLOGD("\taggedText processing: \"%s\"", taggedText.c_str());
-        std::string normalizedText = m_processor->Verbalize(taggedText);
-        SLOGD("\normalizedText processing: \"%s\"", normalizedText.c_str());
+        std::string normalizedText;
+        if (m_processor) {
+            std::string taggedText = m_processor->Tag(text);
+            SLOGD("\taggedText processing: \"%s\"", taggedText.c_str());
+            normalizedText = m_processor->Verbalize(taggedText);
+            SLOGD("\tnormalizedText processing: \"%s\"", normalizedText.c_str());
+        } else {
+            SLOGD("m_processor is not initialized, skipping tag and verbalize steps.");
+            normalizedText = text;
+        }

-        SLOGD("=======Matching Results=======");
-        SLOGD("Unit\t|\tPhonemes\t|\tTones");
-        SLOGD("-----------------------------");
         phones.insert(phones.end(), unknown_token.first.begin(), unknown_token.first.end());
         tones.insert(tones.end(), unknown_token.second.begin(), unknown_token.second.end());
         SLOGD("<BOS>\t|\t%s\t|\t%s", phonesToString(unknown_token.first).c_str(),
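The null check in this hunk relies on the `m_processor = nullptr` default member initializer from the earlier hunk, since the two-argument constructor never assigns the pointer. A minimal sketch of the same pass-through pattern follows; `FakeProcessor` is a hypothetical stand-in that models only the `Tag` and `Verbalize` calls visible in the diff, and its behavior (bracketing the text, spelling out the digit `2`) is invented purely to make the two paths distinguishable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the text processor; not the wetext API.
struct FakeProcessor {
    // Wraps the input in brackets to mark it as tagged.
    std::string Tag(const std::string& text) const { return "[" + text + "]"; }

    // Strips the brackets added by Tag and spells out the digit '2'.
    std::string Verbalize(const std::string& tagged) const
    {
        std::string out;
        for (std::size_t i = 1; i + 1 < tagged.size(); ++i) {
            if (tagged[i] == '2') {
                out += "two";
            } else {
                out += tagged[i];
            }
        }
        return out;
    }
};

// Runs tag + verbalize when a processor is available; otherwise returns
// the input unchanged, mirroring the else branch in the diff.
inline std::string normalize_text(const FakeProcessor* processor, const std::string& text)
{
    if (!processor) return text;
    return processor->Verbalize(processor->Tag(text));
}
```

With a null processor the text reaches the lexicon lookup unnormalized, so inputs such as digits are matched only if the lexicon itself has entries for them.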