plan.txt
>>> Actual
Further development steps 1 and 2 exist.
>>> Old plan (to think about):
Short-term plan — start now (aligned with Phase A: harden the encoder lane)
1) Ship and verify the pipeline — mostly implemented
- Kaggle → Hugging Face workflow was hardened and validated through multiple failure classes (auth, kernel conflict, status parsing, retries), and now reliably reaches the train/eval/publish path.
- Remaining routine work: periodic re-runs for regression checks and verifying the Space/model release pair for each version.
2) Tighten evaluation (before scaling data or model size) — implemented
- `scripts/train_tinymodel1_classifier.py` now reports accuracy, macro/weighted F1, per-class F1, and writes `eval_report.json` (confusion matrix + reproducibility block).
- How split + seed work: `texts/eval-reproducibility.md`.
- Instant test (fast CPU run, ~30s): `python scripts/train_tinymodel1_classifier.py --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 --epochs 1 --batch-size 8 --seed 42` then open `artifacts/eval-smoke/eval_report.json`.
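As a reference for reading the report, here is a pure-Python sketch of how per-class, macro, and weighted F1 relate. The training script presumably computes these with a metrics library; the function names below are illustrative only:

```python
# Illustrative sketch of the metrics eval_report.json aggregates:
# per-class F1, macro F1 (unweighted mean over classes), and
# weighted F1 (support-weighted mean). Not the script's actual code.
from collections import Counter

def f1_per_class(y_true, y_pred):
    """F1 for each class label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (small classes count equally)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class support in y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    total = len(y_true)
    return sum(scores[c] * support.get(c, 0) / total for c in scores)
```

Macro vs weighted matters when classes are imbalanced: macro surfaces a weak minority class that weighted F1 can hide.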
3) Second dataset or second task (same encoder family) — implemented
- Hub [`emotion`](https://huggingface.co/datasets/emotion) wired via `scripts/train_tinymodel1_emotion.py` (preset over `train_tinymodel1_classifier.py`). README: section **Second reference dataset (Emotion)**.
- Instant test: `python scripts/train_tinymodel1_emotion.py --output-dir artifacts/emotion-smoke --max-train-samples 200 --max-eval-samples 100 --epochs 1 --batch-size 8 --seed 42` then open `artifacts/emotion-smoke/eval_report.json`.
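A minimal sketch of the preset idea behind the emotion script: pin dataset-specific defaults and let everything else fall through to the base trainer's flags. The flag names and defaults below are assumptions for illustration, not the real CLI (the Hub `emotion` dataset does have 6 labels):

```python
# Hypothetical sketch of a "preset" wrapper: dataset-specific defaults
# are baked in, all other flags stay overridable from the command line.
import argparse

EMOTION_DEFAULTS = {"dataset": "emotion", "num_labels": 6}

def build_args(argv):
    parser = argparse.ArgumentParser(description="emotion preset (sketch)")
    parser.add_argument("--dataset", default=EMOTION_DEFAULTS["dataset"])
    parser.add_argument("--num-labels", type=int,
                        default=EMOTION_DEFAULTS["num_labels"])
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--seed", type=int, default=42)
    # A real preset would then hand these args to the base trainer.
    return parser.parse_args(argv)
```

The point of the pattern: one evaluated code path (`train_tinymodel1_classifier.py`), multiple thin entry points.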
4) Embeddings smoke test (product-shaped) — implemented
- `scripts/embeddings_smoke_test.py` exercises classify / similarity / retrieve (triage-style copy). README: **Embeddings smoke test**.
- Test: train `artifacts/eval-smoke` then `python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke` (or `--model HyperlinksSpace/TinyModel1`).
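The three checks can be pictured with toy vectors: classify and retrieve both reduce to cosine similarity against stored embeddings. The vectors below are made up; the real script embeds text with the trained model:

```python
# Toy illustration of the smoke test's three shapes: similarity is
# cosine between two vectors; retrieve is top-k by cosine over a corpus;
# classify is retrieve against labeled reference vectors.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus ids most similar to the query vector."""
    ranked = sorted(corpus, key=lambda cid: cosine(query, corpus[cid]),
                    reverse=True)
    return ranked[:k]
```

Swapping the toy vectors for real model embeddings is the whole smoke test: if retrieval ordering looks sane on triage-style copy, the encoder lane is usable as a product primitive.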
5) Optional quick win: pretrained encoder fine-tune — implemented
- `scripts/finetune_pretrained_classifier.py` (default `distilbert-base-uncased`). Compare `eval_report.json` to scratch runs with same `--seed` and caps. README: **Pretrained encoder fine-tune**.
- Test: command block in README (`artifacts/finetune-smoke`).
6) Data hygiene (lightweight) — implemented
- `texts/labeling-and-data-hygiene.md` (label guide template, versioning, leakage). README cross-link under **Custom labels and data hygiene**.
Out of scope for this short list — do not start until (1)–(3) are stable: decoder LLM training, a production RAG stack, or multimodal work. Those build on the eval and serving discipline from the steps above.