plan.txt
>>> Actual
Further development steps 1 and 2 exist.
>>> Old plan (to think about):
Short-term plan — start now (aligned with Phase A: harden the encoder lane)
1) Ship and verify the pipeline — mostly implemented
- Kaggle → Hugging Face workflow was hardened and validated through multiple failure classes (auth, kernel conflict, status parsing, retries), and now reliably reaches the train/eval/publish path.
- Remaining routine work: periodic re-runs for regression checks and verifying the Space/model release pair for each version.
2) Tighten evaluation (before scaling data or model size) — implemented
- `scripts/train_tinymodel1_classifier.py` now reports accuracy, macro/weighted F1, per-class F1, and writes `eval_report.json` (confusion matrix + reproducibility block).
- How split + seed work: `texts/eval-reproducibility.md`.
- Instant test (fast CPU run, ~30s): `python scripts/train_tinymodel1_classifier.py --output-dir artifacts/eval-smoke --max-train-samples 120 --max-eval-samples 80 --epochs 1 --batch-size 8 --seed 42` then open `artifacts/eval-smoke/eval_report.json`.
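As a reference for reading the report, here is a pure-Python sketch of how per-class, macro, and weighted F1 relate. The training script presumably computes these with a metrics library; the function names below are illustrative only:

```python
# Illustrative sketch of the metrics eval_report.json aggregates:
# per-class F1, macro F1 (unweighted mean over classes), and
# weighted F1 (support-weighted mean). Not the script's actual code.
from collections import Counter

def f1_per_class(y_true, y_pred):
    """F1 for each class label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (small classes count equally)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class support in y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    total = len(y_true)
    return sum(scores[c] * support.get(c, 0) / total for c in scores)
```

Macro vs weighted matters when classes are imbalanced: macro surfaces a weak minority class that weighted F1 can hide.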
3) Second dataset or second task (same encoder family) — implemented
- Hub [`emotion`](https://huggingface.co/datasets/emotion) wired via `scripts/train_tinymodel1_emotion.py` (preset over `train_tinymodel1_classifier.py`). README: section **Second reference dataset (Emotion)**.
- Instant test: `python scripts/train_tinymodel1_emotion.py --output-dir artifacts/emotion-smoke --max-train-samples 200 --max-eval-samples 100 --epochs 1 --batch-size 8 --seed 42` then open `artifacts/emotion-smoke/eval_report.json`.
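A minimal sketch of the preset idea behind the emotion script: pin dataset-specific defaults and let everything else fall through to the base trainer's flags. The flag names and defaults below are assumptions for illustration, not the real CLI (the Hub `emotion` dataset does have 6 labels):

```python
# Hypothetical sketch of a "preset" wrapper: dataset-specific defaults
# are baked in, all other flags stay overridable from the command line.
import argparse

EMOTION_DEFAULTS = {"dataset": "emotion", "num_labels": 6}

def build_args(argv):
    parser = argparse.ArgumentParser(description="emotion preset (sketch)")
    parser.add_argument("--dataset", default=EMOTION_DEFAULTS["dataset"])
    parser.add_argument("--num-labels", type=int,
                        default=EMOTION_DEFAULTS["num_labels"])
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--seed", type=int, default=42)
    # A real preset would then hand these args to the base trainer.
    return parser.parse_args(argv)
```

The point of the pattern: one evaluated code path (`train_tinymodel1_classifier.py`), multiple thin entry points.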
4) Embeddings smoke test (product-shaped) — implemented
- `scripts/embeddings_smoke_test.py` exercises classify / similarity / retrieve (triage-style copy). README: **Embeddings smoke test**.
- Test: train `artifacts/eval-smoke` then `python scripts/embeddings_smoke_test.py --model artifacts/eval-smoke` (or `--model HyperlinksSpace/TinyModel1`).
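The three checks can be pictured with toy vectors: classify and retrieve both reduce to cosine similarity against stored embeddings. The vectors below are made up; the real script embeds text with the trained model:

```python
# Toy illustration of the smoke test's three shapes: similarity is
# cosine between two vectors; retrieve is top-k by cosine over a corpus;
# classify is retrieve against labeled reference vectors.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus ids most similar to the query vector."""
    ranked = sorted(corpus, key=lambda cid: cosine(query, corpus[cid]),
                    reverse=True)
    return ranked[:k]
```

Swapping the toy vectors for real model embeddings is the whole smoke test: if retrieval ordering looks sane on triage-style copy, the encoder lane is usable as a product primitive.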
5) Optional quick win: pretrained encoder fine-tune — implemented
- `scripts/finetune_pretrained_classifier.py` (default `distilbert-base-uncased`). Compare `eval_report.json` to scratch runs with same `--seed` and caps. README: **Pretrained encoder fine-tune**.
- Test: command block in README (`artifacts/finetune-smoke`).
6) Data hygiene (lightweight) — implemented
- `texts/labeling-and-data-hygiene.md` (label guide template, versioning, leakage). README cross-link under **Custom labels and data hygiene**.
Out of scope for this short list — do not start until (1)–(3) are stable: decoder LLM training, a production RAG stack, or multimodal work. Those build on the eval and serving discipline from the steps above.