diff --git a/README.md b/README.md
index 378a105..e2bcb13 100644
--- a/README.md
+++ b/README.md
@@ -438,6 +438,11 @@ Each `model_id` has a fixed receptive field \(R\):
 - **model 4**: \(R=13\)
 - **model 5**: \(R=13\)
 
+#### Training recommendations
+
+- **Models 1, 4, 5 (uncorrelated matching):** Train for at least **100 epochs**. Fewer epochs will yield under-trained models.
+- **Shots per epoch:** Use **67 million** shots per epoch when training with 8 GPUs (`PREDECODER_TRAIN_SAMPLES=67108864`). Using fewer shots per epoch produces worse results.
+
 #### Distance / rounds semantics
 
 - Top-level `distance` / `n_rounds` are the **evaluation targets** (what you care about in inference).
@@ -563,6 +568,24 @@ The five grouped totals are:
 - If `max_group >= 6e-3`: parameters are **not** modified (the training log emits a warning in case this indicates a configuration error).
 - Non-surface-code types (`code_type != "surface_code"`) are never upscaled.
 
+**Algorithm in brief:** The pipeline computes `p_max = max(P_prep, P_meas, P_idle_cnot, P_idle_spam, P_cnot)` over the five grouped totals of the full 25-parameter noise vector and rescales the entire vector by `0.006 / p_max`, so that `p_max` is raised to **0.6%** (6 × 10⁻³). The original noise model is preserved unchanged for evaluation.
+
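+A minimal sketch of this logic, assuming the noise vector is a NumPy array; the function name and signature here are illustrative, not the pipeline's actual API:
+
+```python
+import numpy as np
+
+UPSCALE_TARGET = 6e-3  # 0.6%: target value for the largest grouped total
+
+def upscaled_training_noise(noise_vector, group_totals, code_type="surface_code"):
+    """Return the 25-parameter noise vector to train on; the input is never mutated."""
+    noise_vector = np.asarray(noise_vector, dtype=float)
+    p_max = max(group_totals)  # max over P_prep, P_meas, P_idle_cnot, P_idle_spam, P_cnot
+    if code_type != "surface_code" or p_max >= UPSCALE_TARGET:
+        return noise_vector  # left unmodified (the pipeline also warns when p_max >= 6e-3)
+    return noise_vector * (UPSCALE_TARGET / p_max)  # raise the max grouped total to exactly 0.6%
+```
+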
 We have found that training on denser syndromes and then evaluating on sparser data produces better results than training directly on sparse data.
 
 #### Skipping noise upscaling
diff --git a/TRAINING.md b/TRAINING.md
index a8542ae..d8ae711 100644
--- a/TRAINING.md
+++ b/TRAINING.md
@@ -141,8 +141,17 @@ export CONFIG_NAME=config_qec_decoder_r13_fp8
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `PREDECODER_TRAIN_EPOCHS` | `100` | Total number of training epochs. |
-| `PREDECODER_TRAIN_SAMPLES` | config-defined | Samples per epoch. Bypasses auto-scaling when set explicitly. |
+| `PREDECODER_TRAIN_EPOCHS` | `100` | Total number of training epochs. For models 1, 4, 5 (uncorrelated matching), use at least **100** epochs; fewer epochs will yield under-trained models. |
+| `PREDECODER_TRAIN_SAMPLES` | config-defined | Samples per epoch. Bypasses auto-scaling when set explicitly. For best results with 8 GPUs, use **67 million** shots per epoch (`67108864`); fewer shots per epoch will produce worse results. |
 | `PREDECODER_LR_MILESTONES` | config-defined | Comma-separated LR schedule milestone fractions (e.g. `0.25,0.5,1.0`). |
 | `PREDECODER_TIMING_RUN` | unset | Set `1` for timing/benchmarking mode (disables some overhead). |
 | `PREDECODER_TORCH_COMPILE` | `0` when run via `sbatch_train.sh`, otherwise unset | `0` to disable `torch.compile`, `1` to enable. |
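+
+As an illustration, an 8-GPU run following these recommendations might be launched like this (the `sbatch` invocation is an assumption; adapt it to your setup):
+
+```bash
+export CONFIG_NAME=config_qec_decoder_r13_fp8
+export PREDECODER_TRAIN_EPOCHS=100        # at least 100 epochs for models 1, 4, 5
+export PREDECODER_TRAIN_SAMPLES=67108864  # 2^26 shots (~67 million) per epoch
+sbatch sbatch_train.sh
+```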