GPT2/readme.md
# Notice
* The code is based on the [sophia repo](https://github.com/Liuhong99/Sophia/tree/main), which in turn is based on [nanoGPT](https://github.com/karpathy/nanoGPT/). The training pipeline might be unnecessarily complicated for our purposes (a lot of parallelization, etc.).
* My major changes (relevant to harmonic losses) are in `model_l2loss.py` and are highlighted with "Ziming's note" comments. The standard transformer is in `model.py`. The line `from model_l2loss import GPT, GPTConfig` in `train_adam_l2loss.py` selects the GPT with harmonic similarity; to use the standard GPT, change it to `from model import GPT, GPTConfig`.
* To change configurations, e.g., the size of the network, edit `config/train_gpt2_small_adam_l2loss.py`. Some hyperparameters are set at the beginning of `train_adam_l2loss.py`, but they are later overwritten by the values in `config/train_gpt2_small_adam_l2loss.py`.
* Given the complexity of the training code, I suspect a faster way to get started is to play with the `GPT` model in `model_l2loss.py` and `model.py` directly, writing training loops yourself without needing to read the other files.
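The config-file override mechanism mentioned above follows the nanoGPT pattern, where defaults in the training script are replaced by executing the config file over them. A minimal sketch of that pattern (the inline `config_file_text` and all hyperparameter values here are placeholders, not the repo's actual settings):

```python
# Defaults, as they would appear near the top of train_adam_l2loss.py
# (values here are hypothetical placeholders).
config = {"batch_size": 12, "n_layer": 12, "learning_rate": 6e-4}

# Stand-in for the contents of config/train_gpt2_small_adam_l2loss.py.
config_file_text = "batch_size = 8\nn_layer = 6\n"

# nanoGPT-style override: exec the config file into a fresh namespace
# and let any values it defines replace the defaults.
overrides = {}
exec(config_file_text, overrides)
config.update({k: v for k, v in overrides.items() if not k.startswith("__")})

print(config)  # batch_size and n_layer now come from the config file
```

This is why editing `config/train_gpt2_small_adam_l2loss.py` wins over the defaults at the top of the training script.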
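As a sketch of what such a self-written training loop could look like, here is a minimal next-token-prediction loop. The `TinyLM` class is a toy stand-in, not the repo's model; in practice you would instead instantiate `GPT(GPTConfig(...))` from `model_l2loss.py` or `model.py`, and the data, hyperparameters, and step count below are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class TinyLM(nn.Module):
    """Toy stand-in for the repo's GPT: embedding + linear next-token head."""
    def __init__(self, vocab_size=64, n_embd=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        # idx: (batch, time) token ids -> (batch, time, vocab_size) logits
        return self.head(self.embed(idx))

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

# One fixed batch of random tokens; targets are inputs shifted by one.
data = torch.randint(0, 64, (4, 17))
x, y = data[:, :-1], data[:, 1:]

losses = []
for step in range(100):
    logits = model(x)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Swapping `TinyLM` for the repo's `GPT` keeps the same loop shape, since both map `(batch, time)` token ids to per-position logits.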