
Commit 1b3785a (parent: 7a7faf8)

0919 change examples for Qwen2.5-Coder

9 files changed (+76 −59 lines)

README.md

Lines changed: 27 additions & 10 deletions

@@ -44,7 +44,16 @@ This update focuses on two main improvements: scaling up the code training data
 > We update both the special tokens and their corresponding token IDs to maintain consistency with Qwen2.5. The new special tokens are as follows:

 ```json
-{'<|fim_prefix|>': 151659, '<|fim_middle|>': 151660, '<|fim_suffix|>': 151661, '<|fim_pad|>': 151662, '<|repo_name|>': 151663, '<|file_sep|>': 151664, '<|im_start|>': 151644, '<|im_end|>': 151645}
+{
+    "<|fim_prefix|>": 151659,
+    "<|fim_middle|>": 151660,
+    "<|fim_suffix|>": 151661,
+    "<|fim_pad|>": 151662,
+    "<|repo_name|>": 151663,
+    "<|file_sep|>": 151664,
+    "<|im_start|>": 151644,
+    "<|im_end|>": 151645
+}
 ```

 | model name | type | length | Download |

@@ -76,9 +85,9 @@ pip install -r requirements.txt
 ## Quick Start

 > [!Important]
-> **Qwen2.5-Coder-xB-Chat** are instruction models for chatting;
+> **Qwen2.5-Coder-\[1.5-7\]B-Instruct** are instruction models for chatting;
 >
-> **Qwen2.5-Coder-xB** is a base model typically used for completion, serving as a better starting point for fine-tuning.
+> **Qwen2.5-Coder-\[1.5-7\]B** is a base model typically used for completion, serving as a better starting point for fine-tuning.
 >
 ### 👉🏻 Chat with Qwen2.5-Coder-7B-Instruct
 You can just write several lines of code with `transformers` to chat with Qwen2.5-Coder-7B-Instruct. Essentially, we build the tokenizer and the model with the `from_pretrained` method, and we use the `generate` method to perform chatting with the help of the chat template provided by the tokenizer. Below is an example of how to chat with Qwen2.5-Coder-7B-Instruct:

@@ -152,7 +161,7 @@ The `max_new_tokens` argument is used to set the maximum length of the response.
 The `input_text` could be any text that you would like the model to continue.

-#### 2.Processing Long Texts
+#### 2. Processing Long Texts

 The current `config.json` is set for a context length of up to 32,768 tokens.
 To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.

@@ -371,18 +380,26 @@ llm = LLM(model="Qwen/Qwen2.5-Coder-7B", tensor_parallel_size=4)


 ## Performance
-see blog <a href="https://qwenlm.github.io/blog/qwen2.5-coder"> 📑 blog</a>.
+See the <a href="https://qwenlm.github.io/blog/qwen2.5-coder">📑 blog</a> for detailed results.
+


 ## Citation
 If you find our work helpful, feel free to cite us.

 ```bibtex
-@article{qwen,
-  title={Qwen Technical Report},
-  author={Jinze Bai and Shuai Bai and Yunfei Chu and Zeyu Cui and Kai Dang and Xiaodong Deng and Yang Fan and Wenbin Ge and Yu Han and Fei Huang and Binyuan Hui and Luo Ji and Mei Li and Junyang Lin and Runji Lin and Dayiheng Liu and Gao Liu and Chengqiang Lu and Keming Lu and Jianxin Ma and Rui Men and Xingzhang Ren and Xuancheng Ren and Chuanqi Tan and Sinan Tan and Jianhong Tu and Peng Wang and Shijie Wang and Wei Wang and Shengguang Wu and Benfeng Xu and Jin Xu and An Yang and Hao Yang and Jian Yang and Shusheng Yang and Yang Yao and Bowen Yu and Hongyi Yuan and Zheng Yuan and Jianwei Zhang and Xingxuan Zhang and Yichang Zhang and Zhenru Zhang and Chang Zhou and Jingren Zhou and Xiaohuan Zhou and Tianhang Zhu},
-  journal={arXiv preprint arXiv:2309.16609},
-  year={2023}
+@misc{qwen2.5,
+    title = {Qwen2.5: A Party of Foundation Models},
+    url = {https://qwenlm.github.io/blog/qwen2.5/},
+    author = {Qwen Team},
+    month = {September},
+    year = {2024}
+}
+@article{qwen2,
+    title={Qwen2 Technical Report},
+    author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
+    journal={arXiv preprint arXiv:2407.10671},
+    year={2024}
 }
 ```
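The reformatted token table above is the key migration detail in this commit: CodeQwen1.5's `<fim_*>` markers become `<|fim_*|>` in Qwen2.5-Coder. A minimal, model-free sketch of assembling a PSM (prefix-suffix-middle) fill-in-the-middle prompt from those token strings — the helper name is my own, not from the repo:

```python
# Special tokens and IDs exactly as listed in the README diff above.
SPECIAL_TOKENS = {
    "<|fim_prefix|>": 151659,
    "<|fim_middle|>": 151660,
    "<|fim_suffix|>": 151661,
    "<|fim_pad|>": 151662,
    "<|repo_name|>": 151663,
    "<|file_sep|>": 151664,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
}

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """PSM layout: the model generates the missing span after <|fim_middle|>."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
```

Passing such a string to the tokenizer, as the FIM examples changed below do, relies on these strings being registered as special tokens so they map to the single IDs above.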

examples/Qwen2.5-Coder-Instruct-stream.py

Lines changed: 2 additions & 2 deletions

@@ -5,8 +5,8 @@
 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B-Chat")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B-Chat", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct", device_map="auto").eval()

 # model = AutoModelForCausalLM.from_pretrained(
 #     "Qwen/CodeQwen1.5-7B-Chat",

examples/Qwen2.5-Coder-Instruct.md

Lines changed: 5 additions & 5 deletions

@@ -1,17 +1,17 @@
-# Use CodeQwen1.5-base-chat By transformers
-The most significant but also the simplest usage of CodeQwen1.5-base-chat is using the `transformers` library. In this document, we show how to chat with CodeQwen1.5-base-chat in either streaming mode or not.
+# Use Qwen2.5-Coder-7B-Instruct By transformers
+The most significant but also the simplest usage of Qwen2.5-Coder-7B-Instruct is using the `transformers` library. In this document, we show how to chat with Qwen2.5-Coder-7B-Instruct in either streaming mode or not.

 ## Basic Usage
-You can just write several lines of code with `transformers` to chat with CodeQwen1.5-7B-Chat. Essentially, we build the tokenizer and the model with the `from_pretrained` method, and we use the `generate` method to perform chatting with the help of the chat template provided by the tokenizer. Below is an example of how to chat with CodeQwen1.5-7B-Chat:
+You can just write several lines of code with `transformers` to chat with Qwen2.5-Coder-7B-Instruct. Essentially, we build the tokenizer and the model with the `from_pretrained` method, and we use the `generate` method to perform chatting with the help of the chat template provided by the tokenizer. Below is an example of how to chat with Qwen2.5-Coder-7B-Instruct:

 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM

 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B-Chat")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B-Chat", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct", device_map="auto").eval()

 # tokenize the input into tokens
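For context on what the chat template mentioned here does under the hood: Qwen-family instruct models use ChatML-style `<|im_start|>`/`<|im_end|>` markers (the same tokens listed in the README diff). A hand-rolled sketch of that layout — in real use `tokenizer.apply_chat_template` should be preferred, and the exact template may differ from this approximation:

```python
def to_chatml(messages: list) -> str:
    """Approximate ChatML rendering: one <|im_start|>role ... <|im_end|> block
    per message, then an opened assistant turn for the model to complete."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return rendered + "<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a quicksort in Python."},
])
```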

examples/Qwen2.5-Coder-Instruct.py

Lines changed: 2 additions & 2 deletions

@@ -3,8 +3,8 @@
 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B-Chat")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B-Chat", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct", device_map="auto").eval()

 # tokenize the input into tokens

examples/Qwen2.5-Coder-fim.py

Lines changed: 5 additions & 5 deletions

@@ -2,17 +2,17 @@
 # load model
 device = "cuda" # the device to load the model onto

-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map="auto").eval()

-input_text = """<fim_prefix>def quicksort(arr):
+input_text = """<|fim_prefix|>def quicksort(arr):
     if len(arr) <= 1:
         return arr
     pivot = arr[len(arr) // 2]
-<fim_suffix>
+<|fim_suffix|>
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
-    return quicksort(left) + middle + quicksort(right)<fim_middle>"""
+    return quicksort(left) + middle + quicksort(right)<|fim_middle|>"""

 model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

examples/Qwen2.5-Coder-repolevel-fim.py

Lines changed: 10 additions & 10 deletions

@@ -2,13 +2,13 @@
 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map="auto").eval()

 # tokenize the input into tokens
 # set the fim format in the corresponding file you need to infill
-input_text = """<repo_name>library-system
-<file_sep>library.py
+input_text = """<|repo_name|>library-system
+<|file_sep|>library.py
 class Book:
     def __init__(self, title, author, isbn, copies):
         self.title = title

@@ -36,7 +36,7 @@ def find_book(self, isbn):
     def list_books(self):
         return self.books

-<file_sep>student.py
+<|file_sep|>student.py
 class Student:
     def __init__(self, name, id):
         self.name = name

@@ -57,8 +57,8 @@ def return_book(self, book, library):
             return True
         return False

-<file_sep>main.py
-<fim_prefix>from library import Library
+<|file_sep|>main.py
+<|fim_prefix|>from library import Library
 from student import Student

 def main():

@@ -70,7 +70,7 @@ def main():
     # Set up a student
     student = Student("Alice", "S1")

-    # Student borrows a book<fim_suffix>
+    # Student borrows a book<|fim_suffix|>
     if student.borrow_book(book, library):
         print(f"{student.name} borrowed {book.title}")
     else:

@@ -88,7 +88,7 @@ def main():
     print(book)

 if __name__ == "__main__":
-    main()<fim_middle>
+    main()<|fim_middle|>
 """
 model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

@@ -97,7 +97,7 @@ def main():
 # The generated_ids include prompt_ids, so we only need to decode the tokens after prompt_ids.
 output_text = tokenizer.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

-print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<file_sep>')[0]}")
+print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<|file_sep|>')[0]}")

 # the expected output is as follows:
 """

examples/Qwen2.5-Coder-repolevel.py

Lines changed: 7 additions & 7 deletions

@@ -2,12 +2,12 @@
 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map="auto").eval()

 # tokenize the input into tokens
-input_text = """<repo_name>library-system
-<file_sep>library.py
+input_text = """<|repo_name|>library-system
+<|file_sep|>library.py
 class Book:
     def __init__(self, title, author, isbn, copies):
         self.title = title

@@ -35,7 +35,7 @@ def find_book(self, isbn):
     def list_books(self):
         return self.books

-<file_sep>student.py
+<|file_sep|>student.py
 class Student:
     def __init__(self, name, id):
         self.name = name

@@ -56,7 +56,7 @@ def return_book(self, book, library):
             return True
         return False

-<file_sep>main.py
+<|file_sep|>main.py
 from library import Library
 from student import Student

@@ -78,7 +78,7 @@ def main():
 # The generated_ids include prompt_ids, so we only need to decode the tokens after prompt_ids.
 output_text = tokenizer.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

-print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<file_sep>')[0]}")
+print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<|file_sep|>')[0]}")

 # the expected output is as follows:
 """

examples/Qwen2.5-Coder.md

Lines changed: 16 additions & 16 deletions

@@ -1,11 +1,11 @@
-# Use CodeQwen1.5-base By transformers
-One of the simple but fundamental ways to try CodeQwen1.5-base is to use the `transformers` library. In this document, we show how to use CodeQwen1.5-base in three common scenarios of code generation.
+# Use Qwen2.5-Coder-7B By transformers
+One of the simple but fundamental ways to try Qwen2.5-Coder-7B is to use the `transformers` library. In this document, we show how to use Qwen2.5-Coder-7B in three common scenarios of code generation.

 ## Basic Usage
 The model completes code snippets according to the given prompts, without any additional formatting, which is usually termed `code completion` in code generation tasks.

-Essentially, we build the tokenizer and the model with the `from_pretrained` method, and we use the `generate` method to perform code completion. Below is an example of how to use CodeQwen1.5-base:
+Essentially, we build the tokenizer and the model with the `from_pretrained` method, and we use the `generate` method to perform code completion. Below is an example of how to use Qwen2.5-Coder-7B:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM

@@ -53,7 +53,7 @@ input_text = """<|fim_prefix|>def quicksort(arr):
 <|fim_suffix|>
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
-    return quicksort(left) + middle + quicksort(right)<fim_middle>"""
+    return quicksort(left) + middle + quicksort(right)<|fim_middle|>"""

 model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

@@ -88,7 +88,7 @@ model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map
 # tokenize the input into tokens
 input_text = """<repo_name>library-system
-<file_sep>library.py
+<|file_sep|>library.py
 class Book:
     def __init__(self, title, author, isbn, copies):
         self.title = title

@@ -116,7 +116,7 @@ class Library:
     def list_books(self):
         return self.books

-<file_sep>student.py
+<|file_sep|>student.py
 class Student:
     def __init__(self, name, id):
         self.name = name

@@ -137,7 +137,7 @@ class Student:
             return True
         return False

-<file_sep>main.py
+<|file_sep|>main.py
 from library import Library
 from student import Student

@@ -159,7 +159,7 @@ generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=1024, do_s
 # The generated_ids include prompt_ids, so we only need to decode the tokens after prompt_ids.
 output_text = tokenizer.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

-print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<file_sep>')[0]}")
+print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<|file_sep|>')[0]}")

 ```
 The expected output is as follows:

@@ -209,8 +209,8 @@ model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map
 # tokenize the input into tokens
 # set the fim format in the corresponding file you need to infill
-input_text = """<repo_name>library-system
-<file_sep>library.py
+input_text = """<|repo_name|>library-system
+<|file_sep|>library.py
 class Book:
     def __init__(self, title, author, isbn, copies):
         self.title = title

@@ -238,7 +238,7 @@ class Library:
     def list_books(self):
         return self.books

-<file_sep>student.py
+<|file_sep|>student.py
 class Student:
     def __init__(self, name, id):
         self.name = name

@@ -290,7 +290,7 @@ def main():
     print(book)

 if __name__ == "__main__":
-    main()<fim_middle>
+    main()<|fim_middle|>
 """
 model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

@@ -299,7 +299,7 @@ generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=1024, do_s
 # The generated_ids include prompt_ids, so we only need to decode the tokens after prompt_ids.
 output_text = tokenizer.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

-print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<file_sep>')[0]}")
+print(f"Prompt: \n{input_text}\n\nGenerated text: \n{output_text.split('<|file_sep|>')[0]}")

 # the expected output is as follows:
 """

@@ -308,8 +308,8 @@ Generated text:
 """
 ```

-# Use CodeQwen1.5-base By vllm
-As a family member of Qwen1.5, CodeQwen1.5 is supported by vLLM. A detailed tutorial can be found in the [Qwen tutorial](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
+# Use Qwen2.5-Coder-7B By vLLM
+As a member of the Qwen2.5 family, Qwen2.5-Coder-7B is supported by vLLM. A detailed tutorial can be found in the [Qwen tutorial](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
 Here, we only give a simple example of offline batched inference in vLLM.

 ## Offline Batched Inference

@@ -349,7 +349,7 @@ llm = LLM(model="Qwen/Qwen2.5-Coder-7B", tensor_parallel_size=4)
 ## Streaming Mode

-With the help of `TextStreamer`, you can switch generation with CodeQwen to streaming mode. Below we show you an example of how to use it:
+With the help of `TextStreamer`, you can switch generation with Qwen2.5-Coder to streaming mode. Below we show you an example of how to use it:


 ```python

examples/Qwen2.5-Coder.py

Lines changed: 2 additions & 2 deletions

@@ -2,8 +2,8 @@
 device = "cuda" # the device to load the model onto

 # Now you do not need to add "trust_remote_code=True"
-tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")
-model = AutoModelForCausalLM.from_pretrained("Qwen/CodeQwen1.5-7B", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B")
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B", device_map="auto").eval()


 # tokenize the input into tokens
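A recurring post-processing step across the changed examples is truncating the decoded text at the first `<|file_sep|>`, since a repo-level model may keep generating into a new file. The same split in isolation — the sample output string is invented:

```python
# Pretend this is the decoded model output: the completed file, then a spill-over file.
decoded = "def main():\n    library = Library()\n<|file_sep|>tests.py\nimport unittest"
# Keep only the text before the first file separator, as the examples' print() calls do.
first_file_only = decoded.split("<|file_sep|>")[0]
```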
