# :package: Installation (Llama 2)

This section shows how to install **Incognito Pilot** using Llama 2.
Please note that you will only get satisfactory results with the largest model *llama-2-70b-chat*, which needs considerable hardware resources.
And even then, the experience will not be comparable to GPT-4, since Llama 2 was not fine-tuned for this task.

Nevertheless, it's a lot of fun to see what's already possible with open-source models.
At the moment, there are two ways of using **Incognito Pilot** with Llama 2:

- Using a cloud API from [replicate](https://replicate.com/).
  While you don't have the advantage of a fully local setup here, you can try out the 70B model in a quick way without owning powerful hardware.
- Using Hugging Face's [Text Generation Inference](https://github.com/huggingface/text-generation-inference) container,
  which allows you to run Llama 2 locally with a simple `docker run` command.

## Replicate

Follow these steps:

1. Install [docker](https://www.docker.com/).
2. Create an empty folder somewhere on your system.
   This will be the working directory to which **Incognito Pilot** has access.
   The code interpreter can read your files in this folder and store any results.
   In the following, we assume it to be */home/user/ipilot*.
3. Create a [Replicate](https://replicate.com/) account,
   add a [credit card](https://replicate.com/account/billing)
   and copy your [API key](https://replicate.com/account/api-tokens).
4. Now, just run the following command (replace your working directory and API key):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```

You can of course also choose a [different model](https://replicate.com/blog/all-the-llamas), but the smaller ones are much less suited for this task.

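If the container starts but requests fail, you can first check that your API key is valid with a direct call to Replicate's HTTP API (a sketch; the endpoint and header format are taken from Replicate's public API docs and may change):

```shell
# A valid key returns JSON; an invalid one returns a 401 error
curl -s -H "Authorization: Token your-replicate-api-key" \
  https://api.replicate.com/v1/models | head -c 300
```
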
Now visit http://localhost:3030 and you should see the **Incognito Pilot** interface.
Does it work? Great, let's move on to the [Getting started](#rocket-getting-started-llama-2) section.

## Text Generation Inference

Follow these steps:

1. Install [docker](https://www.docker.com/).
2. Create an empty folder somewhere on your system.
   This will be the working directory to which **Incognito Pilot** has access.
   The code interpreter can read your files in this folder and store any results.
   In the following, we assume it to be */home/user/ipilot*.
3. Create a [Hugging Face](https://huggingface.co/) account.
4. Make sure you get access to the [Llama 2 model weights](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) on Hugging Face.
5. In the *Files and versions* tab, download the following three files (we assume them to be in */home/user/tokenizer*); a command-line alternative is sketched after this list:
   - tokenizer.json
   - tokenizer.model
   - tokenizer_config.json
6. Create an [access token](https://huggingface.co/settings/tokens).
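
If you prefer downloading the tokenizer files from the command line, something like the following should work (a sketch, assuming a recent `huggingface_hub` CLI and that your account has been granted access to the gated repository):

```shell
# Install the Hugging Face CLI and fetch only the three tokenizer files
pip install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Llama-2-70b-chat-hf \
  tokenizer.json tokenizer.model tokenizer_config.json \
  --local-dir /home/user/tokenizer \
  --token hf_your-huggingface-api-token
```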

Now, let's first run the *Text Generation Inference* service.
Check out their [Readme](https://github.com/huggingface/text-generation-inference#readme).
I had to run something similar to this:

```shell
docker run \
  --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v /home/user/tgi_cache:/data \
  -e HUGGING_FACE_HUB_TOKEN=hf_your-huggingface-api-token \
  ghcr.io/huggingface/text-generation-inference \
  --model-id "meta-llama/Llama-2-70b-chat-hf"
```

You can of course also choose a different model, but the smaller ones are much less suited for this task.
Once the container shows a success message, you are ready for the next step.

Visit http://localhost:8080/info.
You should see a JSON with model information.
We will need the value for *max_total_tokens* in the next command.
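
If you prefer the terminal, you can also read the value from there (a sketch; it assumes `python3` is available on your host, `jq` works just as well):

```shell
# Query the TGI info endpoint and print only the max_total_tokens field
curl -s http://localhost:8080/info \
  | python3 -c "import json, sys; print(json.load(sys.stdin)['max_total_tokens'])"
```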

Now, just run the following command (replace your directories and max tokens):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-tgi:http://host.docker.internal:8080" \
  -e MAX_TOKENS="your-max-tokens" \
  -e TOKENIZER_PATH="/mnt/tokenizer/tokenizer.model" \
  -v /home/user/tokenizer:/mnt/tokenizer \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```
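
Note that on Linux, *host.docker.internal* may not resolve inside the container by default. If **Incognito Pilot** cannot reach the TGI service, one option (assuming Docker 20.10 or newer) is to add a host-gateway mapping to the command above:

```shell
  # extra flag for the docker run command above (Linux only)
  --add-host host.docker.internal:host-gateway \
```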

Visit http://localhost:3030 and you should see the **Incognito Pilot** interface.

## :rocket: Getting started (Llama 2)

The **Incognito Pilot** UI is a chat interface through which you interact with the model.
Let's try it out!

1. **File Access**: Type "Create a text file with all numbers from 0 to 100".
   You will see how the *Code* part of the UI shows you a Python snippet.
   As soon as you approve, the code will be executed on your machine (within the docker container).
   You will see the result in the *Result* part of the UI.
   As soon as you approve it, it will be sent back to the model.
   If you are using an API (like Replicate), this of course also means that this result will be sent to their services.
   After the approval, the model will confirm the execution to you.
   Check your working directory now (e.g. */home/user/ipilot*): you should see the file (a quick way to inspect it from a shell is sketched after this list).
2. **Math**: Type "What is 1 + 2 * 3 + 4 * 5 + 6 * 7 + 8 * 9?".
   The model will use the Python interpreter to come to the correct result.
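
After the first example, you can also check the result directly from your host shell, for instance like this (a sketch; the actual filename is chosen by the model and may differ from *numbers.txt*):

```shell
# List the working directory and look at the generated file
ls /home/user/ipilot
cat /home/user/ipilot/numbers.txt
```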

Now you should be ready to use **Incognito Pilot** for your own tasks.
One more thing: the version you just used ships with nearly no packages in its Python interpreter.
This means that things like reading images or Excel files will not work.
To change this, head back to the console and press Ctrl-C to stop the container.
Now re-run the command, but remove the `-slim` suffix from the image tag (see the sketch below).
This will download a much larger image, equipped with [many packages](/docker/requirements_full.txt).
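
For example, for the Replicate setup, the re-run would look like this (a sketch; mirror whichever command you used above, only the image tag changes):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest
```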

Let's now head back to the [settings](/README.md#gear-settings) section of the main readme.