# :package: Installation (Llama 2)

This section shows how to install **Incognito Pilot** using Llama 2.
Please note that you will only get satisfactory results with the largest model *llama-2-70b-chat*, which needs considerable hardware resources.
And even then, the experience will not be comparable to GPT-4, since Llama 2 was not fine-tuned for this task.

Nevertheless, it's a lot of fun to see what's already possible with open-source models.
At the moment, there are two ways of using **Incognito Pilot** with Llama 2:

- Using a cloud API from [replicate](https://replicate.com/).
  While you don't have the advantage of a fully local setup here, you can try out the 70B model in a quick way without owning powerful hardware.
- Using Hugging Face's [Text Generation Inference](https://github.com/huggingface/text-generation-inference) container,
  which allows you to run Llama 2 locally with a simple `docker run` command.

## Replicate

Follow these steps:

1. Install [docker](https://www.docker.com/).
2. Create an empty folder somewhere on your system.
   This will be the working directory to which **Incognito Pilot** has access.
   The code interpreter can read your files in this folder and store any results.
   In the following, we assume it to be */home/user/ipilot*.
3. Create a [Replicate](https://replicate.com/) account,
   add a [credit card](https://replicate.com/account/billing)
   and copy your [API key](https://replicate.com/account/api-tokens).
4. Now, just run the following command (replace your working directory and API key):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```

You can of course also choose a [different model](https://replicate.com/blog/all-the-llamas), but the smaller ones are much less suited for this task.

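If the container starts but requests fail, you can first check that your API key is valid with a direct call to Replicate's HTTP API (a sketch; the endpoint and header format are taken from Replicate's public API docs and may change):

```shell
# A valid key returns JSON; an invalid one returns a 401 error
curl -s -H "Authorization: Token your-replicate-api-key" \
  https://api.replicate.com/v1/models | head -c 300
```
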
Now visit http://localhost:3030 and you should see the **Incognito Pilot** interface.
Does it work? Great, let's move on to the [Getting started](#rocket-getting-started-llama-2) section.

## Text Generation Inference

Follow these steps:

1. Install [docker](https://www.docker.com/).
2. Create an empty folder somewhere on your system.
   This will be the working directory to which **Incognito Pilot** has access.
   The code interpreter can read your files in this folder and store any results.
   In the following, we assume it to be */home/user/ipilot*.
3. Create a [Hugging Face](https://huggingface.co/) account.
4. Make sure you get access to the [Llama 2 model weights](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) on Hugging Face.
5. In the *Files and versions* tab, download the following three files (we assume them to be in */home/user/tokenizer*); a command-line alternative is sketched after this list:
   - tokenizer.json
   - tokenizer.model
   - tokenizer_config.json
6. Create an [access token](https://huggingface.co/settings/tokens).
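
If you prefer downloading the tokenizer files from the command line, something like the following should work (a sketch, assuming a recent `huggingface_hub` CLI and that your account has been granted access to the gated repository):

```shell
# Install the Hugging Face CLI and fetch only the three tokenizer files
pip install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Llama-2-70b-chat-hf \
  tokenizer.json tokenizer.model tokenizer_config.json \
  --local-dir /home/user/tokenizer \
  --token hf_your-huggingface-api-token
```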

Now, let's first run the *Text Generation Inference* service.
Check out their [Readme](https://github.com/huggingface/text-generation-inference#readme).
I had to run something similar to this:

```shell
docker run \
  --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v /home/user/tgi_cache:/data \
  -e HUGGING_FACE_HUB_TOKEN=hf_your-huggingface-api-token \
  ghcr.io/huggingface/text-generation-inference \
  --model-id "meta-llama/Llama-2-70b-chat-hf"
```

You can of course also choose a different model, but the smaller ones are much less suited for this task.
Once the container shows a success message, you are ready for the next step.

Visit http://localhost:8080/info.
You should see a JSON with model information.
We will need the value for *max_total_tokens* in the next command.
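
If you prefer the terminal, you can also read the value from there (a sketch; it assumes `python3` is available on your host, `jq` works just as well):

```shell
# Query the TGI info endpoint and print only the max_total_tokens field
curl -s http://localhost:8080/info \
  | python3 -c "import json, sys; print(json.load(sys.stdin)['max_total_tokens'])"
```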

Now, just run the following command (replace your directories and max tokens):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-tgi:http://host.docker.internal:8080" \
  -e MAX_TOKENS="your-max-tokens" \
  -e TOKENIZER_PATH="/mnt/tokenizer/tokenizer.model" \
  -v /home/user/tokenizer:/mnt/tokenizer \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```
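
Note that on Linux, *host.docker.internal* may not resolve inside the container by default. If **Incognito Pilot** cannot reach the TGI service, one option (assuming Docker 20.10 or newer) is to add a host-gateway mapping to the command above:

```shell
  # extra flag for the docker run command above (Linux only)
  --add-host host.docker.internal:host-gateway \
```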

Visit http://localhost:3030 and you should see the **Incognito Pilot** interface.

## :rocket: Getting started (Llama 2)

The **Incognito Pilot** UI is a chat interface through which you interact with the model.
Let's try it out!

1. **File Access**: Type "Create a text file with all numbers from 0 to 100".
   You will see how the *Code* part of the UI shows you a Python snippet.
   As soon as you approve, the code will be executed on your machine (within the docker container).
   You will see the result in the *Result* part of the UI.
   As soon as you approve it, it will be sent back to the model.
   If you are using an API (like Replicate), this of course also means that this result will be sent to their services.
   After the approval, the model will confirm the execution to you.
   Check your working directory now (e.g. */home/user/ipilot*): you should see the file (a quick way to inspect it from a shell is sketched after this list).
2. **Math**: Type "What is 1 + 2 * 3 + 4 * 5 + 6 * 7 + 8 * 9?".
   The model will use the Python interpreter to come to the correct result.
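
After the first example, you can also check the result directly from your host shell, for instance like this (a sketch; the actual filename is chosen by the model and may differ from *numbers.txt*):

```shell
# List the working directory and look at the generated file
ls /home/user/ipilot
cat /home/user/ipilot/numbers.txt
```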

Now you should be ready to use **Incognito Pilot** for your own tasks.
One more thing: the version you just used ships with nearly no packages in its Python interpreter.
This means that things like reading images or Excel files will not work.
To change this, head back to the console and press Ctrl-C to stop the container.
Now re-run the command, but remove the `-slim` suffix from the image tag (see the sketch below).
This will download a much larger image, equipped with [many packages](/docker/requirements_full.txt).
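
For example, for the Replicate setup, the re-run would look like this (a sketch; mirror whichever command you used above, only the image tag changes):

```shell
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest
```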

Let's now head back to the [settings](/README.md#gear-settings) section of the main readme.