# Semantic Search

This library was created to provide an **easy and efficient solution for embedding and vector search**, making it perfect for small to medium-scale projects that still need some **serious semantic power**. It’s built around a simple idea: if your dataset is small enough, you can achieve accurate results with brute-force techniques, and with some smart optimizations like **SIMD**, you can keep things fast and lean.

The library’s strength lies in its simplicity and its support for **GGUF BERT models**, letting you leverage sophisticated embeddings without getting bogged down by the complexities of traditional search systems. It also offers **GPU acceleration**, enabling quick computations on supported hardware. If your dataset has fewer than 100,000 entries, this library is a great fit for integrating semantic search into your Go applications with minimal hassle.

## 🚀 Key Features

- **llama.cpp without cgo**: The library is built to work with [llama.cpp](https://github.com/ggerganov/llama.cpp) without using cgo. Instead, it relies on [purego](https://github.com/ebitengine/purego), which allows calling shared C libraries directly from Go code. This design significantly simplifies integration, deployment, and cross-compilation, making it easier to build Go applications that interface with native libraries.
- **Support for BERT Models**: The library supports BERT models via [llama.cpp](https://github.com/ggerganov/llama.cpp/pull/5423). A wide variety of BERT models can be used, as long as they are in GGUF format.
- **Precompiled Binaries with Vulkan GPU Support**: Available for Windows and Linux in the [dist](dist) directory, compiled with Vulkan for GPU acceleration. However, you can compile the library yourself with or without GPU support.
- **Search Index for Embeddings**: The library supports the creation of a search index from computed embeddings, which can be saved to disk and loaded later. This feature is suitable for basic vector-based searches in small-scale applications, but it may face efficiency challenges with large datasets due to the use of brute-force techniques.
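
To make the save-to-disk idea from the feature list concrete, here is a small sketch of persisting embeddings with their labels. Note that the `Index`, `Save`, and `Load` names are hypothetical stand-ins built on Go's standard `encoding/gob` package, not the library's real index type or on-disk format:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"os"
)

// Index is a minimal illustration of a vector index: embeddings stored
// side by side with the values they were computed from.
type Index struct {
	Vectors [][]float32
	Labels  []string
}

// Save serializes the index with gob and writes it to disk.
func (ix *Index) Save(path string) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(ix); err != nil {
		return err
	}
	return os.WriteFile(path, buf.Bytes(), 0o644)
}

// Load reads an index previously written by Save.
func Load(path string) (*Index, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var ix Index
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&ix); err != nil {
		return nil, err
	}
	return &ix, nil
}

func main() {
	ix := &Index{Vectors: [][]float32{{0.1, 0.9}}, Labels: []string{"hello"}}
	if err := ix.Save("index.gob"); err != nil {
		panic(err)
	}
	loaded, err := Load("index.gob")
	if err != nil {
		panic(err)
	}
	fmt.Println(loaded.Labels[0])
}
```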

## Limitations

While simple vector search excels in small-scale applications, avoid using this library if you have any of the following requirements:

- **Large Datasets**: The current implementation is designed for small-scale applications, and datasets exceeding 100,000 entries may suffer from performance bottlenecks due to the brute-force search approach. For larger datasets, approximate nearest neighbor (ANN) algorithms and specialized data structures should be considered for efficiency.
- **Complex Query Requirements**: The library focuses on simple vector similarity search and does not support advanced query capabilities like multi-field filtering, fuzzy matching, or SQL-like operations that are common in more sophisticated search engines.
- **High-Dimensional Complex Embeddings**: Large language models (LLMs) generate embeddings that are both high-dimensional and computationally intensive. Handling these embeddings in real time can be taxing on the system unless sufficient GPU resources are available and optimized for low-latency inference.


## 📚 How to Use the Library

This example demonstrates how to use the library to generate embeddings for text and perform a simple vector search. The snippets below show how to load a model, generate embeddings for text, create a search index, and perform a search.

1. **Install the library**: Precompiled binaries for Windows and Linux are provided in the [dist](dist) directory. If your target architecture or platform isn't covered by these binaries, you'll need to compile the library from source. Drop these binaries in `/usr/lib` or equivalent.

2. **Load a model**: The `search.NewVectorizer` function initializes a model from a GGUF file. This example loads the _MiniLM-L6-v2.Q8_0.gguf_ model. The second parameter indicates the number of GPU layers to enable (0 for CPU only).

```go
m, err := search.NewVectorizer("../dist/MiniLM-L6-v2.Q8_0.gguf", 0)
if err != nil {
    // handle error
}
defer m.Close()
```

3. **Generate text embeddings**: The `EmbedText` method generates a vector embedding for a given text input, converting your text into a dense numerical vector representation using the model you loaded in the previous step.

```go
embedding, err := m.EmbedText("Your text here")
if err != nil {
    // handle error
}
```

4. **Create an index and add vectors**: Create a new index using `search.NewIndex`. The type parameter `[string]` in this example specifies that each vector is associated with a string value. You can add multiple vectors with corresponding labels.

```go
index := search.NewIndex[string]()
index.Add(embedding, "Your text here")
```

5. **Search the index**: Perform a search using the `Search` method, which takes an embedding vector and a number of results to retrieve. This example searches for the 10 most relevant results and prints them along with their relevance scores.

```go
results := index.Search(embedding, 10)
for _, r := range results {
    fmt.Printf("Result: %s (Relevance: %.2f)\n", r.Value, r.Relevance)
}
```
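
The internals of `Search` aren't shown here, but a common way to retrieve only the k most relevant results without sorting every candidate is a bounded min-heap. The sketch below, with hypothetical `scored` and `topK` names and Go's standard `container/heap`, illustrates that selection step:

```go
package main

import (
	"container/heap"
	"fmt"
)

// scored pairs a candidate value with its relevance score.
type scored struct {
	value string
	score float64
}

// minHeap keeps the k best candidates seen so far; the root is the worst
// of them, so it can be evicted cheaply when a better candidate arrives.
type minHeap []scored

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].score < h[j].score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(scored)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-scoring candidates, best first.
func topK(candidates []scored, k int) []scored {
	h := &minHeap{}
	for _, c := range candidates {
		if h.Len() < k {
			heap.Push(h, c)
		} else if c.score > (*h)[0].score {
			(*h)[0] = c
			heap.Fix(h, 0)
		}
	}
	// Pop yields ascending scores, so fill the result back to front.
	out := make([]scored, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(scored)
	}
	return out
}

func main() {
	c := []scored{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	for _, r := range topK(c, 2) {
		fmt.Printf("%s %.1f\n", r.value, r.score)
	}
}
```

This keeps selection at O(n log k) instead of O(n log n), a useful property even for brute-force scans.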

## 🛠 Compile library

First, clone the repository and its submodules with the following commands. The `--recurse-submodules` flag is needed to also clone the `ggml` submodule, a tensor library used for matrix operations.