Commit 98834ad

readme
1 parent e5f6cd8 commit 98834ad

1 file changed: README.md (+57 −4 lines)

# Semantic Search

This library was created to provide an **easy and efficient solution for embedding and vector search**, making it perfect for small to medium-scale projects that still need some **serious semantic power**. It’s built around a simple idea: if your dataset is small enough, you can achieve accurate results with brute-force techniques, and with some smart optimizations like **SIMD**, you can keep things fast and lean.

The library’s strength lies in its simplicity and support for **GGUF BERT models**, letting you leverage sophisticated embeddings without getting bogged down by the complexities of traditional search systems. It offers **GPU acceleration**, enabling quick computations on supported hardware. If your dataset has fewer than 100,000 entries, this library is a great fit for integrating semantic search into your Go applications with minimal hassle.

![demo](./.github/demo.gif)

## 🚀 Key Features

- **llama.cpp without cgo**: The library is built to work with [llama.cpp](https://github.com/ggerganov/llama.cpp) without using cgo. Instead, it relies on [purego](https://github.com/ebitengine/purego), which allows calling shared C libraries directly from Go code. This design significantly simplifies integration, deployment, and cross-compilation, making it easier to build Go applications that interface with native libraries.
- **Support for BERT Models**: The library supports BERT models via [llama.cpp](https://github.com/ggerganov/llama.cpp/pull/5423). A wide variety of BERT models can be used, as long as they are in GGUF format.
- **Precompiled Binaries with Vulkan GPU Support**: Available for Windows and Linux in the [dist](dist) directory, compiled with Vulkan for GPU acceleration. You can also compile the library yourself, with or without GPU support.
- **Search Index for Embeddings**: The library can build a search index from computed embeddings, which can be saved to disk and loaded later. This is well suited to basic vector search in small-scale applications, but brute-force scanning becomes inefficient on large datasets.
## Limitations

While simple vector search excels in small-scale applications, avoid using this library if you have any of the following requirements:

- **Large Datasets**: The current implementation targets small-scale applications; datasets exceeding 100,000 entries may hit performance bottlenecks due to the brute-force search approach. For larger datasets, consider approximate nearest neighbor (ANN) algorithms and specialized data structures.
- **Complex Query Requirements**: The library focuses on simple vector similarity search and does not support advanced query capabilities such as multi-field filtering, fuzzy matching, or SQL-like operations found in more sophisticated search engines.
- **High-Dimensional Complex Embeddings**: Large language models (LLMs) generate embeddings that are both high-dimensional and computationally intensive. Handling them in real time can tax the system unless sufficient GPU resources are available and optimized for low-latency inference.
## 📚 How to Use the Library

This example demonstrates how to generate embeddings for text and perform a simple vector search. The steps below show how to load a model, embed text, create a search index, and run a search.

1. **Install the library**: Precompiled binaries for Windows and Linux are provided in the [dist](dist) directory. If your target architecture or platform isn't covered by these binaries, you'll need to compile the library from source. Drop the binaries in `/usr/lib` or an equivalent location.

2. **Load a model**: The `search.NewVectorizer` function initializes a model from a GGUF file. This example loads the _MiniLM-L6-v2.Q8_0.gguf_ model. The second parameter indicates the number of GPU layers to use (0 for CPU only).

```go
m, err := search.NewVectorizer("../dist/MiniLM-L6-v2.Q8_0.gguf", 0)
if err != nil {
    // handle error
}
defer m.Close()
```

3. **Generate text embeddings**: The `EmbedText` method generates a vector embedding for a given text input, converting your text into a dense numerical representation using the model loaded in the previous step.

```go
embedding, err := m.EmbedText("Your text here")
```

4. **Create an index and add vectors**: Create a new index using `search.NewIndex`. The type parameter `[string]` in this example specifies that each vector is associated with a string value. You can add multiple vectors with corresponding labels.

```go
index := search.NewIndex[string]()
index.Add(embedding, "Your text here")
```

5. **Search the index**: Perform a search using the `Search` method, which takes an embedding vector and the number of results to retrieve. This example fetches the 10 most relevant results and prints them along with their relevance scores.

```go
results := index.Search(embedding, 10)
for _, r := range results {
	fmt.Printf("Result: %s (Relevance: %.2f)\n", r.Value, r.Relevance)
}
```

## 🛠 Compile library

First, clone the repository and its submodules with the following commands. The `--recurse-submodules` flag is used to clone the `ggml` submodule, which is a header-only library for matrix operations.
