From d7cdb0203d89b5e1390c27281e32886a3230a3aa Mon Sep 17 00:00:00 2001 From: Satish K C Date: Fri, 27 Mar 2026 11:53:27 -0500 Subject: [PATCH] docs: update llama.cpp repo links from ggerganov to ggml-org --- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../docs/pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppchatgenerator.mdx | 2 +- .../pipeline-components/generators/llamacppgenerator.mdx | 2 +- 20 files changed, 20 insertions(+), 20 deletions(-) diff --git a/docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx index e7df99bb25..8ec53623c5 100644 --- a/docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` 
enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. diff --git a/docs-website/docs/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/docs/pipeline-components/generators/llamacppgenerator.mdx index c45fb8c546..714eb42db3 100644 --- a/docs-website/docs/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/docs/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). 
+[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. diff --git a/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppchatgenerator.mdx index 3d52555c74..54c73d4575 100644 --- a/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppchatgenerator.mdx @@ -20,7 +20,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). 
`Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. diff --git a/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppgenerator.mdx index d93716407e..095d0f8154 100644 --- a/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.18/pipeline-components/generators/llamacppgenerator.mdx @@ -20,7 +20,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppchatgenerator.mdx index df8183fe68..d8cab82fcc 100644 --- a/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppgenerator.mdx index 34889a37b8..27036f082b 100644 --- a/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.19/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppchatgenerator.mdx index bbdf4d2744..ede636579f 100644 --- a/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppgenerator.mdx index 6d05117f21..ee46d410ca 100644 --- a/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.20/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppchatgenerator.mdx index bbdf4d2744..ede636579f 100644 --- a/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppgenerator.mdx index 6d05117f21..ee46d410ca 100644 --- a/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.21/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppchatgenerator.mdx index bbdf4d2744..ede636579f 100644 --- a/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppgenerator.mdx index 6d05117f21..ee46d410ca 100644 --- a/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.22/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppchatgenerator.mdx index bbdf4d2744..ede636579f 100644 --- a/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppgenerator.mdx index 6d05117f21..ee46d410ca 100644 --- a/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.23/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppchatgenerator.mdx index bbdf4d2744..ede636579f 100644 --- a/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppgenerator.mdx index 6d05117f21..ee46d410ca 100644 --- a/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.24/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppchatgenerator.mdx index c208c3a54c..52646dd73a 100644 --- a/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppgenerator.mdx index c45fb8c546..714eb42db3 100644 --- a/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.25/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppchatgenerator.mdx b/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppchatgenerator.mdx index e7df99bb25..8ec53623c5 100644 --- a/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppchatgenerator.mdx +++ b/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppchatgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` enables chat completion using an LLM running o ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format, which can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppChatGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization. 
diff --git a/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppgenerator.mdx b/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppgenerator.mdx index c45fb8c546..714eb42db3 100644 --- a/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppgenerator.mdx +++ b/docs-website/versioned_docs/version-2.26/pipeline-components/generators/llamacppgenerator.mdx @@ -24,7 +24,7 @@ description: "`LlamaCppGenerator` provides an interface to generate text using a ## Overview -[Llama.cpp](https://github.com/ggerganov/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). +[Llama.cpp](https://github.com/ggml-org/llama.cpp) is a library written in C/C++ for efficient inference of Large Language Models. It leverages the efficient quantized GGUF format, dramatically reducing memory requirements and accelerating inference. This means it is possible to run LLMs efficiently on standard machines (even without GPUs). `Llama.cpp` uses the quantized binary file of the LLM in GGUF format that can be downloaded from [Hugging Face](https://huggingface.co/models?library=gguf). `LlamaCppGenerator` supports models running on `Llama.cpp` by taking the path to the locally saved GGUF file as `model` parameter at initialization.
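
The patch above is a purely mechanical link rewrite. As a non-authoritative sketch (the actual command used to produce this PR is not recorded here), an update like this could be reproduced with a recursive find-and-replace over the `.mdx` files; the `docs-website` path comes from the diffstat, and `xargs -r` makes the pipeline a no-op when nothing matches:

```shell
# Old and new upstream locations of the llama.cpp repository.
old='github.com/ggerganov/llama.cpp'
new='github.com/ggml-org/llama.cpp'

# Find every .mdx file under docs-website still pointing at the old
# org, then rewrite the links in place (assumes GNU grep/sed).
grep -rl --include='*.mdx' "$old" docs-website 2>/dev/null \
  | xargs -r sed -i "s|$old|$new|g"
```

Running the same `grep -rl` afterwards should return no files, which is a quick way to confirm no stale `ggerganov` links remain before committing.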