serengil
diff --git a/README.md b/README.md
Lines changed: 16 additions & 7 deletions
diff --git a/deepface/DeepFace.py b/deepface/DeepFace.py
Lines changed: 8 additions & 6 deletions
diff --git a/deepface/api/src/dependencies/variables.py b/deepface/api/src/dependencies/variables.py
Lines changed: 2 additions & 0 deletions
diff --git a/deepface/modules/database/pinecone.py b/deepface/modules/database/pinecone.py
Lines changed: 252 additions & 0 deletions
@@ -77,7 +77,7 @@ dfs: List[pd.DataFrame] = DeepFace.find(img_path = "img1.jpg", db_path = "C:/my_
 
 <p align="center"><img src="https://raw.githubusercontent.com/serengil/deepface/master/icon/stock-6-v2.jpg" width="95%"></p>
 
-Here, the `find` function relies on a directory-based face datastore and stores embeddings on disk. Alternatively, DeepFace provides a database-backed [`search`](https://sefiks.com/2026/01/01/introducing-brand-new-face-recognition-in-deepface/) functionality where embeddings are explicitly registered and queried. Currently, [postgres](https://sefiks.com/2023/06/22/vector-similarity-search-in-postgresql/), [mongo](https://sefiks.com/2021/01/22/deep-face-recognition-with-mongodb/), [neo4j](https://sefiks.com/2021/04/03/deep-face-recognition-with-neo4j/), [pgvector](https://sefiks.com/2024/07/05/postgres-as-a-vector-database-billion-scale-vector-similarity-search-with-pgvector/) and weaviate are supported as backend databases.
+Here, the `find` function relies on a directory-based face datastore and stores embeddings on disk. Alternatively, DeepFace provides a database-backed [`search`](https://sefiks.com/2026/01/01/introducing-brand-new-face-recognition-in-deepface/) functionality where embeddings are explicitly registered and queried. Currently, [postgres](https://sefiks.com/2023/06/22/vector-similarity-search-in-postgresql/), [mongo](https://sefiks.com/2021/01/22/deep-face-recognition-with-mongodb/), [neo4j](https://sefiks.com/2021/04/03/deep-face-recognition-with-neo4j/), [pgvector](https://sefiks.com/2024/07/05/postgres-as-a-vector-database-billion-scale-vector-similarity-search-with-pgvector/), [pinecone](https://sefiks.com/2021/05/19/large-scale-face-recognition-with-pinecone-vector-database/) and weaviate are supported as backend databases.
 
 ```python
 # register an image into the database
@@ -87,7 +87,7 @@ DeepFace.register(img = "img1.jpg")
 dfs: List[pd.DataFrame] = DeepFace.search(img = "target.jpg")
 ```
 
-If you want to perform [`approximate nearest neighbor`](https://sefiks.com/2023/12/31/a-step-by-step-approximate-nearest-neighbor-example-in-python-from-scratch/) search instead of exact search to achieve faster results on [large-scale databases](https://www.youtube.com/playlist?list=PLsS_1RYmYQQGSJu_Z3OVhXhGmZ86_zuIm), you can build an index beforehand and explicitly enable ANN search. Here, [Faiss](https://sefiks.com/2020/09/17/large-scale-face-recognition-with-facebook-faiss/) is used to index embeddings in postgres and mongo; whereas pgvector, weaviate and neo4j handle indexing internally.
+If you want to perform [`approximate nearest neighbor`](https://sefiks.com/2023/12/31/a-step-by-step-approximate-nearest-neighbor-example-in-python-from-scratch/) search instead of exact search to achieve faster results on [large-scale databases](https://www.youtube.com/playlist?list=PLsS_1RYmYQQGSJu_Z3OVhXhGmZ86_zuIm), you can build an index beforehand and explicitly enable ANN search. Here, [Faiss](https://sefiks.com/2020/09/17/large-scale-face-recognition-with-facebook-faiss/) is used to index embeddings in postgres and mongo; whereas vector databases such as pgvector, weaviate, pinecone and neo4j handle indexing internally.
 
 ```python
 # build index on registered embeddings (for postgres and mongo only)
@@ -316,11 +316,20 @@ cd scripts && ./dockerize.sh
 Face verification, facial attribute analysis, vector representation and register & search functions are covered in the API. The API accepts images as file uploads (via form data), or as exact image paths, URLs, or base64-encoded strings (via either JSON or form data).
 
 ```shell
-$ curl -X POST http://localhost:5005/represent -d '{"model_name":"Facenet", "img":"img1.jpg"}' -H "Content-Type: application/json"
-$ curl -X POST http://localhost:5005/verify -d '{"img1":"img1.jpg", "img2":"img3.jpg"}' -H "Content-Type: application/json"
-$ curl -X POST http://localhost:5005/analyze -d '{"img": "img2.jpg", "actions": ["age", "gender"]}' -H "Content-Type: application/json"
-$ curl -X POST http://localhost:5005/register -d '{"model_name":"Facenet", "img":"img18.jpg"}' -H "Content-Type: application/json"
-$ curl -X POST http://localhost:5005/search -d '{"img":"img1.jpg", "model_name":"Facenet"}' -H "Content-Type: application/json"
+$ curl -X POST http://localhost:5005/represent \
+   -H "Content-Type: application/json" -d '{"model_name":"Facenet", "img":"img1.jpg"}'
+
+$ curl -X POST http://localhost:5005/verify \
+   -H "Content-Type: application/json" -d '{"img1":"img1.jpg", "img2":"img3.jpg"}'
+
+$ curl -X POST http://localhost:5005/analyze \
+   -H "Content-Type: application/json" -d '{"img": "img2.jpg", "actions": ["age", "gender"]}'
+
+$ curl -X POST http://localhost:5005/register \
+   -H "Content-Type: application/json" -d '{"model_name":"Facenet", "img":"img18.jpg"}'
+
+$ curl -X POST http://localhost:5005/search \
+   -H "Content-Type: application/json" -d '{"img":"img1.jpg", "model_name":"Facenet"}'
 ```
 
 [`Here`](https://github.com/serengil/deepface/tree/master/deepface/api/postman), you can find a postman project to find out how these methods should be called.
 
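Note on the README hunk above: the JSON bodies only reach the API as JSON when the request carries `Content-Type: application/json`, since curl's `-d` on its own sends `application/x-www-form-urlencoded`. As a minimal standard-library sketch, assuming the API listens on `http://localhost:5005` as in the curl examples, the `/represent` request can be built like this. `build_request` is a hypothetical helper, not part of DeepFace, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a DeepFace API endpoint.

    Hypothetical helper for illustration; the base URL is assumed from the
    curl examples in the README.
    """
    return urllib.request.Request(
        url=f"http://localhost:5005/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        # without this header the body would not be treated as JSON
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("represent", {"model_name": "Facenet", "img": "img1.jpg"})
```

Passing `req` to `urllib.request.urlopen` would send it to a running DeepFace API; the same header applies to `/verify`, `/analyze`, `/register` and `/search`.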
@@ -760,7 +760,7 @@ def register(
             Options: base, raw, Facenet, Facenet2018, VGGFace, VGGFace2, ArcFace (default is base).
         anti_spoofing (boolean): Flag to enable anti spoofing (default is False).
         database_type (str): Type of database to register identities. Options: 'postgres', 'mongo',
-            'weaviate', 'neo4j', 'pgvector' (default is 'postgres').
+            'weaviate', 'neo4j', 'pgvector', 'pinecone' (default is 'postgres').
         connection_details (dict or str): Connection details for the database.
         connection (Any): Existing database connection object. If provided, this connection
             will be used instead of creating a new one.
@@ -772,7 +772,7 @@ def register(
             - DEEPFACE_MONGO_URI
             - DEEPFACE_WEAVIATE_URI
             - DEEPFACE_NEO4J_URI
-
+            - DEEPFACE_PINECONE_API_KEY
     Returns:
         result (dict): A dictionary containing registration results with following keys.
             - inserted (int): Number of embeddings successfully registered to the database.
@@ -844,7 +844,7 @@ def search(
         search_method (str): Method to use for searching identities. Options: 'exact', 'ann'.
             To use ann search, you must run build_index function first to create the index.
         database_type (str): Type of database to search identities. Options: 'postgres', 'mongo',
-            'weaviate', 'neo4j', 'pgvector' (default is 'postgres').
+            'weaviate', 'neo4j', 'pgvector', 'pinecone' (default is 'postgres').
         connection_details (dict or str): Connection details for the database.
         connection (Any): Existing database connection object. If provided, this connection
             will be used instead of creating a new one.
@@ -856,7 +856,7 @@ def search(
             - DEEPFACE_MONGO_URI
             - DEEPFACE_WEAVIATE_URI
             - DEEPFACE_NEO4J_URI
-
+            - DEEPFACE_PINECONE_API_KEY
     Returns:
         results (List[pd.DataFrame]):
             A list of pandas dataframes or a list of dicts. Each dataframe or dict corresponds
@@ -919,7 +919,8 @@ def build_index(
     - Use this function after registering all identities to the database.
     - This function is resumable, run again whenever new identities are added to the db.
     - Vector databases handle indexing internally, so you don't need to use this function
-        when using a vector database ('weaviate', 'neo4j', 'pgvector') as database_type.
+        when using a vector database ('weaviate', 'neo4j', 'pgvector', 'pinecone')
+        as database_type.
 
     Args:
         model_name (str): Model for face recognition. Options: VGG-Face, Facenet, Facenet512,
@@ -933,7 +934,7 @@ def build_index(
         max_neighbors_per_node (int): Maximum number of neighbors per node in the index
             (default is 32).
         database_type (str): Type of database to build index. Options: 'postgres', 'mongo',
-            'weaviate', 'neo4j', 'pgvector' (default is 'postgres').
+            'weaviate', 'neo4j', 'pgvector', 'pinecone' (default is 'postgres').
         connection (Any): Existing database connection object. If provided, this connection
             will be used instead of creating a new one.
         connection_details (dict or str): Connection details for the database.
@@ -945,6 +946,7 @@ def build_index(
             - DEEPFACE_MONGO_URI
             - DEEPFACE_WEAVIATE_URI
             - DEEPFACE_NEO4J_URI
+            - DEEPFACE_PINECONE_API_KEY
     """
     return datastore.build_index(
         model_name=model_name,
 
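The `build_index` docstring and the README both state that Faiss indexing is only needed for postgres and mongo, while vector databases (now including pinecone) index internally. A minimal sketch of that rule as a guard a caller might run before using `search_method="ann"`; `needs_build_index` and the two sets are hypothetical names for illustration, not DeepFace API.

```python
# Backends indexed by build_index (Faiss), per the README and docstrings.
FAISS_INDEXED = {"postgres", "mongo"}
# Backends that handle indexing internally, per the build_index docstring.
VECTOR_DATABASES = {"weaviate", "neo4j", "pgvector", "pinecone"}


def needs_build_index(database_type: str) -> bool:
    """Return True if build_index must run before ANN search.

    Hypothetical helper; it is not part of DeepFace.
    """
    if database_type in FAISS_INDEXED:
        return True
    if database_type in VECTOR_DATABASES:
        return False
    raise ValueError(f"unsupported database_type: {database_type!r}")


for db in ("postgres", "pinecone"):
    print(db, needs_build_index(db))
```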
@@ -15,6 +15,8 @@ def __init__(self) -> None:
             conection_details = os.getenv("DEEPFACE_WEAVIATE_URI")
         elif self.database_type == "neo4j":
             conection_details = os.getenv("DEEPFACE_NEO4J_URI")
+        elif self.database_type == "pinecone":
+            conection_details = os.getenv("DEEPFACE_PINECONE_API_KEY")
         else:
             conection_details = None
 
 
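The `elif` chain in `variables.py` maps each `database_type` to a single environment variable. A table-driven sketch of the same lookup, assuming only the variable names listed in the docstrings above (postgres and pgvector are left out because their variable names are not visible in this diff); `CONNECTION_ENV_VARS` and `resolve_connection_details` are hypothetical, not the module's code.

```python
import os
from typing import Dict, Optional

# Variable names as listed in the register/search/build_index docstrings;
# postgres and pgvector are omitted because their names are not shown here.
CONNECTION_ENV_VARS: Dict[str, str] = {
    "mongo": "DEEPFACE_MONGO_URI",
    "weaviate": "DEEPFACE_WEAVIATE_URI",
    "neo4j": "DEEPFACE_NEO4J_URI",
    "pinecone": "DEEPFACE_PINECONE_API_KEY",
}


def resolve_connection_details(database_type: str) -> Optional[str]:
    """Return connection details for database_type from the environment.

    Returns None when the type has no mapped variable or the variable is
    unset, mirroring the `else` fallback in the hunk above.
    """
    env_var = CONNECTION_ENV_VARS.get(database_type)
    return os.getenv(env_var) if env_var else None
```

With this shape, supporting another backend is one dictionary entry rather than another `elif` branch.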
@@ -0,0 +1,252 @@
+# built-in dependencies
+import os
+import json
+import hashlib
+import struct
+import math
+from typing import Any, Dict, Optional, List, Union
+
+# project dependencies
+from deepface.modules.database.types import Database
+from deepface.modules.modeling import build_model
+from deepface.commons.logger import Logger
+
+logger = Logger()
+
+
+class PineconeClient(Database):
+    """
+    Pinecone client for storing and retrieving face embeddings and indices.
+    """
+
+    def __init__(
+        self,
+        connection_details: Optional[Union[str, Dict[str, Any]]] = None,
+        connection: Any = None,
+    ):
+        try:
+            from pinecone import Pinecone, ServerlessSpec
+        except (ModuleNotFoundError, ImportError) as e:
+            raise ValueError(
+                "pinecone is an optional dependency. Install with 'pip install pinecone'"
+            ) from e
+
+        self.pinecone = Pinecone
+        self.serverless_spec = ServerlessSpec
+
+        if connection is not None:
+            self.client = connection
+        else:
+            self.conn_details = connection_details or os.environ.get("DEEPFACE_PINECONE_API_KEY")
+            if not isinstance(self.conn_details, str):
+                raise ValueError(
+                    "Pinecone api key must be provided as a string in connection_details "
+                    "or via DEEPFACE_PINECONE_API_KEY environment variable."
+                )
+
+            self.client = self.pinecone(api_key=self.conn_details)
+
+    def initialize_database(self, **kwargs: Any) -> None:
+        """
+        Ensure Pinecone index exists.
+        """
+        model_name = kwargs.get("model_name", "VGG-Face")
+        detector_backend = kwargs.get("detector_backend", "opencv")
+        aligned = kwargs.get("aligned", True)
+        l2_normalized = kwargs.get("l2_normalized", False)
+
+        index_name = self.__generate_index_name(
+            model_name, detector_backend, aligned, l2_normalized
+        )
+
+        if self.client.has_index(index_name):
+            logger.debug(f"Pinecone index '{index_name}' already exists.")
+            return
+
+        model = build_model(task="facial_recognition", model_name=model_name)
+        dimensions = model.output_shape
+        similarity_function = "cosine" if l2_normalized else "euclidean"
+
+        self.client.create_index(
+            name=index_name,
+            dimension=dimensions,
+            metric=similarity_function,
+            spec=self.serverless_spec(
+                cloud=os.getenv("DEEPFACE_PINECONE_CLOUD", "aws"),
+                region=os.getenv("DEEPFACE_PINECONE_REGION", "us-east-1"),
+            ),
+        )
+        logger.debug(f"Created Pinecone index '{index_name}' with dimension {dimensions}.")
+
+    def insert_embeddings(self, embeddings: List[Dict[str, Any]], batch_size: int = 100) -> int:
+        """
+        Insert embeddings into Pinecone database in batches.
+        """
+        if not embeddings:
+            raise ValueError("No embeddings to insert.")
+
+        self.initialize_database(
+            model_name=embeddings[0]["model_name"],
+            detector_backend=embeddings[0]["detector_backend"],
+            aligned=embeddings[0]["aligned"],
+            l2_normalized=embeddings[0]["l2_normalized"],
+        )
+
+        index_name = self.__generate_index_name(
+            embeddings[0]["model_name"],
+            embeddings[0]["detector_backend"],
+            embeddings[0]["aligned"],
+            embeddings[0]["l2_normalized"],
+        )
+
+        # connect to the index
+        index = self.client.Index(index_name)
+
+        total = 0
+        for i in range(0, len(embeddings), batch_size):
+            batch = embeddings[i : i + batch_size]
+            vectors = []
+            for e in batch:
+                face_json = json.dumps(e["face"].tolist())
+                face_hash = hashlib.sha256(face_json.encode()).hexdigest()
+                embedding_bytes = struct.pack(f'{len(e["embedding"])}d', *e["embedding"])
+                embedding_hash = hashlib.sha256(embedding_bytes).hexdigest()
+
+                vectors.append(
+                    {
+                        "id": f"{face_hash}:{embedding_hash}",
+                        "values": e["embedding"],
+                        "metadata": {
+                            "img_name": e["img_name"],
+                            "face_hash": face_hash,
+                            "embedding_hash": embedding_hash,
+                        },
+                    }
+                )
+            index.upsert(vectors=vectors)
+            total += len(vectors)
+
+        return total
+
+    def search_by_vector(
+        self,
+        vector: List[float],
+        model_name: str = "VGG-Face",
+        detector_backend: str = "opencv",
+        aligned: bool = True,
+        l2_normalized: bool = False,
+        limit: int = 10,
+    ) -> List[Dict[str, Any]]:
+        """
+        ANN search using the main vector (embedding).
+        """
+        out: List[Dict[str, Any]] = []
+
+        self.initialize_database(
+            model_name=model_name,
+            detector_backend=detector_backend,
+            aligned=aligned,
+            l2_normalized=l2_normalized,
+        )
+
+        index_name = self.__generate_index_name(
+            model_name, detector_backend, aligned, l2_normalized
+        )
+
+        index = self.client.Index(index_name)
+        results = index.query(
+            vector=vector,
+            top_k=limit,
+            include_metadata=True,
+            include_values=False,
+        )
+
+        if not results.matches:
+            return out
+
+        for res in results.matches:
+            score = float(res.score)
+            if l2_normalized:
+                distance = 1 - score
+            else:
+                distance = math.sqrt(max(score, 0.0))
+
+            out.append(
+                {
+                    "id": res.id,
+                    "distance": distance,
+                    "img_name": res.metadata.get("img_name"),
+                }
+            )
+        return out
+
+    def fetch_all_embeddings(
+        self,
+        model_name: str,
+        detector_backend: str,
+        aligned: bool,
+        l2_normalized: bool,
+        batch_size: int = 1000,
+    ) -> List[Dict[str, Any]]:
+        """
+        Fetch all embeddings from Pinecone database in batches.
+        """
+        out: List[Dict[str, Any]] = []
+
+        self.initialize_database(
+            model_name=model_name,
+            detector_backend=detector_backend,
+            aligned=aligned,
+            l2_normalized=l2_normalized,
+        )
+
+        index_name = self.__generate_index_name(
+            model_name, detector_backend, aligned, l2_normalized
+        )
+
+        index = self.client.Index(index_name)
+
+        # index.list() yields pages (lists) of vector ids; flatten them
+        ids: List[str] = []
+        for id_page in index.list():
+            ids.extend(id_page)
+
+        for i in range(0, len(ids), batch_size):
+            batch_ids = ids[i : i + batch_size]
+            fetched = index.fetch(ids=batch_ids)
+            for _id, v in fetched.get("vectors", {}).items():
+                md = v.get("metadata") or {}
+                out.append(
+                    {
+                        "id": _id,
+                        "embedding": v.get("values"),
+                        "img_name": md.get("img_name"),
+                        "face_hash": md.get("face_hash"),
+                        "embedding_hash": md.get("embedding_hash"),
+                    }
+                )
+
+        return out
+
+    def close(self) -> None:
+        """Pinecone client does not require explicit closure"""
+        return
+
+    @staticmethod
+    def __generate_index_name(
+        model_name: str,
+        detector_backend: str,
+        aligned: bool,
+        l2_normalized: bool,
+    ) -> str:
+        """
+        Generate Pinecone index name based on parameters.
+        """
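+        index_name_attributes = [
+            "embeddings", model_name.replace("-", ""), detector_backend,
+            "Aligned" if aligned else "Unaligned", "Norm" if l2_normalized else "Raw",
+        ]
+        name = "-".join(index_name_attributes).lower()
+        if len(name) > 45:  # Pinecone caps index names at 45 characters
+            name = f"{name[:36].rstrip('-')}-{hashlib.sha256(name.encode()).hexdigest()[:8]}"
+        return name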
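`search_by_vector` converts Pinecone match scores back into DeepFace distances. Assuming Pinecone's documented metric semantics, a `cosine` index reports cosine similarity (so the distance is `1 - score`) and a `euclidean` index reports the squared Euclidean distance (hence the square root). The sketch below checks that conversion on two toy vectors by computing locally the score each index type would report; the helper names are hypothetical and nothing here calls Pinecone.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Score a cosine index would report for a match."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_euclidean(a: List[float], b: List[float]) -> float:
    """Score a euclidean index would report for a match (squared distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_to_distance(score: float, l2_normalized: bool) -> float:
    """Same conversion as search_by_vector in the diff above."""
    if l2_normalized:
        return 1 - score  # cosine index: score is a similarity
    # euclidean index: clamp at 0 so sqrt never sees a negative value
    return math.sqrt(max(score, 0.0))


query, match = [3.0, 4.0], [0.0, 4.0]
print(score_to_distance(cosine_similarity(query, match), l2_normalized=True))
print(score_to_distance(squared_euclidean(query, match), l2_normalized=False))
```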