image captions using blip. by gsaluja9 · Pull Request #204 · aperture-data/workflows

gsaluja9 · 2025-09-15T20:33:29Z

Adds auto generation of image captions using BLIP.
https://huggingface.co/docs/transformers/main/en/model_doc/blip#transformers.BlipForConditionalGeneration

TODO:

~~Add tests~~: Added a validation step at build time with a basic script.

Add docs

.devcontainer/caption-image/devcontainer.json

bovlb · 2025-09-18T21:51:51Z

apps/caption-image/Dockerfile

+
+COPY requirements.txt /
+RUN pip install -U pip
+RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu


I think it's cleaner to keep requirements in requirements files. See how this is done in https://github.com/aperture-data/workflows/blob/main/base/docker/scripts/embeddings/requirements_cpu.txt

Thanks! This is actually a TODO I care about. I was wondering how to do this conditionally: for example, specifying CPU, CUDA, or Metal at the top level and building images optimized for the architecture I am developing on. That would be a major win. I haven't zeroed in on an approach yet, but as you can see this is WIP. Do you have any ideas?

I had a look around and I don't see a good solution that just does the right thing on any architecture. Forcing CPU versions comes closest. To do better, I think we would have to build and deploy multiple versions of the docker image with the device as a build argument.
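A minimal sketch of the "device as a build argument" idea, assuming a small helper script turns the chosen device into pip arguments. The helper name, the `metal` entry, and the choice of the `cu121` wheel index are assumptions for illustration; only the CPU index URL comes from the Dockerfile in this PR.

```python
# Hypothetical helper: maps a build-time device choice to pip install arguments.
TORCH_INDEX_URLS = {
    "cpu": "https://download.pytorch.org/whl/cpu",     # as used in the Dockerfile above
    "cuda": "https://download.pytorch.org/whl/cu121",  # assumption: CUDA 12.1 wheels
    "metal": None,  # assumption: fall back to the default package index
}


def torch_install_args(device: str) -> list[str]:
    """Return the pip command (as an argument list) for the given device."""
    if device not in TORCH_INDEX_URLS:
        raise ValueError(f"unknown device {device!r}; expected one of {sorted(TORCH_INDEX_URLS)}")
    args = ["pip", "install", "torch", "torchvision"]
    index_url = TORCH_INDEX_URLS[device]
    if index_url is not None:
        args += ["--index-url", index_url]
    return args
```

Each image variant would then be built with a different device value, for example by passing it through a Docker `ARG` and tagging the resulting image per device.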

apps/caption-image/README.md

bovlb · 2025-09-18T22:19:38Z

apps/caption-image/app/images.py

+        query = [{
+            "FindImage": {
+                "constraints": {
+                    self.done_property: ["==", None]


Might be slightly more robust to say ["!=", True].
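The two constraint forms can be compared side by side in a small sketch. It assumes the property is a boolean "done" flag, and the function and property names are hypothetical. How the database treats images that lack the property under `!=` is not confirmed in this thread, so that should be checked before relying on it.

```python
def pending_images_query(done_property: str, robust: bool = True) -> list[dict]:
    """Build a FindImage query for images that still need processing.

    robust=False: match only images where the property is unset (["==", None]).
    robust=True:  match anything that is not explicitly True (["!=", True]).
    """
    constraint = ["!=", True] if robust else ["==", None]
    return [{"FindImage": {"constraints": {done_property: constraint}}}]
```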

bovlb · 2025-09-18T22:26:31Z

apps/caption-image/app/images.py

+processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")


I can't find any documentation that says these are thread-safe.
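If thread safety cannot be confirmed, one conservative option is to serialize access to the shared objects with a lock. This is a sketch of that idea rather than what the PR does; `SerializedCaptioner` and the `caption_fn` parameter are hypothetical names, and the wrapped callable stands in for whatever code calls the processor and model.

```python
import threading
from typing import Any, Callable


class SerializedCaptioner:
    """Wraps a captioning callable so that only one thread runs it at a time."""

    def __init__(self, caption_fn: Callable[[Any], str]):
        self._caption_fn = caption_fn
        self._lock = threading.Lock()

    def __call__(self, image: Any) -> str:
        # The lock is held for the whole call, so concurrent callers queue up here.
        with self._lock:
            return self._caption_fn(image)
```

The trade-off is that captioning becomes single-threaded; worker threads can still overlap on I/O such as fetching images and writing results.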

bovlb · 2025-09-18T22:30:52Z

apps/caption-image/app/images.py

+                "UpdateImage": {
+                    "ref": i + 1,
+                    "properties": {
+                        self.done_property: captions[i]


Oh. So not really a done property then.

Yeah, I should have changed it after copying images from embedding extraction. :)

I updated the property name.

Can we distinguish between "not yet tried to generate a caption" and "I tried but couldn't do it"?
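One way to make that distinction, sketched under the assumption that a separate status property is written alongside the caption. The property names `wf_caption` and `wf_caption_status` are hypothetical and do not come from the PR.

```python
from typing import Optional


def caption_update(ref: int, caption: Optional[str]) -> dict:
    """Build an UpdateImage command that records the outcome of a caption attempt.

    Images that were never attempted carry no status property at all, so
    "not yet tried" and "tried but failed" remain distinguishable.
    """
    properties = {"wf_caption_status": "done" if caption else "failed"}
    if caption:
        properties["wf_caption"] = caption
    return {"UpdateImage": {"ref": ref, "properties": properties}}
```

A later run could then retry only the `failed` images, or skip them, without confusing them with images that have not been processed yet.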

bovlb · 2025-09-18T22:32:28Z

apps/caption-image/Dockerfile

+RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
+RUN pip install --no-cache-dir -r /requirements.txt
+
+COPY app/weights.py /app/weights.py


I think this is intended for cache warmup. It might be better to make that explicit in the name.

Sounds good, will change.
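For illustration, a cache warm-up script with an explicit purpose might look like the sketch below. The function name and the injectable `loaders` parameter are assumptions; the two `from_pretrained` calls and the model id are the ones shown in the diff for `images.py`.

```python
MODEL_ID = "Salesforce/blip-image-captioning-base"


def warm_cache(model_id: str = MODEL_ID, loaders=None) -> int:
    """Download the model artifacts at build time so the running container finds them cached.

    `loaders` is a list of callables that each take a model id; by default the
    transformers loaders are imported lazily and used. Returns the number of loaders run.
    """
    if loaders is None:
        from transformers import AutoProcessor, BlipForConditionalGeneration
        loaders = [
            AutoProcessor.from_pretrained,
            BlipForConditionalGeneration.from_pretrained,
        ]
    for load in loaders:
        load(model_id)  # the return value is discarded; only the download matters here
    return len(loaders)
```

The Dockerfile would call `warm_cache()` once in a `RUN` step after installing the requirements, so the downloaded weights end up in an image layer.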

workflows-devcontiner.code-workspace

bovlb · 2025-09-18T22:36:26Z

configuration_params.py

@@ -0,0 +1,11 @@
+import platform


Should these files really go at the top level?

image captions using blip.

38084c4

gsaluja9 requested review from bovlb and drewaogle September 15, 2025 22:42

gsaluja9 marked this pull request as ready for review September 17, 2025 13:52

Adding devcontainers. (#208)

2dfb0b8

gsaluja9 requested a review from luisremis September 17, 2025 18:22

stray file.

e3b5994

bovlb reviewed Sep 18, 2025


gsaluja9 and others added 2 commits September 19, 2025 09:29

Some review feedback

4a3651f

Merge branch 'main' into image_captions

5516ccd

