- Winnipeg, Manitoba, Canada
- https://www.linkedin.com/in/harpreetsahota204
- @datascienceharp
- https://huggingface.co/harpreetsahota
🖼️💬 Vision-Language
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning".
A curated list of visual reasoning papers.
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
[CVPR'24] Validation-free few-shot adaptation of CLIP, using a well-initialized Linear Probe (ZSLP) and class-adaptive constraints (CLAP).
[ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games"
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Official repository for the MMFM challenge
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
This is the official repository for the LENS (Large Language Models Enhanced to See) system.
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
CLIP+MLP Aesthetic Score Predictor
Refine high-quality datasets and visual AI models
[ECCV 2024] InstructIR: High-Quality Image Restoration Following Human Instructions https://huggingface.co/spaces/marcosv/InstructIR
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"



