harpreetsahota204 · Stars

👁️💬 Vision-Language

Repos related to all things Vision-Language models
57 repositories

Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"

Jupyter Notebook · 233 stars · 28 forks · Updated Jun 1, 2025

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Jupyter Notebook · 869 stars · 58 forks · Updated Jul 20, 2025

Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)

Jupyter Notebook · 132 stars · 14 forks · Updated Nov 5, 2025

Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".

Python · 71 stars · 8 forks · Updated Feb 28, 2024

📄 A curated list of visual reasoning papers.

TeX · 31 stars · 2 forks · Updated Mar 4, 2026

Python · 30 stars · 1 fork · Updated Jun 19, 2024

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Python · 665 stars · 37 forks · Updated Oct 22, 2024

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Python · 401 stars · 33 forks · Updated Aug 24, 2024

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python · 2,539 stars · 189 forks · Updated Apr 2, 2025

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

9 stars · Updated Feb 16, 2024

[CVPR'24] Validation-free few-shot adaptation of CLIP, using a well-initialized Linear Probe (ZSLP) and class-adaptive constraints (CLAP).

Python · 82 stars · 3 forks · Updated Jun 7, 2025

[ACL2023, Findings] Source code for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games"

Python · 16 stars · 5 forks · Updated Feb 22, 2025

NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024

Python · 1,826 stars · 76 forks · Updated Nov 27, 2025

[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Python · 33 stars · 1 fork · Updated May 25, 2025

Official repository for the MMFM challenge

Python · 25 stars · 5 forks · Updated Jun 18, 2024

[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Python · 164 stars · 3 forks · Updated Nov 20, 2023

Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)

Python · 186 stars · 13 forks · Updated Jul 5, 2024

[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Python · 293 stars · 13 forks · Updated Jan 8, 2025

This is the official repository for the LENS (Large Language Models Enhanced to See) system.

Jupyter Notebook · 355 stars · 12 forks · Updated Jul 22, 2025

(CVPR2024) A benchmark for evaluating multimodal LLMs using multiple-choice questions.

Python · 362 stars · 13 forks · Updated Jan 14, 2025

Python · 245 stars · 26 forks · Updated Apr 18, 2025

CLIP+MLP Aesthetic Score Predictor

Python · 1,268 stars · 113 forks · Updated Jul 1, 2024

Python · 3,886 stars · 253 forks · Updated Mar 15, 2024

Refine high-quality datasets and visual AI models

Python · 10,491 stars · 730 forks · Updated Mar 21, 2026

[ECCV 2024] InstructIR: High-Quality Image Restoration Following Human Instructions https://huggingface.co/spaces/marcosv/InstructIR

Jupyter Notebook · 714 stars · 44 forks · Updated Sep 26, 2024

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

Python · 2,308 stars · 143 forks · Updated Jul 15, 2025

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python · 874 stars · 59 forks · Updated Mar 25, 2024

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Python · 3,006 stars · 245 forks · Updated Sep 8, 2024

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning".

Python · 837 stars · 43 forks · Updated Aug 19, 2025