- Winnipeg, Manitoba, Canada
- https://www.linkedin.com/in/harpreetsahota204
- @datascienceharp
- https://huggingface.co/harpreetsahota
🖼️💬 Vision-Language
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning".
A curated list of visual reasoning papers.
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
[CVPR'24] Validation-free few-shot adaptation of CLIP, using a well-initialized Linear Probe (ZSLP) and class-adaptive constraints (CLAP).
[ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games"
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Official repository for the MMFM challenge
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
This is the official repository for the LENS (Large Language Models Enhanced to See) system.
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
CLIP+MLP Aesthetic Score Predictor
Refine high-quality datasets and visual AI models
[ECCV 2024] InstructIR: High-Quality Image Restoration Following Human Instructions https://huggingface.co/spaces/marcosv/InstructIR
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"



