Awesome-AI-Papers

This repository collects papers and code in the field of AI. The contents are organized as follows:

Table of Contents

  ├─ NLP/  
  │  ├─ Word2Vec/  
  │  ├─ Seq2Seq/           
  │  └─ Pretraining/  
  │    ├─ Large Language Model/          
  │    ├─ LLM Application/ 
  │      ├─ AI Agent/          
  │      ├─ Academic/          
  │      ├─ Code/       
  │      ├─ Financial Application/
  │      ├─ Information Retrieval/  
  │      ├─ Math/     
  │      ├─ Medicine and Law/   
  │      ├─ Recommendation System/
  │      └─ Tool Learning/             
  │    ├─ LLM Technique/ 
  │      ├─ Alignment/          
  │      ├─ Context Length/          
  │      ├─ Corpus/       
  │      ├─ Evaluation/
  │      ├─ Hallucination/  
  │      ├─ Inference/     
  │      ├─ MoE/   
  │      ├─ PEFT/     
  │      ├─ Prompt Learning/   
  │      ├─ RAG/       
  │      └─ Reasoning and Planning/       
  │    ├─ LLM Theory/       
  │    └─ Chinese Model/             
  ├─ CV/  
  │  ├─ CV Application/          
  │  ├─ Contrastive Learning/         
  │  ├─ Foundation Model/ 
  │  ├─ Generative Model (GAN and VAE)/          
  │  ├─ Image Editing/          
  │  ├─ Object Detection/          
  │  ├─ Semantic Segmentation/            
  │  └─ Video/          
  ├─ Multimodal/       
  │  ├─ Audio/          
  │  ├─ BLIP/         
  │  ├─ CLIP/        
  │  ├─ Diffusion Model/   
  │  ├─ Multimodal LLM/          
  │  ├─ Text2Image/          
  │  ├─ Text2Video/            
  │  └─ Survey/           
  ├─ Reinforcement Learning/
  ├─ GNN/
  └─ Transformer Architecture/          

NLP

1. Word2Vec

  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al., arxiv 2013. [paper]
  • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., NIPS 2013. [paper] (see the negative-sampling sketch after this list)
  • Distributed Representations of Sentences and Documents, Le and Mikolov, ICML 2014. [paper]
  • Word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy, arxiv 2014. [paper]
  • word2vec Parameter Learning Explained, Rong, arxiv 2014. [paper]
  • GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP 2014. [paper][code]
  • fastText: Bag of Tricks for Efficient Text Classification, Joulin et al., arxiv 2016. [paper][code]
  • ELMo: Deep Contextualized Word Representations, Peters et al., NAACL 2018. [paper]
  • Distilling the Knowledge in a Neural Network, Hinton et al., arxiv 2015. [paper][FitNets]
  • BPE: Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL 2016. [paper][code]
  • Byte-Level BPE: Neural Machine Translation with Byte-Level Subwords, Wang et al., arxiv 2019. [paper][code]
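
A minimal PyTorch sketch of the skip-gram negative-sampling objective from the Mikolov et al. entries above; the function name and tensor shapes are illustrative assumptions, not code from any linked repository.

  import torch
  import torch.nn.functional as F

  def sgns_loss(center, context, negatives):
      # center:    (B, D) embeddings of the center words
      # context:   (B, D) embeddings of the observed context words
      # negatives: (B, K, D) embeddings of K words drawn from the noise distribution
      pos = F.logsigmoid((center * context).sum(-1))                       # (B,)
      neg_scores = torch.bmm(negatives, center.unsqueeze(-1)).squeeze(-1)  # (B, K)
      neg = F.logsigmoid(-neg_scores).sum(-1)                              # (B,)
      # Maximize similarity with true contexts, minimize it with noise words.
      return -(pos + neg).mean()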

2. Seq2Seq

  • Generating Sequences With Recurrent Neural Networks, Graves, arxiv 2013. [paper]
  • Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS 2014. [paper]
  • Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015. [paper][code] (see the attention sketch after this list)
  • On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, Cho et al., arxiv 2014. [paper]
  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., arxiv 2014. [paper]
  • [fairseq][fairseq2][fairscale][pytorch-seq2seq]
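
For the Bahdanau et al. entry above, a short PyTorch sketch of additive attention; the class and dimension names are placeholders rather than the fairseq implementation.

  import torch
  import torch.nn as nn

  class AdditiveAttention(nn.Module):
      # Score every encoder state against the current decoder state and
      # return the softmax-weighted sum of encoder states as the context.
      def __init__(self, enc_dim, dec_dim, attn_dim):
          super().__init__()
          self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
          self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
          self.v = nn.Linear(attn_dim, 1, bias=False)

      def forward(self, dec_state, enc_states):
          # dec_state: (B, dec_dim); enc_states: (B, T, enc_dim)
          scores = self.v(torch.tanh(
              self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
          )).squeeze(-1)                                   # (B, T)
          weights = torch.softmax(scores, dim=-1)
          context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)  # (B, enc_dim)
          return context, weights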

3. Pretraining

3.1 Large Language Model

3.2 LLM Application

3.2.1 AI Agent
3.2.2 Academic
3.2.3 Code
3.2.4 Financial Application
3.2.5 Information Retrieval
3.2.6 Math
3.2.7 Medicine and Law
3.2.8 Recommendation System
3.2.9 Tool Learning

3.3 LLM Technique

3.3.1 Alignment
3.3.2 Context Length
3.3.3 Corpus
3.3.4 Evaluation
3.3.5 Hallucination
  • Extrinsic Hallucinations in LLMs, Lilian Weng, 2024. [blog][hallucination-leaderboard]
  • Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models, Zhang et al., arxiv 2023. [paper][code]
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, Huang et al., arxiv 2023. [paper][code][Awesome-MLLM-Hallucination]
  • The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models, Li et al., arxiv 2024. [paper][code]
  • FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios, Chern et al., arxiv 2023. [paper][code][OlympicArena][FActScore]
  • Chain-of-Verification Reduces Hallucination in Large Language Models, Dhuliawala et al., arxiv 2023. [paper][code]
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models, Guan et al., CVPR 2024. [paper][code][LRM-FactEval]
  • Woodpecker: Hallucination Correction for Multimodal Large Language Models, Yin et al., arxiv 2023. [paper][code]
  • OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation, Huang et al., CVPR 2024 Highlight. [paper][code]
  • TrustLLM: Trustworthiness in Large Language Models, Sun et al., arxiv 2024. [paper][code]
  • SAFE: Long-form factuality in large language models, Wei et al., arxiv 2024. [paper][code]
  • RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models, Hu et al., arxiv 2024. [paper][code][HaluAgent][LLMsKnow]
  • Detecting hallucinations in large language models using semantic entropy, Farquhar et al., Nature 2024. [paper][semantic_uncertainty][long_hallucinations][Semantic Uncertainty ICLR 2023][Lynx-hallucination-detection][hallucination_probes] (see the sketch after this list)
  • A Survey on the Honesty of Large Language Models, Li et al., arxiv 2024. [paper][code]
  • LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, Orgad et al., arxiv 2024. [paper][code]
  • Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers, Bouchard and Chauhan, arxiv 2025. [paper][code]
  • Why language models hallucinate, OpenAI, 2025. [blog][paper]
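
The semantic-entropy method of Farquhar et al. above fits in a few lines. The sketch below assumes two hypothetical callables, generate(prompt) for sampling an answer and entails(a, b) for an entailment judge; this count-based variant omits the sequence-likelihood weighting used by the paper's full estimator.

  import math

  def semantic_entropy(prompt, generate, entails, n_samples=10):
      answers = [generate(prompt) for _ in range(n_samples)]
      # Cluster answers by bidirectional entailment: two answers share a
      # meaning iff each entails the other.
      clusters = []
      for ans in answers:
          for cluster in clusters:
              rep = cluster[0]
              if entails(ans, rep) and entails(rep, ans):
                  cluster.append(ans)
                  break
          else:
              clusters.append([ans])
      # Entropy over distinct meanings: high entropy suggests the model is
      # confabulating rather than recalling a stable fact.
      probs = [len(c) / n_samples for c in clusters]
      return -sum(p * math.log(p) for p in probs)
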
3.3.6 Inference
3.3.7 MoE
3.3.8 PEFT (Parameter-efficient Fine-tuning)
3.3.9 Prompt Learning
3.3.10 RAG (Retrieval Augmented Generation)
Text Embedding
3.3.11 Reasoning and Planning
Survey
3.3.12 Continual Learning

3.4 LLM Theory

3.5 Chinese Model


CV

  • CS231n: Deep Learning for Computer Vision [link]

1. Basic for CV

  • AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012. [paper][AlexNet-Source-Code]
  • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., ICLR 2015. [paper]
  • GoogLeNet: Going Deeper with Convolutions, Szegedy et al., CVPR 2015. [paper]
  • ResNet: Deep Residual Learning for Image Recognition, He et al., CVPR 2016 Best Paper. [paper][code][resnet_inference.py] (see the residual-block sketch after this list)
  • DenseNet: Densely Connected Convolutional Networks, Huang et al., CVPR 2017 Oral. [paper][code]
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan et al., ICML 2019. [paper][code][EfficientNet-PyTorch][noisystudent]
  • BYOL: Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al., arxiv 2020. [paper][code][byol-pytorch][simsiam]
  • ConvNeXt: A ConvNet for the 2020s, Liu et al., CVPR 2022. [paper][code][ConvNeXt-V2]
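
Since ResNet anchors much of this list, here is a minimal basic residual block (He et al., 2016) in PyTorch, shown for the identity-shortcut case without downsampling.

  import torch.nn as nn

  class ResidualBlock(nn.Module):
      # Two 3x3 convs plus an identity shortcut, so the block only has to
      # learn a residual F(x) on top of x.
      def __init__(self, channels):
          super().__init__()
          self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
          self.bn1 = nn.BatchNorm2d(channels)
          self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
          self.bn2 = nn.BatchNorm2d(channels)
          self.relu = nn.ReLU(inplace=True)

      def forward(self, x):
          out = self.relu(self.bn1(self.conv1(x)))
          out = self.bn2(self.conv2(out))
          return self.relu(out + x)  # add the shortcut before the final ReLU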

2. Contrastive Learning

  • MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]

  • SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]

  • CoCa: Contrastive Captioners are Image-Text Foundation Models, Yu et al., arxiv 2022. [paper][CoCa-pytorch][multimodal]

  • DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code][DINOv3]

  • FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]

  • InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
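
The methods above optimize variants of the InfoNCE objective from the last entry; a compact PyTorch sketch with in-batch negatives (the SimCLR-style setup) follows. MoCo instead draws its negatives from a momentum-encoded queue.

  import torch
  import torch.nn.functional as F

  def info_nce(queries, keys, temperature=0.07):
      # The i-th key is the positive for the i-th query; every other key
      # in the batch acts as a negative.
      q = F.normalize(queries, dim=-1)
      k = F.normalize(keys, dim=-1)
      logits = q @ k.t() / temperature                    # (B, B) similarities
      labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
      return F.cross_entropy(logits, labels)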

3. CV Application

4. Foundation Model

5. Generative Model (GAN and VAE)

6. Image Editing

  • InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code] (see the usage sketch at the end of this section)

  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]

  • DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]

  • DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]

  • DragAnything: Motion Control for Anything using Entity Representation, Wu et al., ECCV 2024. [paper][code][Framer][SG-I2V][Go-with-the-Flow]

  • LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]

  • Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]

  • MGIE: Guiding Instruction-based Image Editing via Multimodal Large Language Models, Fu et al., ICLR 2024 Spotlight. [paper][code]

  • PromptFix: You Prompt and We Fix the Photo, Yu et al., NeurIPS 2024. [paper][code]

  • MimicBrush: Zero-shot Image Editing with Reference Imitation, Chen et al., arxiv 2024. [paper][code][EchoMimic][echomimic_v2][echomimic_v3][BrushNet]

  • A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models, Shuai et al., arxiv 2024. [paper][code]

  • Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models, Atzmon et al., arxiv 2024. [paper]

  • MagicQuill: An Intelligent Interactive Image Editing System, Liu et al., CVPR 2025. [paper][code]

  • BrushEdit: All-In-One Image Inpainting and Editing, Li et al., arxiv 2024. [paper][code][DiffuEraser][PhotoDoodle][VideoPainter]

  • Step1X-Edit: A Practical Framework for General Image Editing, StepFun, arxiv 2025. [paper][code][SuperEdit][ICEdit][ImgEdit]

  • SeedEdit 3.0: Fast and High-Quality Generative Image Editing, Wang et al., arxiv 2025. [paper]

  • EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling, Luo et al., arxiv 2025. [paper][code]

  • [EditAnything][ComfyUI-UltraEdit-ZHO][libcom][Awesome-Image-Composition][RF-Solver-Edit][KV-Edit][HiDream-E1][VAREdit][Awesome-Image-Editing]
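
As a usage sketch for the InstructPix2Pix entry at the top of this section: the public checkpoint can be driven through the diffusers pipeline roughly as below; the input filename and instruction are placeholders, and a CUDA GPU is assumed.

  import torch
  from PIL import Image
  from diffusers import StableDiffusionInstructPix2PixPipeline

  pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
      "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
  ).to("cuda")

  image = Image.open("photo.jpg").convert("RGB")  # placeholder input image
  edited = pipe(
      "make it look like winter",                 # free-form edit instruction
      image=image,
      num_inference_steps=20,
      image_guidance_scale=1.5,  # higher values stay closer to the input image
  ).images[0]
  edited.save("edited.jpg")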

7. Object Detection

8. Semantic Segmentation

9. Video

10. Survey for CV

  • ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy, Vishniakov et al., arxiv 2023. [paper][code]
  • Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey, Xin et al., arxiv 2024. [paper][code]

Multimodal

1. Audio

2. BLIP

  • ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, Li et al., NeurIPS 2021. [paper][code]
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, Li et al., ICML 2022. [paper][code][laion-coco]
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, Li et al., ICML 2023. [paper][code] (see the usage sketch after this list)
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning, Dai et al., arxiv 2023. [paper][code]
  • X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning, Panagopoulou et al., arxiv 2023. [paper][code]
  • xGen-MM (BLIP-3): A Family of Open Large Multimodal Models, Xue et al., arxiv 2024. [paper][code]
  • xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations, Qin et al., arxiv 2024. [paper][code]
  • xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs, Ryoo et al., arxiv 2024. [paper]
  • BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset, Chen et al., arxiv 2025. [paper][code]
  • LAVIS: A Library for Language-Vision Intelligence, Li et al., arxiv 2022. [paper][code]
  • VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, Bao et al., NeurIPS 2022. [paper][code]
  • BEiT: BERT Pre-Training of Image Transformers, Bao et al., ICLR 2022 Oral. [paper][code]
  • BeiT-V3: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Wang et al., CVPR 2023. [paper][code]
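
For the BLIP-2 entry above, a short captioning sketch using the Hugging Face transformers port of the public Salesforce checkpoint; the image path is a placeholder, and a CUDA GPU is assumed.

  import torch
  from PIL import Image
  from transformers import Blip2Processor, Blip2ForConditionalGeneration

  processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
  model = Blip2ForConditionalGeneration.from_pretrained(
      "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
  ).to("cuda")

  image = Image.open("example.jpg")  # placeholder input image
  inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
  ids = model.generate(**inputs, max_new_tokens=30)
  print(processor.decode(ids[0], skip_special_tokens=True))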

3. CLIP

4. Diffusion Model

5. Multimodal LLM

6. Text2Image

7. Text2Video

8. Survey for Multimodal

9. Other

  • Fuyu-8B: A Multimodal Architecture for AI Agents, Bavishi et al., Adept blog 2023. [blog][model]
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning, Li et al., arxiv 2023. [paper][code]
  • OtterHD: A High-Resolution Multi-modality Model, Li et al., arxiv 2023. [paper][code][model]
  • CM3leon: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning, Yu et al., arxiv 2023. [paper][Unofficial Implementation]
  • MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer, Tian et al., arxiv 2024. [paper][code]
  • CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations, Qi et al., arxiv 2024. [paper][code]
  • SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models, Gao et al., arxiv 2024. [paper][code]
  • Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers, Gao et al., arxiv 2024. [paper][code][Lumina-Image-2.0][Lumina-DiMOO]
  • Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining, Liu et al., arxiv 2024. [paper][code][Lumina-Video][Lumina-mGPT 2.0]
  • LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper][code][X-Prompt]
  • SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation, Ge et al., arxiv 2024. [paper][code][SEED][SEED-Story][SEED-Bench-R1][AnimeGamer]

Reinforcement Learning

1. Basic for RL

2. LLM for Decision Making

  • Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al., NeurIPS 2021. [paper][code] (see the returns-to-go sketch after this list)
  • Trajectory Transformer: Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al., NeurIPS 2021. [paper][code]
  • Guiding Pretraining in Reinforcement Learning with Large Language Models, Du et al., ICML 2023. [paper][code]
  • Introspective Tips: Large Language Model for In-Context Decision Making, Chen et al., arxiv 2023. [paper]
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Chebotar et al., CoRL 2023. [paper][Unofficial Implementation]
  • Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, Cao et al., arxiv 2024. [paper]
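
The Decision Transformer entry above conditions action prediction on the return-to-go; a minimal PyTorch sketch of that target computation (the 1-D tensor layout is an assumption):

  import torch

  def returns_to_go(rewards):
      # rewards: (T,) per-step rewards of one trajectory.
      # R_t = r_t + r_{t+1} + ... + r_T; the model is trained on interleaved
      # (R_t, s_t, a_t) tokens to predict a_t, so conditioning on a high
      # target return at test time elicits a return-maximizing policy.
      return torch.flip(torch.cumsum(torch.flip(rewards, dims=[0]), dim=0), dims=[0])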

GNN

Survey for GNN


Transformer Architecture