This repository contains a curated list of papers and datasets that are devoted to research on Goal-oriented Prompt Engineering.
For more details, please refer to our survey paper Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey.
Our list covers a wide range of applications, including Arithmetic Reasoning, Commonsense Reasoning, Symbolic Reasoning, Logical Reasoning, Planning in Virtual/Real Environment, Multihop Question Answering, Open-domain Question Answering, Code Generation, Dialogue, and Recommendation.
Fig. 1: An overview of the goal-oriented framework for prompting LLMs, using a math word problem as an example. (1) Decomposing the goal into sub-goal sequences. (2) Selecting actions for attaining sub-goals. (3) Executing actions to get sub-goal results. (4) Evaluating sub-goal results. (5) Further selecting valuable sub-goals. Note that stages (2), (3), and (4) are applied to every decomposed sub-goal.
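The five stages in Fig. 1 can be read as a simple loop. The sketch below is only an illustration of that control flow, not the survey's implementation: the names `Step`, `solve`, the `toy_*` functions, and the `threshold` parameter are all hypothetical, and the toy functions are hard-coded stand-ins for what would be LLM prompts in a real system. Filtering by a score threshold is one assumed way to realise stage (5).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    sub_goal: str   # text of the sub-goal
    action: str     # action chosen for the sub-goal
    result: float   # outcome of executing the action
    score: float    # evaluation of that outcome


def solve(goal: str,
          decompose: Callable[[str], List[str]],
          select_action: Callable[[str], str],
          execute: Callable[[str, str], float],
          evaluate: Callable[[str, float], float],
          threshold: float = 0.5) -> List[Step]:
    """Run the five stages once and return the sub-goals judged valuable."""
    steps = []
    for sub_goal in decompose(goal):            # (1) goal decomposition
        action = select_action(sub_goal)        # (2) action selection
        result = execute(sub_goal, action)      # (3) action execution
        score = evaluate(sub_goal, result)      # (4) sub-goal result evaluation
        steps.append(Step(sub_goal, action, result, score))
    # (5) keep only sub-goals whose score reaches the (assumed) threshold
    return [s for s in steps if s.score >= threshold]


# Toy stand-ins (assumptions): a real system would prompt an LLM in each stage.
def toy_decompose(goal: str) -> List[str]:
    # Hard-coded sub-goals for the example word problem below.
    return ["3 * 4", "12 - 2"]


def toy_select_action(sub_goal: str) -> str:
    return "calculate"


def toy_execute(sub_goal: str, action: str) -> float:
    left, op, right = sub_goal.split()
    a, b = float(left), float(right)
    return a * b if op == "*" else a - b


def toy_evaluate(sub_goal: str, result: float) -> float:
    # A count of apples cannot be negative.
    return 1.0 if result >= 0 else 0.0


problem = "Tom has 3 boxes of 4 apples and eats 2. How many are left?"
kept = solve(problem, toy_decompose, toy_select_action, toy_execute, toy_evaluate)
for step in kept:
    print(step.sub_goal, "=", step.result)
```

Passing the stages in as callables keeps the loop independent of any particular prompting method, so each stage could be swapped for one of the approaches listed in this repository.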
Large Language Models (LLMs) have shown prominent performance in various downstream tasks, and prompt engineering plays a pivotal role in optimizing LLMs' performance. This paper not only provides an overview of current prompt engineering methods but also aims to highlight the limitation of designing prompts based on an anthropomorphic assumption that expects LLMs to think like humans. From our review of 50 representative studies, we demonstrate that a goal-oriented prompt formulation, which guides LLMs to follow established human logical thinking, significantly improves the performance of LLMs. Furthermore, we introduce a novel taxonomy that categorizes goal-oriented prompting methods into five interconnected stages, and we demonstrate the broad applicability of our framework. With four future directions proposed, we hope to further emphasize the power and potential of goal-oriented prompt engineering in all fields.
Please feel free to send a pull request to add papers and relevant content that are not listed here.
- CoT - Chain of thought prompting elicits reasoning in large language models
- Zero-shot Planner - Language models as zero-shot planners: Extracting actionable knowledge for embodied agents
- Self-consistency - Self-consistency improves chain of thought reasoning in language models
- Least-to-most Prompting - Least-to-most prompting enables complex reasoning in large language models
- Selection-Inference - Selection-inference: Exploiting large language models for interpretable logical reasoning
- DecomP - Decomposed prompting: A modular approach for solving complex tasks
- Self-ask - Measuring and narrowing the compositionality gap in language models
- Zero-shot CoT - Large Language Models are Zero-Shot Reasoners
- Program of Thoughts - Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks
- Successive Prompting - Successive prompting for decomposing complex questions
- Self-refine - Self-refine: Iterative refinement with self-feedback
- Reflexion - Reflexion: an autonomous agent with dynamic memory and self-reflection
- MCR - Answering questions by meta-reasoning over multiple chains of thought
- LLM+P - LLM+P: Empowering large language models with optimal planning proficiency
- PEARL - PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
- Plan-and-solve - Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models
- ToT - Tree of thoughts: Deliberate problem solving with large language models
- Toolformer - Toolformer: Language models can teach themselves to use tools
- MWP - Interpretable Math Word Problem Solution Generation Via Step-by-step Planning
- ProCoT - Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
- GDP-Zero - Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
- Self-debug - Teaching large language models to self-debug
- SayPlan - SayPlan: Grounding large language models using 3D scene graphs for scalable task planning
- DEPS - Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents
- GITM - Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory
- Re-prompting - Planning with large language models via corrective re-prompting
- HuggingGPT - HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face
- RecMind - RecMind: Large language model powered agent for recommendation
- GoT - Graph of thoughts: Solving elaborate problems with large language models
- SALP - Generating executable action plans with environmentally-aware language models
- RAP - Reasoning with language model is planning with world model
- SelfCheck - SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning
- RLP - Reflective linguistic programming (RLP): A stepping stone in socially-aware AGI (SocialAGI)
- Inner Monologue - Inner monologue: Embodied reasoning through planning with language models
- LLM-Planner - LLM-Planner: Few-shot grounded planning for embodied agents with large language models
- INTERVENOR - INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
- DOKE - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations
- InteRecAgent - Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Faithful CoT - Faithful Chain-of-Thought Reasoning
- RoT - Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models
- MathPrompter - MathPrompter: Mathematical Reasoning using Large Language Models
- PAL - PAL: Program-aided Language Models
- LINC - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
- Logic-LM - Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
- REFINER - REFINER: Reasoning Feedback on Intermediate Representations
- CRITIC - CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
- Verify-and-Edit - Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
- MAF - MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
- Cue-CoT - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
- SAFARI - Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogues
- GSM8K - Training Verifiers to Solve Math Word Problems
- SVAMP - Are NLP Models really able to Solve Simple Math Word Problems?
- ASDiv - A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
- AQuA - Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
- MAWPS - MAWPS: A Math Word Problem Repository
- AddSub - Learning to solve arithmetic word problems with verb categorization
- MultiArith - Solving general arithmetic word problems
- DROP - DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
- TabMWP - Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning
- FinQA - FinQA: A Dataset of Numerical Reasoning over Financial Data
- ConvFinQA - ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
- TATQA - TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
- SingleEq - Parsing algebraic word problems into equations
- MathQA - MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
- Game of 24 - Tree of thoughts: Deliberate problem solving with large language models
- CommonsenseQA - CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- StrategyQA - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
- BIG-bench - Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
- AI2 Reasoning Challenge - Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
- TEMPLAMA - Time-Aware Language Models as Temporal Knowledge Bases
- SQuAD - SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Google-RE - Language Models as Knowledge Bases?
- T-REx - T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
- Last Letter Concatenation - Chain-of-thought prompting elicits reasoning in large language models
- Coin Flip - Chain-of-thought prompting elicits reasoning in large language models
- K-th Letter Concatenation - Decomposed prompting: A modular approach for solving complex tasks
- bAbI - Towards AI-complete question answering: A set of prerequisite toy tasks
- ProofWriter - ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
- List Reversal - Decomposed prompting: A modular approach for solving complex tasks
- PrOntoQA - Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- FOLIO - FOLIO: Natural Language Reasoning with First-Order Logic
- VirtualHome - VirtualHome: Simulating household activities via programs
- ALFWorld - ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
- Blocksworld, Barman, Floortile, Grippers, Storage, Termes, Tyreworld - LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
- Home - SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
- Office - SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
- Minecraft - Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
- Simulated/Real Tabletop Rearrangement - Inner monologue: Embodied reasoning through planning with language models
- Mobile Manipulator in a Kitchen Setting - Inner monologue: Embodied reasoning through planning with language models
- CommaQA - Hey AI, can you solve complex tasks by talking to agents?
- 2WikiMultihopQA - Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
- MuSiQue - MuSiQue: Multihop Questions via Single-hop Question Composition
- HotpotQA - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Bamboogle - Measuring and Narrowing the Compositionality Gap in Language Models
- FERMI - How much coffee was consumed during EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI
- QuaRTz - QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions
- FEVER - FEVER: a large-scale dataset for Fact Extraction and VERification
- QuALITY QA - QuALITY: Question Answering with Long Input Texts, Yes!
- Web Questions - Semantic Parsing on Freebase from Question-Answer Pairs
- Natural Questions - Natural Questions: a Benchmark for Question Answering Research
- TriviaQA - TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
- MLQA - MLQA: Evaluating Cross-lingual Extractive Question Answering
- HumanEval - Evaluating Large Language Models Trained on Code
- MBPP - Program Synthesis with Large Language Models
- LeetcodeHard - Reflexion: Language Agents with Verbal Reinforcement Learning
- Spider - Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task
- TransCoder - Unsupervised translation of programming languages
- PIE - Learning performance-improving code edits
- Abg-CoQA - Abg-CoQA: Clarifying ambiguity in conversational question answering
- PACIFIC - PACIFIC: towards proactive conversational question answering over tabular and textual data in finance
- OTTers - OTTers: One-turn Topic Transitions for Open-Domain Dialogue
- TGConv - TopKG: Target-oriented Dialog via Global Planning on Knowledge Graph
- CraigslistBargain - Decoupling Strategy and Generation in Negotiation Dialogues
- PersuasionForGood - Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good
- FED - Unsupervised Evaluation of Interactive Dialog with DialoGPT
- WebShop - WebShop: Towards scalable real-world web interaction with grounded language agents
- Amazon Reviews - Justifying recommendations using distantly-labeled reviews and fine-grained aspects
- Yelp - Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
- Compositional Generalization - Least-to-most prompting enables complex reasoning in large language models
- Acronym Generation - Self-refine: Iterative refinement with self-feedback
- Sentiment Reversal - Self-refine: Iterative refinement with self-feedback
- Constrained Generation - Self-refine: Iterative refinement with self-feedback
- Mini Crosswords - Tree of thoughts: Deliberate problem solving with large language models
- Creative Writing - Tree of thoughts: Deliberate problem solving with large language models
- Sorting - Graph of thoughts: Solving elaborate problems with large language models
- Set Operations - Graph of thoughts: Solving elaborate problems with large language models
- Keyword Counting - Graph of thoughts: Solving elaborate problems with large language models
- Document Merging - Graph of thoughts: Solving elaborate problems with large language models
If you found this repository useful, please consider citing:
@article{li2024towards,
title={Towards Goal-oriented Large Language Model Prompting: A Survey},
author={Li, Haochen and Leung, Jonathan and Shen, Zhiqi},
journal={arXiv preprint arXiv:2401.14043},
year={2024}
}