Goal-oriented Prompt Engineering

This repository contains a curated list of papers and datasets that are devoted to research on Goal-oriented Prompt Engineering.

For more details, please refer to our survey paper Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey.

Our list covers a wide range of applications, including Arithmetic Reasoning, Commonsense Reasoning, Symbolic Reasoning, Logical Reasoning, Planning in Virtual/Real Environment, Multihop Question Answering, Open-domain Question Answering, Code Generation, Dialogue, and Recommendation.

Fig. 1: An overview of the goal-oriented framework for prompting LLMs, using a math word problem as an example. (1) Decomposing the goal into sub-goal sequences. (2) Selecting actions for attaining sub-goals. (3) Executing actions to get sub-goal results. (4) Evaluating sub-goal results. (5) Further selecting valuable sub-goals. Note that stages (2), (3), and (4) are applied to every decomposed sub-goal.

Large Language Models (LLMs) have shown prominent performance in various downstream tasks, and prompt engineering plays a pivotal role in optimizing their performance. This paper not only provides an overview of current prompt engineering methods but also highlights the limitation of designing prompts based on an anthropomorphic assumption that expects LLMs to think like humans. From our review of 50 representative studies, we demonstrate that a goal-oriented prompt formulation, which guides LLMs to follow established human logical thinking, significantly improves the performance of LLMs. Furthermore, we introduce a novel taxonomy that categorizes goal-oriented prompting methods into five interconnected stages, and we demonstrate the broad applicability of our framework. With four future directions proposed, we hope to further emphasize the power and potential of goal-oriented prompt engineering in all fields.
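The five stages can be read as a simple control loop. The sketch below is only an illustration of that loop on a toy math word problem, not code from the survey or from any listed paper. Every function name (`decompose`, `select_action`, `execute`, `evaluate`, `select_valuable`, `solve`) is hypothetical, and each stage is a hard-coded stub standing in for what would be an LLM prompt or tool call in a real system.

```python
import operator
from typing import Dict, List, Tuple

# Arithmetic operators the toy "calculator" action understands.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def decompose(goal: str) -> List[str]:
    """Stage 1: split the goal into ordered sub-goals (stub for an LLM call)."""
    return ["total = boxes * apples_per_box", "left = total - eaten"]


def select_action(sub_goal: str) -> str:
    """Stage 2: choose how to attain a sub-goal (stub: always the calculator)."""
    return "calculator"


def execute(action: str, sub_goal: str, state: Dict[str, float]) -> Tuple[str, float]:
    """Stage 3: run the chosen action and return (variable name, value)."""
    if action != "calculator":
        raise ValueError(f"unknown action: {action}")
    name, expr = [part.strip() for part in sub_goal.split("=")]
    left, op, right = expr.split()
    return name, OPS[op](state[left], state[right])


def evaluate(name: str, value: float) -> bool:
    """Stage 4: sanity-check a sub-goal result (stub: counts cannot be negative)."""
    return value >= 0


def select_valuable(results: List[Tuple[str, float, bool]]) -> List[Tuple[str, float]]:
    """Stage 5: keep only the sub-goal results that passed evaluation."""
    return [(name, value) for name, value, ok in results if ok]


def solve(goal: str, state: Dict[str, float]) -> List[Tuple[str, float]]:
    results = []
    for sub_goal in decompose(goal):            # stage 1
        action = select_action(sub_goal)        # stage 2
        name, value = execute(action, sub_goal, state)  # stage 3
        ok = evaluate(name, value)              # stage 4
        if ok:
            state[name] = value                 # later sub-goals may use this result
        results.append((name, value, ok))
    return select_valuable(results)             # stage 5


if __name__ == "__main__":
    facts = {"boxes": 3, "apples_per_box": 4, "eaten": 2}
    print(solve("How many apples are left?", facts))
    # prints [('total', 12), ('left', 10)]
```

In a real pipeline, each stub would be replaced by a prompt to a model or a call to an external tool, and stage 5 might rank or prune candidate sub-goals rather than apply a pass/fail filter.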

Please feel free to send a pull request to add papers and relevant content that are not listed here.

Papers

CoT - Chain of thought prompting elicits reasoning in large language models
Zero-shot Planner - Language models as zero-shot planners: Extracting actionable knowledge for embodied agents
Self-consistency - Self-consistency improves chain of thought reasoning in language models
Least-to-most Prompting - Least-to-most prompting enables complex reasoning in large language models
Selection-Inference - Selection-inference: Exploiting large language models for interpretable logical reasoning
DecomP - Decomposed prompting: A modular approach for solving complex tasks
Self-ask - Measuring and narrowing the compositionality gap in language models
Zero-shot CoT - Large Language Models are Zero-Shot Reasoners
Program of Thoughts - Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks
Successive Prompting - Successive prompting for decomposing complex questions
Self-refine - Self-refine: Iterative refinement with self-feedback
Reflexion - Reflexion: an autonomous agent with dynamic memory and self-reflection
MCR - Answering questions by meta-reasoning over multiple chains of thought
LLM+P - Llm+p: Empowering large language models with optimal planning proficiency
PEARL - PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
Plan-and-solve - Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models
ToT - Tree of thoughts: Deliberate problem solving with large language models
Toolformer - Toolformer: Language models can teach themselves to use tools
MWP - Interpretable Math Word Problem Solution Generation Via Step-by-step Planning
ProCoT - Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
GDP-Zero - Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
Self-debug - Teaching large language models to self-debug
SayPlan - Sayplan: Grounding large language models using 3d scene graphs for scalable task planning
DEPS - Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents
GITM - Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory
Re-prompting - Planning with large language models via corrective re-prompting
HuggingGPT - HuggingGPT: Solving ai tasks with ChatGPT and its friends in huggingface
Recmind - Recmind: Large language model powered agent for recommendation
GoT - Graph of thoughts: Solving elaborate problems with large language models
SALP - Generating executable action plans with environmentally-aware language models
RAP - Reasoning with language model is planning with world model
SelfCheck - SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning
RLP - Reflective linguistic programming (rlp): A stepping stone in socially-aware agi (socialagi)
Inner Monologue - Inner monologue: Embodied reasoning through planning with language models
LLM-Planner - LLM-Planner: Few-shot grounded planning for embodied agents with large language models
INTERVENOR - INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
DOKE - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations
InteRecAgent - Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
Faithful CoT - Faithful Chain-of-Thought Reasoning
RoT - Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models
MathPrompter - MathPrompter: Mathematical Reasoning using Large Language Models
PAL - PAL: Program-aided Language Models
LINC - LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
Logic-LM - Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
REFINER - REFINER: Reasoning Feedback on Intermediate Representations
CRITIC - CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Verify-and-Edit - Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
MAF - MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Cue-CoT - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
SAFARI - Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogues

Tasks

Arithmetic Reasoning

GSM8K - Training Verifiers to Solve Math Word Problems
SVAMP - Are NLP Models really able to Solve Simple Math Word Problems?
ASDiv - A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
AQuA - Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
MAWPS - MAWPS: A Math Word Problem Repository
AddSub - Learning to solve arithmetic word problems with verb categorization
MultiArith - Solving general arithmetic word problems
DROP - DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
TabMWP - Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning
FinQA - FinQA: A Dataset of Numerical Reasoning over Financial Data
ConvFinQA - ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
TATQA - TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
SingleEq - Parsing algebraic word problems into equations
MathQA - MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Game of 24 - Tree of thoughts: Deliberate problem solving with large language models

Commonsense Reasoning

CommonsenseQA - CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
StrategyQA - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
BIG-bench - Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
AI2 Reasoning Challenge - Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TEMPLAMA - Time-Aware Language Models as Temporal Knowledge Bases
SQuAD - SQuAD: 100,000+ Questions for Machine Comprehension of Text
Google-RE - Language Models as Knowledge Bases?
T-REx - T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples

Symbolic Reasoning

Last Letter Concatenation - Chain-of-thought prompting elicits reasoning in large language models
Coin Flip - Chain-of-thought prompting elicits reasoning in large language models
K-th Letter Concatenation - Decomposed prompting: A modular approach for solving complex tasks

Logical Reasoning

bAbI - Towards ai-complete question answering: A set of prerequisite toy tasks
ProofWriter - ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
List Reversal - Decomposed prompting: A modular approach for solving complex tasks
PrOntoQA - Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
FOLIO - FOLIO: Natural Language Reasoning with First-Order Logic

Planning

VirtualHome - Virtualhome: Simulating household activities via programs
ALFWorld - ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Blocksworld, Barman, Floortile, Grippers, Storage, Termes, Tyreworld - LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Home - SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
Office - SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
Minecraft - Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory
Simulated/Real Tabletop Rearrangement - Inner monologue: Embodied reasoning through planning with language models
Mobile Manipulator in a Kitchen Setting - Inner monologue: Embodied reasoning through planning with language models

Multihop Question Answering

CommaQA - Hey AI, can you solve complex tasks by talking to agents?
2WikiMultihopQA - Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
MuSiQue - MuSiQue: Multihop Questions via Single-hop Question Composition
HotpotQA - HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Bamboogle - Measuring and Narrowing the Compositionality Gap in Language Models
FERMI - How much coffee was consumed during EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI
QuaRTz - QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions
FEVER - FEVER: a large-scale dataset for Fact Extraction and VERification
QuALITY QA - QuALITY: Question Answering with Long Input Texts, Yes!

Open-domain Question Answering

Web Questions - Semantic Parsing on Freebase from Question-Answer Pairs
Natural Questions - Natural Questions: a Benchmark for Question Answering Research
TriviaQA - TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
MLQA - MLQA: Evaluating Cross-lingual Extractive Question Answering

Code Generation

HumanEval - Evaluating Large Language Models Trained on Code
MBPP - Program Synthesis with Large Language Models
LeetcodeHard - Reflexion: Language Agents with Verbal Reinforcement Learning
Spider - Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task
TransCoder - Unsupervised translation of programming languages
PIE - Learning performance-improving code edits

Dialogue

Abg-coqa - Abg-coqa: Clarifying ambiguity in conversational question answering
PACIFIC - PACIFIC: towards proactive conversational question answering over tabular and textual data in finance
OTTers - OTTers: One-turn Topic Transitions for Open-Domain Dialogue
TGConv - TopKG: Target-oriented Dialog via Global Planning on Knowledge Graph
CraigslistBargain - Decoupling Strategy and Generation in Negotiation Dialogues
PersuasionForGood - Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good
FED - Unsupervised Evaluation of Interactive Dialog with DialoGPT

Recommendation

WebShop - Webshop: Towards scalable real-world web interaction with grounded language agents
Amazon Reviews - Justifying recommendations using distantly-labeled reviews and fine-grained aspects
Yelp - Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)

Misc

Compositional Generalization - Least-to-most prompting enables complex reasoning in large language models
Acronym Generation - Self-refine: Iterative refinement with self-feedback
Sentiment Reversal - Self-refine: Iterative refinement with self-feedback
Constrained Generation - Self-refine: Iterative refinement with self-feedback
Mini Crosswords - Tree of thoughts: Deliberate problem solving with large language models
Creative Writing - Tree of thoughts: Deliberate problem solving with large language models
Sorting - Graph of thoughts: Solving elaborate problems with large language models
Set Operations - Graph of thoughts: Solving elaborate problems with large language models
Keyword Counting - Graph of thoughts: Solving elaborate problems with large language models
Document Merging - Graph of thoughts: Solving elaborate problems with large language models

Citation

If you found this repository useful, please consider citing:

@article{li2024towards,
  title={Towards Goal-oriented Large Language Model Prompting: A Survey},
  author={Li, Haochen and Leung, Jonathan and Shen, Zhiqi},
  journal={arXiv preprint arXiv:2401.14043},
  year={2024}
}
