human-feedback

Here are 17 public repositories matching this topic...

lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

reinforcement-learning deep-learning transformers artificial-intelligence attention-mechanisms human-feedback

Updated Oct 11, 2025
Python

conceptofmind / LaMDA-rlhf-pytorch

Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

machine-learning reinforcement-learning deep-learning transformers artificial-intelligence attention-mechanism human-feedback

Updated Feb 24, 2024
Python

yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

reinforcement-learning diffusion-models human-feedback

Updated Apr 6, 2024
Python

wxjiao / ParroT

The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human written translation and evaluation data.

machine-translation llama lora contrastive gpt-4 chatgpt human-feedback instruction-tuning bloomz error-guided

Updated Dec 31, 2024
Python

trubrics / trubrics-python

Product analytics for AI Assistants

machine-learning mlops streamlit ml-monitoring llm human-feedback llmops model-feedback

Updated May 26, 2025
Python

JD-GenX / Reliable_AD

[ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback

image-generation advertising datasets diffusion diffusion-models diffusers human-feedback rlhf eccv2024

Updated Nov 8, 2024
Python

davidberenstein1957 / dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

evaluation data-collection data-quality human-feedback

Updated Sep 5, 2024
Python

gao-g / prelude

Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

transformers alignment user-feedback edits interpretability preference-learning gpt4 llm llms human-feedback

Updated Nov 23, 2024
Python

ZiyiZhang27 / tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

reinforcement-learning alignment text-to-image diffusion-models stable-diffusion human-feedback rlhf

Updated Jul 12, 2024
Python

AlaaLab / pathologist-in-the-loop

[NeurIPS 2023] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

synthetic-data human-feedback rlhf pathology-images

Updated Oct 19, 2023
Python

wang8740 / MAP

Documentation at

finetuning llm human-feedback rlhf human-value-alignment multi-objective-alignment

Updated Mar 27, 2025
Python

victor-iyi / rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

reinforcment-learning human-feedback rlhf

Updated Jun 14, 2023
Python

RapidataAI / crowd-eval

Break out of the AI training bubble

machine-learning asyncronous wandb human-feedback training-loop crowd-evaluation rapidata

Updated Jul 15, 2025
Python

CogniSeeker / REBCAT

REactive Behavior Constraint-Aware Tree learning (REBCAT): a human-robot collaboration framework for learning tasks from demonstrations. Interpretable, fast, object-centric, and reactive.

behavior-trees decision-tree-classifier learning-from-demonstration task-planning interpretable-ai reactive-systems object-properties catboost-classifier human-feedback manipulation-tasks action-ordering

Updated May 29, 2025
Python

Yousifus / rlhf_loop_humain

RLHF Loop System - Learning project with monitoring dashboard, drift detection, and AI feedback analysis built with Claude's assistance

python machine-learning typescript reinforcement-learning ai dashboard transformers openai model-calibration batch-processing bert real-time-monitoring drift-detection streamlit openai-api human-feedback rlhf lmstudio deepseek

Updated Jul 10, 2025
Python

sunwang-ai-linguist / bilingual-rlhf-semantic-repair-corpus

Daily Mandarin-English semantic alignment corpus for RLHF training, tone repair, AI metaphor translation, and OpenAI contributor tracking. #SamPickMe #RLHF #TSMC

openai bilingual bilingual-corpora crosslingual semantic-alignment crosslingual-transfer chatgpt human-feedback rlhf sam-altman gpt-training tone-correction

Updated May 22, 2025
Python

Dylsimple60 / RLHF_learn

🤖 Enhance reinforcement learning stability and efficiency with advanced algorithms like TRPO, PPO, DPO, GRPO, DAPO, and GSPO for optimized policy training.

Updated Jan 9, 2026
Python
