Add difficulty-controlled Sample-N preference generation pipeline #209

Open
EMZEDI wants to merge 1 commit into main from feat/sample-n-difficulty-pipeline

EMZEDI (Collaborator) commented May 14, 2026

Summary

This PR implements an opt-in difficulty-controlled preference data generation pipeline for AIF-Gen using:

  • RLCD-style persona-conditioned candidate prompting
  • HelpSteer2-style rubric scoring
  • West-of-N-style margin-based pair selection

The pipeline runs in three stages, Sample-N -> Score -> Select-by-Margin, and introduces two primary knobs:

  • n_candidates (cost/coverage)
  • target_margin (0 hardest, 1 easiest)
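
To make the Select-by-Margin stage concrete, here is a minimal sketch of how a target margin could pick a pair from N scored candidates. It is not the PR's `select_pair(...)`: the helper name `pick_pair_by_margin` is hypothetical, scores are assumed to lie in [0, 1], and the min/max margin, length-ratio, and persona tie-break filters are left out.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def pick_pair_by_margin(
    scores: List[float], target_margin: float
) -> Optional[Tuple[int, int]]:
    """Return (chosen_idx, rejected_idx) whose score gap is closest to target_margin.

    Hypothetical helper for illustration only. Assumes scores are in [0, 1],
    so a target near 0 yields near-tied (hard) pairs and a target near 1
    yields widely separated (easy) pairs.
    """
    best: Optional[Tuple[int, int]] = None
    best_gap = float("inf")
    for i, j in combinations(range(len(scores)), 2):
        # Order the pair so the higher-scored candidate is the "chosen" one.
        hi, lo = (i, j) if scores[i] >= scores[j] else (j, i)
        margin = scores[hi] - scores[lo]
        gap = abs(margin - target_margin)
        if gap < best_gap:
            best, best_gap = (hi, lo), gap
    # None when fewer than two candidates were scored.
    return best
```

Under these assumptions, raising `n_candidates` gives the selector more pairs to choose from, so the selected margin can land closer to the target at the cost of more sampling and scoring calls.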

Legacy behavior is preserved when no generation config is provided.

What was added

  • New pair selector module:

    • aif_gen/generate/mappers/pair_selector.py
    • ScoredCandidate dataclass
    • select_pair(...) with:
      • target margin control
      • min/max margin filters
      • length-ratio filter
      • persona-aware tie-break
  • Response mapper enhancements:

    • generate_candidate_prompt(...) with personas:
      • aligned
      • anti_aligned
      • neutral
    • default_persona_schedule(...)
    • exported persona constants in aif_gen/generate/mappers/__init__.py
  • Rubric judge additions in aif_gen/validation/llm_judge.py:

    • _get_rubric_judge_prompt(...)
    • _RubricResponse schema
    • _get_rubric_score(...)
    • weighted aggregation defaults:
      • preference_adherence: 0.6
      • objective_fidelity: 0.25
      • coherence: 0.15
  • Generation engine refactor (aif_gen/generate/engine.py):

    • generate_continual_dataset(..., generation_config=None)
    • new sampled path _generate_sample_sampled(...)
    • parallel candidate sampling + rubric scoring + margin selection
    • dedicated rubric cache index when sampled pipeline is enabled
    • legacy path unchanged when generation_config is absent
  • CLI wiring (aif_gen/cli/commands/generate.py):

    • added flags:
      • --n-candidates
      • --target-margin
    • reads optional YAML generation: block
    • CLI values override YAML values
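
The default rubric weights listed above (0.6, 0.25, 0.15) sum to 1, so the aggregate can be read as a weighted mean of the three axes. The sketch below shows one way that aggregation might look. The function name `aggregate_rubric` and the division by the weight sum are assumptions, not taken from `_get_rubric_score(...)`, and per-axis scores are assumed to share a common scale.

```python
from typing import Dict, Optional

# Default weights as stated in the PR description.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "preference_adherence": 0.6,
    "objective_fidelity": 0.25,
    "coherence": 0.15,
}


def aggregate_rubric(
    axis_scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-axis rubric scores into one scalar (hypothetical helper).

    Dividing by the weight sum keeps the result on the same scale as the
    inputs even when custom weights do not sum to 1.
    """
    w = weights if weights is not None else DEFAULT_WEIGHTS
    total = sum(w.values())
    if total <= 0:
        raise ValueError("rubric weights must have a positive sum")
    # A missing axis raises KeyError rather than silently scoring zero.
    return sum(w[axis] * axis_scores[axis] for axis in w) / total
```

With these defaults, a candidate that fully satisfies the preference but fails the other two axes would still score 0.6, which reflects how heavily the aggregate leans on preference adherence.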

Tests

  • Added:
    • test/test_generate/test_pair_selector.py
  • Extended:
    • test/test_generate/test_response_mapper.py

Validation run:

  • pytest test/ -x -q -> 197 passed

Notes

  • Existing generation configs remain backward compatible.
  • Sampled pipeline is opt-in via CLI flags or YAML generation: block.
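
The opt-in and override rules above can be sketched as a small merge step. This is an illustration only: `resolve_generation_config` is a hypothetical name, the YAML `generation:` block is represented as an already-parsed dict, and its keys are assumed to match the knob names `n_candidates` and `target_margin`.

```python
from typing import Any, Dict, Optional


def resolve_generation_config(
    yaml_block: Optional[Dict[str, Any]],
    n_candidates: Optional[int] = None,
    target_margin: Optional[float] = None,
) -> Optional[Dict[str, Any]]:
    """Merge CLI values over the YAML block (hypothetical helper).

    Returns None when neither source supplies a value, which stands in for
    "no generation config", i.e. the legacy path.
    """
    merged: Dict[str, Any] = dict(yaml_block or {})
    cli = {"n_candidates": n_candidates, "target_margin": target_margin}
    # Compare against None explicitly: target_margin=0.0 is a valid setting
    # (hardest pairs) and must not be dropped as falsy.
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged or None
```

The explicit `is not None` check matters here because a truthiness test would discard `--target-margin 0`, the hardest setting.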

Implement RLCD-style persona-conditioned candidate sampling, rubric-based scoring, and West-of-N margin pair selection.

- Add pair selector with margin/length constraints
- Add persona prompt generation and default schedule
- Add rubric judge prompt/schema/scoring helper
- Add opt-in generation_config path in engine
- Add CLI knobs (--n-candidates, --target-margin) with YAML generation block support
- Add/extend tests for selector and response mapper
- Preserve legacy generation behavior when generation config is absent

Francodbalso self-requested a review May 15, 2026 17:59