Commit 6dd02fc

feat(mcqa): Add custom answer extraction via template_metadata to support STEM MCQA dataset (#128)

Adds support for custom answer extraction in the MCQA resources server via the optional `template_metadata.output_regex` field. This enables handling STEM datasets with custom prompt formats that don't match the standard grading modes.

Signed-off-by: Pritam Gundecha <pgundecha@nvidia.com>
1 parent eb676a5 commit 6dd02fc

9 files changed: +489 −80 lines changed


.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -240,4 +240,6 @@ outputs
 
 # Environment with sensitive information like API keys
 env.yaml
+
+# Backup files
 *.backup
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -85,7 +85,7 @@ NeMo Gym includes a curated collection of resource servers for training and eval
 | instruction_following | Instruction Following | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>resources_servers/instruction_following/configs/instruction_following.yaml</a> | Apache 2.0 | Train, Example |
 | instruction_following | Multineedle | <a href='resources_servers/multineedle/configs/multineedle.yaml'>resources_servers/multineedle/configs/multineedle.yaml</a> | Apache 2.0 | Train, Validation, Example |
 | knowledge | Equivalence Llm Judge | <a href='resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml'>resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml</a> | None | Example, Example |
-| knowledge | Mcqa | <a href='resources_servers/mcqa/configs/mcqa.yaml'>resources_servers/mcqa/configs/mcqa.yaml</a> | Apache 2.0 | Train, Example |
+| knowledge | Mcqa | <a href='resources_servers/mcqa/configs/mcqa.yaml'>resources_servers/mcqa/configs/mcqa.yaml</a> | Apache 2.0 | Train, Example, Example |
 | math | Library Judge Math | <a href='resources_servers/library_judge_math/configs/bytedtsinghua_dapo17k.yaml'>resources_servers/library_judge_math/configs/bytedtsinghua_dapo17k.yaml</a> | Apache 2.0 | Train, Validation |
 | math | Library Judge Math | <a href='resources_servers/library_judge_math/configs/dapo17k.yaml'>resources_servers/library_judge_math/configs/dapo17k.yaml</a> | Apache 2.0 | Train, Validation |
 | math | Library Judge Math | <a href='resources_servers/library_judge_math/configs/library_judge_math.yaml'>resources_servers/library_judge_math/configs/library_judge_math.yaml</a> | Creative Commons Attribution 4.0 International | Train, Validation, Example |
```

resources_servers/mcqa/README.md

Lines changed: 82 additions & 48 deletions
```diff
@@ -5,17 +5,21 @@ Verifies multiple-choice QA (MCQA) model outputs.
 It consumes agent trajectories and returns a reward based on whether the assistant’s final output matches the gold answer.
 
 ### Input schema
+Required fields:
 - `responses_create_params`: OpenAI Responses create params
-  - Use only a user message with the question and options (e.g., “A: … B: …”).
-- `metadata` (dataset format):
-  - `options` (required): list of dicts mapping a single letter to the option text, e.g. `[{"A": "Option_Text"}, {"B": "..."}]`.
-  - `expected_answer` (required): the gold letter (single character). Must be one of the letters present in `metadata.options`.
-  - `prompt_type` (required): must be `"mcqa"`.
+  - Use only a user message with the question and options (e.g., "A: … B: …").
+- `options` (required): List of dicts mapping a single letter to option text, e.g. `[{"A": "Option_Text"}, {"B": "..."}]`
+- `expected_answer` (required): The gold letter (single character). Must be one of the letters present in `options`
 
-Notes
-- Letters are validated against the keys present in `metadata.options`.
-- While most datasets use A–D, any letter set is supported as long as it matches the provided options.
-- Legacy support: top-level `options` and `expected_answer` are still accepted for backward compatibility, but the dataset format above is preferred.
+Optional fields:
+- `grading_mode`: Answer extraction method (default: `"strict_single_letter_boxed"`)
+- `template_metadata`: Custom regex pattern for answer extraction (see below)
+- `uuid`: Unique identifier for the question
+- `metadata`: Optional arbitrary metadata (not used for grading)
+
+Notes:
+- Letters are validated against the keys present in `options`
+- While most datasets use A–D, any letter set is supported as long as it matches the provided options
 
 ### Grading modes
 - `strict_single_letter_boxed` (default)
```
````diff
@@ -28,11 +32,31 @@ Notes
 - `lenient_answer_colon`
   - Extracts content after `Answer:` (case-insensitive).
   - If it is a single allowed letter, use it.
-  - Otherwise, if it exactly equals (after normalization) one options text, use that letter.
+  - Otherwise, if it exactly equals (after normalization) one option's text, use that letter.
   - Example: `options = [{"A": "Circle"}, {"B": "Square"}]`. This will match `Answer: B` or `Answer: Square`.
   - Legacy from NeMo-RL
 
-### Example dataset row (dataset format)
+### Custom answer extraction (template_metadata) - OPTIONAL
+For datasets with custom prompt formats, you can optionally use `template_metadata` with a custom regex pattern.
+
+**Note:** If you don't need custom formats, see `data/example.jsonl` for standard usage with `grading_mode` only.
+
+- `template_metadata.output_regex`: Custom regex pattern to extract the answer letter
+  - **Optional field** - use only if you need custom answer formats
+  - Takes **priority** over `grading_mode` when present
+  - Case-insensitive matching (IGNORECASE flag)
+  - Uses rightmost (last) match if multiple matches exist
+  - Gracefully falls back to `grading_mode` if regex is invalid
+
+**Example formats supported:**
+- `"Option Selected: B"` → regex: `Option Selected:\s*([A-Za-z])`
+- `"Final Choice: C"` → regex: `Final Choice:\s*([A-Za-z])`
+- `"ANSWER IS D"` → regex: `ANSWER IS\s*([A-Za-z])`
+- `"Answer: B"` (plain) → regex: `Answer\s*:\s*(?!Answer)\s*([A-Za-z])`
+
+**Priority order:** `template_metadata.output_regex` (if present) → `grading_mode` (default)
+
+### Example dataset row (standard format)
 ```json
 {
   "responses_create_params":
````
````diff
@@ -41,51 +65,49 @@ Notes
     [
       {
         "role": "user",
-        "content": "You should output your final response letter inside \\boxed{} and nothing else You can first think step-by-step. Which of the following genetic tests is used to identify the presence of a specific mutation associated with cystic fibrosis?\nA: Karyotyping\nB: Polymerase Chain Reaction (PCR)\nC: Whole-genome sequencing\nD: Chromosome painting\nE: Restriction Fragment Length Polymorphism (RFLP) analysis\nF: Southern blotting\nG: Microarray analysis\nH: Fluorescence in situ hybridization (FISH)\nI: Enzyme-linked immunosorbent assay (ELISA)\nJ: Methylation-specific PCR"
+        "content": "You should output your final response letter inside \\boxed{} and nothing else You can first think step-by-step. Which of the following genetic tests is used to identify the presence of a specific mutation associated with cystic fibrosis?\nA: Karyotyping\nB: Polymerase Chain Reaction (PCR)\n..."
       }
     ]
   },
-  "options":
-  [
-    {
-      "A": "Karyotyping"
-    },
-    {
-      "B": "Polymerase Chain Reaction (PCR)"
-    },
-    {
-      "C": "Whole-genome sequencing"
-    },
-    {
-      "D": "Chromosome painting"
-    },
-    {
-      "E": "Restriction Fragment Length Polymorphism (RFLP) analysis"
-    },
-    {
-      "F": "Southern blotting"
-    },
-    {
-      "G": "Microarray analysis"
-    },
-    {
-      "H": "Fluorescence in situ hybridization (FISH)"
-    },
-    {
-      "I": "Enzyme-linked immunosorbent assay (ELISA)"
-    },
-    {
-      "J": "Methylation-specific PCR"
-    }
-  ],
+  "options": [{"A": "Karyotyping"}, {"B": "Polymerase Chain Reaction (PCR)"}, ...],
   "expected_answer": "B",
   "grading_mode": "strict_single_letter_boxed",
   "uuid": "3c26f339-4b88-54be-b72a-e9c438ca6335"
 }
 ```
 
+### Example with template_metadata (custom format)
+```json
+{
+  "responses_create_params":
+  {
+    "input":
+    [
+      {
+        "role": "user",
+        "content": "Which genetic test identifies cystic fibrosis mutations?\nA: Karyotyping\nB: PCR\n...\n\nChoose the correct option.\nConclude with \"ANSWER IS X\" on the final line."
+      }
+    ]
+  },
+  "options": [{"A": "Karyotyping"}, {"B": "PCR"}, ...],
+  "expected_answer": "B",
+  "grading_mode": "strict_single_letter_boxed",
+  "template_metadata":
+  {
+    "output_regex": "ANSWER IS\\s*([A-Za-z])\\s*",
+    "template_id": "mcqa_generated_019",
+    "prompt_type": "generated",
+    "format_type": "mcqa"
+  },
+  "uuid": "eb07c826-fed5-57f8-bee6-bb29e099069d"
+}
+```
+
+**Note:** Example files in `data/example_with_template_metadata.jsonl` use simulated `reward_profiles` for demonstration purposes.
+
 ### Example of rollouts and usage
 
+**Standard format (with `grading_mode`):**
 ```bash
 config_paths="responses_api_agents/simple_agent/configs/simple_agent.yaml,\
 responses_api_models/openai_model/configs/openai_model.yaml,\
````
````diff
@@ -107,6 +129,16 @@ ng_collect_rollouts \
   +output_jsonl_fpath=data/MCQA_filtered_decontaminated_samples_rollouts.jsonl +limit=5
 ```
 
+**With template_metadata (custom regex):**
+```bash
+# Using example file with 5 different custom prompt formats
+ng_collect_rollouts \
+  +agent_name=simple_agent \
+  +input_jsonl_fpath=resources_servers/mcqa/data/example_with_template_metadata.jsonl \
+  +output_jsonl_fpath=resources_servers/mcqa/data/example_rollouts_with_template_metadata.jsonl \
+  +limit=5
+```
+
 Rollout example
 
 ```json
````
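The commands above consume JSONL rows shaped like the two examples earlier in this README diff. A pre-flight row check is cheap — a minimal sketch using only the standard library, not a utility shipped by this commit (the row literal is abridged; `responses_create_params` and `uuid` are omitted):

```python
import json
import re

# Abridged from the template_metadata example row above.
row = {
    "options": [{"A": "Karyotyping"}, {"B": "PCR"}],
    "expected_answer": "B",
    "grading_mode": "strict_single_letter_boxed",
    "template_metadata": {"output_regex": "ANSWER IS\\s*([A-Za-z])\\s*"},
}

# The gold letter must be one of the letters present in `options`.
letters = {k.upper() for entry in row["options"] for k in entry}
assert row["expected_answer"].upper() in letters

# An invalid regex makes the server fall back to grading_mode;
# compiling it up front surfaces the problem before a rollout run.
re.compile(row["template_metadata"]["output_regex"], re.IGNORECASE)

print(json.dumps(row))  # one line of the input JSONL
```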
````diff
@@ -231,9 +263,11 @@ Rollout example
 ```
 
 ### Implementation notes
-- The server extracts the last assistant message’s text from the Responses output.
-- Letters are validated against the provided `metadata.options` keys (or legacy top-level if present).
-- For `lenient_boxed`, only boxed content is considered; it must match exactly one option’s text after normalization.
+- The server extracts the last assistant message's text from the Responses output.
+- Letters are validated against the provided `options` keys.
+- For `lenient_boxed`, only boxed content is considered; it must match exactly one option's text after normalization.
+- **template_metadata priority**: When `template_metadata.output_regex` is present, it takes priority over `grading_mode` for answer extraction.
+- **Backward compatibility**: Existing datasets without `template_metadata` continue to work using `grading_mode`.
 
 
 ## Licensing information
````
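The notes above say option text must match "after normalization", but the normalizer itself (`_normalize_for_match`) is defined outside this diff. A hypothetical sketch of what such a normalizer typically does — an assumption for illustration, not the actual implementation:

```python
import re


def normalize_for_match(s: str) -> str:
    """Hypothetical stand-in: lowercase, strip punctuation, collapse whitespace."""
    s = s.lower().strip()
    s = re.sub(r"[^\w\s]", "", s)   # drop punctuation
    return re.sub(r"\s+", " ", s)   # collapse runs of whitespace


# Under this assumption, these two spellings compare equal:
print(normalize_for_match("Polymerase   Chain Reaction (PCR)"))  # polymerase chain reaction pcr
print(normalize_for_match("polymerase chain reaction PCR"))      # polymerase chain reaction pcr
```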

resources_servers/mcqa/app.py

Lines changed: 86 additions & 31 deletions
```diff
@@ -45,6 +45,8 @@ class MCQARunRequest(BaseRunRequest):
         "lenient_boxed",
         "lenient_answer_colon",
     ] = "strict_single_letter_boxed"
+    # Template metadata with custom regex support
+    template_metadata: Optional[dict[str, Any]] = None
 
 
 class MCQAVerifyRequest(MCQARunRequest, BaseVerifyRequest):
```
```diff
@@ -151,6 +153,54 @@ def _match_option_text(text: str, options: list[dict[str, str]], allowed_letters
     return None
 
 
+def _parse_answer_with_custom_regex(
+    text: str, regex_pattern: str, allowed_letters: set[str], options: Optional[list[dict[str, str]]]
+) -> Optional[str]:
+    """Parse answer using custom regex from template_metadata.
+
+    Uses rightmost (last) match to handle reasoning before final answer.
+    Case-insensitive matching to handle capitalization variations.
+
+    When using template_metadata with custom regex, we trust the regex pattern
+    and allow extracted letters even if options metadata is incomplete.
+    """
+    try:
+        # Use IGNORECASE flag and findall to get all matches
+        matches = re.findall(regex_pattern, text, re.IGNORECASE)
+        if not matches:
+            return None
+
+        # Take the LAST match (rightmost)
+        captured = matches[-1].strip().upper()
+
+        # Try direct letter match first
+        if len(captured) == 1 and captured.isalpha():
+            # If we have options metadata, validate against it
+            if allowed_letters and captured in allowed_letters:
+                return captured
+            # If options metadata is missing/incomplete, trust the regex
+            # This handles cases where template_metadata regex is used but options are incomplete
+            elif not allowed_letters:
+                return captured
+            # If captured letter is not in allowed_letters but allowed_letters exists,
+            # it might be a data quality issue - still return it when using template_metadata
+            else:
+                # Trust the regex when using template_metadata (this function is only called for template_metadata)
+                return captured
+
+        # Try matching against option text (normalized)
+        normalized_captured = _normalize_for_match(captured)
+        for entry in options or []:
+            for k, v in entry.items():
+                if k.upper() in allowed_letters and _normalize_for_match(v) == normalized_captured:
+                    return k.upper()
+
+        return None
+    except re.error:
+        # Invalid regex pattern, return None
+        return None
+
+
 class MCQAResourcesServer(SimpleResourcesServer):
     config: MCQAResourcesServerConfig
 
```
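Because the hunk above contains the entire helper, it can be exercised standalone; the single-letter path never reaches `_normalize_for_match`, so no other module internals are needed. A sketch with made-up inputs — illustrative, not part of the commit:

```python
# Assumes _parse_answer_with_custom_regex from the hunk above is in scope.
text = "Let me reason it out... ANSWER IS c\nFinal check: ANSWER IS D"
pred = _parse_answer_with_custom_regex(
    text, r"ANSWER IS\s*([A-Za-z])", allowed_letters={"A", "B", "C", "D"}, options=None
)
print(pred)  # -> "D": rightmost match wins, upper-cased

# An invalid pattern raises re.error internally and yields None,
# so the caller falls back to grading_mode.
assert _parse_answer_with_custom_regex("ANSWER IS D", r"([", {"D"}, None) is None
```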
```diff
@@ -167,39 +217,44 @@ async def verify(self, body: MCQAVerifyRequest) -> MCQAVerifyResponse:
 
         pred: Optional[str] = None
 
-        if body.grading_mode == "strict_single_letter_boxed":
-            pred, _, _ = _parse_answer_letter_strict_boxed(text, allowed_letters)
-        elif body.grading_mode == "lenient_boxed":
-            # Try strict boxed first
-            pred, _, _ = _parse_answer_letter_strict_boxed(text, allowed_letters)
-            if pred is None:
-                # Then try to match option text inside boxed content only
-                letter_from_text = _match_option_text(text, options, allowed_letters)
-                if letter_from_text is not None:
-                    pred = letter_from_text
-        elif body.grading_mode == "lenient_answer_colon":
-            # Look for Answer: <...>
-            m = ANSWER_COLON_PATTERN.search(text)
-            if m:
-                candidate = _strip_latex_wrappers(m.group(1)).strip()
-                # Letter case
-                if len(candidate) == 1 and candidate.isalpha():
-                    letter_up = candidate.upper()
-                    if letter_up in allowed_letters:
-                        pred = letter_up
-                # Option text equality (normalized)
-                if pred is None:
-                    cand_norm = _normalize_for_match(candidate)
-                    for entry in options or []:
-                        for k, v in entry.items():
-                            k_up = k.upper()
-                            if k_up in allowed_letters and _normalize_for_match(v) == cand_norm:
-                                pred = k_up
-                                break
-                        if pred is not None:
-                            break
-        else:
-            pred = None
+        # Check for template_metadata first (highest priority)
+        if body.template_metadata and "output_regex" in body.template_metadata:
+            regex_pattern = body.template_metadata["output_regex"]
+            pred = _parse_answer_with_custom_regex(text, regex_pattern, allowed_letters, options)
+
+        # Fallback to existing grading_mode logic if template_metadata didn't work
+        if pred is None:
+            if body.grading_mode == "strict_single_letter_boxed":
+                pred, _, _ = _parse_answer_letter_strict_boxed(text, allowed_letters)
+            elif body.grading_mode == "lenient_boxed":
+                # Try strict boxed first
+                pred, _, _ = _parse_answer_letter_strict_boxed(text, allowed_letters)
+                if pred is None:
+                    # Then try to match option text inside boxed content only
+                    letter_from_text = _match_option_text(text, options, allowed_letters)
+                    if letter_from_text is not None:
+                        pred = letter_from_text
+            elif body.grading_mode == "lenient_answer_colon":
+                # Look for Answer: <...>
+                m = ANSWER_COLON_PATTERN.search(text)
+                if m:
+                    candidate = _strip_latex_wrappers(m.group(1)).strip()
+                    # Letter case
+                    if len(candidate) == 1 and candidate.isalpha():
+                        letter_up = candidate.upper()
+                        if letter_up in allowed_letters:
+                            pred = letter_up
+                    # Option text equality (normalized)
+                    if pred is None:
+                        cand_norm = _normalize_for_match(candidate)
+                        for entry in options or []:
+                            for k, v in entry.items():
+                                k_up = k.upper()
+                                if k_up in allowed_letters and _normalize_for_match(v) == cand_norm:
+                                    pred = k_up
+                                    break
+                            if pred is not None:
+                                break
 
         gold = (expected_answer or "").strip().upper()
         is_correct = (pred == gold) if (pred is not None and gold) else False
```
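The restructuring above boils down to one priority chain: custom regex first, `grading_mode` only if that yields nothing. A condensed standalone paraphrase — hypothetical names throughout, and the boxed pattern here is a stand-in for the real `strict_single_letter_boxed` parser defined elsewhere in app.py:

```python
import re
from typing import Optional


def extract_letter(text: str, template_metadata: Optional[dict],
                   allowed_letters: set[str]) -> Optional[str]:
    """Condensed paraphrase of verify(): custom regex first, then fallback."""
    # 1) template_metadata.output_regex takes priority; like the real helper,
    #    a single captured letter is trusted without an allowed-letters check.
    if template_metadata and "output_regex" in template_metadata:
        try:
            matches = re.findall(template_metadata["output_regex"], text, re.IGNORECASE)
            if matches:
                letter = matches[-1].strip().upper()
                if len(letter) == 1 and letter.isalpha():
                    return letter
        except re.error:
            pass  # invalid pattern: fall through to the grading-mode path
    # 2) Stand-in for the default strict_single_letter_boxed parser
    #    (hypothetical pattern; the real one lives elsewhere in app.py).
    boxed = re.findall(r"\\boxed\{([A-Za-z])\}", text)
    if boxed and boxed[-1].upper() in allowed_letters:
        return boxed[-1].upper()
    return None


print(extract_letter("thinking... ANSWER IS B",
                     {"output_regex": r"ANSWER IS\s*([A-Za-z])"}, {"A", "B"}))  # B
print(extract_letter(r"final: \boxed{C}", None, {"A", "B", "C"}))               # C
```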

resources_servers/mcqa/configs/mcqa.yaml

Lines changed: 3 additions & 0 deletions
```diff
@@ -25,3 +25,6 @@ mcqa_simple_agent:
     - name: example
       type: example
       jsonl_fpath: resources_servers/mcqa/data/example.jsonl
+    - name: example_with_template_metadata
+      type: example
+      jsonl_fpath: resources_servers/mcqa/data/example_with_template_metadata.jsonl
```
