`miniwob.find-greatest` can be marked successful after selecting a non-maximum card

## Summary

There appears to be a success-threshold bug in the BrowserGym MiniWoB wrapper for `miniwob.find-greatest`.

The task text says:

```text
Find and pick the card with the greatest number, then press submit.
```

However, a run can be marked successful after selecting a card that is not the greatest card.

In the attached concrete run, the page had cards with values `4`, `7`, and `2`. The agent selected the card with value `2` and submitted. The final DOM still shows another card with value `7`, so this should not satisfy the task. The native evaluator nevertheless reports `success: true`.

## Why this happens

In MiniWoB++ `find-greatest.html`, selecting the correct card ends with reward `1.0`, but selecting the wrong card and submitting still ends the episode with a positive partial reward:

```js
if(userIndex === expectedIndex.toString()) core.endEpisode(1.0, true);
else core.endEpisode(0.1, true);
```

In BrowserGym's MiniWoB wrapper, `AbstractMiniwobTask.validate()` converts any positive raw reward into a full success reward:

```python
reward = float(info["RAW_REWARD_GLOBAL"] > 0)  # TODO: shouldn't it be 0.5?
```

As a result, the wrong-card partial reward `0.1` becomes `reward: 1.0` and `success: true` in the run output.
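The effect can be reproduced in isolation with a minimal sketch. `current_reward` is a hypothetical helper that copies only the quoted expression, not the full `validate()` method, and the two inputs are the raw rewards from the `find-greatest.html` branches quoted above:

```python
def current_reward(raw_reward_global: float) -> float:
    """Hypothetical stand-in for the quoted wrapper expression.

    Any strictly positive raw reward is mapped to 1.0.
    """
    return float(raw_reward_global > 0)


# 1.0 is the correct-card branch, 0.1 is the wrong-card branch.
for raw in (1.0, 0.1):
    print(raw, "->", current_reward(raw))
# 1.0 -> 1.0
# 0.1 -> 1.0
```

Both branches collapse to the same reward, so the wrapper output alone cannot distinguish a correct submission from a wrong one.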

## Actual behavior in the attached run

The attached `evidence/native_evaluator_output_agent_a.json` shows:

```json
"RAW_REWARD_GLOBAL": 0.1,
"reward": 1.0,
"success": true
```

It also shows the agent actions:

```json
"action": "click('17')"
"action": "click('20')"
```

The attached final DOM snapshot `evidence/final_dom_step_002_agent_a.html` shows that bid `17` is the revealed selected card, with value `2`:

```html
<div class="card" data-index="2" ... bid="17">
  <span class="card-value" ...>2</span>
</div>
```

The same DOM snapshot also shows another card with value `7`:

```html
<div class="card hidden" data-index="1" ... bid="15">
  <span class="card-value" ...>7</span>
</div>
```

So the submitted card was not the greatest card, but the run was still reported as successful.
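This check can be scripted against a DOM snapshot. The sketch below is illustrative only: `CardParser` is a hypothetical helper, the sample markup is a simplified copy of the two excerpts above with the elided attributes left out, and it assumes the selected card is the one without the `hidden` class.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Collect bid, hidden state, and value for each card (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._current = None
        self._in_value = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "card" in classes:
            self._current = {
                "bid": attrs.get("bid"),
                "hidden": "hidden" in classes,
                "value": None,
            }
            self.cards.append(self._current)
        elif tag == "span" and "card-value" in classes and self._current is not None:
            self._in_value = True

    def handle_data(self, data):
        if self._in_value and data.strip():
            self._current["value"] = int(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_value = False


# Simplified sample: only the two cards quoted in this issue.
SAMPLE = """
<div class="card" data-index="2" bid="17">
  <span class="card-value">2</span>
</div>
<div class="card hidden" data-index="1" bid="15">
  <span class="card-value">7</span>
</div>
"""

parser = CardParser()
parser.feed(SAMPLE)

# Assumption: the revealed (non-hidden) card is the submitted one.
selected = [card for card in parser.cards if not card["hidden"]]
greatest = max(card["value"] for card in parser.cards)
selected_is_greatest = len(selected) == 1 and selected[0]["value"] == greatest

print(selected_is_greatest)  # False: the selected value 2 is below 7
```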

## Expected behavior

For `find-greatest`, BrowserGym should only report success when the selected card is actually the greatest card.

Possible fixes:

- change the MiniWoB success threshold so partial positive rewards such as `0.1` do not count as success, for example `RAW_REWARD_GLOBAL > 0.5` or `RAW_REWARD_GLOBAL >= 1.0`; or
- add task-specific handling where partial rewards are not treated as benchmark success; or
- if the intended fix belongs in MiniWoB++, make wrong-card submission produce a non-success raw reward rather than `0.1`.

The issue seems most directly triggered by BrowserGym treating any `RAW_REWARD_GLOBAL > 0` as success.
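The first option could look roughly like the sketch below. `reward_from_raw` and its `threshold` parameter are hypothetical names, not existing BrowserGym API, and the default of `0.5` is simply the value hinted at by the `TODO` comment in the wrapper:

```python
def reward_from_raw(raw_reward_global: float, threshold: float = 0.5) -> float:
    """Hypothetical replacement: success only when the raw reward exceeds `threshold`."""
    return float(raw_reward_global > threshold)


# Correct card, wrong card, and a zero raw reward.
for raw in (1.0, 0.1, 0.0):
    print(raw, "->", reward_from_raw(raw))
# 1.0 -> 1.0
# 0.1 -> 0.0
# 0.0 -> 0.0
```

Whether `> 0.5` or `>= 1.0` is the better cutoff depends on whether any other MiniWoB task can return a raw reward below `1.0` for a correct solution; that would need to be checked across tasks before changing the shared wrapper.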

## Attached evidence package

- `source/browsergym_miniwob_base.py`: BrowserGym MiniWoB wrapper showing `reward = float(info["RAW_REWARD_GLOBAL"] > 0)`.
- `source/find-greatest.html`: MiniWoB++ task source showing the task text and wrong-card `core.endEpisode(0.1, true)` branch.
- `evidence/native_evaluator_output_agent_a.json`: run output showing `RAW_REWARD_GLOBAL=0.1`, `reward=1.0`, and `success=true`.
- `evidence/final_dom_step_002_agent_a.html`: final DOM showing selected card value `2` while another card value is `7`.
- `evidence/final_screenshot_step_002_agent_a.png`: final screenshot for the same run.
- `evidence/validation_final_agent_a.json` and `evidence/task_info_final_agent_a.json`: final validator/task state.
- `evidence/miniwob_scores_flat.csv`: score table; row 170 contains this `miniwob.find-greatest / Agent A` run.

[find_greatest_agent_a_wrong_card_success_issue.zip](https://github.com/user-attachments/files/27905607/find_greatest_agent_a_wrong_card_success_issue.zip)

