Microsoft-fara

mirror of https://github.com/microsoft/fara.git synced 2026-06-10 02:54:01 +08:00

corby 2ec67d7236 Add WebTailBench rubric comparison visualizer

Standalone HTML page that shows, for each WebTailBench task, the
rubric criteria produced by three different judge configurations
side by side:

  1. O4-Mini Rubric                      - historical baseline
  2. GPT-5 (v1)                          - original GPT-5 judge
  3. Universal Verifier Rubric (GPT-5.2) - current release

Tasks are grouped by WebTailBench benchmark category. The page
supports incremental search across task ids, summaries, and criteria,
plus an "all three rubrics only" toggle. The header links directly to
the microsoft/WebTailBench dataset on Hugging Face and to the
WebTailBench-v1-rubrics.tsv download so readers can grab the
underlying data.
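
The grouping, search, and toggle behaviour described above can be
sketched in a few lines. This is a minimal illustration only, not the
code in the HTML page: the field names (id, category, summary, and the
three rubric keys), the helper names, and the sample rows are all
hypothetical, and the toggle is assumed to hide tasks that lack
criteria from any of the three judge configurations.

```python
from collections import defaultdict

# Hypothetical field names; the real page data and TSV may use different ones.
RUBRIC_KEYS = ("o4_mini", "gpt5_v1", "universal_verifier")


def has_all_rubrics(task):
    """True when every judge configuration produced at least one criterion."""
    return all(task.get(key) for key in RUBRIC_KEYS)


def matches(task, query):
    """Case-insensitive substring match over id, summary, and all criteria."""
    needle = query.strip().lower()
    if not needle:
        return True  # an empty search box matches every task
    haystack = [task["id"], task["summary"]]
    for key in RUBRIC_KEYS:
        haystack.extend(task.get(key) or [])
    return any(needle in text.lower() for text in haystack)


def filter_and_group(tasks, query="", all_three_only=False):
    """Return {category: [task, ...]} for the tasks that pass both filters."""
    groups = defaultdict(list)
    for task in tasks:
        if all_three_only and not has_all_rubrics(task):
            continue
        if matches(task, query):
            groups[task["category"]].append(task)
    return dict(groups)


# Made-up sample rows, for illustration only.
SAMPLE_TASKS = [
    {"id": "flights_1", "category": "flights", "summary": "Find a flight",
     "o4_mini": ["Finds a flight"], "gpt5_v1": ["Reports price"],
     "universal_verifier": ["Reports airline"]},
    {"id": "hotels_1", "category": "hotels", "summary": "Book a hotel",
     "o4_mini": ["Finds a hotel"], "gpt5_v1": [],
     "universal_verifier": ["Reports nightly rate"]},
]
```

Calling filter_and_group(SAMPLE_TASKS, all_three_only=True) keeps only
the sample task that has criteria under all three keys, and passing a
query narrows the result further. Re-running the function on every
keystroke is one simple way to get incremental search.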

Source layout:
  docs/webtailbench_rubric_comparison.html  (single self-contained file)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-16 21:12:47 -07:00

webtailbench_rubric_comparison.html