Microsoft-fara

mirror of https://github.com/microsoft/fara.git synced 2026-06-10 02:54:01 +08:00

Commit Graph

Author	SHA1	Message	Date

corby

2ec67d7236

Add WebTailBench rubric comparison visualizer

Standalone HTML page that shows, for each WebTailBench task, the
rubric criteria produced by three different judge configurations
side by side:

  1. O4-Mini Rubric                    — historical baseline
  2. GPT-5 (v1)                        — original GPT-5 judge
  3. Universal Verifier Rubric (GPT-5.2) — current release

Tasks are grouped by WebTailBench benchmark category, with
incremental search across ids / summaries / criteria, and an
"all three rubrics only" toggle. The header links directly to the
microsoft/WebTailBench dataset on Hugging Face and to the
WebTailBench-v1-rubrics.tsv download so readers can grab the
underlying data.

Source layout:
  docs/webtailbench_rubric_comparison.html  (single self-contained file)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-16 21:12:47 -07:00

1 Commit
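
The commit message describes three behaviors of the visualizer: grouping tasks by benchmark category, incremental search across ids / summaries / criteria, and an "all three rubrics only" toggle. The page's actual code is not shown here, so the following is only a minimal Python sketch of that filtering logic. The `Task` record, its field names, the `JUDGES` tuple, and the function names are all assumptions made for illustration; they are not taken from docs/webtailbench_rubric_comparison.html or from WebTailBench-v1-rubrics.tsv.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Judge labels as listed in the commit message. Using them as dictionary
# keys is an assumption; the real page's internal keys are not shown.
JUDGES = (
    "O4-Mini Rubric",
    "GPT-5 (v1)",
    "Universal Verifier Rubric (GPT-5.2)",
)


@dataclass
class Task:
    """Hypothetical record for one WebTailBench task."""
    task_id: str
    category: str
    summary: str
    # Maps judge label -> rubric criteria; a judge may be missing.
    rubrics: Dict[str, List[str]] = field(default_factory=dict)


def matches(task: Task, query: str) -> bool:
    """Case-insensitive substring match over id, summary, and criteria."""
    q = query.strip().lower()
    if not q:
        return True  # an empty search box matches every task
    haystack = [task.task_id, task.summary]
    for criteria in task.rubrics.values():
        haystack.extend(criteria)
    return any(q in text.lower() for text in haystack)


def has_all_rubrics(task: Task) -> bool:
    """True when every judge configuration has at least one criterion."""
    return all(task.rubrics.get(judge) for judge in JUDGES)


def filter_and_group(
    tasks: List[Task], query: str = "", all_three_only: bool = False
) -> Dict[str, List[Task]]:
    """Apply the toggle and the search, then group survivors by category."""
    grouped: Dict[str, List[Task]] = defaultdict(list)
    for task in tasks:
        if all_three_only and not has_all_rubrics(task):
            continue
        if matches(task, query):
            grouped[task.category].append(task)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up sample data, not real WebTailBench tasks.
    sample = [
        Task("t1", "shopping", "Buy running shoes",
             {judge: ["Item added to cart"] for judge in JUDGES}),
        Task("t2", "flights", "Book a one-way flight",
             {"GPT-5 (v1)": ["Correct departure date selected"]}),
    ]
    for category, items in filter_and_group(sample, all_three_only=True).items():
        print(category, [t.task_id for t in items])
```

Re-running `filter_and_group` on every keystroke would give the incremental search described above. A browser implementation would more likely be written in JavaScript inside the single HTML file; Python is used here only to keep the sketch short and testable.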