From 273614e6f4e0b0c013c51c40c1740f1606503780 Mon Sep 17 00:00:00 2001
From: Corby Rosset
Date: Mon, 20 Apr 2026 18:13:41 -0400
Subject: [PATCH] Update WebTailBench results and success rate definition

Added revised WebTailBench numbers and clarified process success definition.
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 282a8ff..aad5098 100644
--- a/README.md
+++ b/README.md
@@ -152,6 +152,8 @@ We are releasing **[WebTailBench](https://huggingface.co/datasets/microsoft/WebT
 
 #### WebTailBench Results (Process / Outcome)
 
+UPDATE: we release revised WebTailBench numbers from the new Universal Verifier below, reporting both process- and outcome-based success rates. Process success is defined as an assessed rubric score of at least 80%.
+
 | Task Segment | Tasks | SoM GPT-5 | SoM o3 | SoM 4o | GLM-4.1V-9B | OAI Comp-Use | UI-TARS-1.5 | **Fara-7B** |
 |----------------|-------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
 | **Single-Site Tasks** |