Refine Fara-7B description in README

Removed redundancy in the description of Fara-7B's visual operation.
Hussein Mozannar
2025-11-28 14:30:31 -05:00
committed by GitHub
parent 21469308d6
commit ccdc3def6e

@@ -49,8 +49,7 @@ Notes:
 Unlike traditional chat models that generate text-based responses, Fara-7B leverages computer interfaces—mouse and keyboard—to perform multi-step tasks on behalf of users. The model:
-- **Operates visually** by perceiving webpages and taking actions like scrolling, typing, and clicking on directly predicted coordinates
-- **Uses the same modalities as humans** to interact with computers—no accessibility trees or separate parsing models required
+- **Operates visually** by perceiving webpages and taking actions like scrolling, typing, and clicking on directly predicted coordinates without accessibility trees or separate parsing models
 - **Enables on-device deployment** due to its compact 7B parameter size, resulting in reduced latency and improved privacy as user data remains local
 - **Completes tasks efficiently**, averaging only ~16 steps per task compared to ~41 for comparable models