Mirror of https://github.com/microsoft/fara.git, synced 2026-06-10 02:54:01 +08:00
Refine Fara-7B description in README
Removed redundancy in the description of Fara-7B's visual operation.
@@ -49,8 +49,7 @@ Notes:
 Unlike traditional chat models that generate text-based responses, Fara-7B leverages computer interfaces—mouse and keyboard—to perform multi-step tasks on behalf of users. The model:
-- **Operates visually** by perceiving webpages and taking actions like scrolling, typing, and clicking on directly predicted coordinates
-- **Uses the same modalities as humans** to interact with computers—no accessibility trees or separate parsing models required
+- **Operates visually** by perceiving webpages and taking actions like scrolling, typing, and clicking on directly predicted coordinates without accessibility trees or separate parsing models
 - **Enables on-device deployment** due to its compact 7B parameter size, resulting in reduced latency and improved privacy as user data remains local
 - **Completes tasks efficiently**, averaging only ~16 steps per task compared to ~41 for comparable models