blob: 3beaa1e5e86cb685d858927bf961bad49eb1362c [file] [log] [blame] [view]
# ChromeOS FaceGaze
FaceGaze (publicly named "Face control") is a ChromeOS accessibility feature
that allows users to control the cursor with their head and perform various
actions using facial gestures.
## Summary
### User flow
FaceGaze can be enabled either in the accessibility quick settings menu or in
the ChromeOS settings app under the route Accessibility > Cursor and touchpad >
Face control. Once FaceGaze is enabled, the face recognition model and backing
web assembly will be downloaded via DLC (downloadable content). When the
download succeeds, the face model gets initialized and the webcam is turned on.
The user can then move the cursor with their head and perform actions with
facial gestures. When recognized, gestures and their associated actions will be
posted to the FaceGaze bubble UI, which is a floating UI component positioned at
the top of the display.
FaceGaze has several actions that temporarily put FaceGaze into a different
state. Examples include enter/exit scroll mode, start/end long click, pause/
resume FaceGaze, and start/stop Dictation. When scroll mode is active, for
example, head movements will not move the mouse but instead be used to determine
a scroll direction. When FaceGaze is in an alternate state, it will be
communicated via the bubble UI.
Note that if the DLC download fails, FaceGaze will automatically turn off and a
notification will be shown with a failure message.
### Technical overview
FaceGaze is implemented primarily as a Chrome extension in TypeScript. It also
has a few browser-side components (DLC hook and APIs), as well as ash-side
components (bubble UI). The high-level components of the feature are:
1. The Chrome extension, which is where most of the logic lives
2. A hook in the extension to connect to the device's webcam
3. An ML model, called [FaceLandmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker),
which processes video frames and returns results containing the location of all
relevant face points, confidences for facial gestures, and the amount of head
rotation. This is the technology that makes FaceGaze possible.
4. Extension APIs to update the cursor position, send synthetic mouse and key
events, and interact with the FaceGaze bubble in the browser (among other
things)
5. The ash-side implementation for the bubble UI
6. Settings page implementation, where users can configure their cursor
settings and update their gesture-to-action bindings
Once FaceGaze is initialized, here's a high-level flow of how it responds to a
single camera frame:
1. FaceGaze will grab the latest frame from the webcam feed
2. The frame is forwarded to the FaceLandmarker, which returns a raw result with
face points, gesture confidences, and head rotation
3. FaceGaze will further interpret this result and convert facial gestures to
actions (called "macros" in the code) depending on the user's preferences
4. FaceGaze will update the mouse location, perform actions, and update the floating bubble UI
5. The above process is repeated many times per second to give the user a
feeling of responsiveness, e.g. mouse movement responds quickly to head movement
As mentioned above, FaceGaze utilizes a [DLC](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/chromiumos-overlay/app-accessibility/facegaze-assets/)
to supply the FaceLandmarker model and the backing web assembly.
### Accessing the webcam feed
FaceGaze utilizes the [webRTC API](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API),
specifically the [ImageCapture API](https://developer.mozilla.org/en-US/docs/Web/API/ImageCapture)
to grab video frames and pass them to the FaceLandmarker model.
## Code structure
The majority of FaceGaze code lives in the [facegaze/](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/resources/chromeos/accessibility/accessibility_common/facegaze/) extension directory. Settings code lives in
[chrome/browser/resources/ash/settings/os_a11y_page](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/resources/ash/settings/os_a11y_page/).
Code for the bubble UI lives in [ash/system/accessibility](https://source.chromium.org/chromium/chromium/src/+/main:ash/system/accessibility/).
### FaceGaze extension classes
The `facegaze/` extension directory contains several noteworthy classes:
* `FaceGaze`, which is the main object. It handles setup/teardown, interacts
with APIs like chrome.settingsPrivate, and owns the other essential classes.
* `WebCamFaceLandmarker`, which requests the DLC download, initializes the
FaceLandmarker API, starts the webcam, continually passes frames from the video
stream into the FaceLandmarker while the video stream is active, and returns
results to the main `FaceGaze` object.
* `GestureDetector`, which computes which gestures were detected, filtering
out those with low confidence scores. It also transforms raw gestures into
ones supported by FaceGaze; for example, FaceGaze doesn't support "blink left
eye" and "blink right eye" individually. Instead, it supports a compound
"blink eyes" gesture.
* `GestureHandler`, which does additional processing of FaceLandmarker results
and converts recognized gestures into executable macros.
* `MouseController`, which similarly processes FaceLandmarker results to convert
recognized face points and rotation into a new cursor location. This class also
contains logic to smooth cursor movement so that the user gets natural cursor
movements instead of jumpy cursor movements.
* `ScrollModeController`, which gives users scroll functionality with FaceGaze.
* `BubbleController`, which controls all interaction with the FaceGaze bubble
UI.
### FaceGaze ash-side classes
* `FaceGazeBubbleController` manages the FaceGaze UI from ash and provides an
entry point for updating/changing the UI.
* `FaceGazeBubbleView` is the actual implementation of the FaceGaze UI.
### FaceGaze browser-side classes
* `AccessibilityManager` contains logic for setting up/tearing down the
extension, forwarding requests and results for DLC downloads, and showing
notifications to the user.
* `AccessibilityDlcInstaller` performs the install of the facegaze-assets DLC
and passes the contents through to the extension.
* `DragEventRewriter` is a common class that helps implement drag and drop for
Autoclick and FaceGaze. While the class is active, all mouse movement events
will be rewritten into mouse drag events.
### FaceGaze settings
TODO
### Testing
* See the `facegaze/` extension directory for all extension tests.
* See [facegaze_browsertest.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/accessibility/facegaze_browsertest.cc)
for C++ integration tests. Note that these tests hook into a JavaScript class
called `FaceGazeTestSupport` and allows the C++ tests to execute JavaScript or
wait for information to propagate to the extension side before continuing.
[facegaze_test_utils.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/accessibility/facegaze_test_utils.cc)
contains test support for writing tests.
* See [facegaze.go](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/a11y/facegaze/facegaze.go)
which provides infrastructure for FaceGaze in tast. Also see [idle_perf.go](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/bundles/cros/ui/idle_perf.go),
which runs FaceGaze idly for ten minutes and collects performance metrics across
many different types of physical devices.