 # ChromeOS FaceGaze

 FaceGaze (publicly named "Face control") is a ChromeOS accessibility feature
 that allows users to control the cursor with their head and perform various
 actions using facial gestures.

 ## Summary

 ### User flow

 FaceGaze can be enabled either in the accessibility quick settings menu or in
 the ChromeOS settings app under the route Accessibility > Cursor and touchpad >
 Face control. Once FaceGaze is enabled, the face recognition model and backing
 WebAssembly will be downloaded via DLC (downloadable content). When the
 download succeeds, the face model is initialized and the webcam is turned on.
 The user can then move the cursor with their head and perform actions with
 facial gestures. When recognized, gestures and their associated actions will be
 posted to the FaceGaze bubble UI, which is a floating UI component positioned at
 the top of the display.

 FaceGaze has several actions that temporarily put FaceGaze into a different
 state. Examples include enter/exit scroll mode, start/end long click, pause/
 resume FaceGaze, and start/stop Dictation. When scroll mode is active, for
 example, head movements will not move the mouse but instead be used to determine
 a scroll direction. When FaceGaze is in an alternate state, the bubble UI
 indicates which state is active.
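
To illustrate how an alternate state changes the meaning of head movement, here is a minimal sketch that routes the same head delta either to a cursor move or to a scroll direction. Every name in it (`Mode`, `interpretHeadDelta`, `toggleScrollMode`, `SCROLL_DEAD_ZONE`) is hypothetical and not taken from the FaceGaze source. The dead-zone value, the behavior while paused, and the screen-style axis convention (positive `dy` means down) are assumptions.

```typescript
// Hypothetical sketch: not the actual FaceGaze implementation.
type Mode = 'normal' | 'scroll' | 'paused';

type ScrollDirection = 'up' | 'down' | 'left' | 'right';

type HeadOutput =
  | { kind: 'moveCursor'; dx: number; dy: number }
  | { kind: 'scroll'; direction: ScrollDirection }
  | { kind: 'none' };

// Assumed dead zone: smaller head deltas are ignored while in scroll mode.
const SCROLL_DEAD_ZONE = 5;

function interpretHeadDelta(mode: Mode, dx: number, dy: number): HeadOutput {
  if (mode === 'paused') {
    // Assumption: head movement does nothing while paused.
    return { kind: 'none' };
  }
  if (mode === 'normal') {
    return { kind: 'moveCursor', dx: dx, dy: dy };
  }
  // Scroll mode: the dominant axis decides the scroll direction.
  if (Math.abs(dx) < SCROLL_DEAD_ZONE && Math.abs(dy) < SCROLL_DEAD_ZONE) {
    return { kind: 'none' };
  }
  if (Math.abs(dx) > Math.abs(dy)) {
    return { kind: 'scroll', direction: dx > 0 ? 'right' : 'left' };
  }
  return { kind: 'scroll', direction: dy > 0 ? 'down' : 'up' };
}

// A toggle action flips between normal and scroll mode.
function toggleScrollMode(mode: Mode): Mode {
  if (mode === 'scroll') return 'normal';
  if (mode === 'normal') return 'scroll';
  return mode; // Paused: this sketch ignores the toggle.
}
```

Modeling the state as a single `Mode` value keeps the per-frame code free of scattered boolean flags, and the same value could be used to decide what the bubble UI displays.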

 Note that if the DLC download fails, FaceGaze will automatically turn off and a
 notification will be shown with a failure message.

 ### Technical overview

 FaceGaze is implemented primarily as a Chrome extension in TypeScript. It also
 has a few browser-side components (DLC hook and APIs), as well as ash-side
 components (bubble UI). The high-level components of the feature are:

 1. The Chrome extension, which is where most of the logic lives
 2. A hook in the extension to connect to the device's webcam
 3. An ML model, called [FaceLandmarker](https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker),
 which processes video frames and returns results containing the location of all
 relevant face points, confidences for facial gestures, and the amount of head
 rotation. This is the technology that makes FaceGaze possible.
 4. Extension APIs to update the cursor position, send synthetic mouse and key
 events, and interact with the FaceGaze bubble in the browser (among other
 things)
 5. The ash-side implementation for the bubble UI
 6. Settings page implementation, where users can configure their cursor
 settings and update their gesture-to-action bindings

 Once FaceGaze is initialized, here's a high-level flow of how it responds to a
 single camera frame:

 1. FaceGaze will grab the latest frame from the webcam feed
 2. The frame is forwarded to the FaceLandmarker, which returns a raw result with
 face points, gesture confidences, and head rotation
 3. FaceGaze will further interpret this result and convert facial gestures to
 actions (called "macros" in the code) depending on the user's preferences
 4. FaceGaze will update the mouse location, perform actions, and update the floating bubble UI
 5. The above process is repeated many times per second to give the user a
 feeling of responsiveness, e.g. mouse movement responds quickly to head movement
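
The per-frame flow above can be sketched as a single function over injected components. This is an illustrative outline only: the `FramePipeline` interface, its method names, and the shape of `LandmarkerResult` are hypothetical and do not mirror the real extension classes or the FaceLandmarker result type. The sketch is synchronous for simplicity.

```typescript
// Hypothetical sketch of the per-frame flow; all names are illustrative.
interface LandmarkerResult {
  gestureConfidences: Record<string, number>; // gesture name -> confidence
  headRotation: { yaw: number; pitch: number };
}

interface FramePipeline<Frame> {
  grabFrame(): Frame;                                     // step 1
  detect(frame: Frame): LandmarkerResult;                 // step 2
  toMacros(result: LandmarkerResult): string[];           // step 3
  moveCursor(rotation: { yaw: number; pitch: number }): void; // step 4
  runMacro(name: string): void;                           // step 4
  updateBubble(text: string): void;                       // step 4
}

// Handles exactly one camera frame.
function processFrame<Frame>(pipeline: FramePipeline<Frame>): void {
  const frame = pipeline.grabFrame();
  const result = pipeline.detect(frame);
  const macros = pipeline.toMacros(result);

  pipeline.moveCursor(result.headRotation);
  for (const macro of macros) {
    pipeline.runMacro(macro);
  }
  if (macros.length > 0) {
    pipeline.updateBubble(macros.join(', '));
  }
}

// Step 5: a caller would invoke processFrame repeatedly, for example from a
// timer, so that the cursor tracks head movement closely.
```

Passing the components in as an interface makes the ordering of the steps easy to test with fakes, without a camera or model.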

 As mentioned above, FaceGaze utilizes a [DLC](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/chromiumos-overlay/app-accessibility/facegaze-assets/)
 to supply the FaceLandmarker model and the backing WebAssembly.

 ### Accessing the webcam feed

 FaceGaze uses the [WebRTC API](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API),
 specifically the [ImageCapture API](https://developer.mozilla.org/en-US/docs/Web/API/ImageCapture),
 to grab video frames and pass them to the FaceLandmarker model.
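
As a rough sketch of the frame-grabbing step, the code below pulls frames one at a time from an object with a `grabFrame()` method, which is the method the browser's `ImageCapture` exposes. The `FrameGrabber` interface and the `pumpFrames` helper are hypothetical. They are written against a generic frame type so the sketch can run outside a browser, and skipping a failed grab is an assumption rather than FaceGaze's documented behavior.

```typescript
// Structural stand-in for the browser's ImageCapture, whose grabFrame()
// resolves to an ImageBitmap. Generic so the sketch runs without a browser.
interface FrameGrabber<Frame> {
  grabFrame(): Promise<Frame>;
}

// Hypothetical helper: grabs `count` frames in sequence, hands each one to
// `onFrame`, and returns how many frames were delivered.
async function pumpFrames<Frame>(
    grabber: FrameGrabber<Frame>,
    count: number,
    onFrame: (frame: Frame) => void): Promise<number> {
  let delivered = 0;
  for (let i = 0; i < count; i++) {
    let frame: Frame;
    try {
      frame = await grabber.grabFrame();
    } catch (_err) {
      // Assumption: a single failed grab is skipped rather than fatal.
      continue;
    }
    onFrame(frame);
    delivered++;
  }
  return delivered;
}

// In a browser, a grabber could be obtained roughly like this (not run here):
//   const stream = await navigator.mediaDevices.getUserMedia({video: true});
//   const track = stream.getVideoTracks()[0];
//   const grabber = new ImageCapture(track);
```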

 ## Code structure

 The majority of FaceGaze code lives in the [facegaze/](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/resources/chromeos/accessibility/accessibility_common/facegaze/)
 extension directory. Settings code lives in
 [chrome/browser/resources/ash/settings/os_a11y_page](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/resources/ash/settings/os_a11y_page/).
 Code for the bubble UI lives in [ash/system/accessibility](https://source.chromium.org/chromium/chromium/src/+/main:ash/system/accessibility/).

 ### FaceGaze extension classes

 The `facegaze/` extension directory contains several noteworthy classes:

 * `FaceGaze`, which is the main object. It handles setup/teardown, interacts
 with APIs like chrome.settingsPrivate, and owns the other essential classes.

 * `WebCamFaceLandmarker`, which requests the DLC download, initializes the
 FaceLandmarker API, starts the webcam, continually passes frames from the video
 stream into the FaceLandmarker while the video stream is active, and returns
 results to the main `FaceGaze` object.

 * `GestureDetector`, which computes which gestures were detected, filtering
 out those with low confidence scores. It also transforms raw gestures into
 ones supported by FaceGaze; for example, FaceGaze doesn't support "blink left
 eye" and "blink right eye" individually. Instead, it supports a compound
 "blink eyes" gesture.

 * `GestureHandler`, which does additional processing of FaceLandmarker results
 and converts recognized gestures into executable macros.

 * `MouseController`, which similarly processes FaceLandmarker results to convert
 recognized face points and rotation into a new cursor location. This class also
 contains logic to smooth cursor movement so that the cursor moves naturally
 rather than jumping.

 * `ScrollModeController`, which gives users scroll functionality with FaceGaze.

 * `BubbleController`, which controls all interaction with the FaceGaze bubble
 UI.
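
The roles of `GestureDetector`, `GestureHandler`, and `MouseController` described above can be illustrated with a small sketch. It is not the real implementation: the gesture names, the `0.5` threshold, the rule that "blink eyes" requires both eyes to pass the threshold, and the macro names are all assumptions. The smoother uses a simple moving average as a stand-in technique, which may differ from what `MouseController` does.

```typescript
// Hypothetical sketch; names, threshold, and window size are assumptions.
const CONFIDENCE_THRESHOLD = 0.5;

// Filters out low-confidence gestures and folds the two per-eye blink
// gestures into one compound 'blinkEyes' gesture.
function detectGestures(
    raw: Record<string, number>,
    threshold: number = CONFIDENCE_THRESHOLD): string[] {
  const detected: string[] = [];
  for (const name of Object.keys(raw)) {
    if (name === 'eyeBlinkLeft' || name === 'eyeBlinkRight') continue;
    if (raw[name] >= threshold) detected.push(name);
  }
  const left = raw['eyeBlinkLeft'] || 0;
  const right = raw['eyeBlinkRight'] || 0;
  if (Math.min(left, right) >= threshold) detected.push('blinkEyes');
  return detected;
}

// Converts detected gestures into macro names using the user's bindings.
function gesturesToMacros(
    gestures: string[], bindings: Record<string, string>): string[] {
  const macros: string[] = [];
  for (const gesture of gestures) {
    const macro = bindings[gesture];
    if (macro !== undefined) macros.push(macro);
  }
  return macros;
}

// Smooths cursor targets by averaging the most recent `windowSize` points.
class CursorSmoother {
  private readonly windowSize: number;
  private readonly recent: Array<{ x: number; y: number }> = [];

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  add(x: number, y: number): { x: number; y: number } {
    this.recent.push({ x: x, y: y });
    if (this.recent.length > this.windowSize) this.recent.shift();
    let sumX = 0;
    let sumY = 0;
    for (const point of this.recent) {
      sumX += point.x;
      sumY += point.y;
    }
    return { x: sumX / this.recent.length, y: sumY / this.recent.length };
  }
}
```

A larger window gives steadier movement at the cost of more lag between head movement and cursor movement, which is the usual trade-off for this kind of smoothing.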

 ### FaceGaze ash-side classes

 * `FaceGazeBubbleController` manages the FaceGaze UI from ash and provides an
 entry point for updating/changing the UI.

 * `FaceGazeBubbleView` is the actual implementation of the FaceGaze UI.

 ### FaceGaze browser-side classes

 * `AccessibilityManager` contains logic for setting up/tearing down the
 extension, forwarding requests and results for DLC downloads, and showing
 notifications to the user.

 * `AccessibilityDlcInstaller` performs the install of the facegaze-assets DLC
 and passes the contents through to the extension.

 * `DragEventRewriter` is a common class that helps implement drag and drop for
 Autoclick and FaceGaze. While the class is active, all mouse movement events
 will be rewritten into mouse drag events.
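
The rewriting behavior described for `DragEventRewriter` can be sketched conceptually as follows. The real class is C++ on the browser side. This TypeScript version, including the `SimpleMouseEvent` type and the `DragRewriterSketch` class, is a hypothetical illustration of the idea only.

```typescript
// Conceptual sketch only; the real DragEventRewriter is a C++ class.
type MouseEventKind = 'move' | 'drag' | 'press' | 'release';

interface SimpleMouseEvent {
  kind: MouseEventKind;
  x: number;
  y: number;
}

class DragRewriterSketch {
  private enabled = false;

  setEnabled(enabled: boolean): void {
    this.enabled = enabled;
  }

  // While enabled, move events become drag events; all others pass through.
  rewrite(event: SimpleMouseEvent): SimpleMouseEvent {
    if (this.enabled && event.kind === 'move') {
      return { kind: 'drag', x: event.x, y: event.y };
    }
    return event;
  }
}
```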

 ### FaceGaze settings

 TODO

 ### Testing

 * See the `facegaze/` extension directory for all extension tests.

 * See [facegaze_browsertest.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/accessibility/facegaze_browsertest.cc)
 for C++ integration tests. Note that these tests hook into a JavaScript class
 called `FaceGazeTestSupport`, which allows the C++ tests to execute JavaScript or
 wait for information to propagate to the extension side before continuing.
 [facegaze_test_utils.cc](https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/accessibility/facegaze_test_utils.cc)
 contains support utilities for writing these tests.

 * See [facegaze.go](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/a11y/facegaze/facegaze.go),
 which provides infrastructure for FaceGaze in tast. Also see [idle_perf.go](https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/go.chromium.org/tast-tests/cros/local/bundles/cros/ui/idle_perf.go),
 which runs FaceGaze idly for ten minutes and collects performance metrics across
 many different types of physical devices.