ChromeOS Dictation

Dictation is a ChromeOS accessibility feature that allows users to type and edit text with their voice.

Semantic parsing with Pumpkin

Background

Dictation uses a semantic parser called Pumpkin to extract meaning and intent from text, which lets us turn recognized text into commands. To use Pumpkin in Dictation, we took the following steps:

A WebAssembly port was created so that Pumpkin could run in JavaScript. It currently lives in Google3. For the specific build rule, see this build file.
Pumpkin and its associated config files take up roughly 5.9 MB of space (estimate generated in December 2022). Adding this much to rootfs was not feasible, so we added a DLC (Downloadable Content) package for Pumpkin so that it can be downloaded and used when needed. We also added a script in Google3 that quickly generates the DLC and uploads it to Google Cloud Storage whenever it needs to be updated.
We added logic in Dictation that initiates a download of the Pumpkin DLC. Dictation uses the chrome.accessibilityPrivate.installPumpkinForDictation() API to initiate the download. Once the DLC is downloaded, the AccessibilityManager reads the bytes of each Pumpkin file and sends them back to the Dictation extension. Lastly, the extension spins up a new sandboxed context in which to run Pumpkin.
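
The download-and-handoff flow described above can be sketched as follows. This is a minimal illustration, not the extension's actual code: fakeInstallPumpkinForDictation is a hypothetical stand-in for the real API, and the callback shape, the file names, and the FakeSandbox class are all assumptions made so the example runs on its own.

```javascript
// Hypothetical stand-in for chrome.accessibilityPrivate.installPumpkinForDictation().
// Assumption: completion is reported through a callback that receives the bytes
// of each Pumpkin file. The file names below are made up for illustration.
function fakeInstallPumpkinForDictation(callback) {
  const files = {
    'tagger.wasm': new Uint8Array([0x00, 0x61, 0x73, 0x6d]).buffer,
    'action_config.bin': new Uint8Array([1, 2, 3]).buffer,
  };
  callback(files);
}

// Hypothetical sandbox: it records the message it receives instead of
// actually running Pumpkin.
class FakeSandbox {
  constructor() {
    this.received = null;
  }

  postMessage(message) {
    this.received = message;
  }
}

// Requests the install, then forwards the file bytes to the sandbox.
function startPumpkin(install, sandbox) {
  install(files => {
    if (!files) {
      // The install did not produce any files; this sketch does nothing.
      return;
    }
    sandbox.postMessage({type: 'init', files});
  });
}

const sandbox = new FakeSandbox();
startPumpkin(fakeInstallPumpkinForDictation, sandbox);
console.log(Object.keys(sandbox.received.files));
```

Passing the install function and the sandbox in as parameters keeps the flow testable without a real DLC download.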

Adding a new Pumpkin command

Update the Dictation semantic_tags.txt file with the tags that correspond to the new commands you’d like to add. All Dictation commands can be found in macro_names.js. See the “creating patterns files” section of the Dictation Google3 documentation for more information on semantic tags.
Update the Pumpkin DLC according to the documentation.
Add the newest pumpkin-<version>.tar.xz file to the Chromium codebase for testing purposes. If you followed the documentation above, a copy of the tar file should already be in your home directory (e.g. ~/pumpkin-3.0.tar.xz).
Add tests for the new commands to dictation_pumpkin_parse_test.js. These should fail initially.
Make your tests pass by connecting the new commands to pumpkin_parse_strategy.js.
(Optional, but strongly recommended) Add tests to dictation_browsertest.cc to get end-to-end test coverage.
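
The test-first steps above amount to mapping each new semantic tag to a Dictation macro. The sketch below shows that idea in isolation. The tag names, the {tag, args} result shape, and the macroFromParse helper are hypothetical and are not taken from pumpkin_parse_strategy.js.

```javascript
// Illustrative macro names; the real list lives in macro_names.js.
const MacroName = Object.freeze({
  DELETE_PREV_CHAR: 'DELETE_PREV_CHAR',
  SELECT_ALL_TEXT: 'SELECT_ALL_TEXT',
});

// Hypothetical table connecting semantic tags to macros. Supporting a new
// command means adding one entry here.
const TAG_TO_MACRO = new Map([
  ['DELETE_PREV_CHAR', MacroName.DELETE_PREV_CHAR],
  ['SELECT_ALL_TEXT', MacroName.SELECT_ALL_TEXT],
]);

// Assumed shape of a parser result: {tag: string, args?: object}.
// Returns null when the tag is unknown, so the caller can treat the
// utterance as something other than a command.
function macroFromParse(result) {
  if (!result) {
    return null;
  }
  const name = TAG_TO_MACRO.get(result.tag);
  return name ? {name, args: result.args || {}} : null;
}

console.log(macroFromParse({tag: 'SELECT_ALL_TEXT'}));
console.log(macroFromParse({tag: 'UNKNOWN_TAG'}));
```

A test for a new command would call macroFromParse with the new tag and fail until the matching entry is added to the table.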

Note: Never remove semantic tags from the Pumpkin DLC. Removing a tag can cause backwards compatibility issues, and we never want to regress any commands.
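
One way to guard against accidental removals is to compare the old and new tag lists before updating the DLC. The sketch below assumes semantic_tags.txt holds one tag per line, which may not match the real file format. The removedTags helper and the sample tags are hypothetical.

```javascript
// Assumption: one semantic tag per line; blank lines are ignored.
function parseTags(text) {
  return new Set(
      text.split('\n').map(line => line.trim()).filter(line => line !== ''));
}

// Returns the tags present in the old list but missing from the new one.
// An empty result means the update only adds tags.
function removedTags(oldText, newText) {
  const next = parseTags(newText);
  return [...parseTags(oldText)].filter(tag => !next.has(tag)).sort();
}

const shipped = 'DELETE_PREV_CHAR\nSELECT_ALL_TEXT\nUNDO\n';
const proposed = 'SELECT_ALL_TEXT\nUNDO\nREDO\n';
console.log(removedTags(shipped, proposed));
```

In this example the proposed list drops DELETE_PREV_CHAR, so the check reports it and the update should be rejected.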