commit | f7993a1383df3a6c5e8249bc55631982d6b280d6 | |
---|---|---|
author | Mike Wasserman <msw@chromium.org> | Tue Feb 25 23:12:18 2025 |
committer | Chromium LUCI CQ <chromium-scoped@luci-project-accounts.iam.gserviceaccount.com> | Tue Feb 25 23:12:18 2025 |
tree | 13c1c445d3a33286f9aa92fc2adf363f3733728f | |
parent | 53d584dd1fd3e4ac379ee74992cf8570df837b6b | |

Prompt API: Implement an expedient multimodal vision prototype

Bypasses optimization guide to use on-device model service directly.
This short-term implementation expedites a time-sensitive dev trial.

Enables ai.languageModel.prompt({type: 'image', content: image}):

    i = document.getElementsByTagName('img');
    s = await window.ai.languageModel.create();
    r = await s.prompt(['describe this image', {type: 'image', content: i[0]}]);

Pass prompt[Streaming]() input via on_device_model::mojom::Input.
Invokes on_device_model::mojom::Session::Execute directly, with flag.
Adds a basic unit test. TODO: Add basic WPTs.

A more correct approach is in development; see crrev.com/c/6231611.
Builds on crrev.com/c/6086800 and prior+parallel prototyping efforts.
Credit to cduvall@chromium.org for big fixes in crrev.com/c/6253876.

Caution: This VERY ROUGH WIP prototype has quirks and known issues!

- Only a subset of JS ImageBitmapSource input types are supported
- Lacks history integration, token counting, overflow handling, etc.
- Does not support concurrent requests, multiple images, some errors.

Test locally on a device compatible with existing Prompt-API usage:

1) Run a Chrome-branded build, with flags and a test user-data-dir

    $ chrome --no-sandbox --user-data-dir=/tmp/foo --enable-features=OptimizationGuideOnDeviceModel:on_device_model_image_input/true --enable-blink-features=AIPromptAPIMultimodalInput

2) Trigger model download and init via `ai.languageModel.create()`

3) Verify chrome://component "Optimization Guide..." is up-to-date

4) Quit & replace Chrome's model with a compatible test model:

    $ mv tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin.OLD
    $ cp ~/Downloads/vision_model.tflite tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin

5) Restart Chrome; try multimodal prompt() inputs per Explainer: https://github.com/webmachinelearning/prompt-api

Bug: 385173789, 385173368
Change-Id: Ibc75d6777df6c31eed608bcfd9134458da4ce136
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6246232
Mega-CQ: Mike Wasserman <msw@chromium.org>
Reviewed-by: Steven Holte <holte@chromium.org>
Commit-Queue: Mike Wasserman <msw@chromium.org>
Reviewed-by: Brad Triebwasser <btriebw@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1424824}

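
The prompt input shape described in the commit message (a text instruction followed by one `{type: 'image', content: ...}` entry) can be sketched as a small helper. This is only an illustration under stated assumptions: `buildImagePrompt` and `describeImage` are hypothetical names that do not appear in the change, and the only API assumed is a session object with an async `prompt()` method, as in the commit's own snippet.

```javascript
// Hypothetical helper (not part of the commit): builds the array form of
// prompt() input shown in the commit message, i.e. one text instruction
// followed by exactly one image entry.
function buildImagePrompt(instruction, image) {
  if (typeof instruction !== 'string' || instruction.length === 0) {
    throw new TypeError('instruction must be a non-empty string');
  }
  if (image === null || image === undefined) {
    throw new TypeError('image is required');
  }
  // The commit notes that multiple images are not supported, so this
  // sketch sends a single image entry.
  return [instruction, { type: 'image', content: image }];
}

// Hypothetical wrapper: `session` is assumed to be any object with an async
// prompt() method, such as the result of
// `await window.ai.languageModel.create()` in a suitably flagged build.
async function describeImage(session, image) {
  return session.prompt(buildImagePrompt('describe this image', image));
}

// Possible in-page usage, assuming the flags from the test steps are enabled:
//   const s = await window.ai.languageModel.create();
//   const r = await describeImage(s, document.getElementsByTagName('img')[0]);
```

Keeping the input construction separate from the session call makes the shape easy to check without a browser or a model, which may help while the prototype still rejects some input types.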
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all users to experience the web.
The project's web site is https://www.chromium.org.
To check out the source code locally, don't use `git clone`! Instead, follow the instructions on how to get the code.
Documentation in the source is rooted in docs/README.md.
Learn how to Get Around the Chromium Source Code Directory Structure.
For historical reasons, there are some small top-level directories. Current guidance is that new top-level directories are for products (e.g. Chrome, Android WebView, Ash). Even if a product has multiple executables, its code should live in subdirectories of that product's directory.
If you found a bug, please file it at https://crbug.com/new.