Prompt API: Implement an expedient multimodal vision prototype

Bypasses optimization guide to use on-device model service directly.
This short-term implementation expedites a time-sensitive dev trial.

Enables ai.languageModel.prompt({type: 'image', content: image}):
  const i = document.getElementsByTagName('img');
  const s = await window.ai.languageModel.create();
  const r = await s.prompt(['describe this image',
                            {type: 'image', content: i[0]}]);
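
A slightly more defensive variant of the snippet above might look like the sketch below. The helper names `buildImagePrompt` and `describeFirstImage` are hypothetical and not part of this change; only `ai.languageModel.create()` and `prompt()` with a `{type: 'image', content}` entry come from this description.

```javascript
// Hypothetical helper (not part of this CL): builds the mixed text + image
// input array in the shape shown in the commit description.
function buildImagePrompt(text, image) {
  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text must be a non-empty string');
  }
  if (image === null || image === undefined) {
    throw new TypeError('image is required');
  }
  return [text, {type: 'image', content: image}];
}

// Hypothetical wrapper: resolves to null when the API or an <img> is missing.
// `ai` and `doc` are injectable so the logic can be exercised outside Chrome.
async function describeFirstImage(ai = globalThis.ai,
                                  doc = globalThis.document) {
  if (!ai || !ai.languageModel || !doc) return null;
  const images = doc.getElementsByTagName('img');
  if (images.length === 0) return null;
  const session = await ai.languageModel.create();
  return session.prompt(buildImagePrompt('describe this image', images[0]));
}
```

Keeping the input construction in its own function makes it easy to check the array shape without a model present.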

Pass prompt[Streaming]() input via on_device_model::mojom::Input.
Invokes on_device_model::mojom::Session::Execute directly, with flag.
Adds a basic unit test. TODO: Add basic WPTs.
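
From script, the streaming variant is assumed to accept the same input array as `prompt()`. The sketch below drains the result with a reader. It assumes `promptStreaming()` returns a `ReadableStream` of string chunks, as described in the Prompt API explainer, and the helper name `collectChunks` is hypothetical. Whether chunks are incremental or cumulative is not stated here, so the helper only records them.

```javascript
// Hypothetical helper: reads every chunk from a ReadableStream into an array.
async function collectChunks(stream) {
  const reader = stream.getReader();
  const chunks = [];
  try {
    while (true) {
      const {done, value} = await reader.read();
      if (done) break;
      chunks.push(value);
    }
  } finally {
    // Release the lock even if a read fails, so the stream can be cancelled.
    reader.releaseLock();
  }
  return chunks;
}

// Assumed usage in a page with the feature enabled:
//   const s = await window.ai.languageModel.create();
//   const img = document.getElementsByTagName('img')[0];
//   const chunks = await collectChunks(s.promptStreaming(
//       ['describe this image', {type: 'image', content: img}]));
```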

A more correct approach is in development; see crrev.com/c/6231611.
Builds on crrev.com/c/6086800 and prior+parallel prototyping efforts.
Credit to cduvall@chromium.org for big fixes in crrev.com/c/6253876.

Caution: This VERY ROUGH WIP prototype has quirks and known issues!
- Only a subset of JS ImageBitmapSource input types are supported.
- Lacks history integration, token counting, overflow handling, etc.
- Does not support concurrent requests, multiple images, or some
  error cases.
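
Because concurrent requests are listed as unsupported, callers experimenting with this prototype may want to issue prompts one at a time. The sketch below is one way to do that from page script; `createSerialPrompter` is a hypothetical helper, not something this change provides, and it only assumes that `session.prompt()` returns a promise.

```javascript
// Hypothetical wrapper: runs session.prompt() calls one at a time, in order.
function createSerialPrompter(session) {
  let tail = Promise.resolve();
  return function promptSerially(input) {
    // Start this call only after the previous one has settled.
    const result = tail.then(() => session.prompt(input));
    // Swallow rejections on the chain so one failure does not block later
    // calls; the caller still sees the rejection through `result`.
    tail = result.catch(() => {});
    return result;
  };
}

// Assumed usage:
//   const promptSerially = createSerialPrompter(
//       await window.ai.languageModel.create());
//   const a = promptSerially(['describe this image',
//                             {type: 'image', content: img}]);
//   const b = promptSerially('and summarize that in five words');
```

Chaining on a single promise keeps ordering explicit without any extra state.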

Test locally on a device compatible with existing Prompt-API usage:
1) Run a Chrome-branded build, with flags and a test user-data-dir
$ chrome --no-sandbox --user-data-dir=/tmp/foo --enable-features=OptimizationGuideOnDeviceModel:on_device_model_image_input/true --enable-blink-features=AIPromptAPIMultimodalInput
2) Trigger model download and init via `ai.languageModel.create()`
3) Verify chrome://components "Optimization Guide..." is up-to-date
4) Quit & replace Chrome's model with a compatible test model:
  $ mv /tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin /tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin.OLD
  $ cp ~/Downloads/vision_model.tflite /tmp/foo/OptGuideOnDeviceModel/2024.9.25.2033/weights.bin
5) Restart Chrome; try multimodal prompt() inputs per Explainer:
   https://github.com/webmachinelearning/prompt-api

Bug: 385173789, 385173368
Change-Id: Ibc75d6777df6c31eed608bcfd9134458da4ce136
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6246232
Mega-CQ: Mike Wasserman <msw@chromium.org>
Reviewed-by: Steven Holte <holte@chromium.org>
Commit-Queue: Mike Wasserman <msw@chromium.org>
Reviewed-by: Brad Triebwasser <btriebw@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1424824}
10 files changed
tree: 13c1c445d3a33286f9aa92fc2adf363f3733728f