# Security Guidelines for LLMs and other large models in Chrome

Large language models (LLMs), generative artificial intelligence (GenAI) models,
and other large machine learning (ML) models will find uses in Chromium and the
web. We will refer to all of these as _models_. This document outlines some
guidelines to help safely implement features using large models.

Our main security goals are to prevent arbitrary code execution and to prevent
disclosure of user information between origins. Preventing people using Chrome
from seeing model weights or predictions is not a goal, as this is not feasible
on the client devices where Chrome runs.

# Memory Safety

Models are, abstractly, layers of mathematical operations that mix inputs from
trustworthy and untrustworthy sources and produce output that will be used
elsewhere in Chrome. In practice these models are implemented in memory-unsafe
languages and may include convenience functions to parse complex data formats as
part of their pipelines. They should be treated the same way as other
memory-unsafe code implementing a feature in Chrome to comply with the
[rule-of-2](rule-of-2.md). Models processing untrustworthy complex data must be
sandboxed, and data should be provided using safe types.
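
As a rough illustration, here is a minimal sketch in plain C++ of what such a
boundary can look like. All names are hypothetical; in Chromium the boundary
would typically be a mojo interface into a sandboxed process, but the shape of
the contract is the same: the process hosting the model only receives simple,
already-validated value types, never encoded blobs that need parsing.

```cpp
#include <cstdint>
#include <vector>

// A "safe type": plain data with a fixed meaning and no embedded format that
// would need to be parsed by memory-unsafe code.
struct DecodedImage {
  uint32_t width = 0;
  uint32_t height = 0;
  std::vector<uint8_t> rgba;  // Exactly width * height * 4 bytes.
};

// Hypothetical entry point of the process hosting the model: it accepts only
// the safe type above and returns only an array of floats.
std::vector<float> RunModel(const DecodedImage& input) {
  if (input.rgba.size() != uint64_t{input.width} * input.height * 4) {
    return {};  // Reject malformed input before it reaches the model.
  }
  // ... inference would run here, inside a sandboxed process ...
  return std::vector<float>(4, 0.0f);  // Placeholder scores.
}
```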

## Complex formats

Models processing complex data -- such as images, audio or video -- could be
implemented using format helpers in their pipelines. To ensure memory safety,
any parsing of complex formats should happen in a sandboxed, site-isolated
process: either sandbox the model itself, or parse complex formats into
accepted safe formats before sending them to the process hosting the model.
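
A sketch of the second option, using hypothetical names and placeholder bodies:
the encoded, attacker-controlled bytes are only ever parsed by a sandboxed,
site-isolated helper (in Chromium, existing services such as the data decoder
already play this role for some formats), and the process hosting the model
receives only the decoded result.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct DecodedImage {
  uint32_t width = 0;
  uint32_t height = 0;
  std::vector<uint8_t> rgba;
};

// Stand-in for code that runs in a sandboxed, site-isolated utility process;
// this is the only place the encoded, attacker-controlled bytes are parsed.
std::optional<DecodedImage> DecodeInSandbox(
    const std::vector<uint8_t>& encoded_bytes) {
  if (encoded_bytes.empty()) {
    return std::nullopt;
  }
  return DecodedImage{1, 1, {0, 0, 0, 255}};  // Placeholder decode result.
}

// Stand-in for the process hosting the model; it never sees encoded bytes.
std::vector<float> RunModel(const DecodedImage& pixels) {
  return std::vector<float>(pixels.rgba.empty() ? 0 : 4, 0.0f);  // Placeholder.
}

// The trusted flow: decode first, then hand only the decoded, safe-typed
// result to the model.
std::vector<float> ClassifyUntrustedImage(
    const std::vector<uint8_t>& encoded_bytes) {
  std::optional<DecodedImage> decoded = DecodeInSandbox(encoded_bytes);
  if (!decoded) {
    return {};  // The decoder rejected the input; nothing reaches the model.
  }
  return RunModel(*decoded);
}
```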

### Exception - Tokenization

Where the only pre-processing the model performs is tokenizing a string of
text before running inference to produce an output, this is not considered
complex processing.
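
To make the distinction concrete, here is a toy sketch of the kind of
pre-processing the exception covers: a hypothetical byte-level tokenizer that
maps a flat string to token IDs with a simple lookup. Real tokenizers are more
sophisticated, but they still consume a flat string rather than a nested
container format.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy byte-level tokenizer (hypothetical): every input byte becomes a token
// ID. There is no nested or length-prefixed structure to parse, unlike an
// image or video decoder.
std::vector<int32_t> Tokenize(const std::string& text) {
  std::vector<int32_t> ids;
  ids.reserve(text.size());
  for (unsigned char byte : text) {
    ids.push_back(static_cast<int32_t>(byte));
  }
  return ids;
}
```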

## Untrustworthy input -> untrustworthy output

If an attacker can control any input to a model, it must be assumed that they
can control all of its output. Models cannot be used to sanitize data, and their
output must be treated as untrustworthy content with an untrustworthy format.

Model output must either be parsed in a sandboxed process or be limited to safe
types (e.g. an array of floats).
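
A minimal sketch of the second option, with hypothetical names: the trusted
caller accepts only fixed-shape numeric output from the model and rejects
anything else; any richer data derived from model output continues to be
treated as untrustworthy content.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical safe output type: one score per pre-agreed label, with nothing
// left to parse.
struct ClassifierResult {
  std::vector<float> scores;
};

// The trusted caller accepts only output with the expected shape; it never
// parses structured data produced by the model.
ClassifierResult InterpretModelOutput(const std::vector<float>& raw_output,
                                      size_t expected_label_count) {
  ClassifierResult result;
  if (raw_output.size() == expected_label_count) {
    result.scores = raw_output;
  }
  return result;  // Empty scores signal rejected (malformed) output.
}
```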

## Mitigations

Features that expose models to untrustworthy input can reduce the risk of
exposing memory safety flaws by applying mitigations such as the following:

* Use a tight sandbox
* Provide model inputs over safe mojo types
* Validate the size and format of input (see the sketch after this list)
* Use a pipeline that only tokenizes then performs inference
* Ensure input is in the same format as training data
* Disable custom ops that might parse complex formatted data
* Limit the size of the model output
* Fuzz exposed APIs
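
A hedged sketch combining the input-validation and fuzzing bullets, with
hypothetical names: the entry point that untrustworthy callers reach validates
its input up front, and a standard libFuzzer harness exercises that same entry
point.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative cap; a real feature would pick a limit based on what the model
// actually needs.
constexpr size_t kMaxInputBytes = 1 << 20;

// Hypothetical entry point reached with untrustworthy input: reject anything
// out of range before it gets near tokenization or inference.
bool ValidateAndRun(const uint8_t* data, size_t size) {
  if (data == nullptr || size == 0 || size > kMaxInputBytes) {
    return false;
  }
  // ... tokenize `data` and run inference here ...
  return true;
}

// Standard libFuzzer entry point exercising the same API an untrustworthy
// caller would use.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ValidateAndRun(data, size);
  return 0;
}
```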

# Side-Channels

Large models will necessarily be reused for several purposes. Where this
happens, it is important that appropriate sessionization is used: side channels
are likely to exist that could leak some information about previous inputs.
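
One possible shape for such sessionization, sketched with hypothetical names:
each origin and task gets its own session object, all mutable inference state
lives inside it, and discarding the session discards anything a later caller
could otherwise probe through timing or output differences.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-use session: all mutable inference state (caches, adapted
// weights, conversation history) lives here rather than in process-wide
// globals shared across callers.
class ModelSession {
 public:
  std::vector<float> Run(const std::string& input) {
    // Stand-in for real per-session state such as a KV cache.
    cache_.push_back(static_cast<float>(input.size()));
    return cache_;
  }

 private:
  std::vector<float> cache_;
};

// One session per origin and task; destroying it throws away the state.
std::unique_ptr<ModelSession> CreateSession() {
  return std::make_unique<ModelSession>();
}
```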

# Model APIs

Models themselves are complex formats that represent complex graphs of
computation. APIs that allow websites to specify and run models should be
designed so that these graphs and model inputs can be provided safely. Model
hosting should be managed by a trusted process to ensure that only an approved
set of operations can be reached by an untrustworthy model.
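
As an illustration of what reaching only an approved set of operations could
mean in practice, here is a sketch with hypothetical names (it does not
describe any particular Chromium API): the trusted host checks an untrusted
graph against an allow-list of operations, and for well-formed structure,
before anything is instantiated or executed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical wire format for an untrusted graph: an operation code plus the
// indices of the nodes feeding it.
enum class Op : uint32_t { kMatMul = 0, kAdd = 1, kRelu = 2, kSoftmax = 3 };

struct Node {
  uint32_t op = 0;               // Raw value; not yet trusted to be a valid Op.
  std::vector<uint32_t> inputs;  // Indices of producer nodes.
};

// Runs in the trusted hosting process before the graph is instantiated.
bool ValidateGraph(const std::vector<Node>& nodes) {
  for (size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].op > static_cast<uint32_t>(Op::kSoftmax)) {
      return false;  // Not on the allow-list of vetted operations.
    }
    for (uint32_t input : nodes[i].inputs) {
      if (input >= i) {
        return false;  // Forward or self reference: malformed graph.
      }
    }
  }
  return true;
}
```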

If a model's provenance can be verified (such as with Chrome's Component
Updater), then we can assume it is as safe as other Chrome code. This means that
where it runs is determined by what the model does and by the safety of the data
it consumes. Googlers should refer to internal guidelines for approved delivery
mechanisms in Chrome (go/tf-security-in-chrome,
go/chrome-genai-security-prompts).

# Other safety considerations

Models can output very convincing text. They may be used to summarize important
information (e.g. translating a legal form), or to produce writing for people
using Chrome (e.g. a letter to a bank). Models can produce incorrect output even
if they are not being deliberately steered to do so. People using Chrome should
have obvious indications that model output is being used, information about the
source of its inputs, and an opportunity to review any text generated on their
behalf before it is submitted to a third party.

Models may output inappropriate material. Where possible, their output should be
filtered using reasonable safety filters, and people should have mechanisms to
report and improve model outputs.

Model weights trained from on-device data may embody information about a person
using Chrome and should be treated like other sensitive data.