)]}'
{
  "log": [
    {
      "commit": "7d63684a681d242fb3939c4a7980ebd3fb532088",
      "tree": "3aeb388ae3e3fb448365634719f514169b4740ee",
      "parents": [
        "644944a85df5e1f232c69486c512eda9917cce01"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Apr 28 17:58:29 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Apr 28 18:01:55 2026"
      },
      "message": "Use virtual siso paths for prompt evals\n\nUpdates the prompt eval test runner to enable virtual paths in siso when\nrunning tests with btrfs. This works around an issue where compilation\nin individual tests would erroneously no-op due to the snapshotted\noutput directory referencing the original checkout\u0027s output directory due\nto the use of absolute paths within siso.\n\nAlso re-adds the fix_broken_test test to the list of stable tests since\nthis issue seems to have been the reason why it was failing more\nfrequently on Gemini 3 Flash. This issue was hit on the Pro models as\nwell, but those were more likely to work around the issue automatically\nby cleaning the output directory and recompiling.\n\nBug: b/502322531, b/504993568\nChange-Id: I07b0441155b80646489055a8c583d38d00cce5c6\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7796812\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1621905}\nGitOrigin-RevId: 2c2245f3ab549ec3cad8799459ba8d4daace3c9a\n"
    },
    {
      "commit": "644944a85df5e1f232c69486c512eda9917cce01",
      "tree": "17e27c9bd1303986a71bd5fa640ea025a6feea96",
      "parents": [
        "e75f14a85a9a26403babb0ecd912e826295b561c"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Apr 15 20:12:41 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Apr 15 20:16:07 2026"
      },
      "message": "Switch prompt eval tests to Gemini 3 Flash\n\nUpdates the prompt eval tests to use Gemini 3 Flash instead of Gemini\n2.5 Pro. While the latter seems to perform better for these tests,\nGemini 2.5 is going away in the near future.\n\nIn order to help keep tests green until we can improve stability,\na couple of tests are moved into the unstable test category, as they\nfail much more frequently with Gemini 3 Flash.\n\nBug: b/502322531\nChange-Id: I63d6eec72269ca818e665f4d515c997ac7f20710\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7732272\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1615361}\nGitOrigin-RevId: e3e9cdeb183d35e9b63c586e56c0e88ccd1a8b9f\n"
    },
    {
      "commit": "e75f14a85a9a26403babb0ecd912e826295b561c",
      "tree": "708b7bb0936aa019e0b0fe290b544087ef870d70",
      "parents": [
        "dbd343f94898297d05452b3f68e361626851b3c3"
      ],
      "author": {
        "name": "Greg Thompson",
        "email": "grt@chromium.org",
        "time": "Fri Mar 06 21:50:43 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Mar 06 21:56:13 2026"
      },
      "message": "Support launching gemini.ps1 on Windows\n\nChange gemini_helpers.get_gemini_executable to get_gemini_command and\nteach it to search for gemini.ps1 in addition to .exe and friends. If\ngemini.ps1 is found in PATH, return a command that will launch it via\nPowerShell.\n\nThis allows commands such as:\n  \"vpython3 agents/extensions/install.py list\"\nto work on Windows when Gemini CLI is run via a .ps1 script in the\nuser\u0027s PATH.\n\nBug: 469100192\nChange-Id: I0220ec71f86295519a8fc499e26e36cdfeb60eed\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7633743\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Greg Thompson \u003cgrt@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1595675}\nGitOrigin-RevId: 0c765f74922a5930069a18b154dd484c217c4065\n"
    },
    {
      "commit": "dbd343f94898297d05452b3f68e361626851b3c3",
      "tree": "63b21cb06e17df6b7522ebec1932f83007db442c",
      "parents": [
        "2f460eef6e3bfcce31ee3478e41f14ccffd6a699"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Mar 06 19:52:30 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Mar 06 19:59:45 2026"
      },
      "message": "Kill promptfoo process group\n\nUpdates the prompt eval test framework to kill the entire process group\nfor promptfoo after a test is complete. This is a speculative fix for\na flaky failure to delete the temporary home directory that is created\nfor each test. The current suspicion is that a child process started\nby promptfoo can racily write a file to that directory while it is\nbeing cleaned up, which causes cleanup to raise an exception.\n\nThere is an option to ignore errors during directory cleanup, but this\napproach should hopefully solve the underlying issue instead of working\naround the errors.\n\nBug: b/485634720\nChange-Id: I69d42d8aebdb6e188e757f1a473311704200e3fe\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7640009\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1595584}\nGitOrigin-RevId: 3fca9619e7334a8e86b17f1a0e0c9f5642a2ca26\n"
    },
    {
      "commit": "2f460eef6e3bfcce31ee3478e41f14ccffd6a699",
      "tree": "ef329badeaf3b8b975f53ec7f425da72386fe6a4",
      "parents": [
        "b116256749e519e20a2a3683f2e7f24d5c473675"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Mar 03 23:57:17 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Mar 04 00:02:30 2026"
      },
      "message": "Enable prompt eval fetch error retries\n\nEnables general.retryFetchErrors in Gemini CLI when running prompt eval\ntests. These are indicative of a timeout when communicating with Gemini\nand are flakily happening on the CI builder. Since these are a general\nissue with Gemini rather than anything wrong with the tests, add retries\nto try to work around the issue.\n\nBug: b/485634720\nChange-Id: I2f6c2ac93e998651379f0175e63c3c58020f88f9\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7630159\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1593588}\nGitOrigin-RevId: 81e2abb820ac6075eea2498b6998626307b40943\n"
    },
    {
      "commit": "b116256749e519e20a2a3683f2e7f24d5c473675",
      "tree": "9e78963ad8eedeb0aafa7657a725e52709dc3a61",
      "parents": [
        "58141cfffb6ac67a8fb9ca2a2960cb23de4eed0d"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Feb 26 20:45:59 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Feb 26 20:50:36 2026"
      },
      "message": "Switch prompt eval model to 2.5 pro\n\nSwitches the Gemini model used for prompt eval tests back to 2.5 Pro. 3\nFlash no longer runs into the capacity issues that 3.1 Pro had, but the\nmodel is incapable of performing all of the tasks for evals reliably.\n\nBug: b/485634720\nChange-Id: I38bc7d68e51c22fc951e1501d5e24d9ecaebb077\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7614507\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1591028}\nGitOrigin-RevId: 8595fe65de9f39503aa25b929260d45490044a3c\n"
    },
    {
      "commit": "58141cfffb6ac67a8fb9ca2a2960cb23de4eed0d",
      "tree": "f8056cae4ec655f9133dd4290772f7c7845ca222",
      "parents": [
        "5e0808cc8f574f2ef46bb5e6424a48f88561c3a7"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Feb 26 00:42:37 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Feb 26 00:46:32 2026"
      },
      "message": "Change prompt eval model to flash\n\nChanges the model used for prompt eval tests from gemini-3.1-pro-preview\nto gemini-3-flash-preview. Gemini 3.1 seems to be severely over\ncapacity, causing the CI builder to regularly fail to run tests.\n\nBug: b/485634720\nChange-Id: I51c09b8dfd58ec5f8928e18f26f0a6218ebd45dd\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7610178\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1590513}\nGitOrigin-RevId: 34b934f0efac71c4ab699931c32b3766641e3802\n"
    },
    {
      "commit": "5e0808cc8f574f2ef46bb5e6424a48f88561c3a7",
      "tree": "c8883ee3a1a9dcd88bc384003b5f25e0c595c090",
      "parents": [
        "0b5fbfb8ca300afd89574eae9e3782c71e8ee910"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Feb 25 20:01:50 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Feb 25 20:07:32 2026"
      },
      "message": "Update prompt eval model\n\nUpdates the model used for prompt eval tests from Gemini 2.5 to\nGemini 3.1 preview. The underlying issue that was preventing us from\nusing Gemini 3 models on the builders has been fixed.\n\nBug: b/485634720\nChange-Id: I59ba737f01c8d9985e450bd2235e000d3eed3b74\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7604619\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1590328}\nGitOrigin-RevId: 389366e1905ea8cef753bd238ed2c280b0024bd1\n"
    },
    {
      "commit": "0b5fbfb8ca300afd89574eae9e3782c71e8ee910",
      "tree": "5f7e874f5399fb5ab712d8c968a05a441288a104",
      "parents": [
        "500ecdad7276e31b68f7aa5d04aa5dde9696056f"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Feb 20 01:15:40 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Feb 20 01:22:27 2026"
      },
      "message": "Specify model for prompt eval tests\n\nSpecifies gemini-2.5-pro for the model since the model that is being\nselected by default is running into issues on the builders.\n\nBug: 485634720\nChange-Id: I130288cee01e2b38fe04655eef81661b42b163a4\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7596574\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1587537}\nGitOrigin-RevId: e17bc296e45a4f5993d52a644306175c1a011e23\n"
    },
    {
      "commit": "500ecdad7276e31b68f7aa5d04aa5dde9696056f",
      "tree": "17e338fb7a3e7b97be5e1ee0c07aed3df634ef7a",
      "parents": [
        "ca859ac09c899a694ecbfdc6b350830ebc23f4fc"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Feb 19 22:35:45 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Feb 19 22:41:19 2026"
      },
      "message": "Update prompt eval dependencies\n\nUpdates the version of NodeJS and Gemini CLI used by the prompt eval\ntests. The GCLI version that was used was sufficiently old that the\nrequested model name no longer existed.\n\nAs a side effect, the relevant tests are also updated to not check for\nspecific versions since this causes them to be unhelpful change detector\ntests.\n\nBug: 485634720\nChange-Id: I0a80705fc2837bfb0485cdeccd8eade245aaf993\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7596173\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1587425}\nGitOrigin-RevId: 718efe09f5e285fd2a1a1e36fe58ff3f46b579c1\n"
    },
    {
      "commit": "ca859ac09c899a694ecbfdc6b350830ebc23f4fc",
      "tree": "009e78b218c8548a98a29224eec26a8d8e5cbc72",
      "parents": [
        "60caadfe193b60315649301b16b6703ba2ae4169"
      ],
      "author": {
        "name": "Jon Toohill",
        "email": "jtoohill@google.com",
        "time": "Thu Jan 29 23:39:47 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Jan 29 23:48:06 2026"
      },
      "message": "Populate fuzzing skill\n\nThe content is derived from the [Chromium FUZZ_TEST\ndocumentation](https://chromium.googlesource.com/chromium/src/+/main/testing/libfuzzer/getting_started.md),\nas well as the fuzztest docs.\n\nIt tries to adhere to the current best practices for skills described\nhere:\nhttps://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices\n\nAlso adds an eval which fails prior to the introduction of this skill.\n\nBug: b:478225077\nChange-Id: I5e9009d29d78f77b005da5fce0557b886a6a6964\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7517948\nCommit-Queue: Jon Toohill \u003cjtoohill@google.com\u003e\nAuto-Submit: Jon Toohill \u003cjtoohill@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1576907}\nGitOrigin-RevId: 80d7745fe1cba2016f017ff60cdea5458b6218f5\n"
    },
    {
      "commit": "60caadfe193b60315649301b16b6703ba2ae4169",
      "tree": "e91a84a22bf70837b48f6410cd8b1d91d4ffce77",
      "parents": [
        "a7ff07f005d931c0d014c2aaf75befeb0a016ceb"
      ],
      "author": {
        "name": "Sven Zheng",
        "email": "svenzheng@google.com",
        "time": "Wed Jan 28 18:03:28 2026"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Jan 28 18:09:13 2026"
      },
      "message": "Add --list-tests to eval_prompts\n\nList all the eval test files and their descriptions.\nThis would be useful for people to get an idea\nof what all the tests are.\n\nTest: tested locally.\n$ vpython3 agents/testing/eval_prompts.py --list-tests\nagents/extensions/build-information/tests/host_arch.promptfoo.yaml:\nagents/extensions/build-information/tests/host_os.promptfoo.yaml:\nagents/prompts/eval/add_gtest_coverage/eval.promptfoo.yaml: Add gtest unit test coverage for a few methods with low coverage.\nagents/prompts/eval/build_file/eval.promptfoo.yaml: Verify that the correct target is found and built for the given file.\n\nagents/prompts/eval/build_target/eval.promptfoo.yaml: Verify the correct build target is provided for a source file\nagents/prompts/eval/class_recommendation/eval.promptfoo.yaml: Recommendation for class names from a user prompt.\nagents/prompts/eval/class_refactor/eval.promptfoo.yaml: Owner: dmurph@\nDescription: Refactor a class in prompt.md\n\nagents/prompts/eval/find_function/eval.promptfoo.yaml: Find which function call to use based on a user prompt.\nagents/prompts/eval/find_implementation_file/eval.promptfoo.yaml: Find the implementation of a header file.\nagents/prompts/eval/fix_broken_test/eval.promptfoo.yaml: Fix a broken DummyTest\n\nagents/prompts/eval/run_tests_in_file/eval.promptfoo.yaml: Verify that only tests in the given file are triggered.\nagents/prompts/eval/search_class/eval.promptfoo.yaml: Verify the correct file path is provided for searching a target class name in the Chromium code base.\n\nChange-Id: I3d31bddfa3303590f7a8428d70dd4517ab57b6e7\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7524650\nCommit-Queue: Sven Zheng \u003csvenzheng@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1576040}\nGitOrigin-RevId: 7f635d45dc0b6491d403c407a14b420d97e721c4\n"
    },
    {
      "commit": "a7ff07f005d931c0d014c2aaf75befeb0a016ceb",
      "tree": "8ca4bb62563db768432385d3fe2ec237460f6258",
      "parents": [
        "65cda24c47f59a3879bb0270fa6e292244541f57"
      ],
      "author": {
        "name": "Ashwin Verleker",
        "email": "ashwinpv@google.com",
        "time": "Thu Dec 18 18:36:12 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Dec 18 18:42:59 2025"
      },
      "message": "Add support for extra test paths in eval_prompts.py\n\n- This allows us to use the same eval execution framework but have tests defined in other paths (eg: internal only evals).\n\nBug: 461531766\nChange-Id: I931bd4c60710126e6c0aca01383e6476e02718a8\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7271569\nCommit-Queue: Ashwin Verleker \u003cashwinpv@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1560674}\nGitOrigin-RevId: e0fd47cf8c1d5743db7a2d5d4dffff15cf71536c\n"
    },
    {
      "commit": "65cda24c47f59a3879bb0270fa6e292244541f57",
      "tree": "15e785de277dcbbabd218214ec6b6caab6721f36",
      "parents": [
        "5d780c0bb3e12e2e6b21d634fe643160c0bebe11"
      ],
      "author": {
        "name": "Jie Sheng",
        "email": "jiesheng@google.com",
        "time": "Tue Nov 25 20:05:29 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 25 20:10:39 2025"
      },
      "message": "Add a new evaluation for class refactoring.\n\nThis CL introduces a new promptfoo evaluation to test refactoring a nested struct into a separate class. It includes a patch to set up the initial files, a prompt detailing the refactoring steps, and adds a new `check_file_content` assertion to verify the changes in the refactored files. The evaluation checks for file creation, content updates, and build success.\n\nBug: 454962881\nChange-Id: If46e3a28ab2f6b5d32387c91b8219d08b206a78e\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7103486\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1549982}\nGitOrigin-RevId: 33c5ccb2776fe15528aeebdace759c40c7a5c7bc\n"
    },
    {
      "commit": "5d780c0bb3e12e2e6b21d634fe643160c0bebe11",
      "tree": "232a954b809ec61f8e428a1da94844c42ff2615e",
      "parents": [
        "f3e3b91641d3a56669822db33275ed7c5e84a54f"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Mon Nov 24 23:26:52 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Mon Nov 24 23:32:38 2025"
      },
      "message": "Set system prompts correctly\n\nWe support setting the system prompt in the test config but actually\njust prepend it to the user prompt. Instead this should be set to the\ngemini-cli system prompt when set. Currently no tests are utilizing this\nso it should be a no-op\n\nBug: 449776537\nChange-Id: I76ef609e5c05f733771bd5b63a770b533d55bb25\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7125578\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1549442}\nGitOrigin-RevId: 390b72e2018828be09e69e9af4eedce355cdd0aa\n"
    },
    {
      "commit": "f3e3b91641d3a56669822db33275ed7c5e84a54f",
      "tree": "a1cc520d67fc4c4d14975f4a0734ba3d0a21e941",
      "parents": [
        "2609ee73c092855c5352569ac3ff200389899e98"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Fri Nov 21 23:20:36 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Nov 21 23:44:07 2025"
      },
      "message": "Revert \"Update promptfoo cipd tags\"\n\nThis reverts commit 5ca7352976ab766c90537ab25b33e934dae842f6.\n\nReason for revert: Suspect that promptfoo has issues with running\nin multiple processes at this revision. Reverting to verify\n\nOriginal change\u0027s description:\n\u003e Update promptfoo cipd tags\n\u003e\n\u003e This updated version appears to get context in the model graded asserts.\n\u003e Some more work might be needed to pass information correctly between the\n\u003e initial provider call and the assert provider call.\n\u003e\n\u003e Bug: None\n\u003e Change-Id: I9519b4e51793af8df225dfb515742ab665c3c9a8\n\u003e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7178422\n\u003e Commit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\n\u003e Commit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\n\u003e Reviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\n\u003e Auto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\n\u003e Cr-Commit-Position: refs/heads/main@{#1547880}\n\nBug: None\nChange-Id: I7162d33cf6565664cc3b43e680b2c08ff59f5920\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7188460\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nBot-Commit: Rubber Stamper \u003crubber-stamper@appspot.gserviceaccount.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1548755}\nGitOrigin-RevId: dad84b1fdcedbd05f8ab33895af74f33e82b22d3\n"
    },
    {
      "commit": "2609ee73c092855c5352569ac3ff200389899e98",
      "tree": "0bab78bbf56798244e07229ae0be4298a517fdd2",
      "parents": [
        "607be8badfb89684ac21467a490f644941b2af45"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Thu Nov 20 17:19:37 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Nov 20 17:38:56 2025"
      },
      "message": "Update promptfoo cipd tags\n\nThis updated version appears to get context in the model graded asserts.\nSome more work might be needed to pass information correctly between the\ninitial provider call and the assert provider call.\n\nBug: None\nChange-Id: I9519b4e51793af8df225dfb515742ab665c3c9a8\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7178422\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1547880}\nGitOrigin-RevId: 5ca7352976ab766c90537ab25b33e934dae842f6\n"
    },
    {
      "commit": "607be8badfb89684ac21467a490f644941b2af45",
      "tree": "a1cc520d67fc4c4d14975f4a0734ba3d0a21e941",
      "parents": [
        "e55a048de94dc7a22152386d641784bbc39db01d"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Thu Nov 13 00:49:54 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Nov 13 00:56:19 2025"
      },
      "message": "Include the prompt/response in the rdb test result\n\nThese can potentially be long (particularly the responses) so add these\nas artifacts for now. These will be used immediately for displaying on\nthe dashboard\n\nBug: None\nChange-Id: I07daf7dbe61aa9d8e905179e8224eef3dcae7e07\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7146497\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1544058}\nGitOrigin-RevId: cdc4695d385639d39a848102da87cddb0e5ac330\n"
    },
    {
      "commit": "e55a048de94dc7a22152386d641784bbc39db01d",
      "tree": "cd8cf7824dec2aee04e54904843945e3652895b0",
      "parents": [
        "46d60b6ba13634292f8878f86894c471af13a498"
      ],
      "author": {
        "name": "Jie Sheng",
        "email": "jiesheng@google.com",
        "time": "Wed Nov 12 23:12:27 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Nov 12 23:17:36 2025"
      },
      "message": "Support tool call checks in gemini cli eval framework.\n\nOutput all the tool call information from the telemetry output file and\nsave it into the gemini provider. Create a new check to verify this tool\ncall check. Example run:\nhttps://ci.chromium.org/ui/p/chromium/builders/try/linux-prompt-evals/197/test-results\n\nBug: 454962881\nChange-Id: Ife4c332c5f77c4e3677be70017c67a10c1bbea7a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7127536\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1543988}\nGitOrigin-RevId: 3317fa490f3338cf0f35ff76888850535c9d781a\n"
    },
    {
      "commit": "46d60b6ba13634292f8878f86894c471af13a498",
      "tree": "606d1cdc83484a4b56c27fe5a3d676c6b4f795ba",
      "parents": [
        "776b148f16aa2c50550868be8a07d2fe8bc5c145"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Nov 12 01:29:25 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Nov 12 01:55:36 2025"
      },
      "message": "Make owners explicit\n\nOwners have been added by convention. We want to display the owner on\nthe dashboard though. Add the owner as a part of the test config and\ntag results with the owner\n\nBug: None\nChange-Id: I876c58284eea5f1f01ea5edd62de6a0380dd2fdb\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7143578\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1543456}\nGitOrigin-RevId: 185ac8ccc74c940d4f88e14816a47be090aadb6d\n"
    },
    {
      "commit": "776b148f16aa2c50550868be8a07d2fe8bc5c145",
      "tree": "f53f3551dbba199b66a52daf1335e538704ef7f3",
      "parents": [
        "3d5bab8043d4c73dd679d95c5479da488d387be0"
      ],
      "author": {
        "name": "Jie Sheng",
        "email": "jiesheng@google.com",
        "time": "Mon Nov 10 23:18:16 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Mon Nov 10 23:44:26 2025"
      },
      "message": "Improve the test log output in gemini cli eval framework.\n\nOutput the test log from promptfoo and fix the index out of bounds error\nwhen checking results. Example failure with\noutput: https://ci.chromium.org/ui/p/chromium/builders/try/linux-prompt-evals/191/infra\n\nBug: 454962881\nChange-Id: Iff970dd0d00fa8b48e6f9835fbcc266a18e59600\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7139004\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1542809}\nGitOrigin-RevId: 7a5fa74d3afa2a1b461b96d01106a0070cb537f7\n"
    },
    {
      "commit": "3d5bab8043d4c73dd679d95c5479da488d387be0",
      "tree": "27ebb96c1725bd6bb458012a48f2b49cda4d4231",
      "parents": [
        "81c4be35dd1ef1c9ffe628d8c731dec5492b8e49"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Mon Nov 10 21:28:17 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Mon Nov 10 21:33:09 2025"
      },
      "message": "Include the test tags in the rdb tags\n\nIncluding these test tags in rdb tags will let us filter the\ndashboard more easily\n\nBug: None\nChange-Id: Ied5bb2ac7afcca427f9118fc03846de8d04c14d2\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7137032\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1542759}\nGitOrigin-RevId: 4a55309c3a32e7e45fca251e4bb8550f09f0851c\n"
    },
    {
      "commit": "81c4be35dd1ef1c9ffe628d8c731dec5492b8e49",
      "tree": "8b747bbbc3404202275cab5aae797f740f6cb18d",
      "parents": [
        "4fabd70acf339450cc1d76d8aa4852a25cd3ea56"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Nov 07 21:13:53 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Nov 07 21:19:42 2025"
      },
      "message": "Update prompt eval metric reporting\n\nMakes the following changes related to metric reporting to Skia perf\ndashboards in prompt eval tests:\n\n1. Makes several arguments no longer have a default since the builder\n   now passes these.\n2. Splits the test name from the metric name since we are no longer\n   required to fit within Chrome\u0027s legacy perf reporting hierarchy\n3. Adjusts several keys that are reported.\n\nBug: 449818513\nChange-Id: I1ced39e28de203d5c5880a4d620c084914c9b4ee\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7132630\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1541985}\nGitOrigin-RevId: 583c1198d87b1bc3a28d82b133d6060b9e753f99\n"
    },
    {
      "commit": "4fabd70acf339450cc1d76d8aa4852a25cd3ea56",
      "tree": "1c8b71d767e25259892656c3e239640a5ce35fba",
      "parents": [
        "e856061ecd74f00adce8ed06bf5c04b28b3f2c2b"
      ],
      "author": {
        "name": "Jie Sheng",
        "email": "jiesheng@google.com",
        "time": "Fri Nov 07 17:10:15 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Nov 07 17:15:05 2025"
      },
      "message": "Create a new gemini cli eval to build a file.\n\nThis test case will try to find the correct build target of an input file from the user\u0027s prompt, and successfully build the target.\n\nBug: 454962881\nChange-Id: Id44850bef111e031531d8733d50f77158339ad7a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7120587\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1541856}\nGitOrigin-RevId: 77f05606867e1814d7477d256fc3193985d7dcf6\n"
    },
    {
      "commit": "e856061ecd74f00adce8ed06bf5c04b28b3f2c2b",
      "tree": "380b5bbd3bb7c8fa1a95ed19e8f3a67c329ead01",
      "parents": [
        "f34305330d7a1faa458383f402e0b376440e8f2c"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Fri Nov 07 00:53:46 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Fri Nov 07 01:21:58 2025"
      },
      "message": "Tag rdb results with their metrics\n\nWe mostly care about the score metric but adding all of them is easy\nenough and will give us flexibility in our analysis.\n\nBug: None\nChange-Id: Ie2938db42b521e8a960cebcfafc5f47c345aa63f\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7128137\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1541529}\nGitOrigin-RevId: f64d86850cc61cee5d26498d6f303aa305c581a1\n"
    },
    {
      "commit": "f34305330d7a1faa458383f402e0b376440e8f2c",
      "tree": "dfc8efda5847d80d6aa273c3e050aac24ca81d6a",
      "parents": [
        "0c04b809cb30c7784e19776ea1169fc6d2fa7a6c"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Thu Nov 06 19:44:23 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Thu Nov 06 20:06:26 2025"
      },
      "message": "Update run_tests_in_file\n\nModel graded asserts need to have our provider set. Also update some\nminor changes. We should probably add some linting to verify these and\nlook into using either the vertex or gemini promptfoo providers. These\nwill need to be verified on the builder as well\n\nBug: 454962881\nChange-Id: Idae1b4cf10c03e3801a9ffd7850c2ce798d8c193\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7125573\nReviewed-by: Jie Sheng \u003cjiesheng@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1541354}\nGitOrigin-RevId: a533b0e53f1fc6a87a2147933eccf0b9f5cf355c\n"
    },
    {
      "commit": "0c04b809cb30c7784e19776ea1169fc6d2fa7a6c",
      "tree": "4b087aa8c2db2eebc20a87626919bededdb5514a",
      "parents": [
        "a135030e6d0b862b3ff99817c63ac6cad598c543"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Nov 05 22:41:56 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Wed Nov 05 23:03:33 2025"
      },
      "message": "Clean workdirs regardless of forced\n\nParticularly when we have a large number of existing workdirs it is a\npain to clean them up manually while still not wanting to use --forced.\nIt seems reasonable for the expectation to be that the workdir will\nnot exist if run again.\n\nBug: None\nChange-Id: Iad9f36d0994809cba619bd0d179bf59745ac06ce\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7125125\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1540887}\nGitOrigin-RevId: 85109ee9baa8d502a499e9bf0f505504b191a715\n"
    },
    {
      "commit": "a135030e6d0b862b3ff99817c63ac6cad598c543",
      "tree": "9010f19001111c28cbf0e56283c30a4078e6d745",
      "parents": [
        "17cd0aa60b9dfd2d43cbebdaa35220e803cdad99"
      ],
      "author": {
        "name": "Jonathan Lee",
        "email": "jonathanjlee@google.com",
        "time": "Tue Nov 04 21:47:44 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:41:47 2025"
      },
      "message": "[agents][eval] Store system prompt in `//GEMINI.md`\n\nCurrently, the system and user prompts are concatenated in that order\nand fed to gemini-cli\u0027s stdin as one query. This breaks testing of slash\ncommands with system context because the command parser won\u0027t find the\nleading `/` in the query that identifies the command [0].\n\nAllow test cases to test slash commands by writing the system prompt to\nthe fake per-test `//GEMINI.md` [1], which gemini-cli will discover and\nhandle separately from the user prompt.\n\nReversing the prompt order won\u0027t work either because the parser will\ntreat the system prompt as one of the command `{{args}}` [2].\n\n[0]: https://github.com/google-gemini/gemini-cli/blob/460c3debf5ec73f0652a496254ad9b5b3622caf7/packages/cli/src/nonInteractiveCliCommands.ts#L38\n[1]: https://github.com/google-gemini/gemini-cli/blob/main/docs/cli/gemini-md.md#understand-the-context-hierarchy\n[2]: https://github.com/google-gemini/gemini-cli/blob/460c3debf5ec73f0652a496254ad9b5b3622caf7/packages/cli/src/utils/commands.ts#L68\n\nBug: 454049467\nTest: Check telemetry output file. Command is expanded with this change.\nChange-Id: Iff1fb0b95f882dfc073b2dd1da6c937d289e20d1\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7120319\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Jonathan Lee \u003cjonathanjlee@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1540313}\nGitOrigin-RevId: c9af86befcd070dbedf31433876ef3f0d51193d4\n"
    },
    {
      "commit": "17cd0aa60b9dfd2d43cbebdaa35220e803cdad99",
      "tree": "14c83e77929846d809633e259f576a4565df5058",
      "parents": [
        "c092057cb0cf785f724bcf043cdfac983b027d3a"
      ],
      "author": {
        "name": "Jie Sheng",
        "email": "jiesheng@google.com",
        "time": "Tue Nov 04 19:48:55 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:41:18 2025"
      },
      "message": "Create a new gemini cli eval to verify run tests in file.\n\nThis test case will trigger the tests only in a gtest file in the\nbase_unittest suite. After the test run, assert the target test target,\ntotal test number and a model check to see if the tests are actually\ntriggered.\n\nBug: 454962881\nChange-Id: Iec198198783b187e9cac3e4dd53f7eb35babe032\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7097595\nCommit-Queue: Jie Sheng \u003cjiesheng@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1540218}\nGitOrigin-RevId: 9bb8d5a7c9b2864d9b5ada14875e9ee1762f869f\n"
    },
    {
      "commit": "c092057cb0cf785f724bcf043cdfac983b027d3a",
      "tree": "9061fa2ca48bf981c22dbd0d53aa843bc1357e33",
      "parents": [
        "2930733e36c69faab8b9fde4f86968fadee1b38d"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Nov 04 17:06:43 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:35:08 2025"
      },
      "message": "Move perf uploading to handler\n\nMoves the prompt eval perf dashboard uploading to a user-provided\nhandler instead of directly tying it to the result thread. This should\nbe a functional no-op.\n\nBug: 456827244\nChange-Id: I0b405245af5b2707b6fa23c4ec644e7ec4aaa677\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7114818\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1540100}\nGitOrigin-RevId: 6f7fd1ad320f25741b8936d19bb4d2e8c4e5dd26\n"
    },
    {
      "commit": "2930733e36c69faab8b9fde4f86968fadee1b38d",
      "tree": "731f9067088d4870a06c9531e4be9a830954e871",
      "parents": [
        "65da9b613badd1a33f6e6e6ede9b31b3023d65b8"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Mon Nov 03 21:19:04 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:34:48 2025"
      },
      "message": "eval: Add negative tag filtering for tests\n\nThis CL updates the prompt evaluation test runner to support negative\ntag filtering. Users can now exclude tests by prefixing a tag with a\nhyphen (e.g., `-slow`).\n\nBug: 456720346\nChange-Id: I4e05b21bb2af8656ed09f3c3690dbe3d22e625f9\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7113645\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1539587}\nGitOrigin-RevId: 1fa83eb9b63c0cd25f4efcd0fcc8f17b87f3bdb2\n"
    },
    {
      "commit": "65da9b613badd1a33f6e6e6ede9b31b3023d65b8",
      "tree": "c9570100687338bbadab0edbc219fdc39dab1fab",
      "parents": [
        "2389503b89c1c8b11bd1d70001c505dad1c627ed"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Mon Nov 03 18:23:32 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:34:32 2025"
      },
      "message": "Move ResultDB reporting to handler\n\nMoves prompt eval ResultDB reporting to be in a result handler instead\nof being tied to the ResultThread. This should functionally be a no-op.\n\nBug: 456827244\nChange-Id: Ib67bee354cc3620053b711171a6483ccdc425dc2\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7104013\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1539448}\nGitOrigin-RevId: 45664be500911a74bdb365584b83974da52dde80\n"
    },
    {
      "commit": "2389503b89c1c8b11bd1d70001c505dad1c627ed",
      "tree": "eda086bd5299b824f4a96dc8a5733c2ff84fbd07",
      "parents": [
        "ba2d83a5f5ae72c2f62174706d8bd73f26cf3129"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Mon Nov 03 16:37:19 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:34:11 2025"
      },
      "message": "Support user-provided result handlers\n\nSupports the use of user-provided result handlers while processing\nresults for prompt eval tests. This is currently unused, but existing\nprocessing that is tied to the ResultThread will be moved to handlers\nin follow-up CLs.\n\nBug: 456827244\nChange-Id: If416af6551faa64795d021fc98515f0a7386069f\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7108301\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1539375}\nGitOrigin-RevId: b2009c6b55127d40c1576c855ec23197e7f41e39\n"
    },
    {
      "commit": "ba2d83a5f5ae72c2f62174706d8bd73f26cf3129",
      "tree": "69a7c8ed632fe0353e07f24d0e89c09488c7aef7",
      "parents": [
        "04cbefcfec96f7ec7f62109c6130b2ea592ff778"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Fri Oct 31 19:34:04 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:33:52 2025"
      },
      "message": "Separate reporting for each iteration\n\nThis is probably going to lead to a confusing situation where fails\nand passes look flaky in the test results tab. Landing this first is\nboth preferable so I can start writing some queries based on the\nresults and will make the bad UX corner cases more obvious. We might\nwant to create a super-test result for the combined iterations that\nwill make the overall status more obvious\n\nBug: None\nChange-Id: I24b8430f1015e0fb6c43d74333b54153c58989d3\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7098822\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1538788}\nGitOrigin-RevId: 715fa7f70e7c6959372b99470c60b3e54492b25a\n"
    },
    {
      "commit": "04cbefcfec96f7ec7f62109c6130b2ea592ff778",
      "tree": "98189ea1f9555095f4e66d5f82c4fac9b828b0af",
      "parents": [
        "d2778a502781c9bea47878df122620638db6cab4"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Fri Oct 31 18:19:31 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:33:34 2025"
      },
      "message": "eval: Add test filtering on metadata\n\nThis change introduces a generic metadata filtering mechanism for\nprompt evaluation tests. Users can now specify a comma-separated list of\ntags using the `--metadata-filter` argument to filter tests. Only tests\nwith at least one matching tag will be executed.\n\nBug: 449785879\nChange-Id: Id38165e51466f77265d8978fc81ead57a686dca3\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7101060\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1538738}\nGitOrigin-RevId: 47bb4bc655e5a4ae7f2aea0da50a12d98a803930\n"
    },
    {
      "commit": "d2778a502781c9bea47878df122620638db6cab4",
      "tree": "2ef91a80f3ee428c44ac972b6865811c5b156b50",
      "parents": [
        "3a58f9172505b0a8997dc7f8ed99919886af4e13"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Oct 30 00:29:02 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:33:03 2025"
      },
      "message": "Adjust prompt eval perf upload location\n\nMakes the following adjustments to the path that prompt eval tests\nupload perf data to:\n\n1. Switch to the \"ingest\" subdirectory. This is consistent with other\n   uploaders and avoids the need to add a special directory to the\n   ingestion allowlist.\n2. Adds builder_group/builder/build_number after the time-based\n   directories. This is consistent with other uploaders and will make\n   it easier to find data if we ever need to dig through the GCS bucket\n   ourselves.\n\nThe new --builder-group and --build-number arguments currently have\ndefaults set, but these will be removed once the recipe is updated to\npass these in.\n\nBug: 449818513, 450054252\nChange-Id: Ibb4ccbcdc991863390465d3107820c07674a8331\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7098880\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1537672}\nGitOrigin-RevId: 842369b0f2d66e40f60634d284710af269e8ebfb\n"
    },
    {
      "commit": "3a58f9172505b0a8997dc7f8ed99919886af4e13",
      "tree": "61cc98e056e0c8f96550dff0e0af08d6b11e05cd",
      "parents": [
        "e634b2f7e146f102c085d991655284b1d23c94bd"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 29 23:49:58 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:32:37 2025"
      },
      "message": "Reland \"Eval Prompts: Add custom node support and cipd option for gcli\"\n\nThis is a reland of commit a5f9e5d5fcfeb19b8b0bff542c36f2be6aad5af9\n\nThe cipd install had to be mocked like the promptfoo one and the\n.cipd flag change propagated to the rest of the framework.\n\nOriginal change\u0027s description:\n\u003e Eval Prompts: Add custom node support and cipd option for gcli\n\u003e\n\u003e The cipd version requires running with the cipd version of node as well\n\u003e so this plumbing is necessary to pin a prebuilt gemini cli. It\u0027s unclear\n\u003e at the moment if/when the cipd\u0027d version of gemini cli should be used\n\u003e though since we can\u0027t control what version users are running and the\n\u003e plans to do so don\u0027t involve a cipd version.\n\u003e\n\u003e Bug: 445459853\n\u003e Change-Id: I9650787435695bab59b4d11dabc6173c01b0f622\n\u003e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7075971\n\u003e Commit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\n\u003e Reviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\n\u003e Cr-Commit-Position: refs/heads/main@{#1537621}\n\nBug: 445459853\nChange-Id: Ic6e496f527efdfb7c809a086ca3a3bdada0fa992\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7099059\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1537650}\nGitOrigin-RevId: 9998fe6ea8779c196fcbc6059c9061e2b6ce29a7\n"
    },
    {
      "commit": "e634b2f7e146f102c085d991655284b1d23c94bd",
      "tree": "b455fded3a4ee57e20a2e9532847bb3de14e3401",
      "parents": [
        "4a0111a758bf4bbe3428e7baeedcd0baccb49396"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 29 23:14:09 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 23:32:08 2025"
      },
      "message": "Revert \"Eval Prompts: Add custom node support and cipd option for gcli\"\n\nThis reverts commit a5f9e5d5fcfeb19b8b0bff542c36f2be6aad5af9.\n\nReason for revert: Unit tests didn\u0027t mock the cipd install and\nthe flag update wasn\u0027t complete\n\nOriginal change\u0027s description:\n\u003e Eval Prompts: Add custom node support and cipd option for gcli\n\u003e\n\u003e The cipd version requires running with the cipd version of node as well\n\u003e so this plumbing is necessary to pin a prebuilt gemini cli. It\u0027s unclear\n\u003e at the moment if/when the cipd\u0027d version of gemini cli should be used\n\u003e though since we can\u0027t control what version users are running and the\n\u003e plans to do so don\u0027t involve a cipd version.\n\u003e\n\u003e Bug: 445459853\n\u003e Change-Id: I9650787435695bab59b4d11dabc6173c01b0f622\n\u003e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7075971\n\u003e Commit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\n\u003e Reviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\n\u003e Cr-Commit-Position: refs/heads/main@{#1537621}\n\nBug: 445459853\nNo-Presubmit: true\nNo-Tree-Checks: true\nNo-Try: true\nChange-Id: I80074ece16b4715947121e8a8ee777f920a45b1d\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7098879\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Rubber Stamper \u003crubber-stamper@appspot.gserviceaccount.com\u003e\nBot-Commit: Rubber Stamper \u003crubber-stamper@appspot.gserviceaccount.com\u003e\nCr-Commit-Position: refs/heads/main@{#1537636}\nGitOrigin-RevId: 94ede239dee789fe0db90ea8ce11d2284d09e119\n"
    },
    {
      "commit": "4a0111a758bf4bbe3428e7baeedcd0baccb49396",
      "tree": "714c42a9ea6c537d0759531323694a4c2a2737eb",
      "parents": [
        "91b5c413db55ef1b48d84eea72678e074dd57542"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 29 23:02:08 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:54:47 2025"
      },
      "message": "Fix perf uploading location\n\nFixes minutes being used instead of hours for the prompt eval perf\ndashboard GCS path.\n\nBug: 450054252, 449818513\nChange-Id: Ieb210e9dd26385ecc6c8591231ecc1a0a07f878b\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7098659\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1537628}\nGitOrigin-RevId: a1d3e0dcc8f75fa184848bb4c550df37bcb9cd0a\n"
    },
    {
      "commit": "91b5c413db55ef1b48d84eea72678e074dd57542",
      "tree": "ac83ccd2c29eb871caad7e188ef61569746e8468",
      "parents": [
        "0e4368bbd080a9f506fc367159c3e53fca169e24"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 29 22:54:28 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:54:28 2025"
      },
      "message": "Eval Prompts: Add custom node support and cipd option for gcli\n\nThe cipd version requires running with the cipd version of node as well\nso this plumbing is necessary to pin a prebuilt gemini cli. It\u0027s unclear\nat the moment if/when the cipd\u0027d version of gemini cli should be used\nthough since we can\u0027t control what version users are running and the\nplans to do so don\u0027t involve a cipd version.\n\nBug: 445459853\nChange-Id: I9650787435695bab59b4d11dabc6173c01b0f622\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7075971\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1537621}\nGitOrigin-RevId: a5f9e5d5fcfeb19b8b0bff542c36f2be6aad5af9\n"
    },
    {
      "commit": "0e4368bbd080a9f506fc367159c3e53fca169e24",
      "tree": "581af6b29652c970c49b6b944c257106e294146f",
      "parents": [
        "c401587c6169e8410d92bcf3a84048a372ee945c"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 29 16:15:34 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:54:02 2025"
      },
      "message": "Adjust perf upload format\n\nAdjusts the format used for uploading to the perf dashboard in\n//agents/testing to include `bot` and `benchmark` keys. This should\nhopefully be the last change necessary to get the prompt eval tests\ncompatible with the perf dashboard.\n\nBug: 449818513\nChange-Id: Id2f29f1b9df84c6817411269d9d0917f84ffaab0\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7092459\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1537376}\nGitOrigin-RevId: 528a3d8cb13f1c83a489619639bb8b270d41114a\n"
    },
    {
      "commit": "c401587c6169e8410d92bcf3a84048a372ee945c",
      "tree": "7dc8ef974f4bf89c10a3c7e5df9860701c9d160b",
      "parents": [
        "228eed9ede9c8dbc5cf6f7b0494b189fd6dca696"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Oct 28 18:01:54 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:53:35 2025"
      },
      "message": "Add prompt eval metric uploading\n\nMakes the necessary changes to support merging and uploading collected\nmetrics from prompt eval tests. The vast majority of this functionality\nis hidden behind the --enable-perf-uploading flag, so it should not\naffect any currently running tests.\n\nSome details may change as we receive additional information from the\nperf dashboard team, but any future changes should be relatively minor,\ne.g. changes to the GCS path.\n\nTested locally up to the point where gsutil is run, which fails as\nexpected since a fake bucket was provided.\n\nBug: 449818513\nChange-Id: I12efe58b1ad968a6e06a9629d0feeed2337e1c52\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7088074\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1536788}\nGitOrigin-RevId: 6688b9b0aec412b971d5c3c96d77f477498137b6\n"
    },
    {
      "commit": "228eed9ede9c8dbc5cf6f7b0494b189fd6dca696",
      "tree": "62e09b4367bfaf9db2286583cd4eac5d15c405e4",
      "parents": [
        "934b4d7edc534ede734acfbac909a9c48afe0b34"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Mon Oct 27 19:38:20 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:53:12 2025"
      },
      "message": "Prompt Evals: specify precompile_targets on the test\n\nThis will make compiles much faster and remove most uses for the\n--no-build flag\n\nBug: None\nChange-Id: I9cbd175b6efebdc0df2de5fd021dc7c2ac467e3a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7081866\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1536166}\nGitOrigin-RevId: 9f6777f3df8b48aee3f028b982406601b3f56d10\n"
    },
    {
      "commit": "934b4d7edc534ede734acfbac909a9c48afe0b34",
      "tree": "fdf45ff9855a56775b4b9456f58899a8759c5e41",
      "parents": [
        "18d80c28a8baa6ebf1e3dfbfd8efd0293f5f9cab"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Fri Oct 24 20:45:11 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:52:41 2025"
      },
      "message": "Eval Prompts: Remove promptfoo npm/src\n\nNow that we have a pinned version of promptfoo to use and we can pass\na preinstalled version, the npm/src versions are unlikely to be used so\nwe don\u0027t need to keep maintaining this code. This simplifies the install\nconsiderably and reduces the number of flags we need to worry about\n\nBug: None\nChange-Id: I94e8eb7309e218067936c7306b2cfead5410c7e3\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7075243\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1535287}\nGitOrigin-RevId: d0af3bb221d88f7db1edc072a5c1a8055fdde820\n"
    },
    {
      "commit": "18d80c28a8baa6ebf1e3dfbfd8efd0293f5f9cab",
      "tree": "7aa40043d6036481d9cffd9c49766c9af674c1cc",
      "parents": [
        "3641d75955538cf6186b0c73f4f7742c974c9b5a"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Fri Oct 24 19:02:24 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:52:15 2025"
      },
      "message": "Prompt Eval: Improve test reliability\n\nApply test reruns for the flaky tests. They both pass ~30% of the time\nso 1 in 5 passes should be a reasonable starting point.\n\nRemove the excessive attempts on host_arch to improve eval runtime.\nThis test should never fail.\n\nThe assert for host_platform is also failing occasionally due to case\nsensitivity.\n\nThe build_target test appears to have changed as well. Update it for\nnow but this will probably require adding a patch or finding a more\nstable file/target\n\nHopefully after this lands we\u0027ll start seeing some green builds again\n\nBug: 443097039\nChange-Id: I695bf0f89b3fab2b380d7aafd8efe18f34fc7ce5\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7081776\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1535214}\nGitOrigin-RevId: ba116c0df8c116243b20bd177ee85b59925ec999\n"
    },
    {
      "commit": "3641d75955538cf6186b0c73f4f7742c974c9b5a",
      "tree": "19874d9c0ac1456f3c6dde4b03abd11667b812a1",
      "parents": [
        "bbec6dfa57f1a86ddbc97954b49160ebd5b1001d"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Thu Oct 23 23:39:41 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:51:48 2025"
      },
      "message": "Eval Prompts: Improve readability of test results\n\nThe assertion failures are easy to see locally with `promptfoo view`\nbut that isn\u0027t available on the builder. Including the prompt and\nresponse in a more formatted and digestible way will hopefully\nimprove the UX over the excessive output we have currently.\n\nBug: None\nChange-Id: Ie5f09544a7500833804fc1a94306f60d2c65d870\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7081088\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1534680}\nGitOrigin-RevId: ee04ef510907065e9aa8305215bbf66545853d8d\n"
    },
    {
      "commit": "bbec6dfa57f1a86ddbc97954b49160ebd5b1001d",
      "tree": "1997ddeb54898ad3ddb676f69c511011a70dcefe",
      "parents": [
        "a09ab25e045da3d99460c6389bbafd5f33e098a1"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Oct 23 18:28:10 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:51:24 2025"
      },
      "message": "Refactor test/iteration results\n\nRefactors prompt eval test results to better reflect the fact that a\nsingle reported test result can contain multiple underlying iterations\nas part of the pass@k functionality.\n\nThis should functionally be a no-op, but unblocks us from reporting\nmetrics from all iterations.\n\nBug: 449818513\nChange-Id: If6464c41b5e2454928fcadd9ab0b75532d25440d\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7075878\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1534484}\nGitOrigin-RevId: f32a75d183cf2f1591c867823e00cb525a326d9d\n"
    },
    {
      "commit": "a09ab25e045da3d99460c6389bbafd5f33e098a1",
      "tree": "b1c041afc1d7e6b3036fb501f38426ea91c18471",
      "parents": [
        "d2c74f5fd82c1dbf666be282911617a7d4c98411"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Wed Oct 22 19:11:43 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:50:54 2025"
      },
      "message": "Add pass@k support to prompt evals\n\nThis CL adds support for pass@k to the prompt evaluation framework.\nThis allows tests to be run multiple times and considered successful if\nthey pass a certain threshold.\n\nThe following features are included:\n\n-   `runs_per_test`: The number of times to run each test.\n-   `pass_k_threshold`: The minimum number of successful runs required\n    for a test to be considered passing.\n-   Early exit: Tests will exit early if they have already passed the\n    threshold or can no longer pass.\n\nBug: 449150877\nChange-Id: I494693689071e71a82ba146c747945b7228279d9\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7036741\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1533829}\nGitOrigin-RevId: f8ebad25b62ed592457ee43d470409a03c18e341\n"
    },
    {
      "commit": "d2c74f5fd82c1dbf666be282911617a7d4c98411",
      "tree": "90220fb1948888cbdd04535e6a4c15decd474ff3",
      "parents": [
        "2fa6db2edf9eabb739248fc0a2cba2a03cb55fe4"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 22 16:36:41 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:50:27 2025"
      },
      "message": "Adjust args for perf dashboard uploading\n\nAdjusts the way arguments are handled for ResultThread to better support\narguments related to uploading to the perf dashboard. These new\narguments are currently unused, so this CL should not result in any\nfunctional changes. Use of these new args will be handled in a follow-up\nCL.\n\nBug: 449818513\nChange-Id: I042b17edeeab5c8a80d62c50615cb9667238c1e0\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7070236\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1533682}\nGitOrigin-RevId: b294c44cfa27e2f3728446562aa4d3fb06236fab\n"
    },
    {
      "commit": "2fa6db2edf9eabb739248fc0a2cba2a03cb55fe4",
      "tree": "73ed97ca45e4201004137832c80660cea1be20e3",
      "parents": [
        "498a5bd4b2a932d758740da4fd30f6cee375ea41"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Tue Oct 21 22:51:22 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:50:02 2025"
      },
      "message": "Prevent unit tests from pulling cipd packages\n\nThe unit tests are pulling cipd packages because the functions haven\u0027t\nbeen mocked. This was the root cause of cipd package unit tests getting\nrun in presubmit.\n\nBug: None\nChange-Id: Ic6f778583c51c98e17cf9e9b59cbced53fcbb94f\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7070084\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1533289}\nGitOrigin-RevId: 22156139ab6a7b778a11430a7f6f365f4d7a90a2\n"
    },
    {
      "commit": "498a5bd4b2a932d758740da4fd30f6cee375ea41",
      "tree": "c81f97044dc35dc34b87d918bc3d174700aa0682",
      "parents": [
        "4dbddcb730a52862b73c3272a354b00fb394fd18"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Tue Oct 21 22:33:12 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:49:41 2025"
      },
      "message": "Eval prompts: fix default gemini bin for 1P\n\nThe 1p wrapper is installed as an alias, which does not get passed\nthrough to the provider, so the gemini-cli-bin flag has to be passed\nexplicitly. Instead, when that flag isn\u0027t present, use the same helper\nfunction the install.py script uses.\n\nAlso remove cipd packages from PRESUBMIT.\n\nBug: None\nChange-Id: I74afe9cf352de7b8022a25787a8aecf7dfa6a873\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7067936\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1533278}\nGitOrigin-RevId: daabcd0db5a59b9c6d82821978f43a9e00fae9e4\n"
    },
    {
      "commit": "4dbddcb730a52862b73c3272a354b00fb394fd18",
      "tree": "67be0fb1716889d97cb80046e11aaff8bc60b388",
      "parents": [
        "02bdd58c0e6f57f892498bba35cfc8f16fd43c60"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Tue Oct 21 18:41:50 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:49:21 2025"
      },
      "message": "Eval Prompts: Add trusted folders for temp HOME dir\n\nThe internal 1P gemini-cli tool is failing due to \"untrusted\" folders\nduring the extension installation. This is probably because that\nversion is more up to date than the 3p version we use. The temporary\nHOME folders need this trustedfolders file to be updated to allow\ninstalling the extensions\n\nAlso update eval_prompts.py to use vpython3. This is how all our\nunit tests are run which was creating a difference between test coverage\nand the actual code being run. e.g. TemporaryDirectory(delete) does\nnot exist in our vpython3 version\n\nBug: 443097039\nChange-Id: Ied726d2165050f8853619807fea451786f110d3a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7068002\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1533107}\nGitOrigin-RevId: 314bd8337a229604c361f51cb01afee8d4b64074\n"
    },
    {
      "commit": "02bdd58c0e6f57f892498bba35cfc8f16fd43c60",
      "tree": "2aa44f1cac47014f0c53cc6b9400d48289b974df",
      "parents": [
        "356a0f3aa02c5ef31232f6c4aa25333af2e5c112"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Tue Oct 21 17:32:23 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:49:03 2025"
      },
      "message": "Add and use by default cipd promptfoo\n\nPull the cipd packages from the runner itself rather than DEPS files so\nthat devs who won\u0027t use the eval framework don\u0027t have to fetch the\nfiles or set a gclient variable to control it. gemini cli might eventually\nbe something we want to include in DEPS but that\u0027s for another CL.\n\nBug: 445459853\nChange-Id: I2b75872c5fabfb9dcea9690dd0aa07b70e0e6222\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7062129\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1533065}\nGitOrigin-RevId: d58b26c566ee4d3c0e897522820e156142abf1ce\n"
    },
    {
      "commit": "356a0f3aa02c5ef31232f6c4aa25333af2e5c112",
      "tree": "83abdcc5f850443731c16d4633b24f3cd980fbae",
      "parents": [
        "8b4da35886f3502e3795362bc4f1817453d8e9c6"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Oct 21 15:40:49 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:48:43 2025"
      },
      "message": "Use settings.json for gemini-cli telemetry\n\nUpdates gemini_provider.py to use settings.json for configuring\ngemini-cli\u0027s telemetry instead of command line flags since the latter\nis deprecated.\n\nBug: 449818513\nChange-Id: I85510b75df884edbbb6d82bd5688ebc317207a95\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7061942\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1532986}\nGitOrigin-RevId: 31bdac9b448141bfbc5c81b840986ce4b3051ff9\n"
    },
    {
      "commit": "8b4da35886f3502e3795362bc4f1817453d8e9c6",
      "tree": "52015489d05b06bc598889a734af9f46634ff39b",
      "parents": [
        "4220cc7f7ccdc56533b15338257c99ba58b21a62"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Oct 16 17:53:43 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:48:13 2025"
      },
      "message": "Extract prompt eval scores\n\nExtracts the test score from promptfoo results and surfaces it in\nthe same way as token usage. To better accommodate this and any future\nmetrics, both score and token usage have been combined into a single\n\"metrics\" field of test results.\n\nBug: 449818513\nChange-Id: I125ca9f11e9e4f93d9963b63f74f0bddc72ada14\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7047471\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1530946}\nGitOrigin-RevId: cc79a219e9eaa3c1e7a347b52fdfe5df90b01f05\n"
    },
    {
      "commit": "4220cc7f7ccdc56533b15338257c99ba58b21a62",
      "tree": "249daf15634066b97c26686522ba8d8617ef45e0",
      "parents": [
        "eb5f995b2e2cee1199903772b596ae06128df753"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Thu Oct 16 00:19:39 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:47:47 2025"
      },
      "message": "Update //agents/testing/README\n\nThis file is a little dated, e.g. referring to the script as\nexperimental and local only. Update it since we are expecting\ncontributions to the evals soon.\n\nBug: None\nChange-Id: Ifa5678f1c8c0a2f7ec0c4186a1c338ba6baeaf6d\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7047383\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1530551}\nGitOrigin-RevId: fa5e01ed93db0f85062f07a2a80f6a8ed316e508\n"
    },
    {
      "commit": "eb5f995b2e2cee1199903772b596ae06128df753",
      "tree": "7b7f7877ab429bf9a36c0a11e9287f15a0f98acf",
      "parents": [
        "f37d63cb681f3ad6b1ab6af8372ae49f3b397676"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Oct 14 19:55:07 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:47:20 2025"
      },
      "message": "Surface gemini-cli token usage\n\nUpdates the code under //agents/testing/ to surface gemini-cli\u0027s token\nusage during tests. This currently just gets logged when reporting\nresults, but will eventually be uploaded to either ResultDB or the\nperformance dashboard.\n\nBug: 449818513\nChange-Id: I8fffc499e0ec69f60758541b377bb50fc56a94b4\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7038209\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1529726}\nGitOrigin-RevId: 64e9189f9a07c514d8a71c407386e8b2af24449b\n"
    },
    {
      "commit": "f37d63cb681f3ad6b1ab6af8372ae49f3b397676",
      "tree": "1fb126fa8f6f448af8deb64b742114cb7fcac962",
      "parents": [
        "f9cc4215d200eefb1aa24a36de128fd21ebeef12"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Mon Oct 13 06:03:23 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:46:58 2025"
      },
      "message": "Update eval_prompts for structured test ids\n\nIt looks like the result sink library already has support for structured\ntest ids so we should just be able to supply test_id_structure\n\nBug: 445458673\nChange-Id: Ib0be7efeb491c5d4317506a6e51e02938918808a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7032448\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Patrick Meiring \u003cmeiring@google.com\u003e\nReviewed-by: Patrick Meiring \u003cmeiring@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1528755}\nGitOrigin-RevId: 944d14b90e5e4d0d6fe924b1c656ecc982acc844\n"
    },
    {
      "commit": "f9cc4215d200eefb1aa24a36de128fd21ebeef12",
      "tree": "4bdcac08bd4dee5eeaec02c42019619122a1a3b4",
      "parents": [
        "183dad92f24e8f36a266f0b9dda32536e8ebb570"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Fri Oct 10 19:54:11 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:46:36 2025"
      },
      "message": "agents: Add pass@k configuration parsing\n\nThis CL introduces the initial components for pass@k test evaluation.\n\nIt adds a `PassKConfig` dataclass and a `_read_pass_k_config`\nfunction to parse pass@k parameters (`runs_per_test` and\n`pass_k_threshold`) from the metadata of promptfoo test files.\n\nBug: 449150877\nChange-Id: Id260dd95f3335c2d89611fefa78840a340f5f451\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7031198\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1528312}\nGitOrigin-RevId: 0189840757b5cb78dee3d4abf327bf0759e239a1\n"
    },
    {
      "commit": "183dad92f24e8f36a266f0b9dda32536e8ebb570",
      "tree": "dcc2d3e87750bdadcd8b496180487ac98499e48d",
      "parents": [
        "cc9199603aca48551e2d42c3034ab84c5c0134b1"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Oct 10 16:16:21 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:46:12 2025"
      },
      "message": "Fix gemini_provider console width parsing\n\nFixes gemini_provider.py parsing the console width as a string. The\ncast to an int was accidentally dropped during a recent refactor.\n\nBug: 449818513\nChange-Id: I485c16329d6deed35ef1f7e0d15ae86c9b961013\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7030699\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1528177}\nGitOrigin-RevId: 814321f9e42458e6a6bdf91d3d4f5c1b7f3c1884\n"
    },
    {
      "commit": "cc9199603aca48551e2d42c3034ab84c5c0134b1",
      "tree": "2a657dfef85d3b2d4a2c5f6ca38c443f7b5bbeca",
      "parents": [
        "fad9ca4cd9a6d8192f3c1d179ef495bfd95f8354"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Oct 09 22:31:58 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:45:55 2025"
      },
      "message": "Refactor gemini_provider\u0027s call_api\n\nRefactors gemini_provider.py\u0027s call_api implementation to be less\nmonolithic. Additionally, missing test coverage is added for all touched\ncode.\n\nThis should be a functional no-op except for these fixes:\n1. The output thread is set to be daemonic so that it cannot accidentally\n   block process shutdown\n2. Failure to parse a timeout results in an error instead of silently\n   using the default timeout\n\nBug: 449818513\nChange-Id: I248b6ddd48c539bbc4d7c3dc048e341b6581e46d\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7030040\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1527804}\nGitOrigin-RevId: fb20cfd020148f64bdd9872534d1632f395251d1\n"
    },
    {
      "commit": "fad9ca4cd9a6d8192f3c1d179ef495bfd95f8354",
      "tree": "c2f601ca32f144accc44df8f0c50ca1b0266b42e",
      "parents": [
        "52c65dc1fec88e6eab4b8488306ddc283bac872b"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Oct 09 15:39:48 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:45:38 2025"
      },
      "message": "Docs: Improve btrfs setup instructions\n\nAdds an optional step to the btrfs setup guide for making the mount\npermanent via `/etc/fstab`.\n\nBug: 450339979\nChange-Id: I5ad5797bfd48d68c29cf6741f5021da05e1bd495\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7026674\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1527547}\nGitOrigin-RevId: 2e548f32b2bc554267cb6cef19b80cf8c3f897fe\n"
    },
    {
      "commit": "52c65dc1fec88e6eab4b8488306ddc283bac872b",
      "tree": "0730e6f1a2228619d0bf823362f42996c9aac581",
      "parents": [
        "9a59afd76ed80763d4f704b1bd211556e185c40f"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Wed Oct 08 20:35:54 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:45:05 2025"
      },
      "message": "agents: Centralize Gemini helper functions\n\nThis CL moves the functions for finding the Gemini executable and its\nversion into a shared `agents.common.gemini_helpers` module.\n\nThis avoids code duplication between the `install.py` and\n`eval_prompts.py` scripts. The project root finding logic in\n`install.py` is also simplified.\n\nBug: 448648646\nChange-Id: Id0ea1ba2a71b407980f1e7f59664f04bee831e1e\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7015972\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1527133}\nGitOrigin-RevId: 5b52aeb077ed8371a89b233ab76e757fafc1784a\n"
    },
    {
      "commit": "9a59afd76ed80763d4f704b1bd211556e185c40f",
      "tree": "8038b2d38575f3444d49802316445d516a2f8309",
      "parents": [
        "e5537eb623c4514b10ce15904aa298f1f39710a7"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Wed Oct 08 20:27:46 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:44:40 2025"
      },
      "message": "feat(eval): Set sandbox PATH from container image\n\nConfigures the PATH environment variable for the Gemini agent\u0027s sandbox.\n\nPreviously, the sandbox PATH was not correctly set, preventing tools\nfrom being found. This change introduces a mechanism to inspect the\nsandbox container image, retrieve its default PATH, and then prepend\nthe `depot_tools` directory.\n\nBug: 448648646\nChange-Id: I6525b064656cf3cbf39e4a6d061a2f487cbb6b36\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7017109\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Erik Staab \u003cestaab@chromium.org\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1527128}\nGitOrigin-RevId: bb85a2d8627764e23160bce1e8ad836838a67e8c\n"
    },
    {
      "commit": "e5537eb623c4514b10ce15904aa298f1f39710a7",
      "tree": "103abf5213d8d22c644d14db03db49bc5618a9cf",
      "parents": [
        "069ef681b918ef0317230867e617f19c06b4f87a"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 08 18:18:09 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:44:14 2025"
      },
      "message": "Add an unrestricted parallel option\n\nThe recipe doesn\u0027t know how many tests need to run and with the overhead\nfor creating workdirs being so low it makes sense to just launch\neverything in parallel. Make it so a value of -1 will just make a thread\nfor each test\n\nBug: 445458673\nChange-Id: I578155a97ec8545cb83afb042b1fd09eeab651a6\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7022333\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1527032}\nGitOrigin-RevId: 9d37065390be95c98f18fab3bed6f9a245f96938\n"
    },
    {
      "commit": "069ef681b918ef0317230867e617f19c06b4f87a",
      "tree": "f2a8cdbccd4222dea71b6c5fdbe9221e800be482",
      "parents": [
        "935f99d26f36fdaf9b4a5296a77c36078817d3df"
      ],
      "author": {
        "name": "Jonathan Lee",
        "email": "jonathanjlee@google.com",
        "time": "Wed Oct 08 00:36:56 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:43:53 2025"
      },
      "message": "[agents][eval] Introduce reusable `check_gtests.py` assert\n\nBug: 441944053\nTest: agents/testing/eval_prompts.py --verbose --force\nChange-Id: I4300ba041b29ded2ced0d71e65d406a48c69ad9a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7018873\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCommit-Queue: Jonathan Lee \u003cjonathanjlee@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1526617}\nGitOrigin-RevId: 7344c328fd41b7200590708b1176df7afebe625d\n"
    },
    {
      "commit": "935f99d26f36fdaf9b4a5296a77c36078817d3df",
      "tree": "303b0a4603049663adb56d76dd15dfd15bada0b4",
      "parents": [
        "f7162c19a5fc3353bdbf9eae288470f508454216"
      ],
      "author": {
        "name": "Jonathan Lee",
        "email": "jonathanjlee@google.com",
        "time": "Tue Oct 07 21:49:04 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:43:30 2025"
      },
      "message": "[agents][eval] Enable `use_remoteexec` to speed up builds\n\nBuilds default to local execution [0], which is slow.\n\n[0]: https://source.chromium.org/chromium/chromium/src/+/main:build/toolchain/rbe.gni;l\u003d26-27;drc\u003d72a2697389211ef9e02d66d9349cd8a530877774;bpv\u003d0;bpt\u003d0\n\nBug: None\nChange-Id: Ibe6efa8e5cf40eb1d9976080fa9a7ec277097b64\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7018793\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Jonathan Lee \u003cjonathanjlee@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1526540}\nGitOrigin-RevId: d0cdf30fa8aa5d381b6808b4cfdea75f1f2eb58b\n"
    },
    {
      "commit": "f7162c19a5fc3353bdbf9eae288470f508454216",
      "tree": "6e0b8b008fc2a03fb2903e8e5cfc36fb8a7bae42",
      "parents": [
        "583c967b108cf4f3c1adbb8d42a82bd8b601dadd"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Tue Oct 07 17:08:15 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:43:00 2025"
      },
      "message": "feat(eval): Mount depot_tools in sandbox\n\nWhen sandboxing, automatically find and mount the `depot_tools`\ndirectory. This ensures essential development tools are available.\n\nIf `depot_tools` cannot be located, the provider now returns a clear\nerror instead of proceeding with a misconfigured sandbox.\n\nBug: 448648646\nChange-Id: If544709a3a458cc180240874a64e6ba617bd6102\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7013810\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1526378}\nGitOrigin-RevId: 3f96c6cdb0fbdb5adf74de2efb16bc003ea9dadd\n"
    },
    {
      "commit": "583c967b108cf4f3c1adbb8d42a82bd8b601dadd",
      "tree": "5f31b90be13c601e3e0f54df4381f7e0c5ad53a4",
      "parents": [
        "cb6d3358938403391df6e2d5506a7c944acb1bf4"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Oct 07 16:32:51 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:42:37 2025"
      },
      "message": "Add missing prompt eval arg\n\nAdds --isolated-script-test-launcher-retry-limit, which was missed when\noriginally adding the isolated script test args to\n//agents/testing/eval_prompts.py. This is just an alias for the existing\n--retries argument.\n\nBug: 441944907\nChange-Id: I9ded170c5fdc627a7d00d38352b82838da164110\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7015028\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1526343}\nGitOrigin-RevId: caf5356ef5b04f967732597768c75add0b1c4672\n"
    },
    {
      "commit": "cb6d3358938403391df6e2d5506a7c944acb1bf4",
      "tree": "5fecc1982b67c014d93f04cbff30a072cefbd5b9",
      "parents": [
        "458560cd1f13de9b188867f6694f02fcbd11dd08"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Mon Oct 06 21:14:31 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:42:05 2025"
      },
      "message": "refactor(eval): Directly fetch sandbox image\n\nThe sandbox pre-fetch logic no longer relies on the side effect of\nrunning a `gemini --sandbox no-op` command.\n\nThe script now determines the required image version via\n`gemini --version` and fetches it directly with `docker pull`. This\napproach is more efficient and explicit.\n\nBug: 449204868\nChange-Id: I0ed88a76a5c14084d0039204315fba35038d32a2\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7008411\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1525861}\nGitOrigin-RevId: c0778d463f13666efc2d9d17caef2acaa5c34ff0\n"
    },
    {
      "commit": "458560cd1f13de9b188867f6694f02fcbd11dd08",
      "tree": "38da365fc42e1e07b65f7d73b5fe56a8a31a39e3",
      "parents": [
        "e692bd3107ee12823065cc4881c3855c2c90842a"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Oct 02 23:25:06 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:41:39 2025"
      },
      "message": "Add remaining isolated script args + validation\n\nMakes the following changes to //agents/testing/eval_prompts.py:\n* Adds --isolated-script-test-output and\n  --isolated-script-test-perf-output. These args are currently unused,\n  but need to be parsed since they are standard isolated script test\n  args.\n* Adds _validate_args to catch cases of invalid args ASAP instead of\n  waiting until they cause issues later on in the script.\n\nFixed: 441944907\nChange-Id: I8ecfc82f81eed7f31d6f3f224832f8d758bf913c\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7006553\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1524506}\nGitOrigin-RevId: ce81febbcdc65a9e0a4c1f5c85f307c1876428a6\n"
    },
    {
      "commit": "e692bd3107ee12823065cc4881c3855c2c90842a",
      "tree": "ab7efd5c40c319a48a2364754f326233d70e8544",
      "parents": [
        "850bc8544dc6edd835dd6cfee1d9bd28fbb2d617"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 01 21:48:50 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:41:11 2025"
      },
      "message": "Add prompt eval --isolated-script-test-repeat\n\nAdds the --isolated-script-test-repeat argument to\n//agents/testing/eval_prompts.py which causes tests to be run multiple\ntimes when set.\n\nBug: 441944907\nChange-Id: I57b7b4ce0df2e41ae9b3d6b247a11c5597bb82b3\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7004367\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1523854}\nGitOrigin-RevId: 6adca808c16aba48e8003d8a1713b6a4a2c4cbd6\n"
    },
    {
      "commit": "850bc8544dc6edd835dd6cfee1d9bd28fbb2d617",
      "tree": "6fcc435b9cd8e2884d0eabe10e6c94f58ef48e24",
      "parents": [
        "45403f3f6cb0888d0cb1ee82083241bcbf58a71f"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Wed Oct 01 21:30:58 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:40:48 2025"
      },
      "message": "feat(eval): Use temp home for test environments\n\nThis change introduces a dedicated home directory for each test\nworker. This ensures that tests are hermetic and do not interfere\nwith each other or the user\u0027s global configuration.\n\nKey changes:\n- Each test worker now gets a dedicated `home_dir`.\n- Agent extensions are installed into this `home_dir`.\n- The Gemini CLI is invoked with this `home_dir`.\n\nBug: 447583599\nChange-Id: If312a271fab32dbbe7f07afd9c56220cc7afd9e0\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7004264\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1523842}\nGitOrigin-RevId: a1534816c8e0ff166405bb240901639a60b87045\n"
    },
    {
      "commit": "45403f3f6cb0888d0cb1ee82083241bcbf58a71f",
      "tree": "fa27247844242011d333d284fc4f75882da2d1f3",
      "parents": [
        "858723aac579c0eabe618b76511182f9443cb806"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 01 18:49:58 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:40:24 2025"
      },
      "message": "Update prompt eval test filtering\n\nMakes the following changes related to test filtering in\n//agents/testing/eval_prompts.py:\n\n1. The filter changes from a single string that is used for substring\n   matching to a string containing a ::-separated list of globs to use\n   for filtering. This is both more flexible (since multiple separate\n   tests can be filtered to even if they share no substrings in common)\n   and conforms to the isolated script test standard.\n2. Adds --isolated-script-test-filter as an alias for --filter to\n   conform to the isolated script test standard.\n\nBug: 441944907\nChange-Id: I2c22b08442e40d205219a79660050f385f8198ad\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7001804\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1523757}\nGitOrigin-RevId: fb7a8d544afe010f7ab56405dd8faf2ef812c574\n"
    },
    {
      "commit": "858723aac579c0eabe618b76511182f9443cb806",
      "tree": "c606feb1dd0855e395d523c55caa70f1426f305e",
      "parents": [
        "76e79dc01917ce88c8421407a87d3aa8f8a51a74"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 01 17:15:26 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:39:56 2025"
      },
      "message": "Rename example and test extensions with underscores\n\nExtensions can no longer have underscores. Replace these with hyphens.\nNote, test_landmines is still referenced in the provider. I will update\nthat after build-information lands to avoid dealing with merge\nconflicts\n\nBug: 448417182\nChange-Id: Ibf768c6f9748b559425970606d80520af53081eb\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7001145\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1523677}\nGitOrigin-RevId: 4bd492807fa836daa0642d1f5907a5b513ecf2a9\n"
    },
    {
      "commit": "76e79dc01917ce88c8421407a87d3aa8f8a51a74",
      "tree": "ba1cacdbf4913ac6a568b3ba90166205e6a0e9c1",
      "parents": [
        "ad78f25c019ecab2b066e51759ed0c8aadb64380"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Oct 01 16:41:28 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:39:35 2025"
      },
      "message": "Refactor prompt eval argument parsing\n\nOrganizes arguments in //agents/testing/eval_prompts.py into related\ngroups and adds unittest coverage. There should not be any functional\nchanges from the test runner\u0027s point of view other than that\n--promptfoo-bin is now correctly in a mutually exclusive group with\nother promptfoo installation arguments. This does make the --help\noutput more user-friendly and the arguments a bit easier to maintain,\nthough.\n\nBug: 441944907\nChange-Id: I66cbf3e9cb000915590e2166eac7809548ca5f3a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7001190\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1523641}\nGitOrigin-RevId: 68f49f9a80deb713c8c100f9e15d2fed8c49b999\n"
    },
    {
      "commit": "ad78f25c019ecab2b066e51759ed0c8aadb64380",
      "tree": "6fef10b8232efaa9ad7b5a8f7e1fde07f764528c",
      "parents": [
        "f91b76207157aff5a941533b84df557481ef0027"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Wed Oct 01 16:05:08 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:39:10 2025"
      },
      "message": "Update build_information MCP to build-information\n\nExtensions cannot have underscores anymore and due to the way we wrote\ninstall.py we need the dir to match the extension name. The linter\nalso needs to be updated to not lint deleted/renamed files\n\nBug: 448417182\nChange-Id: I0df7c736f56deb16a2e0403233248de21084d957\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7001142\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1523607}\nGitOrigin-RevId: 14dcb9fd78fda1b39494a0bdd496fcf4ff29b3dc\n"
    },
    {
      "commit": "f91b76207157aff5a941533b84df557481ef0027",
      "tree": "fccb37d91886580d6d133d1475f7f4f774be101e",
      "parents": [
        "74e36b127147ee54617c4a59dd843123b57b46e6"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Tue Sep 30 22:06:07 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:38:44 2025"
      },
      "message": "Add OWNERS to extensions and testing\n\nAdd myself and bsheedy to agents/extensions and myself to agents/testing\nso we don\u0027t have to add estaab to so many CLs\n\nBug: None\nChange-Id: I2d06e6d37aa836b61da9196c212bce15b7d792c2\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7001451\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1523175}\nGitOrigin-RevId: fcebe4a1bb592cec6e4ae0e82ae9376489b27fe1\n"
    },
    {
      "commit": "74e36b127147ee54617c4a59dd843123b57b46e6",
      "tree": "0732e0e6a3d47e8caec734d0a92d338b15af2c56",
      "parents": [
        "cd484d4486881c1d3a4a14544e4e2fe47b68567a"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Sep 30 16:44:53 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:38:15 2025"
      },
      "message": "Add missing //agents/testing tests\n\nAdds test coverage that was missing for code under //agents/testing.\n\nChange-Id: I5be23634827f627407ec5f6df36aba449b0b96e6\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6997327\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1522934}\nGitOrigin-RevId: a53471cedbb868443155e495e88d43fb6995b825\n"
    },
    {
      "commit": "cd484d4486881c1d3a4a14544e4e2fe47b68567a",
      "tree": "e56cffa325179e714188a7d316bfa9d7b0434230",
      "parents": [
        "8e5c3111cfc0ae8bcc86c4051690458ab59944eb"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Tue Sep 30 16:04:29 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:37:53 2025"
      },
      "message": "feat(eval): Add support for local dev binaries\n\nThis change enhances the local development workflow for prompt\nevaluation tests by allowing developers to use their own local builds\nof `gemini-cli` and `promptfoo`.\n\nTwo new flags are introduced to `agents/testing/eval_prompts.py`:\n- `--gemini-cli-bin`: Specifies the path to a local `gemini-cli`\n  executable.\n- `--promptfoo-bin`: Specifies the path to a local `promptfoo`\n  executable.\n\nTo support this, a new `PreinstalledPromptfooInstallation` class has\nbeen added, and the custom `gemini-cli` path is passed through the\nworker configuration to the Gemini provider.\n\nBug: 441943501\nChange-Id: I8405e28476258527750d6900802eff34fc938781\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6990858\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1522900}\nGitOrigin-RevId: 9aaa7d9b0560a35660a056eb2b6f9ae9cb511862\n"
    },
    {
      "commit": "8e5c3111cfc0ae8bcc86c4051690458ab59944eb",
      "tree": "16e441115a7132b4f9296b296f0079c30bb81c00",
      "parents": [
        "8f6a2b568be27ef17d7a04492b00d530d7632b1b"
      ],
      "author": {
        "name": "Jiamei Liu",
        "email": "jiameil@google.com",
        "time": "Tue Sep 30 00:23:02 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:37:27 2025"
      },
      "message": "Reland \"Add presubmit checks for `promptfoo.yaml` files.\"\n\nThis is a reland of commit 4de67983344598fb815ee4c9fea7d91465955f0d\n\nChanges since original:\n1. For the file paths we are checking we are\nalways using the path relative as of chromium/src.\n2. Tested in 2 ways\n  a) duplicate the current template in\nhost: https://screenshot.googleplex.com/8AaUrZNqpL5Vh5Q and local\npresubmit check passes\n  b) Change the current template point to a non-existing file: https://screenshot.googleplex.com/EfZVeUUFkTZjX5c, presubmit check failed.\n\nThe reverted CL only did local test on b) not a), causing some of the\ntest to fail.\n\nOriginal change\u0027s description:\n\u003e Add presubmit checks for `promptfoo.yaml` files.\n\u003e\n\u003e The checks ensure that `promptfoo.yaml` files are valid YAML and that\n\u003e 1.providers.config.changes.*.apply points to an array of valid files.\n\u003e 2.providers.config.templates points to an array of valid files.\n\u003e 3.providers.config.extensions points to an array of valid files.\n\u003e\n\u003e Separate the IO part and the lint part into different segments, the\n\u003e lint_promptfoo_testcases.py only contains the lint logic. Unit test also\n\u003e added for the lint logic part. Unit test formats are based on\n\u003e go/chromium-automated-prompt-eval-dd example yaml file.\n\u003e\n\u003e Change-Id: Ie20f402e7442144eea4a7a1dc8077bb89bb5f3c7\n\u003e Bug: 443353637\n\u003e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6941324\n\u003e Commit-Queue: Jiamei Liu \u003cjiameil@google.com\u003e\n\u003e Reviewed-by: Jonathan Lee \u003cjonathanjlee@google.com\u003e\n\u003e Reviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\n\u003e Cr-Commit-Position: refs/heads/main@{#1518910}\n\nBug: 443353637\nChange-Id: I5df0696c2ef0e947ee34f15798d677dd4aae327a\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6977674\nCommit-Queue: Jiamei Liu \u003cjiameil@google.com\u003e\nReviewed-by: Jonathan Lee \u003cjonathanjlee@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1522593}\nGitOrigin-RevId: e219362e3a914533d11562d50792f374cd0846b6\n"
    },
    {
      "commit": "8f6a2b568be27ef17d7a04492b00d530d7632b1b",
      "tree": "08ffb3719eaaecd2c5a1f53513527c2a84f65bea",
      "parents": [
        "3ced77530aa3b96ef651c28f66eb6cfd99be7cd0"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Sep 26 21:48:52 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:37:05 2025"
      },
      "message": "[6/6] Parallel worker cleanup\n\nThis is part of a chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL performs some code cleanup now that parallel worker support has\nbeen added. Specifically:\n\n1. Moves _check_btrfs and _get_gclient_root into a new\n   checkout_helpers.py file.\n2. Uses these moved functions to avoid having to pass root_path and\n   is_btrfs all the way down to worker threads.\n3. Updates ResultThread to create its own queues/counter instead of\n   taking them as constructor parameters.\n\nBug: 445459870\nChange-Id: Iab91bd6c919c01b16e6365b4cd3e07c8833e4f01\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6987084\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1521582}\nGitOrigin-RevId: 3e7ea8f45ea1b1ef0618871c54bb97c3c18f3b0e\n"
    },
    {
      "commit": "3ced77530aa3b96ef651c28f66eb6cfd99be7cd0",
      "tree": "cf174a7bd4d297be0f7a399ebb57a112e75d3b01",
      "parents": [
        "6cba49a3c56898669145acc12f64fc852876afb4"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Sep 26 01:22:29 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:36:41 2025"
      },
      "message": "[5/6?] Support multiple parallel workers\n\nThis is part of a chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL creates new WorkerPool and WorkerThread classes which together\nabstract away parallelized tests from the test runner. The runner just\nneeds to place tests it wants to run into a queue and block on all\nqueued tests to complete.\n\nBug: 445459870\nChange-Id: I485394043895d22435def68ac131f9078e9e81cc\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6986513\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1520999}\nGitOrigin-RevId: 40536ec79d7b29117543aac3a29c3919a54660bc\n"
    },
    {
      "commit": "6cba49a3c56898669145acc12f64fc852876afb4",
      "tree": "c8a66db64fbfd7f385818487940b87dff6a1531d",
      "parents": [
        "68b0b0db17ec8ffd883fbad7f5ee2ddb9dd1a9f8"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Sep 25 19:57:02 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:36:10 2025"
      },
      "message": "feat(testing): Add test retries for flaky tests\n\nTo handle the non-deterministic nature of LLMs and reduce flakiness\nin CI/CQ, this change introduces a mechanism to automatically retry\nfailed prompt evaluation tests.\n\nA new `--retries` command-line argument allows specifying the number\nof times a failed test should be re-run. The test will be marked as\nsuccessful if any of its attempts pass.\n\nBug: 441941580\nChange-Id: I996205a3b7ad7c6faae8bf2fb76fabb8beb4b079\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6985543\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1520823}\nGitOrigin-RevId: 6d3271ffbed857d8abb4f896a10acc88b80eca7c\n"
    },
    {
      "commit": "68b0b0db17ec8ffd883fbad7f5ee2ddb9dd1a9f8",
      "tree": "a8c700f6a9046cc8676dc1dfd47688fbea941590",
      "parents": [
        "3572449e2c57d6329673696e8ea6b3a1bcba71e0"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Sep 25 16:16:02 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:25:09 2025"
      },
      "message": "[4/?] Move promptfoo installation code to new file\n\nThis is part of a chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL moves promptfoo installation-related code into a new\npromptfoo_installation.py file. This does not provide any immediate\nbenefit other than making eval_prompts.py less monolithic, but this\nwill be necessary for future changes since the worker pool will depend\non a promptfoo installation.\n\nAlso drive-by updates the moved tests to share mock setup between tests\ninstead of using repeated mock.patch annotations.\n\nBug: 445459870\nChange-Id: I92d57e552949d9b37b5bf22f077f5ad8b04d244d\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6981296\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1520668}\nGitOrigin-RevId: e565fbd586461bc8031ea9f94c8ca5b87609dccb\n"
    },
    {
      "commit": "3572449e2c57d6329673696e8ea6b3a1bcba71e0",
      "tree": "97633267ecdb9519105aeb99912a3449288df9cf",
      "parents": [
        "124fe1b0455f5e3a5d9f51cdec339a4ff545f58b"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Thu Sep 25 16:13:04 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:24:50 2025"
      },
      "message": "[3/?] Move WorkDir to new file\n\nThis is part of a chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL moves the WorkDir class into a new workers.py file. This does\nnot currently provide any real benefit other than making eval_prompts.py\nless monolithic, but additional worker-related code will be added to\nthis file in follow-up CLs.\n\nAlso performs two drive-by improvements:\n1. Makes the result thread daemonic so that it does not block the\n   Python process from exiting.\n2. Updates the moved tests for WorkDir to share mock setup between tests\n   instead of using repeated mock.patch annotations.\n\nBug: 445459870\nChange-Id: I37562b2e31577972ec61d8e24dd6e2309aa773ab\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6981947\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1520664}\nGitOrigin-RevId: 154dfd9c0e22f8a369b125d98dea5ce219d4b1ff\n"
    },
    {
      "commit": "124fe1b0455f5e3a5d9f51cdec339a4ff545f58b",
      "tree": "63f88353aefcdcebb6e5e8933d14467b64331726",
      "parents": [
        "ecd6686d3fb7cb2790eb724025228da3119e691d"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Sep 25 04:55:47 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:24:21 2025"
      },
      "message": "feat(agents): Add flag for including test extensions\n\nAdds the `--include-test-extensions` flag to the extension installation\nscript (`agents/extensions/install.py`).\n\nThis allows developers to install and test extensions located in the\n`agents/testing/extensions` directory, which are not intended for\ngeneral use.\n\n- The `gemini_provider` is updated to use this flag when setting up\n  the testing environment.\n- The `test_landmines` extension has been moved to the new test\n  extensions directory.\n- The `gemini_provider` is also updated to include `test_landmines` as a\ndefault extension.\n\nBug: 441944775\nChange-Id: I7e99533212f29f5b88cd74f519dbbfa8a4d92200\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6974283\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1520249}\nGitOrigin-RevId: c4e86b75440b0083587f4a9c0fc22ef4e252b020\n"
    },
    {
      "commit": "ecd6686d3fb7cb2790eb724025228da3119e691d",
      "tree": "23b2d10b7cd51ef8aa9b678d7e464459b3e82753",
      "parents": [
        "a352339d8234f681a6ec629d09c8a518c8ab4ad3"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Sep 25 02:07:38 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:23:59 2025"
      },
      "message": "fix(eval): Fix input prompt when fetching sandbox\n\nPass \u0027no-op\u0027 instead of an empty string to fix a failure in the\n`_fetch_sandbox_image` function. If the `gemini` subprocess fails, its\nstdout and stderr are captured and logged.\n\nAdditionally, the default for the `--sandbox` flag has been changed\nto `False` to make local testing simpler.\n\nBug: 446965021\nChange-Id: Ia75c4e1bb7cf7ce055945acdb80ddea007f8584f\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6972629\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1520152}\nGitOrigin-RevId: dbdd47078a2b105adfa34d5e9f70f500663a884d\n"
    },
    {
      "commit": "a352339d8234f681a6ec629d09c8a518c8ab4ad3",
      "tree": "9d6df15231d61d8152f9eadcb3ffd571c8927e6e",
      "parents": [
        "0ba62c18a390cfbeeb1d4419b5e0f4644bee2f85"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Wed Sep 24 00:35:52 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:23:39 2025"
      },
      "message": "[2/?] Move result reporting to separate thread\n\nThis is part of chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL moves result reporting out from the main test loop and into a\nseparate thread that is started before any tests are run. This does not\ncurrently have any real benefit, but is a prerequisite to supporting\nmultiple parallel workers. With this approach, each independent worker\nshould be able to send their test results into a shared queue and\neverything else will be handled automatically.\n\nBug: 445459870\nChange-Id: Ie0ae966b068882088f9f36f52226365f5b10beeb\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6975188\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1519697}\nGitOrigin-RevId: 682279059827584a78ad377ea7fd38b3f035b767\n"
    },
    {
      "commit": "0ba62c18a390cfbeeb1d4419b5e0f4644bee2f85",
      "tree": "058075e75adc97339b5fa4b8077d7e298b57ed73",
      "parents": [
        "2415af31eed1955a441770b1c7b68995fa1e9f60"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Sep 23 16:36:19 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:23:16 2025"
      },
      "message": "[1/?] Move agents result-related code\n\nThis is part of chain of CLs to add support for multiple parallel\nworkers to //agents/testing/eval_prompts.py.\n\nThis CL moves the existing result-related code to a new results.py file\nand switches to using a new TestResult dataclass instead of multiple\narguments. This is a prerequisite to refactoring result reporting to be\ndone in a separate thread, which in turn is a prerequisite to supporting\nmultiple workers.\n\nBug: 445459870\nChange-Id: I439d948e22b66e375bdbf201d2a1c91bfb3ea5fa\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6974428\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1519404}\nGitOrigin-RevId: 9c2dcbe639477c3e7263fd2e3e5a042ca6dd5dfd\n"
    },
    {
      "commit": "2415af31eed1955a441770b1c7b68995fa1e9f60",
      "tree": "a071fc19ae5934517810a6997407e5d5cc593d16",
      "parents": [
        "82d4265080151551af741bb56cb8f1d54ae8767c"
      ],
      "author": {
        "name": "Wenbo Jie",
        "email": "wenbojie@chromium.org",
        "time": "Tue Sep 23 01:30:18 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:22:57 2025"
      },
      "message": "Revert \"Add presubmit checks for `promptfoo.yaml` files.\"\n\nThis reverts commit 4de67983344598fb815ee4c9fea7d91465955f0d.\n\nReason for revert: presubmit failure\nhttps://ci.chromium.org/ui/p/chromium/builders/ci/linux-presubmit/32465/overview\n\nOriginal change\u0027s description:\n\u003e Add presubmit checks for `promptfoo.yaml` files.\n\u003e\n\u003e The checks ensure that `promptfoo.yaml` files are valid YAML and that\n\u003e 1.providers.config.changes.*.apply points to an array of valid files.\n\u003e 2.providers.config.templates points to an array of valid files.\n\u003e 3.providers.config.extensions points to an array of valid files.\n\u003e\n\u003e Separate the IO part and the lint part into different segments, the\n\u003e lint_promptfoo_testcases.py only contains the lint logic. Unit test also\n\u003e added for the lint logic part. Unit test formats are based on\n\u003e go/chromium-automated-prompt-eval-dd example yaml file.\n\u003e\n\u003e Change-Id: Ie20f402e7442144eea4a7a1dc8077bb89bb5f3c7\n\u003e Bug: 443353637\n\u003e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6941324\n\u003e Commit-Queue: Jiamei Liu \u003cjiameil@google.com\u003e\n\u003e Reviewed-by: Jonathan Lee \u003cjonathanjlee@google.com\u003e\n\u003e Reviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\n\u003e Cr-Commit-Position: refs/heads/main@{#1518910}\n\nBug: 443353637\nNo-Presubmit: true\nNo-Tree-Checks: true\nNo-Try: true\nChange-Id: Ie08bc39de25834a0deb6c706283dc95a79eb84a3\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6975384\nCommit-Queue: Rubber Stamper \u003crubber-stamper@appspot.gserviceaccount.com\u003e\nBot-Commit: Rubber Stamper \u003crubber-stamper@appspot.gserviceaccount.com\u003e\nAuto-Submit: Wenbo Jie \u003cwenbojie@chromium.org\u003e\nOwners-Override: Wenbo Jie \u003cwenbojie@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1519068}\nGitOrigin-RevId: 59ea40112c8d4cf4b88052eacb58e750b11ddc1a\n"
    },
    {
      "commit": "82d4265080151551af741bb56cb8f1d54ae8767c",
      "tree": "f96ffb37912834f83b395af43350bd1ecdbe9e7b",
      "parents": [
        "5f8c6d41a56a91b6fcc59f45a4ecdbbf5cbb9422"
      ],
      "author": {
        "name": "Jiamei Liu",
        "email": "jiameil@google.com",
        "time": "Mon Sep 22 20:20:35 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:22:27 2025"
      },
      "message": "Add presubmit checks for `promptfoo.yaml` files.\n\nThe checks ensure that `promptfoo.yaml` files are valid YAML and that\n1.providers.config.changes.*.apply points to an array of valid files.\n2.providers.config.templates points to an array of valid files.\n3.providers.config.extensions points to an array of valid files.\n\nSeparate the IO part and the lint part into different segments, the\nlint_promptfoo_testcases.py only contains the lint logic. Unit test also\nadded for the lint logic part. Unit test formats are based on\ngo/chromium-automated-prompt-eval-dd example yaml file.\n\nChange-Id: Ie20f402e7442144eea4a7a1dc8077bb89bb5f3c7\nBug: 443353637\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6941324\nCommit-Queue: Jiamei Liu \u003cjiameil@google.com\u003e\nReviewed-by: Jonathan Lee \u003cjonathanjlee@google.com\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1518910}\nGitOrigin-RevId: 4de67983344598fb815ee4c9fea7d91465955f0d\n"
    },
    {
      "commit": "5f8c6d41a56a91b6fcc59f45a4ecdbbf5cbb9422",
      "tree": "a071fc19ae5934517810a6997407e5d5cc593d16",
      "parents": [
        "fd967e27096a83dad98cb35cf0754513224e54ba"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Sep 19 23:45:39 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:22:05 2025"
      },
      "message": "Consolidate RunPromptEvalTestsUnittest mocking\n\nRefactors the mocking used for RunPromptEvalTestsUnittest in\n//agents/testing/eval_prompts_unittest.py. Previously, each test\nspecified a list of mocks via annotations which were largely identical\nbetween all tests.\n\nWith this CL, mocking is instead handled as part of shared test setup.\nThis means that tests automatically get everything mocked to reasonable\ndefaults and only need to explicitly change behavior that they care\nabout.\n\nChange-Id: I6c9a2485fb21e549bc6fce9a5a2d330dc47ebd7f\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6966513\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1518239}\nGitOrigin-RevId: f5ce43b57cddec3d2a5e7799b4abd8e0e97b7ba8\n"
    },
    {
      "commit": "fd967e27096a83dad98cb35cf0754513224e54ba",
      "tree": "1720a0602b5990d2c7d1d89867b8d5e0f60dc102",
      "parents": [
        "8fb46201fc242deecbcc0de0b485fa9824f02d30"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Fri Sep 19 21:56:05 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:21:35 2025"
      },
      "message": "Add prompt eval ResultDB integration\n\nAdds native ResultDB integration to //agents/testing/eval_prompts.py\nusing the existing ResultSink implementation from //build/util.\n\nAs a side effect of capturing the output from tests to report to\nResultDB, the following improvements have also been made:\n\n1. Output from successful tests is automatically hidden unless\n   --print-output-on-success is passed in.\n2. Always passes in --no-table to `promptfoo eval` since it did not\n   provide any benefit with the way we are using promptfoo and did\n   not display properly in captured logs.\n\nBug: 441944061\nChange-Id: I27438feabb861d1f68b36db41c8aa4b5d0e29286\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6966507\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Erik Staab \u003cestaab@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1518184}\nGitOrigin-RevId: a7c253bc6007a3accd862817b2d1add44353387b\n"
    },
    {
      "commit": "8fb46201fc242deecbcc0de0b485fa9824f02d30",
      "tree": "29d29de603ce1c8fc6c86a18c92dbdf8f2c26e42",
      "parents": [
        "9b5374fe1a0edadd9b570ed426821f25b06aa46b"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Sep 18 19:07:58 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:21:13 2025"
      },
      "message": "feat(eval): Add test_landmines extension\n\nThis extension helps provide a hermetic environment for automated prompt\nevaluation tests. It prevents the model from uploading changes to\nGerrit.\n\nThe extension is designed to be loaded by the eval test runner and\ncomplements the existing `landmines` extension.\n\nBug: 441944775\nChange-Id: Ie6cdbb6ab159a3864f335f86a5a953457b5c7f49\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6966317\nCommit-Queue: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1517475}\nGitOrigin-RevId: 8720771736229de1603a9fd301e37545f0d327c8\n"
    },
    {
      "commit": "9b5374fe1a0edadd9b570ed426821f25b06aa46b",
      "tree": "21cb55d80202c7c9ad29e390d6e2e7ad590ea326",
      "parents": [
        "93a1f19af39ad58e6203ff307a61d0153e0f3e43"
      ],
      "author": {
        "name": "James Woo",
        "email": "jewoo@google.com",
        "time": "Thu Sep 18 18:24:05 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:20:51 2025"
      },
      "message": "feat(testing): Enable sandboxed prompt evaluations\n\nTo ensure prompt evaluation tests are hermetic, this change runs\ngemini-cli within a sandbox by default. This prevents tests from\nhaving side effects on the host system, which is critical for\nrunning on CI/CQ bots.\n\nThe test runner now pre-fetches the sandbox image and adds the\n--sandbox flag to gemini-cli calls.\n\nA --no-sandbox flag has been added to allow developers to run tests\nlocally without a container runtime.\n\nBug: 441944057\nChange-Id: If8567383f519b7027e2970b44965b9f5eb8a2033\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6955305\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: James Woo \u003cjewoo@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1517443}\nGitOrigin-RevId: 6661d9b2cf29df839cec7c88d046a6d0ead37995\n"
    },
    {
      "commit": "93a1f19af39ad58e6203ff307a61d0153e0f3e43",
      "tree": "48a139b59b811edd15e1f8c33bdb02ac205e5169",
      "parents": [
        "279456fd83a63b009babfacb6b0d10156a494f44"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Tue Sep 16 16:18:15 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:20:29 2025"
      },
      "message": "Refactor eval_prompts.py main() and add test coverage\n\nRefactors the main() function of eval_prompts.py to call several helpers\nthat run related code instead of having everything inline. Adds unittest\ncoverage of the new functions.\n\nThis should bring the test coverage for //agents/testing/eval_prompts.py\nup to ~100% except for the argument parsing-related code.\n\nBug: 444294617\nChange-Id: Id74da75c5fe561077a2831f84cabdd6f6778b5b6\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6950389\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1516091}\nGitOrigin-RevId: 5fac519d198ee86ce9b754342b172e01c0ea3128\n"
    },
    {
      "commit": "279456fd83a63b009babfacb6b0d10156a494f44",
      "tree": "8da4568bbda02dfc3fdab82c663689078b5e4b34",
      "parents": [
        "27315738fd70a9203ee98cdfa7f189cf5e8f3461"
      ],
      "author": {
        "name": "Struan Shrimpton",
        "email": "sshrimp@google.com",
        "time": "Mon Sep 15 23:50:29 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:20:09 2025"
      },
      "message": "Fix build_information tests for multiple platforms\n\nbuild_information can\u0027t be hard coded if this test is to be run on\nmultiple platforms. This modifies the tests to check that the value\nreturned by the MCP tool is contained in the response from the agent.\nThis still does not guarantee that the tool is called, however, and\nthe platform test was still passing without the extension installed.\nThat will need to be fixed eventually\n\nBug: 443097039\nChange-Id: Ie1c67cdf9fa6c3ba85c5464593fefc002ff0c677\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6950386\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1515766}\nGitOrigin-RevId: dcaaeecf19d2e882591c00a1430a8bfe95aa6e8b\n"
    },
    {
      "commit": "27315738fd70a9203ee98cdfa7f189cf5e8f3461",
      "tree": "d45a45740234513ced36f71bb087e39f3b606850",
      "parents": [
        "ca6072921e7999b90d0c608ae28c1f5a94645b61"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Mon Sep 15 21:36:41 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:19:47 2025"
      },
      "message": "Add //agents/testing helper function unittests\n\nAdds unittests for the remaining uncovered helper functions in\n//agents/testing/eval_prompts.py.\n\nBug: 444294617\nChange-Id: I9c1dda283eb248f3cfa291f17d902f749d60b80e\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6951772\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCommit-Queue: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCr-Commit-Position: refs/heads/main@{#1515719}\nGitOrigin-RevId: b584bb8eb1ad052ff789d0695c55d25f43d2bf50\n"
    },
    {
      "commit": "ca6072921e7999b90d0c608ae28c1f5a94645b61",
      "tree": "3441ad01b1e93239a5424dcdc3017d4fea028576",
      "parents": [
        "d5ab98773ea7cb28aa1d0b71fbe644e1ff232d31"
      ],
      "author": {
        "name": "Brian Sheedy",
        "email": "bsheedy@chromium.org",
        "time": "Mon Sep 15 21:19:35 2025"
      },
      "committer": {
        "name": "Copybara-Service",
        "email": "copybara-worker@google.com",
        "time": "Tue Nov 04 22:19:26 2025"
      },
      "message": "Add WorkDir unittests\n\nAdds unittests to //agents/testing/eval_prompts_unittest.py related to\nthe WorkDir class.\n\nBug: 444294617\nChange-Id: Icbdb6919af25594ed4403bc0d80a290b882a26c9\nReviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6951636\nAuto-Submit: Brian Sheedy \u003cbsheedy@chromium.org\u003e\nCommit-Queue: Struan Shrimpton \u003csshrimp@google.com\u003e\nReviewed-by: Struan Shrimpton \u003csshrimp@google.com\u003e\nCr-Commit-Position: refs/heads/main@{#1515713}\nGitOrigin-RevId: 966a9dfd780e909a61fd4584350d8d1fcd032cd9\n"
    }
  ],
  "next": "d5ab98773ea7cb28aa1d0b71fbe644e1ff232d31"
}
