  1. a7ff07f Add support for extra test paths in eval_prompts.py by Ashwin Verleker · 13 days ago (main)
  2. 65cda24 Add a new evaluation for class refactoring. by Jie Sheng · 5 weeks ago
  3. 5d780c0 Set system prompts correctly by Struan Shrimpton · 5 weeks ago
  4. f3e3b91 Revert "Update promptfoo cipd tags" by Struan Shrimpton · 6 weeks ago
  5. 2609ee7 Update promptfoo cipd tags by Struan Shrimpton · 6 weeks ago
  6. 607be8b Include the prompt/response in the rdb test result by Struan Shrimpton · 7 weeks ago
  7. e55a048 Support tool call checks in gemini cli eval framework. by Jie Sheng · 7 weeks ago
  8. 46d60b6 Make owners explicit by Struan Shrimpton · 7 weeks ago
  9. 776b148 Improve the test log output in gemini cli eval framework. by Jie Sheng · 7 weeks ago
  10. 3d5bab8 Include the test tags in the rdb tags by Struan Shrimpton · 7 weeks ago
  11. 81c4be3 Update prompt eval metric reporting by Brian Sheedy · 8 weeks ago
  12. 4fabd70 Create a new gemini cli eval to build a file. by Jie Sheng · 8 weeks ago
  13. e856061 Tag rdb results with their metrics by Struan Shrimpton · 8 weeks ago
  14. f343053 Update run_tests_in_file by Struan Shrimpton · 8 weeks ago
  15. 0c04b80 Clean workdirs regardless of forced by Struan Shrimpton · 8 weeks ago
  16. a135030 [agents][eval] Store system prompt in `//GEMINI.md` by Jonathan Lee · 8 weeks ago
  17. 17cd0aa Create a new gemini cli eval to verify run tests in file. by Jie Sheng · 8 weeks ago
  18. c092057 Move perf uploading to handler by Brian Sheedy · 8 weeks ago
  19. 2930733 eval: Add negative tag filtering for tests by James Woo · 8 weeks ago
  20. 65da9b6 Move ResultDB reporting to handler by Brian Sheedy · 8 weeks ago
  21. 2389503 Support user-provided result handlers by Brian Sheedy · 8 weeks ago
  22. ba2d83a Separate reporting for each iteration by Struan Shrimpton · 8 weeks ago
  23. 04cbefc eval: Add test filtering on metadata by James Woo · 8 weeks ago
  24. d2778a5 Adjust prompt eval perf upload location by Brian Sheedy · 8 weeks ago
  25. 3a58f91 Reland "Eval Prompts: Add custom node support and cipd option for gcli" by Struan Shrimpton · 8 weeks ago
  26. e634b2f Revert "Eval Prompts: Add custom node support and cipd option for gcli" by Struan Shrimpton · 8 weeks ago
  27. 4a0111a Fix perf uploading location by Brian Sheedy · 8 weeks ago
  28. 91b5c41 Eval Prompts: Add custom node support and cipd option for gcli by Struan Shrimpton · 8 weeks ago
  29. 0e4368b Adjust perf upload format by Brian Sheedy · 8 weeks ago
  30. c401587 Add prompt eval metric uploading by Brian Sheedy · 8 weeks ago
  31. 228eed9 Prompt Evals: specify precompile_targets on the test by Struan Shrimpton · 8 weeks ago
  32. 934b4d7 Eval Prompts: Remove promptfoo npm/src by Struan Shrimpton · 8 weeks ago
  33. 18d80c2 Prompt Eval: Improve test reliability by Struan Shrimpton · 8 weeks ago
  34. 3641d75 Eval Prompts: Improve readability of test results by Struan Shrimpton · 8 weeks ago
  35. bbec6df Refactor test/iteration results by Brian Sheedy · 8 weeks ago
  36. a09ab25 Add pass@k support to prompt evals by James Woo · 8 weeks ago
  37. d2c74f5 Adjust args for perf dashboard uploading by Brian Sheedy · 8 weeks ago
  38. 2fa6db2 Prevent unit tests from pulling cipd packages by Struan Shrimpton · 8 weeks ago
  39. 498a5bd Eval prompts: fix default gemini bin for 1P by Struan Shrimpton · 8 weeks ago
  40. 4dbddcb Eval Prompts: Add trusted folders for temp HOME dir by Struan Shrimpton · 8 weeks ago
  41. 02bdd58 Add and use by default cipd promptfoo by Struan Shrimpton · 8 weeks ago
  42. 356a0f3 Use settings.json for gemini-cli telemetry by Brian Sheedy · 8 weeks ago
  43. 8b4da35 Extract prompt eval scores by Brian Sheedy · 8 weeks ago
  44. 4220cc7 Update //agents/testing/README by Struan Shrimpton · 8 weeks ago
  45. eb5f995 Surface gemini-cli token usage by Brian Sheedy · 8 weeks ago
  46. f37d63c Update eval_prompts for structured test ids by Struan Shrimpton · 8 weeks ago
  47. f9cc421 agents: Add pass@k configuration parsing by James Woo · 8 weeks ago
  48. 183dad9 Fix gemini_provider console width parsing by Brian Sheedy · 8 weeks ago
  49. cc91996 Refactor gemini_provider's call_api by Brian Sheedy · 8 weeks ago
  50. fad9ca4 Docs: Improve btrfs setup instructions by James Woo · 8 weeks ago
  51. 52c65dc agents: Centralize Gemini helper functions by James Woo · 8 weeks ago
  52. 9a59afd feat(eval): Set sandbox PATH from container image by James Woo · 8 weeks ago
  53. e5537eb Add an unrestricted parallel option by Struan Shrimpton · 8 weeks ago
  54. 069ef68 [agents][eval] Introduce reusable `check_gtests.py` assert by Jonathan Lee · 8 weeks ago
  55. 935f99d [agents][eval] Enable `use_remoteexec` to speed up builds by Jonathan Lee · 8 weeks ago
  56. f7162c1 feat(eval): Mount depot_tools in sandbox by James Woo · 8 weeks ago
  57. 583c967 Add missing prompt eval arg by Brian Sheedy · 8 weeks ago
  58. cb6d335 refactor(eval): Directly fetch sandbox image by James Woo · 8 weeks ago
  59. 458560c Add remaining isolated script args + validation by Brian Sheedy · 8 weeks ago
  60. e692bd3 Add prompt eval --isolated-script-test-repeat by Brian Sheedy · 8 weeks ago
  61. 850bc85 feat(eval): Use temp home for test environments by James Woo · 8 weeks ago
  62. 45403f3 Update prompt eval test filtering by Brian Sheedy · 8 weeks ago
  63. 858723a Rename example and test extensions with underscores by Struan Shrimpton · 8 weeks ago
  64. 76e79dc Refactor prompt eval argument parsing by Brian Sheedy · 8 weeks ago
  65. ad78f25 Update build_information MCP to build-information by Struan Shrimpton · 8 weeks ago
  66. f91b762 Add OWNERS to extensions and testing by Struan Shrimpton · 8 weeks ago
  67. 74e36b1 Add missing //agents/testing tests by Brian Sheedy · 8 weeks ago
  68. cd484d4 feat(eval): Add support for local dev binaries by James Woo · 8 weeks ago
  69. 8e5c311 Reland "Add presubmit checks for `promptfoo.yaml` files." by Jiamei Liu · 8 weeks ago
  70. 8f6a2b5 [6/6] Parallel worker cleanup by Brian Sheedy · 8 weeks ago
  71. 3ced775 [5/6?] Support multiple parallel workers by Brian Sheedy · 8 weeks ago
  72. 6cba49a feat(testing): Add test retries for flaky tests by James Woo · 8 weeks ago
  73. 68b0b0d [4/?] Move promptfoo installation code to new file by Brian Sheedy · 8 weeks ago
  74. 3572449 [3/?] Move WorkDir to new file by Brian Sheedy · 8 weeks ago
  75. 124fe1b feat(agents): Add flag for including test extensions by James Woo · 8 weeks ago
  76. ecd6686 fix(eval): Fix input prompt when fetching sandbox by James Woo · 8 weeks ago
  77. a352339 [2/?] Move result reporting to separate thread by Brian Sheedy · 8 weeks ago
  78. 0ba62c1 [1/?] Move agents result-related code by Brian Sheedy · 8 weeks ago
  79. 2415af3 Revert "Add presubmit checks for `promptfoo.yaml` files." by Wenbo Jie · 8 weeks ago
  80. 82d4265 Add presubmit checks for `promptfoo.yaml` files. by Jiamei Liu · 8 weeks ago
  81. 5f8c6d4 Consolidate RunPromptEvalTestsUnittest mocking by Brian Sheedy · 8 weeks ago
  82. fd967e2 Add prompt eval ResultDB integration by Brian Sheedy · 8 weeks ago
  83. 8fb4620 feat(eval): Add test_landmines extension by James Woo · 8 weeks ago
  84. 9b5374f feat(testing): Enable sandboxed prompt evaluations by James Woo · 8 weeks ago
  85. 93a1f19 Refactor eval_prompts.py main() and add test coverage by Brian Sheedy · 8 weeks ago
  86. 279456f Fix build_information tests for multiple platforms by Struan Shrimpton · 8 weeks ago
  87. 2731573 Add //agents/testing helper function unittests by Brian Sheedy · 8 weeks ago
  88. ca60729 Add WorkDir unittests by Brian Sheedy · 8 weeks ago
  89. d5ab987 Add promptfoo installation unittests by Brian Sheedy · 8 weeks ago
  90. 676d4e3 feat(testing): Add custom promptfoo.yaml options by James Woo · 8 weeks ago
  91. 8e5d0d5 Add prompt eval unittests by Brian Sheedy · 8 weeks ago
  92. 6f76b5b Print provider output to console by Struan Shrimpton · 8 weeks ago
  93. 5a3dba7 Build out/Default in eval_prompts by Struan Shrimpton · 8 weeks ago
  94. 9253315 Add //agents pylint coverage by Brian Sheedy · 8 weeks ago
  95. 1c2a112 Add sharding support for prompt evaluation by Brian Sheedy · 8 weeks ago
  96. dbcd0d4 Add a source check to the eval prompts by Struan Shrimpton · 8 weeks ago
  97. 4867887 Improve eval_prompt workdirs by Struan Shrimpton · 8 weeks ago
  98. 36630d7 Automate prompt eval test discovery by Brian Sheedy · 8 weeks ago
  99. 8ea3f96 Check for stale workdirs and add --force flag by Struan Shrimpton · 8 weeks ago
  100. e6475a2 Fix cwd for promptfoo from source by Struan Shrimpton · 8 weeks ago