chromium / germanium / HEAD
0b5fbfb
Specify model for prompt eval tests
by Brian Sheedy
· 2 days ago
main
500ecda
Update prompt eval dependencies
by Brian Sheedy
· 2 days ago
ca859ac
Populate fuzzing skill
by Jon Toohill
· 3 weeks ago
60caadf
Add --list-tests to eval_prompts
by Sven Zheng
· 3 weeks ago
a7ff07f
Add support for extra test paths in eval_prompts.py
by Ashwin Verleker
· 9 weeks ago
65cda24
Add a new evaluation for class refactoring.
by Jie Sheng
· 3 months ago
5d780c0
Set system prompts correctly
by Struan Shrimpton
· 3 months ago
f3e3b91
Revert "Update promptfoo cipd tags"
by Struan Shrimpton
· 3 months ago
2609ee7
Update promptfoo cipd tags
by Struan Shrimpton
· 3 months ago
607be8b
Include the prompt/response in the rdb test result
by Struan Shrimpton
· 3 months ago
e55a048
Support tool call checks in gemini cli eval framework.
by Jie Sheng
· 3 months ago
46d60b6
Make owners explicit
by Struan Shrimpton
· 3 months ago
776b148
Improve the test log output in gemini cli eval framework.
by Jie Sheng
· 3 months ago
3d5bab8
Include the test tags in the rdb tags
by Struan Shrimpton
· 3 months ago
81c4be3
Update prompt eval metric reporting
by Brian Sheedy
· 4 months ago
4fabd70
Create a new gemini cli eval to build a file.
by Jie Sheng
· 4 months ago
e856061
Tag rdb results with their metrics
by Struan Shrimpton
· 4 months ago
f343053
Update run_tests_in_file
by Struan Shrimpton
· 4 months ago
0c04b80
Clean workdirs regardless of forced
by Struan Shrimpton
· 4 months ago
a135030
[agents][eval] Store system prompt in `//GEMINI.md`
by Jonathan Lee
· 4 months ago
17cd0aa
Create a new gemini cli eval to verify run tests in file.
by Jie Sheng
· 4 months ago
c092057
Move perf uploading to handler
by Brian Sheedy
· 4 months ago
2930733
eval: Add negative tag filtering for tests
by James Woo
· 4 months ago
65da9b6
Move ResultDB reporting to handler
by Brian Sheedy
· 4 months ago
2389503
Support user-provided result handlers
by Brian Sheedy
· 4 months ago
ba2d83a
Separate reporting for each iteration
by Struan Shrimpton
· 4 months ago
04cbefc
eval: Add test filtering on metadata
by James Woo
· 4 months ago
d2778a5
Adjust prompt eval perf upload location
by Brian Sheedy
· 4 months ago
3a58f91
Reland "Eval Prompts: Add custom node support and cipd option for gcli"
by Struan Shrimpton
· 4 months ago
e634b2f
Revert "Eval Prompts: Add custom node support and cipd option for gcli"
by Struan Shrimpton
· 4 months ago
4a0111a
Fix perf uploading location
by Brian Sheedy
· 4 months ago
91b5c41
Eval Prompts: Add custom node support and cipd option for gcli
by Struan Shrimpton
· 4 months ago
0e4368b
Adjust perf upload format
by Brian Sheedy
· 4 months ago
c401587
Add prompt eval metric uploading
by Brian Sheedy
· 4 months ago
228eed9
Prompt Evals: specify precompile_targets on the test
by Struan Shrimpton
· 4 months ago
934b4d7
Eval Prompts: Remove promptfoo npm/src
by Struan Shrimpton
· 4 months ago
18d80c2
Prompt Eval: Improve test reliability
by Struan Shrimpton
· 4 months ago
3641d75
Eval Prompts: Improve readability of test results
by Struan Shrimpton
· 4 months ago
bbec6df
Refactor test/iteration results
by Brian Sheedy
· 4 months ago
a09ab25
Add pass@k support to prompt evals
by James Woo
· 4 months ago
d2c74f5
Adjust args for perf dashboard uploading
by Brian Sheedy
· 4 months ago
2fa6db2
Prevent unit tests from pulling cipd packages
by Struan Shrimpton
· 4 months ago
498a5bd
Eval prompts: fix default gemini bin for 1P
by Struan Shrimpton
· 4 months ago
4dbddcb
Eval Prompts: Add trusted folders for temp HOME dir
by Struan Shrimpton
· 4 months ago
02bdd58
Add and use by default cipd promptfoo
by Struan Shrimpton
· 4 months ago
356a0f3
Use settings.json for gemini-cli telemetry
by Brian Sheedy
· 4 months ago
8b4da35
Extract prompt eval scores
by Brian Sheedy
· 4 months ago
4220cc7
Update //agents/testing/README
by Struan Shrimpton
· 4 months ago
eb5f995
Surface gemini-cli token usage
by Brian Sheedy
· 4 months ago
f37d63c
Update eval_prompts for structured test ids
by Struan Shrimpton
· 4 months ago
f9cc421
agents: Add pass@k configuration parsing
by James Woo
· 4 months ago
183dad9
Fix gemini_provider console width parsing
by Brian Sheedy
· 4 months ago
cc91996
Refactor gemini_provider's call_api
by Brian Sheedy
· 4 months ago
fad9ca4
Docs: Improve btrfs setup instructions
by James Woo
· 4 months ago
52c65dc
agents: Centralize Gemini helper functions
by James Woo
· 4 months ago
9a59afd
feat(eval): Set sandbox PATH from container image
by James Woo
· 4 months ago
e5537eb
Add an unrestricted parallel option
by Struan Shrimpton
· 4 months ago
069ef68
[agents][eval] Introduce reusable `check_gtests.py` assert
by Jonathan Lee
· 4 months ago
935f99d
[agents][eval] Enable `use_remoteexec` to speed up builds
by Jonathan Lee
· 4 months ago
f7162c1
feat(eval): Mount depot_tools in sandbox
by James Woo
· 4 months ago
583c967
Add missing prompt eval arg
by Brian Sheedy
· 4 months ago
cb6d335
refactor(eval): Directly fetch sandbox image
by James Woo
· 4 months ago
458560c
Add remaining isolated script args + validation
by Brian Sheedy
· 4 months ago
e692bd3
Add prompt eval --isolated-script-test-repeat
by Brian Sheedy
· 4 months ago
850bc85
feat(eval): Use temp home for test environments
by James Woo
· 4 months ago
45403f3
Update prompt eval test filtering
by Brian Sheedy
· 4 months ago
858723a
Rename example and test extensions with underscores
by Struan Shrimpton
· 4 months ago
76e79dc
Refactor prompt eval argument parsing
by Brian Sheedy
· 4 months ago
ad78f25
Update build_information MCP to build-information
by Struan Shrimpton
· 4 months ago
f91b762
Add OWNERS to extensions and testing
by Struan Shrimpton
· 4 months ago
74e36b1
Add missing //agents/testing tests
by Brian Sheedy
· 4 months ago
cd484d4
feat(eval): Add support for local dev binaries
by James Woo
· 4 months ago
8e5c311
Reland "Add presubmit checks for `promptfoo.yaml` files."
by Jiamei Liu
· 4 months ago
8f6a2b5
[6/6] Parallel worker cleanup
by Brian Sheedy
· 4 months ago
3ced775
[5/6?] Support multiple parallel workers
by Brian Sheedy
· 4 months ago
6cba49a
feat(testing): Add test retries for flaky tests
by James Woo
· 4 months ago
68b0b0d
[4/?] Move promptfoo installation code to new file
by Brian Sheedy
· 4 months ago
3572449
[3/?] Move WorkDir to new file
by Brian Sheedy
· 4 months ago
124fe1b
feat(agents): Add flag for including test extensions
by James Woo
· 4 months ago
ecd6686
fix(eval): Fix input prompt when fetching sandbox
by James Woo
· 4 months ago
a352339
[2/?] Move result reporting to separate thread
by Brian Sheedy
· 4 months ago
0ba62c1
[1/?] Move agents result-related code
by Brian Sheedy
· 4 months ago
2415af3
Revert "Add presubmit checks for `promptfoo.yaml` files."
by Wenbo Jie
· 4 months ago
82d4265
Add presubmit checks for `promptfoo.yaml` files.
by Jiamei Liu
· 4 months ago
5f8c6d4
Consolidate RunPromptEvalTestsUnittest mocking
by Brian Sheedy
· 4 months ago
fd967e2
Add prompt eval ResultDB integration
by Brian Sheedy
· 4 months ago
8fb4620
feat(eval): Add test_landmines extension
by James Woo
· 4 months ago
9b5374f
feat(testing): Enable sandboxed prompt evaluations
by James Woo
· 4 months ago
93a1f19
Refactor eval_prompts.py main() and add test coverage
by Brian Sheedy
· 4 months ago
279456f
Fix build_information tests for multiple platforms
by Struan Shrimpton
· 4 months ago
2731573
Add //agents/testing helper function unittests
by Brian Sheedy
· 4 months ago
ca60729
Add WorkDir unittests
by Brian Sheedy
· 4 months ago
d5ab987
Add promptfoo installation unittests
by Brian Sheedy
· 4 months ago
676d4e3
feat(testing): Add custom promptfoo.yaml options
by James Woo
· 4 months ago
8e5d0d5
Add prompt eval unittests
by Brian Sheedy
· 4 months ago
6f76b5b
Print provider output to console
by Struan Shrimpton
· 4 months ago
5a3dba7
Build out/Default in eval_prompts
by Struan Shrimpton
· 4 months ago
9253315
Add //agents pylint coverage
by Brian Sheedy
· 4 months ago
1c2a112
Add sharding support for prompt evaluation
by Brian Sheedy
· 4 months ago
dbcd0d4
Add a source check to the eval prompts
by Struan Shrimpton
· 4 months ago