Portable evaluations for AI agents.
ECP is a vendor-neutral protocol for testing agent outputs, tool calls, and evaluator-visible audit context — across frameworks, models, eval platforms, and CI systems.
$ pip install "ecp-runtime==0.3.1" "ecp-sdk==0.3.1"
$ ecp init
$ ecp run --manifest ecp_eval/manifest.yaml --json
# 3 scenarios · 7 graders · all passed ✓
MCP is for tools. ECP is for evals.
MCP gives agents a common way to use tools. ECP gives evaluators a common way to inspect what an agent returned, what tools it used, and what audit evidence it exposed — independent of the framework that built the agent or the platform that runs the test.
Beyond the final answer.
Most evals start with the final answer. ECP also checks the behavior behind it.
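To make "the behavior behind it" concrete, here is a minimal Python sketch. Purely for illustration, it assumes that one agent step yields a dict keyed by public_output, tool_calls, and evaluation_context. The nested shapes and the grader helpers (answer_contains, called_tool, context_has, run_graders) are hypothetical and are not the ecp-sdk API.

```python
from typing import Any, Callable, Dict

# Hypothetical shape of one agent step. The three top-level keys come
# from the ECP page; everything nested inside them is assumed.
StepResult = Dict[str, Any]
Grader = Callable[[StepResult], bool]


def answer_contains(text: str) -> Grader:
    """Check the final answer only, which is where most evals stop."""
    return lambda r: text in r["public_output"]


def called_tool(name: str) -> Grader:
    """Check behavior: did the agent call the named tool at least once?"""
    return lambda r: any(call["name"] == name for call in r["tool_calls"])


def context_has(key: str) -> Grader:
    """Check audit evidence the agent exposed to the evaluator."""
    return lambda r: key in r["evaluation_context"]


def run_graders(result: StepResult, graders: Dict[str, Grader]) -> Dict[str, bool]:
    """Apply every grader to one step result and collect pass/fail by name."""
    return {name: grade(result) for name, grade in graders.items()}


result: StepResult = {
    "public_output": "Your refund of $40 has been issued.",
    "tool_calls": [{"name": "issue_refund", "arguments": {"amount": 40}}],
    "evaluation_context": {"policy_checked": "refunds_v2"},
}
graders = {
    "mentions_refund": answer_contains("refund"),
    "used_refund_tool": called_tool("issue_refund"),
    "exposed_policy": context_has("policy_checked"),
}
print(run_graders(result, graders))
# {'mentions_refund': True, 'used_refund_tool': True, 'exposed_policy': True}
```

A final-answer-only eval would run just the first grader. The other two fail an agent that produced the right words without calling the right tool or without exposing the expected evidence.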
[Diagram: public_output, tool_calls, evaluation_context → ecp run --manifest]
Runs anywhere
Run evals locally or wire ecp run into your CI. It exits non-zero on failure, so a regression breaks the build.
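A CI job only needs the exit code to gate a merge. The sketch below uses a hypothetical wrapper named run_eval_gate that runs the quickstart command and returns its exit code unchanged. It relies only on the stated behavior that ecp run exits non-zero on failure, and it makes no assumption about the --json output format.

```python
import subprocess
import sys
from typing import List

# Quickstart command from this page; adjust the manifest path if yours differs.
ECP_CMD: List[str] = ["ecp", "run", "--manifest", "ecp_eval/manifest.yaml", "--json"]


def run_eval_gate(cmd: List[str]) -> int:
    """Run an eval command and return its exit code unchanged.

    A non-zero code means the run reported a failure, so passing it
    straight through makes the surrounding CI step fail as well.
    """
    completed = subprocess.run(cmd, check=False)
    if completed.returncode != 0:
        print(f"eval gate failed with exit code {completed.returncode}", file=sys.stderr)
    return completed.returncode
```

In a CI script you would end with sys.exit(run_eval_gate(ECP_CMD)). Calling ecp run directly as a CI step works just as well, because its exit code already does the gating. A wrapper like this only earns its place if you want to add logging or post-processing around the run.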
Framework neutral
Wrap agents built with plain Python, LangChain, LlamaIndex, CrewAI, or PydanticAI behind one evaluation contract.
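One way to picture a single evaluation contract over many frameworks is to write one adapter per agent. The Python sketch below assumes a hypothetical EcpAgent shape whose methods mirror agent/initialize, agent/step, and agent/reset. FunctionAgent, weather_agent, and the returned dict layout are illustrative names, not part of ecp-sdk.

```python
from typing import Any, Callable, Dict, List, Protocol, Tuple

ToolCall = Dict[str, Any]
# Hypothetical contract for the wrapped agent: message in, (answer, tool calls) out.
AgentFn = Callable[[str], Tuple[str, List[ToolCall]]]


class EcpAgent(Protocol):
    """Hypothetical adapter shape; methods mirror agent/initialize, agent/step, agent/reset."""

    def initialize(self, config: Dict[str, Any]) -> None: ...

    def step(self, message: str) -> Dict[str, Any]: ...

    def reset(self) -> None: ...


class FunctionAgent:
    """Wraps a plain Python agent function so it satisfies EcpAgent."""

    def __init__(self, fn: AgentFn) -> None:
        self._fn = fn
        self._config: Dict[str, Any] = {}
        self._turn = 0

    def initialize(self, config: Dict[str, Any]) -> None:
        self._config = dict(config)
        self.reset()

    def step(self, message: str) -> Dict[str, Any]:
        self._turn += 1
        answer, calls = self._fn(message)
        # Map the agent's own output onto the three fields an evaluator reads.
        return {
            "public_output": answer,
            "tool_calls": calls,
            "evaluation_context": {"turn": self._turn, "config": self._config},
        }

    def reset(self) -> None:
        self._turn = 0


def weather_agent(message: str) -> Tuple[str, List[ToolCall]]:
    # Stand-in for a real agent; a LangChain or CrewAI agent would get its
    # own wrapper that produces the same three fields.
    call = {"name": "get_weather", "arguments": {"city": "Paris"}}
    return "It is sunny in Paris.", [call]


agent: EcpAgent = FunctionAgent(weather_agent)
agent.initialize({"model": "example-model"})
out = agent.step("What is the weather in Paris?")
print(out["public_output"], out["tool_calls"][0]["name"])
```

The evaluator only ever sees the EcpAgent methods and the three output fields, so the framework inside each adapter can change without touching the graders.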
JSON-RPC contract
Implement the protocol in any language: agent/initialize, agent/step, agent/reset over stdio or Streamable HTTP.
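A minimal stdio server for the three methods could look like the Python sketch below. It assumes newline-delimited JSON-RPC 2.0 messages on stdin and stdout. The params and result shapes (an input field and an echo reply) are hypothetical, so check the protocol's own documentation for the real schemas. The sketch does not cover the Streamable HTTP transport.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Hypothetical agent state; a real server would delegate to your agent.
STATE: Dict[str, Any] = {"turn": 0}


def agent_initialize(params: Dict[str, Any]) -> Dict[str, Any]:
    STATE["turn"] = 0
    return {"name": "echo-agent"}  # result shape is an assumption


def agent_step(params: Dict[str, Any]) -> Dict[str, Any]:
    STATE["turn"] += 1
    text = params.get("input", "")  # "input" is an assumed param name
    return {
        "public_output": f"echo: {text}",
        "tool_calls": [],
        "evaluation_context": {"turn": STATE["turn"]},
    }


def agent_reset(params: Dict[str, Any]) -> Dict[str, Any]:
    STATE["turn"] = 0
    return {}


METHODS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "agent/initialize": agent_initialize,
    "agent/step": agent_step,
    "agent/reset": agent_reset,
}


def _error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(line: str) -> Optional[str]:
    """Handle one JSON-RPC 2.0 message; return a reply line, or None for notifications."""
    try:
        req = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps(_error(None, -32700, "Parse error"))
    if not isinstance(req, dict):
        return json.dumps(_error(None, -32600, "Invalid Request"))
    req_id = req.get("id")
    name = req.get("method")
    method = METHODS.get(name) if isinstance(name, str) else None
    params = req.get("params") or {}
    if method is None:
        resp = _error(req_id, -32601, "Method not found")
    elif not isinstance(params, dict):
        resp = _error(req_id, -32602, "Invalid params")
    else:
        try:
            resp = {"jsonrpc": "2.0", "id": req_id, "result": method(params)}
        except Exception as exc:  # report agent failures instead of crashing the server
            resp = _error(req_id, -32603, f"Internal error: {exc}")
    # Requests without an "id" are notifications and get no reply.
    return json.dumps(resp) if "id" in req else None


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one JSON message per line from stdin and write replies to stdout."""
    for line in stdin:
        if not line.strip():
            continue
        reply = handle(line)
        if reply is not None:
            stdout.write(reply + "\n")
            stdout.flush()
```

To run this as a subprocess, call serve() at the bottom of the file, for example inside an `if __name__ == "__main__":` block. handle() never touches stdio itself, so the same dispatcher could sit behind an HTTP endpoint.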
Your existing agent stack.
Start grading agents in five commands.
Install the runtime, initialize a starter manifest, and run your first eval.