v0.5.0 · JSON-RPC 2.0

Portable evaluations for AI agents.

ECP is a vendor-neutral protocol for testing agent outputs, tool calls, and evaluator-visible audit context — across frameworks, models, eval platforms, and CI systems.

terminal

$ pip install "ecp-runtime==0.5.0" "ecp-sdk==0.5.0"
$ ecp init
$ ecp run --manifest ecp_eval/manifest.yaml --json

# 3 scenarios · 7 graders · all passed ✓

The evaluation contract layer

MCP is for tools. ECP is for evals.

MCP gives agents a common way to use tools. ECP gives evaluators a common way to inspect what an agent returned, what tools it used, and what audit evidence it exposed — independent of the framework that built the agent or the platform that runs the test.

What ECP checks

Beyond the final answer.

Most evals start with the final answer. ECP also checks the behavior behind it.

public_output

Did the user-visible answer satisfy the task?

tool_calls

Did the agent call the required tool with the right arguments?

evaluation_context

Did the agent expose evaluator-safe audit evidence?

ecp run --manifest

Can this run in CI and fail a build?
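
The three result fields above can be pictured as inputs to small grader functions. The sketch below is illustrative only: the dictionary shape of the step result, the grader function names, and the example tool `get_weather` are assumptions made for this example, not the ECP schema or the ecp-sdk API.

```python
from typing import Any

# Hypothetical step result. The three top-level keys are named on this page;
# the value shapes below are assumed for illustration.
step_result: dict[str, Any] = {
    "public_output": "Paris: 18 degrees and sunny.",
    "tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}],
    "evaluation_context": {"source": "weather-api", "cache_hit": False},
}


def output_contains(result: dict[str, Any], needle: str) -> bool:
    """public_output: does the user-visible answer mention the expected text?"""
    return needle.lower() in str(result.get("public_output", "")).lower()


def tool_called_with(result: dict[str, Any], name: str, arguments: dict[str, Any]) -> bool:
    """tool_calls: was the named tool called with exactly these arguments?"""
    return any(
        call.get("name") == name and call.get("arguments") == arguments
        for call in result.get("tool_calls", [])
    )


def context_has_key(result: dict[str, Any], key: str) -> bool:
    """evaluation_context: did the agent expose the expected audit field?"""
    return key in result.get("evaluation_context", {})


checks = {
    "answer mentions Paris": output_contains(step_result, "paris"),
    "get_weather called for Paris": tool_called_with(
        step_result, "get_weather", {"city": "Paris"}
    ),
    "audit evidence names a source": context_has_key(step_result, "source"),
}

for label, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}  {label}")
```

Keeping each check as a pure function of the result makes it easy to see which of the three fields a failure came from.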

Runs anywhere

Run evals locally or wire ecp run into your CI pipeline. The command exits non-zero on failure, so a regression breaks the build.
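
As a rough illustration of the exit-code behavior, a CI step can run the command and pass its exit code straight through. In this sketch the `run_gate` helper is a hypothetical name, and the demo line uses a stand-in Python command so the example runs without ECP installed; the `ecp run` arguments are copied from the quickstart on this page.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> int:
    """Run a command and return its exit code so the caller can propagate it."""
    completed = subprocess.run(command, check=False)
    return completed.returncode


# Command from the quickstart above; it needs the ecp CLI on PATH.
ECP_COMMAND = ["ecp", "run", "--manifest", "ecp_eval/manifest.yaml", "--json"]

# Stand-in for a failing eval run, so the sketch is runnable anywhere.
stand_in_failure = [sys.executable, "-c", "raise SystemExit(1)"]
print("stand-in exit code:", run_gate(stand_in_failure))
```

In a real CI script the last step would be `sys.exit(run_gate(ECP_COMMAND))`, so any non-zero exit from the eval run fails the job.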

Framework neutral

Wrap agents built with plain Python, LangChain, LlamaIndex, CrewAI, or PydanticAI behind one evaluation contract.

JSON-RPC contract

Implement the protocol in any language: agent/initialize, agent/step, agent/reset over stdio or Streamable HTTP.
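
To make the contract concrete, here is a minimal sketch of a stdio agent that answers the three methods with JSON-RPC 2.0 envelopes. Only the method names and the three result field names come from this page. The newline-delimited framing, the `input` parameter, the `status` results, and the echo behavior are assumptions for illustration; consult the spec for the real schemas. Batches and notifications are omitted.

```python
import json
import sys
from typing import Any, Callable, TextIO

Params = dict[str, Any]


def initialize(params: Params) -> dict[str, Any]:
    return {"status": "ready"}  # assumed result shape


def step(params: Params) -> dict[str, Any]:
    # Assumed: the task arrives as params["input"]; this agent just echoes it.
    return {
        "public_output": f"echo: {params.get('input', '')}",
        "tool_calls": [],
        "evaluation_context": {},
    }


def reset(params: Params) -> dict[str, Any]:
    return {"status": "reset"}  # assumed result shape


HANDLERS: dict[str, Callable[[Params], dict[str, Any]]] = {
    "agent/initialize": initialize,
    "agent/step": step,
    "agent/reset": reset,
}


def handle_request(request: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one JSON-RPC 2.0 request and build the response envelope."""
    handler = HANDLERS.get(request.get("method", ""))
    if handler is None:
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": request.get("id"), "error": error}
    result = handler(request.get("params") or {})
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            response = handle_request(json.loads(line))
        except json.JSONDecodeError:
            error = {"code": -32700, "message": "Parse error"}
            response = {"jsonrpc": "2.0", "id": None, "error": error}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Calling `serve()` at the bottom of the script would turn it into a process that a runtime could drive over stdin and stdout. The error codes -32601 and -32700 are the standard JSON-RPC 2.0 codes for an unknown method and a parse error.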

Works with

Your existing agent stack.

Plain Python, LangChain, LlamaIndex, CrewAI, PydanticAI, Streamable HTTP

Start grading agents in five commands.

Install the runtime, initialize a starter manifest, and run your first eval.
