Methodology

How Chiark measures agent quality.

How Chiark Is Different

Agent registries list agents. Chiark tests them.

| What | Registries | Chiark |
| --- | --- | --- |
| Discovery | List agents from one source | Crawl 9 registries, deduplicate |
| Health | Trust the Agent Card | Probe every 30 min, 3 tiers |
| Scoring | None or popularity-based | 0-100 operational score, transparent |
| Routing | Not supported | Constraint filters (uptime, latency, score) |
| Protocols | One protocol per registry | A2A + MCP in one index |

Operational Score

Every agent receives an Operational Score computed from three tiers, each measuring a different aspect of reliability. Maximum score: 100 points.

Scoring Weights
| Tier | Component | Max Points | Weight |
| --- | --- | --- | --- |
| Tier 1 | Availability | 30 | 30% |
| Tier 2 | Conformance | 30 | 30% |
| Tier 3 | Performance | 40 | 40% |
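
The score is the sum of the three tier scores. A minimal sketch, assuming each tier's points have already been computed and clamped to its documented range:

```python
def operational_score(availability_pts: float,
                      conformance_pts: float,
                      performance_pts: float) -> float:
    """Combine the three tier scores into a 0-100 Operational Score.

    Each argument is assumed pre-clamped to its tier's range:
    availability 0-30, conformance 0-30, performance 0-40.
    """
    assert 0 <= availability_pts <= 30
    assert 0 <= conformance_pts <= 30
    assert 0 <= performance_pts <= 40
    return availability_pts + conformance_pts + performance_pts
```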

Tier Definitions

Tier 1: Availability (0-30 pts)

Based on 30-day uptime percentage. An agent that responds to health probes consistently scores higher. Measured via periodic HTTP probes every 30 minutes.
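
The exact uptime-to-points curve isn't documented here; a simple linear mapping is one plausible sketch:

```python
def availability_points(uptime_30d: float) -> float:
    """Map 30-day uptime (a fraction, 0.0-1.0) to Tier 1 points (0-30).

    Assumes a linear mapping; Chiark's actual curve may differ.
    """
    uptime_30d = max(0.0, min(1.0, uptime_30d))  # clamp to [0, 1]
    return round(30 * uptime_30d, 1)
```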

Tier 2: Conformance (0-30 pts)

Does the agent's runtime behavior match its declared Agent Card? We validate the card schema, check declared skills and capabilities, and verify response formats. Full conformance = full points.

Tier 3: Performance (0-40 pts)

Response time benchmarking. Scored on P95 latency — lower is better. Agents that respond quickly and consistently under load earn more points. Only available for agents that allow unauthenticated task execution.
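
P95 latency is the value at or below which 95% of probe samples fall. A self-contained calculation using the nearest-rank method:

```python
import math

def p95_latency(samples_ms: list[float]) -> float:
    """Return the P95 latency from a list of samples in milliseconds.

    Nearest-rank method: sort the samples and take the value at
    the ceil(0.95 * n)-th position (1-based).
    """
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```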

Auth-Gated Agents

Agents that require authentication for task execution can only be scored on Tier 1 (Availability) and partial Tier 2 (Conformance — card validation only). Performance testing is not possible without auth credentials.

Maximum possible score: 45/100

Auth-gated agents are marked with a lock icon on the leaderboard and show their score as X/45 instead of X/100.
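
The 45-point ceiling implies card-validation-only Tier 2 is worth 15 of its 30 points (30 Tier 1 + 15 partial Tier 2); that split is inferred from the stated maximum, not separately documented. A sketch of the cap and leaderboard display:

```python
def max_possible_score(auth_required: bool) -> int:
    """Score ceiling: auth-gated agents can earn full Tier 1 (30)
    plus card-validation-only Tier 2 (assumed 15 of 30); Tier 3
    requires unauthenticated task execution."""
    return 45 if auth_required else 100

def score_display(score: int, auth_required: bool) -> str:
    """Leaderboard-style rendering, e.g. '38/45' for a gated agent."""
    return f"{score}/{max_possible_score(auth_required)}"
```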

Probe Frequency

Each tracked agent is probed every 30 minutes. Probes check availability (HTTP health), card conformance (schema validation), and performance (latency measurement). Results are aggregated over a 30-day rolling window.

Data Sources

Agents are discovered from 9 registries, crawled every 24 hours. Agents appearing in multiple registries are deduplicated by endpoint URL.
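
Deduplication by endpoint URL can be sketched as keying each agent on a normalized URL; the exact normalization Chiark applies (case folding, default ports, trailing slashes) is assumed here:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_endpoint(url: str) -> str:
    """Normalize an endpoint URL so the same agent listed in several
    registries collapses to one key: lowercase scheme and host, drop
    default ports and trailing slashes. (Assumed normalization.)"""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    for scheme, port in (("http", ":80"), ("https", ":443")):
        if parts.scheme == scheme and host.endswith(port):
            host = host[: -len(port)]
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

def deduplicate(agents: list[dict]) -> list[dict]:
    """Keep the first record seen per canonical endpoint URL."""
    seen: dict[str, dict] = {}
    for agent in agents:
        seen.setdefault(canonical_endpoint(agent["endpoint"]), agent)
    return list(seen.values())
```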

| Registry | Protocol | Method |
| --- | --- | --- |
| a2aregistry.org | A2A | Paginated REST API |
| MCP Registry | MCP | Cursor-based pagination |
| Smithery | MCP | Paginated REST API |
| Solana Agent Registry | A2A / MCP | ERC-8004 GraphQL |
| awesome-a2a | A2A | GitHub README URL extraction |
| GitHub Topics | A2A | topic:a2a-protocol search |
| Well-Known Endpoints | A2A | /.well-known/agent.json probing |
| PulseMCP | MCP | Directory API (when API key configured) |
| Alternative Registries | A2A | Secondary sources |

Cross-Protocol Support

Chiark indexes both A2A (Agent-to-Agent) and MCP (Model Context Protocol) agents using the same three-tier scoring pipeline.

| Protocol | Checks |
| --- | --- |
| A2A | Agent Card validation, JSON-RPC conformance probing, skill-specific task benchmarks |
| MCP | Initialize handshake validation, tools/list probing, ping + tool invocation benchmarks |
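
An MCP handshake probe starts with a JSON-RPC `initialize` request. A sketch of the payload — field names follow the MCP specification, but the protocol version string and client name here are illustrative:

```python
import json

def mcp_initialize_request(request_id: int = 1) -> str:
    """Build the JSON-RPC `initialize` payload used to probe an MCP
    server's handshake. The protocolVersion value is illustrative."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "chiark-probe", "version": "0.1"},
        },
    })
```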

x402 Payment Detection

During Tier 1 probing, agents that return HTTP 402 responses are analyzed for x402 payment metadata. Payment headers and body are parsed to extract pricing, network, token, and receiver information. Payment-enabled agents are flagged on the leaderboard and filterable via the API.

Constraint-Based Routing

The API supports quality-constrained queries for agent routing decisions. Agents can be filtered by:

  • min_score — minimum operational score (0-100)
  • min_uptime — minimum 30-day uptime (e.g. 0.99 = 99%)
  • max_latency_ms — maximum P95 response time
  • auth_required — filter by authentication requirement
  • payment_enabled — filter by x402 payment support

Real-time agent status is available at /api/v1/agents/{id}/status; structured capabilities at /api/v1/agents/{id}/capabilities.
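
A constraint-filtered query can be built from the documented filter names; the listing path /api/v1/agents used here is an assumption:

```python
from urllib.parse import urlencode

def route_query(base: str = "https://chiark.ai", **constraints) -> str:
    """Build a quality-constrained agent query URL.

    Filter names match the documented parameters; the /api/v1/agents
    listing path is assumed for illustration.
    """
    allowed = {"min_score", "min_uptime", "max_latency_ms",
               "auth_required", "payment_enabled"}
    unknown = set(constraints) - allowed
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    return f"{base}/api/v1/agents?{urlencode(sorted(constraints.items()))}"
```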

MCP Server

Chiark is available as an MCP server for agent discovery from Claude, Cursor, or any MCP client. Use it to find reliable agents, check real-time status, and report routing outcomes.

Hosted endpoint: https://chiark.ai/mcp/ (no install needed)
Install locally: pip install chiark-mcp (PyPI / GitHub)

Note: The Operational Score measures reliability, not task quality. A high score means the agent is reachable, conforms to its spec, and responds quickly — it does not mean the agent produces good results.