Methodology
How Chiark measures agent quality.
How Chiark Is Different
Agent registries list agents. Chiark tests them.
| What | Registries | Chiark |
|---|---|---|
| Discovery | List agents from one source | Crawl 9 registries, deduplicate |
| Health | Trust the Agent Card | Probe every 30 min, 3 tiers |
| Scoring | None or popularity-based | 0-100 operational score, transparent |
| Routing | Not supported | Constraint filters (uptime, latency, score) |
| Protocols | One protocol per registry | A2A + MCP in one index |
Operational Score
Every agent receives an Operational Score computed from three tiers, each measuring a different aspect of reliability. Maximum score: 100 points.
| Tier | Component | Max Points | Weight |
|---|---|---|---|
| Tier 1 | Availability | 30 | 30% |
| Tier 2 | Conformance | 30 | 30% |
| Tier 3 | Performance | 40 | 40% |
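The table above amounts to a capped sum. The following is a minimal sketch of that arithmetic, not Chiark's actual implementation; the names `TierScores`, `TIER_MAX`, and `operational_score` are hypothetical, and clamping out-of-range inputs is an assumption added for safety.

```python
from dataclasses import dataclass

# Per-tier caps taken from the table above.
TIER_MAX = {"availability": 30, "conformance": 30, "performance": 40}


@dataclass
class TierScores:
    """Points earned in each tier (hypothetical container)."""
    availability: float
    conformance: float
    performance: float


def operational_score(tiers: TierScores) -> float:
    """Sum the three tier scores, clamping each to its 0..max range."""
    parts = {
        "availability": tiers.availability,
        "conformance": tiers.conformance,
        "performance": tiers.performance,
    }
    return sum(min(max(value, 0.0), TIER_MAX[name]) for name, value in parts.items())


# An agent with 27 availability, 30 conformance, and 20 performance points:
total = operational_score(TierScores(27, 30, 20))  # 77
```

Because the caps add up to 100, each tier's maximum doubles as its percentage weight.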
Tier Definitions
Tier 1: Availability (0-30 pts)
Based on the agent's uptime percentage over a rolling 30-day window: the larger the share of health probes it answers, the higher its score. Uptime is measured with HTTP probes sent every 30 minutes.
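To make the numbers concrete: one probe every 30 minutes is 48 probes per day, or 1,440 probes in a 30-day window, so a single missed probe costs roughly 0.07 percentage points of uptime. The sketch below computes an uptime fraction and converts it to points. The linear mapping from uptime to points is an assumption (the exact curve is not stated here), and the function names are hypothetical.

```python
PROBES_PER_DAY = 24 * 60 // 30  # one probe every 30 minutes -> 48
WINDOW_DAYS = 30                # full window -> 1440 probes


def uptime_fraction(results: list[bool]) -> float:
    """Share of probes that succeeded; 0.0 when there is no data."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def availability_points(uptime: float, max_points: float = 30.0) -> float:
    """ASSUMPTION: points scale linearly with uptime, capped at max_points."""
    clamped = max(0.0, min(1.0, uptime))
    return round(clamped * max_points, 2)


# 14 failed probes out of 1440 is about 99.03% uptime.
results = [True] * 1426 + [False] * 14
points = availability_points(uptime_fraction(results))  # 29.71 under the linear assumption
```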
Tier 2: Conformance (0-30 pts)
Does the agent's runtime behavior match its declared Agent Card? We validate the card schema, check declared skills and capabilities, and verify response formats. Full conformance earns the full 30 points.
Tier 3: Performance (0-40 pts)
Response-time benchmarking, scored on P95 latency (lower is better). Agents that respond quickly and consistently under load earn more points. This tier is only available for agents that allow unauthenticated task execution.
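For readers unfamiliar with the metric, P95 is the latency that 95% of requests stay at or below. The sketch below uses the nearest-rank method, which is one common definition and not necessarily the one Chiark uses; `p95_latency_ms` is a hypothetical name. How a P95 value translates into points is not specified here, so the sketch stops at the percentile.

```python
def p95_latency_ms(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples (milliseconds)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Ceiling of 0.95 * n, done in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# With few samples, a single slow response dominates the P95.
p95 = p95_latency_ms([120, 80, 100, 90, 2000])  # 2000
```

Using P95 rather than the mean rewards consistency: an agent with occasional multi-second stalls scores worse than one that is uniformly a little slower.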
Auth-Gated Agents
Agents that require authentication for task execution can be scored only on Tier 1 (Availability) and part of Tier 2 (Conformance, card validation only). Tier 3 performance testing is not possible without auth credentials.
Auth-gated agents are marked with a lock icon on the leaderboard and show their score as X/45 instead of X/100.
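A small sketch of the display rule follows, assuming the only difference between the two cases is the denominator. `score_label` is a hypothetical helper, and rounding to a whole number is an assumption.

```python
def score_label(points: float, auth_gated: bool) -> str:
    """Format a score as X/45 for auth-gated agents, X/100 otherwise."""
    scale = 45 if auth_gated else 100
    shown = round(min(max(points, 0), scale))
    return f"{shown}/{scale}"


label_gated = score_label(38, auth_gated=True)     # "38/45"
label_open = score_label(82.4, auth_gated=False)   # "82/100"
```

Because the scales differ, the two labels are not directly comparable: 38/45 and 82/100 both represent a little over 80% of the points available to that agent.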
Probe Frequency
Each tracked agent is probed every 30 minutes. Probes check availability (HTTP health), card conformance (schema validation), and performance (latency measurement). Results are aggregated over a 30-day rolling window.
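The rolling window can be sketched as a filter over timestamped probe results. This is an illustration under assumptions: probe results are modelled as `(timestamp, ok)` tuples, and the names `in_window` and `rolling_uptime` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)


def in_window(samples, now):
    """Keep (timestamp, ok) probe results from the 30 days before `now`."""
    cutoff = now - WINDOW
    return [(ts, ok) for ts, ok in samples if cutoff < ts <= now]


def rolling_uptime(samples, now):
    """Uptime fraction over the window, or None when no probes fall inside it."""
    recent = in_window(samples, now)
    if not recent:
        return None
    return sum(1 for _, ok in recent if ok) / len(recent)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
samples = [
    (now - timedelta(days=31), False),    # too old, ignored
    (now - timedelta(days=2), False),
    (now - timedelta(days=1), True),
    (now - timedelta(minutes=30), True),
]
uptime = rolling_uptime(samples, now)  # 2 of the 3 in-window probes succeeded
```

A rolling window means an old outage stops affecting the score once it is more than 30 days in the past.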
Data Sources
Agents are discovered from 9 registries, crawled every 24 hours. Agents appearing in multiple registries are deduplicated by endpoint URL.
| Registry | Protocol | Method |
|---|---|---|
| a2aregistry.org | A2A | Paginated REST API |
| MCP Registry | MCP | Cursor-based pagination |
| Smithery | MCP | Paginated REST API |
| Solana Agent Registry | A2A / MCP | ERC-8004 GraphQL |
| awesome-a2a | A2A | GitHub README URL extraction |
| GitHub Topics | A2A | topic:a2a-protocol search |
| Well-Known Endpoints | A2A | /.well-known/agent.json probing |
| PulseMCP | MCP | Directory API (when API key configured) |
| Alternative Registries | A2A | Secondary sources |
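Deduplicating by endpoint URL only works if equivalent URLs compare equal. The sketch below shows one way to do that; the normalization rules (lowercase scheme and host, drop default ports, trailing slashes, and fragments) are assumptions rather than Chiark's documented behavior, and the record shape with `endpoint` and `registry` keys is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_endpoint(url: str) -> str:
    """Canonical form of an endpoint URL (IPv6 literals are not handled here)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/")
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def deduplicate(records):
    """Merge records that share a normalized endpoint, keeping every source registry."""
    merged = {}
    for rec in records:
        key = normalize_endpoint(rec["endpoint"])
        entry = merged.setdefault(key, {"endpoint": key, "registries": []})
        if rec["registry"] not in entry["registries"]:
            entry["registries"].append(rec["registry"])
    return list(merged.values())


agents = deduplicate([
    {"endpoint": "https://Example.com:443/agent/", "registry": "a2aregistry.org"},
    {"endpoint": "https://example.com/agent", "registry": "awesome-a2a"},
])
# One merged entry that lists both registries.
```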
Cross-Protocol Support
Chiark indexes both A2A (Agent-to-Agent) and MCP (Model Context Protocol) agents using the same three-tier scoring pipeline.
x402 Payment Detection
During Tier 1 probing, agents that return HTTP 402 responses are analyzed for x402 payment metadata. Payment headers and body are parsed to extract pricing, network, token, and receiver information. Payment-enabled agents are flagged on the leaderboard and filterable via the API.
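As a rough illustration of the body-parsing step, the sketch below extracts the four fields from a 402 response body. The JSON field names (`accepts`, `maxAmountRequired`, `network`, `asset`, `payTo`) are assumptions modelled on the x402 payment-requirements shape and should be checked against the x402 specification. `parse_x402` is a hypothetical name, and header parsing is left out.

```python
import json


def parse_x402(status: int, body: str):
    """Return payment metadata from a 402 response body, or None if absent."""
    if status != 402:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    accepts = payload.get("accepts") if isinstance(payload, dict) else None
    if not isinstance(accepts, list) or not accepts or not isinstance(accepts[0], dict):
        return None
    offer = accepts[0]  # ASSUMPTION: the first listed option is the one reported
    return {
        "price": offer.get("maxAmountRequired"),
        "network": offer.get("network"),
        "token": offer.get("asset"),
        "receiver": offer.get("payTo"),
    }
```

Returning `None` for anything that is not a well-formed 402 body keeps a plain "payment required" response, with no machine-readable metadata, from being flagged as payment-enabled.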
Constraint-Based Routing
The API supports quality-constrained queries for agent routing decisions. Agents can be filtered by:
- min_score: minimum operational score (0-100)
- min_uptime: minimum 30-day uptime (e.g. 0.99 = 99%)
- max_latency_ms: maximum P95 response time, in milliseconds
- auth_required: filter by authentication requirement
- payment_enabled: filter by x402 payment support
Real-time agent status is available at /api/v1/agents/{id}/status. Structured capabilities are available at /api/v1/agents/{id}/capabilities.
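A client might assemble such a query as sketched below. The filter names and the status path come from the text above. The host `chiark.example`, the listing path `/api/v1/agents`, and the `true`/`false` encoding of booleans are assumptions made for illustration, and both function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_agent_query(base_url, *, min_score=None, min_uptime=None,
                      max_latency_ms=None, auth_required=None, payment_enabled=None):
    """Append only the filters that were actually set to the listing URL."""
    candidates = [
        ("min_score", min_score),
        ("min_uptime", min_uptime),
        ("max_latency_ms", max_latency_ms),
        ("auth_required", auth_required),
        ("payment_enabled", payment_enabled),
    ]
    params = []
    for name, value in candidates:
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"  # ASSUMPTION: lowercase booleans
        params.append((name, value))
    return f"{base_url}?{urlencode(params)}" if params else base_url


def status_path(agent_id: str) -> str:
    """Path of the real-time status endpoint for one agent."""
    return f"/api/v1/agents/{quote(agent_id, safe='')}/status"


# Hypothetical host and listing path, for illustration only.
url = build_agent_query(
    "https://chiark.example/api/v1/agents",
    min_score=80, min_uptime=0.99, auth_required=False,
)
```

Leaving unset filters out of the query string, rather than sending empty values, keeps the request unambiguous about which constraints the caller actually cares about.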
MCP Server
Chiark is available as an MCP server for agent discovery from Claude, Cursor, or any MCP client. Use it to find reliable agents, check real-time status, and report routing outcomes.
Note: The Operational Score measures reliability, not task quality. A high score means the agent is reachable, conforms to its spec, and responds quickly — it does not mean the agent produces good results.