Leaderboard/codeitall
MCP ServerScored via MCP protocol probing: initialize handshake, tools/list conformance, and ping + tool invocation performance.

codeitall

Behavioral oracle for Rust crate APIs: runtime behavior, yank/advisory, deprecation, sig search

89/100
Operational Score
Score Breakdown
Availability30/30
Conformance30/30
Performance29/40
Key Metrics
Uptime 30d
100.0%
P95 Latency
300.1ms
Conformance
Pass
Trend
What's Being Tested
Availability
HTTP health check to the service endpoint
Responded with HTTP 405 in 121ms
Conformance
MCP initialize handshake + tools/list
Valid MCP server info returned, tools/list responded
Performance
MCP ping + zero-arg tool invocation benchmarking
P95 latency: 300ms, task completion: 100%
Skills
answer_api_question

Use this when an agent asks open-ended Rust API questions like 'how do I parse a duration string', 'which crate gives me HMAC verification', 'is this function deprecated', 'what's the current way to do JWT auth', or 'compare base64 crates'. Hand off free-text questions verbatim; the substrate routes them deterministically via rule-based intent detection (no LLM in the request path) and dispatches into signature_search, behavior_lookup, compare_implementations, or find_modern_equivalent as appropriate, falling back to bge-m3 cosine retrieval if no rule matches. Returns a structured verdict with routed_via, the primary recommendation, evidence, and caveats.

signature_search

Use when the agent already knows the signature shape it wants — e.g. `fn(&str) -> Result<Vec<u8>, _>`, `fn(&[u8]) -> Result<String, std::str::Utf8Error>`, or `impl Iterator<Item=Result<...>>`. Returns ranked candidate functions with the crate they live in, downloads, yank status, and a compact behavior summary if probed. Pass the exact signature verbatim; the substrate normalises whitespace + identifiers on its end.

behavior_lookup

Use when the agent has a specific (crate, fn_name) pair and wants to know what inputs it actually accepts at runtime — e.g. `(crate='jiff', fn_name='Timestamp::from_str')` for methods, `(crate='ascii85', fn_name='decode')` for free functions. `fn_name` accepts BOTH qualified (`Type::method` / `module::fn`) and bare (`method` / `fn`) forms — the matcher tries the exact input first, then the alternate form; the `matched_fn_name` response field records the substitution when one happened. Returns the probe observation table verbatim from the substrate: each row is (input, outcome=ok|err|panic, value or error variant). Pass an optional `inputs` array to filter to specific input strings. On a zero-hit the response carries a `diagnostics` block (`received_crate`, `received_fn_name`, `closest_crates`, `closest_fns_in_crate`, `hint`) so the agent can self-correct without a dead-end round-trip. The substrate's discrimination findings live here — runtime behaviour the docs are silent or wrong about.

compare_implementations

Use when the agent asks about a task category — e.g. 'how do I parse JSON in rust', 'compare base64 crates', 'which datetime library handles RFC 3339 timezones right' — and wants the cross-implementation behavior table. The substrate returns side-by-side observations on the canonical input set: for each implementation (crate, fn_name), each input in the family's input set is paired with the observed (outcome, value_or_error_variant). Optional `crates` / `fns` arrays restrict the returned set; optional `summary=true` replaces per-input `observations` with an `n_observations` count for index-only listings (bounds response size by family-member count, not member × input count). Optional `subfamily` narrows to a registered sub-tag (e.g. `task='base64', subfamily='base64'` returns only canonical base64 crates, not ascii85 / base58 / hex / …) — call `list_families` to see available tags. Non-core family members (per a small hand-authored allowlist) carry an advisory `caveat` field warning that the entry was probed on shared inputs that may not reflect its typical usage. On a zero-hit family (`n_attempted=0`) the response carries a `diagnostics` block (`received_family`, `available_families`, `closest_families`, `hint`) so the agent can recover without guessing. Discrimination signal lives here — the docs-silent runtime behaviour pattern Guiding Principle #8 names.

find_modern_equivalent

Use when the agent suspects the code it's about to write uses a deprecated API — e.g. `tokio::Runtime::new()` (now `tokio::runtime::Runtime::new()`), `chrono::DateTime::from_str` (now feature-gated), or any path that points to a yanked or RustSec-advisory'd crate version. The substrate returns the current canonical alternative with a behavior summary if probed, and surfaces yank-status / advisory rows verbatim. For TypeScript/Next.js pass language='ts' with a deprecated Next.js API — e.g. `next/router` (now `next/navigation`), `getServerSideProps` (now an async Server Component), `@next/font` (now `next/font`), `next/legacy/image` (now `next/image`) — and the substrate returns the modern equivalent with the official upgrade-guide citation and how prevalent the deprecated vs modern form is across the live Next.js corpus.

list_families

Use when the agent wants to enumerate the probe families codeitall has covered — typically before constructing a `compare_implementations(task=…)` call, or to recover from a typo flagged by `compare_implementations`'s near-miss diagnostics. Returns the canonical family list with one-line descriptions, per-family counts, and the registered sub-family tags (Phase 1.12 W5; pass any as `subfamily` to `compare_implementations`): `{ families: [{ name, description, n_implementations, n_observations, sub_families }, …] }`. The family names returned here are exactly the strings accepted by `compare_implementations`. No required parameters.

pattern_consensus

Use when the agent wants the cross-project CONSENSUS — 'how do most current Next.js projects actually do X?' — rather than a single example. Returns precomputed consensus records ('X% of projects matching predicate P do Y') for a category: `auth-library` (which auth library projects use), `data-fetching-style` (legacy page-level data fetching vs Server Actions vs Server Components), or `routing-hooks` (modern next/navigation vs legacy next/router). Each record carries its honest denominator N, the full outcome distribution with percentages, and the exact SQL predicate behind every number. Pass `category` to filter, or omit it to get all. Next.js corpus — pass language='ts'.

package_trust

Use before recommending or installing an npm package — to check whether it is safe and current. Pass the package name (and optionally a version, e.g. `4.17.0`) with language='ts'. The substrate returns every OSV/GHSA security advisory affecting it (with severity, CVE aliases, the exact affected version range, and the advisory URL) and, if you pass a version, whether THAT version is affected; plus npm registry signals — whether the package is deprecated (and the maintainer's deprecation message), its latest version and last-publish date, and a maintained flag; plus how many real Next.js projects in the corpus depend on it and which declared version ranges the advisories actually bite. This is the cross-referenced security + maintenance signal a stale model cannot reliably know.

Tools
8 tools verified via live probe
verified 2h ago
Server: codeitall-serverVersion: phase-2.C/v0.4.0Protocol: 2024-11-05
Recent Probe Results
TimestampStatusLatencyConformance
Jun 10, 2026success121.4msPass
Jun 9, 2026success103.3msPass
Jun 5, 2026success252.7msPass
Jun 5, 2026success300.1msPass
Jun 4, 2026success253.1msPass
Jun 3, 2026success109.7msPass
May 30, 2026success161.7msPass
May 29, 2026success172.8msPass
May 29, 2026success156.4msPass
Source Registries
mcp-registry
First Seen
May 27, 2026
Last Seen
Jun 9, 2026
Last Probed
Jun 10, 2026