The verification and attribution layer for ML kernels across heterogeneous silicon. Catches silicon errors before they corrupt frontier training runs.
Silent data corruption interrupted Meta's Llama 3 training six times in 54 days on a 16K-H100 fleet. Google has acknowledged similar events during Gemini training, roughly weekly. As frontier compute diversifies across GPUs, ASICs, photonic inference, and beyond, attributing numerical failures across silicon, not just detecting them, becomes the binding constraint that separates a stable training run from a nine-figure restart.
Ashiba Compute builds the verification and attribution layer. The methodology is presented in Kernel Contracts (April 2026), with an open-source reference verifier benchmarked at sub-1% overhead on NVIDIA H100 and AMD MI300X. Retainer engagements are designed for 48-hour turnaround on production numerical incidents.
Every kernel ships with an implicit contract about what it computes — numerical tolerance, determinism, shape limits, composition semantics. Until Kernel Contracts, those contracts were never written down. When two implementations disagreed, nothing in the software stack arbitrated which was right.
The contract object closes that gap. It has eight parts: identifier, scope, precondition, postcondition, tolerance, reference oracle, measurement protocol, and violation signature. Twelve contract classes group under three physically grounded failure primitives: path dependence, domain violation, and resource contention. The taxonomy is small enough for a kernel author to internalize in an afternoon and rich enough to express the failure modes documented across forty years of silicon-test engineering.
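As a rough illustration of the eight-part contract object, the sketch below models it as a Python dataclass and checks one toy kernel against it. Everything here is an assumption made for illustration: the class name `KernelContract`, the `check` method, the field types, and the example identifier are hypothetical and are not taken from Kernel Contracts or from ashiba-verify. Only the eight field names follow the list above.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class KernelContract:
    """Hypothetical container for the eight contract parts (illustrative only)."""
    identifier: str
    scope: str
    precondition: Callable[[Sequence[float]], bool]
    postcondition: Callable[[Sequence[float], float], bool]
    tolerance: float
    reference_oracle: Callable[[Sequence[float]], float]
    measurement_protocol: str
    violation_signature: str

    def check(self, kernel: Callable[[Sequence[float]], float],
              xs: Sequence[float]) -> Optional[str]:
        """Return None if the kernel conforms on xs, else the violation signature."""
        if not self.precondition(xs):
            raise ValueError(f"{self.identifier}: input outside contract scope")
        got = kernel(xs)
        want = self.reference_oracle(xs)
        conforms = self.postcondition(xs, got) and abs(got - want) <= self.tolerance
        return None if conforms else self.violation_signature


# A toy contract for a scalar sum reduction, with math.fsum as the oracle.
sum_contract = KernelContract(
    identifier="reduce.sum.f64/v1",  # hypothetical naming scheme
    scope="1-D float64 reduction, up to 2**20 elements",
    precondition=lambda xs: 0 < len(xs) <= 2**20 and all(math.isfinite(x) for x in xs),
    postcondition=lambda xs, y: math.isfinite(y),
    tolerance=1e-9,
    reference_oracle=math.fsum,
    measurement_protocol="single run, absolute error against the oracle",
    violation_signature="path_dependence: accumulation-order drift",
)

print(sum_contract.check(sum, [0.1] * 10))                        # conforming kernel
print(sum_contract.check(lambda xs: sum(xs) + 1e-3, [0.1] * 10))  # drifting kernel
```

The point of the sketch is that the oracle, the tolerance, and the violation label travel together in one object, so a disagreement between two implementations can be settled against a written artifact instead of being argued case by case.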
That is the compressible abstraction Ashiba Compute contributes to the substrate problem: a formal artifact silicon vendors and frontier training operators can share with each other as evidence, rather than re-arguing every numerical disagreement from first principles.
ashiba-verify is the open-source reference implementation: roughly 500 lines of Python wrapping CUDA, Apple Metal (MPS), and ROCm backends. It implements Freivalds' 1979 probabilistic matrix-multiply verifier, which costs O(n²) per iteration and accepts an incorrect product with probability at most 2⁻ᵏ after k independent iterations. A correct product is never rejected in exact arithmetic. At k=10 on a 4096×4096 FP16 GEMM on H100, verification is roughly 0.4% of kernel cost.
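Freivalds' check itself is short enough to sketch in pure Python. The version below is a minimal illustration of the algorithm, not the ashiba-verify code: the function names, the `tol` parameter, and the list-of-rows matrix layout are assumptions chosen for readability, and a production verifier would run the matrix-vector products on the accelerator backend.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def freivalds(a, b, c, k=10, tol=0.0, rng=None):
    """Return True if c is accepted as the product a @ b after k random checks.

    Each round draws r uniformly from {0,1}^n and compares a(br) with cr.
    That costs three matrix-vector products, O(n^2), instead of an O(n^3)
    multiply. tol is an illustrative absolute tolerance for floating-point
    inputs; with tol > 0, errors smaller than tol are not detected.
    """
    rng = rng or random.Random()
    n = len(c[0])  # number of columns of c, which equals the columns of b
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]
        lhs = matvec(a, matvec(b, r))
        rhs = matvec(c, r)
        if any(abs(x - y) > tol for x, y in zip(lhs, rhs)):
            return False  # a mismatch is proof that c differs from a @ b
    # No mismatch found. In exact arithmetic a wrong c reaches this line
    # with probability at most 2**-k.
    return True


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]  # the true product a @ b
bad = [[19, 22], [43, 51]]   # one corrupted entry

print(freivalds(a, b, good))       # always True: correct products are never rejected
print(freivalds(a, b, bad, k=64))  # False except with probability at most 2**-64
```

The error is one-sided. A correct product always passes, while each round catches an incorrect one with probability at least 1/2, which is where the 2⁻ᵏ bound comes from. With floating-point kernels the comparison needs a tolerance, and choosing that tolerance is exactly the kind of decision a contract has to write down.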
Technical novelty is not the reason this wasn't shipped earlier. The algorithm is forty-seven years old. The work was in porting it from theoretical computer science to production ML kernels and committing to the contract semantics.
github.com/cv700/ashiba-verify — Apache 2.0. Patches welcome.
The detailed framework: contract anatomy, primitive taxonomy, conformance protocol, attribution structure, and the mapping from Shewchuk / ReproBLAS / Wilkinson / Bates / Shewhart-SPC / ATPG / Huang-Abraham-ABFT / Freivalds into the ML kernel domain. Read the methodology →
Three engagement shapes for frontier training operators, sovereign compute operators, enterprise AI teams, and silicon vendors. Diagnostic is a two-week written attribution report. Retainer is an annual embedded harness with 48-hour turnaround on numerical incidents. Substrate Conformance is third-party attestation that silicon meets declared numerical tolerances against reference implementations. Engagement options →
Kernel Contracts is being seeded into the standards-body conversations that gate enterprise procurement of silicon and ML services. The certification artifact is designed to be shareable in the register of an ISASecure or ASIL assessment — reviewable, citable, intelligible to a buyer's procurement team rather than just to internal kernel engineers.
The Ashiba Alignment program (see Lab) frames alignment as downstream of verification infrastructure across heterogeneous substrates. Whether AI heterogeneity becomes plurality or fragmentation depends on whether the substrate is verifiable at the silicon layer; Kernel Contracts is the empirical anchor for that argument.