Technical research for when problems move faster than institutional research can handle.
Ashiba Research is an applied research lab. We pick programs by clock, not by curiosity — each one addresses a failure mode or compliance forcing function active in production now, on a horizon where academic publication cycles and large-company research roadmaps are too slow to be useful.
What ties the programs together is the same shape of problem: the gap between what an AI system claims and what is measurably supportable. Different domains, same proof-debt structure.
Five programs, related but distinct. Kernel Contracts is the current public focus.
Kernel Contracts is the verification and attribution layer for ML kernels across heterogeneous silicon. Eight-part contract object, twelve contract classes, three published case studies. Open-source reference verifier (ashiba-verify) benchmarked at sub-1% overhead on NVIDIA H100 and AMD MI300X.
Clock: silent-correctness failures are already costing nine-figure training restarts. Meta's Llama 3 run hit six silent-data-corruption (SDC) events in 54 days on a 16K-H100 fleet; Google has acknowledged comparable incidents on Gemini training.
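To make the shape concrete: a minimal sketch, in Python, of what an eight-part contract object could look like. Every field name below is an assumption for illustration; the actual schema lives in the published case studies and the ashiba-verify source, not here.

    # Hypothetical sketch only. Field names and types are assumptions,
    # not the real ashiba-verify schema.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class KernelContract:
        kernel_id: str            # e.g. "fused_attention_fwd"
        target_silicon: str       # e.g. "nvidia-h100", "amd-mi300x"
        input_domain: str         # declared shapes and dtypes the kernel accepts
        numeric_tolerance: float  # max allowed deviation from the reference output
        determinism_class: str    # e.g. "bitwise", "run-to-run", "statistical"
        reference_hash: str       # digest of the golden reference implementation
        verification_method: str  # how conformance is checked, e.g. "sampled-replay"
        attribution: str          # who attests to the contract, and when

On this reading, the twelve contract classes would be twelve named families of constraints of this kind, one per failure family; that, too, is a guess, and the case studies are the source of truth.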
Research on the gap between automated output and supportable claims. Applied first to outbound customer trust: security questionnaires, due-diligence questionnaires (DDQs), RFP responses, and trust-center updates, where AI drafting now produces more answers than supportability review can verify. Methodology grounded in Good Tech, Messy Spec (April 2026): seventeen environments measuring whether coding agents follow document-grounded operational requirements.
Clock: vendor security review now gates 75%+ of enterprise deals; 81% of questionnaire responses claim near-perfect compliance, yet only 34% of third-party risk management (TPRM) professionals believe them.
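A minimal sketch of the measurement above, assuming Python and invented names; the real Good Tech, Messy Spec environments are specified in the paper, and nothing below is their actual harness.

    # Hypothetical sketch of one document-grounded environment.
    # Names and structure are assumptions, not the paper's harness.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class OperationalRequirement:
        source_doc: str               # the spec document the requirement comes from
        rule: str                     # e.g. "every answer must cite a control ID"
        check: Callable[[str], bool]  # predicate over the agent's output

    def supportability_score(agent_output: str,
                             requirements: list[OperationalRequirement]) -> float:
        """Fraction of document-grounded requirements the output satisfies;
        the gap between claimed and supportable is everything this misses."""
        met = sum(1 for r in requirements if r.check(agent_output))
        return met / len(requirements)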
Paper:
Inquiries by email: cv@ashibaresearch.com
The dominant alignment framing asks how to engineer values into individual models. Ashiba Alignment works the other direction: alignment is downstream of verification infrastructure across heterogeneous substrates, and of the construction of legible moral environments in which agents can be mutually accountable. Verification at the silicon layer is environmental work in the deepest sense: it maintains the conditions under which moral life is possible across a heterogeneous, plural, AI-mediated world.
Clock: the AI decision-making layer increasingly runs on heterogeneous, opaque substrates, and its decisions land on populations who have no way to verify what was computed. Whether heterogeneity becomes plurality or fragmentation is being decided now.
Paper:
Inquiries by email: cv@ashibaresearch.com
Three things, one site (probspec.com). A CLAUDE.md tuned to the failure modes vanilla models exhibit on technical consulting work, in the lineage of Karpathy's code-engineering CLAUDE.md but specific to standards-grounded engineering consulting. A searchable index of the technical standards and codes that real consulting work depends on (ISA-99 / IEC 62443, OWASP ASVS, ISO 42001, IEC 61508, the NIST SP 800 series, and beyond), with public-mirror links where they exist. A framework library of structured problem-solving methods written for agent use: TRIZ, fault-tree analysis, Goldratt's Theory of Constraints, and others.
Built around one observation: agents do consulting work badly when they hallucinate the relevant standard, miss it entirely, or skip past spec reading straight to implementation. A domain-grounded CLAUDE.md plus standards-aware tooling targets exactly that failure shape.
Clock: agentic deployment for engineering and consulting work is happening now; the spec-grounding infrastructure is not. v2 at probspec.com.
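As a sketch of the index idea: below, in Python, is one way an agent-facing lookup could be shaped so that a missing standard fails loudly instead of being guessed. The entries and the interface are illustrative assumptions, not probspec.com's actual data model.

    # Hypothetical sketch: an agent-facing standards index.
    # Entries and interface are illustrative, not probspec.com's data model.
    STANDARDS_INDEX = {
        "industrial-control-security": {
            "standard": "IEC 62443 (lineage: ISA-99)",
            "scope": "security for industrial automation and control systems",
        },
        "application-security": {
            "standard": "OWASP ASVS",
            "scope": "web application security verification requirements",
        },
        "ai-management": {
            "standard": "ISO 42001",
            "scope": "AI management systems",
        },
    }

    def lookup(domain: str) -> dict:
        # Fail loudly: an agent that cannot find the standard should
        # surface the gap, not hallucinate a plausible-sounding one.
        if domain not in STANDARDS_INDEX:
            raise KeyError(f"no indexed standard for {domain!r}; do not guess")
        return STANDARDS_INDEX[domain]

The design choice that matters is the raise: it converts the hallucination failure mode above into a visible gap the agent has to report.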
Live at deeptechtools.com: a deep-tech ecosystem map, an AI tools repository, a deep-tech calendar, and a forthcoming standards classifier, innovation-call aggregator, and playbook. Aimed at deep-tech operators and founders, and at the funders, accelerators, and tech-transfer offices around them.
v2 in development: a worker-data marketplace. Deep-tech operators — semiconductors, biotech, advanced manufacturing, energy, materials — hold the most valuable untapped AI training data on the planet. Frontier labs have already eaten the public web; the next training-data frontier is operator data from the real economy. v2 closes the loop: capture the data your work already produces, sell it to labs that will pay for it, run your operation on better tools. v2 ships June 15, 2026.
Clock: whoever builds the rails for operator-data flow first shapes how labs train on deep-tech expertise for the next decade.
Ashiba is small on purpose. The work is calibrated to questions that have a clock, where every marginal hour spent on framing-by-committee is an hour the failure mode keeps running in production. Engagements ship in hours, not quarters. Papers are published when they are useful, not when the cycle permits.
We are not a foundation lab and not a consulting shop. The intermediate position is intentional: independent applied research, with the publication record of a lab and the response time of an operator or an elite newsroom.