methodology note · v1.0 · published 2026-07-03

How the Scoreboard works.

This note is public. It travels with every Scoreboard report and every issue of the State of Mid-Market Industrial AI index. If we change the method, we change the version number and say what moved.

Why this instrument exists

Most AI assessments are sales documents. A vendor or a consultancy scores your "readiness," finds you almost ready, and — conveniently — sells the thing that closes the gap. The diagnosis is written by the party that profits from a bigger treatment. That structure, not the technology, is why the failure numbers look the way they do.

The numbers are worth sitting with:

RAND interviewed 65 practitioners and put the failure rate of AI projects above 80% — roughly twice the rate of IT projects that don't involve AI. Their #1 cause: "misunderstandings and miscommunications about the intent and purpose of the project." Not the models. The problem statement. (rand.org, RRA2680-1)
MIT's NANDA project found that roughly 95% of organizations show no measured P&L impact from their generative-AI spend — on an estimated $30–40 billion of investment. That study is preliminary and contested, so we use it in exactly that bounded form: not "95% fail," but "almost nobody can show the money on a P&L." (primary PDF)
Gartner predicted at least 30% of generative-AI projects would be abandoned after proof-of-concept by end-2025 (republished release), and now predicts more than 40% of agentic-AI projects will be canceled by end-2027 (via MarTech).
S&P Global found the share of companies abandoning most of their AI initiatives jumped from 17% to 42% in a single year, with the average organization scrapping 46% of its proofs-of-concept (via CIO Dive).
The Manufacturing Institute and PwC surveyed manufacturing leaders in April 2026: 58% describe their own executive leadership's AI use as "limited," and 54% have low or very low confidence in their frontline leaders' ability to lead AI change. (nam.org)

Every one of those failures was, at some point, a proposal on an executive's desk that nobody was equipped to interrogate. The Scoreboard exists to make that interrogation routine — twenty scored questions an operator can answer in a few minutes, scored against the documented ways these projects actually die. It is free, it stays free, and it is not a funnel into an implementation contract, because we don't sell implementation. That last part is in writing; see the charter section below.

I spent my career in audited, expensive-to-get-wrong industrial delivery, where no structure gets built from a drawing nobody checked. The Scoreboard applies that discipline (check the drawing before you build) to AI spending decisions. That background is the evidence behind the instrument, not the market it serves: Tektari serves mid-market manufacturers, distributors, logistics, and industrial-services operators.

What it measures

Four disciplines, five questions each, mapped one-to-one onto the failure evidence:

Problem definition. Does the business problem exist on paper before the technology does — a written problem statement, a sponsor-signed baseline, a pass/fail line, a kill rule? This section instruments RAND's #1 failure cause.
Data plumbing. Can your own team locate, trust, and trace the records the AI would run on? AI fitted to miscoded records repeats the miscoding faster and with more confidence.
Vendor exposure. Who holds the pen — who wrote the scope, who defined "delivered," who measures it, who carries the cost of failure, and what do you keep if you walk? Most mid-market AI arrives on vendor paper (MIT NANDA found external partnerships behind roughly two-thirds of deployments), so this is where most of the risk gets signed.
Workforce readiness. Can your leadership team defend the initiative under questioning, and can your frontline supervisors carry the change? This is the MI/PwC gap, measured on your own organization.

What it deliberately does not measure

Your technology. No questions about models, platforms, GPUs, or which vendor's logo is on the tool. Those change quarterly; the disciplines don't.
AI spend or ambition. Spending more scores nothing. The instrument is indifferent to whether you run zero pilots or ten.
"Maturity." This is not a maturity model and there is no journey. A low score is not an early "stage" you graduate from by waiting — it is a description of concrete, present exposure: who can define your problems for you, and who pays when a pilot fails.
Anything we can't score from your answers. The Scoreboard is self-reported and self-selected, and we verify none of it. It reads your answers back to you in a structured way. It does not predict your success, and a strong score is not a guarantee — it means the standard failure modes will have to find another way in.

How scoring and banding work

Each question has four answers scored 0 to 3, worst to best, in the open — the options aren't shuffled or disguised, because a respondent who games a free self-assessment only cheats themselves. Sections score 0–15; the total scores 0–60. All four sections weigh equally in version 1.0: we have no empirical basis yet for claiming one discipline predicts failure more than another, so we don't. If the data ever supports weighting, that becomes a new version, not a quiet edit.

Four result bands, by total score: Vendor's market (0–15), Pilot graveyard (16–30), Instrumented, with gaps (31–45), Buyer in control (46–60). The band descriptions in your report say what the score exposes you to in plain terms — for example, that your vendor can currently define your problem, grade their own work, and get paid either way. Every report also names your lowest-scoring section explicitly, because that is where your next dollar of failure is most likely to come from.
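
For readers who want to check the arithmetic, here is a minimal Python sketch of the scoring and banding described above. The function name, the dictionary layout, and the handling of ties (every section tied for lowest is returned) are assumptions made for illustration; the note itself only fixes the 0 to 3 option scores, the equal weighting, and the four band boundaries.

```python
# Section names and band boundaries are taken from this note (v1.0).
SECTIONS = (
    "Problem definition",
    "Data plumbing",
    "Vendor exposure",
    "Workforce readiness",
)
BANDS = (
    (0, 15, "Vendor's market"),
    (16, 30, "Pilot graveyard"),
    (31, 45, "Instrumented, with gaps"),
    (46, 60, "Buyer in control"),
)


def score_response(answers):
    """Score one response. `answers` maps each section name to its five
    option scores (0-3). Hypothetical helper, not the instrument's own code."""
    section_scores = {}
    for section in SECTIONS:
        values = answers[section]
        if len(values) != 5 or any(v not in (0, 1, 2, 3) for v in values):
            raise ValueError(f"{section}: expected five answers scored 0-3")
        section_scores[section] = sum(values)  # each section totals 0-15

    total = sum(section_scores.values())  # equal weights, total 0-60
    band = next(name for low, high, name in BANDS if low <= total <= high)

    # Assumption: if sections tie for lowest, all of them are named.
    lowest = min(section_scores.values())
    weakest = [s for s in SECTIONS if section_scores[s] == lowest]
    return {"sections": section_scores, "total": total,
            "band": band, "weakest": weakest}


example = {
    "Problem definition": [1, 0, 1, 0, 1],   # 3
    "Data plumbing": [2, 2, 1, 2, 2],        # 9
    "Vendor exposure": [1, 1, 0, 1, 1],      # 4
    "Workforce readiness": [2, 1, 2, 2, 1],  # 8
}
report = score_response(example)
print(report["total"], report["band"], report["weakest"])
# 24 Pilot graveyard ['Problem definition']
```

In this made-up response the total of 24 lands in Pilot graveyard, and the report would name Problem definition as the lowest-scoring section.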

How peer comparison works — and why yours might say "not yet"

We compare you only against qualified peers: executive respondents (CEO/President, COO, CFO, or equivalent) at $50M–$500M manufacturers, distributors, logistics, and industrial-services operators. And we publish the pool size — n — every single time a comparison appears.

Under 25 qualified respondents: no peer comparison at all. You get your scores against the published bands, and the current n, and that's it.
25 to 99: a directional read only — top, middle, or bottom third of the pool, with n shown.
100 and up: exact percentiles, with n shown.

The reason is simple: a percentile computed on a pool of 30 is a costume, not a statistic. We would rather show you a coarse true number than a precise fake one, and an instrument about verification discipline doesn't get to fake its own statistical authority.
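
The three tiers can be sketched in a few lines of Python. The thresholds (under 25, 25 to 99, 100 and up) come from this note. Everything else here is an assumption for illustration: the function name, the choice to rank a respondent by the share of the pool scoring strictly below them, and the cut points of one-third and two-thirds for the directional read. The note does not specify those details.

```python
def peer_comparison(total, pool_totals):
    """Return the comparison a report may show, always with n.
    Hypothetical helper; `pool_totals` holds qualified peers' total scores."""
    n = len(pool_totals)
    if n < 25:
        # Too few qualified peers: bands and n only, no comparison.
        return {"n": n, "mode": "none"}

    # Assumed ranking rule: count peers scoring strictly below this total.
    below = sum(1 for t in pool_totals if t < total)

    if n < 100:
        # Directional read only. Integer comparisons avoid float rounding.
        if 3 * below < n:
            third = "bottom"
        elif 3 * below < 2 * n:
            third = "middle"
        else:
            third = "top"
        return {"n": n, "mode": "third", "third": third}

    # 100 and up: an exact percentile is reported.
    return {"n": n, "mode": "percentile",
            "percentile": round(100 * below / n)}


print(peer_comparison(35, list(range(10, 34))))  # pool of 24
# {'n': 24, 'mode': 'none'}
print(peer_comparison(35, list(range(10, 40))))  # pool of 30
# {'n': 30, 'mode': 'third', 'third': 'top'}
```

The pools in the usage lines are invented. The point of the structure is that the precise branch cannot be reached at all until n reaches 100.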

Anonymization and aggregation rules

Your individual answers and identity are never published, shared, or sold. Ever.
Your email is used to send your report and, if you opted in, the quarterly index. Nothing else. Human follow-up happens only if you ticked the separate, optional "you may contact me" box.
The quarterly State of Mid-Market Industrial AI index publishes aggregates only: band distributions, section medians, and a few single-question findings — always with n, never with a cut small enough to identify anyone (any slice under n = 5 is suppressed, and no cross-tabs are published until the pool passes 100).
Every published number is computed by spreadsheet formula from the raw scores and can be re-derived. No statistic in the index is generated or estimated by AI. We use AI tooling to help draft report prose — a person reviews every report before it ships — but the numbers are arithmetic, not language-model output.
One more disclosure: Tektari does not sell to construction/AEC organizations, to avoid a conflict of interest. Respondents from those industries are welcome to take the Scoreboard and get the same report; they will never receive commercial follow-up from us, and they are not counted in the qualified peer pool.
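
The suppression rules for the index aggregates above can be written as a small check. This is an illustrative Python restatement only: the published numbers are computed by spreadsheet formula, as stated above, and the function names here are hypothetical. One reading is assumed and flagged in the code: "passes 100" is taken to mean a pool strictly larger than 100.

```python
from statistics import median

MIN_SLICE_N = 5          # any slice under n = 5 is suppressed
CROSSTAB_POOL_MIN = 100  # assumption: "passes 100" means pool_n > 100


def may_publish(slice_n, pool_n, is_crosstab=False):
    """True if an aggregate over `slice_n` respondents may appear in the index."""
    if slice_n < MIN_SLICE_N:
        return False
    if is_crosstab and pool_n <= CROSSTAB_POOL_MIN:
        return False
    return True


def section_median(section_scores, pool_n):
    """Median of one section's scores with its n, or None when suppressed."""
    n = len(section_scores)
    if not may_publish(n, pool_n):
        return None
    return {"median": median(section_scores), "n": n}


print(section_median([9, 12, 7, 10, 11], pool_n=60))
# {'median': 10, 'n': 5}
print(section_median([9, 12, 7], pool_n=60))
# None
```

Returning n alongside every median mirrors the rule that no aggregate is published without its pool size.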

The charter this sits under

The Scoreboard is part of Tektari's verification line, which operates under a published charter: we sell no implementation, we take no vendor commissions or referral fees, and verification fees never credit toward any other Tektari offer. Our prices are published. The paid step after the Scoreboard, for those who want it, is the Problem-Definition Audit — three weeks, $12,500 fixed, producing one AI use case, a sponsor-signed baseline metric, and a written vendor acceptance test your team can run without us; $2,500 of that fee is invoiced only when the scoped pilot passes its acceptance test. If the benchmark told you to buy something from the benchmark's author, you'd be right not to trust the benchmark. This one can't, by charter. Read the Verification Charter.

Versioning

This is instrument and methodology v1.0. The rules:

Any change to a question, an option, a score value, a band boundary, or the comparison method bumps the version number.
Scores are never compared across versions, and the index never mixes them.
Each index issue states the version it was collected under and carries a change log of what moved since the prior version, and why.
If someone challenges the methodology publicly, we answer with data in the next issue. If we got something wrong, the correction gets the same prominence the error did.
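
The rule that scores are never compared across versions amounts to keeping one pool per version. A minimal Python sketch, assuming each stored response carries a `version` field and a `total` field (both field names are hypothetical):

```python
from collections import defaultdict


def pools_by_version(responses):
    """Group total scores by instrument version so no pool mixes versions."""
    pools = defaultdict(list)
    for response in responses:
        pools[response["version"]].append(response["total"])
    return dict(pools)


def comparable(a, b):
    """Two responses may be compared only if collected under the same version."""
    return a["version"] == b["version"]


responses = [
    {"version": "1.0", "total": 24},
    {"version": "1.0", "total": 41},
    {"version": "1.1", "total": 37},  # made-up later version, for illustration
]
print(pools_by_version(responses))
# {'1.0': [24, 41], '1.1': [37]}
print(comparable(responses[0], responses[2]))
# False
```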

Questions about the method: hello@tektari.com. The full question set and scoring rubric are visible inside the instrument itself — there is nothing hidden to reverse-engineer.

Take the Scoreboard