Most Banks Optimise the Wrong Decisions and the Three Reasons the Right Ones Still Fail

Somewhere between 70% and 85% of AI projects in financial services fail to reach production or deliver measurable business value, depending on which research you consult: Gartner, FinTellect, and the RAND Corporation all put the figure in that range. The number has become so familiar that it functions almost as a disclaimer, a caveat that precedes the next round of investment approvals before being quietly set aside. It should not be set aside. It describes an industry that is, in aggregate, spending enormous sums to produce proofs of concept that go nowhere. And the reason it keeps happening is not that the technology is immature. It is that organisations are making the same two mistakes in sequence: they target the wrong decisions first, and when they do target the right ones, they run programmes that are structurally set up to fail.

Volume and value are not the same thing

The instinct that drives most AI prioritisation in financial services is straightforward and almost entirely wrong. Teams look for decisions that are made at high volume and that can be automated, calculate the labour cost saving, and build a business case on that basis. The decisions that attract AI investment first are typically account administration, STP routing, low-value card authorisation, and other high-frequency, operationally repetitive processes. These are not bad decisions to automate. They are just not the decisions where AI generates the most material return.

The decisions with the largest value at stake combine two characteristics: high volume and severe financial consequence per error. Payment fraud, AML screening, and large credit decisions all sit in this category. They are processed at scale, they are made under time pressure with incomplete information, and a single decision error — a missed fraud signal, a false positive that absorbs analyst capacity, a credit model that cannot explain an adverse action — generates a consequence that is orders of magnitude larger than the cost of a misrouted payment or an incorrectly auto-populated form. These decisions are also, not coincidentally, where regulatory pressure in North America is most acute. FinCEN’s AML/CFT National Priorities, CFPB’s ECOA enforcement posture, and the liability architecture of FedNow real-time payments all concentrate on exactly this quadrant.
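One way to make the distinction between volume and value concrete is to rank candidate decisions by the expected annual cost of their errors rather than by transaction count. The sketch below is illustrative only: the decision names come from the text, but every volume, error rate, and cost figure is a made-up placeholder, and `annual_value_at_stake` is a hypothetical helper rather than an established metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    name: str
    annual_volume: int     # decisions made per year (hypothetical)
    error_rate: float      # share of decisions that go wrong (hypothetical)
    cost_per_error: float  # average dollar consequence of one error (hypothetical)


def annual_value_at_stake(d: Decision) -> float:
    """Expected yearly cost of errors: volume x error rate x cost per error."""
    return d.annual_volume * d.error_rate * d.cost_per_error


# Placeholder figures chosen only to illustrate the ranking logic.
PORTFOLIO = [
    Decision("STP routing", 50_000_000, 0.001, 5.0),
    Decision("Account administration", 10_000_000, 0.002, 10.0),
    Decision("Payment fraud", 20_000_000, 0.0005, 2_000.0),
    Decision("Large credit decisions", 200_000, 0.01, 50_000.0),
]


def rank_by_volume(portfolio):
    """Order decisions the way a labour-saving business case would."""
    return sorted(portfolio, key=lambda d: d.annual_volume, reverse=True)


def rank_by_value(portfolio):
    """Order decisions by expected annual cost of error."""
    return sorted(portfolio, key=annual_value_at_stake, reverse=True)


if __name__ == "__main__":
    for d in rank_by_value(PORTFOLIO):
        print(f"{d.name:<25} {annual_value_at_stake(d):>15,.0f}")
```

With these placeholder inputs the two rankings disagree: the volume ranking puts STP routing first, while the value ranking puts large credit decisions and payment fraud ahead of both operational processes. The real figures would have to come from an institution's own loss and volume data.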

The reason institutions sequence their AI investment backwards is partly organisational and partly structural. High-volume, lower-consequence decisions attract AI first because the efficiency case is legible, the business sponsor is easy to identify, and success can be measured in cost-per-transaction terms without requiring a view on fraud rates or credit losses. The harder decisions require cross-functional sponsorship, more complex outcome measurement, and data that is often fragmented across systems. They are worth substantially more. They are also substantially harder to get started on. In North America, however, regulatory urgency is now functioning as an accelerant that changes the sequencing calculus. Compliance deadlines imposed by FedNow’s real-time payment liability framework, FinCEN’s designation of fraud as an AML/CFT priority, and CFPB’s ECOA enforcement guidance mean that certain decisions are moving onto the AI roadmap regardless of where they would fall in a pure value-prioritisation exercise. Institutions that treat these regulatory deadlines as constraints rather than as accelerants will end up implementing under pressure, in the worst possible conditions for a successful programme.

The three reasons AI programmes fail — and none of them are technology problems

For institutions that correctly identify the high-value decisions and commit to targeting them, the failure patterns are consistent and well-documented. Across programmes that have been assessed in North American, APAC, EMEA, and ANZ markets, the root causes break into three categories in roughly the same proportions every time.

The first and most common is governance failure, which accounts for around 42% of failed initiatives in our analysis. The pattern is recognisable: technology was selected before the problem was defined, the proof of concept proved the model works, and then nobody was assigned to act on it. No defined decision, no named sponsor, no measurable outcome, and no production funding pathway established at the start. Vendor demonstrations get approved before a business sponsor has been identified. PoC success gets defined as model accuracy rather than business outcome. The model achieves its technical targets and then sits in a staging environment while the organisation debates who owns it and what production deployment would require. This is not a failure of the technology. It is a failure of programme design, and it is almost entirely preventable with the right governance structure established before a line of code is written.

The second category is sequencing failure, which accounts for around 31% of failed initiatives. Data was not ready. Nobody admitted it until the proof of value was already running. Teams proceeded with insufficient labelled outcome data, fragmented source systems, or features that could not be constructed from available inputs, and produced models too weak to deploy. The pattern is that data readiness gets assumed rather than assessed. Discovery of the real data posture gets deferred until commitment has already been made politically, contractually, and financially. At that point, the incentive to admit the problem has inverted. The correct diagnosis, that the programme should stop until the data infrastructure is in place, is the one that is hardest to deliver and therefore least often delivered.
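Assessing data readiness rather than assuming it can start with a handful of explicit checks run before any commitment is made. The following is a minimal sketch, assuming records arrive as plain dictionaries. The field names, the `min_labelled` and `max_missing_rate` thresholds, and the `assess_readiness` helper are all hypothetical and would need to be set per programme.

```python
from typing import Any, Dict, List, Sequence


def assess_readiness(
    records: Sequence[Dict[str, Any]],
    label_field: str,
    required_features: Sequence[str],
    min_labelled: int = 1000,        # assumed threshold, not a standard
    max_missing_rate: float = 0.05,  # assumed threshold, not a standard
) -> List[str]:
    """Return a list of blocking findings; an empty list means no blockers found."""
    findings: List[str] = []

    # Check 1: is there enough labelled outcome data to train and validate?
    labelled = sum(1 for r in records if r.get(label_field) is not None)
    if labelled < min_labelled:
        findings.append(f"only {labelled} labelled outcomes, need {min_labelled}")

    # Check 2: can each required feature actually be built from available inputs?
    total = len(records)
    for feature in required_features:
        missing = sum(1 for r in records if r.get(feature) is None)
        rate = missing / total if total else 1.0
        if rate > max_missing_rate:
            findings.append(f"feature '{feature}' missing in {rate:.0%} of records")

    return findings


if __name__ == "__main__":
    sample = [
        {"amount": 120.0, "merchant_id": "m1", "is_fraud": False},
        {"amount": None, "merchant_id": "m2", "is_fraud": None},
    ]
    for finding in assess_readiness(sample, "is_fraud", ["amount", "merchant_id"]):
        print("BLOCKER:", finding)
```

The design choice that matters here is that the function returns findings instead of a score: a non-empty list is a reason to pause, which is easier to act on before commitment than after.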

The third category is measurement failure, which accounts for the remaining 27%. The model achieved its technical targets. Nobody had defined what a good business outcome looked like in dollar terms, or assigned accountability for achieving it. Model performance metrics — AUC, F1 score, recall — become a proxy for business value, an economic baseline is never established at the outset, and when the programme is reviewed, there is no credible answer to the question of what it actually delivered. This failure mode is the most insidious because it is the hardest to detect from the outside. The model is in production. Reports are being generated. It is only when someone asks what the fraud loss rate was before and after deployment, or what the false positive rate did to analyst capacity, that the absence of measurement becomes visible.
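The before-and-after questions above imply a baseline expressed in dollars rather than in AUC. As a rough sketch, with every figure a hypothetical placeholder and `PeriodStats`, `fraud_loss_rate_bps`, `false_positive_review_cost`, and `business_delta` invented purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodStats:
    payment_value: float   # total dollar value processed in the period
    fraud_losses: float    # dollars lost to fraud in the period
    false_positives: int   # alerts reviewed that turned out not to be fraud


def fraud_loss_rate_bps(p: PeriodStats) -> float:
    """Fraud losses per 10,000 dollars processed (basis points)."""
    return p.fraud_losses / p.payment_value * 10_000


def false_positive_review_cost(p: PeriodStats, cost_per_review: float) -> float:
    """Analyst cost spent reviewing alerts that were not fraud."""
    return p.false_positives * cost_per_review


def business_delta(before: PeriodStats, after: PeriodStats, cost_per_review: float) -> dict:
    """Dollar change between two periods, with loss avoided scaled to 'after' volume."""
    rate_drop_bps = fraud_loss_rate_bps(before) - fraud_loss_rate_bps(after)
    loss_avoided = rate_drop_bps / 10_000 * after.payment_value
    review_saved = (
        false_positive_review_cost(before, cost_per_review)
        - false_positive_review_cost(after, cost_per_review)
    )
    return {"loss_avoided": loss_avoided, "review_cost_saved": review_saved}


if __name__ == "__main__":
    # Placeholder figures only; the baseline must be captured before deployment.
    baseline = PeriodStats(payment_value=1e9, fraud_losses=800_000, false_positives=36_000)
    post = PeriodStats(payment_value=1e9, fraud_losses=500_000, false_positives=20_000)
    print(business_delta(baseline, post, cost_per_review=15.0))
```

A simple before-and-after comparison like this does not isolate the model's effect from other changes in the period, so it is a starting point for accountability, not a full attribution method. The point is that the `baseline` record has to exist before deployment for the comparison to be possible at all.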

| Failure mode | Share of failed initiatives | Pattern |
| --- | --- | --- |
| Governance failure | ~42% | Technology selected before problem defined. No named sponsor. No production funding pathway at outset. PoC success defined as model accuracy, not business outcome. |
| Sequencing failure | ~31% | Data readiness assumed, not assessed. Discovery of real data posture deferred until commitment already made. Models too weak to deploy. |
| Measurement failure | ~27% | No economic baseline established. Model performance metrics used as proxy for business value. No accountability assigned for outcome delivery. |

Source: North America Banking Practice analysis, consistent with findings across APAC, EMEA, and ANZ markets.

What is notable about this breakdown is that none of the three failure modes are technology problems. They are programme design problems, organisational problems, and measurement problems. A more accurate model does not fix a programme with no named sponsor. A better architecture does not compensate for data that was not ready when the programme started. And a higher AUC score does not constitute a business outcome. The implication is direct: investment in model sophistication beyond a threshold that makes the model deployable generates diminishing returns if the governance, sequencing, and measurement conditions are not in place. The institutions that will compound advantage over the next several years are not necessarily the ones deploying the most technically advanced models. They are the ones that have built the programme infrastructure to translate model quality into production outcomes at speed — and that have established the data and feedback loops to improve continuously from every decision the model sees.

What this means for the sequencing question

The decision portfolio analysis and the failure diagnostic are related arguments, not sequential ones. Identifying the right decisions to target is necessary but not sufficient. Most institutions that fail do not fail because they targeted the wrong quadrant — they fail because they ran a programme that was structurally set up not to deliver, regardless of what decision it was pointed at. The implication for North American banks is that the question of where to invest in AI cannot be separated from the question of how. Acting on one without the other produces either well-governed programmes pointed at low-value decisions, or high-value decisions targeted by programmes that will not reach production.

The third and final piece of this series addresses what a programme that avoids both mistakes looks like in practice: the governance architecture, the regulatory posture, and the strategic choice that determines whether the advantage compounds or disappears.

Part 2 of 3.

Sources

Gartner. AI Project Failure Rates. Multiple editions, 2023–2025.
FinTellect AI. Why 80% of AI Projects in Finance Fail. 2024.
RAND Corporation. AI Project Success and Failure Rates. Referenced in industry literature.
North America Banking Practice. Decision Portfolio and Failure Mode Analysis. Internal analysis, consistent across North America, APAC, EMEA, and ANZ markets.