The card authorization decision carries a cost on both sides of a wrong outcome. A false decline loses the transaction, introduces friction, and, in a market where customers carry multiple payment options, can redirect spend to a competitor card long after the declined transaction, at a cost well above that transaction's value. A missed fraudulent transaction produces a direct loss that compounds with every similar transaction that passes through the same gap in the bank’s defences.

Banks track both with precision. The authorization rate, false positive rate, fraud loss rate, and dispute volume are well-instrumented metrics in any mature payments operation. The more productive question, and the one that most often surfaces in the conversations I have with payments and fraud leadership, is not what those numbers are. It is what is causing the remaining gap: specifically, whether performance could be improved by better model quality, wider coverage, or a different positioning of the inference layer in the authorization flow. The three diagnoses are different, and each requires a different investment.

Where the economic friction sits in the authorization journey

For card-present transactions, the authorization decision is effectively binary. A transaction request arrives, the bank evaluates it, and an approve or decline is returned. For card-not-present transactions, the picture is more nuanced. Risk-based authentication frameworks like 3DS2 allow for a frictionless flow on lower-risk transactions and a challenge step on higher-risk ones, giving issuers a richer set of responses than a simple approve or decline. In both contexts, the economics of getting the decision wrong are the same.

False declines carry a cost that is frequently underweighted because the direct revenue loss is only part of the picture. Industry benchmarks put a 0.1% reduction in the false positive rate on a portfolio of 10 million monthly transactions at between $1 million and $3 million annually when transaction value, servicing cost, and the attrition contribution are included. The attrition component is the portion most commonly absent from the fraud P&L, because the causal link between a specific decline and a subsequent shift in primary card usage is real but not always visible in the data available to the fraud team.
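As a rough back-of-envelope check on that benchmark, the sketch below derives the value it implies per avoided false decline. It assumes the 0.1% is measured in percentage points of all monthly transactions, which the benchmark does not state explicitly; the function name is illustrative and not taken from any library.

```python
def implied_value_per_avoided_decline(
    monthly_transactions: int,
    fp_rate_reduction: float,
    annual_value_usd: float,
) -> float:
    """Annual value divided by the number of false declines avoided per year.

    Assumption: fp_rate_reduction is a fraction of ALL transactions
    (0.001 = 0.1 percentage points), not of declined transactions only.
    """
    avoided_per_year = monthly_transactions * fp_rate_reduction * 12
    return annual_value_usd / avoided_per_year


# Figures quoted in the text: 10 million monthly transactions,
# a 0.1% reduction, and a $1 million to $3 million annual range.
low = implied_value_per_avoided_decline(10_000_000, 0.001, 1_000_000)
high = implied_value_per_avoided_decline(10_000_000, 0.001, 3_000_000)
print(f"${low:.2f} to ${high:.2f} per avoided false decline")
```

Under that reading, the benchmark implies roughly $8 to $25 per avoided false decline, a figure that is plausible only when servicing cost and attrition are counted alongside the transaction value itself.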

Fraud losses average between 5 and 8 basis points of transaction volume at large retail banks. On a portfolio with $10 billion in annual transaction volume, a single basis point improvement is worth $1 million annually. Given the transaction volumes that most large banks process, even a 1 to 3% improvement in detection accuracy produces material P&L impact. The question is whether the current AI architecture allows that improvement to be realised or whether the architecture itself is the constraint.
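The basis point arithmetic can be checked in a few lines. This is a minimal sketch, assuming the $10 billion figure is annual transaction volume; the function name is illustrative.

```python
BPS_PER_UNIT = 10_000  # one basis point = 1/10,000 = 0.01%


def fraud_loss_usd(annual_volume_usd: float, loss_rate_bps: float) -> float:
    """Annual fraud loss for a given volume and loss rate in basis points."""
    return annual_volume_usd * loss_rate_bps / BPS_PER_UNIT


volume = 10_000_000_000  # $10 billion, assumed to be annual volume

one_bp = fraud_loss_usd(volume, 1)     # value of a 1 bp improvement
loss_low = fraud_loss_usd(volume, 5)   # loss at 5 bps
loss_high = fraud_loss_usd(volume, 8)  # loss at 8 bps

print(f"1 bp = ${one_bp:,.0f} per year")
print(f"Loss at 5 to 8 bps = ${loss_low:,.0f} to ${loss_high:,.0f} per year")
```

At the quoted 5 to 8 basis points, the same portfolio carries $5 million to $8 million in annual fraud loss, so each basis point recovered is a sizeable share of the total.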

Operational cost is the third component. A fraud model without sufficient behavioural precision generates false positive volume that drives analyst queues, dispute handling, and outbound customer contact. The cost of working a false positive is largely the same as working a genuine fraud case. Precision improvement that reduces false positive rates reduces this cost simultaneously, which often makes the full economic case for model investment more compelling than the fraud P&L line alone would suggest.

The decision that is harder than it looks

The authorization model needs to distinguish, in a very short window, between a legitimate transaction from a cardholder behaving normally and a fraudulent transaction from someone using obtained card credentials. The challenge is that both present the same card details, often the same device, and increasingly similar behavioural characteristics as social engineering and account takeover techniques become more sophisticated.

What differentiates a strong authorization model is the depth and recency of the behavioural baseline it draws on at the moment of scoring. A model that can reference what this cardholder has historically purchased, where, at what time, in what amounts, and how that pattern has shifted over recent periods can identify the transaction that does not fit. A model working from a shallow or infrequently refreshed baseline cannot. The quality of the behavioural baseline at the point of scoring is directly influenced by where the model runs and what data it can access within the time available.

This is where off-platform inference introduces a business problem rather than just a technology one. Fraud scoring systems operating via cloud API typically add 100 to 120 milliseconds to the authorization response, compared to 10 milliseconds or less for on-premises scoring. Mastercard publishes these figures directly for its own Transaction Fraud Monitoring solution. The practical consequence is that some banks using off-platform solutions score a subset of their transaction volume rather than all of it, applying rules to the remainder. IBM’s documentation on co-located inference confirms the same dynamic: large institutions with off-platform solutions were only able to screen a small subset of their transactions.

Most mature fraud architectures are not single-model solutions. They operate across multiple defence planes applied in sequence. The first plane is deterministic rules, which are fast, low on resource consumption, and resolve the clear cases without invoking anything more expensive. Where rules do not resolve the transaction, the next plane applies adaptive behavioural models that assess patterns specific to the individual cardholder and transaction context. Beyond that, temporal models evaluate patterns over time and network-based models assess entity relationships across accounts, devices, and counterparties. Each successive plane is more computationally expensive and invoked only when the plane before it has not produced a confident resolution.
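The escalation logic described above can be sketched as a simple cascade. This is an illustrative sketch only: the plane functions, field names, and thresholds are all hypothetical placeholders rather than any bank's or vendor's actual rules. Each plane returns a decision when it is confident and `None` when it wants to escalate.

```python
from typing import Callable, List, Optional, Tuple

Transaction = dict
# A plane returns "approve" or "decline" when confident, None to escalate.
Plane = Callable[[Transaction], Optional[str]]


def rules_plane(txn: Transaction) -> Optional[str]:
    # Deterministic rules: cheap, resolve only the clear cases.
    if txn["card_blocked"]:
        return "decline"
    return None


def behavioural_plane(txn: Transaction) -> Optional[str]:
    # Hypothetical thresholds comparing the amount to the cardholder's norm.
    ratio = txn["amount"] / txn["typical_amount"]
    if ratio <= 1.5:
        return "approve"
    if ratio >= 20:
        return "decline"
    return None


def temporal_plane(txn: Transaction) -> Optional[str]:
    # Hypothetical velocity check over a recent time window.
    if txn["txns_last_hour"] >= 10:
        return "decline"
    return None


def network_plane(txn: Transaction) -> Optional[str]:
    # Final plane: always produces a decision.
    return "decline" if txn["device_linked_to_flagged_accounts"] else "approve"


PLANES: List[Tuple[str, Plane]] = [
    ("rules", rules_plane),
    ("behavioural", behavioural_plane),
    ("temporal", temporal_plane),
    ("network", network_plane),
]


def authorize(txn: Transaction) -> Tuple[str, str]:
    """Run planes in order; return (decision, name of the resolving plane)."""
    for name, plane in PLANES:
        decision = plane(txn)
        if decision is not None:
            return decision, name
    raise RuntimeError("the final plane must always return a decision")


routine = {"card_blocked": False, "amount": 20, "typical_amount": 25,
           "txns_last_hour": 1, "device_linked_to_flagged_accounts": False}
print(authorize(routine))
```

A routine purchase resolves at the behavioural plane without ever invoking the temporal or network models, which is the point of the design: the expensive planes are paid for only on the ambiguous cases.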

This sequential escalation is the right architecture. The performance question is where each plane runs. Off-platform execution means each escalation introduces additional network latency, and the cumulative effect across multiple planes can consume a significant portion of the available response window. On-platform execution means all planes run in sequence within the transaction environment, with each escalation adding milliseconds rather than round-trip network time. The IBM Z platform, with transaction managers such as CICS and IMS handling the authorization flow and IBM Machine Learning for z/OS providing inference capability via the Telum on-chip accelerator, allows all planes to run within the window that off-platform architectures cannot reliably meet at full transaction volume.
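The cumulative effect can be illustrated with simple arithmetic. The sketch below reuses the per-call figures quoted earlier (100 milliseconds as the lower bound for cloud API scoring, 10 milliseconds as the upper bound for on-premises scoring) and applies them to every plane, which is a simplifying assumption; the 200 millisecond response window is a hypothetical budget chosen for illustration, not a figure from any scheme or vendor.

```python
def cumulative_latency_ms(planes_invoked: int, per_plane_ms: float) -> float:
    """Total added latency when each invoked plane costs per_plane_ms."""
    return planes_invoked * per_plane_ms


def fits_window(planes_invoked: int, per_plane_ms: float, window_ms: float) -> bool:
    """True if the escalation path completes within the response window."""
    return cumulative_latency_ms(planes_invoked, per_plane_ms) <= window_ms


WINDOW_MS = 200        # hypothetical authorization budget (assumption)
OFF_PLATFORM_MS = 100  # lower bound of the quoted 100 to 120 ms cloud figure
ON_PLATFORM_MS = 10    # upper bound of the quoted on-premises figure

for planes in range(1, 5):
    off = cumulative_latency_ms(planes, OFF_PLATFORM_MS)
    on = cumulative_latency_ms(planes, ON_PLATFORM_MS)
    print(
        f"{planes} plane(s): off-platform {off:.0f} ms "
        f"(fits: {fits_window(planes, OFF_PLATFORM_MS, WINDOW_MS)}), "
        f"on-platform {on:.0f} ms "
        f"(fits: {fits_window(planes, ON_PLATFORM_MS, WINDOW_MS)})"
    )
```

Under these assumptions, a third off-platform escalation already exceeds the budget, while all four planes on-platform use only a fraction of it. The exact crossover depends on the real window and per-plane costs, which each institution would need to measure.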

What good looks like

A mature AI-enabled card authorization programme produces measurable improvement on false positive rate, fraud loss rate, and authorization rate simultaneously. The economic target for each of these should be set before any model is selected. A 1% improvement in false positive rate on a high-volume portfolio delivers material revenue recovery. A 1 basis point improvement in fraud detection on a $10 billion transaction book delivers $1 million annually. Setting those targets in advance, establishing the current baseline against which improvement will be measured, and assigning accountability for the outcome to a named business owner are the governance conditions that separate programmes that deliver from those that merely demonstrate.

The technology dimension

For most large retail banks, authorization runs on transaction management infrastructure on IBM Z, processing transactions through CICS, IMS, or equivalent transaction managers. The behavioural and account data that informs the authorization decision lives on the same platform. For these institutions, running AI inference on IBM Z via IBM Machine Learning for z/OS eliminates the off-platform latency entirely. IBM benchmarks demonstrate up to 282,000 CICS credit card transactions per second with in-transaction fraud scoring at 4 milliseconds total response time, using the Telum on-chip AI accelerator. Models are developed in Python using standard frameworks, exported to ONNX format, and deployed to z/OS without data leaving the platform. For institutions not on IBM Z, the same business case applies with different infrastructure choices. The principle is the same: inference positioned close to the transaction, with access to the full behavioural data, within the window where the score can influence the decision.

Starting the conversation

The productive entry point for this conversation is diagnostic. The question is not what your fraud loss rate is, which is already known, but what proportion of the gap between current and achievable performance you attribute to model quality, coverage, or inference positioning. Each answer leads to a different programme. Model quality improvement produces incremental gains within the currently scored population. Coverage expansion addresses gaps in the scored transaction set, with the economics varying significantly by segment. Inference positioning determines what both of the others can achieve within the time constraints of the authorization flow.

Most of the productive engagement I have seen begins with that diagnostic, not with a vendor demonstration. The bank already has the P&L. The value of the conversation is identifying which part of the gap is most efficiently closed first.