Most bank onboarding processes are designed to answer two questions about an applicant: are they who they say they are, and are they a credit or compliance risk the bank should decline. Document verification answers the first question by checking that the identity document presented is genuine and matches the applicant. Credit bureau checks answer parts of the second by providing a credit history. KYC screening checks the identity against sanctions and PEP lists. This architecture is appropriate for the fraud environment it was designed for, where the primary risk was a real person presenting a stolen or forged identity.
Synthetic identity fraud operates differently. The synthetic identity is not stolen. It is built. The fraudster combines real identifying information (often a Social Security Number belonging to a child, a deceased person, or someone with little credit activity) with a fabricated name, address, and date of birth. The result is an identity that belongs to no real person but can pass document verification and, over time, build a genuine credit history. The fraud is patient: synthetic identities are typically aged for months or years through small-scale credit activity before being used for large-scale financial gain.
Why standard checks do not catch it
Document verification fails against synthetic identity because the fraudster is not presenting a forged document for a stolen identity. In many cases they are presenting a genuine document issued against the fabricated identity, or a convincingly fabricated document that passes the verification checks in use. The document check answers the question of whether the document is genuine. It does not answer the question of whether the identity is real.
Bureau checks compound the problem rather than solving it. A synthetic identity that has been aged through authorised user additions on existing accounts, secured credit cards, and small retail credit lines may have a more favourable bureau score than many genuine applicants. The bureau record is accurate: it reflects the credit activity conducted under that identity. It simply does not reflect a real person’s financial history. From the bureau’s perspective, the identity is indistinguishable from a genuine thin-file applicant who has been actively building credit.
Manual review does not catch the problem systematically either. An application that presents clean documents, a favourable bureau record, and a complete set of application details is unlikely to be flagged for enhanced scrutiny by an underwriter or KYC analyst, whose attention is directed toward incomplete applications or obvious inconsistencies. The synthetic identity application is typically more complete and more internally consistent than a genuine one, because it has been prepared specifically to avoid raising questions.
The economics of the fraud
Synthetic identity fraud is not a high-frequency, low-value crime. It is designed for low-frequency, high-value bust-out events. A synthetic identity built over 18 months, acquiring a credit card, a personal loan, and a retail finance product across multiple institutions, can accumulate tens of thousands of pounds, dollars, or euros in credit exposure. At the bust-out, the fraudster draws down every available credit line to its limit, converts the proceeds, and the identity stops operating. The bank’s first indication that anything is wrong is a missed payment on a previously performing account.
The scale multiplies when synthetic identities operate as rings. A single successful synthetic identity establishes a playbook that is replicated across tens or hundreds of similar identities, often using shared infrastructure such as device networks, address clusters, and phone number pools. That infrastructure leaves traces in the application data that are invisible when applications are reviewed individually but clearly visible when they are connected. A ring of 50 synthetic identities, each with a modest credit exposure, produces a loss event that is significant both in size and in operational disruption.
The cost is not only in credit losses. Synthetic identity fraud raises the portfolio’s default rate in a way that standard credit performance analytics record as credit risk rather than fraud. Portfolios with significant synthetic identity penetration will exhibit elevated first-payment default rates, higher-than-expected charge-off rates on recently originated accounts, and anomalous patterns in the demographics of defaulting accounts. These patterns are caused by the fraud but are often not recognised as such, so the bank tends to respond by tightening credit policy rather than improving onboarding fraud detection, which addresses the wrong problem.
What AI-based detection looks like
The signals that identify synthetic identity are behavioural and relational rather than documentary, which is why the existing verification architecture does not capture them.
Behavioural biometrics captured during the application process provide signals that the legitimate applicant never notices but that are highly informative to the model. Form-field completion speed, typing rhythm, mouse or touch movement, whether fields are completed in sequence or in an unusual order, and whether data is typed directly or pasted from the clipboard all help distinguish genuine human applicants from applications completed programmatically or by fraud ring operators following a template. A genuine applicant pauses, corrects entries, and navigates the form in a way that varies from person to person yet remains recognisably human. Applications submitted by fraud rings often show a uniformity of pace and pattern that would be statistically improbable across genuine, independent applicants.
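To make the behavioural signals concrete, the sketch below derives two of them from raw keystroke timing: how regular a single applicant's typing rhythm is, and how similar completion times are across a group of applications. It is a minimal illustration under stated assumptions: the input shape (`key_times_ms`, `paste_events`), the function names, and the 0.15 threshold are all hypothetical, and a production model would learn from many more signals rather than apply a fixed rule.

```python
from statistics import mean, pstdev

# Hypothetical input shape: each application carries keystroke timestamps in
# milliseconds ("key_times_ms") and a count of clipboard pastes ("paste_events").

def gaps(times_ms):
    """Intervals between consecutive keystrokes."""
    return [b - a for a, b in zip(times_ms, times_ms[1:])]

def cv(values):
    """Coefficient of variation (population std / mean), or None if undefined."""
    if len(values) < 2:
        return None
    m = mean(values)
    return pstdev(values) / m if m else None

def features(app):
    t = app["key_times_ms"]
    return {
        "rhythm_cv": cv(gaps(t)),                 # low value = metronome-like typing
        "pasted": app["paste_events"] > 0,        # data entered from the clipboard
        "duration_ms": t[-1] - t[0] if t else 0,  # total time spent typing
    }

def reasons(app, rhythm_cv_max=0.15):
    """Per-application flags; the threshold is a placeholder, not calibrated."""
    f = features(app)
    out = []
    if f["rhythm_cv"] is not None and f["rhythm_cv"] < rhythm_cv_max:
        out.append("regular_rhythm")
    if f["pasted"]:
        out.append("clipboard_paste")
    return out

def batch_uniformity(apps):
    """CV of completion time across a group of applications. Genuine applicants
    vary widely; a very low value suggests a shared script or template."""
    return cv([features(a)["duration_ms"] for a in apps])
```

The coefficient of variation is used because it is scale-free: fast and slow typists both show natural irregularity, while input replayed at fixed intervals scores near zero regardless of speed.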
Device and network signals extend the behavioural picture. A device that has submitted multiple applications in a short period, that has been used to apply at multiple institutions, or that connects through infrastructure commonly associated with fraud operations provides corroborating evidence. Consistency between device characteristics, IP address, and the stated applicant details adds further signal: for example, whether the IP location matches the stated address, and whether the attributes the device reports match those the applicant states.
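A minimal sketch of two device and network checks follows: a per-device velocity counter over a sliding window, and a comparison of stated details against what the network and device report. The class and function names, the 24-hour window, and the assumption that IP country and device time zone arrive from upstream enrichment are all illustrative; cross-institution device history would require a data-sharing service that is not modelled here.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

class DeviceVelocity:
    """Counts applications per device identifier inside a sliding time window.
    Assumes events arrive in timestamp order; the 24-hour window is a placeholder."""

    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self.events = defaultdict(deque)  # device_id -> timestamps still in window

    def record(self, device_id, ts):
        q = self.events[device_id]
        q.append(ts)
        while ts - q[0] > self.window:  # drop events older than the window
            q.popleft()
        return len(q)

def geo_mismatches(stated_country, ip_country, device_tz, plausible_tzs):
    """Compare stated details with what the device and network report.
    ip_country is assumed to come from an upstream IP-geolocation lookup."""
    flags = []
    if ip_country != stated_country:
        flags.append("ip_country_mismatch")
    if device_tz not in plausible_tzs:
        flags.append("device_timezone_mismatch")
    return flags

if __name__ == "__main__":
    v = DeviceVelocity()
    start = datetime(2024, 1, 1, 9, 0)
    for h in range(3):
        print(v.record("dev-A", start + timedelta(hours=h)))  # 1, 2, 3
```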
Network analysis connects each application to others. An address, phone number, email domain, or device identifier shared between one application and another provides a relational signal that is invisible in individual application review. A model that maps the application against the existing population of applications and accounts (within the institution and, where data-sharing arrangements exist, across institutions) identifies clusters of connected applications that the individual review process cannot see.
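Connecting applications through shared identifiers is, at its simplest, a connected-components problem. The sketch below uses a union-find structure to group records (applications or existing accounts) that share any address, phone number, device identifier, or email domain. The record fields and the `ignore` parameter are hypothetical; `ignore` exists because very common values, such as a large public email provider's domain, would otherwise link unrelated applicants into one giant cluster.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set structure over arbitrary hashable ids."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

LINK_FIELDS = ("address", "phone", "device_id", "email_domain")

def clusters(records, link_fields=LINK_FIELDS, ignore=frozenset()):
    """Return groups (size > 1) of record ids connected by any shared value.
    `ignore` holds (field, value) pairs too common to be informative."""
    uf = UnionFind()
    first_seen = {}  # (field, value) -> first record id carrying that value
    for rec in records:
        uf.find(rec["id"])  # register every record, even unlinked ones
        for field in link_fields:
            value = rec.get(field)
            if not value or (field, value) in ignore:
                continue
            key = (field, value)
            if key in first_seen:
                uf.union(first_seen[key], rec["id"])
            else:
                first_seen[key] = rec["id"]
    groups = defaultdict(set)
    for rec in records:
        groups[uf.find(rec["id"])].add(rec["id"])
    return [g for g in groups.values() if len(g) > 1]
```

Union-find keeps merges cheap, which suits incremental scoring where each new application is linked into an already-built graph. A fuller version could also weight links by how rare the shared value is, rather than treating every match equally.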
The customer experience constraint
The main risk in onboarding fraud detection is adding friction for legitimate applicants. Onboarding completion rate and fraud-at-origination rate are in tension: tightening verification reduces fraud but increases abandonment by legitimate applicants who find the process too demanding. The value of behavioural and relational signals is that they do not impose additional steps on the legitimate applicant. The biometric data is collected passively. The device and network signals are captured without customer action. The network analysis runs against the application data already submitted.
Additional verification steps — enhanced document checks, video verification, proof of address documents — are reserved for applications where the model has identified signals of synthetic identity risk, rather than applied uniformly to all applicants. This risk-based approach concentrates friction where it is justified while maintaining a smooth experience for the vast majority of genuine applicants.
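The risk-based approach can be sketched as a small routing function: applicants below a threshold continue the standard journey untouched, and only higher scores trigger extra checks chosen from the signals that fired. The thresholds, action names, and signal-to-check mapping are hypothetical placeholders for a bank's own risk-appetite configuration, not recommended values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskAppetite:
    """Hypothetical score thresholds; a real bank would calibrate its own."""
    step_up_at: float = 0.40
    refer_at: float = 0.70
    decline_at: float = 0.90

# Hypothetical mapping from the signal that fired to the extra check it justifies.
STEP_UP_FOR_SIGNAL = {
    "device_velocity": "video_verification",
    "address_cluster": "proof_of_address",
    "document_anomaly": "enhanced_document_check",
}

def route(score, signals, appetite=RiskAppetite()):
    """Return (action, extra_checks). Low-risk applicants get no added friction."""
    if score >= appetite.decline_at:
        return "decline", []
    if score >= appetite.refer_at:
        return "analyst_referral", []
    if score >= appetite.step_up_at:
        checks = sorted({STEP_UP_FOR_SIGNAL[s] for s in signals if s in STEP_UP_FOR_SIGNAL})
        return "step_up", checks or ["enhanced_document_check"]  # default check
    return "proceed", []
```

Keeping the thresholds in a separate configuration object means the bank can tighten or relax its appetite without touching the routing logic.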
The technology dimension
Onboarding systems at most large retail banks connect to core banking systems on IBM Z for identity deduplication, existing customer checks, and account creation. The behavioural and relational signals that identify synthetic identity can be processed within the onboarding workflow, with the network analysis running against the existing customer and account data held in Db2 on Z. A model deployed on IBM Z via IBM Machine Learning for z/OS has access to the full existing customer population for network analysis. That lets it compare the incoming application against the bank’s existing account base in real time, without extracting data, and within the response-time requirements of a digital onboarding journey. The risk score and any flagged signals are returned to the onboarding workflow before the application decision is made, so the application can be automatically declined, routed to enhanced verification, or referred to an analyst, depending on the score and the bank’s configured risk appetite.
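The network-analysis lookup against existing customers can be illustrated as a single SQL query that counts how many existing customers share each identifier with the incoming application. This sketch uses Python's built-in `sqlite3` purely as a stand-in so it runs anywhere; it does not show Db2 on Z or IBM Machine Learning for z/OS, and the `customer_identifier` table and its columns are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in for Db2 so the sketch is runnable anywhere.
# Table and column names are hypothetical, not a real core-banking schema.
SCHEMA = """
CREATE TABLE customer_identifier (
    customer_id TEXT NOT NULL,
    id_type     TEXT NOT NULL,  -- 'phone', 'address' or 'device'
    id_value    TEXT NOT NULL
);
CREATE INDEX ix_identifier ON customer_identifier (id_type, id_value);
"""

# Count distinct existing customers sharing each identifier with the application.
LINK_QUERY = """
SELECT id_type, COUNT(DISTINCT customer_id)
FROM customer_identifier
WHERE (id_type = 'phone'   AND id_value = :phone)
   OR (id_type = 'address' AND id_value = :address)
   OR (id_type = 'device'  AND id_value = :device)
GROUP BY id_type
"""

def shared_identifier_counts(conn, application):
    """Return e.g. {'phone': 2} for identifiers already held by existing customers.
    `application` must supply the keys 'phone', 'address' and 'device'."""
    params = {k: application[k] for k in ("phone", "address", "device")}
    return dict(conn.execute(LINK_QUERY, params).fetchall())
```

The composite index on (id_type, id_value) keeps each lookup to a few index probes rather than a table scan; the same design concern applies whatever database holds the customer data.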
What success looks like
Three metrics matter: the fraud-at-origination rate (the proportion of opened accounts that subsequently exhibit synthetic identity fraud characteristics), the onboarding completion rate for genuine applicants, and the detection rate within the known synthetic identity fraud population. The programme should establish a baseline fraud-at-origination rate from a retrospective analysis of the portfolio before deployment, because the existing rate is typically underestimated: synthetic identity fraud appears in credit performance data as credit risk rather than fraud. The improvement in detection rate is measured against that baseline, and the reduction in false positives (genuine applicants incorrectly flagged and subjected to additional friction) is measured against the onboarding completion baseline. In a well-calibrated programme both improve together, because better signal quality reduces fraud penetration and unnecessary friction at the same time.
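The metric definitions above can be written down precisely so that the baseline and post-deployment cohorts are measured the same way. The `Outcome` record below is a hypothetical shape for one application's labelled result; the `synthetic` label is assumed to come from the retrospective portfolio analysis the text describes.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    flagged: bool    # model routed the application to extra checks or decline
    synthetic: bool  # later confirmed synthetic (retrospective label)
    opened: bool     # an account was opened
    completed: bool  # applicant finished the onboarding journey

def metrics(outcomes):
    opened = [o for o in outcomes if o.opened]
    synthetic = [o for o in outcomes if o.synthetic]
    genuine = [o for o in outcomes if not o.synthetic]

    def rate(num, den):
        return num / den if den else 0.0

    return {
        # share of opened accounts that turn out to be synthetic
        "fraud_at_origination": rate(sum(o.synthetic for o in opened), len(opened)),
        # share of known synthetic applications the model flagged
        "detection_rate": rate(sum(o.flagged for o in synthetic), len(synthetic)),
        # share of genuine applicants who were flagged (unnecessary friction)
        "false_positive_rate": rate(sum(o.flagged for o in genuine), len(genuine)),
        # share of genuine applicants who completed onboarding
        "genuine_completion_rate": rate(sum(o.completed for o in genuine), len(genuine)),
    }
```

Running `metrics` on a pre-deployment cohort gives the baseline; running it on a post-deployment cohort and comparing the two shows whether detection rose and false positives fell together.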