The naming-normalization bug that destroyed three enterprise cutovers
TSB Bank UK, SAAQ Quebec, Revlon. Same bug class. How our Naming Library catches it before the reconciliation report ships.
Three of the most expensive enterprise migration failures of the last decade have something in common. TSB Bank UK in 2018 (Lloyds → Sabadell). SAAQ Quebec in 2023 (legacy → SAAQclic; projected cost of at least C$1.1B, per the province's 2025 auditor general report). Revlon's SAP rollout in 2018. Different industries, different geographies, different vendors. Same bug class.
What the bug looks like
Pick any customer or entity record. It has a name field. The source system stored it one way; the target system stores it another way. Both are correct. Both are how that field is supposed to look in their respective systems.
Now imagine the migration script copies the source value into the target as-is. The target identifies an entity by (id, normalized_name), and the migrated row now carries a name that is not in the target's canonical form. When the same customer later arrives in the canonical form, the lookup finds no match, the unique constraint sees no conflict, and the target creates a new entity. Customer A in the source is now customer A and customer A' in the target. Their balances split. Their transaction history forks. Their addresses diverge.
The thing dev tests don't catch
Here's why this passes UAT and breaks in production: in an isolated test dataset, the source-format string and the target-format string never appear at the same time. The migration script copies a thousand source-format names into the target. The target had no "ADEBAYO ENT." before the load and has exactly one after. No collision.
Then in production, the first actual customer transaction happens. The transaction system, upstream of migration, generates the customer's name in the target format ("Adebayo Enterprises"). It hits the target with this new format. Target checks: do I already have this customer under TIN 14729583012? It checks by (TIN, normalized_name). The migrated record is under (TIN, "ADEBAYO ENT."). The transaction has (TIN, "Adebayo Enterprises"). Constraint says: different. Target creates a new entity. That's when the bug surfaces.
Why production reveals the bug, not UAT: the bug only manifests when the source-format string and the target-format string appear in the database at the same time. UAT data has only one format. Production has both, because production is connected to upstream systems that generate the target format.
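The failure mode is easy to reproduce in miniature. The sketch below is only an illustration, not any of the affected systems: it uses an in-memory SQLite table, and the table name, the entity_id column, and the get_or_create helper are assumptions made up for the example. Only the (TIN, normalized_name) key and the two name strings come from the scenario above.

```python
import sqlite3


def get_or_create(conn, tin, name):
    """Naive matching: (tin, name) is treated as the identity of an entity."""
    row = conn.execute(
        "SELECT entity_id FROM entity WHERE tin = ? AND normalized_name = ?",
        (tin, name),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO entity (tin, normalized_name) VALUES (?, ?)", (tin, name)
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entity (
           entity_id INTEGER PRIMARY KEY,
           tin TEXT NOT NULL,
           normalized_name TEXT NOT NULL,
           UNIQUE (tin, normalized_name)
       )"""
)

# Migration copies the source-format name as-is.
# UAT stops here: one row, no collision, everything looks fine.
migrated_id = get_or_create(conn, "14729583012", "ADEBAYO ENT.")

# Production: an upstream system sends the same taxpayer in the target format.
# The lookup misses, the unique constraint is satisfied, a second row appears.
txn_id = get_or_create(conn, "14729583012", "Adebayo Enterprises")

rows_for_tin = conn.execute(
    "SELECT COUNT(*) FROM entity WHERE tin = ?", ("14729583012",)
).fetchone()[0]
print(rows_for_tin)  # 2: one taxpayer, two entities
```

Note that the unique constraint never fires. It is doing exactly what it was told to do; the two strings really are different.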
How Migratio's Naming Library handles it
We treat name normalization as a first-class agent with a versioned rule library, not a one-off SQL function. Three layers:
Layer 1: Tokenization
Break the source name into tokens: legal-entity prefix (BANK), specific name (OF AMERICA), entity-type suffix (NIG. LTD), incorporation modifiers. Each token gets a type tag and a canonical form. NIG. tokens become Nigeria; LTD tokens become Limited; case folds; whitespace collapses.
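A minimal sketch of what a tokenization pass like this could look like. It is not the Naming Library's code: the CANONICAL rule table, the Token type, and the tag names are hypothetical, and a real rule library would be versioned and far larger.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical rule table: token key -> (type tag, canonical form).
CANONICAL = {
    "BANK": ("legal_entity_prefix", "Bank"),
    "NIG": ("entity_type_suffix", "Nigeria"),
    "LTD": ("entity_type_suffix", "Limited"),
}


@dataclass(frozen=True)
class Token:
    raw: str        # the token as it appeared in the source
    kind: str       # type tag
    canonical: str  # canonical form used for comparison


def tokenize(name: str) -> List[Token]:
    tokens = []
    # str.split() with no arguments collapses runs of whitespace.
    for raw in name.split():
        key = raw.strip(".,").upper()  # drop trailing punctuation, fold case
        if not key:
            continue
        # Unknown tokens are treated as part of the specific name.
        kind, canonical = CANONICAL.get(key, ("specific_name", key.capitalize()))
        tokens.append(Token(raw, kind, canonical))
    return tokens


def canonical_form(name: str) -> str:
    return " ".join(t.canonical for t in tokenize(name))


print(canonical_form("ADEBAYO   NIG.  LTD"))  # Adebayo Nigeria Limited
print(canonical_form("adebayo nig ltd"))      # Adebayo Nigeria Limited
```

Keeping the raw token next to its canonical form is what makes a token-by-token diff possible later, when a human has to review a proposed match.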
Layer 2: Equivalence proposal
The agent proposes equivalences: ADEBAYO ENT. ≡ Adebayo Enterprises with confidence 0.94. The 0.94 comes from a confidence model trained on the customer's own historical merger/dedupe decisions, not generic NLP.
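To make the shape of a proposal concrete, here is a rough sketch. The scoring function is a deliberate stand-in: it uses plain string similarity from Python's difflib on canonicalized names, not the trained confidence model described above, so its numbers will not match ours. The EXPANSIONS table, Proposal type, and propose function are all hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical abbreviation rules, for illustration only.
EXPANSIONS = {"ent": "enterprises", "ltd": "limited", "nig": "nigeria"}


def canonical(name: str) -> str:
    """Fold case, strip punctuation, collapse whitespace, expand abbreviations."""
    words = (w.strip(".,").casefold() for w in name.split())
    return " ".join(EXPANSIONS.get(w, w) for w in words if w)


@dataclass(frozen=True)
class Proposal:
    source_name: str
    target_name: str
    confidence: float


def propose(source_name: str, target_name: str) -> Proposal:
    # Stand-in score: character-level similarity of the canonical forms.
    # This is NOT a trained model; it only shows where a score would plug in.
    score = SequenceMatcher(
        None, canonical(source_name), canonical(target_name)
    ).ratio()
    return Proposal(source_name, target_name, round(score, 2))


match = propose("ADEBAYO ENT.", "Adebayo Enterprises")
other = propose("ADEBAYO ENT.", "Okafor Holdings")
print(match.confidence, other.confidence)
```

The point of the structure is that a proposal is data, not a side effect: it can be scored, thresholded, reviewed, and logged before anything in the target changes.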
Layer 3: Human approval — or auto-resolve
Above a configurable threshold (typically 0.85 for tax / 0.92 for banking), the equivalence resolves automatically. Below, it goes to Exception Triage for human review. Every decision — automated or human — is logged with cryptographic chaining for the regulator's audit pack.
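The routing rule and the chained log can be sketched in a few lines. This is an illustration under assumptions, not our implementation: the threshold values come from the text, but the route function, the DecisionLog class, the SHA-256 chaining scheme, and the choice to auto-resolve at exactly the threshold are all invented for the example.

```python
import hashlib
import json

# Thresholds from the text; the domain keys are illustrative.
THRESHOLDS = {"tax": 0.85, "banking": 0.92}


def route(confidence: float, domain: str) -> str:
    # Assumption: a score exactly at the threshold auto-resolves.
    if confidence >= THRESHOLDS[domain]:
        return "auto_resolve"
    return "exception_triage"


class DecisionLog:
    """Append-only log; each entry's hash covers the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, decision: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        payload = json.dumps(decision, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self.entries.append(
            {"prev_hash": prev_hash, "payload": payload, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


log = DecisionLog()
for conf, domain in [(0.94, "tax"), (0.90, "banking")]:
    log.append({"confidence": conf, "domain": domain, "outcome": route(conf, domain)})
print(log.verify())  # True
```

With these thresholds, a 0.94 proposal auto-resolves in a tax migration, while a 0.90 proposal in a banking migration goes to Exception Triage. Changing any logged payload after the fact makes verify() return False.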
What this caught at NRS
From our recent NRS TaxPro → Rev360 cutover: 47,238 entity collisions surfaced. 3,617 auto-resolved at the threshold. 43,609 went to Exception Triage. Of those, 43,597 were resolved within 4 weeks by NRS's tax-validation team using the agent's tokenized diff view. The remaining 12 went to senior escalation.
Without the Naming Library, every one of those 47,238 records would have hit production as a duplicate. NRS's downstream payment systems would have rejected payments against the "wrong" copy of each taxpayer. On the customer side: payments marked as missed. On the tax-authority side: legal-entity reconciliation broken.
Where this still doesn't work
The Naming Library doesn't help if your source data is straight-up wrong — typos, transposed digits, manually entered junk. It helps where two systems have different but valid representations of the same real-world entity. That's most enterprise migrations. It's not all of them.
We're explicit about this in discovery calls: if your data is in worse shape than that — if you have, say, 8% typos in customer names — the Naming Library will give you a head start but you'll still need a separate data-cleansing pass. We'll tell you that on the call.