Node.js · TypeScript · Plaid · Onfido · Chainalysis · PostgreSQL

KYC Onboarding Platform

A compliance-first identity verification system that cut KYC review time from 3 days to 8 hours, integrating Plaid, Onfido, and real-time sanctions screening with full audit trails.


74%

Auto-Approved

3d to 8h

Review Time

<3s

Risk Score Speed

18

Auditable Event Types

The Problem

Forty percent of applicants who started KYC never finished it. They'd upload their documents, submit their information, and then wait. And wait. The average queue was 3 days long, and the only status update they got was silence. Most of them moved on.

Two compliance officers were handling every application by hand. Checking government IDs against document photos. Running names against OFAC sanctions lists by copy-pasting into a government website. Verifying bank accounts by emailing applicants and waiting for statement PDFs to arrive. Each application took 25 to 40 minutes of focused attention. With 40 to 60 new applications per day, they were at capacity before 10am.

The regulatory picture was getting worse. Their banking sponsor had asked for automated compliance controls within 60 days. The audit trails the team was producing, exports from a shared spreadsheet with a column called "Checked By," weren't going to satisfy a formal audit. I came in with 8 weeks on the clock and a brief that was clear: automate the pipeline, get the drop-off rate down, and produce audit trails that could survive regulatory scrutiny.

The Approach

Three verification layers, with a risk scoring engine sitting on top.

Plaid Link handles bank account ownership and income verification. When an applicant connects their bank through the Plaid OAuth flow, we get a real-time view of their accounts and transaction history. No manual statement uploads, no emailing PDFs.

Onfido handles document and biometric verification: OCR on government IDs from 195 countries, combined with a liveness check to confirm that the person holding the ID is the same person on it.

A custom sanctions screening step checks applicant names against the OFAC SDN list and the EU consolidated sanctions list, using fuzzy matching to catch name transliterations and spelling variations.

A fourth check sits outside the onboarding flow: wallet screening for crypto-source deposits via Chainalysis KYT. Each deposit address is checked against Chainalysis's risk scoring API before funds are processed. High-risk wallet associations trigger auto-rejection.
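To make the fuzzy matching concrete, here is a minimal sketch of how a name-similarity check against a sanctions list might work. The production matcher and its threshold aren't shown in this write-up; `normalizeName`, the Levenshtein distance, and the 0.85 cutoff are all illustrative assumptions.

```typescript
// Illustrative fuzzy name matching for sanctions screening.
// The 0.85 threshold is an assumption, not the production value.
const SIMILARITY_THRESHOLD = 0.85;

function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .normalize("NFD")                 // split accented chars into base + diacritic
    .replace(/[\u0300-\u036f]/g, "")  // strip diacritics ("Müller" -> "muller")
    .replace(/[^a-z\s]/g, "")         // drop punctuation and digits
    .replace(/\s+/g, " ")
    .trim();
}

// Classic single-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0];
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,                                  // deletion
        dp[j - 1] + 1,                              // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1),     // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Similarity in [0, 1]; 1.0 is an exact match after normalization.
export function nameSimilarity(applicant: string, sanctioned: string): number {
  const a = normalizeName(applicant);
  const b = normalizeName(sanctioned);
  if (a === b) return 1;
  return 1 - levenshtein(a, b) / Math.max(a.length, b.length);
}

export function isFuzzyMatch(applicant: string, sanctioned: string): boolean {
  return nameSimilarity(applicant, sanctioned) >= SIMILARITY_THRESHOLD;
}
```

This catches one-character transliteration drift ("Mohamed" vs "Mohammed") while keeping unrelated names well below the threshold.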

The risk scoring engine takes signals from all three layers and produces a composite score from 0 to 100. Below 30 means auto-approve. Above 70 means auto-reject. Anything in between routes to a human reviewer with a pre-filled summary of the flags. The insight that made this work: when we backtested 612 historical decisions exported from the team's spreadsheet, 74% of applicants scored below 30. Of the remaining 26%, roughly half (13%) fell in the ambiguous range and went to human review; the other 13% were auto-rejected or flagged for enhanced due diligence. That meant 74% of the previous manual workload would disappear on day one.

Architecture

Verification Pipeline

An application moves through a defined state machine: submitted -> plaid_linked -> documents_uploaded -> onfido_check -> sanctions_screened -> scored -> auto_approved | human_queue | auto_rejected. Each transition is a BullMQ job. If Onfido times out or returns an error, the job retries with exponential backoff. The applicant's state is tracked in PostgreSQL with an append-only audit log. Every transition gets recorded with a timestamp, the triggering actor, and the relevant payload.
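The transition table implied by that state machine can be sketched as data plus a guard. The state names come from the pipeline description; the `canTransition` helper is an assumption about how a BullMQ job handler might validate a move before recording it.

```typescript
// State machine sketch for the verification pipeline.
type State =
  | "submitted" | "plaid_linked" | "documents_uploaded" | "onfido_check"
  | "sanctions_screened" | "scored"
  | "auto_approved" | "human_queue" | "auto_rejected";

// Each state lists the states it may legally transition to.
const TRANSITIONS: Record<State, State[]> = {
  submitted: ["plaid_linked"],
  plaid_linked: ["documents_uploaded"],
  documents_uploaded: ["onfido_check"],
  onfido_check: ["sanctions_screened"],
  sanctions_screened: ["scored"],
  scored: ["auto_approved", "human_queue", "auto_rejected"],
  auto_approved: [],   // terminal
  human_queue: [],     // terminal (until a reviewer acts)
  auto_rejected: [],   // terminal
};

// Guard checked before a job records a transition to the audit log.
export function canTransition(from: State, to: State): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Encoding the legal moves as data means an out-of-order job (say, a delayed Onfido webhook arriving after scoring) is rejected rather than corrupting the application's state.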

The pipeline is fully idempotent. If the system crashes after Plaid succeeds but before Onfido starts, resuming the job re-checks Plaid (gets a cached result from our database) and then triggers Onfido fresh. No step runs twice. No applicant gets stuck in an ambiguous state. This was non-negotiable for a compliance system: the state of every application at every moment had to be knowable from the database alone, without relying on in-memory state or job queue metadata.
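The idempotency pattern can be shown in miniature: each step looks for a persisted result before calling its provider, so a retried job never re-runs a completed step. In this sketch an in-memory `Map` stands in for the PostgreSQL results table, and the injected `run` function stands in for the real Plaid or Onfido call; both are assumptions for illustration.

```typescript
// Idempotent pipeline step: check persisted result, only call the
// provider on the first run, persist before returning.
type StepResult = { step: string; payload: unknown };

export class IdempotentStep {
  constructor(
    private store: Map<string, StepResult>,  // stands in for Postgres
    private run: () => Promise<unknown>,     // stands in for the external API call
    private step: string,
  ) {}

  async execute(applicationId: string): Promise<StepResult> {
    const key = `${applicationId}:${this.step}`;
    const cached = this.store.get(key);
    if (cached) return cached;               // resume path: no second API call
    const payload = await this.run();        // first run: hit the provider
    const result = { step: this.step, payload };
    this.store.set(key, result);             // persist before advancing state
    return result;
  }
}
```

Re-executing after a crash returns the cached Plaid result and leaves the provider untouched, which is exactly the resume behavior described above.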

Risk Scoring Engine

The composite score pulls from four factors: identity confidence (Onfido biometric match score, 0 to 100), financial history (Plaid income and account signals, 0 to 100), sanctions proximity (exact match = 100, fuzzy match above the confidence threshold = 50, no match = 0), and address verification (0 to 100 based on document address versus the address the applicant provided). These four scores get combined as a weighted average: identity at 40%, sanctions at 30%, financial at 20%, address at 10%.
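As a pure function, the weighted average is a few lines. The factor and weight names mirror the prose (each factor already expressed as a 0-to-100 risk contribution); passing the weights as a parameter anticipates the config-driven change discussed later and is my addition, not the shipped design.

```typescript
// Composite risk score: weighted average of four 0-100 factors.
interface Factors {
  identity: number;   // Onfido biometric signal, 0-100
  sanctions: number;  // 100 exact match, 50 fuzzy above threshold, 0 no match
  financial: number;  // Plaid income/account signals, 0-100
  address: number;    // document address vs stated address, 0-100
}

// Weights from the calibrated model: 40/30/20/10.
const WEIGHTS: Factors = { identity: 0.4, sanctions: 0.3, financial: 0.2, address: 0.1 };

export function compositeScore(f: Factors, w: Factors = WEIGHTS): number {
  return f.identity * w.identity
       + f.sanctions * w.sanctions
       + f.financial * w.financial
       + f.address * w.address;
}
```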

The weights weren't guesses. I spent 3 weeks backtesting against the 612 historical decisions exported from the team's spreadsheet, adjusting weights until the model agreed with the compliance officers on at least 94% of cases. On a 70/30 train/test split, the calibrated model reached 88% agreement with human reviewer outcomes on the holdout set. The main finding: the initial model over-indexed on address verification, which the compliance officers almost never used as a deciding factor. That calibration mattered. The compliance team trusted the system from the start because it matched their judgment. Threshold tuning is config-driven: the compliance team can adjust the auto-approve and auto-reject boundaries through an admin UI without touching code. The weights themselves, however, required a code change. That's something I'd fix in a second pass.

Audit Trail Design

Every decision, whether made by the scoring engine or a human reviewer, writes a record to the compliance_events table: event_type, actor (either "system" or the reviewer's email), payload, timestamp, and outcome. The table is append-only. No updates. No soft deletes. A compliance event, once written, is permanent. All 18 auditable event types, covering every state transition, API response, and manual review decision, are captured as immutable records with automated assertion tests verifying emission.
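A sketch of the table and its record builder follows. The column names come from the prose; the DDL details (column types, enforcing append-only by revoking UPDATE and DELETE from the application role, and the `app_role` name) are assumptions about one reasonable way to implement it in Postgres.

```typescript
// Append-only audit table sketch. Enforcement here is at the grant level:
// the app role can INSERT and SELECT, never UPDATE or DELETE.
export const COMPLIANCE_EVENTS_DDL = `
  CREATE TABLE compliance_events (
    id          BIGSERIAL PRIMARY KEY,
    event_type  TEXT        NOT NULL,
    actor       TEXT        NOT NULL,  -- "system" or the reviewer's email
    payload     JSONB       NOT NULL,
    outcome     TEXT        NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
  );
  REVOKE UPDATE, DELETE ON compliance_events FROM app_role;
`;

export interface ComplianceEvent {
  event_type: string;
  actor: string;
  payload: Record<string, unknown>;
  outcome: string;
  created_at: string; // ISO 8601 timestamp
}

// Pure builder: produces the record a pipeline step inserts after each transition.
export function buildEvent(
  eventType: string,
  actor: string,
  payload: Record<string, unknown>,
  outcome: string,
): ComplianceEvent {
  return {
    event_type: eventType,
    actor,
    payload,
    outcome,
    created_at: new Date().toISOString(),
  };
}
```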

GDPR creates a genuine tension with this design. Users have the right to erasure, but audit trails supporting financial compliance decisions can't be deleted under AML regulations. The resolution: we encrypt all personal data in event payloads using per-user encryption keys stored in a separate key management table. When a deletion request arrives, we delete the encryption key. The audit record stays in the database, but the personal data it contains is effectively unreadable. The compliance record survives. The personal data doesn't.
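This crypto-shredding pattern is easy to demonstrate with Node's built-in crypto module. A minimal sketch, assuming AES-256-GCM for the payload encryption and using an in-memory `Map` as a stand-in for the key management table; the real system's cipher choice and key storage are not specified in this write-up.

```typescript
// Crypto-shredding sketch: per-user keys encrypt personal data in event
// payloads; deleting the key makes the stored ciphertext unreadable
// while the audit row itself survives.
import { randomBytes, createCipheriv, createDecipheriv } from "crypto";

const keyStore = new Map<string, Buffer>(); // stands in for the key table

export function encryptForUser(userId: string, plaintext: string): string {
  let key = keyStore.get(userId);
  if (!key) { key = randomBytes(32); keyStore.set(userId, key); }
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv + auth tag + ciphertext together inside the event payload.
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

export function decryptForUser(userId: string, stored: string): string | null {
  const key = keyStore.get(userId);
  if (!key) return null; // key deleted: data is permanently unreadable
  const buf = Buffer.from(stored, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
}

// GDPR erasure request: delete the key, keep the audit row.
export function eraseUser(userId: string): void {
  keyStore.delete(userId);
}
```

After `eraseUser`, the ciphertext still sits in `compliance_events`, but no key exists to decrypt it: the compliance record survives, the personal data doesn't.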

Key Technical Decisions

Plaid + Onfido vs All-in-One KYC Platforms

Jumio, Persona, and Socure all offer everything in one SDK. Bank verification, document verification, sanctions screening, risk scoring. I chose not to go that route. The key issue was reliability profiles. Bank account verification and document verification are fundamentally different problems, solved well by different providers. Plaid's bank linking is the standard for a reason: it has direct integrations with thousands of financial institutions and handles the OAuth edge cases that matter. Onfido's document OCR has broad international ID coverage and a liveness detection system with a strong track record on false positive rates.

A monolithic KYC vendor would have locked us into one provider's weaker capabilities for both problems. The tradeoff was real: two integration surfaces, two support relationships, two contracts, two sets of webhook handlers. But it was worth it. Plaid also gave us transaction history and income signals that none of the all-in-one vendors offered at comparable depth. That data feeds the financial history component of the risk score in ways a generic KYC API couldn't match.

Synchronous vs Async Verification

Sanctions screening takes under 1 second. Onfido biometric verification takes anywhere from 30 seconds to 3 minutes, depending on document quality and processing load. Making the whole pipeline synchronous, holding the user on a loading screen while all three verification steps complete, wasn't viable.

I made the pipeline fully async. The applicant submits, gets a confirmation screen, and receives an email when a decision is made. The tradeoff is that users don't get instant feedback on whether they're approved. We mitigated this with status webhooks to the frontend: real-time updates like "Your documents are being verified" and "Your application has been approved" appear as push notifications in the applicant portal. Drop-off from the verification wait dropped from 40% to 19%, which tells me the users who abandoned before weren't leaving because of the wait itself. They were leaving because they had no signal that anything was happening.

Results

Before: 3-day average review time, 40% drop-off rate, two compliance officers at capacity from mid-morning. After: 74% of applicants auto-approved in under 3 seconds, average review time for the 26% that reach human review dropped to 8 hours, compliance officers now working on genuinely complex cases rather than routine verifications.

KYC Application Pipeline: 6-Week Trend

Verification Time Reduction: After Automation

The compliance team went from reviewing 40 to 60 applications per day to reviewing 10 to 15. The banking sponsor audit happened 6 weeks after launch. Every question they asked, including "show us the audit trail for this specific applicant across every decision point," was answered by querying compliance_events directly. No spreadsheets. The sponsor extended the partnership.

What Worked

Threshold tuning from historical data. Running 3 weeks of backtesting against the compliance team's actual manual decisions before writing any automation logic meant the scoring model reflected real judgment, not my assumptions about what compliance decisions should look like. The compliance team trusted the system from day one because it agreed with how they'd already been working.

Async pipeline with status webhooks. Applicants who would have waited 3 days for a manual review now get real-time status updates. Drop-off fell from 40% to 19%. That 21-point improvement came almost entirely from giving users a signal that their application was progressing, not from making the process faster.

Append-only audit trail. Every regulatory question we've received since launch has been answerable in under 10 minutes by querying compliance_events. Before, the answer to "show us this applicant's review history" was "we'll email you the spreadsheet." That distinction mattered in the banking sponsor audit.

Idempotent job design. Building each pipeline step to be safe to retry from any state meant that infrastructure incidents didn't create compliance gaps. No application ever got stuck mid-verification with an unknown outcome.

The Biggest Surprise

The Plaid OAuth redirect was the biggest conversion killer we didn't anticipate. On mobile, the redirect chain (our app → Plaid → bank → Plaid → our app) confused users enough that 22% of mobile applicants dropped off at that step alone. Desktop was fine. We didn't discover this until week 6 because our analytics didn't segment by device. We added a 'what to expect' interstitial before the redirect and improved the loading state on return. Drop-off at that step fell to 11%, but we never fully solved it. An alternative flow using micro-deposits for bank verification would have avoided the redirect entirely.

Scope and Collaboration

The client's compliance officer defined the risk scoring weights and approved the auto-approve/reject thresholds. I built the pipeline and scoring engine; she owned the regulatory judgment calls. The banking sponsor audit was handled by their compliance team. I provided the technical documentation and query examples.

What I'd Reconsider

The risk scoring weights started hardcoded. Even with config-driven thresholds, changing the weights themselves required a code deploy. After the first month, the compliance team wanted to experiment with weight adjustments based on new fraud patterns they were seeing. I'd make the full scoring model config-driven from the start, with weight changes going through the same admin UI as threshold changes.

We built the Plaid integration to verify bank accounts during the initial onboarding flow, which means applicants go through the Plaid OAuth redirect mid-application. For some users, particularly those on mobile or less familiar with OAuth flows, this was confusing enough that they dropped off at that step. An alternative approach, accepting manual bank details during onboarding and verifying later via micro-deposits, would have kept the application flow simpler and likely improved conversion for a meaningful segment of users.


Built with: Node.js, TypeScript, Plaid, Onfido, PostgreSQL, Redis