Multi-Chain Wallet Engine with Stablecoin Settlement
A transaction engine spanning five EVM chains, managing hot and cold wallets with automated USDC settlement for a crypto payments startup.
5 EVM chains (ETH, ARB, OP, Base, Polygon)
$1.8M monthly volume (+460% in 5 months)
< 90s settlement on L2s (4-6 min on mainnet)
0 unrecovered transactions (6 stuck tx resolved)

The Problem
The startup's settlement process was a person with a spreadsheet and a MetaMask wallet. When a merchant received a crypto payment, someone on the ops team would notice it, manually convert the funds to USDC on a DEX, and send the USDC to the merchant's bank via a fiat off-ramp. On a good day, that took four hours. On a bad day, when the ops person was in a meeting or the gas fees spiked, it took until the next morning.
That worked at $80K monthly volume. At $320K, the ops team was spending half their day on settlement. At $480K, they started missing payments. Merchants were calling to ask where their money was. The answer was always some version of "we're working on it."
The technical picture was worse than the operational one. All deposits went to a single hot wallet address per chain. There was no way to attribute a deposit to a specific merchant without checking the timestamp and cross-referencing the spreadsheet. The wallet held weeks of accumulated funds because the manual sweep process was slow. Private keys lived in a shared password manager. There was no retry logic for failed transactions, no deduplication, no audit trail beyond the spreadsheet.
I came in with a brief to build something that could handle $3M monthly volume without adding headcount.
The Approach
The core problem was attribution: knowing which deposit belonged to which merchant, on which chain, at what moment. Everything else (settlement, sweeps, reconciliation) followed from solving that.
HD wallet derivation (BIP-44) solved attribution cleanly. Instead of one address per chain, each merchant gets a unique deposit address derived from a master seed using a deterministic path: m/44'/60'/merchantIndex'/0/0. The address is stable, reproducible from the seed, and unique per merchant. Deposit detection becomes trivial: when a transfer arrives at a known address, the merchant is identified by the derivation path. No spreadsheet lookup, no timestamp matching.
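A minimal sketch of the attribution scheme. The path construction follows the BIP-44 format quoted above; the address-to-merchant index is a plain in-memory map here, standing in for the database table the real engine would use. No actual key derivation happens in this sketch.

```typescript
/** Build the BIP-44 derivation path for a merchant (coin type 60 = EVM chains). */
function merchantPath(merchantIndex: number): string {
  if (!Number.isInteger(merchantIndex) || merchantIndex < 0) {
    throw new Error(`invalid merchant index: ${merchantIndex}`);
  }
  return `m/44'/60'/${merchantIndex}'/0/0`;
}

/** In-memory attribution index: deposit address -> merchant index. */
class AttributionIndex {
  private byAddress = new Map<string, number>();

  /** Record a derived deposit address for a merchant. */
  register(merchantIndex: number, address: string): void {
    this.byAddress.set(address.toLowerCase(), merchantIndex);
  }

  /** Resolve an incoming deposit to a merchant, or null if unknown. */
  attribute(toAddress: string): number | null {
    return this.byAddress.get(toAddress.toLowerCase()) ?? null;
  }
}
```

Addresses are normalized to lowercase on both write and read, since EVM addresses arrive in mixed checksum casing from different sources.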
For settlement, the goal was USDC on Arbitrum as the canonical output. Arbitrum because the fees are low enough that small settlements don't get eaten by gas. USDC because merchants wanted stable value, not crypto exposure. The settlement engine converts incoming tokens to USDC via a DEX aggregator (1inch), bridges to Arbitrum if needed, and sends to the merchant's configured payout address. Settlement completes under 90 seconds for L2 chains (Arbitrum, Base, Optimism, Polygon); 4-6 minutes on mainnet where we wait for 3-confirmation finality. The whole path runs automatically, triggered by deposit detection.
The architecture: Alchemy webhooks for deposit detection, BullMQ for the transaction queue, PostgreSQL as the financial ledger, Redis for nonce management and idempotency, AWS KMS for key management. Started with ethers.js for the prototype; migrated to viem mid-project for better TypeScript inference and tree-shaking. The settlement hot path was rewritten; a handful of webhook handler utilities retained ethers.js patterns until a planned cleanup.
Architecture
The startup's legal team handled AML/KYC for merchant onboarding and regulatory filings. My scope was the transaction engine and wallet infrastructure. I didn't touch compliance or merchant-facing UI.
Deposit Detection
Alchemy's address activity webhooks fire within seconds of a confirmed transaction. Each merchant's deposit address is registered with Alchemy on creation. When a webhook arrives, the handler verifies the Alchemy signature, extracts the transfer details, and enqueues a deposit.detected job in BullMQ.
The webhook handler is intentionally thin. It validates the signature, writes a raw event record to PostgreSQL, and enqueues the job. Nothing else. If BullMQ is down, the raw event is still in the database. A recovery job can replay from raw events without losing anything.
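A sketch of that ordering guarantee. `RawEventStore` and `JobQueue` are hypothetical interfaces standing in for PostgreSQL and BullMQ, and the signature check result is passed in pre-computed; the point is that the raw event is persisted before the enqueue, so a queue outage never loses data.

```typescript
interface RawEventStore { insert(payload: string): Promise<void>; }
interface JobQueue { enqueue(name: string, payload: string): Promise<void>; }

async function handleWebhook(
  payload: string,
  signatureValid: boolean, // result of the Alchemy HMAC verification
  store: RawEventStore,
  queue: JobQueue,
): Promise<"rejected" | "queued" | "stored-only"> {
  if (!signatureValid) return "rejected";

  // Persist first: if the queue is down, the raw event survives and a
  // recovery job can replay it from the database later.
  await store.insert(payload);
  try {
    await queue.enqueue("deposit.detected", payload);
    return "queued";
  } catch {
    return "stored-only"; // queue outage: event is safe in the raw-event table
  }
}
```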
Nonce Management
Nonce management across five chains was the first thing that broke in testing. The naive approach, reading the current nonce from the chain before each transaction, fails under concurrent submissions: two jobs reading the same nonce both try to submit with that nonce, one succeeds, one gets stuck.
A Redis-backed nonce cache per chain solved this. On startup, the engine reads the current on-chain nonce for each hot wallet address and stores it in Redis. Every transaction submission increments the Redis nonce atomically before broadcasting. If a transaction fails, the nonce is decremented back. If the process crashes, a reconciliation job on startup compares the Redis nonce against the on-chain nonce and resets if they've diverged.
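A sketch of the allocator's three operations: reserve, release on failed broadcast, and startup reconciliation. A `Map` stands in for Redis here (Node's single-threaded event loop makes the increment atomic in-process; production would use Redis `INCR`/`DECR` for atomicity across processes).

```typescript
class NonceCache {
  private next = new Map<string, number>(); // key: `${chainId}:${address}`

  /** Seed from on-chain state (e.g. eth_getTransactionCount) at startup. */
  seed(chainId: number, address: string, onChainNonce: number): void {
    this.next.set(`${chainId}:${address}`, onChainNonce);
  }

  /** Reserve the next nonce before broadcasting a transaction. */
  reserve(chainId: number, address: string): number {
    const key = `${chainId}:${address}`;
    const n = this.next.get(key);
    if (n === undefined) throw new Error(`nonce cache not seeded for ${key}`);
    this.next.set(key, n + 1);
    return n;
  }

  /** Roll back a nonce after a failed broadcast so it can be reused. */
  release(chainId: number, address: string, nonce: number): void {
    const key = `${chainId}:${address}`;
    const n = this.next.get(key);
    // Only roll back if nothing newer was reserved in the meantime.
    if (n !== undefined && n === nonce + 1) this.next.set(key, nonce);
  }

  /** Startup reconciliation: reset to on-chain state if diverged. Returns true if drift was corrected. */
  reconcile(chainId: number, address: string, onChainNonce: number): boolean {
    const key = `${chainId}:${address}`;
    if (this.next.get(key) !== onChainNonce) {
      this.next.set(key, onChainNonce);
      return true;
    }
    return false;
  }
}
```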
This gave us reliable concurrent transaction submission without stuck transactions. The edge case that still required care: chain reorganizations. A reorg can invalidate a confirmed transaction, which means the nonce it consumed is now available again. The reconciliation job handles this by checking transaction finality (12 confirmations on Ethereum, 64 on Polygon) before treating a nonce as permanently consumed.
Gas Strategy
EIP-1559 chains (Ethereum, Arbitrum, Optimism, Base) use the fee market: maxFeePerGas and maxPriorityFeePerGas. Polygon still supports legacy gas pricing. The gas estimation layer handles both, with a fallback path for chains that don't support eth_feeHistory.
For EIP-1559 chains, the engine samples the last 10 blocks to compute a base fee trend, then sets maxFeePerGas at 120% of the current base fee and maxPriorityFeePerGas at the 75th percentile of recent priority fees. That's aggressive enough to get included in the next block under normal conditions, conservative enough not to overpay during spikes.
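The heuristic above reduces to a small pure function. This is a sketch under stated assumptions: fee values are bigint wei, inputs come from something like eth_feeHistory, and the percentile is nearest-rank over a pre-sorted array (the production implementation may differ in those details).

```typescript
/** Nearest-rank percentile on an ascending-sorted bigint array. */
function percentile(sorted: bigint[], p: number): bigint {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function computeFees(
  recentBaseFees: bigint[],     // last ~10 blocks, oldest to newest
  recentPriorityFees: bigint[], // observed priority fees, ascending
): { maxFeePerGas: bigint; maxPriorityFeePerGas: bigint } {
  if (recentBaseFees.length === 0 || recentPriorityFees.length === 0) {
    throw new Error("need non-empty fee history");
  }
  const latestBase = recentBaseFees[recentBaseFees.length - 1];
  return {
    // 120% of the current base fee: headroom for the next block's base fee rise.
    maxFeePerGas: (latestBase * 120n) / 100n,
    // 75th percentile of recent tips: competitive without chasing outliers.
    maxPriorityFeePerGas: percentile(recentPriorityFees, 75),
  };
}
```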
Gas estimation failures are treated as hard stops. A transaction with a bad gas estimate is worse than a delayed transaction: it either gets stuck in the mempool or overpays by an order of magnitude. If the gas estimate fails, the job retries with exponential backoff rather than submitting with a guess.
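BullMQ can apply this retry policy natively via job options like `{ attempts, backoff: { type: "exponential", delay } }`. The delay schedule that policy produces is simple enough to show directly; the base delay of one second here is an illustrative value, not the production config.

```typescript
/** Delay before retry attempt N under exponential backoff (attempt 1 = first retry). */
function backoffDelayMs(attempt: number, baseDelayMs = 1_000): number {
  if (attempt < 1) throw new Error("attempt must be >= 1");
  // attempt 1 -> 1s, attempt 2 -> 2s, attempt 3 -> 4s, ...
  return baseDelayMs * 2 ** (attempt - 1);
}
```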
PostgreSQL Ledger
The ledger is append-only. Every deposit, settlement, sweep, and fee is a row in wallet_transactions. No updates. The current state of any merchant's balance is the sum of all rows for that merchant, which PostgreSQL computes in under 10ms with the right indexes.
Each row carries: merchant_id, chain_id, tx_hash, from_address, to_address, token, amount, fee_amount, status, created_at, and an idempotency_key. The idempotency key is sha256(chain_id + tx_hash + event_type). Before writing any transaction record, the engine checks for an existing row with the same key. If it exists, the operation is a no-op. This is the mechanism behind zero lost transactions: every operation is safe to retry, and retries never create duplicate records.
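A sketch of the key scheme and the check-before-write. Two small liberties, labeled here: the concatenation uses `:` delimiters (the text says `chain_id + tx_hash + event_type`; delimiters avoid ambiguous joins), and a `Map` with a uniqueness check stands in for the wallet_transactions table and its unique index on idempotency_key.

```typescript
import { createHash } from "node:crypto";

/** sha256 over chain id, tx hash, and event type; tx hash lowercased for stability. */
function idempotencyKey(chainId: number, txHash: string, eventType: string): string {
  return createHash("sha256")
    .update(`${chainId}:${txHash.toLowerCase()}:${eventType}`)
    .digest("hex");
}

class Ledger {
  private rows = new Map<string, { merchantId: number; amount: bigint }>();

  /** Append a row; a duplicate key is a no-op, so retries never double-record. */
  record(key: string, merchantId: number, amount: bigint): boolean {
    if (this.rows.has(key)) return false; // already recorded: safe no-op
    this.rows.set(key, { merchantId, amount });
    return true;
  }

  /** Balance = sum of all rows for the merchant (append-only, never updated). */
  balance(merchantId: number): bigint {
    let total = 0n;
    for (const r of this.rows.values()) {
      if (r.merchantId === merchantId) total += r.amount;
    }
    return total;
  }
}
```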
Hot and Cold Wallet Architecture
The hot wallet holds a maximum of 24 hours of expected volume, calculated as a rolling 7-day average. Every night at 2am UTC, a sweep job moves excess funds from the hot wallet to cold storage. The cold wallet's private key lives in AWS KMS and never touches application memory. The sweep job calls KMS to sign the transaction, gets back a signed payload, and broadcasts it. The application never sees the key.
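The sweep threshold reduces to a small calculation: one day of expected volume, where "expected" is the 7-day rolling average. A sketch, with amounts as bigint in the token's base units; integer division truncates the average, which is harmless at USDC's 6-decimal precision.

```typescript
/** Amount to sweep from hot wallet to cold storage, or 0n if under threshold. */
function sweepAmount(hotBalance: bigint, last7DaysVolume: bigint[]): bigint {
  if (last7DaysVolume.length !== 7) throw new Error("need exactly 7 daily totals");
  const dailyAvg = last7DaysVolume.reduce((a, b) => a + b, 0n) / 7n;
  const threshold = dailyAvg; // hot wallet keeps at most 24h of expected volume
  const excess = hotBalance - threshold;
  return excess > 0n ? excess : 0n;
}
```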
Hot wallet keys also live in KMS, accessed via IAM roles scoped to the specific Lambda functions that need signing access. Rotation happens quarterly: a new master seed derives the same BIP-44 path structure, so each merchant keeps the same derivation index. Merchant deposit addresses change on rotation, which requires notifying merchants, but the old addresses remain monitored for 90 days to catch late deposits.
The startup's CTO reviewed all smart contract interaction patterns and signed off on the hot/cold wallet thresholds.
Key Technical Decisions
HD Wallets vs One Address Per Merchant
The alternative was simpler: generate a fresh private key for each merchant, store it encrypted in the database. That's how most crypto payment processors worked at the time.
The problem with that approach is key management as the merchant count grows. A hundred merchants means a hundred private keys to back up, rotate, and secure. If the database backup is compromised, every merchant's funds are at risk. With HD wallets, there's one secret: the master seed. Compromise the seed and you've compromised everything, but there's only one thing to protect, one thing to back up, one thing to rotate.
BIP-44 also gives you deterministic recovery. If the database is lost, the master seed alone can regenerate every merchant's deposit address. That's not true with individually generated keys unless you've backed up every key separately.
The tradeoff: HD wallet derivation requires knowing the merchant's index to derive their address. That index lives in the database. If the database is lost and the seed is intact, you can brute-force the first N derivation paths to find active addresses, but it's not clean. I'd add a separate encrypted backup of the merchant-to-index mapping to make recovery unambiguous.
Chain Selection for Settlement
The original design settled everything on Ethereum. That was wrong.
In month 3, a gas spike on Ethereum pushed base fees above 200 gwei for three days. Settlement transactions were getting stuck in the mempool because gas estimates from the previous block were already stale by the time the transaction broadcast. Six transactions sat pending for hours. We added admin tooling to manually bump gas on stuck transactions. Those three days of degraded service taught me more about gas strategy than the previous months of smooth operation.
I added Arbitrum as a fallback settlement chain. Arbitrum fees are 10-50x lower than Ethereum and don't spike in the same way because the sequencer controls transaction ordering. The settlement engine now routes based on amount: transactions above $10K settle on Ethereum (where the absolute fee is a small percentage of the amount), everything else settles on Arbitrum. The routing threshold is config-driven.
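The routing rule is a one-liner once the threshold lives in config. A sketch; the `RoutingConfig` shape and names are illustrative, not the production schema.

```typescript
type SettlementChain = "ethereum" | "arbitrum";

interface RoutingConfig {
  mainnetThresholdUsd: number; // e.g. 10_000; config-driven, not hardcoded
}

/** Settlements above the threshold go to mainnet, everything else to Arbitrum. */
function routeSettlement(amountUsd: number, cfg: RoutingConfig): SettlementChain {
  return amountUsd > cfg.mainnetThresholdUsd ? "ethereum" : "arbitrum";
}
```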
This is the change I should have made from the start.
Idempotency: Redis Plus PostgreSQL
The payment-engine project I'd built earlier used Redis as the sole idempotency store, which meant a Redis outage created a fail-open window. That experience directly informed the two-layer approach here.
For a crypto transaction engine, fail-open on idempotency means double-spending. That's not acceptable.
The solution here is two-layer idempotency that fails closed. Redis is the fast path: check Redis first and return the cached result if the key exists. If Redis is unavailable, fall through to PostgreSQL and check the wallet_transactions table for a row with the matching idempotency key. PostgreSQL is always the source of truth; Redis is a cache in front of it.
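The fail-closed fallthrough can be sketched as follows. `KeyStore` is a hypothetical interface over both stores; the essential property is that a Redis error leads to a database check, never to letting the operation proceed unverified.

```typescript
interface KeyStore { has(key: string): Promise<boolean>; }

/** True if this idempotency key was already processed. Fails closed on Redis errors. */
async function isDuplicate(
  key: string,
  redis: KeyStore,    // fast path; may be down
  postgres: KeyStore, // source of truth
): Promise<boolean> {
  try {
    if (await redis.has(key)) return true; // cache hit: definitely a duplicate
  } catch {
    // Redis unavailable: fall through to PostgreSQL rather than fail open.
  }
  return postgres.has(key); // a PostgreSQL error propagates: the job retries
}
```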
The cost is two reads on the critical path when Redis is healthy. The benefit is that a Redis outage doesn't create a window where duplicate transactions can slip through.
Results
Before: four-hour average settlement time, ops team manually processing every payment, a single hot wallet per chain with no attribution, private keys in a shared password manager.
After the first 5 months in production: settlement completes in under 90 seconds for L2 chains (Arbitrum, Base, Optimism, Polygon); 4-6 minutes on mainnet where we wait for 3-confirmation finality. Zero manual intervention required, full attribution per merchant per chain, private keys in KMS with no application-layer access.
5-Month Transaction Volume (USD)
Volume grew from $320K in the first month to $1.8M by month 5. The month-3 dip from $1.1M to $860K was the Ethereum gas spike incident. After adding Arbitrum as the fallback settlement chain, volume recovered and continued growing without another incident.
What Worked
BIP-44 derivation for merchant addresses. One master seed, deterministic paths, clean attribution. The key management story went from "a hundred encrypted keys in the database" to "one secret in KMS." Recovery from a database loss is a seed phrase and a derivation index, not a key export and re-import process.
Two-layer idempotency. Redis for speed, PostgreSQL for correctness. The combination meant zero lost transactions across the first 5 months of production, including one Redis failover event and one PostgreSQL replica promotion. Neither incident caused a duplicate or missing transaction.
Redis nonce cache with reconciliation. Concurrent transaction submission across five chains without stuck transactions. The reconciliation job on startup meant that process crashes didn't leave the nonce cache in a bad state. In practice, the reconciliation ran on every deploy and caught drift once in 5 months.
Amount-based chain routing. Routing large settlements to Ethereum and small ones to Arbitrum kept fees proportional to transaction size. The average fee as a percentage of settlement amount stayed under 0.3% across all chains. Without routing, Arbitrum's flat fee structure would have been fine for small amounts but wasteful for large ones, and Ethereum's variable fees would have been catastrophic for small amounts during gas spikes.
What I'd Reconsider
The nonce reconciliation job runs on startup. That means every deploy triggers a reconciliation, which takes 5-10 seconds per chain as it queries on-chain state. With five chains, that's 25-50 seconds of startup time before the engine can submit transactions. For a service that deploys multiple times per day, that's noticeable. A background reconciliation that runs periodically rather than at startup would be cleaner.
The merchant notification flow for key rotation was manual. I built the rotation mechanism but left the "notify merchants their deposit address is changing" step as a manual ops task. That was a mistake. Address changes are confusing for merchants, and a manual notification process means the timing is inconsistent. An automated notification with a 30-day lead time, sent via the same webhook infrastructure used for settlement confirmations, would have been the right call.
The DEX aggregator integration used 1inch's API directly. That created a dependency on 1inch's uptime for every settlement. A fallback to a second aggregator (Paraswap or 0x) would have been straightforward to add and would have eliminated the two incidents where 1inch API latency caused settlement delays. I'd add that in the first week of a second pass.
Built with: Node.js, TypeScript, PostgreSQL, Redis, viem, Alchemy