DTC Fashion Storefront - AI Recommendations and Checkout That Converts
How we rebuilt a fashion brand's product discovery and checkout flow using collaborative filtering, pgvector embeddings, and Stripe - lifting checkout completion by 34% and growing recommendation-driven revenue to 28% of total.
34%
Checkout Conversion Lift
2.3x
Avg Order Value
<120ms
Rec Response Time
28%
Revenue from Recs
The Problem
The brand's merchandising manager was spending 20 hours a week manually updating homepage collections. Meanwhile, 72% of customers who added items to their cart left without buying — after browsing 15 to 20 products each. Traffic had grown 40% over eight months. Revenue hadn't moved.
The product discovery experience was static: the same bestsellers shown to every visitor regardless of behavior, style preference, or purchase history. A customer who'd bought three minimalist black pieces got the same homepage as someone who'd bought colorful printed dresses. The site had no memory.
Checkout made it worse. Six steps, redundant address fields, no saved payment methods, no Apple Pay. Customers who made it to checkout still abandoned at a 72% rate. The problem wasn't acquisition — the brand was spending heavily on paid social and it was working. The problem was that the site couldn't close.
The Approach
Two parallel tracks ran simultaneously. The first: an AI recommendation engine using collaborative filtering over session events, with pgvector embeddings as the similarity backbone. The second: a checkout redesign using Stripe's hosted UI to collapse six steps to three and add Apple Pay and Google Pay without touching PCI scope.
Both tracks ran as separate A/B experiments with 50/50 splits and 3-week windows. Running them simultaneously rather than sequentially let us see whether the effects compounded — whether better recommendations made checkout completion better too.
The recommendation engine had one hard constraint: it needed to handle cold-start for new visitors before any session data existed. Showing obviously wrong recommendations to first-time visitors is worse than showing nothing personalized at all.
Architecture
Recommendation Engine
Product embeddings were generated using OpenAI's text-embedding-3-small model across 12,000 SKUs — title, description, category, and material attributes concatenated into a single embedding input. These vectors are stored in PostgreSQL with the pgvector extension, indexed with HNSW for approximate nearest-neighbor retrieval.
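The concatenation step can be sketched as a small helper. This is an illustrative shape only: the `Product` interface and field names are assumptions, and the production pipeline may join or order attributes differently.

```typescript
// Hypothetical sketch: flatten one SKU's attributes into the single
// string that gets sent to the embedding model. Field names are assumed.
interface Product {
  title: string;
  description: string;
  category: string;
  material: string;
}

function embeddingInput(p: Product): string {
  // Title, description, category, material — the order described above —
  // joined into one flat string per SKU, skipping empty fields.
  return [p.title, p.description, p.category, p.material]
    .map((s) => s.trim())
    .filter(Boolean)
    .join(" | ");
}
```

One string per SKU keeps the embedding call simple and makes the input reproducible when attributes change.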
Collaborative filtering runs over a 30-day rolling window of session events: views, add-to-carts, purchases, and wishlist additions. At request time, the engine scores candidate products by combining item-based similarity (from the embedding index) with user-based collaborative signals (from the interaction matrix). Results are cached in Redis with a 5-minute TTL keyed by session ID. New visitors with no session history fall back to category bestsellers — simple, not embarrassing.
Checkout Flow
Stripe Checkout's hosted page replaced the custom six-field form. Stripe handles PCI compliance, Apple Pay, Google Pay, and card prefill for returning Stripe customers. The order lifecycle runs as a state machine: pending → payment_processing → paid → fulfillment → shipped. State transitions are driven by Stripe webhooks, with idempotency keys on every payment operation to prevent double-charges on webhook retries.
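The state machine and idempotency check can be sketched as follows. Transition names mirror the prose; the webhook event IDs used as idempotency keys are an assumption about shape, not the production schema.

```typescript
// Sketch of the order lifecycle: legal transitions only, and a retried
// webhook (same event ID) is a no-op, which is what prevents
// double-processing a payment on Stripe webhook retries.
type OrderState =
  | "pending"
  | "payment_processing"
  | "paid"
  | "fulfillment"
  | "shipped";

const transitions: Record<OrderState, OrderState[]> = {
  pending: ["payment_processing"],
  payment_processing: ["paid"],
  paid: ["fulfillment"],
  fulfillment: ["shipped"],
  shipped: [],
};

const seenEvents = new Set<string>(); // IDs of webhooks already processed

function applyWebhook(
  state: OrderState,
  next: OrderState,
  eventId: string
): OrderState {
  if (seenEvents.has(eventId)) return state; // idempotent retry: no-op
  if (!transitions[state].includes(next)) {
    throw new Error(`illegal transition ${state} -> ${next}`);
  }
  seenEvents.add(eventId);
  return next;
}
```

In production the seen-event set would live in the database alongside the order row, not in memory, so the check survives restarts.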
The frontend uses optimistic UI updates throughout. When a customer clicks "Pay," the UI transitions immediately to a confirmation state. If the webhook confirms payment, the state persists. If it fails, the UI rolls back with an error. The perceived latency is near-zero even when the Stripe webhook takes a few seconds to arrive.
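The optimistic flow reduces to a tiny state reducer. This is an illustrative sketch — the state and event names are assumptions, not the production frontend code.

```typescript
// Sketch of the optimistic payment UI: transition to a confirming state
// immediately on click, then settle or roll back when the webhook result
// arrives. Names are hypothetical.
type PayState = "idle" | "confirming" | "confirmed" | "failed";
type PayEvent = "pay_clicked" | "webhook_ok" | "webhook_failed";

function reduce(state: PayState, event: PayEvent): PayState {
  switch (event) {
    case "pay_clicked":
      // Optimistic: show the confirmation state before Stripe answers.
      return state === "idle" ? "confirming" : state;
    case "webhook_ok":
      return state === "confirming" ? "confirmed" : state;
    case "webhook_failed":
      // Roll back with an error if payment actually failed.
      return state === "confirming" ? "failed" : state;
    default:
      return state;
  }
}
```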
Personalization Layer
Session events stream into an append-only events table. A nightly cron job rebuilds the user-product interaction matrix from the trailing 30 days, applying a preference decay function that weights recent events more heavily than older ones. A view from three days ago counts for less than a purchase from yesterday.
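The decay weighting can be sketched as exponential decay over event age. The half-life and per-event weights below are illustrative assumptions — the write-up doesn't specify them — but the sketch preserves the property described above: a purchase from yesterday outweighs a view from three days ago.

```typescript
// Sketch: score one session event as (event weight) x (time decay).
// Weight halves every `halfLifeDays`; weights and half-life are assumed.
const EVENT_WEIGHT: Record<string, number> = {
  view: 1,
  wishlist: 2,
  add_to_cart: 3,
  purchase: 5,
};

function eventScore(type: string, ageDays: number, halfLifeDays = 7): number {
  const decay = Math.pow(0.5, ageDays / halfLifeDays);
  return (EVENT_WEIGHT[type] ?? 0) * decay;
}
```

The nightly rebuild would sum these scores per (user, product) pair over the trailing 30 days to fill the interaction matrix.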
When the user matrix is too sparse for reliable collaborative filtering — new users, infrequent visitors — the engine falls back to item-based similarity using the pgvector index directly. This fallback is transparent to the user and produces reasonable results without requiring any session history.
Key Technical Decisions
PostgreSQL + pgvector over Pinecone
Pinecone is purpose-built for vector search but adds a managed service dependency, a separate billing relationship, and a network hop. pgvector with HNSW gave us 95% of Pinecone's recall at p99 latency under 120ms — well within interactive budget. The operational simplicity of staying on one database was worth that 5%.
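The pgvector lookup itself is a plain SQL query. A sketch, assuming a `products` table with an `embedding vector(1536)` column and an HNSW index built with cosine distance (`vector_cosine_ops`) — table and column names are assumptions:

```typescript
// pgvector accepts vectors as '[x1,x2,...]' text literals, and `<=>`
// is its cosine-distance operator, which the HNSW index accelerates.
function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Parameterised k-NN query: $1 is the query vector literal, $2 is k.
// Similarity is 1 minus cosine distance.
const knnQuery = `
  SELECT sku, 1 - (embedding <=> $1::vector) AS similarity
  FROM products
  ORDER BY embedding <=> $1::vector
  LIMIT $2
`;
```

Keeping this in the same Postgres instance as orders and sessions is exactly the operational-simplicity trade described above: one database, one backup story, no extra network hop.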
Stripe Checkout over a custom payment form
A custom form gives full UI control and marginally better conversion on brand-trust metrics. But it also means PCI SAQ A-EP compliance, custom Apple Pay integration, and handling 15+ edge cases in card validation. Stripe Checkout's hosted page converted 8% better than our custom form in the test — partially because Stripe prefills known cards, partially because users trust the Stripe brand at payment time.
Collaborative filtering over content-based recommendations
Content-based recommendations (similar style, color, category) are reliable from day one but plateau quickly — they surface what's similar, not what actually converts. Collaborative filtering needs 30 days of session data to outperform content-based, but once it has it, CTR was 40% higher in holdout testing. The cold-start period was worth the payoff.
Results
The checkout funnel tells the story. Checkout completion rose from 28% to 37% — a 34% relative lift. The biggest drop-off reduction came between "Checkout Start" and "Payment," where Stripe's hosted UI replaced a six-field form with a prefilled Stripe-native experience. Cart abandonment dropped from 72% to 61%.
Checkout Funnel — Conversion Rate by Stage (%)
Recommendation-driven revenue started at 2% of total in month 1 — cold-start, mostly bestseller fallbacks — and grew to 28% by month 8 as the collaborative filtering model accumulated session data. Average order value on recommendation paths rose from $68 to $157. Customers who engaged with recommendations bought more items per session and bought from higher-margin categories.
Recommendation-Driven Revenue Share — % of Total
The merchandising manager's 20-hour weekly workload dropped to under 4 hours. Homepage collections now update automatically based on trending signals from the session event stream. Manual curation shifted from maintenance to strategy.
What Worked
pgvector retrieval speed. HNSW indexing on 12,000 embeddings returned nearest neighbors in under 20ms. Combined with the Redis cache, recommendation API p99 latency stayed under 120ms throughout the 8 months.
Stripe's hosted checkout. Offloading PCI compliance and Apple Pay to Stripe's hosted page removed three weeks of engineering work and immediately added a payment method that 18% of mobile users chose.
A/B testing from day one. Running the recommendation and checkout experiments simultaneously — not sequentially — let us see interaction effects. Better recommendations lifted checkout completion on their own, and the two effects compounded.
Bestseller fallback for cold-start. New visitors saw category bestsellers instead of personalized recommendations. Simple, not embarrassing, and avoids showing obviously wrong results to first-time users.
What I'd Reconsider
The cold-start handling was a hack. Showing category bestsellers to new visitors works, but it's indistinguishable from a non-personalized storefront. A better approach: use purchase history from email capture at registration to bootstrap a preference vector before the first session, cutting the cold-start window from 30 days to near-zero.
No offline evaluation before shipping. We built the recommendation model and shipped it to the A/B test without an offline evaluation dataset. We had no ground truth for expected precision or recall before going live. Building an evaluation harness first — using historical purchase data as held-out labels — would have let us tune the model without burning real user traffic.
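The harness we should have built first is small. A sketch of the core metric — precision@k against held-out purchases — with data shapes that are assumptions, not the eventual evaluation code:

```typescript
// Sketch: precision@k for one user. `recommended` is the model's ranked
// SKU list; `purchased` is that user's held-out purchase set from
// historical data. Averaging this over users gives the offline score.
function precisionAtK(
  recommended: string[],
  purchased: Set<string>,
  k: number
): number {
  const topK = recommended.slice(0, k);
  const hits = topK.filter((sku) => purchased.has(sku)).length;
  return topK.length ? hits / topK.length : 0;
}
```

With this in place, model and blend-weight changes can be compared offline before any variant touches live traffic.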
Redis TTL too aggressive. A 5-minute cache TTL caused unnecessary cache churn for returning visitors browsing across sessions. Recommendation vectors for returning users are stable over hours, not minutes. A 30-minute TTL would have reduced cache misses by 80% with negligible staleness cost.
Built with: Next.js · TypeScript · Stripe · PostgreSQL · Redis · Vercel