DTC Fashion Storefront - AI Recommendations and Checkout That Converts
How we rebuilt a fashion brand's product discovery and checkout flow using collaborative filtering, pgvector embeddings, and Stripe - lifting checkout completion by 22% and growing recommendation-driven revenue to 14% of total.
22%
Checkout Conversion Lift
1.4x
Avg Order Value (Rec-Engaged)
<120ms
Rec P95 (warm cache)
14%
Revenue from Recs
The Problem
The brand's merchandising manager was spending 20 hours a week manually updating homepage collections. Meanwhile, 72% of customers who added items to their cart left without buying, after browsing 15 to 20 products each. Traffic had grown 40% over eight months. Revenue hadn't moved.
The product discovery experience was static: the same bestsellers shown to every visitor regardless of behavior, style preference, or purchase history. A customer who'd bought three minimalist black pieces got the same homepage as someone who'd bought colorful printed dresses. The site had no memory.
Checkout made it worse. Six steps, redundant address fields, no saved payment methods, no Apple Pay. Customers who made it to checkout still abandoned at a 72% rate. The problem wasn't acquisition; the brand was spending heavily on paid social and it was working. The problem was that the site couldn't close.
The Approach
Two tracks ran sequentially. The first: an AI recommendation engine using collaborative filtering over session events, with pgvector embeddings as the similarity backbone. The second: a checkout redesign using Stripe's hosted UI to collapse six steps to three and add Apple Pay and Google Pay without touching PCI scope.
Both tracks ran as separate A/B experiments with 50/50 splits and 3-week windows. We ran them sequentially (checkout redesign first, then recommendations) because running both simultaneously would have made attribution impossible at our traffic volume (~6,000 monthly sessions).
The recommendation engine had one hard constraint: it needed to handle cold-start for new visitors before any session data existed. Showing obviously wrong recommendations to first-time visitors is worse than showing nothing personalized at all.
Architecture
Recommendation Engine
Product embeddings were generated using OpenAI's text-embedding-3-small model across 12,000 SKUs. Title, description, category, and material attributes were concatenated into a single embedding input. These vectors are stored in PostgreSQL with the pgvector extension, indexed with HNSW for approximate nearest-neighbor retrieval.
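A minimal sketch of the embedding-input assembly, assuming an illustrative product shape (the field names and label format are not the production schema):

```typescript
// Illustrative product shape; field names are assumptions, not the real schema.
interface Product {
  title: string;
  description: string;
  category: string;
  materials: string[];
}

// Concatenate the attributes that matter for similarity into one string,
// which would then be sent to text-embedding-3-small.
function buildEmbeddingInput(p: Product): string {
  return [
    `Title: ${p.title}`,
    `Category: ${p.category}`,
    `Materials: ${p.materials.join(", ")}`,
    `Description: ${p.description}`,
  ].join("\n");
}
```

Keeping the input to a stable, labeled template makes the 12,000 vectors comparable; changing the template means re-embedding the whole catalog.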
Collaborative filtering runs over a 30-day rolling window of session events: views, add-to-carts, purchases, and wishlist additions. At request time, the engine scores candidate products by combining item-based similarity (from the embedding index) with user-based collaborative signals (from the interaction matrix). Results are cached in Redis with a 4-hour TTL keyed by session ID. Warm-cache P95 stayed under 120ms; cold-cache P95 was ~280ms, acceptable given a 91% cache hit rate. New visitors with no session history fall back to category bestsellers: simple, not embarrassing.
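The request-time scoring can be sketched as a weighted blend with a cold-start branch. The 0.6/0.4 weights, helper names, and input shapes are assumptions for illustration, not the tuned production values:

```typescript
type Scored = { sku: string; score: number };

// Blend item-based similarity with the collaborative signal; fall back to
// category bestsellers when the session has no history. Weights are illustrative.
function scoreCandidates(
  itemSimilarity: Map<string, number>, // from the pgvector index
  collabSignal: Map<string, number>,   // from the interaction matrix
  bestsellers: string[],               // category bestsellers for cold start
  hasSessionHistory: boolean,
  k = 10,
): Scored[] {
  if (!hasSessionHistory) {
    // Cold start: simple, not embarrassing.
    return bestsellers.slice(0, k).map((sku) => ({ sku, score: 0 }));
  }
  const skus = new Set<string>([...itemSimilarity.keys(), ...collabSignal.keys()]);
  const scored = [...skus].map((sku) => ({
    sku,
    score: 0.6 * (itemSimilarity.get(sku) ?? 0) + 0.4 * (collabSignal.get(sku) ?? 0),
  }));
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

The blended result is what gets written to Redis under the session-keyed cache entry, so the sort cost is only paid on cache misses.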
Checkout Flow
Stripe Checkout's hosted page replaced the custom six-field form. Stripe handles PCI compliance, Apple Pay, Google Pay, and card prefill for returning Stripe customers. The order lifecycle runs as a state machine: pending → payment_processing → paid → fulfillment → shipped. State transitions are driven by Stripe webhooks, with idempotency keys on every payment operation to prevent double-charges on webhook retries.
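The lifecycle above can be sketched as a transition table plus an idempotency check. The function and variable names are illustrative, and the in-memory `Set` stands in for whatever persistent idempotency store the webhook handler would actually use:

```typescript
type OrderState = "pending" | "payment_processing" | "paid" | "fulfillment" | "shipped";

// Legal next states for each state: pending → payment_processing → paid
// → fulfillment → shipped, with no skips allowed.
const transitions: Record<OrderState, OrderState[]> = {
  pending: ["payment_processing"],
  payment_processing: ["paid"],
  paid: ["fulfillment"],
  fulfillment: ["shipped"],
  shipped: [],
};

const seenEvents = new Set<string>(); // stands in for a persistent idempotency store

// Apply a webhook-driven transition. Replayed events (same idempotency key)
// and illegal transitions leave the order state untouched.
function applyTransition(state: OrderState, next: OrderState, idempotencyKey: string): OrderState {
  if (seenEvents.has(idempotencyKey)) return state; // webhook retry: no-op
  if (!transitions[state].includes(next)) return state; // illegal skip: reject
  seenEvents.add(idempotencyKey);
  return next;
}
```

Because Stripe retries webhooks until acknowledged, treating a replayed event key as a no-op is what prevents the double-charge and double-fulfillment cases.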
The frontend uses optimistic UI updates throughout. When a customer clicks "Pay," the UI transitions immediately to a confirmation state. If the webhook confirms payment, the state persists. If it fails, the UI rolls back with an error. The perceived latency is near-zero even when the Stripe webhook takes a few seconds to arrive.
Personalization Layer
Session events stream into an append-only events table. A nightly cron job rebuilds the user-product interaction matrix from the trailing 30 days, applying a preference decay function that weights recent events more heavily than older ones. A view from three days ago counts for less than a purchase from yesterday.
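The decay weighting can be sketched as a base weight per event type scaled by exponential decay over event age. The half-life and base weights here are assumptions for illustration, not the tuned values:

```typescript
type EventKind = "view" | "wishlist" | "add_to_cart" | "purchase";

// Stronger signals get larger base weights; values are illustrative.
const baseWeight: Record<EventKind, number> = {
  view: 1,
  wishlist: 2,
  add_to_cart: 3,
  purchase: 8,
};

const HALF_LIFE_DAYS = 7; // assumed half-life

// An event's contribution to the interaction matrix halves every HALF_LIFE_DAYS.
function eventWeight(kind: EventKind, ageDays: number): number {
  return baseWeight[kind] * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}
```

Under these assumed weights, a view from three days ago contributes ~0.74 while a purchase from yesterday contributes ~7.25, matching the intended ordering.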
When the user matrix is too sparse for reliable collaborative filtering (new users, infrequent visitors), the engine falls back to item-based similarity using the pgvector index directly. This fallback is transparent to the user and produces reasonable results without requiring any session history.
Key Technical Decisions
PostgreSQL + pgvector over Pinecone
Pinecone is purpose-built for vector search but adds a managed service dependency, a separate billing relationship, and a network hop. pgvector with HNSW gave us 95% of Pinecone's recall at P95 latency under 120ms, well within interactive budget. The operational simplicity of staying on one database was worth that 5%.
Stripe Checkout over a custom payment form
A custom form gives full UI control and marginally better conversion on brand-trust metrics. But it also means PCI SAQ-A-EP compliance, custom Apple Pay integration, and handling 15+ edge cases in card validation. Stripe Checkout's hosted page converted 8% better than our custom form in the test, partially because Stripe prefills known cards, partially because users trust the Stripe brand at payment time.
Collaborative filtering over content-based recommendations
Content-based recommendations (similar style, color, category) are reliable from day one but plateau quickly; they surface what's similar, not what actually converts. Collaborative filtering needs 30 days of session data to outperform content-based, but once it has it, CTR was 40% higher in holdout testing. The cold-start period was worth the payoff. One honest caveat: calling it collaborative filtering is generous. It's really a hybrid of item-based similarity over the pgvector embeddings and session co-occurrence signals, leaning heavily on content similarity for the first 30 days until enough session data accumulates.
Results
The checkout funnel tells the story. Checkout completion rose from 28% to 34%, a 22% relative lift. The biggest drop-off reduction came between "Checkout Start" and "Payment," where Stripe's hosted UI replaced a six-field form with a prefilled Stripe-native experience. Cart abandonment dropped from 72% to 66%.
Checkout Funnel: Conversion Rate by Stage (%)
Recommendation-driven revenue started at 2% of total in month 1 (cold-start, mostly bestseller fallbacks) and grew to 14% by month 6 as the collaborative filtering model accumulated session data; the 6-month figure was reported by the client's analytics team at the post-launch check-in, as I wasn't involved in ongoing monitoring. Customers who engaged with recommendations had 1.4x higher AOV ($68 → $94), though higher-intent shoppers likely self-select into recommendation paths, so we couldn't cleanly isolate causation from selection bias.
Recommendation-Driven Revenue Share: % of Total
The merchandising manager's 20-hour weekly workload dropped to 6 hours a week. Homepage collections now update automatically based on trending signals from the session event stream. Manual curation shifted from maintenance to strategy.
The brand's existing design team handled the visual merchandising and product photography. I built the recommendation engine and checkout integration; they owned the look and feel of how recommendations were displayed.
What Worked
pgvector retrieval speed. HNSW indexing on 12,000 embeddings returned nearest neighbors in under 20ms. Combined with the Redis cache, recommendation API P95 latency stayed under 120ms on warm cache throughout the 8 months.
Stripe's hosted checkout. Offloading PCI compliance and Apple Pay to Stripe's hosted page removed three weeks of engineering work and immediately added a payment method that 18% of mobile users chose.
A/B testing from day one. Running the recommendation and checkout experiments sequentially gave us clean attribution. We could isolate the checkout redesign's impact before layering in recommendation changes.
Bestseller fallback for cold-start. New visitors saw category bestsellers instead of personalized recommendations. Simple, not embarrassing, and avoids showing obviously wrong results to first-time users.
What Didn't Work
The first version of the recommendation engine surfaced visually similar items, which sounds right until you realize that showing someone who just bought black jeans four more pairs of black jeans is useless. The model was over-indexing on color and silhouette because those dominated the embedding input. Adding 'frequently bought together' co-occurrence signals from purchase history fixed the diversity problem, but it took two weeks of underwhelming A/B results before we diagnosed the cause.
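The diversity fix can be sketched as a blended score that discounts near-duplicates of the anchor item and boosts co-occurrence. The 0.3/0.7 blend, the 0.95 similarity cutoff, and the 0.5 penalty are assumptions for illustration, not the shipped values:

```typescript
// Score a candidate relative to an anchor item (e.g. the black jeans just
// bought). High embedding similarity alone no longer wins: near-duplicates
// are penalized and "frequently bought together" lift dominates the blend.
function diversifiedScore(
  embeddingSim: number, // similarity to the anchor item, 0..1
  coOccurrence: number, // normalized bought-together lift, 0..1
): number {
  const penalty = embeddingSim > 0.95 ? 0.5 : 1; // a fifth pair of black jeans
  return penalty * (0.3 * embeddingSim + 0.7 * coOccurrence);
}
```

With these assumed weights, a belt frequently bought with black jeans (sim 0.6, co-occurrence 0.8) outscores a near-identical pair of jeans (sim 0.98, co-occurrence 0.1), which is exactly the behavior the first version lacked.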
What I'd Reconsider
The cold-start handling was a hack. Showing category bestsellers to new visitors works, but it's indistinguishable from a non-personalized storefront. A better approach: use purchase history from email capture at registration to bootstrap a preference vector before the first session, cutting the cold-start window from 30 days to near-zero.
No offline evaluation before shipping. We built the recommendation model and shipped it to the A/B test without an offline evaluation dataset. We had no ground truth for expected precision or recall before going live. Building an evaluation suite first, using historical purchase data as held-out labels, would have let us tune the model without burning real user traffic.
Redis cache TTL uniformity. The 4-hour blanket TTL works but treats all sessions identically. For returning users with stable preference vectors, a longer TTL of 12 to 24 hours would reduce unnecessary cache rebuilds. For first-session visitors whose preferences update rapidly as they browse, a shorter TTL allows faster preference capture. A per-cohort TTL strategy would have improved both cache efficiency and recommendation freshness from the start.
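The per-cohort TTL idea above can be sketched as a small selection function. The cohort cutoffs and TTL values are illustrative, not measured optima:

```typescript
// Pick a Redis TTL (seconds) per session cohort instead of a blanket 4 hours.
// Cutoffs and TTLs are assumptions for illustration.
function cacheTtlSeconds(sessionCount: number, eventsThisSession: number): number {
  if (sessionCount <= 1 && eventsThisSession < 10) return 15 * 60; // first session, preferences still forming
  if (sessionCount >= 5) return 12 * 60 * 60;                      // returning user, stable preference vector
  return 4 * 60 * 60;                                              // default: the original blanket TTL
}
```

First-session visitors get a 15-minute TTL so the cache tracks their rapidly shifting preferences, while established users keep entries for 12 hours and skip most rebuilds.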
Built with: Next.js · TypeScript · Stripe · PostgreSQL · Redis · Vercel