SaaS Analytics Dashboard — Real-Time Data at Scale
A full-stack analytics platform processing 2M+ events daily with sub-200ms query times, role-based team workspaces, and live WebSocket updates — built from scratch in 12 weeks.
2M+ Daily Events · <200ms Query Response · 68% Less Manual Work · 99.9% Uptime
The Problem
The team was drowning. Four departments, twelve analysts, and a reporting workflow built entirely on spreadsheets that nobody trusted. Every Monday morning started the same way: someone would pull raw exports from three different tools, paste them into a shared Google Sheet, and spend the next two hours reconciling numbers that never quite matched. By the time the report landed in the weekly all-hands, the data was already five days old.
Business decisions were being made on week-old data. The sales team was optimizing campaigns based on last week's conversion rates. Operations was staffing shifts using last month's demand patterns. Finance was forecasting on numbers that had already been revised twice before anyone saw them.
The manual reporting burden was staggering. Each analyst was spending 12+ hours per week just producing reports — not analyzing them, not acting on them, just producing them. That's 30% of a full-time role consumed by copy-paste work that a well-designed system should handle automatically.
The ask was clear: build a real-time analytics platform that ingests events from every product touchpoint, makes the data queryable in under 200ms, and gives each team a workspace tailored to their specific metrics. No more spreadsheets. No more Monday morning reconciliation. No more decisions made on stale data.
The Approach
The architecture had to solve three distinct problems simultaneously. First, ingestion: events arrive in bursts, and the system needs to handle spikes without dropping data or degrading query performance. Second, querying: analysts need to slice data by arbitrary dimensions — date ranges, user segments, product areas — and get results fast enough that exploration feels interactive. Third, real-time: dashboards need to update as events arrive, not on a polling interval.
I chose an event-driven architecture with PostgreSQL as the primary store. Events flow through an ingestion API, get written to an append-only events table, and then get aggregated into materialized views that power the dashboard queries. A WebSocket server sits alongside the HTTP API and pushes updates to connected clients whenever new aggregations are computed. Redis handles the pub/sub layer between the aggregation workers and the WebSocket server.
Role-based access control was a first-class concern from day one. Each team gets a workspace with its own set of dashboards, and row-level security in PostgreSQL ensures that a sales analyst can't accidentally query finance data. The RBAC model maps directly to database policies — no application-layer filtering that could be bypassed.
Architecture
Data Pipeline
Events arrive at the ingestion API as JSON payloads. The API validates the schema, assigns a timestamp, and writes to the events table. This table is append-only — no updates, no deletes. Every event that ever happened is preserved exactly as it arrived.
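A minimal sketch of what that ingestion route could look like, assuming a Next.js App Router handler, zod for schema validation, and node-postgres; the event fields and table columns are illustrative, not the project's actual schema.

```typescript
// Hypothetical ingestion route (Next.js App Router). Field names are assumptions.
import { z } from "zod";
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Assumed event shape; the real schema is not shown in the case study.
const eventSchema = z.object({
  eventType: z.string(),
  userId: z.string(),
  payload: z.record(z.unknown()).default({}),
});

export async function POST(req: Request): Promise<Response> {
  const parsed = eventSchema.safeParse(await req.json());
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }

  // Append-only insert: the server assigns the timestamp, and there is no
  // update or delete path for this table.
  await pool.query(
    `INSERT INTO events (event_type, user_id, payload, received_at)
     VALUES ($1, $2, $3, now())`,
    [parsed.data.eventType, parsed.data.userId, parsed.data.payload]
  );

  return Response.json({ ok: true }, { status: 202 });
}
```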
The aggregation layer runs as a background worker. Every 30 seconds, it reads new events since the last run and updates a set of materialized views: daily_event_counts, hourly_event_counts, user_session_summaries, and funnel_step_completions. These views are the query targets for the dashboard. They're pre-aggregated, indexed, and sized to return results in single-digit milliseconds.
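A sketch of the refresh loop, assuming the view names above and a node-postgres pool. The worker described here updates views incrementally; for brevity this sketch uses a full `REFRESH MATERIALIZED VIEW CONCURRENTLY`, which is one way to keep the views readable during a refresh (it requires a unique index on each view).

```typescript
// Background aggregation worker sketch; error handling trimmed for brevity.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const VIEWS = [
  "daily_event_counts",
  "hourly_event_counts",
  "user_session_summaries",
  "funnel_step_completions",
];

async function refreshViews(): Promise<void> {
  for (const view of VIEWS) {
    // CONCURRENTLY keeps the view queryable while it refreshes.
    await pool.query(`REFRESH MATERIALIZED VIEW CONCURRENTLY ${view}`);
  }
}

// The 30-second cycle described above.
setInterval(() => {
  refreshViews().catch((err) => console.error("aggregation run failed", err));
}, 30_000);
```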
The query API sits on top of these views. When an analyst filters by date range and user segment, the query hits a materialized view with a composite index on (date, segment_id). No full table scans. No real-time aggregation at query time. The heavy lifting happens in the background worker, not in the request path.
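A sketch of a query handler against one of those views; parameter names and columns are assumptions chosen to match the (date, segment_id) index described above.

```typescript
// Illustrative query-API route: filter a pre-aggregated view so the composite
// (date, segment_id) index can satisfy the query. Names are assumptions.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function GET(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const from = url.searchParams.get("from"); // e.g. 2024-01-01
  const to = url.searchParams.get("to");
  const segmentId = url.searchParams.get("segmentId");

  if (!from || !to || !segmentId) {
    return Response.json({ error: "from, to, segmentId are required" }, { status: 400 });
  }

  // No aggregation happens here; the view is already rolled up per day.
  const { rows } = await pool.query(
    `SELECT date, event_count
       FROM daily_event_counts
      WHERE date BETWEEN $1 AND $2
        AND segment_id = $3
      ORDER BY date`,
    [from, to, segmentId]
  );

  return Response.json({ data: rows });
}
```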
Real-Time Layer
The WebSocket server runs as a separate process alongside the Next.js API routes. When the aggregation worker finishes a run, it publishes a message to a Redis channel: aggregation:complete. The WebSocket server subscribes to this channel and broadcasts an update event to all connected clients.
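A minimal sketch of that standalone process, assuming the `ws` and `ioredis` packages; the channel name comes from the description above, everything else is illustrative.

```typescript
// WebSocket fan-out process sketch: subscribe to Redis, broadcast a signal.
import { WebSocketServer, WebSocket } from "ws";
import Redis from "ioredis";

const wss = new WebSocketServer({ port: 4001 });
const subscriber = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// The aggregation worker publishes here when a run finishes, e.g.
// publisher.publish("aggregation:complete", "done").
void subscriber.subscribe("aggregation:complete");

subscriber.on("message", (_channel, message) => {
  // Broadcast a lightweight signal; clients re-fetch over the HTTP query API.
  const signal = JSON.stringify({ type: "update", detail: message });
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(signal);
  }
});
```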
On the client side, each dashboard component subscribes to the relevant data keys. When an update arrives, the component re-fetches its data from the query API. This is intentionally not a push model for the data itself — the WebSocket carries a signal, not a payload. The actual data comes from the same HTTP query API that handles initial page loads. This keeps the real-time path simple and avoids the complexity of streaming large datasets over WebSocket connections.
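A client-side sketch of that signal-then-refetch pattern: the WebSocket message only tells the component that fresh aggregates exist, and the data itself still comes from the HTTP query API. The hook name and both URLs are illustrative.

```typescript
// Hypothetical React hook: re-fetch from the query API whenever a signal arrives.
import { useEffect, useState } from "react";

export function useLiveQuery<T>(queryUrl: string, wsUrl: string): T | null {
  const [data, setData] = useState<T | null>(null);

  useEffect(() => {
    let active = true;
    const load = () =>
      fetch(queryUrl)
        .then((res) => res.json())
        .then((json) => {
          if (active) setData(json.data as T);
        })
        .catch(() => {});

    load(); // initial page load goes through the same query API
    const socket = new WebSocket(wsUrl);
    socket.onmessage = () => {
      load(); // signal received: re-fetch, don't parse a data payload
    };

    return () => {
      active = false;
      socket.close();
    };
  }, [queryUrl, wsUrl]);

  return data;
}
```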
Optimistic updates handle the gap between user actions and confirmed data. When an analyst applies a filter, the UI updates immediately with a loading state, then replaces it with real data when the query resolves. The perceived latency is near-zero even when the actual query takes 150ms.
Role-Based Access Control
PostgreSQL row-level security policies enforce access at the database level. Each workspace has a workspace_id, and every table that contains workspace-specific data has an RLS policy that checks current_setting('app.workspace_id') against the row's workspace_id column.
The application sets this session variable at the start of every database connection using the authenticated user's workspace membership. Even if there's a bug in the application layer that constructs the wrong query, the database will refuse to return rows from the wrong workspace. Defense in depth.
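A rough sketch of both halves, with the policy DDL kept as a SQL string. Table and setting names mirror the description above; the rest is an assumption, and the session variable is set per transaction here so it plays nicely with a connection pool (a session-level SET at connection start works the same way).

```typescript
// Row-level-security sketch: policy DDL plus a workspace-scoped query helper.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// One-time DDL: every workspace-scoped table gets a policy like this.
export const rlsDdl = `
  ALTER TABLE events ENABLE ROW LEVEL SECURITY;

  CREATE POLICY workspace_isolation ON events
    USING (workspace_id::text = current_setting('app.workspace_id'));
`;

// Per-request: pin the connection to the caller's workspace before querying.
export async function queryAsWorkspace<T>(
  workspaceId: string,
  sql: string,
  params: unknown[] = []
): Promise<T[]> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // set_config(..., true) scopes the setting to the current transaction.
    await client.query("SELECT set_config('app.workspace_id', $1, true)", [workspaceId]);
    const { rows } = await client.query(sql, params);
    await client.query("COMMIT");
    return rows as T[];
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```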
Key Technical Decisions
PostgreSQL Materialized Views vs Elasticsearch
The obvious choice for an analytics system at this scale is Elasticsearch. It's purpose-built for aggregations, handles high write volumes gracefully, and has a rich query DSL. I chose PostgreSQL instead, and I'd make the same choice again.
The team already ran PostgreSQL for their transactional data. Adding Elasticsearch would mean a second operational system to monitor, back up, and keep in sync. The operational overhead of a dual-database architecture is real, and for a team without dedicated infrastructure engineers, it's a significant ongoing cost.
PostgreSQL materialized views gave us 90% of what Elasticsearch would have provided, with zero additional infrastructure. The queries we needed — time-series aggregations, filtered counts, funnel analysis — are all expressible in SQL and fast enough with proper indexing. The 10% we gave up was full-text search across event payloads, which wasn't a requirement.
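To illustrate the point, here is roughly what one of those pre-aggregated views could look like, kept as a SQL string; the grouping columns are assumptions based on the filters described earlier, not the actual view definition.

```typescript
// Illustrative materialized-view DDL; columns and grouping are assumptions.
export const dailyEventCountsDdl = `
  CREATE MATERIALIZED VIEW daily_event_counts AS
  SELECT received_at::date AS date,
         segment_id,
         event_type,
         count(*) AS event_count
    FROM events
   GROUP BY 1, 2, 3;

  -- Composite index that the dashboard's date-range and segment filters hit.
  CREATE UNIQUE INDEX ON daily_event_counts (date, segment_id, event_type);
`;
```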
WebSocket vs Server-Sent Events
SSE would have been simpler. It's unidirectional, works over standard HTTP, and doesn't require a separate server process. For a read-only dashboard that just needs to receive updates, SSE is often the right call.
I chose WebSocket because the product roadmap included collaborative features: analysts sharing dashboard states, commenting on data points, and triggering shared exports. A bidirectional connection was going to be necessary eventually. Building on WebSocket from the start avoided a migration later.
Recharts vs D3
D3 is the right tool when you need custom visualizations that don't fit standard chart types. For this project, every visualization was a standard type: line charts, bar charts, area charts, funnel charts. Recharts wraps D3 with a React-friendly API and handles the SVG rendering, axis scaling, and tooltip logic that would otherwise require hundreds of lines of D3 code.
The tradeoff is flexibility. Recharts makes standard charts easy and custom charts hard. For this project, that tradeoff was correct. The analysts needed reliable, readable charts — not novel visualizations.
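For a sense of scale, a standard line chart in Recharts is a handful of declarative components. This sketch assumes the daily-counts shape returned by the query API; the component name and props are illustrative.

```typescript
// Hypothetical dashboard chart built from standard Recharts components.
import { LineChart, Line, XAxis, YAxis, Tooltip, CartesianGrid } from "recharts";

type DailyCount = { date: string; eventCount: number };

export function DailyEventsChart({ data }: { data: DailyCount[] }) {
  return (
    <LineChart width={720} height={280} data={data}>
      <CartesianGrid strokeDasharray="3 3" />
      <XAxis dataKey="date" />
      <YAxis />
      <Tooltip />
      <Line type="monotone" dataKey="eventCount" dot={false} />
    </LineChart>
  );
}
```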
Results
The platform launched to all four departments in week 12. The impact was immediate and measurable.
Daily Event Volume — 12-Month Growth
Event volume scaled from 45,000 events per day at launch to 2.1 million by month 12. The architecture handled this 46x growth without any schema changes or infrastructure upgrades. The materialized view approach meant that query performance actually improved as we added indexes tuned to the most common query patterns.
Weekly Reporting Hours — Before vs After Dashboard
The reporting time reduction was the metric that mattered most to the business. Sales went from 8 hours of weekly reporting to 1.5 hours. Marketing dropped from 12 hours to 2. Operations, which had the most complex reporting requirements, went from 15 hours to 3. Finance, with the most structured data, achieved the biggest relative reduction: 10 hours down to 1.
Across all four departments, the platform eliminated 68% of manual reporting work. That's roughly 30 analyst-hours per week returned to actual analysis.
Query response times stayed under 200ms at the 95th percentile throughout the growth period. The materialized view refresh cycle runs every 30 seconds, so the maximum data lag is 30 seconds — down from the previous lag of 5 to 7 days.
Uptime has been 99.9% since launch. The only downtime was a planned 4-minute maintenance window for a PostgreSQL minor version upgrade.
What Worked
The materialized view pattern. Pre-aggregating data in the background and serving queries from indexed views was the right call. Query performance was predictable from day one, and the aggregation logic lives in SQL that any engineer on the team can read and modify.
Row-level security at the database layer. Enforcing access control in PostgreSQL rather than the application layer meant that workspace isolation was guaranteed regardless of application bugs. We caught two application-layer query construction bugs in testing that would have leaked data without RLS.
WebSocket as a signal, not a data pipe. Keeping the real-time update mechanism simple — a signal that triggers a re-fetch rather than a streaming data payload — made the system much easier to reason about. The query API is the single source of truth for data, and the WebSocket just tells clients when to ask for fresh data.
TypeScript end-to-end. Shared types between the ingestion API, query API, and frontend meant that schema changes propagated as compiler errors rather than runtime surprises. When we added a new event type in month 4, the TypeScript compiler identified every place in the codebase that needed updating.
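For illustration, a shared event type along these lines is enough to get that behavior; the variants here are hypothetical.

```typescript
// Illustrative shared event type, imported by the ingestion route, the query
// API, and the dashboard components. Adding a new member to this union
// surfaces every non-exhaustive switch downstream as a compile error.
export type AnalyticsEvent =
  | { type: "page_view"; userId: string; path: string; receivedAt: string }
  | { type: "signup"; userId: string; plan: string; receivedAt: string };
```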
What I'd Reconsider
The 30-second aggregation cycle. For most use cases, 30-second data lag is fine. But the sales team started using the dashboard for live campaign monitoring, where they wanted to see conversion rates update in near-real-time. A configurable refresh interval — or a separate fast-path for high-priority metrics — would have served them better.
Redis for pub/sub only. We're running a Redis instance solely for the WebSocket pub/sub channel. That's operational overhead for a single use case. In retrospect, PostgreSQL's LISTEN/NOTIFY mechanism would have handled this without an additional service. The latency characteristics are similar, and it would have kept the infrastructure simpler.
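A rough sketch of what that swap would look like with node-postgres; the channel is renamed with an underscore because unquoted LISTEN identifiers can't contain a colon, and the rest is illustrative.

```typescript
// LISTEN/NOTIFY alternative to the Redis pub/sub channel.
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });

export async function listenForAggregations(onComplete: (payload: string) => void) {
  await client.connect();
  await client.query("LISTEN aggregation_complete");
  client.on("notification", (msg) => {
    if (msg.channel === "aggregation_complete") onComplete(msg.payload ?? "");
  });
}

// The worker side would emit: SELECT pg_notify('aggregation_complete', 'done');
```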
No query result caching. The materialized views are fast, but identical queries from multiple users still hit the database. Adding a short-lived cache (30 seconds, matching the aggregation cycle) in front of the query API would reduce database load significantly at higher user counts. It wasn't a problem at current scale, but it's the obvious next optimization.
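A sketch of that cache: results keyed by the query parameters and expired on the same 30-second cycle as the aggregation worker. An in-memory Map is an assumption for illustration; a shared Redis cache would follow the same shape.

```typescript
// Short-lived query-result cache keyed by query parameters.
type CacheEntry<T> = { value: T; expiresAt: number };

const TTL_MS = 30_000; // matches the aggregation cycle
const cache = new Map<string, CacheEntry<unknown>>();

export async function cachedQuery<T>(
  key: string,
  run: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T;

  const value = await run();
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}

// Usage: cachedQuery(`daily:${from}:${to}:${segmentId}`, () => runDatabaseQuery())
```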
Built with: Next.js · TypeScript · PostgreSQL · Redis · WebSockets · Recharts · Vercel