How to measure experiment results without a data team
You don't need a data scientist to measure experiment results. This guide shows you practical, good-enough approaches that work for small teams with simple tools like spreadsheets and basic analytics.
The biggest myth in growth experimentation is that you need statistical rigor to learn anything useful. Yes, Netflix and Booking.com run experiments with millions of users and 95% confidence intervals. You're not Netflix. You need to learn fast with limited traffic, and that requires different approaches.
The approaches below work when you have hundreds of users, not millions. They won't pass peer review, but they'll help you make better decisions than guessing, which is the only real alternative for most founders.
The before/after method: your simplest tool
The most accessible measurement method is before/after comparison. Measure your metric for a defined period, make a change, then measure for a period of the same length afterward. If signups averaged 12 per day for two weeks, you change the landing page, and signups average 18 per day for the next two weeks, that's a signal worth paying attention to.
The obvious weakness is that other things change too. Maybe a blog post went viral during your "after" period. Maybe it's a seasonal effect. You can partially control for this by looking at metrics that shouldn't have changed. If overall traffic stayed the same but signups went up, your landing page change is likely the cause.
Use at least a one-week measurement period for each phase. Daily fluctuations are too noisy. Two weeks is better if you can afford the time. And pick periods that are comparable: don't compare a holiday week to a normal week.
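The before/after comparison above is simple enough to do in a spreadsheet, but here is a minimal sketch in Python for anyone who prefers a script. The daily numbers are illustrative, not real data.

```python
# Minimal before/after comparison on daily counts.
# The numbers below are illustrative, not real data.

def percent_change(before: list[float], after: list[float]) -> float:
    """Compare average daily values across two comparable periods."""
    avg_before = sum(before) / len(before)
    avg_after = sum(after) / len(after)
    return (avg_after - avg_before) / avg_before * 100

# Two comparable two-week periods of daily signups
before = [12, 11, 14, 10, 13, 12, 12, 11, 13, 12, 14, 11, 12, 13]
after  = [18, 17, 19, 16, 18, 20, 17, 18, 19, 17, 18, 18, 19, 18]

print(f"{percent_change(before, after):.1f}% change")
```

Averaging over full weeks, as the text recommends, smooths out day-of-week noise that a daily comparison would amplify.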
How to know if your results are real
The question isn't "is this statistically significant?" It's "is this change big enough to matter and consistent enough to trust?" For a solo founder, a 50% improvement in a metric is probably real. A 5% improvement might be noise. Your threshold depends on your sample size.
A rough rule of thumb: if you have fewer than 100 data points per variant, you need to see at least a 20-30% difference to trust it. With 500+ data points, a 10% difference starts to be meaningful. With thousands, even 5% matters. These aren't statistical rules, but they're practical guidelines for making decisions under uncertainty.
Look at consistency over time, not just the average. If your metric improved by 30% overall but bounced wildly day to day, be cautious. If it improved by 20% and was consistently higher every single day, that's a stronger signal even though the magnitude is smaller.
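The rule of thumb and the consistency check can be combined into one rough "should I trust this?" helper. This is a sketch of the heuristics above, not statistics; the 15% threshold for the 100-500 range is an assumed midpoint the text doesn't specify, and the consistency check used here (every "after" day beats every "before" day) is deliberately strict.

```python
# Rough trust check combining the sample-size rule of thumb with a
# day-to-day consistency check. These are practical heuristics from
# the guide above, not statistical tests.

def min_trustworthy_change(samples_per_variant: int) -> float:
    """Smallest percent difference worth trusting at this sample size."""
    if samples_per_variant < 100:
        return 25.0   # guide says 20-30%; 25 splits the difference
    if samples_per_variant < 500:
        return 15.0   # assumed midpoint; the guide doesn't cover this range
    if samples_per_variant < 5000:
        return 10.0   # 500+ data points
    return 5.0        # thousands of data points

def looks_real(pct_change: float, samples: int,
               daily_before: list[float], daily_after: list[float]) -> bool:
    """Big enough to matter AND consistently higher day to day?"""
    big_enough = abs(pct_change) >= min_trustworthy_change(samples)
    consistent = min(daily_after) > max(daily_before)  # strict, simple check
    return big_enough and consistent
```

A 48% lift on 170 signups with every "after" day above every "before" day passes; a 5% lift on 50 data points does not, no matter how it looks day to day.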
Tools you already have
Google Analytics can tell you conversion rates, traffic sources, and user behavior. Set up goals or events for your key actions. Compare time periods using the date range comparison feature. This covers 80% of what you need for experiment measurement.
Your database is your most underrated analytics tool. A simple SQL query counting users who completed an action before and after a date gives you clean, accurate numbers without any tracking setup. If you're using Supabase, Postgres queries are straightforward: SELECT COUNT(*) FROM users WHERE created_at > '2026-01-15' AND completed_onboarding = true.
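For the full before/after picture you need two counts, one on each side of the cutover date. Here is a self-contained sketch using an in-memory SQLite database as a stand-in; the table and column names mirror the Postgres example above, and the rows are made up. Swap in your real connection and ship date.

```python
import sqlite3

# Stand-in for the before/after database query, run against an
# in-memory SQLite table. Schema mirrors the example in the text;
# the inserted rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    created_at TEXT,
    completed_onboarding INTEGER
)""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("2026-01-10", 1), ("2026-01-12", 0),
     ("2026-01-16", 1), ("2026-01-20", 1)],
)

cutover = "2026-01-15"  # the date you shipped the change
query = """SELECT COUNT(*) FROM users
           WHERE created_at {} ? AND completed_onboarding = 1"""
before = conn.execute(query.format("<="), (cutover,)).fetchone()[0]
after = conn.execute(query.format(">"), (cutover,)).fetchone()[0]
print(before, after)
```

In practice you'd also normalize by period length or total signups, since raw counts only compare cleanly when the before and after windows are the same size.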
A spreadsheet is often the best experiment tracker. Create columns for: experiment name, hypothesis, start date, end date, primary metric (before), primary metric (after), percent change, and lessons learned. This simple log becomes incredibly valuable as it grows. After 20 experiments, you'll have real data about what works for your specific product and audience.
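If you'd rather keep the log as a CSV file than a spreadsheet, the same columns translate directly. A small sketch, with percent change computed rather than typed in by hand (the example row is invented):

```python
import csv
import io

# Same columns as the spreadsheet log described above.
COLUMNS = ["experiment", "hypothesis", "start", "end",
           "metric_before", "metric_after", "pct_change", "lessons"]

def log_row(experiment, hypothesis, start, end, before, after, lessons):
    """Build one log entry, computing percent change instead of typing it."""
    pct = round((after - before) / before * 100, 1)
    return dict(zip(COLUMNS, [experiment, hypothesis, start, end,
                              before, after, pct, lessons]))

row = log_row("New landing headline", "Clearer value prop lifts signups",
              "2026-01-01", "2026-01-14", 12, 18, "Benefit-led copy works")

# Write to an in-memory buffer; use a real file path in practice.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```

Computing the percent change instead of entering it manually avoids the most common spreadsheet error: a stale number after you correct a metric.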
When to use proper A/B testing
Graduate to real A/B testing when you have enough traffic to split it meaningfully. The threshold is roughly 1,000+ visitors per week to the page you're testing. Below that, an A/B test will take months to reach significance, and before/after is faster.
Free and affordable tools include PostHog (generous free tier), Statsig (free for small volumes), and the open-source GrowthBook. Google Optimize, long the default free recommendation, was discontinued in 2023, so skip older guides that point to it. These tools handle the randomization, tracking, and statistical analysis for you. The setup takes a couple of hours but saves time on every subsequent experiment.
Even with A/B testing tools, keep your experiment log. The tool tells you that variant B beat variant A by 15%. Your log captures why you ran the test, what you expected, and what the result means for your product strategy. The tool measures; you interpret.
Common measurement mistakes to avoid
Peeking is the most common mistake. You check results daily and stop the experiment as soon as it looks good. The problem is that early results are unreliable. Commit to a timeline before you start and stick to it. If you said two weeks, wait two weeks even if the results look amazing after three days.
Another mistake is measuring too many things. When you track ten metrics, at least one will show a positive result by chance. Pick your primary metric before the experiment starts. Look at secondary metrics for context, but make your go/no-go decision based on the primary one.
Finally, don't ignore negative results. If your experiment made things worse, that's extremely valuable information. It means your assumption was wrong, and understanding why saves you from making similar mistakes. Document negative results with the same rigor as positive ones.