A/B testing is where direct response marketing meets the scientific method. You form a hypothesis, design an experiment, collect data, and draw conclusions. For developers, this should feel natural: it’s the same hypothesis-test-iterate loop you use when debugging code, applied to your marketing. Here’s how to run A/B tests that produce reliable results, not noise.
What A/B Testing Actually Is
An A/B test shows two different versions of a page to different visitors at the same time, then measures which version produces more conversions. Half your visitors see Version A (the control — your current page), and half see Version B (the variant — your hypothesis).
The critical word is “at the same time.” Comparing this week’s conversion rate to last week’s after making a change isn’t A/B testing — it’s guessing. Traffic quality, day of week, seasonal trends, and dozens of other variables change over time. True A/B testing controls for all of these by running both versions simultaneously.
Think of it like this: an A/B test is a controlled experiment with one independent variable (the element you changed), one dependent variable (your conversion metric), and random assignment to treatment groups (which version each visitor sees).
Setting Up Your First Test
Before you touch any testing tool, define three things: what you’re testing, what metric you’re measuring, and what result would convince you to make a change.
What to test: Change one element at a time. If you change the headline and the CTA button simultaneously, you won’t know which change caused the result. Start with high-impact elements: headline, CTA copy, hero section layout, social proof placement, or pricing display.
What to measure: Pick one primary metric. For most SaaS landing pages, that’s signup conversion rate — the percentage of visitors who complete your desired action. Don’t track ten metrics and cherry-pick the one that looks good. Define your primary metric before the test starts.
What constitutes a win: Decide your minimum detectable effect in advance. Is a 5% relative improvement worth the complexity of the change? What about 10%? If your current conversion rate is 3%, a 10% relative improvement means moving to 3.3% — which across 10,000 visitors means 30 extra signups. Decide if that’s meaningful for your business.
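To make the arithmetic concrete, here’s a small helper that turns a relative lift into extra conversions (a sketch; the numbers match the hypothetical example above):

```typescript
// Back-of-the-envelope value of a lift: how many extra conversions a
// relative improvement produces over a given number of visitors.
function extraConversions(
  baselineRate: number, // current conversion rate, e.g. 0.03 for 3%
  relativeLift: number, // relative improvement, e.g. 0.10 for +10%
  visitors: number,
): number {
  const newRate = baselineRate * (1 + relativeLift);
  return visitors * (newRate - baselineRate);
}

// 3% baseline, 10% relative lift, 10,000 visitors -> ~30 extra signups
console.log(extraConversions(0.03, 0.1, 10_000));
```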
Choosing a Testing Tool
You don’t need expensive enterprise software. Here are the best options for indie developers.
PostHog Experiments is free and open-source. It integrates with your existing PostHog analytics, supports feature flags for variant assignment, and has built-in statistical analysis. If you’re already using PostHog for product analytics, this is the obvious choice.
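For illustration, here’s roughly what variant assignment looks like with posthog-js. The flag key, variant name, and headline copy below are hypothetical; PostHog’s experiment UI generates the real ones for you:

```typescript
import posthog from 'posthog-js';

// Replace with your real project key; 'landing-headline' is a made-up flag.
posthog.init('phc_your_project_key', { api_host: 'https://app.posthog.com' });

posthog.onFeatureFlags(() => {
  // PostHog assigns the visitor to a variant and keeps the assignment sticky.
  const variant = posthog.getFeatureFlag('landing-headline');
  const h1 = document.querySelector('h1');
  if (variant === 'test' && h1) {
    h1.textContent = 'Ship faster. Stress less.'; // hypothetical variant copy
  }
});
```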
Vercel Edge Config with middleware lets you split traffic at the edge for Next.js apps. You’ll need to build your own analysis, but the traffic splitting is built into your deployment platform.
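Here’s a minimal sketch of that middleware approach for a Next.js app. The /b route and the cookie name are assumptions; you’d still fire conversion events to your analytics tool separately:

```typescript
// middleware.ts: split landing-page traffic 50/50 with a sticky cookie.
import { NextRequest, NextResponse } from 'next/server';

export function middleware(request: NextRequest) {
  // Reuse an existing assignment so each visitor always sees one version.
  let variant = request.cookies.get('ab-variant')?.value;
  if (variant !== 'a' && variant !== 'b') {
    variant = Math.random() < 0.5 ? 'a' : 'b';
  }

  // Variant B is served from a separate route; variant A sees the page as-is.
  const response =
    variant === 'b'
      ? NextResponse.rewrite(new URL('/b', request.url))
      : NextResponse.next();

  response.cookies.set('ab-variant', variant, { maxAge: 60 * 60 * 24 * 30 });
  return response;
}

export const config = { matcher: '/' }; // only split the landing page
```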
Google Optimize was sunset in 2023, but tools like VWO and Optimizely offer free tiers. VWO’s visual editor lets you create variants without touching code, which is useful for testing copy changes quickly.
For the simplest possible setup, you can build your own. Generate a random number on page load, show variant A or B based on the result, and fire a custom event to your analytics tool (Plausible, PostHog, or even Google Analytics) when the page loads and when a conversion happens. It’s not the most sophisticated approach, but it works for your first few tests.
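A sketch of that roll-your-own setup, with a stub track() function standing in for whatever analytics client you actually use:

```typescript
// Stand-in for your analytics client (e.g. posthog.capture or plausible).
function track(event: string, props: Record<string, string>): void {
  console.log('analytics event:', event, props); // replace with a real call
}

// Assign once and persist, so returning visitors see the same variant.
function getVariant(): 'a' | 'b' {
  const saved = localStorage.getItem('ab-variant');
  if (saved === 'a' || saved === 'b') return saved;
  const variant = Math.random() < 0.5 ? 'a' : 'b';
  localStorage.setItem('ab-variant', variant);
  return variant;
}

const variant = getVariant();
document.body.dataset.variant = variant; // CSS or JS can swap content on this
track('experiment_viewed', { variant }); // fire on page load...

// '#signup-form' is a placeholder: hook your real conversion element here.
document.querySelector('#signup-form')?.addEventListener('submit', () => {
  track('experiment_converted', { variant }); // ...and again on conversion
});
```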
Sample Size: The Math That Matters
The biggest mistake developers make with A/B testing is calling a winner too early. Seeing Version B at 4.2% versus Version A at 3.1% after 100 visitors means absolutely nothing statistically.
Here’s why: with small sample sizes, random variation dominates. Flip a coin 10 times and you might get 7 heads. That doesn’t mean the coin is biased. You need enough flips for the true probability to emerge.
The minimum sample size depends on three factors: your baseline conversion rate, the minimum improvement you want to detect, and your desired statistical confidence level.
For a landing page converting at 3% and a minimum detectable effect of 20% relative improvement (moving from 3% to 3.6%), at 95% confidence and 80% statistical power, you need roughly 13,900 visitors per variant, or about 27,800 total. At 500 visitors per day, that’s an eight-week test.
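If you want to sanity-check a calculator’s output, the standard normal-approximation formula for comparing two proportions is short enough to compute yourself (a sketch):

```typescript
// Per-variant sample size for a two-proportion test (normal approximation).
function sampleSizePerVariant(
  baseline: number,     // current conversion rate, e.g. 0.03
  relativeLift: number, // minimum detectable effect, e.g. 0.20
  zAlpha = 1.96,        // 95% confidence, two-sided
  zBeta = 0.84,         // 80% statistical power
): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeLift);
  const pBar = (p1 + p2) / 2;
  const a = zAlpha * Math.sqrt(2 * pBar * (1 - pBar));
  const b = zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((a + b) ** 2 / (p1 - p2) ** 2);
}

// 3% baseline, 20% relative MDE -> roughly 13,900 visitors per variant
console.log(sampleSizePerVariant(0.03, 0.2));
```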
If those numbers seem large, that’s the reality. With lower traffic, you have two options: test bigger changes that produce larger effects (which need fewer visitors to detect), or accept lower confidence levels. For indie developers with modest traffic, testing dramatic changes — completely different headlines, different page layouts, different offers — is more practical than testing button colors.
Running the Test Correctly
Once your test is live, follow these rules to ensure valid results.
Don’t peek. This is the hardest rule to follow. Checking results daily and stopping the test when one variant “looks good” dramatically inflates your false positive rate. A commonly cited study showed that peeking and stopping early gives you a false positive rate of 25-30% instead of the 5% you think you have. Set a minimum run time and sample size, then don’t look until you’ve hit both.
Run for full weeks. Conversion behavior varies by day of week. A test that runs Monday through Thursday misses weekend traffic patterns. Always run for a minimum of 7 days, ideally 14.
Don’t change anything during the test. No price changes, no new blog posts driving different traffic, no homepage redesigns. Any external change invalidates your results because you’ve introduced confounding variables.
Account for novelty effects. If your existing users see the variant, they might interact differently simply because it’s new, not because it’s better. For landing page tests with mostly new visitors, this is less of a concern. For product changes with existing users, run the test long enough for the novelty to wear off.
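To see why the “don’t peek” rule matters, you can run the experiment on yourself: simulate A/A tests where both variants are identical, peek daily, and count how often a “winner” appears anyway. A sketch, with arbitrary traffic numbers:

```typescript
// A/A simulation: both arms convert at 3%, so any declared winner is a
// false positive. We peek once per day and stop the moment |z| > 1.96
// (the nominal 95% confidence threshold).
function peekedTestFindsWinner(days = 20, perDay = 250, rate = 0.03): boolean {
  let convA = 0;
  let convB = 0;
  let n = 0; // visitors per arm so far
  for (let day = 0; day < days; day++) {
    for (let i = 0; i < perDay; i++) {
      n++;
      if (Math.random() < rate) convA++;
      if (Math.random() < rate) convB++;
    }
    const pooled = (convA + convB) / (2 * n);
    if (pooled === 0) continue; // no conversions yet, nothing to test
    const se = Math.sqrt(pooled * (1 - pooled) * (2 / n));
    if (Math.abs(convA / n - convB / n) / se > 1.96) return true; // bogus "win"
  }
  return false;
}

const runs = 2000;
let falsePositives = 0;
for (let i = 0; i < runs; i++) if (peekedTestFindsWinner()) falsePositives++;
// Prints far more than the 5% a single look at the data would give you.
console.log(`false positive rate: ${((falsePositives / runs) * 100).toFixed(1)}%`);
```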
Analyzing Results
When your test reaches the required sample size and minimum duration, it’s time to analyze.
Most testing tools give you a confidence level. The industry standard is 95%, meaning that if there were truly no difference between the variants, random variation alone would produce a gap this large only about 5% of the time. If your tool says “95% confidence that B is better than A,” you can confidently implement the change.
If your result is inconclusive (say, 70% confidence), it means the difference between variants isn’t large enough to detect with your sample size. This is still useful information — it tells you that the element you changed isn’t a major conversion driver, and you should test something else.
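If you rolled your own harness, the significance check itself is a pooled two-proportion z-test. Here’s a sketch; the normal CDF uses a standard Abramowitz-Stegun approximation:

```typescript
// Two-proportion z-test: returns the two-sided p-value for the difference
// between variant A's and variant B's conversion rates.
function twoProportionPValue(
  conversionsA: number, visitorsA: number,
  conversionsB: number, visitorsB: number,
): number {
  const pA = conversionsA / visitorsA;
  const pB = conversionsB / visitorsB;
  const pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  const z = Math.abs(pA - pB) / se;
  return 2 * (1 - normalCdf(z)); // two-sided
}

// Standard normal CDF (Abramowitz-Stegun polynomial approximation).
function normalCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const tail =
    d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z >= 0 ? 1 - tail : tail;
}

// 417/13,900 (3.0%) vs. 514/13,900 (3.7%): p ~ 0.001, a significant result
console.log(twoProportionPValue(417, 13_900, 514, 13_900));
```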
Always check for segment differences. Maybe Variant B performs better on mobile but worse on desktop. Maybe it converts more signups but those signups have lower activation rates. A headline that attracts the wrong audience can increase signups while decreasing revenue.
What to Test on Your Landing Page (In Priority Order)
Here’s the testing roadmap I recommend for developer product landing pages, ordered by typical impact.
Test 1: Headline. Write three to five dramatically different headlines using the templates from our headline copywriting guide and the developer copywriting playbook. Test the two most different ones first. Headline tests often produce the largest improvements because they affect whether anyone reads the rest of your page.
Test 2: CTA copy and placement. Test different action verbs, different value propositions in the button text, and adding a second CTA higher on the page. “Start Free Trial” versus “See It In Action” versus “Get Your Dashboard” can produce very different results.
Test 3: Social proof. Test adding testimonials versus removing them. Test specific numbers (“12,847 developers”) versus vague ones (“Thousands of developers”). Test placing social proof above the fold versus below.
Test 4: Page length. Test a long, detailed page against a short, punchy one. For expensive products, longer pages typically win. For free tools, shorter pages often convert better.
Test 5: Pricing presentation. If you show pricing on your landing page, test different anchoring strategies, different tier highlighting, and whether showing pricing at all helps or hurts conversion.
Common Mistakes to Avoid
Testing too many things at once. Multivariate testing requires exponentially more traffic. Stick to one change per test until you have tens of thousands of monthly visitors.
Celebrating too early. A 50% improvement on 100 visitors is noise. Wait for significance.
Testing trivial changes. Button color tests are the classic example of wasted effort. Unless you’re Amazon with millions of daily visitors, focus on changes that could produce at least a 20% relative improvement.
Ignoring downstream metrics. Higher signup rates don’t matter if those signups never activate. Connect your A/B test data to your full funnel metrics from your analytics setup.
Never testing at all. The worst A/B testing mistake is not testing. Every day your landing page runs without a test, you’re leaving data on the table. Even with modest traffic, testing two dramatically different approaches over a month gives you actionable information.
A/B testing is the optimization layer of the DRM funnel. It’s how you systematically improve every element of your marketing until the whole system compounds. Start with your landing page headline this week, and let the data tell you what works.
The same testing discipline applies beyond your landing page. Once your on-page conversion is solid, retargeting ads give you another layer to optimize — reaching visitors who left without converting and testing different messages to bring them back.
Common Questions
How much traffic do I need for A/B testing?
It depends on your baseline conversion rate and the smallest improvement you want to detect. With a 3% baseline and a 20% relative minimum detectable effect, expect to need roughly 14,000 visitors per variation (about 28,000 total) at 95% confidence and 80% power. With less traffic, test bigger changes: larger effects need far fewer visitors to detect.
What should I A/B test first on my landing page?
Start with your headline. It's the highest-impact element and affects whether visitors see anything else on your page. After the headline, test your CTA button copy, then your social proof section.
How long should I run an A/B test?
Run tests for a minimum of 7 days to account for day-of-week variations, and until you reach statistical significance. Ending a test early because one variant 'looks like it's winning' is the most common A/B testing mistake.
What's a good conversion rate improvement from A/B testing?
A 10-20% relative improvement is a solid win. If your baseline is 3%, reaching 3.5% is meaningful at scale. Occasionally you'll see 50%+ improvements from a headline or CTA change, but don't expect that on every test.