A/B Testing Your Cold Email: What to Test and What to Ignore

September 18, 2025 · 5 min read

Most teams either do not A/B test their cold email at all or do it badly: they run tests with sample sizes too small to be meaningful, test the wrong variables, or fail to isolate what they are actually measuring. Done right, systematic testing is one of the most reliable ways to improve sequence performance over time.

The discipline of testing starts with a hypothesis. Not "let's try a different subject line and see what happens" — but "we believe that a question-based subject line will outperform a statement-based one because buyers are more curious about an open-ended prompt." Then you test that specific thing.

What is worth testing

Subject lines are the highest-leverage test because they determine whether the email is opened at all. Test: short vs. long, question vs. statement, personalised vs. generic, with name vs. without. Each of these is a meaningful variable with an expected direction. Test them one at a time and build a picture of what your audience responds to.

First lines are the second-highest-leverage test. The first line determines whether a reader who opened the email continues past the first sentence. Test: opening with a specific observation about the company vs. opening with a problem statement vs. opening with a relevant piece of data. These produce meaningfully different read-through and reply patterns.

What is not worth testing yet

Do not test variables you cannot isolate. If you change the subject line, the first line, and the CTA at the same time and see a different result, you learn nothing. Change one thing. The most common mistake in sequence testing is running multi-variable experiments and drawing single-variable conclusions.
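One way to enforce the one-change rule is to check it in code before anything is sent. The sketch below is a minimal illustration, not a feature of any particular sending tool: `EmailVariant`, `changed_fields`, and `assign_variant` are hypothetical names, and the copy in the example is placeholder text.

```python
import hashlib
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmailVariant:
    """One version of a sequence step (hypothetical structure)."""
    subject: str
    first_line: str
    cta: str


def changed_fields(a: EmailVariant, b: EmailVariant) -> list[str]:
    """Return the names of the fields that differ between two variants."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def assign_variant(email: str, test_name: str) -> str:
    """Deterministically assign a recipient to 'A' or 'B'.

    Hashing the address together with the test name gives a stable,
    roughly even split without storing any state.
    """
    digest = hashlib.sha256(f"{test_name}:{email.lower()}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"


# Placeholder copy: only the subject line differs between the two variants.
control = EmailVariant(
    subject="Quick question about onboarding",
    first_line="Saw that your team is hiring support reps.",
    cta="Open to a 15-minute call next week?",
)
challenger = EmailVariant(
    subject="How are you handling onboarding?",
    first_line=control.first_line,
    cta=control.cta,
)

diff = changed_fields(control, challenger)
if len(diff) != 1:
    raise ValueError(f"Test must change exactly one field, got: {diff}")
```

Assigning by hash rather than by list order also keeps the split from lining up with however the prospect list happens to be sorted, which is one more variable you would otherwise fail to isolate.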

Do not test CTA variations, email length, or send timing until you have validated subject line and first line. These variables matter — but they matter less than getting the open and the read right. Fix the funnel from the top.

Sample size and significance

A test with 30 sends per variant is not a test. It is an anecdote. To have any chance of a statistically meaningful result, treat 100 sends per variant as the floor, and 200 or more is better. For a team sending 50 emails per day across two variants, that is four sending days to reach 100 per variant and eight to reach 200, so a meaningful subject line test realistically takes at least a week. That timeline is frustrating but necessary.
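The timeline arithmetic is simple enough to sketch. The helper below is hypothetical and assumes the whole daily volume goes to the test and is split evenly between variants:

```python
import math


def sending_days(sends_per_variant: int, variants: int, daily_volume: int) -> int:
    """Whole sending days needed to reach the target on every variant.

    Assumes all daily volume goes to the test, split evenly across variants.
    """
    total_sends = sends_per_variant * variants
    return math.ceil(total_sends / daily_volume)


# Two variants at 50 emails per day
minimum_days = sending_days(sends_per_variant=100, variants=2, daily_volume=50)  # 4
better_days = sending_days(sends_per_variant=200, variants=2, daily_volume=50)   # 8
```

These are sending days only. Replies tend to arrive after the send, so the calendar time before you can read a result will be somewhat longer.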

Look at reply rate as the primary metric, not open rate. Open rate is distorted by Apple Mail Privacy Protection, which preloads tracking pixels and so records opens that may never have happened, and by other tracking limitations that make it unreliable in many markets. Reply rate is a direct measure of whether the email achieved its goal.
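To judge whether a difference in reply rate is more than noise, one common option is a two-proportion z-test. The sketch below is one reasonable approach rather than the only valid one: the function name is hypothetical, the reply counts are made up for illustration, and the normal approximation gets rough when reply counts are very small.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(replies_a: int, sends_a: int,
                           replies_b: int, sends_b: int) -> float:
    """Two-sided p-value for the difference between two reply rates.

    Uses the pooled two-proportion z-test (normal approximation).
    """
    rate_a = replies_a / sends_a
    rate_b = replies_b / sends_b
    pooled = (replies_a + replies_b) / (sends_a + sends_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    if std_err == 0:
        # No replies at all (or everyone replied): nothing to distinguish.
        return 1.0
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative numbers only: 10 replies vs. 20 replies on 200 sends each.
p_value = two_proportion_p_value(10, 200, 20, 200)
print(f"p-value: {p_value:.3f}")
```

With these made-up counts, a reply rate that doubles from 5% to 10% still lands just above the conventional 0.05 threshold, which is a useful reminder of why small samples produce anecdotes rather than results.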
