SEO A/B Test Analyzer

Analyze and validate your SEO changes through A/B testing based on the Causal Impact approach.


Input data

Upload a CSV with columns: date, variant, control.


Q&A

What does this tool do?

It estimates the causal impact of an SEO change by comparing observed variant performance to a modelled counterfactual built from historical variant and control data.

What input file do I need?

Upload a CSV with columns date, variant, and control. Values should be daily totals for clicks, sessions, or impressions.
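It can help to check the file locally before uploading. The sketch below uses only the Python standard library. It assumes ISO-formatted dates (YYYY-MM-DD) and numeric values, and `load_daily_series` is a hypothetical helper, not part of the tool. It checks for the three required columns, duplicate dates, non-numeric values, and gaps in the daily sequence.

```python
import csv
import io
from datetime import date, timedelta

REQUIRED_COLUMNS = {"date", "variant", "control"}

def load_daily_series(csv_text):
    """Parse CSV text into (date, variant, control) tuples sorted by date.

    Raises ValueError if a required column is missing, a date repeats,
    a value is not numeric, or a calendar day is missing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = {}
    for record in reader:
        # Assumes ISO dates such as 2024-01-31
        day = date.fromisoformat(record["date"].strip())
        if day in rows:
            raise ValueError(f"duplicate date: {day}")
        # float() raises ValueError for blank or non-numeric cells
        rows[day] = (float(record["variant"]), float(record["control"]))

    days = sorted(rows)
    # Consecutive dates must be exactly one day apart (no gaps)
    for prev, curr in zip(days, days[1:]):
        if curr - prev != timedelta(days=1):
            raise ValueError(f"missing date(s) between {prev} and {curr}")
    return [(d, *rows[d]) for d in days]
```

Rows may appear in any order in the file; they are sorted before the gap check. Failing early on gaps is deliberate, since missing dates are one of the common causes of unreliable output.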

How much data is recommended?

At least 100 days of pre-intervention data are recommended. The tool can run with less, but the confidence and robustness of the results are usually weaker.

How should I set the start date?

Choose the first day after the SEO change was launched on the variant group. Dates before the start date form the pre-period, which is used to train the model; the start date and all later dates form the post-period.
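The pre/post split can be sketched as follows. This assumes rows are `(date, variant, control)` tuples and that the start date itself belongs to the post-period, as the first day with the change live. `split_periods` is a hypothetical helper; the 100-day threshold mirrors the recommendation above.

```python
from datetime import date

MIN_PRE_DAYS = 100  # recommended minimum pre-period length

def split_periods(rows, start_date: date):
    """Split (date, variant, control) rows at the test start date.

    Dates before start_date form the pre-period; start_date and all
    later dates form the post-period.
    """
    pre = [row for row in rows if row[0] < start_date]
    post = [row for row in rows if row[0] >= start_date]
    if not pre or not post:
        raise ValueError("start date must leave data on both sides")
    if len(pre) < MIN_PRE_DAYS:
        # The analysis can still run, but results tend to be less reliable
        print(f"warning: only {len(pre)} pre-period days (< {MIN_PRE_DAYS})")
    return pre, post
```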

What is Impact on Clicks?

The estimated average relative effect in the post-period. Positive values mean an uplift versus the expected baseline (the modelled counterfactual); negative values mean a decline.
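A minimal sketch of the relative effect: the mean observed value minus the mean predicted value, divided by the mean predicted value. This mirrors how Causal Impact summaries typically report the average relative effect. The function is illustrative and assumes you already have counterfactual predictions for each post-period day.

```python
def relative_effect(actual, predicted):
    """Average relative effect over the post-period.

    actual:    observed variant values, one per post-period day
    predicted: counterfactual predictions for the same days
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need equal-length, non-empty series")
    mean_actual = sum(actual) / len(actual)
    mean_predicted = sum(predicted) / len(predicted)
    return (mean_actual - mean_predicted) / mean_predicted
```

For example, observed values of 110, 120 and 130 against a flat prediction of 100 give a relative effect of 0.2, that is, +20%.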

What is Confidence Level?

A Bayesian confidence proxy from Causal Impact, computed as 1 - p_value. Higher values indicate a lower probability that the observed effect is due to chance.
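To make the 1 - p_value proxy concrete: Causal Impact simulates many counterfactual paths for the post-period and asks how often their total is at least as extreme as the observed total. The sketch below computes a one-sided tail-area probability from such draws. It assumes you already have the per-draw post-period sums, and it is an approximation in the spirit of the method rather than the library's exact code.

```python
def tail_area_probability(counterfactual_sums, observed_sum):
    """One-sided posterior tail-area probability.

    counterfactual_sums: simulated post-period totals, one per draw
    observed_sum:        observed post-period total of the variant
    """
    # Include the observed total itself so p is never exactly zero
    pooled = list(counterfactual_sums) + [observed_sum]
    at_or_above = sum(1 for s in pooled if s >= observed_sum)
    at_or_below = sum(1 for s in pooled if s <= observed_sum)
    return min(at_or_above, at_or_below) / len(pooled)

def confidence_level(counterfactual_sums, observed_sum):
    """The confidence proxy shown by the tool: 1 - p_value."""
    return 1 - tail_area_probability(counterfactual_sums, observed_sum)
```

With draws of 90, 95, 100, 105 and 110 and an observed total of 120, p is 1/6 and the confidence is about 0.83. With only five draws p cannot fall below 1/6; with thousands of draws it can become much smaller.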

What is Daily Effect?

The estimated average absolute difference per day between observed variant performance and the modelled counterfactual.
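As a minimal sketch, assuming post-period observed and predicted values are available as equal-length lists, the daily effect is the mean of the day-level differences. The function names are illustrative.

```python
def daily_effects(actual, predicted):
    """Day-level absolute effects: observed minus counterfactual."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return [a - p for a, p in zip(actual, predicted)]

def average_daily_effect(actual, predicted):
    """Mean absolute difference per post-period day."""
    effects = daily_effects(actual, predicted)
    if not effects:
        raise ValueError("post-period is empty")
    return sum(effects) / len(effects)
```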

What is Cumulative Effect?

The total absolute impact over the post-period. It sums day-level effects to estimate overall gain or loss.
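A short sketch, assuming the same equal-length post-period lists: a running total of the day-level effects, where the last element is the cumulative effect. A cumulative-effect chart would typically plot this running series. The function name is illustrative.

```python
from itertools import accumulate

def cumulative_effect(actual, predicted):
    """Running total of day-level effects; the last element is the total."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return list(accumulate(a - p for a, p in zip(actual, predicted)))
```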

How do I read chart 1 (actual vs predicted)?

If actual clicks diverge from predicted clicks after launch and stay separated, that suggests a sustained effect. The confidence band indicates uncertainty around predicted values.
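One way to make "diverge and stay separated" measurable is to count the post-period days where the observed value falls outside the confidence band, then find the longest consecutive run of such days. This is a hypothetical heuristic, not a rule the tool is known to apply. `lower` and `upper` are assumed to be the band bounds for each post-period day.

```python
def days_outside_band(actual, lower, upper):
    """Indices of post-period days where the observed value lies
    outside the counterfactual confidence band."""
    return [
        i for i, (y, lo, hi) in enumerate(zip(actual, lower, upper))
        if y < lo or y > hi
    ]

def longest_run(indices):
    """Length of the longest run of consecutive day indices."""
    best = current = 0
    previous = None
    for i in indices:
        # Extend the run if this day directly follows the previous one
        if previous is not None and i == previous + 1:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = i
    return best
```

A long run suggests a sustained effect; isolated days outside the band are more likely noise.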

How do I read chart 2 (cumulative effect)?

An upward slope indicates an accumulating positive effect; a downward slope indicates an accumulating loss. Steeper slopes imply a stronger day-level impact.

What do the chart toggles do?

Show confidence band: toggles uncertainty ranges. Normalize: scales series to an index for easier shape comparison. Pre/Post highlight: shades training and intervention periods.
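The exact normalization the tool applies is not specified here. A common approach, assumed in this sketch, rescales each series so that its pre-period mean equals 100; variant and control each use their own pre-period values as the base.

```python
def to_index(series, base_values):
    """Rescale a series so that the mean of base_values maps to 100."""
    if not base_values:
        raise ValueError("base period is empty")
    base = sum(base_values) / len(base_values)
    if base == 0:
        raise ValueError("base period mean is zero; cannot index")
    return [100 * v / base for v in series]
```

After indexing, a variant that averages 1,000 clicks and a control that averages 50 can be compared by shape on the same axis.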

What does Reliability Score summarize?

It combines statistical evidence, precision/stability, and robustness checks into a single quality score for decision confidence.

What is Statistical Evidence?

Evidence strength, based on the confidence level, how far the effect interval lies from zero, and the interval width. Higher values mean a stronger signal and a clearer direction.
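Two of these inputs, separation from zero and interval width, can be read directly from the effect interval. The sketch below assumes a `(lower, upper)` interval for the effect. How the tool weights these components into a score is not documented here, so the function only reports them.

```python
def interval_summary(lower, upper):
    """Describe a (lower, upper) effect interval relative to zero."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    excludes_zero = lower > 0 or upper < 0
    # Gap between zero and the nearest bound (0 if the interval spans zero)
    if lower > 0:
        separation = lower
    elif upper < 0:
        separation = -upper
    else:
        separation = 0.0
    return {"excludes_zero": excludes_zero,
            "separation": separation,
            "width": upper - lower}
```

An interval of 0.02 to 0.10 excludes zero with a separation of 0.02; an interval of -0.05 to 0.03 spans zero, so its direction is unclear.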

What is Precision and Stability?

How tight and consistent the estimate is, based on pre-period fit quality and post-period uncertainty behavior.
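The fit metric behind pre-period fit quality is not named here. Mean absolute percentage error (MAPE) between actual and predicted pre-period values is one common choice, assumed in this sketch; lower values mean a tighter fit. Days with a zero actual value are skipped to avoid division by zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over days with non-zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```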

What is Robustness?

How stable the conclusions remain under placebo and sensitivity tests. Higher robustness means a lower risk that the result is fragile.
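A placebo test pretends the change launched on an earlier date inside the pre-period, where no real change happened, and checks that the estimated effect is close to zero. To keep the sketch self-contained, it swaps in a deliberately simple ratio model (variant is modelled as k times control, with k fitted on the days before the placebo date) instead of Causal Impact's structural time-series model. The function names are hypothetical.

```python
def ratio_model_effect(variant, control, split):
    """Relative effect after index `split` under a simple ratio model.

    The counterfactual is k * control, where k = sum(variant) / sum(control)
    over the days before `split`. A stand-in for the real model.
    """
    if not 0 < split < len(variant):
        raise ValueError("split must leave days on both sides")
    k = sum(variant[:split]) / sum(control[:split])
    predicted = [k * c for c in control[split:]]
    return (sum(variant[split:]) - sum(predicted)) / sum(predicted)

def placebo_effects(pre_variant, pre_control, placebo_splits):
    """Estimate the 'effect' at fake launch points inside the pre-period.

    Effects far from zero suggest the setup would report effects
    even where none exist, i.e. the conclusion is fragile.
    """
    return {s: ratio_model_effect(pre_variant, pre_control, s)
            for s in placebo_splits}
```

When the variant tracks the control closely in the pre-period, every placebo effect should be near zero.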

What do GO / HOLD / NO-GO mean?

GO: positive and reliable signal. HOLD: evidence is mixed or fragile. NO-GO: reliably negative effect.
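The thresholds behind GO / HOLD / NO-GO are not stated on this page. The sketch below shows one plausible mapping. `min_confidence`, `min_reliability` and their default values are hypothetical, and the reliability score is assumed to lie between 0 and 1.

```python
def decision(relative_effect, confidence, reliability,
             min_confidence=0.95, min_reliability=0.7):
    """Map results to GO / HOLD / NO-GO.

    Thresholds are illustrative defaults, not the tool's actual values.
    """
    reliable = confidence >= min_confidence and reliability >= min_reliability
    if reliable and relative_effect > 0:
        return "GO"
    if reliable and relative_effect < 0:
        return "NO-GO"
    # Mixed, weak, or fragile evidence
    return "HOLD"
```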

What are common reasons for unreliable output?

Short pre-period, unstable control series, major concurrent external events, missing dates, or inconsistent aggregation logic between variant and control.

Can I use impressions or sessions instead of clicks?

Yes. Keep the metric consistent across both groups and all dates. Do not mix clicks with impressions in one run.
