What does this tool do?
It estimates the causal impact of an SEO change by comparing observed variant performance to a modelled counterfactual built from historical variant and control data.
Analyze and validate your SEO changes through A/B testing with the Causal Impact approach.
Upload a CSV with columns: date, variant, control.
Use either CSV upload or Google Search Console mode. Both produce daily time-series data in the same structure: date, variant, control.
Connect GSC, choose a property and metric (clicks or impressions), set the fetch date range, and paste variant/control URL lists. The tool fetches and aggregates daily totals automatically.
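As a rough illustration of what GSC mode does behind the scenes (the tool handles this automatically; the credentials path, property URL, and URL lists below are hypothetical), daily clicks per page can be fetched and grouped like this:

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# token.json: a previously authorized OAuth token (hypothetical path).
creds = Credentials.from_authorized_user_file("token.json")
service = build("searchconsole", "v1", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="https://example.com/",  # hypothetical GSC property
    body={
        "startDate": "2024-01-01",
        "endDate": "2024-06-30",
        "dimensions": ["date", "page"],  # one row per day per URL
        "rowLimit": 25000,
    },
).execute()

variant_urls = {"https://example.com/a"}  # hypothetical URL lists
control_urls = {"https://example.com/b"}

daily = {}  # date -> {"variant": clicks, "control": clicks}
for row in response.get("rows", []):
    date, page = row["keys"]
    if page in variant_urls:
        group = "variant"
    elif page in control_urls:
        group = "control"
    else:
        continue  # URL not in either list
    daily.setdefault(date, {"variant": 0, "control": 0})
    daily[date][group] += row["clicks"]  # aggregate daily totals per group
```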
Yes. Upload a CSV with columns date, variant, and control. Values should be daily totals for clicks, sessions, or impressions.
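For illustration, a matching file and a pandas load might look like this (the file name is a placeholder):

```python
import pandas as pd

# seo_ab_test.csv (placeholder name), one row per day:
# date,variant,control
# 2024-01-01,1520,1480
# 2024-01-02,1495,1510

df = pd.read_csv("seo_ab_test.csv", parse_dates=["date"])
df = df.set_index("date").sort_index()  # daily index with variant/control columns
```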
Yes. Use a single metric type consistently for both variant and control across all dates within the same run.
At least 100 pre-intervention days is recommended. The tool can run with less, but confidence and robustness are usually weaker.
Choose the first day after the SEO change was launched on the variant group. Everything before is pre-period; everything after is post-period.
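In pandas terms, using the df loaded above and a hypothetical launch date, the split looks like this:

```python
import pandas as pd

launch = pd.Timestamp("2024-05-01")   # first day after the change went live
pre = df.loc[df.index < launch]       # pre-period: used to fit the model
post = df.loc[df.index >= launch]     # post-period: compared to the counterfactual
```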
The average relative effect during the post-period. Positive values indicate uplift versus the modelled baseline; negative values indicate decline.
Confidence Level is calculated as 1 - p_value. Higher confidence means the observed impact is less likely to be random variation.
The p-value is the posterior tail-area probability from the model. Lower values provide stronger evidence against a no-effect outcome.
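As a sketch of where these numbers come from, here is a comparable run with the open-source pycausalimpact package (the tool's internal model and defaults may differ; the dates reuse the hypothetical launch above):

```python
import pandas as pd
from causalimpact import CausalImpact

# First column is the response (variant); later columns are covariates (control).
data = df[["variant", "control"]]
pre_period = [df.index.min(), pd.Timestamp("2024-04-30")]
post_period = [pd.Timestamp("2024-05-01"), df.index.max()]

ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())           # absolute/relative effects and posterior p-value
print(ci.summary("report"))   # plain-language interpretation
```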
The average absolute day-level difference between observed variant performance and the modelled counterfactual.
The total absolute impact over the post-period, built by summing the estimated day-level effects.
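Given the counterfactual forecasts, both metrics reduce to simple arithmetic over the day-level effects (toy numbers below; "absolute" here means the additive difference, not the magnitude):

```python
import numpy as np

observed  = np.array([120.0, 134.0, 128.0, 141.0])  # toy post-period variant values
predicted = np.array([110.0, 112.0, 115.0, 118.0])  # toy counterfactual forecasts

point_effects = observed - predicted                 # day-level effects
avg_effect = point_effects.mean()                    # average absolute effect
cum_effect = point_effects.sum()                     # cumulative absolute effect
rel_effect = cum_effect / predicted.sum()            # average relative effect
print(avg_effect, cum_effect, f"{rel_effect:.1%}")   # 17.0 68.0 14.9%
```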
If actual and predicted lines separate after launch and remain separated, that suggests sustained impact. The confidence band shows forecast uncertainty.
An upward cumulative curve indicates accumulating positive impact; a downward curve indicates accumulating loss over time.
It combines effect size, confidence, interval behavior, fit quality, and robustness checks into one decision-oriented quality view.
It is a composite score from three components: Statistical evidence, Precision and stability, and Robustness.
This reflects the strength of the causal signal using confidence, whether the effect interval clearly excludes zero, and interval tightness.
This measures how tight and consistent estimates are, using pre-period fit quality and post-period uncertainty behavior.
This checks whether conclusions remain stable under placebo-date runs and sensitivity runs with different model assumptions.
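A minimal sketch of how three such components could combine into one score (the weights and 0-1 component scales are assumptions, not the tool's actual formula):

```python
def reliability_score(statistical: float, precision: float, robustness: float) -> float:
    """Combine three 0-1 component scores into one 0-100 reliability score.

    The weights below are hypothetical, chosen only for illustration.
    """
    weights = {"statistical": 0.4, "precision": 0.3, "robustness": 0.3}
    score = (weights["statistical"] * statistical
             + weights["precision"] * precision
             + weights["robustness"] * robustness)
    return round(100 * score, 1)

print(reliability_score(0.9, 0.7, 0.8))  # 81.0
```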
This is the width of the relative-effect uncertainty interval. Narrower intervals generally indicate more precise estimates.
Pre-period coverage shows how often actual pre-period values fall inside the model's uncertainty bounds. Values around 95% are typically healthy.
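Coverage is just the fraction of pre-period days whose actual value lands inside the interval, as in this sketch with toy arrays:

```python
import numpy as np

actual = np.array([100.0, 104.0,  97.0, 101.0])  # toy pre-period observations
lower  = np.array([ 95.0,  98.0,  94.0,  96.0])  # toy lower interval bounds
upper  = np.array([105.0, 108.0, 103.0, 104.0])  # toy upper interval bounds

coverage = np.mean((actual >= lower) & (actual <= upper))  # fraction inside bounds
print(f"Pre-period coverage: {coverage:.0%}")  # 100% here; ~95% is typical
```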
Advanced diagnostics is the technical validation layer behind the Reliability Score. It includes: Residual autocorr (lag1) (leftover error pattern), Placebo false-positive rate (how often fake launch dates still look significant), Sensitivity spread (effect variation across model settings), and Sign consistency (whether the uplift/decline direction stays stable). Use this section to confirm the result is not fragile.
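For example, the lag-1 residual autocorrelation can be computed directly from the pre-period residuals (toy values shown):

```python
import numpy as np

residuals = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.3])  # toy pre-period residuals

r = residuals - residuals.mean()
lag1_autocorr = np.sum(r[:-1] * r[1:]) / np.sum(r * r)  # lag-1 autocorrelation
# Values near 0 suggest the model captured the series' structure;
# large positive values indicate leftover pattern in the errors.
print(round(lag1_autocorr, 3))
```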
GO: strong positive and reliable signal. HOLD: mixed or fragile evidence. NO-GO: reliably negative impact.
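A simplified sketch of this decision logic (the confidence and reliability thresholds are assumptions for illustration):

```python
def verdict(rel_effect: float, confidence: float, reliability: float) -> str:
    """Map effect direction, confidence, and reliability to GO / HOLD / NO-GO.

    The 0.95 confidence and 70-point reliability cutoffs are hypothetical.
    """
    strong = confidence >= 0.95 and reliability >= 70
    if strong and rel_effect > 0:
        return "GO"      # strong positive and reliable signal
    if strong and rel_effect < 0:
        return "NO-GO"   # reliably negative impact
    return "HOLD"        # mixed or fragile evidence

print(verdict(rel_effect=0.12, confidence=0.97, reliability=82))  # GO
```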
Short pre-period, unstable control series, major concurrent external events, missing dates, or inconsistent aggregation logic between variant and control.
Prefer the original Streamlit experience? Use the previous analyzer app.