Methodology

How AdZhi scores your ads

Every score is grounded in signal processing on the actual audio and video, AI linguistic evaluation calibrated against published research, and structural analysis mapped to proven persuasion frameworks. Here's exactly what feeds each score, how it becomes a forecast we grade against your real results, and where we're still building the evidence base.

The standard we hold every forecast to

The standard

Earned accuracy, or none at all

A forecast is only worth the proof behind it. AdZhi holds itself to one rule: accuracy is measured automatically, and shown on every report.

AdZhi only claims predictive power when it has earned it: measured out-of-sample, per account.

Graded against reality. Each forecast is frozen at analysis time and later compared with the outcome you record. The comparison uses the number the forecast committed to, never one revised after the fact.
Skill, not just error. Accuracy is reported as how far the forecast beats the naive baseline of guessing your account's average. A forecast no better than that guess claims nothing.
Calibrated to you. The correction learns from your own realised results; an account-specific calibrator beats the pooled benchmark, and the report names which is in force.
Your score, re-weighted to you. Beyond the forecast, the AdZhi Score itself re-weights its signals toward what drives realised outcomes, climbing from the global model to your industry benchmark to your own workspace as the data warrants. Each step is applied only once it beats the more general weighting on your own data, so personalisation can never make your scores less predictive.
Honest until proven. Until a metric clears the bar on real outcome pairs, it reads “directional”, never “calibrated.”
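As a rough illustration of "skill, not just error" and of the tier-by-tier re-weighting rule, here is a minimal sketch. It assumes mean absolute error as the error measure and a simple "adopt only if strictly better" comparison; the function names, the tier labels and the example numbers are hypothetical and are not taken from AdZhi's code.

```python
from statistics import mean


def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and recorded outcomes."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def skill_score(forecasts, outcomes, account_average):
    """1.0 is perfect, 0.0 matches the naive guess, negative is worse than it.

    Assumption: account_average is computed from earlier outcomes only,
    so the comparison stays out-of-sample.
    """
    baseline = [account_average] * len(outcomes)
    baseline_error = mean_abs_error(baseline, outcomes)
    if baseline_error == 0:
        return 0.0  # outcomes never vary, so there is nothing to beat
    return 1.0 - mean_abs_error(forecasts, outcomes) / baseline_error


def pick_tier(skill_by_tier):
    """Walk from general to specific; adopt a tier only if it beats the one in force."""
    chosen = "global"
    for tier in ("industry", "workspace"):
        if tier in skill_by_tier and skill_by_tier[tier] > skill_by_tier[chosen]:
            chosen = tier
    return chosen


# Frozen forecasts versus the outcomes recorded later (illustrative numbers).
skill = skill_score([2.4, 3.1, 3.8], [2.5, 3.0, 4.0], account_average=3.0)
print(round(skill, 2))  # 0.73

# The industry tier does not beat global here, but the workspace tier does.
print(pick_tier({"global": 0.10, "industry": 0.08, "workspace": 0.25}))  # workspace
```

A skill of zero or below would mean the forecast claims nothing beyond the account average, which is the case the "directional" label is meant to cover.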

The signal stack

Three layers of analysis, working together

Not a single model and a number. Each analysis runs acoustic measurement, linguistic evaluation and structural mapping, then combines them.

Acoustic

Measured from the waveform.

Pitch (F0) and variation over time
RMS energy envelope + CTA Momentum
Voice clarity proxy (credibility)
Speech rate, disfluency detection
Spectral shape (centroid, rolloff), attention decay model

Linguistic

Evaluated against research.

Hook strength (LLM-scored vs a persuasion rubric)
VADER sentiment (peer-reviewed NLP)
CTA identification + urgency
Value-proposition clarity
Self-critique loop for consistency

Structural

Mapped to frameworks.

Narrative arc → AIDA / PAS
Act completeness: hook, build, urgency, close
Timestamped gap detection
Section-level energy alignment
Opening pattern classification

How the numbers fit together

From raw signal to score

Every number AdZhi shows sits at one of three levels. Raw measurements feed the proprietary signals; the signals and core components roll up into the three headline scores.

Raw features

Measured directly from the waveform, the frames and the transcript: deterministic and reproducible. The list below is a representative core; the pipeline computes more.


Waveform

Pitch (F0)
Pitch variation
Voice clarity
RMS energy
Energy dynamic range
Loudness (dBFS)
Spectral centroid
Zero-crossing rate
MFCC
Shimmer
Tempo & beat
Musical key & mode
Music / voice balance
Pause & silence ratio
Dramatic pauses
Peak-energy timing

Frames

Scene-cut rate
Average shot length
Brightness
Contrast
Saturation
Colour temperature
Dominant palette
Sharpness
Motion intensity
Camera stability
Visual entropy
Face visibility %
Eye-contact %
Speaker count
Caption coverage by third
On-screen text & CTA
Camera & edit style
Aspect ratio

Transcript

Speech rate (wpm)
Pacing variance
Disfluency rate
Hook delivery time
CTA timings
Sentiment (VADER)
Sentiment arc
Primary tone
Power words
Urgency cues
Social proof
Cialdini principles
Questions
Benefit vs feature
Claims & compliance risk
Brand-safety flags
Named entities
Readability

Proprietary signals

Coined signals that combine the raw features into a read you can act on. They are heuristic composites today, graded against your outcomes over time.

Persuasion Half-Life
Trust Index
CTA Momentum
Hook Decay Rate
Emotional Compression
Attention Stability
Cognitive Load Score
Creative Entropy

Headline scores

The top-line numbers you compare and act on.

AdZhi Score · 0-100
Voice IQ · 0-100
Hook Score · 0-10

Voice IQ vs Trust Index: Voice IQ is the overall delivery score; Trust Index is the specific credibility dimension within it. The AdZhi Score is a weighted composite of six components; everything runs across an 11-stage pipeline, and the 20-dimension Persuasion DNA fingerprint is a separate library-matching vector, not a score.

Score definitions

What each number means

AdZhi Score · 0-100

Overall creative effectiveness

A composite across six components: a single-number summary for comparing variants, creators and campaigns. Above 75 is strong; 50-75 points to specific weaknesses; below 50 usually signals a hook or CTA structural issue.

Voice IQ · 0-100

Acoustic delivery quality

Not subjective: signal processing on the actual audio. Credibility from voice clarity (a spectral-flatness proxy); expressiveness from pitch variation (Zuckerman & Driver, 1989, as a proxy for vocal expressiveness). By construction, scores spread across roughly 35-85.
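To make the spectral-flatness proxy concrete, here is a minimal sketch of the standard definition: the geometric mean of the power spectrum divided by its arithmetic mean. It uses a naive DFT so it runs on the standard library alone; a real pipeline would more likely call an audio library. The frame length, the noise floor, and the reading that lower flatness means a more tonal, clearer voice are assumptions made for illustration, not AdZhi's scoring rule.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power for the positive-frequency bins (DC excluded)."""
    n = len(frame)
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        powers.append(abs(coeff) ** 2)
    return powers


def spectral_flatness(frame, floor=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    powers = [p + floor for p in power_spectrum(frame)]  # floor avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


# A pure tone concentrates energy in one bin; noise spreads it across all bins.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]

print(spectral_flatness(tone) < spectral_flatness(noise))  # True
```

Flatness near zero indicates a strongly tonal frame, while values closer to one indicate a noise-like frame, which is why it can stand in for clarity without being a true harmonics-to-noise ratio.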

Hook Score · 0-10

Opening hook strength

LLM analysis of the opening seconds: the mechanism (curiosity, pattern interrupt, bold claim) and how effectively it deploys it. Reports an interval (e.g. 7.2 ± 0.6): the ± is the spread across the self-critique loop’s passes, so it widens when the passes disagree and narrows when they converge. It measures the model’s own certainty and is not a claim about your ROAS.
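The interval can be pictured with a short sketch. The source does not say which spread statistic is used, so the sample standard deviation here is an assumption, and `hook_interval` and the pass scores are hypothetical.

```python
from statistics import mean, stdev


def hook_interval(pass_scores):
    """Return (centre, spread) for the Hook Scores produced by repeated passes.

    Assumption: spread is the sample standard deviation across passes.
    """
    centre = mean(pass_scores)
    spread = stdev(pass_scores) if len(pass_scores) > 1 else 0.0
    return round(centre, 1), round(spread, 1)


# Passes that disagree widen the interval; passes that converge narrow it.
print(hook_interval([6.6, 7.2, 7.8]))  # (7.2, 0.6)
print(hook_interval([7.1, 7.2, 7.3]))  # (7.2, 0.1)
```

Both examples share the same centre, so the only difference a reader sees is how much the passes agreed with each other.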

Structure Score · 0-10

Narrative arc completeness

Maps the script to AIDA and PAS, identifying which elements are present, weak or missing: “no urgency signal between 00:22 and 00:38”, not “improve the structure.”
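A timestamped gap finding like the one quoted above could be produced along these lines. This is a minimal sketch: the cue timestamps, the 10-second threshold and both function names are hypothetical, and how cues are detected in the first place is out of scope here.

```python
def format_timestamp(seconds):
    """Render whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def find_gaps(cue_times, ad_length, max_gap=10):
    """Return (start, end) spans longer than max_gap seconds that contain no cue."""
    gaps = []
    previous = 0
    for t in sorted(cue_times) + [ad_length]:
        if t - previous > max_gap:
            gaps.append((previous, t))
        previous = t
    return gaps


# Seconds at which an urgency cue was detected in a 45-second ad (illustrative).
urgency_cues = [5, 14, 22, 38, 44]
for start, end in find_gaps(urgency_cues, ad_length=45):
    print(f"no urgency signal between {format_timestamp(start)} and {format_timestamp(end)}")
# prints: no urgency signal between 00:22 and 00:38
```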

Formula transparency

How the composite is calculated

A weighted sum of six normalised components (audio path, shown below). For video, the weights rebalance to introduce a ~7% visual-alignment term. The formula reflects the real implementation in backend/metrics.py.

AdZhi = 100 × (
  0.25 × Hook/10
+ 0.20 × CTAQuality/100
+ 0.15 × AttentionRetention
+ 0.15 × Structure/10
+ 0.15 × VoiceIQ/100
+ 0.10 × MisalignmentScore
)

Weights are research-informed heuristics: Hook 25%, informed by Meta's published creative guidance; Attention 15%, from Binet & Field; Voice IQ 15%, from research linking vocal clarity to perceived credibility (measured here as a spectral-flatness proxy, not a true harmonics-to-noise ratio). These are the global defaults, not regression outputs. As your account logs real outcomes, AdZhi re-weights them toward the signals that actually drive your results, and applies the new weighting only once it beats the global weighting on your own data.
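The audio-path composite can be written out as a short sketch. The weights and divisors come from the formula above; the function and parameter names are illustrative and are not the contents of backend/metrics.py, and the sketch assumes AttentionRetention and MisalignmentScore already sit on a 0-1 scale, which the formula implies but does not state.

```python
# Global default weights from the published audio-path formula.
WEIGHTS = {
    "hook": 0.25,
    "cta_quality": 0.20,
    "attention_retention": 0.15,
    "structure": 0.15,
    "voice_iq": 0.15,
    "misalignment_score": 0.10,
}


def adzhi_score(hook, cta_quality, attention_retention, structure, voice_iq, misalignment_score):
    """Weighted sum of six normalised components, scaled to 0-100."""
    normalised = {
        "hook": hook / 10,                           # Hook Score is 0-10
        "cta_quality": cta_quality / 100,            # CTA quality is 0-100
        "attention_retention": attention_retention,  # assumed 0-1 already
        "structure": structure / 10,                 # Structure Score is 0-10
        "voice_iq": voice_iq / 100,                  # Voice IQ is 0-100
        "misalignment_score": misalignment_score,    # assumed 0-1 already
    }
    return 100 * sum(WEIGHTS[name] * value for name, value in normalised.items())


score = adzhi_score(
    hook=7.2,
    cta_quality=80,
    attention_retention=0.6,
    structure=6,
    voice_iq=70,
    misalignment_score=0.9,
)
print(round(score, 1))  # 71.5
```

Because the six weights sum to 1, a creative that maxes out every component scores 100, and swapping in account-specific weights only requires replacing the `WEIGHTS` mapping.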

The forecast layer

From scores to a forecast

The scores aren't the end of the pipeline. AdZhi maps the signal profile to the outcomes that actually decide spend efficiency, then grades each forecast against your realised results and recalibrates to your account.

Thumb-stop

Will the first frames hold the scroll

From hook decay, opening energy and sound-off readiness: the ceiling on how many people stay past the first few seconds.

Retention

How far the watch lasts

From the attention-stability and emotional-trajectory curves: where energy dips and viewers tend to drop off.

CTR band

Likely click-through range

A banded estimate (not a false-precision number) from CTA momentum, value clarity and persuasion half-life.

Fatigue risk

How fast it will wear out

From creative entropy and structural predictability: how quickly the audience is likely to tune it out.

Validation status

What is, and isn't, proven

We publish what we haven't proven yet. A vendor that hides its gaps is hiding something.

Currently validated

Acoustic feature extraction: deterministic, reproducible, open-source (librosa).
Statistical-significance gating: a correlation is only surfaced once it clears p<0.05 on at least 8 matched outcome pairs (and, across many signals, a Benjamini-Hochberg adjustment for multiple comparisons). Below that bar it reads "directional", never "validated".
Score reproducibility: same file produces identical scores; version-pinned.
Sentiment (VADER): peer-reviewed (Hutto & Gilbert, AAAI 2014), open-source.
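The significance gate described above can be sketched as follows. This assumes p-values have already been computed per signal, and that the minimum-pairs filter is applied before the Benjamini-Hochberg step; that ordering, the function name and the example numbers are assumptions made for illustration.

```python
def surfaced_signals(results, alpha=0.05, min_pairs=8):
    """Return the signals that pass both the sample-size gate and Benjamini-Hochberg.

    results maps a signal name to (p_value, matched_outcome_pairs).
    """
    # Gate 1: ignore any signal with too few matched outcome pairs.
    eligible = sorted((p, name) for name, (p, n) in results.items() if n >= min_pairs)
    m = len(eligible)

    # Gate 2: Benjamini-Hochberg. Find the largest rank k whose p-value is at
    # most alpha * k / m, then keep every signal ranked at or below k.
    cutoff_rank = 0
    for rank, (p, _) in enumerate(eligible, start=1):
        if p <= alpha * rank / m:
            cutoff_rank = rank
    return {name for _, name in eligible[:cutoff_rank]}


results = {
    "hook_decay": (0.004, 12),
    "cta_momentum": (0.020, 10),
    "trust_index": (0.030, 9),
    "creative_entropy": (0.300, 15),
    "attention_stability": (0.001, 5),  # strong p-value, but only 5 pairs
}
print(sorted(surfaced_signals(results)))
# ['cta_momentum', 'hook_decay', 'trust_index']
```

Anything outside the returned set would keep the "directional" label, including a signal with a very small p-value that has not yet reached the minimum number of matched pairs.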

In progress

Score-to-ROAS correlation: needs per-account performance matched to signals.
Cross-industry benchmark generalisation: percentiles are directional at current n.
Pooled global weight regression: the default weights remain research-informed heuristics, not yet regressed across all accounts (per-account re-weighting from your own outcomes is live).
Hook-type classification accuracy: not yet evaluated against human-labelled ground truth.
Creative segmentation (k-means): clusters are deterministic; ranking segments by ROAS strengthens as you log outcomes.

We will not claim statistical validation before we have it, or cite correlations we haven't measured. When a component moves from research-calibrated to empirically validated, we'll say so, and name the dataset that validated it.

Start analysing your ads

The first analysis is free: the full breakdown across acoustic, linguistic and structural layers, with timestamped direction.

Analyse your first ad free →
See a sample report →