Methodology

How AdZhi scores your ads

Every score AdZhi produces is grounded in a combination of signal processing on the actual audio and video, AI linguistic evaluation calibrated against published research, and structural analysis mapped to proven persuasion frameworks. This page explains exactly what feeds into each score, what research it draws on, and where we're still building the empirical evidence base.

The numbers, and what each one actually means

AdZhi reports several different numbers, and they describe different things — not the same thing counted five ways. Here is each one, its job, and how they connect.

  • 9 analysis dimensions (what we look at): The full breadth of every analysis: voice, music, visuals, script, structure, and more.
  • 11 pipeline stages (how we process it): The steps a file passes through, from audio extraction and transcription to scoring and synthesis.
  • 8 proprietary signals (what's unique to us): Trademarked metrics like Persuasion Half-Life™ and Voice Trust Index™, a subset of the 9 dimensions, found in no other ad platform.
  • 6 / 7 score components (how the score is built): The weighted inputs behind your single 0–100 AdZhi Score: six for audio, plus a seventh, voice↔image alignment, when the ad is a video.
  • 20 fingerprint dimensions (how we match winners): The acoustic vector behind Persuasion DNA™, used to find ads built like your best performers.
In one sentence: you upload a file, the 11-stage pipeline runs it across 9 analysis dimensions, 8 of which are proprietary signals unique to AdZhi; 6 of those dimensions are then weighted into your AdZhi Score (a 7th, voice↔image alignment, joins for video), while a 20-dimension acoustic fingerprint powers Persuasion DNA matching against your top ads.
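As a rough illustration of the fingerprint-matching step, the sketch below ranks candidate ads by cosine similarity to a reference fingerprint. The similarity measure, the function names, and the toy 3-value vectors are assumptions made for illustration; this page does not state how Persuasion DNA compares its 20-dimension fingerprints.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(reference, candidates):
    """Return (ad_id, similarity) pairs sorted from most to least similar."""
    scored = [(ad_id, cosine_similarity(reference, vec))
              for ad_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: 3 values stand in for the real 20-dimension fingerprint.
top_performer = [0.8, 0.1, 0.5]
library = {"ad_a": [0.7, 0.2, 0.5], "ad_b": [0.1, 0.9, 0.0]}
ranking = rank_by_similarity(top_performer, library)
```

Here `ranking` lists `ad_a` first because its vector points in nearly the same direction as the reference.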

Three layers of analysis, working together

AdZhi does not apply a single AI model and call it a score. Each analysis runs three distinct layers — acoustic measurement, linguistic evaluation, and structural mapping — and combines them into composites. The layers are described below.

Layer 1

Acoustic

  • Fundamental frequency (pitch) and pitch variation over time
  • RMS energy envelope and CTA Momentum
  • Harmonic-to-Noise Ratio (vocal credibility)
  • Speech rate (syllables per second)
  • Disfluency detection (fillers, restarts)
  • Spectral entropy (information density)
  • Attention decay model (exponential)
Layer 2

Linguistic

  • Hook strength evaluation (AI, Cialdini-calibrated)
  • VADER sentiment scoring (peer-reviewed NLP model)
  • Call-to-action identification and urgency detection
  • Clarity and specificity of the value proposition
  • Emotional arc across the transcript
  • Self-critique loop for evaluation consistency
Layer 3

Structural

  • Narrative arc mapping to AIDA and PAS frameworks
  • Act completeness: hook, build, urgency, close
  • Timestamped gap detection (missing persuasion elements)
  • Section-level energy alignment (does delivery match content?)
  • Opening pattern classification

What each number means

Four primary scores surface from the analysis. Each has a defined input set, a documented weighting rationale, and a specific interpretation range.

Voice IQ 0 – 100

Acoustic delivery quality

Voice IQ is not subjective. It is the result of signal processing on the actual audio file. No human listens. No opinion is formed. Of the four sub-scores listed in the Voice IQ sub-formula further down this page, two are described here:

Credibility component
  • Based on Harmonic-to-Noise Ratio (HNR) — a measure of how clean and stable the voice is relative to noise and breathiness
  • HNR is a standard metric in speech science and clinical voice research; high HNR correlates with perceived authority and trustworthiness
Expressiveness component
  • Derived from pitch variation across the recording (coefficient of variation of F0)
  • Draws on Zuckerman & Driver (1989) vocal affect research: monotone delivery reduces persuasive impact; appropriate variation signals engagement and confidence
Interpreting the score: Above 70 is good delivery. Below 50 usually means flat energy, too much noise in the signal, or a credibility deficit that listeners will feel without knowing why.

Practical range: Scores typically fall between 35 and 85, because the algorithm is bounded by real-world acoustic constraints and the sub-scores rarely reach their extremes at the same time.
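To make the expressiveness input concrete, here is a minimal sketch of the coefficient of variation of F0 (standard deviation divided by mean) over voiced frames. The function name and the sample pitch tracks are hypothetical, and how the production pipeline maps this ratio onto a sub-score is not described on this page.

```python
import statistics

def f0_coefficient_of_variation(f0_hz):
    """CV of pitch = population standard deviation / mean, over voiced frames only.

    Unvoiced frames are assumed to be marked as None, 0, or NaN and are skipped
    (a NaN fails the `f > 0` comparison, so it is filtered out as well).
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.pstdev(voiced) / statistics.mean(voiced)

# Hypothetical pitch tracks in Hz.
monotone = [120.0, 121.0, 119.0, 120.0]
expressive = [110.0, 150.0, 95.0, 170.0]
monotone_cv = f0_coefficient_of_variation(monotone)
expressive_cv = f0_coefficient_of_variation(expressive)
```

A flat read produces a CV close to zero, while a varied read produces a clearly larger one.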
Hook Score 0 – 10

Opening hook strength

The Hook Score evaluates the opening seconds of the ad script using AI linguistic analysis. It is not a keyword match. The model considers the mechanism the hook uses — curiosity, pattern interrupt, direct relevance, social proof, bold claim — and how effectively it deploys it.

Calibration basis
  • Cialdini's influence principles (reciprocity, commitment and consistency, social proof, authority, liking, scarcity)
  • AIDA and PAS direct-response copywriting frameworks
  • Meta's published research on hook characteristics correlated with thumbstop rate
  • A self-critique loop runs during evaluation to flag inconsistencies and improve scoring reliability
Confidence interval: The score reports an interval (e.g. 7.2 ± 0.6) showing evaluation certainty. Wider intervals mean the hook is ambiguous — partially working, partially not. Narrow intervals at high scores are what you want.
Structure Score 0 – 10

Narrative arc completeness

Structure Score maps the full script to established persuasion arc frameworks, identifying which elements are present, well-executed, or missing.

Framework mapping
  • AIDA (Attention, Interest, Desire, Action) — Lewis, 1898, applied in modern DTC creative strategy
  • PAS (Problem, Agitation, Solution) — widely used in direct-response video; effective for emotionally-driven conversion
  • Gaps in the persuasion arc are timestamped, e.g. "desire established but no specific action prompt in the final 10 seconds"
Interpreting the score: A perfect 10 means all arc elements are present and well-sequenced. Missing elements are surfaced specifically — not "improve the structure" but "no urgency signal between 00:22 and 00:38".
CTA Score 0 – 100

Call-to-action strength

The CTA Score reflects how the ask is delivered, not just whether it exists. A perfect script CTA can be killed by flat vocal delivery — this catches that.

What it measures
  • CTA Momentum™ — the acoustic energy at the moment of the ask, relative to the rest of the ad
  • Number of distinct calls-to-action detected, with a bonus for reinforcing the ask without overloading it
  • Flagged when urgency words land on flat delivery — the words say "act now" but the voice doesn't
Weight: 20% of the AdZhi Score — the second-highest component, reflecting direct-response evidence on the primacy of the ask.
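One way to picture "energy at the moment of the ask, relative to the rest of the ad" is a simple ratio of mean RMS energy inside the CTA window to mean RMS energy outside it. This is a sketch under that assumption only; the function name, the frame indices, and the ratio form are hypothetical and are not the published CTA Momentum™ definition.

```python
def cta_momentum(rms_frames, cta_start, cta_end):
    """Mean RMS energy inside the CTA window divided by mean RMS outside it.

    rms_frames: per-frame RMS energy values for the whole ad.
    cta_start, cta_end: frame indices of the CTA window (end is exclusive).
    A ratio above 1.0 means the ask is delivered with more energy than the rest.
    """
    if not 0 <= cta_start < cta_end <= len(rms_frames):
        raise ValueError("CTA window must lie inside the recording")
    inside = rms_frames[cta_start:cta_end]
    outside = rms_frames[:cta_start] + rms_frames[cta_end:]
    if not outside:
        raise ValueError("need frames outside the CTA window for comparison")
    baseline = sum(outside) / len(outside)
    if baseline == 0:
        raise ValueError("baseline energy is zero")
    return (sum(inside) / len(inside)) / baseline

# Hypothetical energy envelope: the last two frames carry the ask.
rms = [0.20, 0.22, 0.18, 0.20, 0.30, 0.34]
ratio = cta_momentum(rms, cta_start=4, cta_end=6)
```

In this toy envelope the ask lands at roughly 1.6 times the baseline energy; a ratio below 1.0 would correspond to the "urgency words on flat delivery" flag described above.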
Attention Prediction

An exponential decay model estimates how much audience attention remains at each second of the ad. Informed by mobile video attention research — specifically the Meta Audience Network 2019 study showing approximately 50% of social video attention lost within the first five seconds for audiences who continue watching.

The model flags the exact timestamp where your attention decay rate accelerates, so you know where the edit or re-delivery needs to happen.
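The shape of an exponential decay curve can be sketched in a few lines. The 5-second half-life below is an assumption read directly off the "approximately 50% within the first five seconds" figure; the production model's actual parameters are not stated on this page, and the function name is hypothetical.

```python
import math

def attention_remaining(t_seconds, half_life_seconds=5.0):
    """Fraction of attention left at time t under pure exponential decay.

    attention(t) = exp(-lambda * t), with lambda = ln(2) / half-life,
    so attention halves every `half_life_seconds`.
    """
    decay_rate = math.log(2) / half_life_seconds
    return math.exp(-decay_rate * t_seconds)

# Sample the curve every 5 seconds for a 15-second ad.
curve = [round(attention_remaining(t), 3) for t in range(0, 16, 5)]
```

Under this assumption the curve reads 1.0, 0.5, 0.25, 0.125 at 0, 5, 10 and 15 seconds.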

Sentiment Analysis (NLP)

Sentiment is computed using VADER (Valence Aware Dictionary and sEntiment Reasoner), a published and peer-reviewed NLP model by Hutto & Gilbert (ICWSM 2014, the International AAAI Conference on Weblogs and Social Media). VADER was specifically designed for short-form social media text, which makes it well suited to ad script analysis.

Sentiment is reported per section of the script, not as a single overall number. This surfaces arc patterns: a negative opening resolved into positive CTA energy scores differently from flat neutral throughout.

What the methodology builds on

AdZhi's scoring approach draws on advertising effectiveness research, speech science, and established persuasion frameworks. These are not decorative citations — each one directly informs specific signal weights or evaluation criteria.

Advertising effectiveness

Meta Creative Quality Research

Meta's published findings on hook characteristics and their relationship to thumbstop rate inform hook weighting and the attention decay model's initial parameters.

Creative testing norms

Kantar Link AI norms

Kantar's normative database on ad recall, brand linkage, and enjoyment informs the weighting placed on emotional engagement signals in the composite AdZhi Score.

Long-term effectiveness

IPA Binet & Field data

Les Binet and Peter Field's IPA Effectiveness Databank analysis of 1,000+ campaigns informs narrative structure weighting — specifically the effectiveness advantage of story-driven creative.

Direct response

Ogilvy / Hopkins principles

Claude Hopkins' "Scientific Advertising" (1923) and David Ogilvy's direct-response principles remain foundational to CTA scoring: specificity of the ask, urgency framing, and benefit clarity.

NLP model

VADER — Hutto & Gilbert, AAAI 2014

A peer-reviewed, lexicon- and rule-based sentiment analysis model built for social media text. Published and reproducible. Not a proprietary black box: the model's decision logic is documented in the original paper.

Speech science

Harmonic-to-Noise Ratio research

HNR is a standard metric in voice pathology and speech science. Elevated HNR scores correlate with perceived credibility and authority in human listener studies.

Vocal persuasion

Zuckerman & Driver, 1989

Research on vocal affect and persuasion showing that pitch variation significantly influences perceived enthusiasm and persuasive impact. Basis for the expressiveness component of Voice IQ.

Persuasion frameworks

Cialdini's Influence principles

Six principles of influence (reciprocity, commitment, social proof, authority, liking, scarcity) inform the hook evaluation rubric. The AI is calibrated to recognise which mechanism a hook is deploying and how effectively.

Copywriting frameworks

AIDA / PAS

Attention-Interest-Desire-Action (Lewis, 1898) and Problem-Agitate-Solution — the two most empirically-validated narrative arc frameworks in direct-response copywriting. Structure Score maps to both.

Signal processing

Shannon entropy / spectral analysis

Spectral entropy (a measure of frequency domain information density) contributes to the acoustic analysis layer. Higher entropy in the right frequencies correlates with engaged, dynamic delivery.
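For readers unfamiliar with the measure, spectral entropy is Shannon entropy applied to a power spectrum that has been normalised to sum to 1. The sketch below assumes the spectrum is already available as a list of non-negative bin powers; the function name and the normalisation to a 0 to 1 range are illustrative choices, not a description of the production code.

```python
import math

def spectral_entropy(power_spectrum, normalise=True):
    """Shannon entropy (in bits) of a power spectrum treated as a distribution.

    With normalise=True the result is divided by log2(number of bins), so a
    perfectly flat spectrum scores 1.0 and a single pure tone scores 0.0.
    """
    total = sum(power_spectrum)
    if total <= 0:
        raise ValueError("power spectrum must contain positive energy")
    probs = [p / total for p in power_spectrum if p > 0]  # skip empty bins
    entropy = -sum(p * math.log2(p) for p in probs)
    if normalise and len(power_spectrum) > 1:
        entropy /= math.log2(len(power_spectrum))
    return entropy

flat = spectral_entropy([1.0, 1.0, 1.0, 1.0])   # energy spread evenly
tone = spectral_entropy([0.0, 0.0, 5.0, 0.0])   # energy in a single bin
```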

Attention science

Mobile video attention research

Meta Audience Network (2019) study on attention drop patterns in social video formats. Informs the shape of the exponential decay model used for attention prediction.

AI evaluation

Self-critique evaluation loop

The AI scoring of hooks and structure runs a secondary critique pass on its own output before returning a score. This reduces hallucination-driven overconfidence and widens the reported confidence interval when the evaluation is genuinely uncertain.

Research foundations — peer-reviewed basis

The research items below are cited as direct inputs to specific signal weights or scoring criteria. All citations are real, published works. Where a finding has a commonly-cited interpretation that exceeds the original study's scope, that caveat is noted.

How the AdZhi composite is calculated

The composite AdZhi Score is a weighted sum of six normalised component scores. All components are scaled to a 0–1 range before weighting. The formula below reflects the actual implementation in backend/metrics.py (audio-only path; a cross-signal variant reduces acoustic weights by 2 pp each when visual data is present to accommodate a 7% visual-alignment component).

Composite score formula (v1.0, heuristic weights — subject to revision as correlation data accumulates)
AdZhi = 100 × (
  0.25 × Hook/10
 + 0.20 × CTAQuality/100
 + 0.15 × AttentionRetention
 + 0.15 × Structure/10
 + 0.15 × VoiceIQ/100
 + 0.10 × MisalignmentScore
)
  • Hook (25%): Highest weight. No view without a hook.
  • CTA quality (20%): Intensity + explicitness. Where money is made.
  • Attention retention (15%): From attention half-life prediction model.
  • Structure (15%): Arc completeness vs AIDA / PAS frameworks.
  • Voice IQ (15%): Credibility, expressiveness, hook energy, professionalism.
  • Misalignment penalty (10%): Delivery contradicts script = conversion risk.

Weights are research-informed heuristics, not regression outputs. The hook weight (25%) draws on Meta's Creative Codes data; attention (15%) draws on Binet & Field recall evidence; voice IQ (15%) on HNR credibility research. Weight recalibration against outcome data (ROAS, CTR) is in progress — see Validation Status below.
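The audio-only formula above can be transcribed directly into code. This is a sketch that mirrors the published weights, not the contents of backend/metrics.py; the function and variable names are hypothetical, and it assumes AttentionRetention and MisalignmentScore arrive already scaled to 0–1 with higher meaning better.

```python
# Published v1.0 heuristic weights (audio-only path); they sum to 1.0.
WEIGHTS = {
    "hook": 0.25,
    "cta_quality": 0.20,
    "attention_retention": 0.15,
    "structure": 0.15,
    "voice_iq": 0.15,
    "misalignment": 0.10,
}

def adzhi_score(hook, cta_quality, attention_retention,
                structure, voice_iq, misalignment_score):
    """Weighted sum of six components, each normalised to 0-1, scaled to 0-100."""
    components = {
        "hook": hook / 10,                # Hook Score is reported on 0-10
        "cta_quality": cta_quality / 100, # CTA Score is reported on 0-100
        "attention_retention": attention_retention,  # assumed already 0-1
        "structure": structure / 10,      # Structure Score is reported on 0-10
        "voice_iq": voice_iq / 100,       # Voice IQ is reported on 0-100
        "misalignment": misalignment_score,          # assumed already 0-1
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} is outside its expected range")
    return 100 * sum(WEIGHTS[name] * value for name, value in components.items())

# Hypothetical ad: strong hook and structure, middling CTA and attention.
example = adzhi_score(hook=7.2, cta_quality=60, attention_retention=0.55,
                      structure=8, voice_iq=70, misalignment_score=0.9)
```

For the hypothetical inputs shown, the weighted terms are 18 + 12 + 8.25 + 12 + 10.5 + 9, giving a composite of 69.75.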

Voice IQ sub-formula: Voice IQ is itself a weighted composite of credibility signal / HNR (30%), expressiveness / pitch range (25%), hook energy in the first 3 s (20%), and professionalism / disfluency score (25%), normalised to 0–100. Scores in practice cluster between 35 and 85 because the physical limits of human speech prevent extreme values on all four components simultaneously.
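The sub-formula can be sketched the same way as the composite. The weights are the ones stated above; the function name is hypothetical, and the sketch assumes each sub-score has already been normalised to 0–1, since the per-component normalisation is not described on this page.

```python
# Stated Voice IQ sub-weights; they sum to 1.0.
VOICE_IQ_WEIGHTS = {
    "credibility": 0.30,      # HNR-based signal
    "expressiveness": 0.25,   # pitch variation
    "hook_energy": 0.20,      # energy in the first 3 seconds
    "professionalism": 0.25,  # disfluency-based signal
}

def voice_iq(credibility, expressiveness, hook_energy, professionalism):
    """Weighted composite of four sub-scores (each assumed 0-1), scaled to 0-100."""
    subscores = {
        "credibility": credibility,
        "expressiveness": expressiveness,
        "hook_energy": hook_energy,
        "professionalism": professionalism,
    }
    for name, value in subscores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return 100 * sum(VOICE_IQ_WEIGHTS[name] * value
                     for name, value in subscores.items())

# Hypothetical sub-scores for a clean but slightly flat read.
example_voice_iq = voice_iq(credibility=0.8, expressiveness=0.6,
                            hook_energy=0.5, professionalism=0.7)
```

With these hypothetical inputs the result is 24 + 15 + 10 + 17.5 = 66.5.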
Audio feature extraction: All acoustic features use librosa (McFee et al., 2015, SciPy). Specific features extracted: spectral centroid, RMS energy envelope, zero-crossing rate, fundamental frequency via pyin algorithm, and shimmer proxy metrics derived from frame-level amplitude variation. The pyin algorithm (Mauch & Dixon, 2014) is used for robust pitch estimation in speech, chosen for its lower false-negative rate on non-musical voice compared to earlier YIN-based methods.

What is and isn't proven

AdZhi publishes this distinction because transparency about methodology confidence levels is itself a signal of trustworthiness. The following describes the current validation state of the scoring pipeline.

Current validation state

Currently validated
  • Acoustic feature extraction — pitch, shimmer proxy, RMS energy, spectral centroid, zero-crossing rate extracted from audio using librosa. Deterministic, reproducible, open-source library.
  • Statistical significance gating — t-distribution significance test (p<0.05, n≥8) before any correlation claim is surfaced as actionable. At exactly n=8 (df=6), Pearson r must exceed ≈0.707 to reach p<0.05. This deliberately produces almost no actionable correlations until the sample is meaningful.
  • Score reproducibility — same audio file produces identical scores on repeated analysis. The acoustic pipeline is deterministic and version-pinned.
  • Sentiment analysis (VADER) — the NLP model is peer-reviewed (Hutto & Gilbert, AAAI 2014) and open-source. Its accuracy on short-form social text is documented in the original publication.
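The ≈0.707 threshold in the significance-gating item follows from the standard t-test for a Pearson correlation, t = r·√(df / (1 − r²)) with df = n − 2. The sketch below inverts that relation; the function names are hypothetical, and the critical t value for df = 6 is hard-coded from standard tables rather than computed.

```python
import math

# Two-tailed critical t for p = 0.05 with 6 degrees of freedom (standard table value).
T_CRIT_DF6 = 2.446912

def t_statistic(r, n):
    """t statistic for a Pearson correlation r measured on n paired samples."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r))

def critical_r(t_crit, n):
    """Smallest |r| that reaches the given critical t with n paired samples."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

r_min = critical_r(T_CRIT_DF6, n=8)                 # about 0.707
passes = t_statistic(0.75, n=8) > T_CRIT_DF6        # r = 0.75 clears the gate
fails = t_statistic(0.65, n=8) > T_CRIT_DF6         # r = 0.65 does not
```

With only eight ads, a correlation has to be very strong before it clears the gate, which is why almost nothing is surfaced as actionable at small sample sizes.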
In progress — not yet validated at scale
  • Score-to-ROAS correlation — requires per-account performance data matched to the acoustic signals from the same ad. Log your performance outcomes at /performance/ to contribute to this dataset.
  • Cross-industry benchmark generalisation — current fleet benchmarks (percentile rankings) are derived from a small-n dataset. Percentile positions will shift as the fleet grows. Treat current percentiles as directional, not definitive.
  • Acoustic weight calibration — Voice IQ sub-weights (credibility 30%, expressiveness 25%, hook energy 20%, professionalism 25%) and the composite score weights (hook 25%, CTA 20%, etc.) are heuristic. They have not yet been regressed against listener-rated quality scores or outcome data at scale. Weight revision is planned as labelled data accumulates.
  • Hook type classification accuracy — AI-based hook type detection (Curiosity Gap, Bold Claim, Pattern Interrupt, etc.) has not been formally evaluated against a human-labelled ground truth set. The self-critique loop improves consistency but does not constitute external validation.
When a score component transitions from research-calibrated to empirically validated, we will document what dataset validated it and how.

Methodology-backed today, data-backed tomorrow

Current weights are grounded in advertising effectiveness research and expert calibration. This is disclosed clearly. We are building the empirical moat, not claiming we already have it.

Every analysis AdZhi runs contributes to a growing fleet dataset. When users connect their performance data — CTR, ROAS, thumbstop rate — we match those outcomes to the acoustic and linguistic signals from the same ad. Over time, this produces a statistical picture of what actually predicts performance for specific audiences, verticals, and creative formats.

STEP 01

Analysis at scale

Every ad analysed adds acoustic, linguistic, and structural signals to the fleet dataset — anonymised and aggregated.

STEP 02

Performance import

When you connect your ad account or import your CTR and ROAS data, we match those outcomes to the signals we measured in the corresponding ad.

STEP 03

Score recalibration

As the labelled dataset grows, statistical correlations replace research-informed priors. Weights are adjusted. Predictions improve. Score ranges update to reflect what actually converts.

What we will not do: We will not claim statistical validation before we have it. We will not cite correlations that have not been measured. When a score component transitions from research-calibrated to empirically-validated, we will say so explicitly — and what dataset validated it.

The performance dataset growing through this process is AdZhi's primary IP asset — acoustic fingerprints matched to performance outcomes, non-replicable through any other path. For a technical brief on what is and isn't defensible: Moat & IP brief →

Start analysing your ads

The first analysis is free. Upload a video ad and see the full scoring breakdown — acoustic, linguistic, structural — with specific, timestamped direction for improvement.
