The numbers, and what each one actually means
AdZhi reports several different numbers, and they describe different things — not the same thing counted five ways. Here is each one, its job, and how they connect.
Three layers of analysis, working together
AdZhi does not apply a single AI model and call it a score. Each analysis runs three distinct layers — acoustic measurement, linguistic evaluation, and structural mapping — and combines them into composites. The layers are described below.
Acoustic
- Fundamental frequency (pitch) and pitch variation over time
- RMS energy envelope and CTA Momentum
- Harmonic-to-Noise Ratio (vocal credibility)
- Speech rate (syllables per second)
- Disfluency detection (fillers, restarts)
- Spectral entropy (information density)
- Attention decay model (exponential)
Linguistic
- Hook strength evaluation (AI, Cialdini-calibrated)
- VADER sentiment scoring (peer-reviewed NLP model)
- Call-to-action identification and urgency detection
- Clarity and specificity of the value proposition
- Emotional arc across the transcript
- Self-critique loop for evaluation consistency
Structural
- Narrative arc mapping to AIDA and PAS frameworks
- Act completeness: hook, build, urgency, close
- Timestamped gap detection (missing persuasion elements)
- Section-level energy alignment (does delivery match content?)
- Opening pattern classification
What each number means
Beyond the overall composite, four primary scores surface from the analysis. Each has a defined input set, a documented weighting rationale, and a specific interpretation range.
Overall ad creative effectiveness
The composite score across six components, designed as a single-number summary for quick comparison across ad variants, creators, and campaigns. Because it is a weighted sum, a higher score means stronger combined performance across hooks, structure, delivery credibility, and persuasive language, although a strong component can offset a weaker one.
- Hook quality — 25% — highest weight, reflecting Meta's published data on the disproportionate impact of the first few seconds on overall ad performance
- CTA clarity and conviction — 20% — informed by Hopkins and Ogilvy's direct-response principles on the primacy of the ask
- Attention retention — 15% — decay-model estimate of audience attention at each second, informed by mobile video research
- Narrative structure — 15% — weighted following Binet & Field's IPA effectiveness evidence on story-driven creative outperforming message-only formats
- Voice quality (Voice IQ™) — 15% — weighted following speech science on credibility and persuasion; acoustic quality directly affects whether the message lands
- Script–voice alignment — 10% — penalises ads where the words say one thing but the delivery says another (e.g. "urgent" copy read flatly); AdZhi detects this misalignment per section
Acoustic delivery quality
Voice IQ is not subjective: it is the result of signal processing on the audio file itself. No human listens and no opinion is formed. Two of its sub-scores are described here:
Credibility
- Based on Harmonic-to-Noise Ratio (HNR), a measure of how clean and stable the voice is relative to noise and breathiness
- HNR is a standard metric in speech science and clinical voice research; high HNR correlates with perceived authority and trustworthiness
Expressiveness
- Derived from pitch variation across the recording (coefficient of variation of F0)
- Draws on Zuckerman & Driver (1989) vocal affect research: monotone delivery reduces persuasive impact; appropriate variation signals engagement and confidence
Practical range: scores typically fall between 35 and 85, because real-world recordings rarely reach the acoustic extremes at either end of the scale. A score below 50 suggests clarity issues worth addressing; above 70 indicates strong vocal delivery.
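The two raw inputs behind these sub-scores can be sketched with NumPy. This is a minimal sketch under stated assumptions, not AdZhi's implementation: `hnr_db` is a simplified version of the autocorrelation approach popularised by Praat (HNR = 10·log10(r / (1 − r)), where r is the normalised autocorrelation peak within a plausible pitch-period range), and `f0_cv` is the coefficient of variation of F0 over voiced frames. Both function names, the 75 to 500 Hz search range, and the mapping from raw values to a 0 to 100 sub-score (not shown) are assumptions. In a librosa pipeline the F0 track could come from `librosa.pyin`, which marks unvoiced frames as NaN.

```python
import numpy as np


def hnr_db(frame: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Simplified autocorrelation HNR (dB) for one voiced frame.

    Hypothetical helper: finds the normalised autocorrelation peak r within the
    lag range of plausible pitch periods and returns 10*log10(r / (1 - r)).
    """
    x = frame - frame.mean()
    n = len(x)
    if float(np.dot(x, x)) == 0.0:
        return float("-inf")  # silent frame: no harmonic structure to measure
    ac = np.correlate(x, x, mode="full")[n - 1:]  # autocorrelation at lags 0 .. n-1
    ac = ac / (n - np.arange(n))                   # unbiased: divide by overlap length
    ac = ac / ac[0]                                # normalise so lag 0 == 1
    lo = int(sr / fmax)                            # shortest plausible pitch period
    hi = min(int(sr / fmin), n - 1)                # longest plausible pitch period
    r = float(np.max(ac[lo:hi + 1]))
    r = min(max(r, 1e-6), 1.0 - 1e-6)              # clip so the log stays finite
    return float(10.0 * np.log10(r / (1.0 - r)))


def f0_cv(f0: np.ndarray) -> float:
    """Coefficient of variation of F0; NaN or non-positive values count as unvoiced."""
    voiced = f0[np.isfinite(f0)]
    voiced = voiced[voiced > 0]
    if voiced.size < 2:
        return 0.0
    return float(np.std(voiced) / np.mean(voiced))
```

A clean periodic tone yields a high HNR, added noise pulls it down, and a perfectly flat pitch track gives a coefficient of variation of 0, the "monotone" end of the expressiveness scale.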
Opening hook strength
The Hook Score evaluates the opening seconds of the ad script using AI linguistic analysis. It is not a keyword match. The model considers the mechanism the hook uses — curiosity, pattern interrupt, direct relevance, social proof, bold claim — and how effectively it deploys it.
- Cialdini's influence principles (reciprocity, commitment and consistency, social proof, authority, liking, scarcity)
- AIDA and PAS direct-response copywriting frameworks
- Meta's published research on hook characteristics correlated with thumbstop rate
- A self-critique loop runs during evaluation to flag inconsistencies and improve scoring reliability
Narrative arc completeness
Structure Score maps the full script to established persuasion arc frameworks, identifying which elements are present, well-executed, or missing.
- AIDA (Attention, Interest, Desire, Action) — Lewis, 1898, applied in modern DTC creative strategy
- PAS (Problem, Agitation, Solution) — widely used in direct-response video; effective for emotionally-driven conversion
- Timestamped gap detection in the persuasion arc, e.g. "desire established but no specific action prompt in the final 10 seconds"
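Gap detection of the kind quoted above can be sketched as a simple rule over labelled sections. This assumes the sections have already been tagged with an AIDA stage (in AdZhi that tagging is AI-based; here the labels are supplied by hand), and the `Section` type, `find_gaps`, and the 10-second default window are hypothetical.

```python
from dataclasses import dataclass
from typing import List

AIDA_STAGES = ("attention", "interest", "desire", "action")


@dataclass
class Section:
    start_s: float  # section start, seconds
    end_s: float    # section end, seconds
    stage: str      # hypothetical AIDA label, e.g. from an upstream classifier


def find_gaps(sections: List[Section], duration_s: float, final_window_s: float = 10.0) -> List[str]:
    """Report missing AIDA stages and a desire section with no late action prompt."""
    present = {s.stage for s in sections}
    gaps = [f"missing {stage}" for stage in AIDA_STAGES if stage not in present]
    window_start = max(0.0, duration_s - final_window_s)
    # An action prompt counts as "late" if any part of it falls inside the final window.
    has_late_action = any(s.stage == "action" and s.end_s > window_start for s in sections)
    if "desire" in present and not has_late_action:
        gaps.append(
            f"desire established but no action prompt in the final {final_window_s:g} seconds"
        )
    return gaps
```

For a 30-second ad that builds desire but never asks, this returns both "missing action" and the final-window message; adding an action section at 25 to 30 seconds clears both.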
Call-to-action strength
The CTA Score reflects how the ask is delivered, not just whether it exists. A perfect script CTA can be killed by flat vocal delivery — this catches that.
- CTA Momentum™ — the acoustic energy at the moment of the ask, relative to the rest of the ad
- Number of distinct calls-to-action detected, with a bonus for reinforcing the ask without overloading it
- Flagged when urgency words land on flat delivery — the words say "act now" but the voice doesn't
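One plausible reading of CTA Momentum™ as a number: mean frame RMS inside the CTA window divided by mean frame RMS over the rest of the ad, so values above 1 mean the ask is delivered with more energy than the surrounding content. This is a hypothetical sketch, not the trademarked metric's definition; the frame length, the urgency word list, and the flat-delivery threshold of 1.0 are assumptions. In a librosa pipeline, `librosa.feature.rms` could replace the NumPy framing.

```python
from typing import Set

import numpy as np

URGENCY_WORDS: Set[str] = {"now", "today", "hurry", "limited", "last"}  # hypothetical list


def frame_rms(y: np.ndarray, frame_len: int) -> np.ndarray:
    """RMS of non-overlapping frames; a trailing partial frame is dropped."""
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def cta_momentum(y: np.ndarray, sr: int, cta_start_s: float, cta_end_s: float,
                 frame_len: int = 512) -> float:
    """Mean RMS inside the CTA window relative to mean RMS everywhere else."""
    rms = frame_rms(y, frame_len)
    times = np.arange(len(rms)) * frame_len / sr  # start time of each frame
    in_cta = (times >= cta_start_s) & (times < cta_end_s)
    if not in_cta.any() or in_cta.all():
        raise ValueError("CTA window must cover some, but not all, frames")
    rest = rms[~in_cta].mean()
    return float(rms[in_cta].mean() / rest) if rest > 0 else float("inf")


def flat_urgency(cta_text: str, momentum: float, threshold: float = 1.0) -> bool:
    """Flag urgency words delivered with no more energy than the rest of the ad."""
    words = {w.strip(".,!?").lower() for w in cta_text.split()}
    return bool(words & URGENCY_WORDS) and momentum < threshold
```

Using a ratio rather than absolute energy keeps the measure independent of recording level: a quiet ad with an energetic close still scores above 1.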
Attention retention
An exponential decay model estimates how much audience attention remains at each second of the ad. It is informed by mobile video attention research, specifically the Meta Audience Network 2019 study showing approximately 50% of social video attention lost within the first five seconds among viewers who keep watching.
The model flags the exact timestamp where your attention decay rate accelerates, so you know where the edit or re-delivery needs to happen.
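If the baseline curve is a single exponential, A(t) = e^(−λt), then losing about 50% of attention in the first five seconds fixes λ = ln 2 / 5 ≈ 0.139 per second. A single exponential has a constant decay rate, so for the rate to "accelerate" the model has to vary it over time. This sketch assumes a per-second rate is already available (how AdZhi adjusts it from delivery signals is not described here) and reports the second where the rate jumps most; all names are hypothetical.

```python
import math
from typing import List

BASE_RATE = math.log(2) / 5.0  # ~0.139 per second: half of attention gone after 5 s


def retention_curve(rates: List[float]) -> List[float]:
    """A[0] = 1 and A[t + 1] = A[t] * exp(-rates[t]) for each one-second step."""
    curve = [1.0]
    for rate in rates:
        curve.append(curve[-1] * math.exp(-rate))
    return curve


def acceleration_point(rates: List[float]) -> int:
    """Second at which the per-second decay rate increases the most."""
    if len(rates) < 2:
        raise ValueError("need at least two seconds of rates")
    jumps = [rates[t] - rates[t - 1] for t in range(1, len(rates))]
    return 1 + max(range(len(jumps)), key=jumps.__getitem__)
```

With a constant `BASE_RATE`, the curve reaches 0.5 at t = 5; a rate that jumps at second 6 is reported as 6, the point where an edit or re-delivery would be suggested.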
Sentiment
Sentiment is computed using VADER (Valence Aware Dictionary and sEntiment Reasoner), a published, peer-reviewed, rule-based NLP model by Hutto & Gilbert (AAAI 2014). VADER was designed specifically for short-form social media text, which makes it well suited to ad script analysis.
Sentiment is reported per section of the script, not as a single overall number. This surfaces arc patterns: a negative opening resolved into positive CTA energy scores differently from flat neutral throughout.
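A minimal sketch of the per-section arc, assuming each section can be scored with VADER's compound score. `sentiment_arc`, `vader_label`, and the pattern names are hypothetical; the ±0.05 cut-offs follow the convention published with VADER. With the `vaderSentiment` package installed, `score_fn` could be `lambda text: analyzer.polarity_scores(text)["compound"]`, where `analyzer` is a `SentimentIntensityAnalyzer` from `vaderSentiment.vaderSentiment`; accepting any function keeps the sketch free of a hard dependency.

```python
from typing import Callable, Dict, List


def vader_label(compound: float) -> str:
    """Map a VADER compound score to a label using the +/-0.05 convention."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def sentiment_arc(sections: List[str], score_fn: Callable[[str], float]) -> Dict[str, object]:
    """Score each section separately and name the overall arc (hypothetical pattern names)."""
    if not sections:
        raise ValueError("need at least one section")
    compounds = [score_fn(text) for text in sections]
    labels = [vader_label(c) for c in compounds]
    if labels[0] == "negative" and labels[-1] == "positive":
        pattern = "negative-to-positive"  # problem opening resolved into a positive close
    elif all(label == "neutral" for label in labels):
        pattern = "flat-neutral"
    else:
        pattern = "other"
    return {"compounds": compounds, "labels": labels, "pattern": pattern}
```

Keeping the per-section scores, rather than averaging them, is what lets a problem-then-solution script be told apart from one that is neutral throughout: both could average to roughly zero.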
What the methodology builds on
AdZhi's scoring approach draws on advertising effectiveness research, speech science, and established persuasion frameworks. These are not decorative citations — each one directly informs specific signal weights or evaluation criteria.
Meta Creative Quality Research
Meta's published findings on hook characteristics and their relationship to thumbstop rate inform hook weighting and the attention decay model's initial parameters.
Kantar Link AI norms
Kantar's normative database on ad recall, brand linkage, and enjoyment informs the weighting placed on emotional engagement signals in the composite AdZhi Score.
IPA Binet & Field data
Les Binet and Peter Field's IPA Effectiveness Databank analysis of 1,000+ campaigns informs narrative structure weighting — specifically the effectiveness advantage of story-driven creative.
Ogilvy / Hopkins principles
Claude Hopkins' "Scientific Advertising" (1923) and David Ogilvy's direct-response principles remain foundational to CTA scoring: specificity of the ask, urgency framing, and benefit clarity.
VADER — Hutto & Gilbert, AAAI 2014
A peer-reviewed, rule-based sentiment model built and validated for social media text. Published and reproducible. Not a proprietary black box: the model's decision logic is documented in the original paper.
Harmonic-to-Noise Ratio research
HNR is a standard metric in voice pathology and speech science. Elevated HNR scores correlate with perceived credibility and authority in human listener studies.
Zuckerman & Driver, 1989
Research on vocal affect and persuasion showing that pitch variation significantly influences perceived enthusiasm and persuasive impact. Basis for the expressiveness component of Voice IQ.
Cialdini's Influence principles
Six principles of influence (reciprocity, commitment, social proof, authority, liking, scarcity) inform the hook evaluation rubric. The AI is calibrated to recognise which mechanism a hook is deploying and how effectively.
AIDA / PAS
Attention-Interest-Desire-Action (Lewis, 1898) and Problem-Agitate-Solution, two of the most widely used narrative arc frameworks in direct-response copywriting. Structure Score maps to both.
Shannon entropy / spectral analysis
Spectral entropy (a measure of frequency domain information density) contributes to the acoustic analysis layer. Higher entropy in the right frequencies correlates with engaged, dynamic delivery.
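Spectral entropy as commonly defined: normalise a frame's power spectrum into a probability distribution and take its Shannon entropy, divided by the maximum possible value so the result lies between 0 (all energy in one bin) and 1 (a flat, noise-like spectrum). This sketches the standard definition only; the frame length, the absence of a window function, and which band counts as "the right frequencies" are not specified in the text and are assumptions or omissions here.

```python
import numpy as np


def spectral_entropy(frame: np.ndarray) -> float:
    """Normalised Shannon entropy of a frame's power spectrum (0 = one bin, 1 = flat)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0                   # silent frame: define entropy as 0
    p = power / total                # spectrum as a probability distribution
    p = p[p > 0]                     # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum() / np.log2(len(power)))
```

A pure tone scores near 0 and white noise near 1, which is why the text ties the measure to particular frequency regions rather than treating higher entropy as better everywhere.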
Mobile video attention research
Meta Audience Network (2019) study on attention drop patterns in social video formats. Informs the shape of the exponential decay model used for attention prediction.
Self-critique evaluation loop
The AI scoring of hooks and structure runs a secondary critique pass on its own output before returning a score. This reduces hallucination-driven overconfidence and widens the reported confidence interval when the evaluation is genuinely uncertain.
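The critique pass itself is an LLM step and cannot be reproduced in a few lines, but its effect on the reported interval can be illustrated. In this hypothetical sketch, the first-pass and critique-pass hook scores (0 to 10) are averaged, and the interval half-width grows with their disagreement. The averaging rule, `base_half_width`, `k`, and all names are assumptions, not AdZhi's implementation.

```python
from dataclasses import dataclass


@dataclass
class ScoredHook:
    score: float  # reconciled score on the 0-10 hook scale
    low: float    # lower bound of the reported interval
    high: float   # upper bound of the reported interval


def reconcile(first_pass: float, critique_pass: float,
              base_half_width: float = 0.5, k: float = 1.0) -> ScoredHook:
    """Average two passes and widen the interval in proportion to their disagreement."""
    disagreement = abs(first_pass - critique_pass)
    score = (first_pass + critique_pass) / 2.0
    half = base_half_width + k * disagreement
    return ScoredHook(score, max(0.0, score - half), min(10.0, score + half))
```

When both passes agree, the interval stays at its base width; when they diverge (say 8 versus 5), the interval widens to reflect the genuine uncertainty instead of reporting a confident midpoint.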
Research foundations — peer-reviewed basis
The research items below are cited as direct inputs to specific signal weights or scoring criteria. All citations are real, published works. Where a finding has a commonly-cited interpretation that exceeds the original study's scope, that caveat is noted.
Voice & persuasion
Mehrabian, A. & Ferris, S.R. (1967). "Inference of attitudes from nonverbal communication in two channels." Journal of Consulting Psychology, 31(3), 248–252. — The widely cited 38% voice / 55% visual / 7% words breakdown of emotional communication. Caveat: This finding applies specifically to the emotional communication of attitudes, not to all communication contexts. AdZhi uses it as directional evidence for vocal signal weighting, not as a universal proportion.
Voice & trust
Nass, C. & Brave, S. (2005). Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. MIT Press. — On why voice quality affects trust, engagement, and perceived credibility independently of verbal content. Informs the Voice IQ credibility component weighting.
Hook & attention
Becker, M.W., et al. (2019). "Temporal dynamics of attention in video advertising." Journal of Marketing Research. — On the outsized importance of the first seconds of video for capturing and retaining audience attention. Supports the 25% weight assigned to Hook quality in the AdZhi composite score.
Hook & completion
TikTok for Business (2021). Creative Codes: What Makes TikTok Ads Work. — Industry research showing hook rate as the primary driver of ad completion and downstream conversion on short-form video platforms. Corroborates the hook-weight rationale alongside academic sources.
Acoustic features
Schuller, B., et al. (2013). "The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism." Proceedings of INTERSPEECH. — Establishes MFCCs, pitch, and energy envelope features as validated proxies for emotional state in speech. Provides methodological grounding for AdZhi's acoustic feature set (spectral centroid, RMS energy, pitch via pyin, zero-crossing rate).
Creative impact
Nielsen (2022). Creative Is King: How Creative Quality Drives Ad Effectiveness. — Nielsen's attribution modelling shows creative quality accounts for approximately 47% of sales impact across measured campaigns. Caveat: "Creative" in this study encompasses all creative elements — visual, structural, and acoustic — not acoustic signals specifically. AdZhi uses this as evidence for the primacy of creative quality overall, not as a claim about acoustic contribution in isolation.
Sentiment NLP
Hutto, C.J. & Gilbert, E. (2014). "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text." Proceedings of the 8th AAAI International Conference on Weblogs and Social Media (ICWSM-14). — The peer-reviewed basis for VADER, AdZhi's sentiment analysis model. Openly reproducible; decision logic is documented in the original paper, not a proprietary black box.
Audio processing
McFee, B., et al. (2015). "librosa: Audio and Music Signal Analysis in Python." Proceedings of the 14th Python in Science Conference (SciPy 2015). — The open-source audio analysis library underpinning AdZhi's acoustic feature extraction pipeline. AdZhi uses librosa's spectral centroid, RMS energy, zero-crossing rate, and pyin pitch tracking, plus shimmer proxy metrics (librosa itself provides no shimmer measure).
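Classic shimmer measures cycle-to-cycle amplitude variation on individual glottal periods, which librosa does not compute. A frame-level proxy can be sketched with NumPy alone: the mean absolute change in RMS between consecutive frames, relative to mean RMS. This is a hypothetical stand-in, not AdZhi's shimmer proxy, and the 400-sample frame (25 ms at 16 kHz) is an assumption.

```python
import numpy as np


def shimmer_proxy(y: np.ndarray, frame_len: int = 400) -> float:
    """Relative mean absolute frame-to-frame RMS change (hypothetical shimmer stand-in)."""
    n_frames = len(y) // frame_len
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))  # one RMS value per frame
    mean_rms = rms.mean()
    if mean_rms == 0:
        return 0.0                                # silence: no amplitude variation
    return float(np.mean(np.abs(np.diff(rms))) / mean_rms)
```

A steady tone gives a value near 0, while amplitude that alternates between full and half level from frame to frame gives about 0.67; real voiced speech sits between those extremes.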
How the AdZhi composite is calculated
The composite AdZhi Score is a weighted sum of six normalised component scores. All components are scaled to a 0–1 range before weighting. The formula below reflects the actual implementation in backend/metrics.py (audio-only path; a cross-signal variant reduces acoustic weights by 2 pp each when visual data is present to accommodate a 7% visual-alignment component).

$$
\text{AdZhi Score} = 0.25\,\frac{\text{Hook}}{10} + 0.20\,\frac{\text{CTAQuality}}{100} + 0.15\,\text{AttentionRetention} + 0.15\,\frac{\text{Structure}}{10} + 0.15\,\frac{\text{VoiceIQ}}{100} + 0.10\,\text{MisalignmentScore}
$$
Weights are research-informed heuristics, not regression outputs. The hook weight (25%) draws on Meta's creative research and TikTok's Creative Codes data; structure (15%) on Binet & Field's IPA effectiveness evidence; attention (15%) on mobile video attention research; Voice IQ (15%) on HNR credibility research. Weight recalibration against outcome data (ROAS, CTR) is in progress; see Current validation state below.
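A direct transcription of the audio-only formula, as a sketch: the function and argument names are hypothetical (they are not taken from backend/metrics.py), `misalignment_score` is assumed to be on a 0 to 1 scale where higher means better script–voice alignment (it enters the sum with a positive weight), and the cross-signal variant is omitted because the text does not say which weights it reduces.

```python
from typing import Dict

# Audio-only weights from the formula; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "hook": 0.25,
    "cta": 0.20,
    "attention": 0.15,
    "structure": 0.15,
    "voice_iq": 0.15,
    "alignment": 0.10,
}


def adzhi_score(hook: float, cta_quality: float, attention_retention: float,
                structure: float, voice_iq: float, misalignment_score: float) -> float:
    """Weighted sum of the six components after scaling each to 0-1."""
    components = {
        "hook": hook / 10.0,                 # hook scored 0-10
        "cta": cta_quality / 100.0,          # CTA quality scored 0-100
        "attention": attention_retention,    # already 0-1
        "structure": structure / 10.0,       # structure scored 0-10
        "voice_iq": voice_iq / 100.0,        # Voice IQ scored 0-100
        "alignment": misalignment_score,     # assumed 0-1, higher = better aligned
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range after scaling: {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

For example, hook 8, CTA 70, attention 0.6, structure 7, Voice IQ 65 and alignment 0.8 give 0.7125; the range check catches a component passed on the wrong scale (such as a hook score of 11).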
What is and isn't proven
AdZhi publishes this distinction because transparency about methodology confidence levels is itself a signal of trustworthiness. The following describes the current validation state of the scoring pipeline.
Current validation state
- ✓ Acoustic feature extraction — pitch, shimmer proxy, RMS energy, spectral centroid, zero-crossing rate extracted from audio using librosa. Deterministic, reproducible, open-source library.
- ✓ Statistical significance gating — t-distribution significance test on the Pearson correlation (two-tailed p<0.05, n≥8) before any correlation claim is surfaced as actionable. At exactly n=8 (df=6), Pearson r must exceed ≈0.707 to reach p<0.05. This deliberately produces almost no actionable correlations until the sample is meaningful.
- ✓ Score reproducibility — same audio file produces identical scores on repeated analysis. The acoustic pipeline is deterministic and version-pinned.
- ✓ Sentiment analysis (VADER) — the NLP model is peer-reviewed (Hutto & Gilbert, AAAI 2014) and open-source. Its accuracy on short-form social text is documented in the original publication.
- ⟳ Score-to-ROAS correlation — requires per-account performance data matched to the acoustic signals from the same ad. Log your performance outcomes at /performance/ to contribute to this dataset.
- ⟳ Cross-industry benchmark generalisation — current fleet benchmarks (percentile rankings) are derived from a small-n dataset. Percentile positions will shift as the fleet grows. Treat current percentiles as directional, not definitive.
- ⟳ Acoustic weight calibration — Voice IQ sub-weights (credibility 30%, expressiveness 25%, hook energy 20%, professionalism 25%) and the composite score weights (hook 25%, CTA 20%, etc.) are heuristic. They have not yet been regressed against listener-rated quality scores or outcome data at scale. Weight revision is planned as labelled data accumulates.
- ⟳ Hook type classification accuracy — AI-based hook type detection (Curiosity Gap, Bold Claim, Pattern Interrupt, etc.) has not been formally evaluated against a human-labelled ground truth set. The self-critique loop improves consistency but does not constitute external validation.
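The significance gate described above can be checked with a short, dependency-free sketch. It converts Pearson r to t = r·sqrt(df / (1 − r²)) with df = n − 2, then computes the two-tailed p-value from the closed-form finite series for Student's t with integer degrees of freedom. At n = 8 it reproduces the ≈0.707 threshold. The function names and the `min_n` parameter are hypothetical; in practice `scipy.stats.pearsonr` reports the same two-sided p-value.

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value for Student's t with integer df (closed-form finite series)."""
    if df < 1:
        raise ValueError("df must be >= 1")
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        # Even df: A = sin(theta) * (1 + 1/2 c + (1*3)/(2*4) c^2 + ...)
        term = total = 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c
            total += term
        a = s * total
    else:
        # Odd df: A = (2/pi) * (theta + sin*cos * (1 + 2/3 c + ...)); df = 1 is Cauchy
        total = 0.0
        if df > 1:
            term = total = 1.0
            for k in range(1, (df - 3) // 2 + 1):
                term *= (2 * k) / (2 * k + 1) * c
                total += term
        a = (2.0 / math.pi) * (theta + s * math.cos(theta) * total)
    return max(0.0, 1.0 - a)  # A = P(|T| < t), so p = 1 - A


def correlation_p(r: float, n: int) -> float:
    """Two-tailed p-value for a Pearson correlation r computed from n pairs."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    return t_two_tailed_p(r * math.sqrt(df / (1.0 - r * r)), df)


def is_actionable(r: float, n: int, alpha: float = 0.05, min_n: int = 8) -> bool:
    """Surface a correlation only when n >= min_n and p < alpha."""
    return n >= min_n and correlation_p(r, n) < alpha
```

At n = 8, r = 0.75 passes the gate while r = 0.70 does not, and any n below 8 is rejected before a p-value is computed.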
Methodology-backed today, data-backed tomorrow
Current weights are grounded in advertising effectiveness research and expert calibration. This is disclosed clearly. We are building the empirical moat, not claiming we already have it.
Every analysis AdZhi runs contributes to a growing fleet dataset. When users connect their performance data — CTR, ROAS, thumbstop rate — we match those outcomes to the acoustic and linguistic signals from the same ad. Over time, this produces a statistical picture of what actually predicts performance for specific audiences, verticals, and creative formats.
Analysis at scale
Every ad analysed adds acoustic, linguistic, and structural signals to the fleet dataset — anonymised and aggregated.
Performance import
When you connect your ad account or import your CTR and ROAS data, we match those outcomes to the signals we measured in the corresponding ad.
Score recalibration
As the labelled dataset grows, statistical correlations replace research-informed priors. Weights are adjusted. Predictions improve. Score ranges update to reflect what actually converts.
The performance dataset growing through this process is AdZhi's primary IP asset — acoustic fingerprints matched to performance outcomes, non-replicable through any other path. For a technical brief on what is and isn't defensible: Moat & IP brief →
Start analysing your ads
The first analysis is free. Upload a video ad and see the full scoring breakdown — acoustic, linguistic, structural — with specific, timestamped direction for improvement.