Macro Sentiment Edge Study — Calchas Research

Quantitative Macro Sentiment

Abstract

This study investigates whether a combination of retail investor sentiment (AAII weekly survey), options market positioning (CBOE Put/Call Ratio), and volatility regime (VIX/VVIX) generates a measurable, statistically defensible edge in predicting forward S&P 500 returns over horizons from four weeks to one year. The AAII bearish sentiment signal survives out-of-sample walk-forward validation at 8–52 week horizons (MWU p<0.001 at 52w; test period 2006–2026; parameters frozen at pre-2006 calibration).

The divergent fear composite signal (retail bearish while the options market shows no confirming demand for protection) produces a 52-week in-sample median return of +22.0% versus a base of +9.9% (MWU p<0.001), but out-of-sample evidence is mixed. The VIX/VVIX regime overlay is directionally compelling but cannot be validated given insufficient VVIX history. Methodology includes corrections for overlapping return windows, non-overlapping binomial tests, permutation testing, and walk-forward out-of-sample validation. This is a working paper, not a finished research programme.

Keywords: investor sentiment, put/call ratio, return predictability, behavioural finance, walk-forward validation

Data: AAII 1987–2026 (2,018 weeks) · CBOE PCR 2006–2026 · VIX 2000–2026 · VVIX 2012–2026

1. Executive Summary

This study investigates whether publicly available behavioural finance signals carry statistically defensible information about forward S&P 500 returns. The honest framing of what holds versus what does not is the most important contribution this paper makes.

1.1 What We Found

The AAII bearish sentiment signal is out-of-sample validated. Weeks where AAII bearishness is elevated relative to the prior two years show a statistically distinct and superior return distribution over 8–52 weeks on data the model never saw (MWU p<0.001 at 52w). Parameters were calibrated exclusively on pre-2006 data and applied unchanged to 2006–2026.

The divergent fear signal is the most compelling in-sample finding. When retail is verbally bearish but the options market shows no confirming demand for protection, the 52-week median return is +22.0% versus a base of +9.9% (MWU p<0.001 in-sample). Out-of-sample evidence for the composite signal is mixed.

The VIX/VVIX regime overlay cannot yet be validated. VVIX data begins only in 2012, leaving insufficient history for a meaningful walk-forward split. The regime analysis is in-sample only.

1.2 Core Findings at a Glance

Exhibit 1. Study claims and validation status

Claim	Status
AAII fear predicts a different return distribution out-of-sample	Confirmed, MWU p≤0.010 at 8w through 52w (p<0.001 at 52w)
The return premium is economically significant	Modest, +0.9% to +2.0% above base at 12–52w
PCR divergence adds value over AAII alone	In-sample yes; out-of-sample mixed
Divergent fear has a statistically distinct return distribution	Confirmed in-sample, MWU p<0.001
Continuous divergence score is a proven alpha factor	No, fails permutation test on full sample
VIX/VVIX regime sharpens the signal	Directionally yes; cannot validate out-of-sample

Exhibit 2. AAII × PCR composite signal performance heatmap: median forward SPX return and 52-week hit rate by composite state

1.3 Disclaimers

This is a working paper, not a finished research study. The methodology is appropriately rigorous (proper significance testing, non-overlapping samples, permutation testing, and out-of-sample walk-forward validation), but the scope is deliberately narrow. The study tests three signals on index-level data in a single geography without controlling for valuation, macro regime, or factor exposures. It is a first investigation into what behavioural signals can offer traditional finance. Section 11 outlines five directions for the next steps.

2. Introduction

Behavioural finance has produced a substantial body of evidence that investor sentiment affects asset prices: fear and greed leave measurable imprints on market returns that traditional rational-agent models alone cannot explain. The AAII Investor Sentiment Survey, published weekly since 1987, is one of the most widely cited proxies for retail investor psychology. The CBOE Put/Call Ratio provides a complementary window into how investors position, as opposed to how they say they feel. VIX and VVIX capture the broader volatility regime.

The central research question is this: can a combination of these three publicly available data sources generate a measurable, statistically defensible edge in predicting forward S&P 500 returns? We did not set out to build a trading system or a guaranteed alpha stream. Rather, we asked: do these signals carry information that is not already fully priced in?

The answer is: the AAII sentiment signal carries real information. The composite signals carry additional information in-sample that has not been fully confirmed out-of-sample. The vol regime layer carries interesting information that cannot yet be properly tested.

We did not construct a trading strategy nor do we claim to have discovered an exploitable market inefficiency. However, we have established what can be said with genuine statistical confidence about these signals, applied proper corrections for the methodological hazards specific to return-predictability research, and are honest about the limits of what the data supports.

3. Data

Four data series form the foundation of this study.

Exhibit 3. Data sources, coverage, and observation counts

Series	Source	Range	Observations
AAII Weekly Sentiment Survey	American Association of Individual Investors	1987–2026	2,018 weeks
CBOE Total Put/Call Ratio ($CPC)	Barchart	2006–2026	~4,880 days
CBOE Volatility Index (VIX)	Barchart	2000–2026	~6,760 days
CBOE Volatility of Volatility Index (VVIX)	Barchart	2012–2026	~3,640 days

The AAII survey asks individual investors weekly whether they are bullish, neutral, or bearish on the market over the next six months. This study focuses on the bearish reading as the primary signal of interest. The CBOE Total Put/Call Ratio measures aggregate options positioning. High readings indicate net protection-buying. Daily readings are aligned to the weekly AAII survey date. VVIX measures the volatility of VIX itself, used here as a proxy for institutional positioning anxiety. The shorter VVIX history (from 2012) is the constraint on the regime-overlay analysis.

4. Signal Architecture

The framework operates in three distinct layers.

4.1 Layer 1: The Divergence Score

The core quantitative signal:

divergence_score = bearish_z − pcr_z

bearish_z is the rolling 2-year z-score of the AAII weekly bearish reading, measuring how extreme current retail pessimism is relative to the prior 104 weeks.

pcr_z is the rolling 63-day z-score of the CBOE Total Put/Call Ratio, measuring how much options protection is being purchased relative to recent norms.

The divergence score captures the mismatch between what retail investors say and what the options market does. A high divergence score means retail is verbally bearish but the options market is not confirming it: nobody is paying for protection at an elevated rate. The thesis is that sentiment-driven fear without structural hedging confirmation tends to resolve upward; such regimes tend to be short-lived.
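The construction of the divergence score can be sketched as follows. This is a minimal illustration rather than the study's actual code: the function names are hypothetical, the window is assumed to include the current observation, the sample standard deviation is assumed, and the toy inputs use a 4-period window in place of the 104-week and 63-day windows described above.

```python
from statistics import mean, stdev

def rolling_z(values, window):
    """Z-score of each value against its trailing window (current value included).

    Returns None until a full window is available. Assumption: sample
    standard deviation; the source does not say which estimator was used.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        win = values[i + 1 - window : i + 1]
        sd = stdev(win)
        out.append((values[i] - mean(win)) / sd if sd > 0 else 0.0)
    return out

def divergence_score(bearish_z, pcr_z):
    """divergence_score = bearish_z - pcr_z, element by element."""
    return [
        None if b is None or p is None else b - p
        for b, p in zip(bearish_z, pcr_z)
    ]

# Toy data only: AAII % bearish and PCR readings already aligned to the same dates.
bearish = [30.0, 32.0, 28.0, 31.0, 45.0]
pcr = [0.90, 1.00, 0.95, 1.00, 0.85]

bz = rolling_z(bearish, window=4)
pz = rolling_z(pcr, window=4)
score = divergence_score(bz, pz)
print(score[-1])  # high bearishness with a low PCR gives a large positive score
```

In the last toy week bearishness jumps while the put/call ratio falls, so the two z-scores have opposite signs and the score is strongly positive, which is the divergent pattern the thesis describes.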

4.2 Layer 2: AAII × PCR Composite Buckets

Each week is classified into one of four named states by crossing the AAII signal with the PCR signal:

Exhibit 4. AAII × PCR composite signal state definitions

Signal	AAII Reading	PCR Reading	Interpretation
confirmed_fear	Bearish (z > 1)	Elevated (z > 1)	Retail scared; options market confirms fear
divergent_fear	Bearish (z > 1)	Low (z < 0)	Retail scared; nobody buying protection
quiet_hedging	Bullish (z < 0)	Elevated (z > 1)	Retail complacent; institutions quietly hedging
confirmed_greed	Bullish (z < 0)	Low (z < 0)	Universal complacency; no fear anywhere
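The state definitions in Exhibit 4 translate directly into a small classifier. The sketch below is illustrative: the function name is hypothetical, and weeks matching none of the four states are assumed to fall into the neutral bucket that appears in the results tables.

```python
def composite_state(bearish_z, pcr_z):
    """Classify one week from its AAII bearish z-score and PCR z-score.

    Thresholds follow Exhibit 4. Weeks that match none of the four named
    states are labelled 'neutral' (an assumption consistent with the
    neutral row reported in the results).
    """
    if bearish_z > 1 and pcr_z > 1:
        return "confirmed_fear"
    if bearish_z > 1 and pcr_z < 0:
        return "divergent_fear"
    if bearish_z < 0 and pcr_z > 1:
        return "quiet_hedging"
    if bearish_z < 0 and pcr_z < 0:
        return "confirmed_greed"
    return "neutral"

print(composite_state(1.6, -0.4))  # divergent_fear
print(composite_state(0.5, 0.5))   # neutral
```

Note that the thresholds are asymmetric: a week with bearish_z above 1 and pcr_z between 0 and 1 belongs to no named state.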

4.3 Layer 3: VIX/VVIX Regime Overlay

The overlay defines four volatility regime quadrants from 252-day rolling z-scores of VIX and VVIX. This layer does not generate a signal independently; it provides context for the divergence score. Because VVIX data begins only in 2012, the regime overlay could not be validated out-of-sample.

Exhibit 5. VIX/VVIX regime quadrant definitions

Regime	VIX z	VVIX z	Interpretation
consensus_low	Below avg	Below avg	Structural vol suppression; broadly calm
latent_tension	Below avg	Above avg	Surface calm; regime uncertainty rising; divergence signal
active_fear	Above avg	Above avg	Broad structural fear; spot vol and meta-vol elevated
exhaustion	Above avg	Below avg	VIX elevated but vol-of-vol calming; fear may be peaking
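The quadrant logic in Exhibit 5 can be written as a two-way split on the sign of each z-score. This is a sketch under assumptions: the function name is hypothetical, "above average" is read as a z-score greater than zero, and a z-score of exactly zero is treated as below average.

```python
def vol_regime(vix_z, vvix_z):
    """Map 252-day rolling z-scores of VIX and VVIX to a regime quadrant.

    Assumption: 'above avg' means z > 0; the source does not state how a
    z-score of exactly zero is handled.
    """
    vix_high = vix_z > 0
    vvix_high = vvix_z > 0
    if vix_high and vvix_high:
        return "active_fear"
    if vix_high:
        return "exhaustion"
    if vvix_high:
        return "latent_tension"
    return "consensus_low"

print(vol_regime(-0.5, 0.8))  # latent_tension
```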

5. Methodology

5.1 Return Construction

Forward S&P 500 returns are calculated at five horizons: 4, 8, 12, 26, and 52 weeks from the weekly AAII survey close date. All horizons are computed for every weekly observation, producing overlapping return windows. This is a structural issue addressed directly in the significance testing design.
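A minimal sketch of the forward-return construction follows. It assumes simple (not log) returns and a list of weekly closes aligned to the survey dates; the function name and the toy price path are illustrative only.

```python
HORIZONS = (4, 8, 12, 26, 52)  # weeks, as listed above

def forward_returns(closes, horizons=HORIZONS):
    """Simple forward return at each horizon for every weekly observation.

    closes: weekly index closes aligned to the survey dates.
    Returns {horizon: [return or None]}; None marks weeks whose forward
    window runs past the end of the data.
    """
    out = {}
    for h in horizons:
        out[h] = [
            closes[t + h] / closes[t] - 1 if t + h < len(closes) else None
            for t in range(len(closes))
        ]
    return out

# Toy path growing 1% per week for 60 weeks.
closes = [100 * 1.01 ** i for i in range(60)]
fwd = forward_returns(closes)
print(round(fwd[4][0], 4))   # four-week forward return from week 0
print(fwd[52][8])            # None: the 52-week window runs off the sample
```

Because every week gets a return at every horizon, consecutive 52-week windows share all but one week, which is the overlap problem the significance tests have to handle.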

5.2 Statistical Tests

The two core problems in return-predictability research

Problem 1 — Overlapping return windows

Weekly observations with 52-week forward returns share approximately 51 weeks with their immediate neighbours. Standard tests applied to this data produce artificially low p-values. This study corrects by: (a) filtering to non-overlapping observations for binomial tests, and (b) using Newey-West HAC standard errors in regression, with lag lengths equal to the horizon.

Problem 2 — The equity base rate

The S&P 500 rises in approximately 75–80% of all 52-week periods regardless of any signal. A 90% hit rate must be tested against an ~84% unconditional base rate. With N=9 non-overlapping observations for divergent_fear at 52w, the binomial test has almost no power. This is stated explicitly throughout.

Tests applied

Mann-Whitney U (two-sided)

Tests whether a signal bucket's return distribution differs from all other periods. Non-parametric; primary significance test throughout.
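For readers unfamiliar with the test, the sketch below implements a two-sided Mann-Whitney U using only the standard library. It is an illustration under assumptions: it uses the normal approximation with tie and continuity corrections, the function name and toy returns are hypothetical, and in practice a library routine such as scipy.stats.mannwhitneyu would normally be used instead.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, p-value). Uses average ranks for ties,
    a tie-corrected variance, and a 0.5 continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = max(0.0, abs(u - mu) - 0.5) / sqrt(var)
    return u, erfc(z / sqrt(2))             # two-sided tail probability

# Toy returns in percent: every signal week beats every other week.
signal_weeks = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
other_weeks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
u_stat, p_value = mann_whitney_u(signal_weeks, other_weeks)
print(u_stat, p_value < 0.001)
```

The test compares ranks, not means, so it says the two distributions differ without saying by how much. That is why the study reports median differentials alongside the p-values.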

Binomial test (one-sided)

Tests whether a bucket's hit rate exceeds the unconditional SPX base rate, on non-overlapping observations only. Explicitly noted as underpowered given available sample sizes.

Newey-West HAC regression

Continuous divergence_score regressed against forward returns with autocorrelation-corrected standard errors using lag lengths equal to the return horizon.

Permutation test

Signal labels shuffled 1,000 times; regression re-run on each shuffle. Observed coefficient compared to the empirical null distribution to detect period-selection artefacts.
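The shuffle-and-refit procedure can be sketched as below. This is a simplified illustration, not the study's code: it uses a plain OLS slope instead of the full HAC regression, a two-sided comparison, hypothetical function names, and synthetic data.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope of a univariate least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def permutation_p(signal, fwd_ret, n_shuffles=1000, seed=0):
    """Empirical p-value for the observed slope under shuffled signals.

    The signal series is shuffled n_shuffles times; each time the slope
    is re-estimated. The p-value is the share of shuffles whose absolute
    slope is at least as large as the observed one (with the usual +1
    adjustment so it is never exactly zero).
    """
    rng = random.Random(seed)
    observed = ols_slope(signal, fwd_ret)
    shuffled = list(signal)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(ols_slope(shuffled, fwd_ret)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)

# Synthetic example with a strong built-in relationship (slope near 2).
sig = list(range(30))
ret = [2 * s + (s * 7) % 5 for s in sig]
coef, p_perm = permutation_p(sig, ret)
print(round(coef, 3), p_perm)
```

A plain shuffle destroys the serial dependence in the signal, so a block-based shuffle may be a more conservative choice for strongly autocorrelated series. The source does not say which variant was used.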

Walk-forward out-of-sample validation

Parameters calibrated on a training period, frozen, and applied to a held-out test period. Gold standard for signal validation.

6. Results

6.1 AAII-Only Forward Returns

The baseline analysis classifies each week by AAII bearish reading into five buckets and computes forward SPX returns over 2,018 weeks of history.

Exhibit 6. AAII sentiment buckets: forward S&P 500 returns, 1987–2026

Signal	N	12w Median	52w Median	52w Hit Rate
extreme_fear	192	+4.2%	+14.8%	75.8%
fear	302	+3.9%	+14.0%	79.6%
neutral	991	+2.4%	+11.2%	80.3%
complacency	380	+3.2%	+10.3%	82.6%
extreme_greed	96	+3.2%	+12.5%	80.2%

Hit rate is the percentage of periods with positive 52-week forward return.

The backtests show that AAII sentiment alone gets the direction of median returns right: fear periods show higher median forward returns at longer horizons than complacency periods. But the differences are modest. Extreme fear at 52w produces a median of +14.8% versus +10.3% for complacency, and the 52w hit rate does not follow the same ordering (75.8% for extreme_fear versus 82.6% for complacency). AAII alone is consistent but not dramatic.

6.2 AAII × PCR Composite Forward Returns

Adding the PCR overlay sharpens the picture considerably, particularly for the divergent_fear state.

Exhibit 7. AAII % bearish and S&P 500 with divergent fear signal highlights, 2006–2026

Divergent_fear clusters at moments of genuine retail panic unconfirmed by options market positioning. The forward return results:

Exhibit 8. AAII × PCR composite signal states: forward S&P 500 returns, 2006–2026 in-sample

Signal	N	26w Median	52w Median	52w Hit Rate
confirmed_fear	90	+4.5%	+11.8%	70.9%
divergent_fear	37	+12.1%	+22.0%	90.6%
neutral	478	+5.5%	+12.9%	80.0%
quiet_hedging	39	+6.4%	+5.5%	60.5%
confirmed_greed	97	+6.6%	+11.8%	81.4%

Exhibit 9. 52-week forward SPX return distributions by AAII × PCR composite signal bucket

The confirmed_fear bucket performs markedly worse than divergent_fear, at +11.8% over 52 weeks versus +22.0%. This is what the behavioural thesis predicts: when fear is structurally confirmed, the market has already partially priced it in. The quiet_hedging bucket (+5.5%, 60.5% hit rate) is the weakest result, consistent with covert institutional hedging being a more reliable signal of genuine risk than retail verbosity.

Conditioning divergent_fear on the VIX/VVIX regime produces the most extreme return results in the study. These results are in-sample only.

Exhibit 10. Divergent fear signal performance conditioned on VIX/VVIX regime, in-sample only

Regime	N	26w Median	52w Median	52w Hit Rate
consensus_low	10	+9.4%	+16.5%	100%
latent_tension	6	+14.2%	+26.0%	100%
active_fear	5	+16.6%	+34.3%	100%
exhaustion	3	+15.7%	+24.0%	100%

In-sample only. N per cell 3–10; interpret directionally only.

Exhibit 11. Divergent fear performance by VIX/VVIX regime quadrant

The active_fear result (verbal panic plus structural vol stress plus no institutional hedging confirmation) shows a 52-week median return of +34.3% with a perfect hit rate on N=5. This is consistent with genuine signal but also consistent with chance. The direction is compelling, but the statistical confidence is not sufficient.

7. Significance Testing

7.1 Non-Overlapping Sample Construction

Non-overlapping observations were constructed for each horizon by filtering to observations spaced at least h weeks apart, where h is the return horizon in weeks. This produces independent observations at the cost of dramatically reducing sample size.
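One simple way to build such a sample is a greedy pass that keeps the earliest signal week and then skips forward by the horizon. The sketch below assumes that rule; the source does not specify which observations were retained, and the function name and week indices are illustrative.

```python
def non_overlapping(signal_weeks, horizon):
    """Keep signal weeks whose forward windows do not overlap.

    signal_weeks: integer week indices on which the signal fired.
    horizon: forward-return horizon in weeks.
    Greedy rule (an assumption): keep the earliest week, then the next
    week at least `horizon` weeks after the last one kept.
    """
    kept = []
    for week in sorted(signal_weeks):
        if not kept or week - kept[-1] >= horizon:
            kept.append(week)
    return kept

# A cluster of signals collapses to one independent observation each.
print(non_overlapping([0, 1, 2, 60, 61, 130], horizon=52))  # [0, 60, 130]
```

Because signal weeks tend to cluster, a run of consecutive readings counts as a single observation at the 52-week horizon, which is why the raw counts shrink so sharply.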

Exhibit 12. Raw versus independent sample sizes at the 52-week horizon

Signal	Raw N	Independent N at 52w
extreme_fear	192	24
divergent_fear	37	9
confirmed_fear	90	11

The collapse from 37 to 9 independent observations for divergent_fear is the central limiting factor for the binomial test at 52 weeks.

7.2 Mann-Whitney U Results

The MWU test was applied to all major signal buckets. Most signals did not produce statistically distinct return distributions. The results that passed:

divergent_fear: MWU p<0.001 at both 26w and 52w. The return distribution for divergent fear weeks is statistically distinct from that of all other periods.

AAII bearish_z (extreme_fear bucket): MWU p=0.003 at 52w in-sample. Subsequently confirmed out-of-sample — see Section 9.

Exhibit 13. 52-week forward SPX return distribution: divergent fear vs. all other composite weeks

7.3 Binomial Test Results

The binomial test on non-overlapping observations produced no statistically significant results at conventional thresholds for any signal at 52 weeks. With N=9 for divergent_fear, the test has essentially no power to distinguish a true 90% hit rate from the ~84% base rate. This is not evidence against the signal — it is evidence that the sample is insufficient to confirm or disconfirm the hit rate claim.
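The lack of power is easy to see with a worked calculation. The sketch below uses the N=9 and ~84% base rate quoted above; the hit counts of 8 and 9 out of 9 are illustrative assumptions, since the source does not report the non-overlapping hit count.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided binomial p-value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

BASE_RATE = 0.84   # unconditional 52w hit rate quoted in the text
N_INDEP = 9        # non-overlapping divergent_fear observations at 52w

p_8_of_9 = binom_upper_tail(8, N_INDEP, BASE_RATE)
p_9_of_9 = binom_upper_tail(9, N_INDEP, BASE_RATE)
print(round(p_8_of_9, 3))  # about 0.565
print(round(p_9_of_9, 3))  # about 0.208
```

Even a perfect 9-for-9 record would give a p-value of roughly 0.21 against this base rate, so no outcome on nine observations could reach conventional significance.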

8. Regression Analysis

8.1 Approach

A continuous divergence_score is regressed against forward SPX returns using Newey-West HAC standard errors with lag lengths set equal to the return horizon. This is the standard correction for autocorrelation from overlapping return windows in the return-predictability literature.
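The correction can be sketched for a univariate regression with an intercept. This is an illustrative standard-library implementation under assumptions: Bartlett kernel weights, no small-sample adjustment, hypothetical names, and synthetic data. A production analysis would more likely rely on a statistics package.

```python
from math import sqrt
from statistics import mean

def newey_west_slope(x, y, lags):
    """OLS slope of y on x (with intercept) and its Newey-West HAC std error.

    Uses Bartlett weights w_l = 1 - l / (lags + 1). With lags=0 this
    reduces to the heteroskedasticity-robust (White) standard error.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    xd = [a - mx for a in x]
    sxx = sum(a * a for a in xd)
    slope = sum(a * (b - my) for a, b in zip(xd, y)) / sxx
    intercept = my - slope * mx
    # Score contributions: demeaned regressor times residual.
    g = [xd[t] * (y[t] - intercept - slope * x[t]) for t in range(n)]
    s = sum(v * v for v in g)
    for lag in range(1, min(lags, n - 1) + 1):
        weight = 1 - lag / (lags + 1)
        s += 2 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return slope, sqrt(max(s, 0.0)) / sxx

# Synthetic data: slope near 0.5 with alternating noise.
xs = list(range(20))
ys = [0.5 * v + (1 if v % 2 else -1) for v in xs]
b0, se0 = newey_west_slope(xs, ys, lags=0)
b4, se4 = newey_west_slope(xs, ys, lags=4)
print(round(b0, 3), round(se0, 4), round(se4, 4))
```

The lag choice only changes the standard error, never the coefficient. Setting lags equal to the horizon, as the study does, lets the variance estimate absorb the serial correlation created by overlapping return windows.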

8.2 Results

Exhibit 14. Newey-West HAC regression of divergence_score on forward SPX returns

Horizon	Coefficient	HAC Std Error	p-value	Significance
4 weeks	+0.158	0.169	0.349	—
8 weeks	+0.296	0.286	0.300	—
12 weeks	+0.501	0.331	0.130	—
26 weeks	+1.127	0.595	0.059	*
52 weeks	+2.891	1.218	0.018	** (fails permutation)

* p<0.10, ** p<0.05 nominal. The 52-week result does not survive permutation testing; see Section 8.3.

8.3 Permutation Test: The p=0.018 Does Not Survive

Signal labels were shuffled 1,000 times and the regression re-run on each shuffle to test whether the 52-week result reflected genuine signal or data-structure artefacts.

Verdict: noise at every horizon. The p=0.018 at 52w did not survive permutation testing. The result was driven by two interacting artefacts: restricting the sample to the VVIX era selected a specific market period, and the joint regression of correlated predictors inflated the coefficient. On the full sample with shuffled labels, the observed coefficient was indistinguishable from chance. Newey-West correction addresses autocorrelation but not period-selection bias.

Exhibit 15. Divergence score regression coefficients by horizon with permutation test annotation (Newey-West HAC, 2013–2026)

9. Walk-Forward Out-of-Sample Validation

Walk-forward validation is the most demanding test in this study. Parameters are calibrated on a training period, frozen, and the model is applied to a held-out test period it did not see. No re-fitting was permitted.

9.1 Test 1: AAII bearish_z Out-of-Sample (2006–2026)

Exhibit 16. Walk-forward Test 1: training and test parameters

Parameter	Value
Training period	1987–2005 (963 weeks)
Test period	2006–2026 (1,055 weeks)
Calibrated bearish mean	27.9%
Calibrated bearish std	9.2%
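The calibrate-then-freeze step can be sketched as follows, using the training-period mean and standard deviation from Exhibit 16. The threshold of one standard deviation is an assumption carried over from the Exhibit 4 definitions, and the function names and test readings are illustrative.

```python
from statistics import mean, stdev

def calibrate(train_bearish):
    """Estimate mean and standard deviation on the training period only."""
    return mean(train_bearish), stdev(train_bearish)

def frozen_signal(test_bearish, mu, sigma, threshold=1.0):
    """Flag test-period weeks whose z-score, computed with the frozen
    training parameters, exceeds the threshold. Nothing is re-fitted."""
    return [(b - mu) / sigma > threshold for b in test_bearish]

# Frozen parameters reported for the 1987-2005 training period (Exhibit 16).
MU, SIGMA = 27.9, 9.2

# Hypothetical test-period AAII % bearish readings.
flags = frozen_signal([25.0, 36.0, 38.0, 50.2], MU, SIGMA)
print(flags)  # [False, False, True, True]
```

With these parameters a reading has to exceed 37.1% bearish (27.9 + 9.2) to fire. The key property is that the test period never influences that cut-off.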

Exhibit 17. Walk-forward out-of-sample results — AAII bearish signal, test period 2006–2026

Horizon	N	Base Median	Signal Median	vs Base	MWU p	Verdict
4 weeks	446	+1.2%	+1.3%	+0.2%	0.378	—
8 weeks	443	+2.3%	+2.8%	+0.5%	0.010	**
12 weeks	443	+3.4%	+4.3%	+0.9%	0.001	***
26 weeks	434	+6.8%	+7.7%	+0.9%	0.001	***
52 weeks	412	+13.4%	+15.4%	+2.0%	<0.001	***

Parameters frozen at training-period values. ** p<0.05, *** p<0.01.

Exhibit 18. Walk-forward out-of-sample validation: AAII bearish signal vs. base rate median returns by horizon, test period 2006–2026

Exhibit 18 shows the cleanest result in the study. AAII bearish sentiment, classified using parameters calibrated exclusively on pre-2006 data, produces a statistically distinct return distribution over 8 through 52 weeks on 1,055 weeks the model never saw. Return differentials are modest (+0.9% at 12w, +2.0% at 52w) but real and robust.

9.2 Test 2: Divergence Score Out-of-Sample (2016–2026)

Exhibit 19. Walk-forward Test 2: training and test parameters

Parameter	Value
Training period	2006–2015 (479 weeks)
Test period	2016–2026 (533 weeks)
Calibrated bearish mean	34.4%
Calibrated bearish std	9.6%
Calibrated PCR mean	0.947
Calibrated PCR std	0.160

Exhibit 20. Walk-forward out-of-sample results; divergence score composite, test period 2016–2026

Horizon	N	Base Median	Signal Median	vs Base	MWU p	Verdict
4 weeks	190	+1.3%	+1.7%	+0.4%	0.536	—
8 weeks	188	+2.9%	+3.4%	+0.4%	0.104	—
12 weeks	187	+3.7%	+4.3%	+0.6%	0.289	—
26 weeks	176	+7.5%	+8.7%	+1.2%	0.019	**
52 weeks	154	+17.5%	+16.0%	−1.5%	0.001	Mixed

The 52w MWU is significant, but the return differential is −1.5% vs. base, so the distributions differ in the wrong direction. The signal fired frequently during a prolonged bull market in which bearish episodes were repeatedly bought back. Mixed result.

Adding PCR to AAII improves the in-sample story clearly, but the out-of-sample evidence is insufficient to claim the composite adds confirmed value over AAII alone. The composite signal is promising, not proven.

9.3 Walk-Forward Summary

Exhibit 21. Walk-forward validation summary across all signal layers

Test	Horizons Validated	Return Differential	Verdict
AAII bearish_z OOS (2006–2026)	8w through 52w	+0.9% to +2.0%	Holds. Clean OOS validation
Divergence_score OOS (2016–2026)	26w (marginal)	+1.2% at 26w; −1.5% at 52w	Mixed. Not confirmed
VIX/VVIX regime overlay	Not testable	N/A. VVIX too short	In-sample observation only

10. Integrated Verdict

10.1 What Walk-Forward Testing Changed

Exhibit 22. Before and after walk-forward testing: assessment of each signal layer

Layer	Before Walk-Forward	After Walk-Forward
AAII signal	Directionally consistent in-sample	Out-of-sample validated at 8–52w
PCR divergence	Sharpens in-sample significantly	Promising; not OOS confirmed
VIX/VVIX regime	Compelling in-sample (+34.3% active_fear)	In-sample only — cannot test
Overall	Directionally right; not statistically proven	AAII layer is real; composite needs more data

10.2 The Defensible Claims

AAII bearish sentiment has genuine out-of-sample predictive power over return distributions at 8–52 week horizons. Weeks where AAII bearishness is elevated relative to the prior two years have historically been followed by a statistically distinct and better return distribution on data from 2006–2026, using parameters calibrated exclusively on pre-2006 data (MWU p<0.001 at 52w). The mechanism is coherent: periods of extreme retail pessimism tend to coincide with sentiment-driven rather than fundamentally-driven selloffs, and these resolve faster.
When that bearish signal is also divergent from the options market (retail is scared but nobody is buying protection), the in-sample return distribution is more extreme (divergent_fear 52w median +22.0% vs. base +9.9%, MWU p<0.001). This composite signal is the most interesting finding but requires more data history to validate out-of-sample conclusively.

10.3 Practical Use of Each Signal Layer

Exhibit 23. Appropriate application of each signal layer

Layer	Validated?	Appropriate Use
AAII bearish_z	Yes. OOS confirmed	Quantitatively backed signal; elevated readings have OOS-validated return distribution shift
AAII × PCR divergence_score	Partial. In-sample strong	Best live monitoring state; strongest in-sample precedent; treat with more weight than AAII alone, but acknowledge OOS limitations
VIX/VVIX regime	No. In-sample only	Structured context filter; use to calibrate conviction sizing, not to trigger decisions

Exhibit 24. 52-week median return vs. hit rate by signal bucket, AAII-only and AAII × PCR composite

When the framework shows divergent_fear concurrent with an active_fear VIX regime, that is the highest-conviction setup the framework can produce. It is not a mechanical buy signal, but it is the historically strongest combination, and the AAII component underlying it has out-of-sample validation. The decision about how to act on it still requires judgment.

11. What Would Strengthen This Study

This paper is a beginning, not a conclusion. Five extensions would materially strengthen the framework, in approximate order of priority:

Longer VVIX History. VVIX only begins in 2012. The regime overlay is the most directionally compelling but least validated component. A longer VVIX history or reconstruction of a VVIX proxy from historical VIX options data would enable a proper walk-forward test of regime conditioning and settle whether active_fear genuinely improves signal performance or whether the N=5 result is noise.
Macro Regime Conditioning. Overlaying ISM, the Chicago Fed National Financial Conditions Index (NFCI), and credit spread data would allow testing whether divergent_fear performs differently in expansion versus stress environments. The hypothesis is that divergent_fear during macro stress is more reliable because the sentiment overshoot is harder to sustain against a recovering fundamental backdrop.
Valuation Control (Shiller CAPE). A persistent alternative explanation for the AAII bearish_z result is that periods of high retail bearishness tend to coincide with cheaper markets, and the return premium is a value premium rather than a pure sentiment premium. Adding Shiller CAPE as a control variable in the regression would allow partial isolation of the sentiment effect from the valuation effect.
Equity PCR vs. Index PCR Decomposition. This study uses the CBOE Total Put/Call Ratio, which aggregates equity and index options. Index PCR ($CPCI) is heavily influenced by institutional tail-risk hedging programmes independent of directional view. Equity PCR ($CPCE) is a cleaner proxy for directional sentiment. Separating the two series would likely sharpen the divergence signal and reduce noise in the current PCR measure.
Alternative Sentiment Cross-Checks. The NAAIM Exposure Index (professional money managers' equity exposure) provides a complementary cross-check on AAII. High divergence between AAII bearishness and high NAAIM exposure would strongly confirm the divergent_fear thesis: retail panicking while professionals remain allocated.

12. Conclusion

This paper set out to answer one question: do publicly available behavioural finance signals (AAII sentiment, PCR positioning, VIX/VVIX regime) carry information about forward S&P 500 returns that is statistically distinguishable from noise? The answer is yes for the AAII signal alone, a conditional yes for the composite divergence signal in-sample, and an honest not-yet for the regime overlay.

The AAII result is the most robust and important finding. One simple signal (weeks where retail bearishness is elevated relative to its own recent history) produces a statistically distinct return distribution on 20 years of out-of-sample data. The return premium is modest (roughly 1–2 percentage points above base at 12–52w) but tangible.

The composite divergence signal is theoretically coherent and empirically striking in-sample. A 52-week median return of +22.0% versus a +9.9% base, with a statistically distinct distribution, is not noise. However, its out-of-sample validation is incomplete, and the permutation test result on the continuous regression is a negative finding that cannot be glossed over.

The vol regime overlay is a compelling framework. The hypothesis that divergent_fear during active vol stress represents the maximum contrarian setup is theoretically sensible and directionally supported. However, it cannot be validated yet.

Psychology moves markets. Sentiment divergence leaves traces in return distributions. This study establishes that with appropriate methodological rigour. The work of building those observations into a robust, validated investment framework is the next step.

This is a working paper published by Calchas Research for review and discussion. March 2026. Nothing contained herein constitutes investment advice, a solicitation, or an offer to buy or sell any security or financial instrument. The views expressed are subject to change without notice. Past performance is not indicative of future performance. This document is intended for sophisticated readers and should not be relied upon as the basis for any investment decision.