Disclaimer: This project is produced solely for educational and academic research purposes. Nothing in this repository constitutes investment advice, a solicitation, or a recommendation to buy, sell, or hold any security or financial instrument. This project is not connected to, endorsed by, or produced in any professional capacity related to any broker-dealer or investment advisory firm. The authors are presenting independent academic research on asset clustering methodologies. Past performance and historical patterns do not guarantee future results.
- Nicholas Tavares
- Amjad Hanini
- Brandon Dasilva
A quantitative research framework that tracks how multi-asset portfolio structures reorganize during geopolitical crises. We apply graph theory, community detection, information theory, and multi-layer network analysis to a universe of 96 ETFs spanning equities, fixed income, commodities, currencies, crypto, managed futures, thematics, and 18 country-specific funds from January 2010 through the most recent trading day.
| Phase | Focus | Methods |
|---|---|---|
| Phase 1 | Single-layer topology | Shrinkage correlation, Leiden clustering, CMI/TDS metrics, HMM regime detection |
| Phase 2 | Multi-layer analysis | Distance correlation, tail dependence, multiplex consensus clustering, layer agreement |
| Phase 3 | Directional flow | Transfer entropy (KSG), Granger causality, lead-lag networks, cross-layer causality |
Updated findings from v0.5.0 full pipeline run (2010-2026):
- Markets restructure BEFORE the event - Peak topology deformation (TDS=0.176, exceeding COVID) occurred during the buildup to Operation Epic Fury, not during the strikes
- Correlation alone is insufficient - The June 2025 Twelve-Day War was invisible to Pearson correlation but showed massive restructuring in nonlinear and tail-dependence measures
- Tail-dependence CMI Granger-causes Pearson CMI (p=0.041) - Crash-structure changes predict normal-correlation changes, providing a potential early warning signal
- Leadership reversal during war - Calm markets: credit/real estate lead. War: Treasury complex (TLT, GOVT, TIP) becomes the primary information sender
- COVID produced complete cluster dissolution (CMI=1.0) but recovered faster than the asymmetric geopolitical shocks
- Commodity/EM assets are most migration-prone — COLO (0.22), DBA (0.21), USO (0.21) highest AMF vs core equity <0.02
- Fixed income acts as network bridge — EMB/SHY highest betweenness centrality, critical information intermediaries
- EEM is most central asset overall — highest degree, eigenvector, and closeness centrality in the latest window
- Credit spreads drive information flow — HYG is the #1 net information sender (net flow +0.385), FXE is the #1 receiver
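The sender/receiver ranking above rests on a net-flow quantity. The sketch below is one hedged reading of it: net flow is taken as total outgoing minus total incoming transfer entropy per asset. The function name, the matrix orientation, and the toy values are assumptions for illustration only; the repository may scale or normalize net flow differently.

```python
def net_information_flow(te, names):
    """Rank assets by net transfer entropy.

    Assumes te[i][j] is the transfer entropy from asset i to asset j.
    Returns (name, net_flow) pairs sorted from strongest sender to
    strongest receiver.
    """
    n = len(names)
    net = {}
    for i, name in enumerate(names):
        outflow = sum(te[i][j] for j in range(n) if j != i)
        inflow = sum(te[j][i] for j in range(n) if j != i)
        net[name] = outflow - inflow
    return sorted(net.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-asset matrix with made-up values (not pipeline output).
names = ["A", "B", "C"]
te = [
    [0.00, 0.30, 0.20],
    [0.10, 0.00, 0.10],
    [0.05, 0.05, 0.00],
]
ranking = net_information_flow(te, names)
for name, flow in ranking:
    print(f"{name}: {flow:+.3f}")
```

Under this definition the net flows sum to zero across assets, so a strong net sender always implies net receivers elsewhere in the network.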
| Metric | Value |
|---|---|
| Assets surviving cleaning | 59 of 96 (37 excluded in early windows for insufficient history) |
| Trading days | 3,938 (2010-01-05 to 2026-03-20) |
| Rolling windows | 764 (120-day, 5-day step) |
| Active clusters (latest window) | 6 |
| HMM regimes | 3: calm (89.7%), transition (5.1%), stress (5.2%) |
| Current regime | STRESS (as of 2026-03-20) |
| Mean CMI | 0.094 |
| Mean TDS | 0.022 |
| Most stable assets | VTV (0.016), DIA (0.018) |
| Most volatile assets | COLO (0.216), DBA (0.208), USO (0.206) |
| Topology exports | 4 parquet files for downstream ML model integration |
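The rolling-window count in the table follows from the window length and step. This is a minimal sketch of that arithmetic, assuming only full windows are kept. The function name is illustrative and not taken from the repository.

```python
def count_rolling_windows(n_obs: int, window: int, step: int) -> int:
    """Number of full windows of length `window`, advanced by `step` observations."""
    if n_obs < window:
        return 0
    return (n_obs - window) // step + 1


# 3,938 trading days give 3,937 daily returns; both inputs yield the same count.
print(count_rolling_windows(3938, window=120, step=5))  # 764
print(count_rolling_windows(3937, window=120, step=5))  # 764
```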
Asset Cluster Migration/
├── config/
│ ├── universe.yaml # 96-ETF universe definition
│ ├── settings.yaml # API and pipeline settings
│ └── event_windows.yaml # 8 geopolitical event windows
├── src/
│ ├── data/
│ │ ├── fmp_client.py # Async FMP API client (rate-limited, cached)
│ │ ├── ingestion.py # Data fetching orchestrator
│ │ ├── cleaning.py # Alignment and forward-fill
│ │ └── universe.py # Universe config loader
│ ├── features/
│ │ ├── returns.py # Log/simple/excess returns
│ │ ├── similarity.py # 5 similarity measures (shrinkage, dCor, tail dep, etc.)
│ │ └── lead_lag.py # Transfer entropy (KSG), Granger causality, info flow
│ ├── graphs/
│ │ ├── construction.py # Threshold graph, MST, multilayer
│ │ ├── filtering.py # PMFG (Planar Maximally Filtered Graph)
│ │ └── topology.py # Laplacian eigenvalues, spectral distance
│ ├── clustering/
│ │ ├── community.py # Leiden, spectral, consensus
│ │ ├── multiplex.py # Multiplex consensus + layer agreement
│ │ └── kmeans.py # K-Means baseline comparison engine
│ ├── migration/
│ │ ├── metrics.py # CMI, AMF, CPS, TDS (novel metrics)
│ │ └── tracking.py # Migration path tracking, flow matrices
│ ├── regimes/
│ │ ├── hmm.py # Hidden Markov Model regime detection
│ │ ├── changepoint.py # PELT changepoint detection
│ │ └── validation.py # OOS regime validation (TimeSeriesSplit)
│ ├── robustness/ # Phase 4: Statistical robustness framework
│ │ ├── walk_forward.py # Walk-forward validation (train/test splits)
│ │ ├── bootstrap.py # Block bootstrap CIs (Politis & Romano 1992)
│ │ ├── sensitivity.py # Hyperparameter sensitivity sweeps
│ │ ├── multiple_testing.py # Bonferroni, BH-FDR, Storey q-value
│ │ └── surrogate_testing.py # Surrogate data + power analysis
│ └── pipeline/
│   ├── orchestrator.py # Full pipeline orchestrator (CLI)
│   ├── steps.py # Full pipeline step implementations
│   └── council_logger.py # Training/council run logging
├── outputs/
│ ├── final_report.pdf # Complete research report (24 figures, glossary, disclaimers)
│ └── figures/ # All 24 publication-quality figures
├── data/
│ ├── raw/ # Cached API responses (gitignored)
│ └── processed/ # Parquet files: returns, correlations, assignments, TE matrices
├── logs/ # Pipeline and council run logs
├── CHANGELOG.md # Version history (patch notes)
├── Makefile # Pipeline automation
└── pyproject.toml # Dependencies
| Category | Tickers | Count |
|---|---|---|
| US Equity & Value | SPY, QQQ, IWM, DIA, VTI, SCHD, VTV, JEPI, COWZ | 9 |
| US Sectors | XLE, XLF, XLV, XLU, XLI, XLK, XLP, XLY, XLB, XLC, XLRE, RSPN, SOXX | 13 |
| International | EFA, EEM, FXI, EWZ, EWJ, VGK, CQQQ, VYMI | 8 |
| Country ETFs | EIS, INDA, EIDO, GREK, EWI, EWN, EWG, EWU, EWW, COLO, ECH, ARGT, EWY, VNM, THD, EWS, EWT, EWA | 18 |
| Fixed Income | TLT, IEF, SHY, LQD, HYG, EMB, TIP, GOVT | 8 |
| Commodities | GLD, SLV, GDX, USO, DBA, DBC, PDBC, VNQ, COPX, URA | 10 |
| FX & Volatility | UUP, FXE, FXY, VIXY | 4 |
| Thematic & Defense | ITA, XAR, QTUM, BLOK | 4 |
| Global X Thematic | BOTZ, LIT, DRIV, SOCL, CLOU, BUG, AIQ, HERO, PAVE, KRMA, FINX, SNSR, EBIZ, GNOM, DTCR, SHLD | 16 |
| Managed Futures | DBMF, KMLM, CTA, WTMF | 4 |
| Crypto | BITO, IBIT | 2 |
# Clone
git clone https://github.com/studyalwaysbro/asset-cluster-migration.git
cd asset-cluster-migration
# Setup
python -m venv .venv
source .venv/Scripts/activate # Windows (Git Bash); on macOS/Linux use: source .venv/bin/activate
pip install -e ".[dev]"
# Configure API key
cp .env.example .env
# Edit .env with your FMP API key
# Run full pipeline (fetches data, clusters, regimes, migration, centrality, exports)
python -m src.pipeline.orchestrator run-all
# Run with cached data (skip API fetch)
python -m src.pipeline.orchestrator run-all --skip-fetch
# Run individual steps
python -m src.pipeline.orchestrator run-step fetch-data
python -m src.pipeline.orchestrator run-step run-clustering
python -m src.pipeline.orchestrator run-step export-topology
# Export topology to external analysis cache only
python -m src.pipeline.orchestrator export-topology
- CMI (Cluster Migration Index) - Fraction of assets that changed cluster assignment between windows
- TDS (Topology Deformation Score) - Composite of Wasserstein degree distance + NMI community divergence + spectral distance
- Layer Agreement - Pairwise NMI between Pearson, dCor, and tail-dependence clusterings
- Net Transfer Entropy - Directional information flow ranking via KSG estimator
- Cross-Layer Granger Causality - Tests whether tail-dependence CMI predicts Pearson CMI
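Layer agreement, as described above, is a pairwise NMI between clusterings of the same assets. The sketch below computes it in pure Python. It assumes the arithmetic-mean normalization, NMI = 2·I(A;B) / (H(A) + H(B)); the repository may normalize differently. All function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in joint.items())


def nmi(a, b):
    """Normalized mutual information with arithmetic-mean normalization."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both labelings put everything in one cluster
    return 2.0 * mutual_information(a, b) / (ha + hb)


def layer_agreement(layers):
    """Pairwise NMI between every pair of layer clusterings."""
    return {(p, q): nmi(layers[p], layers[q]) for p, q in combinations(layers, 2)}


# Toy cluster labels for six assets in three layers (illustrative only).
layers = {
    "pearson": [0, 0, 1, 1, 2, 2],
    "dcor": [1, 1, 0, 0, 2, 2],   # same partition, different label names
    "tail": [0, 0, 0, 1, 1, 1],   # a coarser, different partition
}
agreement = layer_agreement(layers)
for pair, score in agreement.items():
    print(pair, round(score, 3))
```

Because NMI ignores label names, the Pearson and dCor layers agree perfectly here even though their cluster IDs differ.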
The full research report is available at outputs/final_report.pdf. It includes:
- Executive summary and glossary of 19 terms
- Layman-language summaries in every section
- COVID-19 and Iran-Israel conflict event studies
- 24 publication-quality figures
- All methods described with plain-English explanations
- Research observations (explicitly NOT investment recommendations)
See CHANGELOG.md for detailed version history.
- K-Means clustering on same rolling windows as Leiden (`src/clustering/kmeans.py`)
- Per-window CMI comparison (K-Means vs Leiden)
- Cross-method agreement metrics (ARI, NMI, silhouette)
- Event-window summary table (pre / event / post aggregation)
- Generate side-by-side figures for paper (run `rolling_kmeans_baseline` on full dataset)
- Forward-chaining TimeSeriesSplit validation (`src/regimes/validation.py`)
- Topology metrics → regime prediction with constrained RF (constrained to limit overfitting)
- Per-fold accuracy + macro-F1 reporting
- Feature importance ranking (which topology metrics matter most)
- Run validation on full dataset and document results
- Supervised Gradient Boosting predictive layer: removed (out of scope for descriptive research; reserved for future real-time forecasting extension)
Framework implemented + critical methodological fixes. See CHANGELOG.md for full details.
- CMI permutation invariance - Hungarian algorithm for cluster label matching (`migration/metrics.py`)
- TDS component z-score normalization - `TDSNormalizer` class for commensurable combination
- TDS spectral distance - Wasserstein on Laplacian spectra (replaces zero-padded L2)
- CPS bidirectional matching - Hungarian-based instead of greedy best-Jaccard
- MST weight double-counting fix (`graphs/construction.py`)
- Granger Bonferroni across lags + ADF stationarity pre-check (`features/lead_lag.py`)
- PELT penalty fix - proper BIC: `d * log(n)` (`regimes/changepoint.py`)
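Cluster IDs from community detection are arbitrary, so comparing raw labels across windows overstates migration. The sketch below shows a permutation-invariant CMI. It is not the repository's implementation: the optimal label matching is found by brute force over permutations, which gives the same optimum as the Hungarian algorithm for a handful of clusters. For larger problems, `scipy.optimize.linear_sum_assignment` would be the usual choice. The function name and toy assignments are hypothetical.

```python
from itertools import permutations


def permutation_invariant_cmi(prev, curr):
    """Fraction of assets that changed cluster after optimally aligning labels.

    `prev` and `curr` map asset -> cluster label for two consecutive windows.
    """
    assets = sorted(set(prev) & set(curr))
    if not assets:
        return 0.0
    prev_ids = sorted({prev[a] for a in assets})
    curr_ids = sorted({curr[a] for a in assets})

    # overlap[(p, c)] = number of assets in previous cluster p and current cluster c
    overlap = {}
    for a in assets:
        key = (prev[a], curr[a])
        overlap[key] = overlap.get(key, 0) + 1

    # Pad the shorter side with None so clusters can stay unmatched.
    k = max(len(prev_ids), len(curr_ids))
    padded_prev = prev_ids + [None] * (k - len(prev_ids))
    padded_curr = curr_ids + [None] * (k - len(curr_ids))

    # Best one-to-one matching maximizes the number of assets that "stay".
    best_stay = max(
        sum(overlap.get((p, c), 0) for p, c in zip(padded_prev, perm))
        for perm in permutations(padded_curr)
    )
    return 1.0 - best_stay / len(assets)


# Toy assignments (illustrative only).
prev = {"SPY": 0, "QQQ": 0, "TLT": 1, "IEF": 1, "GLD": 2}
relabeled = {"SPY": 2, "QQQ": 2, "TLT": 0, "IEF": 0, "GLD": 1}  # same partition
moved = {"SPY": 2, "QQQ": 2, "TLT": 0, "IEF": 0, "GLD": 0}      # GLD joins the bond cluster

print(permutation_invariant_cmi(prev, relabeled))        # 0.0
print(round(permutation_invariant_cmi(prev, moved), 3))  # 0.2
```

A naive label-by-label comparison would report a CMI of 1.0 for the relabeled case, even though no asset actually changed cluster.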
- Split: train on 2019-2022, test on 2023-2024. Re-train on 2019-2024, test on 2025-2026
- Cross-layer Granger causality (tail -> Pearson CMI) OOS replication test
- Topology crystallization pattern replication (restructuring before events)
- Early warning signal detection with false positive rate tracking
- Run on full dataset (2010-2026)
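The split schedule above is an expanding-window walk-forward scheme. The sketch below generates it at year granularity, assuming each test block is folded into the next training set. The function name and parameters are illustrative, not the interface of `walk_forward.py`.

```python
def expanding_walk_forward(first_year, initial_train_end, last_year, test_years):
    """Yield ((train_start, train_end), (test_start, test_end)) year ranges.

    After each fold the test block is absorbed into the training set.
    """
    splits = []
    train_end = initial_train_end
    while train_end < last_year:
        test_start = train_end + 1
        test_end = min(train_end + test_years, last_year)
        splits.append(((first_year, train_end), (test_start, test_end)))
        train_end = test_end
    return splits


splits = expanding_walk_forward(2019, 2022, 2026, test_years=2)
for train, test in splits:
    print(f"train {train[0]}-{train[1]} -> test {test[0]}-{test[1]}")
# train 2019-2022 -> test 2023-2024
# train 2019-2024 -> test 2025-2026
```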
- Block bootstrap (Politis & Romano 1992) with configurable block size
- Generic `bootstrap_metric()` for any scalar metric CI
- `bootstrap_te_rankings()`: TE leadership stability across 1000 resamples
- `bootstrap_granger_f_stat()`: cross-layer Granger F-stat robustness
- Run on full dataset (2010-2026)
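To show how a block bootstrap confidence interval for a scalar metric works, here is a minimal sketch of the circular block bootstrap (the Politis & Romano 1992 variant) with a percentile interval. It does not reproduce `bootstrap_metric()`. The function name, defaults, and the synthetic AR(1) series are assumptions for illustration.

```python
import random
from statistics import mean


def circular_block_bootstrap_ci(series, metric, block_size=20,
                                n_resamples=1000, alpha=0.05, seed=0):
    """Percentile CI for `metric` using circular blocks of fixed length.

    Blocks wrap around the end of the series, so every observation is
    equally likely to be resampled while short-range dependence is kept.
    """
    rng = random.Random(seed)
    n = len(series)
    n_blocks = -(-n // block_size)  # ceiling division
    stats = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            sample.extend(series[(start + k) % n] for k in range(block_size))
        stats.append(metric(sample[:n]))  # trim to the original length
    stats.sort()
    lo = stats[int((alpha / 2) * (n_resamples - 1))]
    hi = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


# Synthetic autocorrelated series (AR(1) with coefficient 0.6), not market data.
gen = random.Random(1)
series, x = [], 0.0
for _ in range(500):
    x = 0.6 * x + gen.gauss(0.0, 1.0)
    series.append(x)

lo, hi = circular_block_bootstrap_ci(series, mean, block_size=20, n_resamples=500)
print(f"95% CI for the mean: [{lo:.3f}, {hi:.3f}]")
```

The block size trades bias against variance: blocks that are too short break the autocorrelation the method is meant to preserve, while very long blocks leave few distinct resamples.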
- Window size sweep: 60, 90, 120, 150, 180, 252 days
- Top-k threshold sweep: 3, 5, 7, 10 edges per node
- Leiden resolution sweep: 0.3, 0.5, 0.7, 1.0, 1.3, 1.5, 2.0
- Tail quantile sweep: 0.01, 0.03, 0.05, 0.10
- Automatic stability assessment (ROBUST / MODERATE / SENSITIVE)
- Run on full dataset (2010-2026)
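The sweep values above can be expanded into concrete configurations in two ways: a full grid, or one-at-a-time perturbations around a baseline. The sketch below builds both. Only the 120-day baseline window is stated elsewhere in this README; the other baseline values and all names are placeholders, and which of the two schemes `sensitivity.py` uses is not specified here.

```python
from itertools import product

SWEEPS = {
    "window_days": [60, 90, 120, 150, 180, 252],
    "top_k": [3, 5, 7, 10],
    "leiden_resolution": [0.3, 0.5, 0.7, 1.0, 1.3, 1.5, 2.0],
    "tail_quantile": [0.01, 0.03, 0.05, 0.10],
}

# Hypothetical baseline; only window_days=120 is documented in this README.
BASELINE = {"window_days": 120, "top_k": 5, "leiden_resolution": 1.0, "tail_quantile": 0.05}


def full_grid(sweeps):
    """Every combination of every swept value."""
    return [dict(zip(sweeps, combo)) for combo in product(*sweeps.values())]


def one_at_a_time(sweeps, baseline):
    """Vary one hyperparameter at a time, holding the rest at baseline."""
    configs = []
    for name, values in sweeps.items():
        for value in values:
            cfg = {**baseline, name: value}
            if cfg not in configs:  # the baseline itself appears only once
                configs.append(cfg)
    return configs


print(len(full_grid(SWEEPS)))                # 672
print(len(one_at_a_time(SWEEPS, BASELINE)))  # 18
```

The one-at-a-time design is far cheaper but cannot reveal interactions between hyperparameters, which the full grid can.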
- Bonferroni correction (FWER control)
- Benjamini-Hochberg FDR
- Storey's q-value (adaptive FDR with pi_0 estimation)
- Aggregate binomial test (more significant pairs than chance?)
- `summarize_corrections()` for publication-ready comparison table
- Run on full dataset (2010-2026)
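The difference between FWER and FDR control is easiest to see side by side. The sketch below implements the standard Bonferroni rule and the Benjamini-Hochberg step-up procedure in pure Python. The function names and the p-values are illustrative and are not output from `multiple_testing.py`.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Made-up p-values for eight hypothetical pairwise tests.
pvals = [0.001, 0.008, 0.039, 0.044, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni(pvals)))          # 1 rejection
print(sum(benjamini_hochberg(pvals)))  # 2 rejections
```

BH rejects at least as many hypotheses as Bonferroni at the same alpha, which is why the two are typically reported together when many asset pairs are tested.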
- Phase-randomized surrogates (Theiler et al. 1992) — preserves power spectrum
- IAAFT surrogates (Schreiber & Schmitz 1996) — preserves spectrum + distribution
- Surrogate TE significance test (null: TE from autocorrelation alone)
- Stationary block bootstrap (Politis & Romano 1994) — geometric block lengths
- Monte Carlo minimum sample size estimation (power analysis)
- Run on full dataset (2010-2026)
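Phase randomization builds a null series that keeps the power spectrum, and therefore the linear autocorrelation, while destroying everything else. The sketch below uses NumPy's real FFT and is only an illustration of the technique. It is not the code in `surrogate_testing.py`, and the function name is hypothetical.

```python
import numpy as np


def phase_randomized_surrogate(x, rng):
    """Return a surrogate with the same amplitude spectrum as `x`.

    Each Fourier coefficient is rotated by an independent uniform phase.
    The DC term (and the Nyquist term for even length) is left untouched
    so that the inverse transform is real and the mean is preserved.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0
    if n % 2 == 0:
        phases[-1] = 0.0
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


rng = np.random.default_rng(0)
x = rng.standard_normal(256)  # synthetic input, not market data
surrogate = phase_randomized_surrogate(x, rng)

same_spectrum = np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(surrogate)))
print(same_spectrum)  # True
```

In a surrogate TE test, the statistic would be recomputed on many such surrogates, and the observed value compared against that null distribution.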
- Full 8-step pipeline with CLI (fetch, validate, build-features, clustering, regimes, migration, centrality, export-topology)
- Topology export to external analysis cache for downstream model consumption
- Disconnected graph handling for eigenvector centrality
- Run logging with JSONL summaries
- Extended data range to 2010 (was 2019)
- Completed GICS sector coverage (XLY, XLB, XLC, SOXX)
- 8 geopolitical event windows (COVID, EU debt crisis 2011, Fed 2022, SVB 2023, Japan carry 2024, Iran-Israel x2, DeepSeek 2025)
- Replace batch FMP fetch with streaming price feed (WebSocket or polling)
- Incremental rolling window update (append new day, drop oldest)
- Intraday granularity option (hourly windows for faster signal detection)
- Incremental shrinkage correlation (rank-1 update instead of full recompute)
- Online Leiden with warm-start from previous partition
- Streaming transfer entropy with exponential decay weighting
- Web dashboard (Streamlit or Dash): live CMI, TDS, layer agreement
- Configurable alert thresholds on tail CMI, layer agreement, TE leadership reversals
- Historical comparison overlay (current window vs. COVID, vs. Epic Fury buildup)
- Interactive cluster network visualization (D3.js or Plotly)
- Extend to 2008: Partially achieved — now starts 2010. GFC extension requires sourcing pre-inception data for newer ETFs.
- Higher-frequency analysis: Intraday 5-min returns during crisis windows
- Cross-market extension: Sovereign CDS, VIX term structure, yield curve factors
- Causal discovery: PCMCI+ or DYNOTEARS for full causal graph learning
- Geopolitical NLP layer: GDELT or news embeddings as an additional similarity layer
- Crypto deep-dive: Individual tokens (BTC, ETH, SOL) + DeFi indices
- Alternative clustering: Infomap, Stochastic Block Models, compare to Leiden
- Publication: Target Journal of Financial Economics, Review of Financial Studies, or Journal of Portfolio Management
The logical workflow (subject to change):
1. FOUNDATION (Completed)
├── Multi-asset universe construction (91→96 ETFs, 16 years)
├── Rolling-window similarity computation (3 layers)
├── Community detection + migration tracking (CMI, TDS, AMF)
└── Baseline event studies (COVID, Iran-Israel)
2. MULTI-LAYER ANALYSIS (Completed)
├── Distance correlation + tail dependence layers
├── Multiplex consensus clustering
├── Layer agreement as a meta-signal
└── Cross-layer divergence analysis
3. INFORMATION FLOW (Completed)
├── Transfer entropy (KSG estimator)
├── Granger causality network
├── Regime-conditional leadership reversal
└── Cross-layer Granger causality (key discovery)
4. STATISTICAL ROBUSTNESS (Complete — Run on Full Dataset)
├── Methodological audit: fixed CMI permutation invariance, TDS scaling, Granger corrections
├── Walk-forward validation (train 2019-2022/test 2023-2024, expand + retest)
├── Block bootstrap confidence intervals (Politis & Romano 1992)
├── Sensitivity sweeps: window size, top-k, resolution, tail quantile
├── Multiple testing correction: Bonferroni, BH-FDR, Storey q-value
├── Surrogate data testing: phase-randomized + IAAFT null distributions
├── Monte Carlo power analysis for minimum sample size estimation
└── Pipeline automation: 8-step CLI, topology export, run logging
5. REAL-TIME EXTENSION
├── Data range extended to 2010, GICS sectors completed, 8 event windows
├── Streaming data pipeline + incremental rolling window
├── Dashboard (Streamlit/Dash): live CMI, TDS, layer agreement
├── Alert system on tail CMI, TE leadership reversals
└── Live validation against emerging events
6. EXTENDED RESEARCH
├── Extend to 2008 (GFC) — partially achieved, now starts 2010
├── Intraday 5-min analysis during crisis windows
├── Causal discovery (PCMCI+, DYNOTEARS)
├── Geopolitical NLP layer (GDELT, news embeddings)
└── Crypto deep-dive (BTC, ETH, SOL, DeFi indices)
7. PUBLICATION
├── Working paper with full methodology
├── Replication package (this repository)
├── Conference presentations (AFA, EFA, INFORMS)
└── Journal submission (JFE, RFS, JPM)
MIT License - see LICENSE for details.
If you use this work in academic research, please cite:
@misc{tavares2026topology,
title={Dynamic Multi-Asset Topology and Cluster Migration Under Geopolitical Stress},
author={Tavares, Nicholas and Hanini, Amjad and Dasilva, Brandon},
year={2026},
note={Available at: https://github.com/studyalwaysbro/asset-cluster-migration}
}
Reminder: This project is for educational and research purposes only. It does not constitute investment advice and is not produced in any broker-dealer or investment advisory capacity. See the full disclaimer in the research report.