Dynamic Multi-Asset Topology and Cluster Migration Under Geopolitical Stress

Disclaimer: This project is produced solely for educational and academic research purposes. Nothing in this repository constitutes investment advice, a solicitation, or a recommendation to buy, sell, or hold any security or financial instrument. This project is not connected to, endorsed by, or produced in any professional capacity related to any broker-dealer or investment advisory firm. The authors are presenting independent academic research on asset clustering methodologies. Past performance and historical patterns do not guarantee future results.

Authors

Nicholas Tavares
Amjad Hanini
Brandon Dasilva

Overview

A quantitative research framework that tracks how multi-asset portfolio structures reorganize during geopolitical crises. We apply graph theory, community detection, information theory, and multi-layer network analysis to a universe of 96 ETFs spanning equities, fixed income, commodities, currencies, crypto, managed futures, thematics, and 18 country-specific funds from January 2010 through the most recent trading day.

Three-Phase Architecture

Phase	Focus	Methods
Phase 1	Single-layer topology	Shrinkage correlation, Leiden clustering, CMI/TDS metrics, HMM regime detection
Phase 2	Multi-layer analysis	Distance correlation, tail dependence, multiplex consensus clustering, layer agreement
Phase 3	Directional flow	Transfer entropy (KSG), Granger causality, lead-lag networks, cross-layer causality

Key Findings

Updated findings from v0.5.0 full pipeline run (2010-2026):

Markets restructure BEFORE the event - Peak topology deformation (TDS=0.176, exceeding COVID) occurred during the buildup to Operation Epic Fury, not during the strikes
Correlation alone is insufficient - The June 2025 Twelve-Day War was invisible to Pearson correlation but showed massive restructuring in nonlinear and tail-dependence measures
Tail-dependence CMI Granger-causes Pearson CMI (p=0.041) - Crash-structure changes predict normal-correlation changes, providing a potential early warning signal
Leadership reversal during war - Calm markets: credit/real estate lead. War: Treasury complex (TLT, GOVT, TIP) becomes the primary information sender
COVID produced complete cluster dissolution (CMI=1.0) but recovered faster than the asymmetric geopolitical shocks
Commodity/EM assets are the most migration-prone - COLO (0.22), DBA (0.21), and USO (0.21) have the highest AMF, versus below 0.02 for core equity
Fixed income acts as a network bridge - EMB and SHY have the highest betweenness centrality, making them critical information intermediaries
EEM is the most central asset overall - it has the highest degree, eigenvector, and closeness centrality in the latest window
Credit spreads drive information flow - HYG is the top net information sender (net flow +0.385) and FXE is the top net receiver

Latest Pipeline Results (v0.5.0)

Metric	Value
Assets surviving cleaning	59 of 96 (37 excluded in early windows for insufficient history)
Trading days	3,938 (2010-01-05 to 2026-03-20)
Rolling windows	764 (120-day, 5-day step)
Active clusters (latest window)	6
HMM regimes	3: calm (89.7%), transition (5.1%), stress (5.2%)
Current regime	STRESS (as of 2026-03-20)
Mean CMI	0.094
Mean TDS	0.022
Most stable assets (lowest AMF)	VTV (0.016), DIA (0.018)
Most volatile assets (highest AMF)	COLO (0.216), DBA (0.208), USO (0.206)
Topology exports	4 parquet files for downstream ML model integration
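
The rolling-window count in the table is consistent with a simple sliding-window calculation. The sketch below assumes full windows are counted over the 3,938 trading days with a 120-day length and a 5-day step; the function name `count_rolling_windows` is illustrative and not part of the repository.

```python
def count_rolling_windows(n_obs: int, window: int, step: int) -> int:
    """Number of full windows of length `window` started every `step` observations."""
    if n_obs < window:
        return 0
    return (n_obs - window) // step + 1


# Values taken from the results table above.
print(count_rolling_windows(n_obs=3938, window=120, step=5))  # 764
```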

Repository Structure

Asset Cluster Migration/
├── config/
│   ├── universe.yaml          # 96-ETF universe definition
│   ├── settings.yaml          # API and pipeline settings
│   └── event_windows.yaml     # 8 geopolitical event windows
├── src/
│   ├── data/
│   │   ├── fmp_client.py      # Async FMP API client (rate-limited, cached)
│   │   ├── ingestion.py       # Data fetching orchestrator
│   │   ├── cleaning.py        # Alignment and forward-fill
│   │   └── universe.py        # Universe config loader
│   ├── features/
│   │   ├── returns.py         # Log/simple/excess returns
│   │   ├── similarity.py      # 5 similarity measures (shrinkage, dCor, tail dep, etc.)
│   │   └── lead_lag.py        # Transfer entropy (KSG), Granger causality, info flow
│   ├── graphs/
│   │   ├── construction.py    # Threshold graph, MST, multilayer
│   │   ├── filtering.py       # PMFG (Planar Maximally Filtered Graph)
│   │   └── topology.py        # Laplacian eigenvalues, spectral distance
│   ├── clustering/
│   │   ├── community.py       # Leiden, spectral, consensus
│   │   ├── multiplex.py       # Multiplex consensus + layer agreement
│   │   └── kmeans.py          # K-Means baseline comparison engine
│   ├── migration/
│   │   ├── metrics.py         # CMI, AMF, CPS, TDS (novel metrics)
│   │   └── tracking.py        # Migration path tracking, flow matrices
│   ├── regimes/
│   │   ├── hmm.py             # Hidden Markov Model regime detection
│   │   ├── changepoint.py     # PELT changepoint detection
│   │   └── validation.py      # OOS regime validation (TimeSeriesSplit)
│   ├── robustness/            # Phase 4: Statistical robustness framework
│   │   ├── walk_forward.py    # Walk-forward validation (train/test splits)
│   │   ├── bootstrap.py       # Block bootstrap CIs (Politis & Romano 1992)
│   │   ├── sensitivity.py     # Hyperparameter sensitivity sweeps
│   │   ├── multiple_testing.py # Bonferroni, BH-FDR, Storey q-value
│   │   └── surrogate_testing.py # Surrogate data + power analysis
│   └── pipeline/
│       ├── orchestrator.py    # Full pipeline orchestrator (CLI)
│       ├── steps.py           # Full pipeline step implementations
│       └── council_logger.py  # Training/council run logging
├── outputs/
│   ├── final_report.pdf       # Complete research report (24 figures, glossary, disclaimers)
│   └── figures/               # All 24 publication-quality figures
├── data/
│   ├── raw/                   # Cached API responses (gitignored)
│   └── processed/             # Parquet files: returns, correlations, assignments, TE matrices
├── logs/                      # Pipeline and council run logs
├── CHANGELOG.md               # Version history (patch notes)
├── Makefile                   # Pipeline automation
└── pyproject.toml             # Dependencies

Asset Universe (96 ETFs)

Category	Tickers	Count
US Equity & Value	SPY, QQQ, IWM, DIA, VTI, SCHD, VTV, JEPI, COWZ	9
US Sectors	XLE, XLF, XLV, XLU, XLI, XLK, XLP, XLY, XLB, XLC, XLRE, RSPN, SOXX	13
International	EFA, EEM, FXI, EWZ, EWJ, VGK, CQQQ, VYMI	8
Country ETFs	EIS, INDA, EIDO, GREK, EWI, EWN, EWG, EWU, EWW, COLO, ECH, ARGT, EWY, VNM, THD, EWS, EWT, EWA	18
Fixed Income	TLT, IEF, SHY, LQD, HYG, EMB, TIP, GOVT	8
Commodities	GLD, SLV, GDX, USO, DBA, DBC, PDBC, VNQ, COPX, URA	10
FX & Volatility	UUP, FXE, FXY, VIXY	4
Thematic & Defense	ITA, XAR, QTUM, BLOK	4
Global X Thematic	BOTZ, LIT, DRIV, SOCL, CLOU, BUG, AIQ, HERO, PAVE, KRMA, FINX, SNSR, EBIZ, GNOM, DTCR, SHLD	16
Managed Futures	DBMF, KMLM, CTA, WTMF	4
Crypto	BITO, IBIT	2

Quick Start

# Clone
git clone https://github.com/studyalwaysbro/asset-cluster-migration.git
cd asset-cluster-migration

# Setup
python -m venv .venv
source .venv/Scripts/activate  # Windows (Git Bash); on macOS/Linux use: source .venv/bin/activate
pip install -e ".[dev]"

# Configure API key
cp .env.example .env
# Edit .env with your FMP API key

# Run full pipeline (fetches data, clusters, regimes, migration, centrality, exports)
python -m src.pipeline.orchestrator run-all

# Run with cached data (skip API fetch)
python -m src.pipeline.orchestrator run-all --skip-fetch

# Run individual steps
python -m src.pipeline.orchestrator run-step fetch-data
python -m src.pipeline.orchestrator run-step run-clustering
python -m src.pipeline.orchestrator run-step export-topology

# Export topology to external analysis cache only
python -m src.pipeline.orchestrator export-topology

Novel Metrics

CMI (Cluster Migration Index) - Fraction of assets that changed cluster assignment between windows
TDS (Topology Deformation Score) - Composite of Wasserstein degree distance + NMI community divergence + spectral distance
Layer Agreement - Pairwise NMI between Pearson, dCor, and tail-dependence clusterings
Net Transfer Entropy - Directional information flow ranking via KSG estimator
Cross-Layer Granger Causality - Tests whether tail-dependence CMI predicts Pearson CMI
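
As an illustration of the CMI definition above, here is a minimal sketch that computes the fraction of migrated assets after aligning cluster labels with the Hungarian algorithm (via `scipy.optimize.linear_sum_assignment`). The function name `cluster_migration_index` and the toy labels are assumptions made for this example; the implementation in `src/migration/metrics.py` may differ in detail.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cluster_migration_index(prev_labels, curr_labels) -> float:
    """Fraction of assets whose cluster changed, after optimally matching labels."""
    prev = np.asarray(prev_labels)
    curr = np.asarray(curr_labels)
    prev_ids, prev_idx = np.unique(prev, return_inverse=True)
    curr_ids, curr_idx = np.unique(curr, return_inverse=True)

    # overlap[i, j] = number of assets in previous cluster i and current cluster j.
    overlap = np.zeros((len(prev_ids), len(curr_ids)), dtype=int)
    np.add.at(overlap, (prev_idx, curr_idx), 1)

    # Matching that maximizes total overlap, so a pure relabeling counts as no migration.
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    stayed = int(overlap[rows, cols].sum())
    return float((len(prev) - stayed) / len(prev))


prev = ["A", "A", "B", "B", "C"]
relabeled = ["X", "X", "Y", "Y", "Z"]  # same partition, different label names
moved_one = ["X", "X", "Y", "Z", "Z"]  # one asset moved to another cluster
print(cluster_migration_index(prev, relabeled))  # 0.0
print(cluster_migration_index(prev, moved_one))  # 0.2
```

Without the matching step, the relabeled partition would register as 100% migration even though no asset changed its peers.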

Report

The full research report is available at outputs/final_report.pdf. It includes:

Executive summary and glossary of 19 terms
Layman-language summaries in every section
COVID-19 and Iran-Israel conflict event studies
24 publication-quality figures
All methods described with plain-English explanations
Research observations (explicitly NOT investment recommendations)

Roadmap

See CHANGELOG.md for detailed version history.

Phase 3.5: Baseline Comparisons & Validation (v0.2.0 — Completed)

3.5.1 K-Means Baseline

K-Means clustering on same rolling windows as Leiden (src/clustering/kmeans.py)
Per-window CMI comparison (K-Means vs Leiden)
Cross-method agreement metrics (ARI, NMI, silhouette)
Event-window summary table (pre / event / post aggregation)
Generate side-by-side figures for paper (run rolling_kmeans_baseline on full dataset)
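
The cross-method agreement metrics listed above can be computed with scikit-learn. The sketch below runs on synthetic returns with three planted groups; the `leiden_labels` array is a placeholder for the Leiden partition of the same window, and clustering the rows of the correlation matrix is an assumption about the K-Means feature space rather than a description of `src/clustering/kmeans.py`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

rng = np.random.default_rng(0)
# Synthetic stand-in for one 120-day window: 3 groups of 5 assets, one factor per group.
factors = rng.standard_normal((120, 3))
returns = np.repeat(factors, 5, axis=1) + 0.3 * rng.standard_normal((120, 15))

# Each asset is described by its row of the correlation matrix.
corr = np.corrcoef(returns, rowvar=False)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(corr)

# Placeholder for the Leiden partition of the same window (here: the planted groups).
leiden_labels = np.repeat([0, 1, 2], 5)

ari = adjusted_rand_score(leiden_labels, kmeans_labels)
nmi = normalized_mutual_info_score(leiden_labels, kmeans_labels)
print(f"ARI={ari:.3f}, NMI={nmi:.3f}")
```

Both scores are invariant to label permutation, so no label alignment is needed before comparing the two methods.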

3.5.2 Out-of-Sample Regime Validation

Forward-chaining TimeSeriesSplit validation (src/regimes/validation.py)
Topology metrics → regime prediction with a constrained random forest (to limit overfitting)
Per-fold accuracy + macro-F1 reporting
Feature importance ranking (which topology metrics matter most)
Run validation on full dataset and document results
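
A forward-chaining validation of this kind can be sketched with scikit-learn's `TimeSeriesSplit`. Everything below runs on synthetic data: the three features stand in for topology metrics, the labels stand in for HMM regimes, and the forest hyperparameters (`max_depth=3`, `min_samples_leaf=10`) are illustrative guesses at what "constrained" might mean, not the settings in `src/regimes/validation.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(42)
n_windows = 600
# Synthetic stand-ins for per-window topology metrics (e.g. CMI, TDS, layer agreement).
X = rng.standard_normal((n_windows, 3))
# Synthetic regime labels loosely tied to the first feature: 0=calm, 1=transition, 2=stress.
y = np.digitize(X[:, 0] + 0.5 * rng.standard_normal(n_windows), bins=[0.5, 1.5])

scores = []
for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    # Shallow trees and large leaves are one way to constrain the forest.
    model = RandomForestClassifier(
        n_estimators=100, max_depth=3, min_samples_leaf=10, random_state=0
    )
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    acc = accuracy_score(y[test_idx], pred)
    macro_f1 = f1_score(y[test_idx], pred, average="macro", zero_division=0)
    scores.append((acc, macro_f1))
    print(f"fold {fold}: accuracy={acc:.3f}, macro-F1={macro_f1:.3f}")

# Importances from the last (largest) training fold.
print("feature importances:", model.feature_importances_.round(3))
```

Each fold trains only on observations that precede its test block, which is what separates this scheme from shuffled k-fold cross-validation.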

3.5.3 Removed

~~Supervised Gradient Boosting predictive layer~~ — removed (out of scope for descriptive research; reserved for future real-time forecasting extension)

Phase 4: Statistical Robustness (v0.4.0 — Completed)

Framework implemented + critical methodological fixes. See CHANGELOG.md for full details.

4.0 Methodological Audit & Fixes

CMI permutation invariance — Hungarian algorithm for cluster label matching (migration/metrics.py)
TDS component z-score normalization — TDSNormalizer class for commensurable combination
TDS spectral distance — Wasserstein on Laplacian spectra (replaces zero-padded L2)
CPS bidirectional matching — Hungarian-based instead of greedy best-Jaccard
MST weight double-counting fix (graphs/construction.py)
Granger Bonferroni across lags + ADF stationarity pre-check (features/lead_lag.py)
PELT penalty fix — proper BIC: d * log(n) (regimes/changepoint.py)
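
To illustrate the spectral-distance fix listed above, the following sketch compares two graphs by the 1-Wasserstein distance between their Laplacian eigenvalue distributions. The helper names are hypothetical, and the use of the unnormalized Laplacian L = D - A is an assumption; `src/graphs/topology.py` may use a different Laplacian or weighting.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def laplacian_spectrum(adjacency: np.ndarray) -> np.ndarray:
    """Eigenvalues of the combinatorial Laplacian L = D - A of an undirected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    return np.linalg.eigvalsh(degree - adjacency)


def spectral_distance(adj_a: np.ndarray, adj_b: np.ndarray) -> float:
    """1-Wasserstein distance between the two Laplacian spectra."""
    return float(
        wasserstein_distance(laplacian_spectrum(adj_a), laplacian_spectrum(adj_b))
    )


# Path graph on 3 nodes versus a triangle (complete graph on 3 nodes).
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(spectral_distance(path, path))      # identical graphs: distance is zero
print(spectral_distance(path, triangle))  # spectra {0, 1, 3} vs {0, 3, 3}
```

Because the Wasserstein distance compares distributions, the two spectra do not need the same length, which removes the need for zero-padding when the number of surviving assets differs between windows.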

4.1 Walk-Forward Validation (`src/robustness/walk_forward.py`)

Split: train on 2019-2022, test on 2023-2024. Re-train on 2019-2024, test on 2025-2026
Cross-layer Granger causality (tail -> Pearson CMI) OOS replication test
Topology crystallization pattern replication (restructuring before events)
Early warning signal detection with false positive rate tracking
Run on full dataset (2010-2026)
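
The expanding split described above can be expressed as date slices on a window-indexed table. This is a minimal sketch using pandas; the `metrics` frame and its `cmi` column are placeholders, and the fold boundaries are copied from the split in this section.

```python
import pandas as pd

# Hypothetical per-window metric table indexed by window end date.
dates = pd.bdate_range("2019-01-01", "2026-03-20")
metrics = pd.DataFrame({"cmi": 0.0}, index=dates)

# (train_start, train_end, test_start, test_end), as described above.
folds = [
    ("2019", "2022", "2023", "2024"),
    ("2019", "2024", "2025", "2026"),
]

for train_start, train_end, test_start, test_end in folds:
    train = metrics.loc[train_start:train_end]
    test = metrics.loc[test_start:test_end]
    # The test block must start strictly after the training block ends (no look-ahead).
    assert train.index.max() < test.index.min()
    print(
        f"train {train.index.min().date()} to {train.index.max().date()} | "
        f"test {test.index.min().date()} to {test.index.max().date()}"
    )
```

Note that with 120-day rolling windows, a window ending early in the test period still overlaps training-period returns, so a purge gap between the two blocks may be worth considering.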

4.2 Bootstrap & Confidence Intervals (`src/robustness/bootstrap.py`)

Block bootstrap (Politis & Romano 1992) with configurable block size
Generic bootstrap_metric() for any scalar metric CI
bootstrap_te_rankings(): TE leadership stability across 1000 resamples
bootstrap_granger_f_stat(): cross-layer Granger F-stat robustness
Run on full dataset (2010-2026)
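
As a rough illustration of the block-bootstrap idea, the sketch below builds a percentile confidence interval for a scalar metric using a circular block bootstrap with a fixed block size. The function `block_bootstrap_ci` is written for this example only and is not the repository's `bootstrap_metric()`; the input series is synthetic.

```python
import numpy as np


def block_bootstrap_ci(x, metric, block_size=20, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for metric(x) using a circular block bootstrap."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_size))
    offsets = np.arange(block_size)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw random block starts; indices wrap around the end of the series.
        starts = rng.integers(0, n, size=n_blocks)
        idx = ((starts[:, None] + offsets[None, :]) % n).ravel()[:n]
        stats[b] = metric(x[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


rng = np.random.default_rng(1)
# Synthetic autocorrelated series standing in for a metric such as per-window CMI.
noise = rng.standard_normal(500)
series = np.convolve(noise, np.ones(5) / 5, mode="same") + 0.1
print(block_bootstrap_ci(series, np.mean))
```

Resampling contiguous blocks rather than single observations keeps short-range autocorrelation intact, which matters here because overlapping rolling windows make adjacent metric values strongly dependent.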

4.3 Sensitivity Analysis (`src/robustness/sensitivity.py`)

Window size sweep: 60, 90, 120, 150, 180, 252 days
Top-k threshold sweep: 3, 5, 7, 10 edges per node
Leiden resolution sweep: 0.3, 0.5, 0.7, 1.0, 1.3, 1.5, 2.0
Tail quantile sweep: 0.01, 0.03, 0.05, 0.10
Automatic stability assessment (ROBUST / MODERATE / SENSITIVE)
Run on full dataset (2010-2026)
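
The sweep values above imply very different run counts depending on whether parameters are varied one at a time or jointly. The sketch below enumerates both designs; which design `src/robustness/sensitivity.py` uses is not stated here, and apart from the 120-day window the baseline values are placeholders chosen for illustration.

```python
from itertools import product

# Sweep values as listed above.
grid = {
    "window": [60, 90, 120, 150, 180, 252],
    "top_k": [3, 5, 7, 10],
    "resolution": [0.3, 0.5, 0.7, 1.0, 1.3, 1.5, 2.0],
    "tail_quantile": [0.01, 0.03, 0.05, 0.10],
}
# Hypothetical baseline configuration (only the 120-day window comes from the results table).
baseline = {"window": 120, "top_k": 5, "resolution": 1.0, "tail_quantile": 0.05}

# One-at-a-time: vary a single parameter and hold the rest at baseline.
one_at_a_time = [
    {**baseline, name: value} for name, values in grid.items() for value in values
]
# Full factorial: every combination of every value.
full_grid = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(one_at_a_time))  # 21
print(len(full_grid))      # 672
```

The one-at-a-time list contains the baseline configuration once per parameter, so the number of distinct pipeline runs is slightly lower than 21.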

4.4 Multiple Testing Correction (`src/robustness/multiple_testing.py`)

Bonferroni correction (FWER control)
Benjamini-Hochberg FDR
Storey's q-value (adaptive FDR with pi_0 estimation)
Aggregate binomial test (more significant pairs than chance?)
summarize_corrections() for publication-ready comparison table
Run on full dataset (2010-2026)
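
For reference, the first two corrections listed above can be written in a few lines of NumPy. This is a generic sketch of Bonferroni and Benjamini-Hochberg adjusted p-values, not the code in `src/robustness/multiple_testing.py`; the function names and the toy p-values are illustrative.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * len(p), 1.0)


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


pvals = [0.01, 0.02, 0.03, 0.04, 0.05]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

In this toy input the Bonferroni-adjusted values range from about 0.05 to 0.25, while BH adjusts all five to roughly 0.05, which shows why FDR control is the less conservative choice when many asset pairs are tested.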

4.5 Small-Sample Robustness (`src/robustness/surrogate_testing.py`)

Phase-randomized surrogates (Theiler et al. 1992) — preserves power spectrum
IAAFT surrogates (Schreiber & Schmitz 1996) — preserves spectrum + distribution
Surrogate TE significance test (null: TE from autocorrelation alone)
Stationary block bootstrap (Politis & Romano 1994) — geometric block lengths
Monte Carlo minimum sample size estimation (power analysis)
Run on full dataset (2010-2026)
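
The phase-randomization step listed above can be sketched with NumPy's FFT routines. The function below is a generic illustration of the technique rather than the repository's implementation: it rotates each Fourier coefficient by a random phase, which leaves the amplitude spectrum (and therefore the autocorrelation) unchanged while destroying any other temporal structure.

```python
import numpy as np


def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomized Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0        # leave the zero-frequency term (the mean) untouched
    if n % 2 == 0:
        phases[-1] = 0.0   # the Nyquist term must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


rng = np.random.default_rng(0)
# Synthetic random-walk series standing in for an asset's return-derived signal.
x = np.cumsum(rng.standard_normal(256)) * 0.01
s = phase_randomized_surrogate(x, rng)
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(s))))  # True
```

A surrogate significance test would recompute transfer entropy on many such surrogates of the source series and compare the observed value against that null distribution.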

4.6 Pipeline Automation & Integration (v0.5.0 — Completed)

Full 8-step pipeline with CLI (fetch, validate, build-features, clustering, regimes, migration, centrality, export-topology)
Topology export to an external analysis cache for downstream model consumption
Disconnected graph handling for eigenvector centrality
Run logging with JSONL summaries

Phase 5: Real-Time Extension

5.0 Data & Coverage Expansion (v0.5.0 — Completed)

Extended data range to 2010 (was 2019)
Completed GICS sector coverage (XLY, XLB, XLC, SOXX)
8 geopolitical event windows (COVID, EU debt crisis 2011, Fed 2022, SVB 2023, Japan carry 2024, Iran-Israel x2, DeepSeek 2025)

5.1 Streaming Pipeline

Replace batch FMP fetch with streaming price feed (WebSocket or polling)
Incremental rolling window update (append new day, drop oldest)
Intraday granularity option (hourly windows for faster signal detection)

5.2 Real-Time Computation

Incremental shrinkage correlation (rank-1 update instead of full recompute)
Online Leiden with warm-start from previous partition
Streaming transfer entropy with exponential decay weighting

5.3 Dashboard & Alerting

Web dashboard (Streamlit or Dash): live CMI, TDS, layer agreement
Configurable alert thresholds on tail CMI, layer agreement, TE leadership reversals
Historical comparison overlay (current window vs. COVID, vs. Epic Fury buildup)
Interactive cluster network visualization (D3.js or Plotly)

Phase 6: Extended Research

Extend to 2008: Partially achieved — now starts 2010. GFC extension requires sourcing pre-inception data for newer ETFs.
Higher-frequency analysis: Intraday 5-min returns during crisis windows
Cross-market extension: Sovereign CDS, VIX term structure, yield curve factors
Causal discovery: PCMCI+ or DYNOTEARS for full causal graph learning
Geopolitical NLP layer: GDELT or news embeddings as an additional similarity layer
Crypto deep-dive: Individual tokens (BTC, ETH, SOL) + DeFi indices
Alternative clustering: Infomap, Stochastic Block Models, compare to Leiden
Publication: Target Journal of Financial Economics, Review of Financial Studies, or Journal of Portfolio Management

Research Workflow

The logical workflow, which remains subject to change:

1. FOUNDATION (Completed)
   ├── Multi-asset universe construction (91→96 ETFs, 16 years)
   ├── Rolling-window similarity computation (3 layers)
   ├── Community detection + migration tracking (CMI, TDS, AMF)
   └── Baseline event studies (COVID, Iran-Israel)

2. MULTI-LAYER ANALYSIS (Completed)
   ├── Distance correlation + tail dependence layers
   ├── Multiplex consensus clustering
   ├── Layer agreement as a meta-signal
   └── Cross-layer divergence analysis

3. INFORMATION FLOW (Completed)
   ├── Transfer entropy (KSG estimator)
   ├── Granger causality network
   ├── Regime-conditional leadership reversal
   └── Cross-layer Granger causality (key discovery)

4. STATISTICAL ROBUSTNESS (Completed, Run on Full Dataset)
   ├── Methodological audit: fixed CMI permutation invariance, TDS scaling, Granger corrections
   ├── Walk-forward validation (train 2019-2022/test 2023-2024, expand + retest)
   ├── Block bootstrap confidence intervals (Politis & Romano 1992)
   ├── Sensitivity sweeps: window size, top-k, resolution, tail quantile
   ├── Multiple testing correction: Bonferroni, BH-FDR, Storey q-value
   ├── Surrogate data testing: phase-randomized + IAAFT null distributions
   ├── Monte Carlo power analysis for minimum sample size estimation
   └── Pipeline automation: 8-step CLI, topology export, run logging

5. REAL-TIME EXTENSION
   ├── Data range extended to 2010, GICS sectors completed, 8 event windows
   ├── Streaming data pipeline + incremental rolling window
   ├── Dashboard (Streamlit/Dash): live CMI, TDS, layer agreement
   ├── Alert system on tail CMI, TE leadership reversals
   └── Live validation against emerging events

6. EXTENDED RESEARCH
   ├── Extend to 2008 (GFC) — partially achieved, now starts 2010
   ├── Intraday 5-min analysis during crisis windows
   ├── Causal discovery (PCMCI+, DYNOTEARS)
   ├── Geopolitical NLP layer (GDELT, news embeddings)
   └── Crypto deep-dive (BTC, ETH, SOL, DeFi indices)

7. PUBLICATION
   ├── Working paper with full methodology
   ├── Replication package (this repository)
   ├── Conference presentations (AFA, EFA, INFORMS)
   └── Journal submission (JFE, RFS, JPM)

License

MIT License - see LICENSE for details.

Citation

If you use this work in academic research, please cite:

@misc{tavares2026topology,
  title={Dynamic Multi-Asset Topology and Cluster Migration Under Geopolitical Stress},
  author={Tavares, Nicholas and Hanini, Amjad and Dasilva, Brandon},
  year={2026},
  note={Available at: https://github.com/studyalwaysbro/asset-cluster-migration}
}

Reminder: This project is for educational and research purposes only. It does not constitute investment advice and is not produced in any broker-dealer or investment advisory capacity. See the full disclaimer in the research report.
