feat: add James-Stein shrinkage expected returns estimator #746

Open: jrile018 wants to merge 2 commits into PyPortfolio:main from jrile018:osc/new-returns-estimator

jrile018 commented Jul 3, 2026

Summary

Adds a James–Stein shrinkage estimator for expected returns to pypfopt.expected_returns, and wires it into the return_model() dispatcher as method="james_stein_return".

Mean-variance optimizers are notoriously sensitive to errors in the expected-returns vector. James–Stein shrinkage pulls each asset's estimated return toward the cross-sectional (grand) mean, which can reduce the total estimation error of the mean vector, especially when the number of assets is large relative to the length of the price history.
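In symbols, the shrinkage described above is a convex combination of each asset's sample estimate and the grand mean. The notation here (δ for the shrinkage intensity, p for the number of assets, hats for sample estimates) is assumed for illustration and does not come from the PR:

```latex
\hat{\mu}_i^{\mathrm{JS}} = (1 - \delta)\,\hat{\mu}_i + \delta\,\bar{\mu},
\qquad
\bar{\mu} = \frac{1}{p} \sum_{j=1}^{p} \hat{\mu}_j,
\qquad
\delta \in [0, 1]
```

With δ = 0 every asset keeps its own sample estimate; with δ = 1 every asset receives the grand mean.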

What's changed

  • New function james_stein_return(prices, returns_data=False, shrinkage=None, compounding=True, frequency=252, log_returns=False) in pypfopt/expected_returns.py.
    • Shrinks the annualised expected-return vector toward its grand mean, so shrinkage=0 reduces exactly to mean_historical_return, and shrinkage=1 returns the grand mean for every asset.
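    • shrinkage=None (default) estimates the intensity from the data via a James–Stein / SURE rule. The estimate is constructed to be invariant to the frequency argument (exactly for the arithmetic mean, approximately for the compounded mean, as the follow-up commit notes).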
    • Follows the existing expected-returns API and house style (same handling of returns_data, compounding, frequency, log_returns, and the non-DataFrame RuntimeWarning).
  • Dispatcher: return_model() now accepts method="james_stein_return" (docstring updated).
  • Docs: autofunction entry added to docs/ExpectedReturns.rst; module docstring bullet added.
  • Tests: 10 new tests in tests/test_expected_returns.py.
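The two limiting cases listed above (shrinkage=0 and shrinkage=1) can be illustrated with a dependency-free sketch. The helper `shrink_toward_grand_mean`, the tickers, and the return figures are hypothetical; this is not the PR's `james_stein_return`, and the `ValueError` on out-of-range input is an assumption rather than confirmed behaviour.

```python
from statistics import fmean


def shrink_toward_grand_mean(mu, shrinkage):
    """Blend each asset's mean with the cross-sectional (grand) mean.

    Hypothetical helper for illustration only.
    `mu` maps asset name -> annualised expected return.
    """
    # Assumption: reject intensities outside [0, 1].
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must lie in [0, 1]")
    grand_mean = fmean(mu.values())
    return {
        asset: (1.0 - shrinkage) * m + shrinkage * grand_mean
        for asset, m in mu.items()
    }


mu = {"AAA": 0.12, "BBB": 0.04, "CCC": 0.08}  # made-up annualised returns
no_shrink = shrink_toward_grand_mean(mu, 0.0)    # unchanged sample means
full_shrink = shrink_toward_grand_mean(mu, 1.0)  # grand mean for every asset
half_shrink = shrink_toward_grand_mean(mu, 0.5)  # halfway between the two
```

Intermediate intensities move each estimate proportionally toward the grand mean while leaving the grand mean itself unchanged.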

Rationale

James–Stein shrinkage is a classical, theoretically grounded way to reduce estimation error in a mean vector (in the textbook setting of a multivariate normal mean with known common variance, it dominates the sample mean under quadratic loss for dimension ≥ 3). It slots naturally alongside the existing mean_historical_return / ema_historical_return / capm_return estimators and requires no new dependencies (numpy/pandas only).

Testing

All checks pass locally (Python 3.11):

  • New James–Stein tests: 10/10 pass (core behaviour, shrinkage bounds and limits, auto-shrinkage, compounding, frequency scaling, dispatcher integration, non-DataFrame warning, log returns, pre-computed returns).
  • Full tests/test_expected_returns.py: all pass.
  • Full test suite: 294 passed, 33 skipped (no regressions).
  • ruff format --check: clean. ruff check: clean.

    pytest tests/test_expected_returns.py -v
    ruff format --check pypfopt/expected_returns.py tests/test_expected_returns.py
    ruff check pypfopt/expected_returns.py tests/test_expected_returns.py

Checklist

  • Follows ruff formatting / lint (line-length 88)
  • Unit tests added (core + parameter variants + error/edge cases + dispatcher)
  • Docstring (PEP257 / Sphinx) + ReadTheDocs autofunction
  • No new dependencies
  • Backward compatible (additive: new dispatcher method only)

Reference

Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. on Math. Statist. and Prob., Vol. 1, 197–206.

Copilot AI review requested due to automatic review settings July 3, 2026 05:13

Copilot AI left a comment


Pull request overview

Adds a new expected-returns estimator based on James–Stein shrinkage and integrates it into the existing expected-returns API, documentation, and test suite.

Changes:

  • Implement expected_returns.james_stein_return() with optional data-driven shrinkage intensity.
  • Extend expected_returns.return_model() to dispatch method="james_stein_return".
  • Document the new estimator and add unit tests covering core behaviors and parameter variants.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pypfopt/expected_returns.py | Adds the James–Stein expected returns function and wires it into return_model(). |
| tests/test_expected_returns.py | Adds unit tests for James–Stein behavior, bounds, options, and dispatcher integration. |
| docs/ExpectedReturns.rst | Documents the new expected-returns estimator via Sphinx autofunction. |

Comment thread pypfopt/expected_returns.py Outdated
Comment on lines +308 to +320
    p = returns.shape[1]
    dispersion = float(((mu - grand_mean) ** 2).sum())
    if p <= 2 or dispersion <= 1e-12:
        # Degenerate case: no meaningful cross-sectional dispersion to shrink,
        # or too few assets for the James-Stein rule. Fall back to full
        # shrinkage when all means coincide, otherwise none.
        shrinkage = 1.0 if dispersion <= 1e-12 else 0.0
    else:
        # Variance of each annualised mean estimator (~ frequency**2 * var / n),
        # averaged across assets. The frequency**2 factor cancels against the
        # annualised dispersion, so the estimate is frequency-invariant.
        tau_squared = (frequency**2 * returns.var(ddof=1) / returns.count()).mean()
        shrinkage = float(np.clip((p - 2) * tau_squared / dispersion, 0.0, 1.0))

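The frequency-invariance claim in the snippet's comment can be checked on toy data. Below is a minimal stdlib-only sketch, assuming arithmetic (non-compounded) means. The function `auto_shrinkage` and the sample returns are hypothetical; it restates the quoted rule on plain lists rather than reproducing the PR's pandas code.

```python
from statistics import fmean, variance


def auto_shrinkage(returns_by_asset, frequency=252):
    """Restate the quoted rule on plain lists (arithmetic means only)."""
    # Annualised arithmetic mean per asset.
    mu = [frequency * fmean(r) for r in returns_by_asset]
    grand_mean = fmean(mu)
    p = len(mu)
    dispersion = sum((m - grand_mean) ** 2 for m in mu)
    if p <= 2 or dispersion <= 1e-12:
        return 1.0 if dispersion <= 1e-12 else 0.0
    # Sample variance (ddof=1) of each annualised mean estimator, averaged.
    tau_squared = fmean(
        [frequency**2 * variance(r) / len(r) for r in returns_by_asset]
    )
    return min(max((p - 2) * tau_squared / dispersion, 0.0), 1.0)


# Made-up periodic returns for four assets with clearly different means.
sample = [
    [0.011, 0.009, 0.012, 0.008, 0.010],
    [0.003, 0.001, 0.004, 0.000, 0.002],
    [-0.003, -0.005, -0.002, -0.006, -0.004],
    [0.007, 0.005, 0.008, 0.004, 0.006],
]
delta_252 = auto_shrinkage(sample, frequency=252)
delta_12 = auto_shrinkage(sample, frequency=12)
```

Both calls yield the same intensity up to floating-point rounding, because `frequency**2` scales the numerator and the denominator alike. The absolute `1e-12` dispersion threshold is the one place where the scale of `frequency` could still matter.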
Add james_stein_return() to pypfopt.expected_returns. It shrinks each
asset's annualised expected return towards the cross-sectional (grand)
mean, reducing estimation error in the mean vector that mean-variance
optimisers are highly sensitive to. shrinkage=0 recovers
mean_historical_return exactly; shrinkage=1 returns the grand mean for
every asset; shrinkage=None (default) estimates the intensity from the
data via a James-Stein/SURE rule.

- Wire into the return_model() dispatcher as method="james_stein_return"
- 10 unit tests covering the estimator, parameters, dispatcher and edges
- Sphinx autofunction entry in docs/ExpectedReturns.rst

No new dependencies (numpy/pandas only).
jrile018 force-pushed the osc/new-returns-estimator branch from c508b94 to 7f603ef on July 3, 2026 05:19
Only count assets with at least two observations and a finite mean when
computing the James-Stein dimensionality p and cross-sectional dispersion,
so all-NaN / single-observation columns no longer inflate p and bias the
auto-estimated shrinkage intensity. Clarify in-code that the frequency
invariance of the intensity is exact for the arithmetic mean and only
approximate for the compounding mean.
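The filtering described in this commit message can be sketched without pandas. The helper `valid_assets` and the data below are hypothetical. The sketch approximates the "at least two observations and a finite mean" rule by requiring two or more finite observations per asset, which may differ in detail from the PR's actual check.

```python
import math


def valid_assets(returns_by_asset):
    """Keep assets with at least two finite observations (hypothetical)."""
    kept = {}
    for asset, series in returns_by_asset.items():
        # Drop NaN/inf entries so the remaining mean is finite.
        observed = [r for r in series if math.isfinite(r)]
        if len(observed) >= 2:
            kept[asset] = observed
    return kept


nan = float("nan")
data = {
    "AAA": [0.01, -0.02, 0.03],
    "BBB": [nan, nan, nan],   # all-NaN column: excluded
    "CCC": [nan, 0.01, nan],  # single observation: excluded
    "DDD": [0.00, nan, 0.02],
}
kept = valid_assets(data)
p = len(kept)  # dimensionality used by the shrinkage rule
```

Here only AAA and DDD count toward `p`, so the two unusable columns no longer inflate the dimensionality that feeds the auto-estimated intensity.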