diff --git a/lectures/March_26_DMD_junk.md b/lectures/March_26_DMD_junk.md
deleted file mode 100644
index 7e2dcb890..000000000
--- a/lectures/March_26_DMD_junk.md
+++ /dev/null
@@ -1,535 +0,0 @@
-
-## Old Stuff -- Pre March 26
-
-We turn to the case in which $m \gg n$: an $m \times n$ data matrix $\tilde X$ contains many more random variables $m$ than observations $n$.
-
-This **tall and skinny** case is associated with **Dynamic Mode Decomposition**.
-
-You can read about Dynamic Mode Decomposition here {cite}`DMD_book` and here {cite}`DDSE_book` (section 7.2).
-
-We start with an $m \times n $ matrix of data $\tilde X$ of the form
-
-
-$$
- \tilde X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_n\end{bmatrix}
-$$
-
-where for $t = 1, \ldots, n$, the $m \times 1 $ vector $X_t$ is
-
-$$ X_t = \begin{bmatrix} X_{1,t} & X_{2,t} & \cdots & X_{m,t} \end{bmatrix}^T $$
-
-where $T$ again denotes complex transposition and $X_{i,t}$ is an observation on variable $i$ at time $t$.
-
-From $\tilde X$, form two matrices
-
-$$
- X = \begin{bmatrix} X_1 \mid X_2 \mid \cdots \mid X_{n-1}\end{bmatrix}
-$$
-
-and
-
-$$
-X' = \begin{bmatrix} X_2 \mid X_3 \mid \cdots \mid X_n\end{bmatrix}
-$$
-
-Here $'$ does not denote matrix transposition but instead is part of the name of the matrix $X'$.
-
-In forming $ X$ and $X'$, we have in each case dropped a column from $\tilde X$, the last column in the case of $X$, and the first column in the case of $X'$.
-
-Evidently, $ X$ and $ X'$ are both $m \times \tilde n$ matrices where $\tilde n = n - 1$.
-
-We denote the rank of $X$ as $p \leq \min(m, \tilde n) = \tilde n$.
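As a minimal numerical sketch (assuming NumPy; the data matrix and variable names like `X_prime` are illustrative, not from the lecture), the two shifted matrices are formed by dropping one column each:

```python
import numpy as np

# Synthetic tall-and-skinny data: m = 20 variables, n = 6 observations.
rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))

X = X_tilde[:, :-1]        # drop the last column
X_prime = X_tilde[:, 1:]   # X': drop the first column

p = np.linalg.matrix_rank(X)   # here p = n - 1 = 5 almost surely
```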
-
-We start with a system consisting of $m$ least squares regressions of **everything** on one lagged value of **everything**:
-
-$$
- X' = A X + \epsilon
-$$
-
-where $\epsilon$ is an $m \times \tilde n$ matrix of least squares residuals satisfying
-
-$$
-\epsilon X^+ = 0
-$$
-
-and
-
-$$
-A = X' X^{+} .
-$$ (eq:Afullformula)
-
-Here the (possibly huge) $\tilde n \times m $ matrix $X^{+}$ is the Moore-Penrose generalized inverse of $X$.
-
-The $i$th row of $A$ is a $1 \times m$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t}$, $j = 1, \ldots, m$.
-
-
-Consider the (reduced) singular value decomposition
-
- $$
- X = U \Sigma V^T
- $$ (eq:SVDforDMD)
-
-
-
-where $U$ is $m \times p$, $\Sigma$ is a $p \times p$ diagonal matrix, and $ V^T$ is a $p \times \tilde n$ matrix.
-
-Here $p$ is the rank of $X$, where necessarily $p \leq \tilde n$.
-
-(We described and illustrated a **reduced** singular value decomposition above, and compared it with a **full** singular value decomposition.)
-
-We could construct the generalized inverse $X^+$ of $X$ by using
-a singular value decomposition $X = U \Sigma V^T$ to compute
-
-$$
-X^{+} = V \Sigma^{-1} U^T
-$$ (eq:Xpinverse)
-
-where the matrix $\Sigma^{-1}$ is constructed by replacing each non-zero element of $ \Sigma$ with $\sigma_j^{-1}$.
-
-We could use formula {eq}`eq:Xpinverse` together with formula {eq}`eq:Afullformula` to compute the matrix $A$ of regression coefficients.
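A sketch of that computation (assuming NumPy and the synthetic data above; this simply transcribes the two formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD: U is m x p, sigma holds the p singular values, VT is p x n_tilde.
U, sigma, VT = np.linalg.svd(X, full_matrices=False)

# X^+ = V Sigma^{-1} U^T  (eq:Xpinverse)
X_pinv = VT.T @ np.diag(1 / sigma) @ U.T

# A = X' X^+  (eq:Afullformula)
A = X_prime @ X_pinv
```

The hand-built pseudo-inverse agrees with NumPy's `np.linalg.pinv(X)`.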
-
-Instead of doing that, we'll eventually use **dynamic mode decomposition** to compute a rank $r$ approximation to $A$,
-where $r < p$.
-
-
-The idea behind **dynamic mode decomposition** is to construct a low-rank approximation to $A$ that
-
-
-* constructs an $m \times r$ matrix $\Phi$ that captures effects on all $m$ variables of $r \leq p$ **modes** that are associated with the $r$ largest eigenvalues of $A$
-
-
-* uses $\Phi$, the current value of $X_t$, and powers of the $r$ largest eigenvalues of $A$ to forecast *future* $X_{t+j}$'s
-
-
-
-
-## Analysis
-
-We'll put basic ideas on the table by starting with the special case in which $r = p$ so that we retain
-all $p$ singular values of $X$.
-
-(Later, we'll retain only $r < p$ of them.)
-
-When $r = p$, formula
-{eq}`eq:Xpinverse` for $X^+$ implies that
-
-
-$$
-A = X' V \Sigma^{-1} U^T
-$$ (eq:Aformbig)
-
-where $V$ is an $\tilde n \times p$ matrix, $\Sigma^{-1}$ is a $p \times p$ matrix, $U^T$ is a $p \times m$ matrix,
-and $U^T U = I_p$ and $V^T V = I_p$.
-
-
-It is convenient to represent $A$ as computed in equation {eq}`eq:Aformbig` as
-
-$$
-A = U \tilde A U^T
-$$ (eq:Afactortilde)
-
-where the $p \times p$ transition matrix $\tilde A$ can be recovered from
-
-$$
- \tilde A = U^T A U = U^T X' V \Sigma^{-1} .
-$$ (eq:Atilde0)
-
-We use the $p$ columns of $U$, and thus the $p$ rows of $U^T$, to define a $p \times 1$ vector $\tilde X_t$ as follows
-
-
-$$
-\tilde X_t = U^T X_t .
-$$ (eq:tildeXdef2)
-
-Although $U U^T$ is not in general an $m \times m$ identity matrix, each column $X_t$ of $X$ lies in the column space of $U$, so that $U U^T X_t = X_t$; it then follows from equation {eq}`eq:tildeXdef2` that we can reconstruct $X_t$ from $\tilde X_t$ by using
-
-$$
-X_t = U \tilde X_t .
-$$ (eq:Xdecoder)
-
-
- * Equation {eq}`eq:tildeXdef2` serves as an **encoder** that summarizes the $m \times 1$ vector $X_t$ by a $p \times 1$ vector $\tilde X_t$
-
- * Equation {eq}`eq:Xdecoder` serves as a **decoder** that recovers the $m \times 1$ vector $X_t$ from the $p \times 1$ vector $\tilde X_t$
-
-
-
-Because $U^T U = I_p$, we have
-
-$$
-\tilde X_{t+1} = \tilde A \tilde X_t
-$$ (eq:xtildemotion)
-
-Notice that if we multiply both sides of {eq}`eq:xtildemotion` by $U$
-we get
-
-$$
-U \tilde X_{t+1} = U \tilde A \tilde X_t = U \tilde A U^T X_t
-$$
-
-which by virtue of decoder equation {eq}`eq:Xdecoder` recovers
-
-$$
-X_{t+1} = A X_t .
-$$
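The encoder, decoder, and reduced dynamics can be verified numerically (a sketch assuming NumPy and synthetic data; because $X$ here has full column rank $\tilde n$, in-sample residuals are zero and these identities hold exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, VT = np.linalg.svd(X, full_matrices=False)
A = X_prime @ np.linalg.pinv(X)

# tilde A = U^T X' V Sigma^{-1}  (eq:Atilde0)
A_tilde = U.T @ X_prime @ VT.T @ np.diag(1 / sigma)

X_enc = U.T @ X        # encoder (eq:tildeXdef2), column by column
X_rebuilt = U @ X_enc  # decoder (eq:Xdecoder)
```

The checks below confirm $\tilde A = U^T A U$, that the decoder reconstructs $X$, and that $\tilde X_{t+1} = \tilde A \tilde X_t$ holds in sample.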
-
-
-
-
-
-
-It is useful to construct an eigendecomposition of the $p \times p$ transition matrix $\tilde A$ defined
-in equation {eq}`eq:Atilde0` above:
-
-$$
- \tilde A W = W \Lambda
-$$ (eq:tildeAeigen)
-
-where $\Lambda$ is a $p \times p$ diagonal matrix of eigenvalues and the columns of $W$ are corresponding eigenvectors
-of $\tilde A$.
-
-Both $\Lambda$ and $W$ are $p \times p$ matrices.
-
-Construct the $m \times p$ matrix
-
-$$
- \Phi = X' V \Sigma^{-1} W
-$$ (eq:Phiformula)
-
-
-
-Tu et al. {cite}`tu_Rowley` established the following
-
-**Proposition** The $p$ columns of $\Phi$ are eigenvectors of $A$ that correspond to the $p$ eigenvalues of $\tilde A$ appearing in $\Lambda$.
-
-**Proof:** From formula {eq}`eq:Phiformula` we have
-
-$$
-\begin{aligned}
- A \Phi & = (X' V \Sigma^{-1} U^T) (X' V \Sigma^{-1} W) \cr
- & = X' V \Sigma^{-1} \tilde A W \cr
- & = X' V \Sigma^{-1} W \Lambda \cr
- & = \Phi \Lambda
- \end{aligned}
-$$
-
-Thus, we have deduced that
-
-$$
-A \Phi = \Phi \Lambda
-$$ (eq:APhiLambda)
-
-Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ the corresponding $i$th eigenvalue of $\tilde A$ from decomposition {eq}`eq:tildeAeigen`.
-
-Writing out the $m \times p$ matrices on both sides of equation {eq}`eq:APhiLambda` and equating them column by column gives
-
-
-$$
-A \phi_i = \lambda_i \phi_i .
-$$
-
-Thus, $\phi_i$ is an eigenvector of $A$ that corresponds to eigenvalue $\lambda_i$ of $A$.
-
-This concludes the proof.
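The proposition can be checked numerically (a sketch assuming NumPy and the synthetic data used above):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, VT = np.linalg.svd(X, full_matrices=False)
A = X_prime @ np.linalg.pinv(X)
A_tilde = U.T @ X_prime @ VT.T @ np.diag(1 / sigma)

# Eigendecomposition tilde A W = W Lambda  (eq:tildeAeigen)
eigvals, W = np.linalg.eig(A_tilde)

# Phi = X' V Sigma^{-1} W  (eq:Phiformula)
Phi = X_prime @ VT.T @ np.diag(1 / sigma) @ W
```

Each column of `Phi` satisfies $A \phi_i = \lambda_i \phi_i$, i.e. $A \Phi = \Phi \Lambda$.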
-
-
-Also see {cite}`DDSE_book` (p. 238)
-
-
-### Two Representations of $A$
-
-We have constructed two representations of (or approximations to) $A$.
-
-One from equation {eq}`eq:Afactortilde` is
-
-$$
-A = U \tilde A U^T
-$$ (eq:Aform11)
-
-while the other, from the eigendecomposition {eq}`eq:APhiLambda`, is
-
-$$
-A = \Phi \Lambda \Phi^+
-$$ (eq:Aform12)
-
-
-From formula {eq}`eq:Aform11` we can deduce
-
-$$
-\tilde X_{t+1} = \tilde A \tilde X_t
-$$
-
-where
-
-$$
-\begin{aligned}
-\tilde X_t & = U^T X_t \cr
-X_t & = U \tilde X_t
-\end{aligned}
-$$
-
-
-From formula {eq}`eq:Aform12` we can deduce
-
-$$
-b_{t+1} = \Lambda b_t
-$$
-
-where
-
-$$
-\begin{aligned}
-b_t & = \Phi^+ X_t \cr
-X_t & = \Phi b_t
-\end{aligned}
-$$
-
-
-There is a better formula for the $p \times 1$ vector $b_t$.
-
-In particular, the following argument from {cite}`DDSE_book` (page 240) provides a computationally efficient way
-to compute $b_t$.
-
-For convenience, we'll do this first for time $t=1$.
-
-
-
-For $t=1$, we have
-
-$$
- X_1 = \Phi b_1
-$$ (eq:X1proj)
-
-where $b_1$ is a $p \times 1$ vector.
-
-Since $X_1 = U \tilde X_1$, it follows that
-
-$$
- U \tilde X_1 = X' V \Sigma^{-1} W b_1
-$$
-
-and
-
-$$
- \tilde X_1 = U^T X' V \Sigma^{-1} W b_1
-$$
-
-Recall that $ \tilde A = U^T X' V \Sigma^{-1}$ so that
-
-$$
- \tilde X_1 = \tilde A W b_1
-$$
-
-and therefore, by the eigendecomposition {eq}`eq:tildeAeigen` of $\tilde A$, we have
-
-$$
- \tilde X_1 = W \Lambda b_1
-$$
-
-Therefore,
-
-$$
- b_1 = ( W \Lambda)^{-1} \tilde X_1
-$$
-
-or
-
-
-$$
- b_1 = ( W \Lambda)^{-1} U^T X_1
-$$ (eq:beqnsmall)
-
-
-
-which is computationally more efficient than the following instance of our earlier equation for computing the initial vector $b_1$:
-
-$$
- b_1= \Phi^{+} X_1
-$$ (eq:bphieqn)
-
-
-Conditional on $X_t$, we can construct forecasts $\check X_{t+j} $ of $X_{t+j}, j = 1, 2, \ldots, $ from
-either
-
-$$
-\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t
-$$ (eq:checkXevoln)
-
-
-or the following equation
-
-$$
- \check X_{t+j} = \Phi \Lambda^j (W \Lambda)^{-1} U^T X_t
-$$ (eq:checkXevoln2)
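Formula {eq}`eq:checkXevoln2` can be checked against direct iteration of $A$: algebraically, $\Phi \Lambda^j (W\Lambda)^{-1} U^T = A^j$ when all $p$ modes are retained. A sketch (assuming NumPy and synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, VT = np.linalg.svd(X, full_matrices=False)
A = X_prime @ np.linalg.pinv(X)
A_tilde = U.T @ X_prime @ VT.T @ np.diag(1 / sigma)
eigvals, W = np.linalg.eig(A_tilde)
Phi = X_prime @ VT.T @ np.diag(1 / sigma) @ W

j, X_t = 3, X[:, 0]
# Phi Lambda^j (W Lambda)^{-1} U^T X_t  (eq:checkXevoln2)
forecast = (Phi @ np.diag(eigvals**j)
            @ np.linalg.inv(W @ np.diag(eigvals)) @ U.T @ X_t)
```

The forecast agrees with $A^j X_t$, since it avoids forming the $m \times m$ matrix $A$ while encoding the same dynamics.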
-
-
-
-### Using Fewer Modes
-
-The preceding formulas assume that we have retained all $p$ modes associated with the positive
-singular values of $X$.
-
-We can easily adapt all of the formulas to describe a situation in which we instead retain only
-the $r < p$ largest singular values.
-
-In that case, we simply replace $\Sigma$ with the appropriate $r \times r$ matrix of singular values,
-$U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values,
-and $V$ with the $\tilde n \times r$ matrix whose columns correspond to the $r$ largest singular values.
-
-Counterparts of all of the salient formulas above then apply.
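The rank-$r$ truncation amounts to slicing the SVD factors (a sketch assuming NumPy; `r = 3` is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

U, sigma, VT = np.linalg.svd(X, full_matrices=False)

r = 3  # retain only the r largest singular values
U_r, sigma_r, V_r = U[:, :r], sigma[:r], VT[:r, :].T

A_tilde_r = U_r.T @ X_prime @ V_r @ np.diag(1 / sigma_r)
eigvals_r, W_r = np.linalg.eig(A_tilde_r)
Phi_r = X_prime @ V_r @ np.diag(1 / sigma_r) @ W_r   # m x r matrix of modes
```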
-
-
-
-
-
-## Reduced-order VAR
-
-DMD is a natural tool for estimating a **reduced order vector autoregression**,
-an object that we define in terms of the population regression equation
-
-$$
-X_{t+1} = \check A X_t + C \epsilon_{t+1}
-$$ (eq:VARred)
-
-where
-
-* $X_t$ is an $m \times 1$ vector
-* $\check A$ is an $m \times m$ matrix of rank $r$ whose eigenvalues are all less than $1$ in modulus
-* $\epsilon_{t+1} \sim {\mathcal N}(0, I)$ is an $m \times 1$ vector of i.i.d. shocks
-* $E \epsilon_{t+1} X_t^T = 0$, so that all shocks are orthogonal to all regressors
-
-To link this model to a dynamic mode decomposition (DMD), again take
-
-$$
-X = [ X_1 \mid X_2 \mid \cdots \mid X_{n-1} ]
-$$
-
-$$
-X' = [ X_2 \mid X_3 \mid \cdots \mid X_n ]
-$$
-
-so that according to model {eq}`eq:VARred`
-
-
-$$
-X' = \begin{bmatrix} \check A X_1 + C \epsilon_2 \mid \check A X_2 + C \epsilon_3 \mid \cdots \mid \check A X_{n-1} + C
-\epsilon_n \end{bmatrix}
-$$
-
-To illustrate some useful calculations, assume that $n = 3$ and form
-
-$$
-X' X^T = \begin{bmatrix} \check A X_1 + C \epsilon_2 & \check A X_2 + C \epsilon_3 \end{bmatrix}
- \begin{bmatrix} X_1^T \cr X_2^T \end{bmatrix}
-$$
-
-or
-
-$$
-X' X^T = \check A ( X_1 X_1^T + X_2 X_2^T) + C( \epsilon_2 X_1^T + \epsilon_3 X_2^T)
-$$
-
-but because
-
-$$
-E ( \epsilon_2 X_1^T + \epsilon_3 X_2^T) = 0
-$$
-
-we have
-
-$$
-X' X^T = \check A ( X_1 X_1^T + X_2 X_2^T)
-$$
-
-Evidently,
-
-$$
-X X^T = ( X_1 X_1^T + X_2 X_2^T)
-$$
-
-so that our matrix $\check A$ of least squares regression coefficients is
-
-$$
-\check A = (X' X^T) (X X^T)^+
-$$
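This coincides with the earlier formula $A = X' X^+$, by the Moore-Penrose identity $X^+ = X^T (X X^T)^+$. A numerical sketch (assuming NumPy and synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tilde = rng.standard_normal((20, 6))
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# (X' X^T)(X X^T)^+ coincides with X' X^+ since X^+ = X^T (X X^T)^+.
A_check = (X_prime @ X.T) @ np.linalg.pinv(X @ X.T)
A_direct = X_prime @ np.linalg.pinv(X)
```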
-
-Our **assumption** that $\check A$ is a matrix of rank $r$ leads us to represent it as
-
-$$
-\check A = \Phi \Lambda \Phi^{+}
-$$
-
-where $\Phi$ and $\Lambda$ are computed with the DMD algorithm described above.
-
-Associated with the VAR representation {eq}`eq:VARred`
-is the usual moving average representation
-
-$$
-X_{t+j} = \check A^j X_t + C \epsilon_{t+j} + \check A C \epsilon_{t+j-1} + \cdots + \check A^{j-1} C \epsilon_{t+1}
-$$
-
-After computing $\check A$, we can construct sample versions
-of
-
-$$
-C \epsilon_{t+1} = X_{t+1} - \check A X_t , \quad t =1, \ldots, n-1
-$$
-
-and check whether they are serially uncorrelated as assumed in {eq}`eq:VARred`.
-
-For example, we can compute spectra and cross-spectra of components of $C \epsilon_{t+1}$
-and check for serial-uncorrelatedness in the usual ways.
-
-We can also estimate the covariance matrix of $C \epsilon_{t+1}$
-from
-
-$$
-\frac{1}{n-1} \sum_{t=1}^{n-1} (C \epsilon_{t+1} )( C \epsilon_{t+1})^T
-$$
-
-It can be enlightening to diagonalize our reduced-order VAR {eq}`eq:VARred` by noting that it can
-be written
-
-
-$$
-X_{t+1} = \Phi \Lambda \Phi^{+} X_t + C \epsilon_{t+1}
-$$
-
-
-and then writing it as
-
-$$
-\Phi^+ X_{t+1} = \Lambda \Phi^{+} X_t + \Phi^+ C \epsilon_{t+1}
-$$
-
-or
-
-$$
-\bar X_{t+1} = \Lambda \bar X_t + \bar \epsilon_{t+1}
-$$ (eq:VARmodes)
-
-where $\bar X_t$ is an $r \times 1$ vector of **modes** and $\bar \epsilon_{t+1}$ is an $r \times 1$ vector of shocks.
-
-The $r$ modes $\bar X_t$ obey the first-order VAR {eq}`eq:VARmodes` in which $\Lambda$ is an $r \times r$ diagonal matrix.
-
-Note that while $\Lambda$ is diagonal, the contemporaneous covariance matrix of $\bar \epsilon_{t+1}$ need not be.
-
-
-**Remark:** It is permissible for $X_t$ to contain lagged values of observables.
-
- For example, we might have a setting in which
-
-$$
-X_t = \begin{bmatrix}
-y_{1t} \cr
-y_{1,t-1} \cr
-\vdots \cr
-y_{1, t-k}\cr
-y_{2,t} \cr
-y_{2, t-1} \cr
-\vdots
-\end{bmatrix}
-$$
-
-+++
diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib
index 0ba2eddd3..82f5cc7ec 100644
--- a/lectures/_static/quant-econ.bib
+++ b/lectures/_static/quant-econ.bib
@@ -1,7 +1,584 @@
-###
-QuantEcon Bibliography File used in conjunction with sphinxcontrib-bibtex package
-Note: Extended Information (like abstracts, doi, url's etc.) can be found in quant-econ-extendedinfo.bib file in _static/
-###
+@article{Borovicka2020,
+ author = {Borovička, Jaroslav},
+ title = {Survival and Long-Run Dynamics with Heterogeneous Beliefs under Recursive Preferences},
+ journal = {Journal of Political Economy},
+ volume = {128},
+ number = {1},
+ pages = {206--251},
+ year = {2020},
+ publisher = {University of Chicago Press}
+}
+
+@article{Sandroni2000Markets,
+ author = {Sandroni, Alvaro},
+ title = {Do Markets Favor Agents Able to Make Accurate Predictions?},
+ journal = {Econometrica},
+ volume = {68},
+ number = {6},
+ pages = {1303--1341},
+ year = {2000}
+}
+
+@article{Blume_Easley2006,
+ author = {Blume, Lawrence and Easley, David},
+ title = {If You're So Smart, Why Aren't You Rich? {B}elief Selection in Complete and Incomplete Markets},
+ journal = {Econometrica},
+ volume = {74},
+ number = {4},
+ pages = {929--966},
+ year = {2006}
+}
+
+@article{Epstein_Zin1989,
+ author = {Epstein, Larry G. and Zin, Stanley E.},
+ title = {Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework},
+ journal = {Econometrica},
+ volume = {57},
+ number = {4},
+ pages = {937--969},
+ year = {1989}
+}
+
+@article{Epstein_Zin1991,
+ author = {Epstein, Larry G. and Zin, Stanley E.},
+ title = {Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: An Empirical Analysis},
+ journal = {Journal of Political Economy},
+ volume = {99},
+ number = {2},
+ pages = {263--286},
+ year = {1991}
+}
+
+@article{Duffie_Epstein1992a,
+ author = {Duffie, Darrell and Epstein, Larry G.},
+ title = {Stochastic Differential Utility},
+ journal = {Econometrica},
+ volume = {60},
+ number = {2},
+ pages = {353--394},
+ year = {1992}
+}
+
+@article{Dumas_Uppal_Wang2000,
+ author = {Dumas, Bernard and Uppal, Raman and Wang, Tan},
+ title = {Efficient Intertemporal Allocations with Recursive Utility},
+ journal = {Journal of Economic Theory},
+ volume = {93},
+ number = {2},
+ pages = {154--183},
+ year = {2000}
+}
+
+@article{DeLong_etal1991,
+ author = {De Long, J. Bradford and Shleifer, Andrei and Summers, Lawrence H. and Waldmann, Robert J.},
+ title = {The Survival of Noise Traders in Financial Markets},
+ journal = {Journal of Business},
+ volume = {64},
+ number = {1},
+ pages = {1--19},
+ year = {1991}
+}
+
+@article{Blume_Easley1992,
+ author = {Blume, Lawrence and Easley, David},
+ title = {Evolution and Market Behavior},
+ journal = {Journal of Economic Theory},
+ volume = {58},
+ number = {1},
+ pages = {9--40},
+ year = {1992}
+}
+
+@article{Yan2008,
+ author = {Yan, Hongjun},
+ title = {Natural Selection in Financial Markets: Does It Work?},
+ journal = {Management Science},
+ volume = {54},
+ number = {11},
+ pages = {1935--1950},
+ year = {2008}
+}
+
+@article{Kogan_etal2006,
+ author = {Kogan, Leonid and Ross, Stephen A. and Wang, Jiang and Westerfield, Mark M.},
+ title = {The Price Impact and Survival of Irrational Traders},
+ journal = {Journal of Finance},
+ volume = {61},
+ number = {1},
+ pages = {195--229},
+ year = {2006}
+}
+
+@article{Kogan_etal2017,
+ author = {Kogan, Leonid and Ross, Stephen A. and Wang, Jiang and Westerfield, Mark M.},
+ title = {Market Selection},
+ journal = {Journal of Economic Theory},
+ volume = {168},
+ pages = {209--236},
+ year = {2017}
+}
+
+@article{Kreps_Porteus1978,
+ author = {Kreps, David M. and Porteus, Evan L.},
+ title = {Temporal Resolution of Uncertainty and Dynamic Choice Theory},
+ journal = {Econometrica},
+ volume = {46},
+ number = {1},
+ pages = {185--200},
+ year = {1978}
+}
+
+@article{Lucas_Stokey1984,
+ author = {Lucas, Robert E. and Stokey, Nancy L.},
+ title = {Optimal Growth with Many Consumers},
+ journal = {Journal of Economic Theory},
+ volume = {32},
+ number = {1},
+ pages = {139--171},
+ year = {1984}
+}
+
+@book{Karlin_Taylor1981,
+ author = {Karlin, Samuel and Taylor, Howard M.},
+ title = {A Second Course in Stochastic Processes},
+ publisher = {Academic Press},
+ year = {1981}
+}
+
+@article{Brunnermeier_etal2014,
+ author = {Brunnermeier, Markus K. and Simsek, Alp and Xiong, Wei},
+ title = {A Welfare Criterion for Models with Distorted Beliefs},
+ journal = {Quarterly Journal of Economics},
+ volume = {129},
+ number = {4},
+ pages = {1753--1797},
+ year = {2014}
+}
+
+@article{Feller1952,
+ author = {Feller, William},
+ title = {The Parabolic Differential Equations and the Associated Semi-Groups of Transformations},
+ journal = {Annals of Mathematics},
+ volume = {55},
+ number = {3},
+ pages = {468--519},
+ year = {1952}
+}
+
+@article{Geoffard1996,
+ author = {Geoffard, Pierre-Yves},
+ title = {Discounting and Optimizing: Capital Accumulation Problems as Variational Minmax Problems},
+ journal = {Journal of Economic Theory},
+ volume = {69},
+ number = {1},
+ pages = {53--70},
+ year = {1996}
+}
+
+@article{Garleanu_Panageas2015,
+ author = {Gârleanu, Nicolae and Panageas, Stavros},
+ title = {Young, Old, Conservative, and Bold: The Implications of Heterogeneity and Finite Lives for Asset Pricing},
+ journal = {Journal of Political Economy},
+ volume = {123},
+ number = {3},
+ pages = {670--685},
+ year = {2015}
+}
+
+@article{Negishi1960,
+ author = {Negishi, Takashi},
+ title = {Welfare Economics and Existence of an Equilibrium for a Competitive Economy},
+ journal = {Metroeconomica},
+ volume = {12},
+ number = {2--3},
+ pages = {92--97},
+ year = {1960}
+}
+
+@article{MillerSanchirico1999,
+ author = {Miller, Ronald I. and Sanchirico, Chris William},
+ title = {The Role of Absolute Continuity in
+ ``{Merging} of {Opinions}'' and
+ ``{Rational} {Learning}''},
+ journal = {Games and Economic Behavior},
+ year = {1999},
+ volume = {29},
+ number = {1--2},
+ pages = {170--190},
+ doi = {10.1006/game.1999.0752}
+}
+
+@article{JacksonKalaiSmorodinsky1999,
+ author = {Jackson, Matthew O. and Kalai, Ehud and
+ Smorodinsky, Rann},
+ title = {Bayesian Representation of Stochastic Processes
+ under Learning: {de Finetti} Revisited},
+ journal = {Econometrica},
+ year = {1999},
+ volume = {67},
+ number = {4},
+ pages = {875--893},
+ doi = {10.1111/1468-0262.00055}
+}
+
+@article{KalaiLehrer1993Nash,
+ author = {Kalai, Ehud and Lehrer, Ehud},
+ title = {Rational Learning Leads to {Nash} Equilibrium},
+ journal = {Econometrica},
+ year = {1993},
+ volume = {61},
+ number = {5},
+ pages = {1019--1045},
+ doi = {10.2307/2951492}
+}
+
+@article{KalaiLehrer1993Subjective,
+ author = {Kalai, Ehud and Lehrer, Ehud},
+ title = {Subjective Equilibrium in Repeated Games},
+ journal = {Econometrica},
+ year = {1993},
+ volume = {61},
+ number = {5},
+ pages = {1231--1240},
+ doi = {10.2307/2951500}
+}
+
+@article{KalaiLehrer1994Merging,
+ author = {Kalai, Ehud and Lehrer, Ehud},
+ title = {Weak and Strong Merging of Opinions},
+ journal = {Journal of Mathematical Economics},
+ year = {1994},
+ volume = {23},
+ number = {1},
+ pages = {73--86},
+ doi = {10.1016/0304-4068(94)90037-X}
+}
+
+@article{KalaiLehrerSmorodinsky1999,
+ author = {Kalai, Ehud and Lehrer, Ehud and Smorodinsky, Rann},
+ title = {Calibrated Forecasting and Merging},
+ journal = {Games and Economic Behavior},
+ year = {1999},
+ volume = {29},
+ number = {1--2},
+ pages = {151--169},
+ doi = {10.1006/game.1998.0608}
+}
+
+@article{Sandroni1998Nash,
+ author = {Sandroni, Alvaro},
+ title = {Necessary and Sufficient Conditions for
+ Convergence to {Nash} Equilibrium:
+ The Almost Absolute Continuity Hypothesis},
+ journal = {Games and Economic Behavior},
+ year = {1998},
+ volume = {22},
+ number = {1},
+ pages = {121--147},
+ doi = {10.1006/game.1997.0572}
+}
+
+@article{PomattoAlNajjarSandroni2014,
+ author = {Pomatto, Luciano and Al-Najjar, Nabil I. and
+ Sandroni, Alvaro},
+ title = {Merging and Testing Opinions},
+ journal = {The Annals of Statistics},
+ year = {2014},
+ volume = {42},
+ number = {3},
+ pages = {1003--1028},
+ doi = {10.1214/14-AOS1212}
+}
+
+@article{LehrerSmorodinsky1996Compatible,
+ author = {Lehrer, Ehud and Smorodinsky, Rann},
+ title = {Compatible Measures and Merging},
+ journal = {Mathematics of Operations Research},
+ year = {1996},
+ volume = {21},
+ number = {3},
+ pages = {697--706},
+ doi = {10.1287/moor.21.3.697}
+}
+
+@incollection{LehrerSmorodinsky1996Learning,
+ author = {Lehrer, Ehud and Smorodinsky, Rann},
+ title = {Merging and Learning},
+ booktitle = {Statistics, Probability and Game Theory:
+ Papers in Honor of {David Blackwell}},
+ editor = {Ferguson, Thomas S. and Shapley, Lloyd S. and
+ MacQueen, James B.},
+ series = {{IMS} Lecture Notes -- Monograph Series},
+ volume = {30},
+ pages = {147--168},
+ publisher = {Institute of Mathematical Statistics},
+ address = {Hayward, CA},
+ year = {1996},
+ doi = {10.1214/lnms/1215453571}
+}
+
+@article{Nyarko1994,
+ author = {Nyarko, Yaw},
+ title = {Bayesian Learning Leads to Correlated Equilibria
+ in Normal Form Games},
+ journal = {Economic Theory},
+ year = {1994},
+ volume = {4},
+ number = {6},
+ pages = {821--841},
+ doi = {10.1007/BF01213814}
+}
+
+@article{JacksonKalai1999,
+ author = {Jackson, Matthew O. and Kalai, Ehud},
+ title = {Reputation versus Social Learning},
+ journal = {Journal of Economic Theory},
+ year = {1999},
+ volume = {88},
+ number = {1},
+ pages = {40--59},
+ doi = {10.1006/jeth.1999.2538}
+}
+
+@article{AcemogluChernozhukovYildiz2016,
+ author = {Acemoglu, Daron and Chernozhukov, Victor and
+ Yildiz, Muhamet},
+ title = {Fragility of Asymptotic Agreement under
+ {Bayesian} Learning},
+ journal = {Theoretical Economics},
+ year = {2016},
+ volume = {11},
+ number = {1},
+ pages = {187--225},
+ doi = {10.3982/TE436}
+}
+
+@article{DiaconisFreedman1986,
+ author = {Diaconis, Persi and Freedman, David},
+ title = {On the Consistency of {Bayes} Estimates},
+ journal = {The Annals of Statistics},
+ year = {1986},
+ volume = {14},
+ number = {1},
+ pages = {1--26},
+ doi = {10.1214/aos/1176349830}
+}
+
+@article{lucas1967adjustment,
+ title={Adjustment costs and the theory of supply},
+ author={Lucas Jr, Robert E},
+ journal={Journal of political economy},
+ volume={75},
+ number={4, Part 1},
+ pages={321--334},
+ year={1967},
+ publisher={The University of Chicago Press}
+}
+
+@article{Prescott_Visscher_1980,
+ author = {Prescott, Edward C. and Visscher, Michael},
+ title = {Organization Capital},
+ journal = {Journal of Political Economy},
+ volume = {88},
+ number = {3},
+ pages = {446--461},
+ year = {1980},
+ publisher = {University of Chicago Press}
+}
+
+@article{Coase_1937,
+ author = {Coase, Ronald H.},
+ title = {The Nature of the Firm},
+ journal = {Economica},
+ volume = {4},
+ number = {16},
+ pages = {386--405},
+ year = {1937}
+}
+
+@book{Williamson_1975,
+ author = {Williamson, Oliver E.},
+ title = {Markets and Hierarchies: Analysis and Antitrust Implications},
+ publisher = {Free Press},
+ address = {New York},
+ year = {1975}
+}
+
+@article{Lucas_Prescott_1971,
+ author = {Lucas, Robert E., Jr. and Prescott, Edward C.},
+ title = {Investment under Uncertainty},
+ journal = {Econometrica},
+ volume = {39},
+ number = {5},
+ pages = {659--681},
+ year = {1971}
+}
+
+@article{Stigler_1958,
+ author = {Stigler, George J.},
+ title = {The Economies of Scale},
+ journal = {Journal of Law and Economics},
+ volume = {1},
+ pages = {54--71},
+ year = {1958}
+}
+
+@book{Becker_1975,
+ author = {Becker, Gary S.},
+ title = {Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education},
+ edition = {2nd},
+ publisher = {National Bureau of Economic Research},
+ address = {New York},
+ year = {1975}
+}
+
+@article{Mansfield_1962,
+ author = {Mansfield, Edwin},
+ title = {Entry, {G}ibrat's Law, Innovation, and the Growth of Firms},
+ journal = {American Economic Review},
+ volume = {52},
+ number = {5},
+ pages = {1023--1051},
+ year = {1962}
+}
+
+@article{Hymer_Pashigian_1962,
+ author = {Hymer, Stephen and Pashigian, Peter},
+ title = {Firm Size and Rate of Growth},
+ journal = {Journal of Political Economy},
+ volume = {70},
+ number = {6},
+ pages = {556--569},
+ year = {1962}
+}
+
+@article{blackwell1962,
+ author = {Blackwell, David and Dubins, Lester E.},
+ title = {Merging of Opinions with Increasing Information},
+ journal = {Annals of Mathematical Statistics},
+ year = {1962},
+ volume = {33},
+ number = {3},
+ pages = {882--886},
+}
+
+@article{aumann1976,
+ author = {Aumann, Robert J.},
+ title = {Agreeing to Disagree},
+ journal = {Annals of Statistics},
+ year = {1976},
+ volume = {4},
+ number = {6},
+ pages = {1236--1239},
+}
+
+@book{doob1953,
+ author = {Doob, Joseph L.},
+ title = {Stochastic Processes},
+ publisher = {Wiley},
+ address = {New York},
+ year = {1953},
+}
+
+@article{kakutani1948,
+ author = {Kakutani, Shizuo},
+ title = {On Equivalence of Infinite Product Measures},
+ journal = {Annals of Mathematics},
+ year = {1948},
+ volume = {49},
+ number = {1},
+ pages = {214--224},
+}
+
+@article{girsanov1960,
+ author = {Girsanov, Igor V.},
+ title = {On Transforming a Certain Class of Stochastic Processes
+ by Absolutely Continuous Substitution of Measures},
+ journal = {Theory of Probability and Its Applications},
+ year = {1960},
+ volume = {5},
+ number = {3},
+ pages = {285--301},
+}
+
+@article{novikov1972,
+ author = {Novikov, Alexander A.},
+ title = {On an Identity for Stochastic Integrals},
+ journal = {Theory of Probability and Its Applications},
+ year = {1972},
+ volume = {17},
+ number = {4},
+ pages = {717--720},
+}
+
+@inproceedings{blackwell1951,
+ author = {Blackwell, David},
+ title = {Comparison of Experiments},
+ booktitle = {Proceedings of the Second {Berkeley} Symposium on Mathematical
+ Statistics and Probability},
+ editor = {Neyman, Jerzy},
+ pages = {93--102},
+ year = {1951},
+ publisher = {University of California Press},
+ address = {Berkeley, CA}
+}
+
+@article{blackwell1953,
+ author = {Blackwell, David},
+ title = {Equivalent Comparisons of Experiments},
+ journal = {Annals of Mathematical Statistics},
+ volume = {24},
+ number = {2},
+ pages = {265--272},
+ year = {1953},
+ doi = {10.1214/aoms/1177729032}
+}
+
+@techreport{bonnenblust1949,
+ author = {Bohnenblust, H. F. and Shapley, Lloyd S. and Sherman, Seymour},
+ title = {Reconnaissance in Game Theory},
+ institution = {The RAND Corporation},
+ number = {RM-208},
+ year = {1949},
+ address = {Santa Monica, CA},
+ note = {Cited for the economic criterion for comparing experiments}
+}
+
+@article{degroot1962,
+ author = {{DeGroot}, Morris H.},
+ title = {Uncertainty, Information, and Sequential Experiments},
+ journal = {Annals of Mathematical Statistics},
+ volume = {33},
+ number = {2},
+ pages = {404--419},
+ year = {1962},
+ doi = {10.1214/aoms/1177704567}
+}
+
+@incollection{kihlstrom1984,
+ author = {Kihlstrom, Richard E.},
+ title = {A {Bayesian} Exposition of {Blackwell}'s Theorem on the
+ Comparison of Experiments},
+ booktitle = {Bayesian Models in Economic Theory},
+ editor = {Boyer, Marcel and Kihlstrom, Richard E.},
+ series = {Studies in Bayesian Econometrics},
+ volume = {5},
+ pages = {13--31},
+ year = {1984},
+ publisher = {North-Holland},
+ address = {Amsterdam}
+}
+
+@article{kihlstrom1974a,
+ author = {Kihlstrom, Richard E.},
+ title = {A General Theory of Demand for Information about Product Quality},
+ journal = {Journal of Economic Theory},
+ volume = {8},
+ number = {4},
+ pages = {413--439},
+ year = {1974},
+ doi = {10.1016/0022-0531(74)90019-2}
+}
@inproceedings{hansen2004certainty,
title={Certainty equivalence and model uncertainty},
@@ -10,7 +587,6 @@ @inproceedings{hansen2004certainty
year={2004}
}
-
@article{evans2005interview,
title={An interview with Thomas J. Sargent},
author={Evans, George W and Honkapohja, Seppo},
@@ -177,17 +753,6 @@ @article{alchian1950uncertainty
}
-@article{blume2006if,
- title={If you're so smart, why aren't you rich? Belief selection in complete and incomplete markets},
- author={Blume, Lawrence and Easley, David},
- journal={Econometrica},
- volume={74},
- number={4},
- pages={929--966},
- year={2006},
- publisher={Wiley Online Library}
-}
-
@article{mendoza1998international,
title={The international ramifications of tax reforms: supply-side economics in a global economy},
author={Mendoza, Enrique G and Tesar, Linda L},
@@ -197,7 +762,6 @@ @article{mendoza1998international
publisher={JSTOR}
}
-
@book{intriligator2002mathematical,
title={Mathematical optimization and economic theory},
author={Intriligator, Michael D},
@@ -246,7 +810,6 @@ @article{Orcutt_Winokur_69
year = {1969}
}
-
@incollection{Hurwicz:1962,
address = {Stanford, CA},
author = {Hurwicz, Leonid},
@@ -259,7 +822,6 @@ @incollection{Hurwicz:1962
year = {1962}
}
-
@article{Hurwicz:1966,
abstract = {Publisher Summary This chapter concentrates on the structural form of interdependent systems. A great deal of effort is devoted in econometrics and elsewhere to find the behavior pattern of an observed configuration. Such effort is justified on the grounds that the knowledge of the behavior pattern is needed for the purpose of giving explanation or prediction. The merits of this justification are also examined in the chapter. At this point, the chapter considers certain difficulties encountered in the process of looking for the behavior patterns. In certain fields, notably economics (but also— for example, electronic network theory), it deals with a set (configuration) of objects (components) that are interdependent in their behavior. For purposes of both theoretical analysis and empirical investigation of such situations, the phenomena are often described in the chapter (in idealized form) by means of a system of simultaneous equations. History alone is not enabled to determine the behavior pattern of the configuration; but this does not mean that the task is hopeless. The priori information is obtained from the axiom systems or theories that are believed to be relevant to the behavior pattern of the configuration.},
author = {Leonid Hurwicz},
@@ -274,7 +836,6 @@ @article{Hurwicz:1966
year = {1966},
}
-
@article{hurwicz1950least,
title = {Least squares bias in time series},
author = {Hurwicz, Leonid},
@@ -312,7 +873,6 @@ @article{warner1965randomized
publisher = {Taylor \& Francis}
}
-
@article{ljungqvist1993unified,
title = {A unified approach to measures of privacy in randomized response models: A utilitarian perspective},
author = {Ljungqvist, Lars},
@@ -333,7 +893,6 @@ @article{lanke1976degree
publisher = {JSTOR}
}
-
@article{leysieffer1976respondent,
title = {Respondent jeopardy and optimal designs in randomized response models},
author = {Leysieffer, Frederick W and Warner, Stanley L},
@@ -345,7 +904,6 @@ @article{leysieffer1976respondent
publisher = {Taylor \& Francis}
}
-
@article{anderson1976estimation,
title = {Estimation of a proportion through randomized response},
author = {Anderson, Harald},
@@ -377,7 +935,6 @@ @article{greenberg1977respondent
publisher = {Elsevier}
}
-
@article{greenberg1969unrelated,
title = {The unrelated question randomized response model: Theoretical framework},
author = {Greenberg, Bernard G and Abul-Ela, Abdel-Latif A and Simmons, Walt R and Horvitz, Daniel G},
@@ -400,8 +957,6 @@ @article{lanke1975choice
publisher = {Taylor \& Francis}
}
-
-
@article{schmid2010,
title = {Dynamic mode decomposition of numerical and experimental data},
author = {Schmid, Peter J},
@@ -412,7 +967,6 @@ @article{schmid2010
publisher = {Cambridge University Press}
}
-
@article{apostolakis1990,
title = {The concept of probability in safety assessments of technological systems},
author = {Apostolakis, George},
@@ -424,7 +978,6 @@ @article{apostolakis1990
publisher = {American Association for the Advancement of Science}
}
-
@unpublished{Greenfield_Sargent_1993,
author = {Moses A Greenfield and Thomas J Sargent},
title = {A Probabilistic Analysis of a Catastrophic Transuranic Waste Hoist Accident at the WIPP},
@@ -444,8 +997,6 @@ @article{Ardron_2018
year = {2018}
}
-
-
@article{Groves_73,
author = {Groves, T.},
year = {1973},
@@ -473,9 +1024,6 @@ @article{Vickrey_61
pages = {8-37}
}
-
-
-
@article{Phelan_Townsend_91,
author = {Christopher Phelan and Robert M. Townsend},
title = {{Computing Multi-Period, Information-Constrained Optima}},
@@ -491,7 +1039,6 @@ @article{Phelan_Townsend_91
url = {https://ideas.repec.org/a/oup/restud/v58y1991i5p853-881..html}
}
-
@article{Spear_Srivastava_87,
author = {Stephen E. Spear and Sanjay Srivastava},
title = {{On Repeated Moral Hazard with Discounting}},
@@ -507,7 +1054,6 @@ @article{Spear_Srivastava_87
url = {https://ideas.repec.org/a/oup/restud/v54y1987i4p599-617..html}
}
-
@article{tu_Rowley,
title = {On dynamic mode decomposition: Theory and applications},
author = {Tu, J. H. and Rowley, C. W. and Luchtenburg, D. M. and Brunton, S. L. and Kutz, J. N.},
@@ -518,7 +1064,6 @@ @article{tu_Rowley
pages = {391--421}
}
-
@book{Knight:1921,
author = {Knight, Frank H.},
date-added = {2020-08-20 10:29:34 -0500},
@@ -529,7 +1074,6 @@ @book{Knight:1921
year = {1921}
}
-
@article{MaccheroniMarinacciRustichini:2006b,
author = {Maccheroni, Fabio and Marinacci, Massimo and Rustichini, Aldo},
date-added = {2021-05-19 08:04:27 -0500},
@@ -558,7 +1102,6 @@ @article{GilboaSchmeidler:1989
year = {1989}
}
-
@book{Sutton_2018,
title={Reinforcement learning: An introduction},
author={Sutton, Richard S and Barto, Andrew G},
@@ -581,7 +1124,6 @@ @article{AHS_2003
url = {https://ideas.repec.org/a/tpr/jeurec/v1y2003i1p68-123.html}
}
-
@article{BHS_2009,
author = {Barillas, Francisco and Hansen, Lars Peter and Sargent, Thomas J.},
title = {{Doubts or variability?}},
@@ -597,8 +1139,6 @@ @article{BHS_2009
url = {https://ideas.repec.org/a/eee/jetheo/v144y2009i6p2388-2418.html}
}
-
-
@article{HST_1999,
author = {Lars Peter Hansen and Thomas J. Sargent and Thomas D. Tallarini},
title = {{Robust Permanent Income and Pricing}},
@@ -614,7 +1154,6 @@ @article{HST_1999
url = {https://ideas.repec.org/a/oup/restud/v66y1999i4p873-907..html}
}
-
@article{simon1956dynamic,
title={Dynamic programming under uncertainty with a quadratic criterion function},
author={Simon, Herbert A},
@@ -643,9 +1182,6 @@ @article{Jacobson_73
pages = {124-131}
}
-
-
-
@book{Bucklew_2004,
title = {An Introduction to Rare Event Simulation},
author = {James A. Bucklew},
@@ -654,8 +1190,6 @@ @book{Bucklew_2004
year = {2004}
}
-
-
@book{Whittle_1990,
author = {Peter Whittle},
title = {Risk-Sensitive Optimal Control},
@@ -664,7 +1198,6 @@ @book{Whittle_1990
address = {New York}
}
-
@article{Whittle_1981,
author = {Peter Whittle},
year = {1981},
@@ -706,7 +1239,6 @@ @book{DDSE_book
address = {New York}
}
-
@book{bertsimas_tsitsiklis1997,
author = {Bertsimas, D. and Tsitsiklis, J. N.},
title = {{Introduction to linear optimization}},
@@ -722,7 +1254,6 @@ @book{hu_guo2018
year = {2018}
}
-
@article{definetti,
author = {Bruno de Finetti},
date-added = {2014-12-26 17:45:57 +0000},
@@ -1117,13 +1648,6 @@ @article{rosen1994cattle
publisher = {The University of Chicago Press}
}
-@book{HS2013,
- title = {Recursive Linear Models of Dynamic Economics},
- author = {Hansen, Lars Peter and Thomas J. Sargent},
- year = {2013},
- publisher = {Princeton University Press},
- address = {Princeton, New Jersey}
-}
@article{Reffett1996,
title = {Production-based asset pricing in monetary economies with transactions costs},
@@ -1344,16 +1868,6 @@ @article{JuddYeltekinConklin2003
url = {https://ideas.repec.org/a/ecm/emetrp/v71y2003i4p1239-1254.html}
}
-@book{kreps,
- author = {David M. Kreps},
- date-added = {2014-12-26 17:45:57 +0000},
- date-modified = {2014-12-26 17:45:57 +0000},
- publisher = {Westview Press},
- series = {Underground Classics in Economics},
- title = {Notes on the Theory of Choice},
- year = {1988}
-}
-
@book{Kreps88,
title = {Notes on the Theory of Choice},
author = {David M. Kreps},
@@ -1451,17 +1965,6 @@ @book{Lucas1987
publisher = {Oxford Blackwell}
}
-@article{hansen2009long,
- title = {Long-term risk: An operator approach},
- author = {Hansen, Lars Peter and Scheinkman, Jos{\'e} A},
- journal = {Econometrica},
- volume = {77},
- number = {1},
- pages = {177--234},
- year = {2009},
- publisher = {Wiley Online Library}
-}
-
@article{Hans_Scheink_2009,
author = {Lars Peter Hansen and Jose A. Scheinkman},
title = {Long-Term Risk: An Operator Approach},
@@ -1473,13 +1976,6 @@ @article{Hans_Scheink_2009
month = {01}
}
-@book{hansen2008robustness,
- title = {Robustness},
- author = {Hansen, Lars Peter and Sargent, Thomas J},
- year = {2008},
- publisher = {Princeton university press}
-}
-
@book{Whittle1963,
title = {Prediction and regulation by linear least-square methods},
author = {Whittle, Peter},
@@ -2006,7 +2502,6 @@ @article{hopenhayn1992entry
publisher = {JSTOR}
}
-
@book{bacsar2008h,
title={H-infinity optimal control and related minimax design problems: a dynamic game approach},
author={Ba{\c{s}}ar, Tamer and Bernhard, Pierre},
@@ -2025,7 +2520,6 @@ @article{sargent1981interpreting
publisher={The University of Chicago Press}
}
-
@inproceedings{lucas1976econometric,
title={Econometric policy evaluation: A critique},
author={Lucas, Robert E Jr},
@@ -2156,14 +2650,6 @@ @article{Lucas1978
year = {1978}
}
-@article{LucasPrescott1971,
- author = {Lucas, Jr., Robert E and Prescott, Edward C},
- journal = {Econometrica: Journal of the Econometric Society},
- pages = {659--681},
- title = {{Investment under uncertainty}},
- year = {1971}
-}
-
@article{LucasStokey1983,
author = {Lucas, Jr., Robert E and Stokey, Nancy L},
journal = {Journal of monetary Economics},
@@ -2403,17 +2889,6 @@ @article{Schelling1969
year = {1969}
}
-@article{bansal2004risks,
- title = {Risks for the long run: A potential resolution of asset pricing puzzles},
- author = {Bansal, Ravi and Yaron, Amir},
- journal = {The journal of Finance},
- volume = {59},
- number = {4},
- pages = {1481--1509},
- year = {2004},
- publisher = {Wiley Online Library}
-}
-
@article{Bansal_Yaron_2004,
author = {Ravi Bansal and Amir Yaron},
title = {{Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles}},
@@ -2440,31 +2915,6 @@ @article{hansen2008consumption
publisher = {The University of Chicago Press}
}
-@article{HHL_2008,
- author = {Lars Peter Hansen and John C. Heaton and Nan Li},
- title = {{Consumption Strikes Back? Measuring Long-Run Risk}},
- journal = {Journal of Political Economy},
- year = 2008,
- volume = {116},
- number = {2},
- pages = {260-302},
- month = {04},
- keywords = {},
- doi = {},
- abstract = { We characterize and measure a long-term risk-return trade-off for the valuation of cash flows exposed to fluctuations in macroeconomic growth. This trade-off features risk prices of cash flows that are realized far into the future but continue to be reflected in asset values. We apply this analysis to claims on aggregate cash flows and to cash flows from value and growth portfolios by imputing values to the long-run dynamic responses of cash flows to macroeconomic shocks. We explore the sensitivity of our results to features of the economic valuation model and of the model cash flow dynamics. (c) 2008 by The University of Chicago. All rights reserved.},
- url = {https://ideas.repec.org/a/ucp/jpolec/v116y2008i2p260-302.html}
-}
-
-@article{hansen2007beliefs,
- title = {Beliefs, doubts and learning: Valuing macroeconomic risk},
- author = {Hansen, Lars Peter},
- journal = {American Economic Review},
- volume = {97},
- number = {2},
- pages = {1--30},
- year = {2007}
-}
-
@article{Hansen_2007,
author = {Lars Peter Hansen},
title = {{Beliefs, Doubts and Learning: Valuing Macroeconomic Risk}},
@@ -2480,16 +2930,6 @@ @article{Hansen_2007
url = {https://ideas.repec.org/a/aea/aecrev/v97y2007i2p1-30.html}
}
-@article{lucas2003macroeconomic,
- title = {Macroeconomic priorities},
- author = {Lucas Jr, Robert E},
- journal = {American economic review},
- volume = {93},
- number = {1},
- pages = {1--14},
- year = {2003}
-}
-
@article{Lucas_2003,
author = {Lucas, Jr., Robert E},
title = {{Macroeconomic Priorities}},
@@ -2737,14 +3177,6 @@ @techreport{giannoni2010optimal
institution = {National Bureau of Economic Research}
}
-@article{miller1985dynamic,
- title = {Dynamic games and the time inconsistency of optimal policy in open economies},
- author = {Miller, Marcus and Salmon, Mark},
- journal = {The Economic Journal},
- pages = {124--137},
- year = {1985},
- publisher = {JSTOR}
-}
@article{pearlman1986rational,
title = {Rational expectations models with partial information},
@@ -2856,16 +3288,6 @@ @article{kikuchi2018span
publisher = {Wiley Online Library}
}
-@article{coase1937nature,
- title = {The nature of the firm},
- author = {Coase, Ronald Harry},
- journal = {economica},
- volume = {4},
- number = {16},
- pages = {386--405},
- year = {1937},
- publisher = {Wiley Online Library}
-}
@article{do1999solutions,
title = {Solutions for the linear-quadratic control problem of Markov jump linear systems},
@@ -3134,3 +3556,78 @@ @article{AngPiazzesi2003
number = {4},
pages = {745--787}
}
+
+@article{csiszar1963,
+ author = {Csisz{\'a}r, Imre},
+ title = {{Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizit{\"a}t von Markoffschen Ketten}},
+ journal = {Magyar Tud. Akad. Mat. Kutat{\'o} Int. K{\"o}zl.},
+ year = 1963,
+ volume = {8},
+ pages = {85--108}
+}
+
+@article{morimoto1963,
+ author = {Morimoto, Tetsuzo},
+ title = {{Markov Processes and the H-Theorem}},
+ journal = {Journal of the Physical Society of Japan},
+ year = 1963,
+ volume = {18},
+ number = {3},
+ pages = {328--331},
+ doi = {10.1143/JPSJ.18.328}
+}
+
+@article{ali1966,
+ author = {Ali, S. M. and Silvey, S. D.},
+ title = {{A general class of coefficients of divergence of one distribution from another}},
+ journal = {Journal of the Royal Statistical Society, Series B},
+ year = 1966,
+ volume = {28},
+ number = {1},
+ pages = {131--142}
+}
+
+@article{liese2012,
+ author = {Liese, Friedrich},
+ title = {{phi-divergences, sufficiency, Bayes sufficiency, and deficiency}},
+ journal = {Kybernetika},
+ year = 2012,
+ volume = {48},
+ number = {4},
+ pages = {690--713}
+}
+
+@book{chentsov1981,
+ author = {{\v{C}}encov, Nikolai N.},
+ title = {{Statistical Decision Rules and Optimal Inference}},
+ series = {Translations of Mathematical Monographs},
+ volume = {53},
+ publisher = {American Mathematical Society},
+ address = {Providence, RI},
+ year = 1981
+}
+
+@book{amari_nagaoka2000,
+ author = {Amari, Shun-ichi and Nagaoka, Hiroshi},
+ title = {{Methods of Information Geometry}},
+ series = {Translations of Mathematical Monographs},
+ volume = {191},
+ publisher = {American Mathematical Society and Oxford University Press},
+ address = {Providence, RI},
+ year = 2000
+}
+
+@inproceedings{tishby_pereira_bialek1999,
+ author = {Tishby, Naftali and Pereira, Fernando C. and Bialek, William},
+ title = {{The Information Bottleneck Method}},
+ booktitle = {Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing},
+ year = 1999,
+ pages = {368--377}
+}
+
+@article{shwartz_ziv_tishby2017,
+ author = {Shwartz-Ziv, Ravid and Tishby, Naftali},
+ title = {{Opening the Black Box of Deep Neural Networks via Information}},
+ journal = {arXiv preprint arXiv:1703.00810},
+ year = 2017
+}
diff --git a/lectures/_toc.yml b/lectures/_toc.yml
index da19f0b03..28999d83f 100644
--- a/lectures/_toc.yml
+++ b/lectures/_toc.yml
@@ -42,8 +42,11 @@ parts:
- file: wald_friedman_2
- file: exchangeable
- file: likelihood_bayes
+ - file: blackwell_kihlstrom
- file: mix_model
- file: navy_captain
+ - file: merging_of_opinions
+ - file: survival_recursive_preferences
- caption: Linear Programming
numbered: true
chapters:
@@ -61,6 +64,7 @@ parts:
- file: wealth_dynamics
- file: kalman
- file: kalman_2
+ - file: organization_capital
- file: measurement_models
- caption: Search
numbered: true
diff --git a/lectures/affine_risk_prices.md b/lectures/affine_risk_prices.md
index e17231b32..9c3964b9f 100644
--- a/lectures/affine_risk_prices.md
+++ b/lectures/affine_risk_prices.md
@@ -60,7 +60,7 @@ Instead, it
assets to let the data reveal risks and their prices.
```{note}
-Researchers including {cite}`bansal2004risks` and {cite}`hansen2008consumption` have been less willing
+Researchers including {cite}`Bansal_Yaron_2004` and {cite}`hansen2008consumption` have been less willing
to give up on consumption-based models of the stochastic discount factor.
```
diff --git a/lectures/blackwell_kihlstrom.md b/lectures/blackwell_kihlstrom.md
new file mode 100644
index 000000000..ad1cfb9d3
--- /dev/null
+++ b/lectures/blackwell_kihlstrom.md
@@ -0,0 +1,1362 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+ format_version: 0.13
+ jupytext_version: 1.16.4
+kernelspec:
+ display_name: Python 3 (ipykernel)
+ language: python
+ name: python3
+---
+
+(blackwell_kihlstrom)=
+```{raw} jupyter
+
+```
+
+# Blackwell's Theorem on Comparing Experiments
+
+```{contents} Contents
+:depth: 2
+```
+
+## Overview
+
+
+
+This lecture studies *Blackwell's theorem* {cite}`blackwell1951,blackwell1953` on ranking statistical experiments.
+
+Our presentation brings in findings from a Bayesian interpretation of Blackwell's theorem by {cite:t}`kihlstrom1984`.
+
+Blackwell and Kihlstrom study statistical model-selection questions closely related to those encountered in this QuantEcon lecture {doc}`likelihood_bayes`.
+
+To appreciate the connection, it helps to see how Blackwell's notion of an **experiment** is related to the concept of a "probability distribution" or "parameterized statistical model" appearing in {doc}`likelihood_bayes`.
+
+Blackwell studies a situation in which a decision maker wants to know the value of a state $s$ that lives in a space $S$.
+
+For Blackwell, an **experiment** is a **conditional probability model** $\{\mu(\cdot \mid s) : s \in S\}$, i.e., a family of probability distributions that are conditioned by the same state $s \in S$.
+
+We are free to interpret the "state" as a "parameter" or "parameter vector".
+
+In a two-state case $S = \{s_1, s_2\}$, the two conditional densities $f(\cdot) = \mu(\cdot \mid s_1)$ and $g(\cdot) = \mu(\cdot \mid s_2)$ are the ones used repeatedly in our studies of classical hypothesis testing and Bayesian inference in this QuantEcon lecture {doc}`likelihood_bayes` as well as several other lectures in this suite of QuantEcon lectures.
+
+{cite:t}`kihlstrom1984` interprets the question *which experiment is more informative?* as asking which conditional probability model allows a Bayesian decision maker with a prior over $\{s_1, s_2\}$ to attain higher expected utility.
+
+We'll use the terms "signal" and "experiment" as synonyms.
+
+Thus, suppose that two signals, $\tilde{x}_\mu$ and $\tilde{x}_\nu$, are both informative about an unknown state $\tilde{s}$.
+
+Signal $\mu$ is **at least as informative as** signal $\nu$ if every Bayesian decision maker can attain weakly higher expected utility with $\mu$ than with $\nu$.
+
+This economic criterion is equivalent to two statistical criteria:
+
+- *Sufficiency* (Blackwell): $\tilde{x}_\nu$ can be generated from $\tilde{x}_\mu$ by an additional randomization.
+- *Uncertainty reduction* ({cite:t}`degroot1962`): $\tilde{x}_\mu$ lowers expected uncertainty at least as much as $\tilde{x}_\nu$ for every concave uncertainty function.
+
+Kihlstrom's formulation focuses on the *posterior distribution*.
+
+More informative experiments generate posterior distributions that are more dispersed in convex order.
+
+In the two-state case, this becomes a mean-preserving-spread comparison on $[0, 1]$, which can be checked with the integrated-CDF test used for second-order stochastic dominance.
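
As a preview of that test, here is a minimal numerical sketch, assuming a uniform prior and the two symmetric experiments that reappear later in the lecture (the specific numbers are illustrative): both posterior distributions on $[0, 1]$ have mean equal to the prior, and the integrated CDF of the more informative experiment lies weakly above that of the less informative one.

```{code-cell} ipython3
import numpy as np

def integrated_cdf(points, probs, grid):
    """Compute t ↦ ∫_0^t F(s) ds for a discrete distribution on [0, 1]."""
    cdf = np.array([probs[points <= t].sum() for t in grid])
    dt = np.diff(grid, prepend=grid[0])
    return np.cumsum(cdf * dt)

grid = np.linspace(0, 1, 1001)

# Posterior of s1 under a uniform prior, for a more and a less
# informative symmetric experiment (illustrative numbers)
post_more, probs_more = np.array([0.8, 0.2]), np.array([0.5, 0.5])
post_less, probs_less = np.array([0.6, 0.4]), np.array([0.5, 0.5])

I_more = integrated_cdf(post_more, probs_more, grid)
I_less = integrated_cdf(post_less, probs_less, grid)

same_mean = np.isclose(post_more @ probs_more, post_less @ probs_less)
spread = bool(np.all(I_more >= I_less - 1e-12))
print(same_mean, spread)
```

Equal means plus a pointwise weakly larger integrated CDF is exactly the mean-preserving-spread (convex-order) comparison used below.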
+
+The lecture proceeds as follows:
+
+1. Set up notation and define experiments as Markov matrices.
+2. Define stochastic transformations using Markov kernels.
+3. State the three equivalent criteria.
+4. State and sketch the proof of the main theorem.
+5. Develop the Bayesian interpretation via standard experiments and mean-preserving spreads.
+6. Illustrate each idea with Python simulations.
+
+We begin with some imports.
+
+```{code-cell} ipython3
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.optimize import minimize
+
+np.random.seed(0)
+```
+
+## Experiments and stochastic transformations
+
+### The state space and experiments
+
+Let $S = \{s_1, \ldots, s_N\}$ be a finite set of possible states of the world.
+
+An **experiment** is described by the conditional distribution of an observed signal
+$\tilde{x}$ given the state $\tilde{s}$.
+
+When the signal space is also finite, say $X = \{x_1, \ldots, x_M\}$, an experiment
+reduces to an $N \times M$ *Markov matrix*
+
+$$
+\mu = [\mu_{ij}], \qquad
+\mu_{ij} = \Pr(\tilde{x}_\mu = x_j \mid \tilde{s} = s_i) \geq 0,
+\quad \sum_{j=1}^{M} \mu_{ij} = 1 \;\forall\, i.
+$$
+
+Each row $i$ gives the distribution of signals when the true state is $s_i$.
+
+
+```{code-cell} ipython3
+μ = np.array([[0.6, 0.3, 0.1],
+ [0.1, 0.3, 0.6]])
+
+Q = np.array([[1.0, 0.0],
+ [0.5, 0.5],
+ [0.0, 1.0]])
+
+ν = μ @ Q
+
+print("Experiment μ (3 signals, rows sum to 1):")
+print(μ)
+print("\nStochastic transformation Q (3 × 2):")
+print(Q)
+print("\nExperiment ν = μ @ Q (2 signals):")
+print(ν)
+print("\nRow sums μ:", μ.sum(axis=1))
+print("Row sums ν:", ν.sum(axis=1))
+```
+
+### Stochastic transformations
+
+A **stochastic transformation** $Q$ maps signals from one experiment to signals from another by further randomization.
+
+In the discrete setting with $M$ input signals and $K$ output signals, $Q$ is an
+$M \times K$ Markov matrix: $q_{lk} \geq 0$ and $\sum_k q_{lk} = 1$ for every row $l$.
+
+```{prf:definition} Sufficiency
+:label: def-sufficiency
+
+Experiment $\mu$ is *sufficient for* $\nu$ if there exists a stochastic
+transformation $Q$ (an $M \times K$ Markov matrix) such that
+
+$$
+\nu = \mu \, Q,
+$$
+
+meaning that an observer of $\tilde{x}_\mu$ can generate the distribution of
+$\tilde{x}_\nu$ by passing their signal through $Q$.
+```
+
+If you observe the more informative signal $\tilde{x}_\mu$, then you can always *throw away* information to reproduce a less informative signal.
+
+The reverse is not possible: a less informative signal cannot be enriched to
+recover what was lost.
+
+We can verify this numerically using the two experiments $\mu$ and $\nu$
+defined above.
+
+The function below searches for a stochastic transformation $Q$ that
+minimizes $\|\nu - \mu \, Q\|$.
+
+If an exact $Q$ exists the residual will be close to zero; otherwise it will
+be large.
+
+```{code-cell} ipython3
+def find_stochastic_transform(μ, ν, tol=1e-8):
+ """
+ Find a row-stochastic matrix Q that minimizes ||ν - μ @ Q||.
+ """
+ _, M = μ.shape
+ _, K = ν.shape
+
+ def unpack(q_flat):
+ return q_flat.reshape(M, K)
+
+ def objective(q_flat):
+ Q = unpack(q_flat)
+ return np.linalg.norm(ν - μ @ Q)**2
+
+ constraints = [
+ {"type": "eq", "fun": lambda q_flat,
+ row=i: unpack(q_flat)[row].sum() - 1.0}
+ for i in range(M)
+ ]
+ bounds = [(0.0, 1.0)] * (M * K)
+ Q0 = np.full((M, K), 1 / K).ravel()
+
+ result = minimize(
+ objective,
+ Q0,
+ method="SLSQP",
+ bounds=bounds,
+ constraints=constraints,
+ options={"ftol": tol, "maxiter": 1_000},
+ )
+
+ Q = unpack(result.x)
+ residual = np.linalg.norm(ν - μ @ Q)
+ return Q, residual
+
+# Forward: find Q such that ν = μ @ Q (should succeed)
+Q_fwd, res_fwd = find_stochastic_transform(μ, ν)
+print("Forward (μ to ν):")
+print(f" residual = {res_fwd:.2e}")
+print(f" exact transformation exists: {res_fwd < 1e-6}")
+
+# Reverse: find Q' such that μ = ν @ Q' (should fail)
+Q_rev, res_rev = find_stochastic_transform(ν, μ)
+print("\nReverse (ν to μ):")
+print(f" residual = {res_rev:.2e}")
+print(f" exact transformation exists: {res_rev < 1e-6}")
+```
+
+The forward residual is close to zero: a stochastic transformation from
+$\mu$ to $\nu$ exists, confirming that $\mu$ is sufficient for $\nu$.
+
+The reverse residual is large: no stochastic transformation can recover
+$\mu$ from $\nu$.
+
+No stochastic transformation can undo the
+information loss.
+
+The key is that the inverse of a stochastic transformation in general is not a stochastic transformation.
+
+In fact, the only stochastic transformations whose inverses are also stochastic are permutation matrices, which merely relabel signals without losing any information.
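
A quick check of that claim, using a made-up $2 \times 2$ garbling: the matrix inverse of an invertible stochastic matrix still has rows summing to one, but it acquires negative entries and so is not itself stochastic, whereas a permutation matrix inverts to its (stochastic) transpose.

```{code-cell} ipython3
import numpy as np

# An invertible garbling whose inverse is not stochastic
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Q_inv = np.linalg.inv(Q)
rows_ok = np.allclose(Q_inv.sum(axis=1), 1.0)   # rows still sum to one...
entries_ok = bool(np.all(Q_inv >= 0))           # ...but some entries go negative
print(Q_inv)
print(rows_ok, entries_ok)

# A permutation matrix, by contrast, has a stochastic inverse
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.allclose(np.linalg.inv(P), P.T))
```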
+
+## Three equivalent criteria
+
+Blackwell's theorem establishes that three different ways of comparing experiments all turn out to be equivalent.
+
+### Criterion 1: the economic criterion
+
+The first criterion compares experiments by their value to decision makers.
+
+Let $A$ be a compact convex set of actions and $u: A \times S \to \mathbb{R}$ a
+bounded utility function.
+
+A decision maker observes $x \in X$, updates beliefs about $\tilde{s}$ by Bayes' rule, and chooses $d(x) \in A$ to maximize expected utility.
+
+Let $p = (p_1, \ldots, p_N)$ be the prior over states, and write
+
+$$
+P = \bigl\{(p_1, \ldots, p_N) : p_i \geq 0,\; \textstyle\sum_i p_i = 1\bigr\}
+$$
+
+for the probability simplex.
+
+For fixed $A$ and $u$, the set of *achievable expected-utility vectors* under experiment $\mu$ is
+
+$$
+B(\mu, A, u) = \Bigl\{v \in \mathbb{R}^N :
+ v_i = \textstyle\int_X u(f(x), s_i)\,\mu_i(dx)
+ \text{ for some measurable } f: X \to A \Bigr\}.
+$$
+
+```{prf:definition} Economic criterion
+:label: def-economic-criterion
+
+$\mu$ is **at least as informative as** $\nu$ in the economic sense if
+
+$$
+B(\mu, A, u) \supseteq B(\nu, A, u)
+$$
+
+for every compact convex action set $A$ and every bounded utility function $u: A \times S \to \mathbb{R}$.
+```
+
+This criterion says that experiment $\mu$ is better than experiment $\nu$ if anything a decision maker can achieve after seeing $\nu$, they can also achieve after seeing $\mu$.
+
+The reason is that a more informative experiment lets the agent imitate a less informative one by *ignoring* or *garbling* some of the extra information.
+
+But the reverse need not be possible.
+
+So $B(\mu, A, u) \supseteq B(\nu, A, u)$ means that $\mu$ gives the decision maker at least as many feasible expected-utility outcomes as $\nu$.
+
+Equivalently, every Bayesian decision maker attains weakly higher expected utility with $\tilde{x}_\mu$ than with $\tilde{x}_\nu$, for every prior $p \in P$.
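
The following sketch illustrates the economic criterion under stated assumptions: a hypothetical two-state decision problem with a "guess the state" payoff plus a safe action (the payoff numbers are made up), the experiment $\mu$ from above, and its garbling $\nu = \mu Q$. The Bayesian value of $\mu$ is weakly (here strictly) higher.

```{code-cell} ipython3
import numpy as np

def value_of_experiment(exp, prior, U):
    """Maximal expected utility for payoff matrix U[action, state]."""
    signal_probs = exp.T @ prior
    posteriors = (exp.T * prior) / signal_probs[:, None]
    # pick the best action after each signal, then average over signals
    return signal_probs @ np.max(posteriors @ U.T, axis=1)

μ = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])
Q = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
ν = μ @ Q                       # a garbling of μ

prior = np.array([0.5, 0.5])
U = np.array([[1.0, 0.0],       # act as if the state is s1
              [0.0, 1.0],       # act as if the state is s2
              [0.7, 0.7]])      # a safe action (illustrative payoff)

v_μ = value_of_experiment(μ, prior, U)
v_ν = value_of_experiment(ν, prior, U)
print(v_μ, v_ν, v_μ >= v_ν)
```

With the pure matching payoffs alone the two values happen to coincide here; the safe action is what makes the ranking strict.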
+
+### Criterion 2: the sufficiency criterion
+
+The second criterion uses the stochastic transformation idea introduced above.
+
+```{prf:definition} Blackwell sufficiency
+:label: def-blackwell-sufficiency
+
+$\mu \geq \nu$ in Blackwell's sense if there exists a stochastic transformation $Q$ from the signal space of $\mu$ to the signal space of $\nu$ such that
+
+$$
+\nu_i(E) = (Q \circ \mu_i)(E)
+\quad \forall\, E \in \mathscr{G},\; i = 1, \ldots, N.
+$$
+```
+
+In matrix notation for finite experiments: $\nu = \mu \, Q$.
+
+### Criterion 3: the uncertainty criterion
+
+The third criterion compares experiments by how much they reduce uncertainty about the state.
+
+{cite:t}`degroot1962` calls any concave function $U: P \to \mathbb{R}$ an **uncertainty function**.
+
+The prototypical example is Shannon entropy:
+
+$$
+U(p) = -\sum_{i=1}^{N} p_i \log p_i.
+$$
+
+```{prf:definition} DeGroot uncertainty criterion
+:label: def-degroot-uncertainty
+
+$\mu$ **reduces expected uncertainty at least as much as** $\nu$ if, for every prior $p \in P$ and every concave $U: P \to \mathbb{R}$,
+
+$$
+\int_P U(q)\,\hat\mu^p(dq)
+\;\leq\;
+\int_P U(q)\,\hat\nu^p(dq),
+$$
+
+where $\hat\mu^p$ is the distribution of posterior beliefs induced by experiment $\mu$ under prior $p$.
+```
+
+To see why any experiment weakly reduces expected uncertainty, let $\pi = p^\mu(X)$ denote the random posterior induced by experiment $\mu$ (we reserve $Q$ for stochastic transformations).
+
+Then $\pi$ has distribution $\hat\mu^p$, so
+
+$$
+\mathbb{E}[U(\pi)] = \int_P U(q)\,\hat\mu^p(dq).
+$$
+
+Since $U$ is concave, Jensen's inequality gives
+
+$$
+\mathbb{E}[U(\pi)] \leq U(\mathbb{E}[\pi]) = U(p).
+$$
+
+Hence
+
+$$
+\int_P U(q)\,\hat\mu^p(dq) \leq U(p),
+$$
+
+so any experiment weakly lowers expected uncertainty.
+
+Kihlstrom's standard-experiment construction will later let us compare posterior distributions under the uniform prior $c = (1 / N, \ldots, 1 / N)$.
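
For the entropy example, a short numerical check under the uniform prior, using the two symmetric experiments analyzed later in the lecture: the more informative experiment yields lower expected posterior entropy, and both lie below the entropy of the prior.

```{code-cell} ipython3
import numpy as np

def expected_posterior_entropy(exp, prior):
    """E[H(posterior)], with H the Shannon entropy."""
    signal_probs = exp.T @ prior
    posteriors = (exp.T * prior) / signal_probs[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.nansum(posteriors * np.log(posteriors), axis=1)  # 0 log 0 := 0
    return signal_probs @ H

prior = np.array([0.5, 0.5])
μ_info = np.array([[0.8, 0.2],
                   [0.2, 0.8]])
ν_info = np.array([[0.6, 0.4],
                   [0.4, 0.6]])

H_prior = -np.sum(prior * np.log(prior))
H_μ = expected_posterior_entropy(μ_info, prior)
H_ν = expected_posterior_entropy(ν_info, prior)
print(H_μ, H_ν, H_prior)
```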
+
+## The main theorem
+
+```{prf:theorem} Blackwell's theorem
+:label: thm-blackwell
+
+The following three conditions are equivalent:
+
+(i) Economic criterion: $B(\mu, A, u) \supseteq B(\nu, A, u)$ for every compact convex $A$ and every bounded utility function $u$.
+
+(ii) Sufficiency criterion: There exists a stochastic transformation $Q$ from the signal space of $\mu$ to the signal space of $\nu$ such that $\nu = Q \circ \mu$.
+
+(iii) Uncertainty criterion: $\int_P U(q)\,\hat\mu^p(dq) \leq \int_P U(q)\,\hat\nu^p(dq)$ for every prior $p \in P$ and every concave $U$.
+```
+
+See also {cite:t}`blackwell1951`, {cite:t}`bonnenblust1949`, and {cite:t}`degroot1962`.
+
+The hard part is the equivalence between the economic and sufficiency criteria.
+
+*Sketch (ii $\Rightarrow$ i):* If $\nu = \mu Q$, then any decision rule based on $\tilde{x}_\nu$ can be replicated by first observing $\tilde{x}_\mu$, then drawing a synthetic $\tilde{x}_\nu$ from $Q$, and then applying the same rule.
+
+*Sketch (i $\Rightarrow$ ii):* Since $B(\mu, A, u) \supseteq B(\nu, A, u)$ for every $A$ and $u$, a separating-hyperplane (duality) argument implies the existence of a posterior-space mean-preserving kernel $D$ sending the standard experiment of $\nu$ into that of $\mu$.
+
+Passing from these posterior laws back to the original signal spaces then yields the required garbling $Q$ with $\nu = \mu Q$.
+
+Thus $D$ is an intermediate randomization on posterior beliefs, not literally the signal-space kernel $Q$.
+
+*Sketch (ii $\Rightarrow$ iii):* Under a garbling, the posterior from the coarser experiment is the conditional expectation of the posterior from the finer experiment, so Jensen's inequality gives the result for every concave $U$.
+
+*Sketch (iii $\Rightarrow$ ii):* The converse, that the inequality for all concave $U$ forces the existence of $Q$, is proved in {cite}`blackwell1953`. Kihlstrom's posterior-based representation makes the geometry transparent.
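
The (ii $\Rightarrow$ iii) step can be checked numerically with the experiments $\mu$ and $\nu = \mu Q$ from above: the posterior after the garbled signal is the conditional expectation of the posterior after the fine signal, so Jensen's inequality delivers the DeGroot inequality for any convex function (here the illustrative choice $g(p) = p_1^2$).

```{code-cell} ipython3
import numpy as np

μ = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])
Q = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
ν = μ @ Q
prior = np.array([0.5, 0.5])

def posteriors_and_probs(exp, prior):
    probs = exp.T @ prior
    return (exp.T * prior) / probs[:, None], probs

post_μ, probs_x = posteriors_and_probs(μ, prior)
post_ν, probs_y = posteriors_and_probs(ν, prior)

# P(x | y) = P(x) Q[x, y] / P(y): the coarse posterior is the
# conditional expectation of the fine posterior
P_x_given_y = probs_x[:, None] * Q / probs_y[None, :]
coarse_from_fine = P_x_given_y.T @ post_μ
print(np.allclose(coarse_from_fine, post_ν))

# Jensen: E[g(fine posterior)] >= E[g(coarse posterior)] for convex g
g = lambda post: post[:, 0]**2
print(probs_x @ g(post_μ), probs_y @ g(post_ν))
```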
+
+## Kihlstrom's Bayesian interpretation
+
+### Posteriors and standard experiments
+
+The key object in Kihlstrom's analysis is the *posterior belief vector*.
+
+When prior $p$ holds and experiment $\mu$ produces signal $x$, Bayes' rule gives
+
+$$
+p_i^\mu(x) = \Pr(\tilde{s} = s_i \mid \tilde{x}_\mu = x)
+= \frac{\mu_{ix} \, p_i}{\sum_j \mu_{jx}\, p_j}, \qquad i = 1, \ldots, N.
+$$
+
+The posterior $p^\mu(x) \in P$ is a random point in the simplex.
+
+```{prf:property} Mean preservation
+:label: prop-mean-preservation
+
+The prior $p$ is the expectation of the posterior:
+
+$$
+\mathbb{E}[p^\mu] = \sum_x \Pr(\tilde{x}_\mu = x)\, p^\mu(x) = p.
+$$
+
+This is sometimes called the *law of iterated expectations for beliefs*.
+```
+
+For a fixed prior $c$, Kihlstrom's **standard experiment** replaces the raw signals of $\mu$ with the posterior beliefs they generate.
+
+Let $\hat\mu^c$ denote the distribution over posteriors induced by $\mu$ under prior $c$.
+Mean preservation implies $\int_P q \, \hat\mu^c(dq) = c$.
+
+Two experiments are **informationally equivalent** when they induce the same posterior distribution.
+
+The standard experiment strips away every detail of the signal except its posterior, so it provides a canonical Bayesian representation for comparing experiments.
+
+A stochastic kernel on posterior beliefs lives on the simplex $P$, whereas a Blackwell garbling $Q$ lives on the original signal space. Kihlstrom's construction uses the former to study convex order and then recovers the latter after passing to standard experiments.
+
+Any two experiments that generate the same distribution over posteriors lead to identical decisions for every Bayesian decision maker, regardless of how different their raw signal spaces may look.
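
A small sketch with made-up matrices makes the point: relabeling signals, or splitting one signal into two equally likely duplicates, changes the signal space but leaves the induced posterior distribution intact.

```{code-cell} ipython3
import numpy as np

def posterior_distribution(exp, prior):
    """Distribution of the posterior probability of s1: {posterior: mass}."""
    probs = exp.T @ prior
    post = (exp.T * prior) / probs[:, None]
    dist = {}
    for q, w in zip(np.round(post[:, 0], 10), probs):
        dist[q] = dist.get(q, 0.0) + w
    return dist

prior = np.array([0.5, 0.5])

μ_a = np.array([[0.8, 0.2],
                [0.2, 0.8]])
μ_b = μ_a[:, ::-1]                 # same experiment, signals relabeled
μ_c = np.array([[0.4, 0.4, 0.2],   # first signal of μ_a split in two
                [0.1, 0.1, 0.8]])

for exp in (μ_a, μ_b, μ_c):
    print(posterior_distribution(exp, prior))
```

All three print the same two-point distribution over posteriors, so every Bayesian decision maker treats them identically.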
+
+### Mean-preserving spreads and Blackwell's order
+
+Kihlstrom's key reformulation is the following.
+
+```{prf:theorem} Kihlstrom's Reformulation
+:label: thm-kihlstrom
+
+$\mu \geq \nu$ in Blackwell's sense if and only if $\hat\mu^c$ is a
+**mean-preserving spread** of $\hat\nu^c$; that is,
+
+$$
+\int_P g(p)\,\hat\mu^c(dp) \;\geq\; \int_P g(p)\,\hat\nu^c(dp)
+$$
+
+for every convex function $g: P \to \mathbb{R}$.
+```
+
+Equivalently, $\hat\mu^c$ is larger than $\hat\nu^c$ in convex order.
+
+A better experiment spreads posterior beliefs farther from the prior while preserving their mean.
+
+To see this concretely, we define two experiments for the two-state case and compute their posteriors.
+
+```{code-cell} ipython3
+def compute_posteriors(μ, prior, tol=1e-14):
+ """
+ Compute the posterior distribution for each signal realisation.
+ """
+ N, M = μ.shape
+ signal_probs = μ.T @ prior
+ numerators = μ.T * prior
+ posteriors = np.zeros((M, N))
+ np.divide(
+ numerators,
+ signal_probs[:, None],
+ out=posteriors,
+ where=signal_probs[:, None] > tol,
+ )
+ return posteriors, signal_probs
+
+
+def check_mean_preservation(posteriors, signal_probs, prior):
+ """Verify E[posterior] == prior."""
+ expected_posterior = (posteriors * signal_probs[:, None]).sum(axis=0)
+ return expected_posterior, np.allclose(expected_posterior, prior)
+
+
+N = 2
+prior = np.array([0.5, 0.5])
+
+μ_info = np.array([[0.8, 0.2],
+ [0.2, 0.8]])
+
+ν_info = np.array([[0.6, 0.4],
+ [0.4, 0.6]])
+
+post_μ, probs_μ = compute_posteriors(μ_info, prior)
+post_ν, probs_ν = compute_posteriors(ν_info, prior)
+
+print("Experiment μ (more informative):\n")
+print("Signal probabilities:", probs_μ.round(3))
+print("Posteriors (row = signal, col = state):")
+print(post_μ.round(3))
+mean_μ, ok_μ = check_mean_preservation(post_μ, probs_μ, prior)
+print(f"E[posterior] = {mean_μ.round(4)} (equals prior: {ok_μ})")
+
+print("\n Experiment ν (less informative):\n")
+print("Signal probabilities:", probs_ν.round(3))
+print("Posteriors:")
+print(post_ν.round(3))
+mean_ν, ok_ν = check_mean_preservation(post_ν, probs_ν, prior)
+print(f"E[posterior] = {mean_ν.round(4)} (equals prior: {ok_ν})")
+```
+
+For $N = 2$ states, the simplex $P$ is the unit interval $[0, 1]$ (the probability
+of state $s_1$).
+
+We can directly plot the distribution of posteriors under
+experiments $\mu$ and $\nu$.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Posterior distributions in the two-state case
+ name: fig-blackwell-two-state-posteriors
+---
+def plot_posterior_distributions(μ_matrix, ν_matrix, prior,
+ labels=("μ (more informative)",
+ "ν (less informative)")):
+ """
+ For a two-state experiment, plot the distribution of posteriors
+ (i.e., the standard experiment distribution) on [0,1].
+ """
+ posts_μ, probs_μ = compute_posteriors(μ_matrix, prior)
+ posts_ν, probs_ν = compute_posteriors(ν_matrix, prior)
+
+ fig, axes = plt.subplots(1, 2, figsize=(11, 4), sharey=False)
+ prior_val = prior[0]
+
+ for ax, posts, probs, label in zip(
+ axes, [posts_μ, posts_ν], [probs_μ, probs_ν], labels):
+ p_s1 = posts[:, 0]
+ ax.vlines(p_s1, 0, probs, linewidth=6, color="steelblue", alpha=0.7)
+ ax.axvline(prior_val, color="tomato", linestyle="--", linewidth=2,
+ label=f"prior = {prior_val:.2f}")
+ ax.set_xlim(0, 1)
+ ax.set_xlabel(r"posterior $p(s_1 \mid x)$", fontsize=12)
+ ax.set_ylabel("probability mass", fontsize=12)
+ mean_post = (p_s1 * probs).sum()
+ ax.axvline(mean_post, color="green", linestyle=":", linewidth=2,
+ label=f"E[post] = {mean_post:.2f}")
+ ax.text(0.03, 0.94, label, transform=ax.transAxes, va="top")
+ ax.legend()
+
+ plt.tight_layout()
+ plt.show()
+
+plot_posterior_distributions(μ_info, ν_info, prior)
+```
+
+This is the mean-preserving spread in action: both distributions have the same mean (equal to the prior), but the more informative experiment $\mu$ spreads its posteriors farther apart.
+
+We can verify the mean-preserving spread condition numerically.
+
+The key fact is that, up to an affine term, any convex function can be represented as a mixture of
+"call option" payoffs $g_t(p) = \max(p - t, 0)$.
+
+Because the two posterior distributions being compared have the same mean, that affine term cancels in the comparison.
+
+So it suffices to check $E[g_t(p^\mu)] \geq E[g_t(p^\nu)]$ for all
+thresholds $t \in [0, 1]$.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Convex-order check in the two-state case
+ name: fig-blackwell-convex-order-check
+---
+def check_mps_convex_functions(μ_matrix, ν_matrix, prior, n_functions=200):
+ """
+ Verify the mean-preserving spread condition using
+ convex functions g(p) = max(p - t, 0).
+ """
+ posts_μ, probs_μ = compute_posteriors(μ_matrix, prior)
+ posts_ν, probs_ν = compute_posteriors(ν_matrix, prior)
+
+ p_μ = posts_μ[:, 0]
+ p_ν = posts_ν[:, 0]
+
+ thresholds = np.linspace(0, 1, n_functions)
+ diffs = []
+ for t in thresholds:
+ Eg_μ = (np.maximum(p_μ - t, 0) * probs_μ).sum()
+ Eg_ν = (np.maximum(p_ν - t, 0) * probs_ν).sum()
+ diffs.append(Eg_μ - Eg_ν)
+
+ fig, ax = plt.subplots(figsize=(8, 4))
+ ax.plot(thresholds, diffs, color="steelblue", linewidth=2)
+ ax.axhline(0, color="tomato", linestyle="--", linewidth=2)
+ ax.fill_between(thresholds, diffs, 0,
+ where=np.array(diffs) >= 0,
+ alpha=0.25, color="steelblue",
+ label="$E[g(p^μ)] - E[g(p^ν)] \\geq 0$")
+ ax.set_xlabel("threshold $t$", fontsize=12)
+ ax.set_ylabel(r"$E[\max(p-t,0)]$ difference", fontsize=12)
+ ax.legend(fontsize=11)
+ plt.tight_layout()
+ plt.show()
+
+ all_non_negative = all(d >= -1e-10 for d in diffs)
+ print(f"μ is a mean-preserving spread of ν: {all_non_negative}")
+ return diffs
+
+_ = check_mps_convex_functions(μ_info, ν_info, prior)
+```
+
+The difference $E[g_t(p^\mu)] - E[g_t(p^\nu)]$ is non-negative for every threshold $t$, confirming that $\hat\mu^c$ is a mean-preserving spread of $\hat\nu^c$ and therefore $\mu \geq \nu$ in the Blackwell order.
+
+## Simulating the Blackwell order with many states
+
+We now move to a three-state example.
+
+Experiment $\mu$ is strongly correlated with the state, and experiment $\nu$ is a garbling of $\mu$.
+
+```{code-cell} ipython3
+N3 = 3
+prior3 = np.array([1/3, 1/3, 1/3])
+
+μ3 = np.array([[0.7, 0.2, 0.1],
+ [0.1, 0.7, 0.2],
+ [0.2, 0.1, 0.7]])
+
+Q3 = np.array([[0.9, 0.05, 0.05],
+ [0.05, 0.8, 0.15],
+ [0.05, 0.15, 0.8]])
+
+ν3 = μ3 @ Q3
+
+print("μ (3×3):")
+print(np.round(μ3, 2))
+print("\nQ (garbling):")
+print(np.round(Q3, 2))
+print("\nν = μ @ Q:")
+print(np.round(ν3, 3))
+```
+
+
+For three states, posterior beliefs live in a 2-simplex.
+
+Let's visualize sampled posterior points under $\mu$ and $\nu$.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Sampled posterior points on the 2-simplex
+ name: fig-blackwell-simplex-clouds
+---
+def sample_posteriors(μ_matrix, prior, n_draws=3000):
+ """
+ Simulate n_draws observations from the experiment and compute
+ the resulting posterior beliefs.
+ Returns array of shape (n_draws, N).
+ """
+ N, M = μ_matrix.shape
+ states = np.random.choice(N, size=n_draws, p=prior)
+ signals = np.array([np.random.choice(M, p=μ_matrix[s]) for s in states])
+ posteriors, _ = compute_posteriors(μ_matrix, prior)
+ return posteriors[signals]
+
+
+def simplex_to_cart(pts):
+ """Convert 3-simplex barycentric coordinates to 2-D Cartesian."""
+ corners = np.array([[0.0, 0.0],
+ [1.0, 0.0],
+ [0.5, np.sqrt(3)/2]])
+ return pts @ corners
+
+
+def plot_simplex_posteriors(μ_matrix, ν_matrix, prior3, n_draws=3000):
+ posts_μ = sample_posteriors(μ_matrix, prior3, n_draws)
+ posts_ν = sample_posteriors(ν_matrix, prior3, n_draws)
+
+ cart_μ = simplex_to_cart(posts_μ)
+ cart_ν = simplex_to_cart(posts_ν)
+ prior_cart = simplex_to_cart(prior3[None, :])[0]
+
+ corners = np.array([[0.0, 0.0],
+ [1.0, 0.0],
+ [0.5, np.sqrt(3)/2]])
+
+ fig, axes = plt.subplots(1, 2, figsize=(12, 5))
+ panel_labels = ["μ (more informative)", "ν (garbled)"]
+ data = [(cart_μ, "steelblue"), (cart_ν, "darkorange")]
+ labels = ["$s_1$", "$s_2$", "$s_3$"]
+ offsets = [(-0.07, -0.05), (0.02, -0.05), (-0.02, 0.03)]
+
+ for ax, (cart, c), panel_label in zip(axes, data, panel_labels):
+ tri = plt.Polygon(corners, fill=False, edgecolor="black", linewidth=2)
+ ax.add_patch(tri)
+ ax.scatter(cart[:, 0], cart[:, 1], s=4, alpha=0.25, color=c)
+ ax.scatter(*prior_cart, s=120, color="red", zorder=5,
+ label="prior", marker="*")
+ for i, (lbl, off) in enumerate(zip(labels, offsets)):
+ ax.text(corners[i][0] + off[0], corners[i][1] + off[1],
+ lbl, fontsize=13)
+ ax.set_xlim(-0.15, 1.15)
+ ax.set_ylim(-0.1, np.sqrt(3)/2 + 0.1)
+ ax.set_aspect("equal")
+ ax.set_xticks([])
+ ax.set_yticks([])
+ ax.text(0.03, 0.94, panel_label, transform=ax.transAxes, va="top")
+ ax.legend(fontsize=11, loc="upper right")
+
+ plt.tight_layout()
+ plt.show()
+
+plot_simplex_posteriors(μ3, ν3, prior3)
+```
+
+Because this example has only three signals, each panel consists of three posterior atoms sampled repeatedly rather than a continuous cloud.
+
+Under $\mu$, the sampled posterior points reach farther toward the vertices.
+
+Under the garbled experiment $\nu$, the sampled posterior points stay closer to the center.
+
+## The DeGroot uncertainty function
+
+### Concave uncertainty functions and the value of information
+
+{cite}`degroot1962` formalizes the value of information through an **uncertainty function** $U: P \to \mathbb{R}$.
+
+In DeGroot's axiomatization, an uncertainty function is:
+
+- *Concave*: by Jensen, observing any signal weakly reduces expected uncertainty.
+- *Symmetric*: it depends on the components of $p$, not their labeling.
+- *Normalized*: it is maximized at $p = (1/N, \ldots, 1/N)$ and minimized at vertices.
+
+The *value of experiment $\mu$ given prior $p$* is
+
+$$
+I(\tilde{x}_\mu;\, \tilde{s};\, U)
+= U(p) - \mathbb{E}[U(p^\mu)].
+$$
+
+This quantity is the expected reduction in uncertainty.
+
+Blackwell's order is equivalent to the statement that $I(\tilde{x}_\mu; \tilde{s}; U) \geq I(\tilde{x}_\nu; \tilde{s}; U)$ for *every* concave $U$.
+
+### Shannon entropy as a special case
+
+The canonical uncertainty function is Shannon entropy
+
+$$
+U_H(p) = -\sum_{i=1}^{N} p_i \log p_i.
+$$
+
+Under the uniform prior $c = (1/N, \ldots, 1/N)$, DeGroot's value formula becomes
+
+$$
+I(\tilde{x}_\mu, c;\, U_H)
+= \log N - H(\tilde{s} \mid \tilde{x}_\mu),
+$$
+
+where $H(\tilde{s} \mid \tilde{x}_\mu)$ is the conditional entropy of the state given the signal.
+
+To see why, write $H(\tilde{s} \mid \tilde{x}_\mu) = \sum_x \Pr(\tilde{x}_\mu = x) \, H(\tilde{s} \mid \tilde{x}_\mu = x)$, where each conditional entropy term equals $-\sum_i p_i^\mu(x) \log p_i^\mu(x) = U_H(p^\mu(x))$.
+
+Substituting into DeGroot's formula gives $I = U_H(c) - \mathbb{E}[U_H(p^\mu)] = \log N - H(\tilde{s} \mid \tilde{x}_\mu)$, which is exactly the *mutual information* between $\tilde{x}_\mu$ and $\tilde{s}$.
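+The derivation can be checked numerically. The block below is a minimal, self-contained sketch that re-enters the two-state experiment $\mu$ from above (rows $(0.8, 0.2)$ and $(0.2, 0.8)$, uniform prior) and confirms that DeGroot's value under Shannon entropy equals $\log N - H(\tilde{s} \mid \tilde{x}_\mu)$.
+
+```{code-cell} ipython3
+import numpy as np
+
+μ = np.array([[0.8, 0.2],
+              [0.2, 0.8]])
+prior = np.array([0.5, 0.5])
+
+signal_probs = μ.T @ prior                          # P(x)
+posteriors = (μ.T * prior) / signal_probs[:, None]  # P(s | x), one row per signal
+
+def H(p):
+    """Shannon entropy in nats."""
+    p = p[p > 0]
+    return -np.sum(p * np.log(p))
+
+# Conditional entropy H(s | x) = Σ_x P(x) H(posterior given x)
+H_cond = sum(q * H(p) for q, p in zip(signal_probs, posteriors))
+
+I_degroot = H(prior) - H_cond    # U(prior) - E[U(posterior)]
+I_mutual = np.log(2) - H_cond    # log N - H(s | x)
+
+print(I_degroot, I_mutual)
+assert np.isclose(I_degroot, I_mutual)
+```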
+
+```{note}
+The Blackwell ordering implies the entropy-based inequality, but the *converse fails*: entropy alone does not pin down the full Blackwell ordering.
+
+Two experiments can have the same mutual information yet differ in Blackwell rank, because a single concave function cannot detect all differences in the dispersion of posteriors.
+
+The full Blackwell ordering requires the inequality to hold for *every* concave $U$, not just Shannon entropy.
+```
+
+```{code-cell} ipython3
+def entropy(p, ε=1e-12):
+ """Shannon entropy of a probability vector."""
+ p = np.asarray(p, dtype=float)
+ p = np.clip(p, ε, 1.0)
+ return -np.sum(p * np.log(p))
+
+
+def degroot_value(μ_matrix, prior, U_func):
+ """
+ Compute DeGroot's value of information I = U(prior) - E[U(posterior)].
+ """
+ posts, probs = compute_posteriors(μ_matrix, prior)
+ prior_uncertainty = U_func(prior)
+ expected_post_uncertainty = sum(
+ probs[j] * U_func(posts[j]) for j in range(len(probs)))
+ return prior_uncertainty - expected_post_uncertainty
+
+
+def gini_impurity(p):
+ """Gini impurity: 1 - sum(p_i^2)."""
+ return 1.0 - np.sum(np.asarray(p)**2)
+
+
+def tsallis_entropy(p, q=2):
+ """Tsallis entropy of order q (concave for q>1)."""
+ p = np.clip(p, 1e-12, 1.0)
+ return (1 - np.sum(p**q)) / (q - 1)
+
+
+def tsallis_q15(p):
+ """Tsallis entropy with q=1.5 for an independent concavity check."""
+ return tsallis_entropy(p, q=1.5)
+
+
+def sqrt_index(p):
+ """Concave uncertainty index based on sum(sqrt(p_i))."""
+ p = np.clip(np.asarray(p), 0.0, 1.0)
+ return np.sum(np.sqrt(p)) - 1.0
+
+uncertainty_functions = {
+ "Shannon entropy": entropy,
+ "Gini impurity": gini_impurity,
+ "Tsallis (q=1.5)": tsallis_q15,
+ "Square-root index": sqrt_index,
+}
+
+header = (f"{'Uncertainty function':<22} "
+ f"{'I(μ)':<10} {'I(ν)':<10} "
+ f"{'I(μ)>=I(ν)?'}")
+print(header)
+print("-" * 58)
+for name, U in uncertainty_functions.items():
+ I_μ = degroot_value(μ_info, prior, U)
+ I_ν = degroot_value(ν_info, prior, U)
+ print(f"{name:<22} {I_μ:<10.4f} {I_ν:<10.4f} {I_μ >= I_ν - 1e-10}")
+```
+
+As predicted by the theorem, $I(\mu) \geq I(\nu)$ for every concave uncertainty function once we know $\mu \geq \nu$ in the Blackwell sense.
+
+### Value of information as a function of experiment quality
+
+We now parameterize a continuum of experiments between the uninformative and perfectly informative cases.
+
+For $N = 2$ states, a natural family is
+
+$$
+\mu(\theta) = (1 - \theta) \cdot \tfrac{1}{2}\mathbf{1}\mathbf{1}^\top
+ + \theta \cdot I_2,
+\quad \theta \in [0, 1].
+$$
+
+The first term is the completely uninformative (uniform) experiment and $I_2$ is the identity, which reveals the state perfectly.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Value of information and experiment quality
+ name: fig-blackwell-value-by-quality
+---
+def make_experiment(θ, N=2):
+ """Parameterized experiment: θ=0 is uninformative, θ=1 is perfect."""
+ return (1 - θ) * np.ones((N, N)) / N + θ * np.eye(N)
+
+
+θs = np.linspace(0, 1, 100)
+prior2 = np.array([0.5, 0.5])
+
+fig, ax = plt.subplots(figsize=(9, 4))
+for name, U in uncertainty_functions.items():
+ values = [degroot_value(make_experiment(θ), prior2, U) for θ in θs]
+ vmin, vmax = values[0], values[-1]
+ normed = (np.array(values) - vmin) / (vmax - vmin + 1e-15)
+ ax.plot(θs, normed, label=name, linewidth=2)
+
+ax.set_xlabel("experiment quality θ (0 = uninformative, 1 = perfect)",
+ fontsize=11)
+ax.set_ylabel("normalized value of information I(μ(θ))", fontsize=11)
+ax.legend(fontsize=10)
+plt.tight_layout()
+plt.show()
+```
+
+Every concave uncertainty function assigns weakly higher value to a more informative experiment.
+
+## Connection to second-order stochastic dominance
+
+A random variable $X$ **second-order stochastically dominates**
+$Y$ (written $X \succeq_{\text{SOSD}} Y$) if
+$E[u(X)] \geq E[u(Y)]$ for every concave function $u$.
+Equivalently, $Y$ is a mean-preserving spread of $X$.
+
+The uncertainty-function representation makes the connection
+to SOSD explicit.
+
+Because $U$ is concave, $-U$ is convex, and the condition
+
+$$
+\mathbb{E}[U(p^\mu)] \leq \mathbb{E}[U(p^\nu)] \quad \text{for all concave } U
+$$
+
+is precisely the statement that $\hat\mu^c$ dominates $\hat\nu^c$ in convex order on $P$.
+
+When $N = 2$, posterior beliefs are scalars in $[0, 1]$, and the SOSD comparison reduces to the classical integrated-CDF test.
+
+Specifically, $\hat\mu^c$ is a mean-preserving spread of $\hat\nu^c$ if and only if $\int_0^t F_\mu(s)\,ds \geq \int_0^t F_\nu(s)\,ds$ for all $t \in [0,1]$, where $F_\mu$ and $F_\nu$ are the CDFs of the posterior on $s_1$ under each experiment.
+
+Equivalently, in SOSD language, the posterior distribution under the less informative experiment $\nu$ second-order stochastically dominates the more dispersed posterior distribution under $\mu$.
+
+We can verify this graphically for the two-state example above.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Integrated-CDF check in the two-state case
+ name: fig-blackwell-integrated-cdf
+---
+def cdf_data_1d(weights, values):
+ """Sort support points and cumulative masses for a discrete distribution."""
+ idx = np.argsort(values)
+ sorted_vals = values[idx]
+ sorted_wts = weights[idx]
+ cum_mass = np.cumsum(sorted_wts)
+ return sorted_vals, cum_mass
+
+
+def plot_sosd_posteriors(μ_matrix, ν_matrix, prior):
+ """Plot CDFs and integrated CDFs for the posterior-on-s1 distributions."""
+ posts_μ, probs_μ = compute_posteriors(μ_matrix, prior)
+ posts_ν, probs_ν = compute_posteriors(ν_matrix, prior)
+
+ p_μ = posts_μ[:, 0]
+ p_ν = posts_ν[:, 0]
+
+ sv_μ, cm_μ = cdf_data_1d(probs_μ, p_μ)
+ sv_ν, cm_ν = cdf_data_1d(probs_ν, p_ν)
+
+ fig, axes = plt.subplots(1, 2, figsize=(11, 4))
+
+ ax = axes[0]
+ for sv, cm, lbl, c in [(sv_μ, cm_μ, "μ", "steelblue"),
+ (sv_ν, cm_ν, "ν", "darkorange")]:
+ xs = np.concatenate([[0], sv, [1]])
+ ys = np.concatenate([[0], cm, [1]])
+ ax.step(xs, ys, where="post", label=lbl, color=c, linewidth=2)
+ ax.axvline(prior[0], linestyle="--", color="gray", alpha=0.6, linewidth=2,
+ label="prior")
+ ax.set_xlabel(r"posterior $p(s_1 \mid x)$", fontsize=12)
+ ax.set_ylabel("cumulative probability", fontsize=12)
+ ax.text(0.03, 0.94, "CDFs", transform=ax.transAxes, va="top")
+ ax.legend(fontsize=11)
+
+ ax2 = axes[1]
+ grid = np.linspace(0, 1, 200)
+
+ def integrated_cdf(sorted_vals, cum_mass, grid):
+ cdf = np.array([cum_mass[sorted_vals <= t].max()
+ if np.any(sorted_vals <= t) else 0.0
+ for t in grid])
+ return np.cumsum(cdf) * (grid[1] - grid[0])
+
+ int_μ = integrated_cdf(sv_μ, cm_μ, grid)
+ int_ν = integrated_cdf(sv_ν, cm_ν, grid)
+
+ ax2.plot(grid, int_μ, label=r"$\int F_\mu$", color="steelblue", linewidth=2)
+ ax2.plot(grid, int_ν, color="darkorange",
+ label=r"$\int F_\nu$", linewidth=2)
+ ax2.fill_between(grid, int_ν, int_μ,
+ where=int_μ >= int_ν,
+ alpha=0.2, color="steelblue",
+ label=(r"$\int F_\mu \geq \int F_\nu$"
+ r" ($\mu$ is an MPS of $\nu$)"))
+ ax2.set_xlabel(r"$t$", fontsize=12)
+ ax2.set_ylabel("integrated CDF", fontsize=12)
+ ax2.text(0.03, 0.94, "integrated CDFs", transform=ax2.transAxes, va="top")
+ ax2.legend(fontsize=10)
+
+ plt.tight_layout()
+ plt.show()
+
+plot_sosd_posteriors(μ_info, ν_info, prior)
+```
+
+## Application 1: product quality information
+
+{cite:t}`kihlstrom1974a` applies Blackwell's theorem to consumer demand for information about product quality.
+
+- The unknown state $\tilde{s}$ is a product quality parameter.
+- A consumer can purchase $\lambda$ units of information at cost $c(\lambda)$.
+- As $\lambda$ rises, the experiment becomes more informative in the Blackwell sense.
+
+The Blackwell order says that, absent costs, more information is always better for every expected-utility maximizer.
+
+With costs, the consumer chooses quality investment $\theta$ to maximize *net value*.
+
+If quality investment translates into experiment accuracy with diminishing returns — say, accuracy $\phi(\theta) = 1 - e^{-a\theta}$ for a rate parameter $a$ — then the marginal value of information eventually decreases in $\theta$.
+
+With a convex cost $c(\theta) = c \, \theta^2$, the increasing marginal cost eventually overtakes the declining marginal value, producing an interior optimum.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Information demand with a quadratic cost
+ name: fig-blackwell-information-demand
+---
+def gross_value(θ, prior2, U=entropy, rate=2):
+ """Gross value of quality investment θ (diminishing returns)."""
+ accuracy = 1 - np.exp(-rate * θ)
+ μ_t = (1 - accuracy) * np.ones((2, 2)) / 2 + accuracy * np.eye(2)
+ return degroot_value(μ_t, prior2, U)
+
+
+θ_fine = np.linspace(0, 1, 200)
+c = 0.6
+
+gross_vals = np.array([gross_value(θ, prior2) for θ in θ_fine])
+cost_vals = c * θ_fine**2
+net_vals = gross_vals - cost_vals
+marginal_vals = np.gradient(gross_vals, θ_fine)
+marginal_cost = 2 * c * θ_fine
+opt_idx = int(np.argmax(net_vals))
+
+fig, axes = plt.subplots(1, 2, figsize=(12, 4))
+
+ax = axes[0]
+ax.plot(θ_fine, gross_vals,
+ label="Gross value I(θ)",
+ color="steelblue", linewidth=2)
+ax.plot(θ_fine, cost_vals,
+ label=r"Cost $c\theta^2$",
+ color="tomato", linestyle="--", linewidth=2)
+ax.plot(θ_fine, net_vals,
+ label="Net value", color="green", linewidth=2)
+ax.axvline(θ_fine[opt_idx], color="green",
+ linestyle=":", linewidth=2,
+ label=f"θ* ≈ {θ_fine[opt_idx]:.2f}")
+ax.set_xlabel("quality investment θ", fontsize=11)
+ax.set_ylabel("value (entropy units)", fontsize=11)
+ax.legend(fontsize=10)
+
+ax2 = axes[1]
+ax2.plot(θ_fine, marginal_vals,
+ label="Marginal value I'(θ)",
+ color="steelblue", linewidth=2)
+ax2.plot(θ_fine, marginal_cost,
+ label=r"Marginal cost $2c\theta$",
+ color="tomato", linestyle="--", linewidth=2)
+ax2.axvline(θ_fine[opt_idx], color="green",
+ linestyle=":", linewidth=2,
+ label=f"θ* ≈ {θ_fine[opt_idx]:.2f}")
+ax2.set_xlabel("quality investment θ", fontsize=11)
+ax2.set_ylabel("marginal value / cost", fontsize=11)
+ax2.legend(fontsize=10)
+
+plt.tight_layout()
+plt.show()
+```
+
+The optimal investment $\theta^*$ occurs where marginal value equals marginal cost.
+
+Because experiment accuracy has diminishing returns in $\theta$, the marginal value of investment eventually falls below the rising marginal cost, yielding a genuine interior optimum.
+
+Raising $c$ shifts the marginal cost curve up and reduces $\theta^*$, while a more asymmetric prior shifts the marginal value curve and changes the optimum.
+
+## Application 2: sequential experimental design
+
+{cite:t}`degroot1962` applies the uncertainty-function framework to *sequential experimental design*.
+
+Each period a statistician observes one draw and updates the posterior.
+
+The question is which sequence of experiments minimizes cumulative expected uncertainty.
+
+If one experiment is more informative than another at every stage, then the Blackwell order favors using the better experiment at every date.
+
+We now simulate sequential belief updating for experiments of different quality.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Sequential posterior paths for different experiment qualities
+ name: fig-blackwell-sequential-paths
+---
+def sequential_update(μ_matrix, prior, T=20, seed=0):
+ """Simulate T sequential belief updates under experiment μ."""
+ rng = np.random.default_rng(seed)
+ N, M = μ_matrix.shape
+ beliefs = np.zeros((T + 1, N))
+ beliefs[0] = prior.copy()
+
+ true_state = rng.choice(N, p=prior)
+
+ for t in range(T):
+ p = beliefs[t]
+ signal = rng.choice(M, p=μ_matrix[true_state])
+ unnorm = μ_matrix[:, signal] * p
+ beliefs[t + 1] = unnorm / unnorm.sum()
+
+ return beliefs, true_state
+
+
+def plot_sequential_beliefs(θs_compare, prior2, T=25):
+ fig, axes = plt.subplots(1, len(θs_compare), figsize=(14, 4), sharey=True)
+
+ for ax, θ in zip(axes, θs_compare):
+ μ_t = make_experiment(θ, N=2)
+ for seed in range(15):
+ beliefs, ts = sequential_update(μ_t, prior2, T=T, seed=seed)
+ c = "steelblue" if ts == 0 else "darkorange"
+ ax.plot(beliefs[:, 0], alpha=0.35, color=c, linewidth=2)
+ ax.axhline(prior2[0], linestyle="--", color="gray", linewidth=2,
+ label="prior")
+ ax.axhline(1.0, linestyle=":", color="steelblue", linewidth=2)
+ ax.axhline(0.0, linestyle=":", color="darkorange", linewidth=2)
+ ax.set_xlabel(r"period $t$", fontsize=11)
+ if θ == θs_compare[0]:
+ ax.set_ylabel(r"posterior $p(s_1 \mid x^t)$", fontsize=11)
+ ax.set_ylim(-0.05, 1.05)
+ ax.text(0.03, 0.94, f"θ = {θ}", transform=ax.transAxes, va="top")
+ ax.legend(fontsize=9)
+
+ plt.tight_layout()
+ plt.show()
+
+plot_sequential_beliefs([0.2, 0.5, 0.9], prior2, T=30)
+```
+
+More informative experiments make beliefs converge faster to the truth.
+
+Under the correct prior, the posterior process is a martingale.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Unconditional implication of the posterior martingale property
+ name: fig-blackwell-martingale-mean
+---
+def check_martingale_mean(μ_matrix, prior, T=15, n_paths=2000, seed=0):
+ """
+ Simulate many belief paths and check E[p_t] = p_0.
+ """
+ rng = np.random.default_rng(seed)
+ N, M = μ_matrix.shape
+ all_paths = np.zeros((n_paths, T + 1, N))
+
+ for k in range(n_paths):
+ true_state = rng.choice(N, p=prior)
+ p = prior.copy()
+ all_paths[k, 0] = p
+ for t in range(T):
+ signal = rng.choice(M, p=μ_matrix[true_state])
+ unnorm = μ_matrix[:, signal] * p
+ p = unnorm / unnorm.sum()
+ all_paths[k, t + 1] = p
+
+ mean_path = all_paths[:, :, 0].mean(axis=0)
+
+ fig, ax = plt.subplots(figsize=(8, 4))
+ ax.plot(mean_path, color="steelblue", linewidth=2,
+ label=r"$\bar p_t(s_1)$ (mean over paths)")
+ ax.axhline(prior[0], linestyle="--", color="tomato", linewidth=2,
+ label=fr"Prior $p_0 = {prior[0]:.2f}$")
+ ax.set_xlabel(r"period $t$", fontsize=12)
+ ax.set_ylabel(r"$E[p_t(s_1)]$", fontsize=12)
+ ax.legend(fontsize=11)
+ ax.set_ylim(0, 1)
+ plt.tight_layout()
+ plt.show()
+
+ print(f"Prior = {prior[0]:.4f}")
+ print(f"Average mean belief across dates: {mean_path.mean():.4f}")
+
+check_martingale_mean(μ_info, prior, T=20, n_paths=5000)
+```
+
+The simulated cross-sectional mean stays close to the prior at every date.
+
+This is the unconditional implication of the posterior martingale property.
+
+## Summary
+
+Blackwell's theorem identifies a *partial order* on statistical experiments with
+three equivalent characterizations:
+
+| Criterion | Condition |
+|-----------|-----------|
+| Economic | Every decision maker weakly prefers $\mu$ to $\nu$: $B(\mu, A, u) \supseteq B(\nu, A, u)$ |
+| Sufficiency | $\nu$ is a garbling of $\mu$: $\nu = \mu Q$ for some Markov $Q$ |
+| Uncertainty | $\mu$ reduces expected uncertainty more for every prior $p$ and every concave $U$ |
+
+Kihlstrom's Bayesian exposition places the *posterior distribution* at the center.
+
+A more informative experiment generates a more dispersed posterior distribution with the same mean, namely the prior.
+
+The right probabilistic language is convex order on the simplex of posterior beliefs.
+
+In the two-state case this reduces to the familiar SOSD / integrated-CDF test on $[0, 1]$.
+
+DeGroot's contribution is to extend the comparison from particular utility functions to the full class of concave uncertainty functions.
+
+
+## The Data Processing Inequality and Coarse-Graining
+
+Blackwell's condition that $\nu = \mu Q$ for some Markov kernel $Q$ is the same mathematical operation that underlies the **data processing inequality** (DPI) and the **coarse-graining theorem** in information theory, information geometry, and machine learning.
+
+### The DPI for f-divergences
+
+An **f-divergence** between two probability distributions $P$ and $Q$ over a finite space $\Omega$ is
+
+$$
+D_f(P \| Q) = \sum_{\omega \in \Omega} q_\omega \, f\!\left(\frac{p_\omega}{q_\omega}\right),
+$$
+
+where $f : (0,\infty) \to \mathbb{R}$ is a convex function with $f(1) = 0$.
+
+Special cases include:
+
+| Divergence | Generator $f(t)$ |
+|:---|:---|
+| KL-divergence | $t \log t$ |
+| Squared Hellinger $H^2$ | $(\sqrt{t} - 1)^2 / 2$ |
+| Total variation TV | $\lvert t - 1 \rvert / 2$ |
+| Chi-squared $\chi^2$ | $(t-1)^2$ |
+
+The class of f-divergences was introduced independently by {cite:t}`ali1966`, {cite:t}`csiszar1963`, and {cite:t}`morimoto1963`; see also {cite:t}`liese2012`.
+
+```{prf:theorem} Data Processing Inequality
+:label: thm-data-processing
+
+For any f-divergence $D_f$ and any Markov kernel (stochastic transformation)
+$\kappa$, with $P \kappa$ denoting the image of $P$ under $\kappa$, we have
+
+$$
+D_f(P \| Q) \geq D_f(P\kappa \| Q\kappa).
+$$
+
+If $\kappa$ is induced by a sufficient statistic for the pair $\{P, Q\}$, then equality holds.
+
+A converse of this form requires additional hypotheses; a clean binary-model characterization is given below.
+```
+
+The proof follows from Jensen's inequality applied to the convex function $f$, using the fact that $\kappa$ is a stochastic matrix {cite}`csiszar1963`.
+
+### Connection to Blackwell's sufficiency condition
+
+In Blackwell's framework, $\mu$ and $\nu$ are experiments over the same state space $S = \{s_1, \ldots, s_N\}$.
+
+For two states, each experiment has two rows: $\mu_1 = \mu(s_1, \cdot)$ and $\mu_2 = \mu(s_2, \cdot)$.
+
+If $\nu = \mu Q$ (i.e., $\nu$ is a garbling of $\mu$), then the pair $(\nu_1, \nu_2) = (\mu_1 Q, \mu_2 Q)$ is obtained by applying the Markov kernel $Q$ to the pair $(\mu_1, \mu_2)$.
+
+The coarse-graining theorem then implies immediately:
+
+$$
+D_f(\mu_1 \| \mu_2) \geq D_f(\nu_1 \| \nu_2)
+\quad \text{for every f-divergence } D_f,
+$$
+
+whenever $\mu \geq \nu$ in the Blackwell order.
+
+So a more informative experiment always produces *more separated* conditional signal distributions, in the sense of every f-divergence simultaneously.
+
+The DPI is thus a statement about the *distinguishability* of states: garbling an experiment makes the states harder to tell apart under every statistical measure of separability.
+
+For binary experiments, the equality condition links the DPI directly back to Blackwell: $D_f(\mu_1 Q \| \mu_2 Q) = D_f(\mu_1 \| \mu_2)$ for some strictly convex $f$ if and only if $Q$ is a sufficient statistic for $(\mu_1, \mu_2)$.
+
+Once sufficiency holds, equality follows for every convex $f$ {cite}`liese2012`.
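+We can verify the DPI numerically for the three-state garbling example above. The block below is self-contained: it re-enters $\mu_3$ and $Q_3$, forms $\nu_3 = \mu_3 Q_3$, and checks that every f-divergence in the table contracts under garbling.
+
+```{code-cell} ipython3
+import numpy as np
+
+μ = np.array([[0.7, 0.2, 0.1],
+              [0.1, 0.7, 0.2],
+              [0.2, 0.1, 0.7]])
+Q = np.array([[0.9, 0.05, 0.05],
+              [0.05, 0.8, 0.15],
+              [0.05, 0.15, 0.8]])
+ν = μ @ Q
+
+def f_divergence(p, q, f):
+    """D_f(P || Q) = Σ_ω q_ω f(p_ω / q_ω)."""
+    return np.sum(q * f(p / q))
+
+generators = {
+    "KL":        lambda t: t * np.log(t),
+    "Hellinger": lambda t: (np.sqrt(t) - 1)**2 / 2,
+    "TV":        lambda t: np.abs(t - 1) / 2,
+    "chi2":      lambda t: (t - 1)**2,
+}
+
+results = {name: (f_divergence(μ[0], μ[1], f),
+                  f_divergence(ν[0], ν[1], f))
+           for name, f in generators.items()}
+
+for name, (d_μ, d_ν) in results.items():
+    print(f"{name:>9}: D_f(μ1||μ2) = {d_μ:.4f} >= D_f(ν1||ν2) = {d_ν:.4f}")
+    assert d_μ >= d_ν - 1e-12
+```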
+
+### Information geometry: Chentsov's theorem
+
+The DPI has an infinitesimal, differential-geometric companion.
+
+**Chentsov's theorem** {cite}`chentsov1981` states that the **Fisher information matrix** $I_F(\theta)$ is, up to a constant rescaling, the *unique* Riemannian metric on a statistical manifold that contracts under every Markov morphism (coarse-graining):
+
+$$
+I_F(\theta;\, \mu) \succeq I_F(\theta;\, \mu\kappa)
+\quad \text{for every differentiable family } \{\mu_\theta\} \text{ and every Markov kernel } \kappa.
+$$
+
+Equality holds if and only if $\kappa$ is a sufficient statistic for $\theta$.
+
+The uniqueness clause is deep: it says that the Fisher information is not merely *one* metric that happens to contract under coarse-graining, but the *only one* with that property.
+
+See {cite:t}`amari_nagaoka2000` for a thorough treatment of information geometry and its connections to sufficiency.
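+A finite-difference sketch illustrates the contraction; the one-parameter family and kernel below are hypothetical choices made for illustration. The Fisher information of a Bernoulli family shrinks when the signal is passed through a Markov kernel $\kappa$.
+
+```{code-cell} ipython3
+import numpy as np
+
+# Hypothetical Markov kernel κ acting on a binary signal
+κ = np.array([[0.9, 0.1],
+              [0.3, 0.7]])
+
+def fisher(p_func, θ, h=1e-6):
+    """Scalar Fisher information via a central difference."""
+    p = p_func(θ)
+    dp = (p_func(θ + h) - p_func(θ - h)) / (2 * h)
+    return np.sum(dp**2 / p)
+
+p = lambda θ: np.array([θ, 1 - θ])   # original family: Bernoulli(θ)
+q = lambda θ: p(θ) @ κ               # coarse-grained family
+
+θ = 0.3
+print(fisher(p, θ), fisher(q, θ))    # coarse-graining contracts Fisher information
+assert fisher(p, θ) >= fisher(q, θ)
+```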
+
+### The information bottleneck in machine learning
+
+The **information bottleneck** method of {cite:t}`tishby_pereira_bialek1999` provides a prominent application of the DPI in machine learning.
+
+Given a joint distribution $p(X, Y)$ over an input $X$ and a target $Y$, the goal is to find a compressed representation $T$, formed by a stochastic mapping $p(T \mid X)$, that retains as much information about $Y$ as possible while using as few bits as possible to describe $X$.
+
+The method minimizes the Lagrangian
+
+$$
+\mathcal{L}[p(T \mid X)] = I(X;\, T) - \beta \, I(T;\, Y),
+$$
+
+where $I(\cdot\,;\,\cdot)$ denotes mutual information and $\beta \geq 0$ governs the compression–relevance trade-off.
+
+Because $Y - X - T$ forms a Markov chain ($T$ is derived from $X$ alone), the DPI implies
+
+$$
+I(T;\, Y) \leq I(X;\, Y),
+$$
+
+with equality if and only if $T$ is a **sufficient statistic** for $Y$ given $X$.
+
+The Blackwell ordering explains why no deterministic or random post-processing of $X$ can increase the mutual information with $Y$: any Markov kernel applied to $X$ is a garbling in Blackwell's sense, and the DPI is the mutual-information form of the coarse-graining theorem.
+
+In machine learning language the information bottleneck searches among all garblings of $X$ for the one that best preserves relevant information about $Y$ subject to a compression budget.
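+A small simulation makes the inequality concrete. The joint distribution and garbling kernel below are random illustrative choices, not taken from the information-bottleneck paper; the point is only that any stochastic map $p(T \mid X)$ yields $I(T; Y) \leq I(X; Y)$.
+
+```{code-cell} ipython3
+import numpy as np
+
+def mutual_info(joint):
+    """I(A; B) in nats from a joint pmf (rows = A, cols = B)."""
+    pa = joint.sum(axis=1, keepdims=True)
+    pb = joint.sum(axis=0, keepdims=True)
+    mask = joint > 0
+    return np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask]))
+
+rng = np.random.default_rng(0)
+
+# Random joint p(X, Y) over 4 x-values and 3 y-values
+p_xy = rng.random((4, 3))
+p_xy /= p_xy.sum()
+
+# Random garbling p(T | X): each row a distribution over 5 t-values
+kernel = rng.random((4, 5))
+kernel /= kernel.sum(axis=1, keepdims=True)
+
+# Joint p(T, Y) induced by the Markov chain Y - X - T
+p_ty = kernel.T @ p_xy
+
+print(mutual_info(p_xy), mutual_info(p_ty))
+assert mutual_info(p_ty) <= mutual_info(p_xy) + 1e-12
+```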
+
+In a deep neural network with input $X$ and target $Y$ and layers $X \to T_1 \to T_2 \to \cdots \to T_L \to \hat{Y}$, each layer's representation is a garbling of the previous one.
+
+The DPI then implies the chain of inequalities
+
+$$
+I(X;\, Y) \geq I(T_1;\, Y) \geq I(T_2;\, Y) \geq \cdots \geq I(T_L;\, Y),
+$$
+
+so successive layers can only lose, never gain, information about $Y$.
+
+This observation was placed at the center of the study of what deep networks learn by {cite}`shwartz_ziv_tishby2017`.
+
+{numref}`fig-blackwell-value-by-quality` already illustrates this: as experiment quality $\theta$ increases, every measure of informativeness rises monotonically.
+
+The DPI says the same thing in reverse: garbling (decreasing $\theta$) can only contract these measures.
+
+### Summary of the DPI–Blackwell correspondence
+
+The table below collects the precise correspondence between Blackwell's framework and the data-processing and coarse-graining literature.
+
+| Blackwell / DeGroot | Data processing / coarse-graining |
+|:---|:---|
+| Garbling $\nu = \mu Q$ | Applying Markov kernel $\kappa$ to a pair $(P, Q) = (\mu_1, \mu_2)$ |
+| $\mu \geq \nu$ in Blackwell order | $D_f(\mu_1 \| \mu_2) \geq D_f(\nu_1 \| \nu_2)$ for every f-divergence |
+| Sufficiency ($Q$ discards nothing) | Equality in DPI; in binary models, one strictly convex $f$ already characterizes sufficiency |
+| DeGroot value $I(\mu; U_H)$ | Mutual information $I(\tilde{x}_\mu;\, \tilde{s})$ (Shannon DPI) |
+| Posterior spreads under $\mu$ vs $\nu$ | $D_f$ between rows larger under $\mu$ |
+| Blackwell theorem (economic $\Leftrightarrow$ garbling) | DPI for all $f$ $\Leftrightarrow$ single Markov kernel witnesses dominance |
+| Chentsov's uniqueness theorem | Fisher information is the unique coarse-graining-contracting metric |
+| Information bottleneck $I(T;Y) \leq I(X;Y)$ | DPI for mutual information applied to Markov chain $Y{-}X{-}T$ |
+
+
+## Relation to Bayesian likelihood-ratio learning
+
+The lecture {doc}`likelihood_bayes` is a dynamic two-state special case of the framework developed here.
+
+Let $S = \{s_1, s_2\}$ with $s_1 \leftrightarrow f$ and $s_2 \leftrightarrow g$, where $f$ and $g$ are the two candidate data-generating densities.
+
+Then a single observation is a Blackwell experiment with rows $f(\cdot)$ and $g(\cdot)$, and the history $w^t = (w_1, \ldots, w_t)$ defines a richer experiment $\mu_t$.
+
+Because one can always discard the last $t-s$ observations, $\mu_t$ Blackwell-dominates $\mu_s$ for every $t > s$.
+
+The likelihood-ratio process
+
+$$
+L(w^t) = \prod_{i=1}^t \frac{f(w_i)}{g(w_i)}
+$$
+
+is a sufficient statistic for $\mu_t$, and the posterior
+
+$$
+\pi_t = \Pr(s_1 \mid w^t)
+= \frac{\pi_0 L(w^t)}{\pi_0 L(w^t) + 1 - \pi_0}
+$$
+
+is Kihlstrom's standard experiment in this two-state setting.
+
+Its martingale property, $E[\pi_t] = \pi_0$, is exactly the mean-preservation result proved above for posterior distributions.
+
+Likewise, $\mu_t \geq \mu_s$ implies that the distribution of $\pi_t$ is a mean-preserving spread of the distribution of $\pi_s$, so additional data pushes beliefs farther toward $0$ and $1$ while lowering expected uncertainty under every concave uncertainty function.
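+The equivalence between the one-shot likelihood-ratio formula and recursive Bayes updating is easy to check. The densities $f$ and $g$ below are hypothetical discrete distributions chosen for illustration.
+
+```{code-cell} ipython3
+import numpy as np
+
+rng = np.random.default_rng(1)
+f = np.array([0.5, 0.3, 0.2])    # density under s1
+g = np.array([0.2, 0.3, 0.5])    # density under s2
+π0 = 0.4
+
+w = rng.choice(3, size=50, p=f)  # history w^t drawn under f
+
+# Recursive Bayes updating, one observation at a time
+π = π0
+for wi in w:
+    num = π * f[wi]
+    π = num / (num + (1 - π) * g[wi])
+
+# One-shot formula through the likelihood ratio L(w^t)
+L = np.prod(f[w] / g[w])
+π_formula = π0 * L / (π0 * L + 1 - π0)
+
+print(π, π_formula)
+assert np.isclose(π, π_formula)
+```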
+
+### Summary table
+
+The table below records the dictionary between the two lectures without repeating the earlier arguments.
+
+| Concept in {doc}`likelihood_bayes` | Concept in this lecture |
+|---|---|
+| States $\{f, g\}$ | State space $S = \{s_1, s_2\}$ |
+| Densities $f(\cdot)$, $g(\cdot)$ | Rows of experiment matrix $\mu$ |
+| Single draw $w_t$ | Blackwell experiment with continuous signal space |
+| History $w^t$ of $t$ IID draws | Richer experiment $\mu_t$ Blackwell-dominating $\mu_s$, $s < t$ |
+| Likelihood ratio $L(w^t)$ | Sufficient statistic for $\mu_t$ |
+| Prior $\pi_0$ | Prior $p \in P$ on the 1-simplex $[0,1]$ |
+| Posterior $\pi_t$ | Posterior on $P = [0,1]$ (Kihlstrom's standard experiment) |
+| Distribution of $\pi_t$ across histories | $\hat{\mu}^c$ (Kihlstrom's posterior distribution) |
+| Martingale property $E[\pi_t] = \pi_0$ | Mean preservation of $\hat{\mu}^c$ |
+| $\pi_t \to 0$ or $1$ almost surely | Posteriors spread to vertices (MPS in the limit) |
+| Mutual information $I(\mu_t; U_H)$ | DeGroot value of information |
+| More draws $\Rightarrow$ better for all decision makers | Blackwell ordering $\mu_t \geq \mu_s$ |
+| Garbling (discard last $t - s$ draws) | Stochastic transformation $Q$ with $\mu_s = \mu_t Q$ |
diff --git a/lectures/graph.txt b/lectures/graph.txt
deleted file mode 100644
index 9cb9e2e33..000000000
--- a/lectures/graph.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-node0, node1 0.04, node8 11.11, node14 72.21
-node1, node46 1247.25, node6 20.59, node13 64.94
-node2, node66 54.18, node31 166.80, node45 1561.45
-node3, node20 133.65, node6 2.06, node11 42.43
-node4, node75 3706.67, node5 0.73, node7 1.02
-node5, node45 1382.97, node7 3.33, node11 34.54
-node6, node31 63.17, node9 0.72, node10 13.10
-node7, node50 478.14, node9 3.15, node10 5.85
-node8, node69 577.91, node11 7.45, node12 3.18
-node9, node70 2454.28, node13 4.42, node20 16.53
-node10, node89 5352.79, node12 1.87, node16 25.16
-node11, node94 4961.32, node18 37.55, node20 65.08
-node12, node84 3914.62, node24 34.32, node28 170.04
-node13, node60 2135.95, node38 236.33, node40 475.33
-node14, node67 1878.96, node16 2.70, node24 38.65
-node15, node91 3597.11, node17 1.01, node18 2.57
-node16, node36 392.92, node19 3.49, node38 278.71
-node17, node76 783.29, node22 24.78, node23 26.45
-node18, node91 3363.17, node23 16.23, node28 55.84
-node19, node26 20.09, node20 0.24, node28 70.54
-node20, node98 3523.33, node24 9.81, node33 145.80
-node21, node56 626.04, node28 36.65, node31 27.06
-node22, node72 1447.22, node39 136.32, node40 124.22
-node23, node52 336.73, node26 2.66, node33 22.37
-node24, node66 875.19, node26 1.80, node28 14.25
-node25, node70 1343.63, node32 36.58, node35 45.55
-node26, node47 135.78, node27 0.01, node42 122.00
-node27, node65 480.55, node35 48.10, node43 246.24
-node28, node82 2538.18, node34 21.79, node36 15.52
-node29, node64 635.52, node32 4.22, node33 12.61
-node30, node98 2616.03, node33 5.61, node35 13.95
-node31, node98 3350.98, node36 20.44, node44 125.88
-node32, node97 2613.92, node34 3.33, node35 1.46
-node33, node81 1854.73, node41 3.23, node47 111.54
-node34, node73 1075.38, node42 51.52, node48 129.45
-node35, node52 17.57, node41 2.09, node50 78.81
-node36, node71 1171.60, node54 101.08, node57 260.46
-node37, node75 269.97, node38 0.36, node46 80.49
-node38, node93 2767.85, node40 1.79, node42 8.78
-node39, node50 39.88, node40 0.95, node41 1.34
-node40, node75 548.68, node47 28.57, node54 53.46
-node41, node53 18.23, node46 0.28, node54 162.24
-node42, node59 141.86, node47 10.08, node72 437.49
-node43, node98 2984.83, node54 95.06, node60 116.23
-node44, node91 807.39, node46 1.56, node47 2.14
-node45, node58 79.93, node47 3.68, node49 15.51
-node46, node52 22.68, node57 27.50, node67 65.48
-node47, node50 2.82, node56 49.31, node61 172.64
-node48, node99 2564.12, node59 34.52, node60 66.44
-node49, node78 53.79, node50 0.51, node56 10.89
-node50, node85 251.76, node53 1.38, node55 20.10
-node51, node98 2110.67, node59 23.67, node60 73.79
-node52, node94 1471.80, node64 102.41, node66 123.03
-node53, node72 22.85, node56 4.33, node67 88.35
-node54, node88 967.59, node59 24.30, node73 238.61
-node55, node84 86.09, node57 2.13, node64 60.80
-node56, node76 197.03, node57 0.02, node61 11.06
-node57, node86 701.09, node58 0.46, node60 7.01
-node58, node83 556.70, node64 29.85, node65 34.32
-node59, node90 820.66, node60 0.72, node71 0.67
-node60, node76 48.03, node65 4.76, node67 1.63
-node61, node98 1057.59, node63 0.95, node64 4.88
-node62, node91 132.23, node64 2.94, node76 38.43
-node63, node66 4.43, node72 70.08, node75 56.34
-node64, node80 47.73, node65 0.30, node76 11.98
-node65, node94 594.93, node66 0.64, node73 33.23
-node66, node98 395.63, node68 2.66, node73 37.53
-node67, node82 153.53, node68 0.09, node70 0.98
-node68, node94 232.10, node70 3.35, node71 1.66
-node69, node99 247.80, node70 0.06, node73 8.99
-node70, node76 27.18, node72 1.50, node73 8.37
-node71, node89 104.50, node74 8.86, node91 284.64
-node72, node76 15.32, node84 102.77, node92 133.06
-node73, node83 52.22, node76 1.40, node90 243.00
-node74, node81 1.07, node76 0.52, node78 8.08
-node75, node92 68.53, node76 0.81, node77 1.19
-node76, node85 13.18, node77 0.45, node78 2.36
-node77, node80 8.94, node78 0.98, node86 64.32
-node78, node98 355.90, node81 2.59
-node79, node81 0.09, node85 1.45, node91 22.35
-node80, node92 121.87, node88 28.78, node98 264.34
-node81, node94 99.78, node89 39.52, node92 99.89
-node82, node91 47.44, node88 28.05, node93 11.99
-node83, node94 114.95, node86 8.75, node88 5.78
-node84, node89 19.14, node94 30.41, node98 121.05
-node85, node97 94.51, node87 2.66, node89 4.90
-node86, node97 85.09
-node87, node88 0.21, node91 11.14, node92 21.23
-node88, node93 1.31, node91 6.83, node98 6.12
-node89, node97 36.97, node99 82.12
-node90, node96 23.53, node94 10.47, node99 50.99
-node91, node97 22.17
-node92, node96 10.83, node97 11.24, node99 34.68
-node93, node94 0.19, node97 6.71, node99 32.77
-node94, node98 5.91, node96 2.03
-node95, node98 6.17, node99 0.27
-node96, node98 3.32, node97 0.43, node99 5.87
-node97, node98 0.30
-node98, node99 0.33
-node99,
diff --git a/lectures/likelihood_ratio_process.md b/lectures/likelihood_ratio_process.md
index 687f16f14..0bdb94d02 100644
--- a/lectures/likelihood_ratio_process.md
+++ b/lectures/likelihood_ratio_process.md
@@ -1725,7 +1725,7 @@ markov_results = analyze_markov_chains(P_f, P_g)
Likelihood processes play an important role in Bayesian learning, as described in {doc}`likelihood_bayes`
and as applied in {doc}`odu`.
-Likelihood ratio processes are central to Lawrence Blume and David Easley's answer to their question "If you're so smart, why aren't you rich?" {cite}`blume2006if`, the subject of the lecture{doc}`likelihood_ratio_process_2`.
+Likelihood ratio processes are central to Lawrence Blume and David Easley's answer to their question "If you're so smart, why aren't you rich?" {cite}`Blume_Easley2006`, the subject of the lecture {doc}`likelihood_ratio_process_2`.
Likelihood ratio processes also appear in {doc}`advanced:additive_functionals`, which contains another illustration of the **peculiar property** of likelihood ratio processes described above.
diff --git a/lectures/likelihood_ratio_process_2.md b/lectures/likelihood_ratio_process_2.md
index cf37fb979..29b966e87 100644
--- a/lectures/likelihood_ratio_process_2.md
+++ b/lectures/likelihood_ratio_process_2.md
@@ -29,7 +29,7 @@ kernelspec:
## Overview
A likelihood ratio process lies behind Lawrence Blume and David Easley's answer to their question
-"If you're so smart, why aren't you rich?" {cite}`blume2006if`.
+"If you're so smart, why aren't you rich?" {cite}`Blume_Easley2006`.
Blume and Easley constructed formal models to study how differences of opinions about probabilities governing risky income processes would influence outcomes and be reflected in prices of stocks, bonds, and insurance policies that individuals use to share and hedge risks.
diff --git a/lectures/merging_of_opinions.md b/lectures/merging_of_opinions.md
new file mode 100644
index 000000000..a8812675e
--- /dev/null
+++ b/lectures/merging_of_opinions.md
@@ -0,0 +1,1309 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+ format_version: 0.13
+ jupytext_version: 1.16.4
+kernelspec:
+ display_name: Python 3 (ipykernel)
+ language: python
+ name: python3
+---
+
+(merging_of_opinions)=
+
+# Merging of Opinions: The Blackwell–Dubins Theorem
+
+```{contents} Contents
+:depth: 2
+```
+
+## Overview
+
+This lecture studies the merging-of-opinions theorem of {cite:t}`blackwell1962`.
+
+The theorem asks a simple question:
+
+> If two agents hold different prior beliefs about a stochastic process but observe the same stream of data indefinitely, will their probability assessments eventually converge?
+
+The answer is yes under an absolute-continuity condition.
+
+If $Q \ll P$ (that is, $P$ dominates $Q$), then the conditional distributions under $P$ and $Q$ over the entire future path merge in total variation, $Q$-almost surely.
+
+If in addition $P \ll Q$ (so that $P \sim Q$), then the same conclusion holds under both agents' probabilities.
+
+This result connects to several other ideas:
+
+- Bayesian consistency: posterior predictions approach the truth when the prior lies in the right absolute-continuity class ({doc}`likelihood_bayes`).
+- Agreement results: common data can eliminate disagreement even when agents start from different priors ({cite:t}`aumann1976`).
+- Kakutani's dichotomy: for product measures, equivalence versus singularity can be read from a Hellinger criterion.
+
+We develop the theory in discrete time and then sketch the continuous-time analogue.
+
+Throughout, we use the Beta–Bernoulli model as a running example.
+
+Two agents observe the same stream of coin flips but start from different priors over the coin's bias.
+
+Let us start with some imports.
+
+```{code-cell} ipython3
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.stats import beta as beta_dist
+from scipy.special import betaln
+```
+
+
+## Probability measures on sequence spaces
+
+### The sequence space and its filtration
+
+Let $(S, \mathscr{S})$ be a standard Borel space (i.e., a measurable space isomorphic to a Borel subset of a complete separable metric space), called the signal space.
+
+The standard Borel assumption guarantees the existence of regular conditional distributions, which the theorem requires.
+
+Set $\Omega = S^{\mathbb{N}}$, the set of all infinite sequences
+$\omega = (x_1, x_2, \ldots)$ with $x_n \in S$, equipped with the product
+$\sigma$-algebra $\mathscr{F} = \mathscr{S}^{\otimes \mathbb{N}}$.
+
+For each $n \geq 1$, define the *finite-horizon* $\sigma$-algebra
+
+$$
+\mathscr{F}_n = \sigma(x_1, \ldots, x_n),
+$$
+
+so $\mathscr{F}_1 \subseteq \mathscr{F}_2 \subseteq \cdots \subseteq \mathscr{F}$.
+
+Define $\mathscr{F}_\infty = \sigma\!\left(\bigcup_{n \geq 1} \mathscr{F}_n\right)$, the smallest $\sigma$-algebra containing every $\mathscr{F}_n$; it encodes everything that can eventually be learned and, on the product space here, coincides with $\mathscr{F}$.
+
+The collection $\{\mathscr{F}_n\}_{n \geq 1}$ is the **natural filtration**
+generated by the observation process; $\mathscr{F}_n$ encodes everything
+that can be learned from the first $n$ data points.
+
+Let $P$ and $Q$ denote two probability measures on $(\Omega, \mathscr{F})$.
+
+Write $P_n = P|_{\mathscr{F}_n}$ and $Q_n = Q|_{\mathscr{F}_n}$ for their
+restrictions to the history up to time $n$.
+
+### Absolute continuity
+
+```{prf:definition} Absolute Continuity
+:label: absolute_continuity
+
+$P$ is **absolutely continuous** with respect to $Q$, written $P \ll Q$, if
+$Q(A) = 0$ implies $P(A) = 0$ for every $A \in \mathscr{F}$.
+
+They are **mutually absolutely continuous**, or **equivalent**, written $P \sim Q$,
+if both $P \ll Q$ and $Q \ll P$.
+
+$P$ is **locally absolutely continuous** with respect to $Q$ if $P_n \ll Q_n$
+for every $n \geq 1$.
+```
+
+Global absolute continuity $P \ll Q$ implies local absolute continuity, but
+not conversely.
+
+Mutual absolute continuity means the two agents agree on which events are *possible*.
+
+They can disagree about probabilities, but neither agent rules out an event the other deems possible.
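
On a finite outcome space, absolute continuity reduces to a support-inclusion check: $\mu \ll \nu$ holds iff $\mu$ puts no mass where $\nu$ puts none. A minimal sketch with made-up numbers:

```{code-cell} ipython3
import numpy as np

P = np.array([0.5, 0.3, 0.2])   # full support on {0, 1, 2}
Q = np.array([0.0, 0.6, 0.4])   # rules out the first outcome

def abs_cont(mu, nu):
    """True iff mu << nu on a finite space: every nu-null point is mu-null."""
    return bool(np.all((nu > 0) | (mu == 0)))

print(abs_cont(Q, P))   # True:  Q << P
print(abs_cont(P, Q))   # False: P charges an outcome that Q rules out
```

In this example $Q \ll P$ holds but $P \ll Q$ fails, so merging would be available only $Q$-almost surely.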
+
+### Total variation distance
+
+```{prf:definition} Total Variation Distance
+:label: total_variation_distance
+
+For two probability measures $\mu$ and $\nu$ on $(E, \mathscr{E})$,
+
+$$
+\|\mu - \nu\|_{\mathrm{TV}}
+= \sup_{A \in \mathscr{E}} |\mu(A) - \nu(A)|
+= \frac{1}{2} \int_E \left|\frac{d\mu}{d\lambda} - \frac{d\nu}{d\lambda}\right| d\lambda,
+$$
+
+where $\lambda$ is any **dominating measure**, meaning $\mu \ll \lambda$ and $\nu \ll \lambda$ (for example, $\lambda = \mu + \nu$).
+
+The total-variation distance always takes values in $[0,1]$, with $0$ iff $\mu = \nu$ and $1$ iff $\mu \perp \nu$ (mutual singularity).
+```
+
+When $\mu \ll \nu$ with $f = d\mu/d\nu$,
+
+$$
+\|\mu - \nu\|_{\mathrm{TV}} = \mathbb{E}_\nu[(f-1)^+] = 1 - \mathbb{E}_\nu[\min(f,1)].
+$$
+
+```{exercise}
+:label: tv_derivation
+
+Show the identity above.
+
+*Hint:* Start from $\|\mu - \nu\|_{\mathrm{TV}} = \tfrac{1}{2}\,\mathbb{E}_\nu[|f - 1|]$ (which follows from taking $\nu$ as the dominating measure) and use the fact that $\mathbb{E}_\nu[f] = 1$.
+```
+
+```{solution} tv_derivation
+:class: dropdown
+
+Since $\mu \ll \nu$, we can use $\nu$ as the dominating measure, so $d\mu/d\nu = f$ and $d\nu/d\nu = 1$, giving
+
+$$
+\|\mu - \nu\|_{\mathrm{TV}} = \tfrac{1}{2}\,\mathbb{E}_\nu[|f - 1|].
+$$
+
+Write $|f-1| = (f-1)^+ + (1-f)^+$.
+
+Since $\mu$ is a probability measure, $\mathbb{E}_\nu[f] = 1$, so the two parts contribute equally: $\mathbb{E}_\nu[(f-1)^+] = \mathbb{E}_\nu[(1-f)^+]$.
+
+Therefore $\tfrac{1}{2}\,\mathbb{E}_\nu[|f-1|] = \mathbb{E}_\nu[(f-1)^+]$.
+
+Next, note that $(f-1)^+ = f - \min(f,1)$, so $\mathbb{E}_\nu[(f-1)^+] = \mathbb{E}_\nu[f] - \mathbb{E}_\nu[\min(f,1)] = 1 - \mathbb{E}_\nu[\min(f,1)]$.
+```
+
+Total variation is one of the strongest standard notions of distance between probability measures.
+
+If two measures are close in total variation, then their probabilities of every event are close.
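
The three expressions above — the supremum over events, the half-$L^1$ integral, and the $(f-1)^+$ identity from the exercise — can be checked against one another on a small discrete example (numbers chosen arbitrarily):

```{code-cell} ipython3
import numpy as np
from itertools import chain, combinations

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])

# supremum over all events A, i.e., all subsets of {0, 1, 2}
events = chain.from_iterable(combinations(range(3), r) for r in range(4))
tv_sup = max(abs(mu[list(A)].sum() - nu[list(A)].sum()) for A in events)

# half-L^1 formula with lambda = counting measure
tv_l1 = 0.5 * np.abs(mu - nu).sum()

# identities E_nu[(f - 1)^+] and 1 - E_nu[min(f, 1)] with f = dmu/dnu
f = mu / nu
tv_pos = (nu * np.maximum(f - 1.0, 0.0)).sum()
tv_min = 1.0 - (nu * np.minimum(f, 1.0)).sum()

print(tv_sup, tv_l1, tv_pos, tv_min)   # all four agree (0.3 here)
```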
+
+### The merging question
+
+The Blackwell–Dubins theorem studies the conditional distribution of the *future* given the *past*.
+
+At time $n$, after observing $(x_1,\ldots,x_n)$, each agent forms a conditional distribution over all future events:
+
+$$
+P(\,\cdot\,|\,\mathscr{F}_n)(\omega), \qquad
+Q(\,\cdot\,|\,\mathscr{F}_n)(\omega).
+$$
+
+These are probability measures on the whole future path, not just the next observation.
+
+The merging question asks whether
+
+$$
+d_n \;:=\; \bigl\|P(\,\cdot\,|\,\mathscr{F}_n) - Q(\,\cdot\,|\,\mathscr{F}_n)\bigr\|_{\mathrm{TV}}
+\;\longrightarrow\; 0
+$$
+
+almost surely as $n \to \infty$.
+
+
+## The likelihood-ratio martingale
+
+Our main tool is the Radon–Nikodym derivative process.
+
+### The likelihood ratio
+
+Since $Q \ll P$ implies $Q_n \ll P_n$ for every $n$, the Radon–Nikodym
+theorem guarantees the existence of the likelihood ratio
+
+$$
+Z_n = \frac{dQ_n}{dP_n}, \qquad Z_n \geq 0 \;\; P\text{-a.s.},
+\qquad \mathbb{E}_P[Z_n] = 1.
+$$
+
+The key structural property is that global absolute continuity $Q \ll P$
+implies the existence of an overall Radon–Nikodym derivative $Z = dQ/dP$
+on all of $(\Omega, \mathscr{F})$, and
+
+$$
+Z_n = \mathbb{E}_P[Z \,|\, \mathscr{F}_n] \qquad P\text{-a.s.}
+$$
+
+That is, $\{Z_n, \mathscr{F}_n\}_{n \geq 1}$ is a non-negative, uniformly
+integrable $P$-martingale.
+
+```{prf:lemma} Martingale Convergence
+:label: martingale_convergence
+
+The likelihood-ratio process $\{Z_n\}$ satisfies:
+
+1. $Z_n \to Z_\infty$ $P$-almost surely as $n \to \infty$.
+2. $Z_\infty = \mathbb{E}_P[Z \,|\, \mathscr{F}_\infty]$ $P$-a.s.
+3. $Z_n \to Z_\infty$ in $L^1(P)$: $\;\mathbb{E}_P[|Z_n - Z_\infty|] \to 0$.
+
+*Proof sketch.* Non-negativity and the martingale property give boundedness
+in $L^1(P)$.
+
+Then almost-sure convergence follows from Doob's martingale
+convergence theorem {cite:t}`doob1953`.
+
+Uniform integrability (which follows
+from $Z \in L^1(P)$ via the conditional Jensen inequality) upgrades this to
+$L^1(P)$ convergence. $\square$
+```
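
Parts of the lemma are easy to see in a simulation. The sketch below (our own illustration, with Bernoulli parameters chosen for convenience) builds $Z_n = dQ_n/dP_n$ for $Q = \mathrm{Bernoulli}(0.6)^\infty$ against $P = \mathrm{Bernoulli}(0.5)^\infty$ under data drawn from $P$. These two product measures are mutually singular, so the example exhibits the mean-one martingale property and almost-sure convergence (here $Z_n \to 0$), but *not* the uniform integrability that global absolute continuity would supply.

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.5, 0.6                 # P- and Q-parameters (illustrative choice)
n, n_paths = 400, 20_000

x = rng.binomial(1, p, size=(n_paths, n))           # data drawn under P
step = np.where(x == 1, q / p, (1 - q) / (1 - p))   # one-step likelihood ratio
Z = np.cumprod(step, axis=1)                        # Z_n = dQ_n / dP_n

print(Z[:, 0].mean(), Z[:, 19].mean())   # E_P[Z_n] = 1 up to Monte Carlo error
print(np.median(Z[:, -1]))               # yet the typical path collapses to 0
```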
+
+### Connecting conditional measures to the likelihood ratio
+
+The following identity connects the likelihood ratio to the conditional distributions.
+
+On the set $\{Z_n > 0\}$, the Radon–Nikodym derivative of
+$Q(\,\cdot\,|\,\mathscr{F}_n)$ with respect to $P(\,\cdot\,|\,\mathscr{F}_n)$
+is
+
+$$
+\frac{d\,Q(\,\cdot\,|\,\mathscr{F}_n)}{d\,P(\,\cdot\,|\,\mathscr{F}_n)}
+= \frac{Z_\infty}{Z_n}
+\qquad P\text{-a.s. on } \{Z_n > 0\}.
+$$
+
+Applying the total-variation formula with $f = Z_\infty / Z_n$ then gives
+
+$$
+d_n
+= \mathbb{E}_{P(\cdot|\mathscr{F}_n)}\!\left[\left(\frac{Z_\infty}{Z_n} - 1\right)^{\!+}\right]
+= 1 - \mathbb{E}_{P(\cdot|\mathscr{F}_n)}\!\left[\min\!\left(\frac{Z_\infty}{Z_n},\,1\right)\right].
+$$
+
+Multiplying through by $Z_n$, taking the $P$-expectation, and using $\mathbb{E}_P[Z_n \, g] = \mathbb{E}_Q[g]$ for bounded $\mathscr{F}_n$-measurable $g$ gives $\mathbb{E}_Q[d_n] = \mathbb{E}_P[(Z_\infty - Z_n)^+]$; since $\mathbb{E}_P[Z_\infty - Z_n] = 0$, the positive and negative parts contribute equally, so
+
+$$
+2\,\mathbb{E}_Q[d_n] \;=\; \mathbb{E}_P[|Z_\infty - Z_n|].
+$$
+
+Thus the $L^1(P)$ convergence of the martingale controls how fast the total-variation distance goes to zero.
+
+
+## The Blackwell–Dubins theorem
+
+```{prf:theorem} Blackwell–Dubins (1962)
+:label: blackwell_dubins
+
+Let $P$ and $Q$ be probability measures on $(\Omega, \mathscr{F})$ with
+$Q \ll P$.
+
+Define
+
+$$
+d_n = \bigl\|P(\,\cdot\,|\,\mathscr{F}_n) - Q(\,\cdot\,|\,\mathscr{F}_n)\bigr\|_{\mathrm{TV}}.
+$$
+
+Then $d_n \to 0$ $Q$-almost surely.
+```
+
+The proof has three steps.
+
+Step 1. Representation of $d_n$ via $Z_n$.
+
+As shown above, $d_n$ can be written in terms of $Z_\infty / Z_n$, where $Z_n = \mathbb{E}_P[Z \,|\, \mathscr{F}_n]$ and $Z = dQ/dP$.
+
+This reduces the problem to a statement about one martingale under $P$.
+
+Step 2. $\{d_n\}$ is a non-negative supermartingale.
+
+Conditioning on more information reduces distinguishability on average.
+
+Formally, because
+$P(\,\cdot\,|\,\mathscr{F}_n) = \mathbb{E}[P(\,\cdot\,|\,\mathscr{F}_{n+1})\,|\,\mathscr{F}_n]$
+and total variation is convex,
+
+$$
+\mathbb{E}_Q[d_{n+1}\,|\,\mathscr{F}_n] \leq d_n \qquad Q\text{-a.s.}
+$$
+
+So $\{d_n, \mathscr{F}_n\}$ is a non-negative $Q$-supermartingale in $[0,1]$.
+
+By Doob's theorem, $d_n \to d_\infty$ $Q$-almost surely for some $[0,1]$-valued random variable $d_\infty$.
+
+Step 3. The almost-sure limit is zero.
+
+From Step 1 and the $L^1$ bound:
+
+$$
+\mathbb{E}_Q[d_n] = \tfrac{1}{2}\,\mathbb{E}_P[|Z_\infty - Z_n|] \to 0.
+$$
+
+The right-hand side vanishes by $L^1(P)$ convergence of the martingale.
+
+Hence $d_n \to 0$ in $L^1(Q)$ and therefore in $Q$-probability.
+
+Since $d_n$ already converges $Q$-almost surely, its limit must satisfy $d_\infty = 0$ $Q$-a.s. $\square$
+
+```{prf:remark} One-Sided vs. Mutual Absolute Continuity
+:label: one_sided_vs_mutual
+
+The theorem requires only $Q \ll P$, not $P \ll Q$.
+
+One-sided absolute continuity $Q \ll P$ gives merging $Q$-almost surely.
+
+Because $Q \ll P$ only forces $P$-null sets to be $Q$-null, and not the reverse, a $Q$-null exceptional set need not be $P$-null, so $Q$-a.s. convergence does *not* automatically imply $P$-a.s. convergence.
+
+To conclude that $d_n \to 0$ under *both* agents' measures, one needs mutual absolute continuity $P \sim Q$.
+
+With $P \ll Q$ added, the proof can be run with the roles of $P$ and $Q$ swapped, yielding $d_n \to 0$ $P$-a.s. as well.
+```
+
+```{prf:remark} Sharpness
+:label: sharpness
+
+Absolute continuity matters.
+
+When $P$ and $Q$ are singular, merging can fail completely.
+
+The point-mass example below has $d_n = 1$ for every $n$.
+
+For product measures, Kakutani's theorem later gives a sharp equivalence-versus-singularity dichotomy.
+```
+
+
+## The Beta–Bernoulli model
+
+Before turning to Python, we introduce the main example used throughout
+the simulations.
+
+### Model
+
+Suppose the data stream $(x_1, x_2, \ldots)$ consists of IID Bernoulli
+draws with unknown probability $p^* \in (0,1)$.
+
+Agent $i$ has a Beta prior:
+
+$$
+p \sim \mathrm{Beta}(\alpha_i, \beta_i), \qquad i = 1, 2.
+$$
+
+After observing $n$ draws with $k$ successes, Bayes' rule yields the
+posterior
+
+$$
+p \,|\, x^n \;\sim\; \mathrm{Beta}(\alpha_i + k,\; \beta_i + n - k),
+$$
+
+and the one-step-ahead predictive probability is
+
+$$
+\hat{p}_i^n = \mathbb{E}[p\,|\,x^n] = \frac{\alpha_i + k}{\alpha_i + \beta_i + n}.
+$$
+
+By the strong law of large numbers, $k/n \to p^*$ almost surely, so both
+$\hat{p}_1^n$ and $\hat{p}_2^n$ converge to $p^*$ regardless of the
+agents' initial priors $(\alpha_i, \beta_i)$.
+
+### The marginal likelihood and likelihood ratio
+
+For each fixed value of $p \in (0,1)$, let $P_p$ denote the IID Bernoulli$(p)$
+probability law on infinite sequences.
+
+Agent $i$ does not know $p$.
+
+Instead, agent $i$ places the prior density $\pi_i$ on $p$, which induces a
+probability measure $P_i$ on data sequences via
+
+$$
+P_i(A) = \int_0^1 P_p(A)\,\pi_i(p)\,dp
+\qquad \text{for every event } A.
+$$
+
+So $P_i$ is the agent's marginal probability measure over
+histories after averaging over uncertainty about $p$.
+
+In particular, if $x^n$ is an exact observed history with $k$ successes, then
+$P_i(x^n)$ means the probability that agent $i$ assigns to that history under
+this mixture measure.
+
+To compute it, start from the Beta density
+
+$$
+\pi_i(p)
+= \frac{p^{\alpha_i - 1} (1-p)^{\beta_i - 1}}{B(\alpha_i, \beta_i)},
+\qquad 0 < p < 1.
+$$
+
+Given $p$, the probability of that ordered history is $p^k (1-p)^{n-k}$.
+
+Therefore
+
+$$
+\begin{aligned}
+P_i(x^n)
+&= \int_0^1 p^k (1-p)^{n-k} \pi_i(p)\, dp \\
+&= \frac{1}{B(\alpha_i, \beta_i)}
+\int_0^1 p^{\alpha_i + k - 1} (1-p)^{\beta_i + n - k - 1}\, dp \\
+&= \frac{B(\alpha_i + k,\; \beta_i + n - k)}{B(\alpha_i,\, \beta_i)}.
+\end{aligned}
+$$
+
+Here $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function.
+
+This expression is the probability of the ordered history $x^n$.
+
+It depends on the data only through the count $k$, so histories with the same number of successes receive the same probability.
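
As a sanity check, the closed form agrees with the chain rule: the log marginal likelihood of an ordered history equals the sum of log one-step predictive probabilities. A quick verification with arbitrary prior parameters (our own choices):

```{code-cell} ipython3
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(2)
a, b = 2.0, 3.0                      # arbitrary prior parameters
x = rng.binomial(1, 0.6, size=10)    # an arbitrary sample history
k, n = int(x.sum()), len(x)

# closed form: log P(x^n) = log B(a + k, b + n - k) - log B(a, b)
log_marg = betaln(a + k, b + n - k) - betaln(a, b)

# chain rule: sum of log one-step predictives, updating (a, b) along the way
log_chain, ai, bi = 0.0, a, b
for xi in x:
    p_head = ai / (ai + bi)
    log_chain += np.log(p_head if xi == 1 else 1.0 - p_head)
    ai, bi = ai + xi, bi + (1 - xi)

print(np.isclose(log_marg, log_chain))   # True
```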
+
+The likelihood ratio at time $n$ is therefore
+
+$$
+Z_n = \frac{P_{1,n}(x^n)}{P_{2,n}(x^n)}
+= \frac{B(\alpha_2,\, \beta_2)}{B(\alpha_1,\, \beta_1)}
+\cdot
+\frac{B(\alpha_1 + k,\, \beta_1 + n - k)}{B(\alpha_2 + k,\, \beta_2 + n - k)}.
+$$
+
+This is a martingale under $P_2$ (agent 2's probability) and converges
+almost surely to a finite positive limit $Z_\infty$, reflecting the fact
+that $P_1 \sim P_2$ for any Beta priors with positive parameters.
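
Because $P_{i,n}$ depends on a history only through the success count, we can confirm $\mathbb{E}_{P_2}[Z_n] = 1$ exactly by summing over the $n+1$ possible counts: $\mathbb{E}_{P_2}[Z_n] = \sum_{x^n} P_2(x^n)\, \frac{P_1(x^n)}{P_2(x^n)} = \sum_k \binom{n}{k}\, \frac{B(\alpha_1 + k,\, \beta_1 + n - k)}{B(\alpha_1, \beta_1)} = 1$. A short check with agent 1's prior:

```{code-cell} ipython3
import numpy as np
from scipy.special import betaln, gammaln

def log_marginal(n, k, a, b):
    """Log Beta-Bernoulli probability of an ordered length-n history with k heads."""
    return betaln(a + k, b + n - k) - betaln(a, b)

a1, b1 = 1.0, 8.0      # agent 1's prior
totals = []
for n in (1, 5, 25):
    k = np.arange(n + 1)
    log_count = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)  # log C(n, k)
    totals.append(np.exp(log_count + log_marginal(n, k, a1, b1)).sum())

print(totals)   # each total P_1 mass equals 1.0 up to floating point
```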
+
+### The exact Blackwell–Dubins distance
+
+For the Beta–Bernoulli model, there is a clean formula for $d_n$.
+
+By de Finetti's theorem, each agent's conditional distribution of the
+*future infinite sequence* given the past is a mixture of IID Bernoulli$(p)$
+processes, where $p$ is drawn from the posterior Beta distribution.
+
+Since the Bernoulli$(p)^{\infty}$ measures for different $p$ are mutually
+singular (the empirical frequency identifies $p$ exactly), the TV distance
+between the two conditional distributions over the future equals the TV
+distance between the two posterior distributions over the parameter $p$.
+
+The TV distance is
+
+$$
+d_n
+= \bigl\|\mathrm{Beta}(\alpha_1 + k_n,\,\beta_1 + n - k_n)
+- \mathrm{Beta}(\alpha_2 + k_n,\,\beta_2 + n - k_n)\bigr\|_{\mathrm{TV}}.
+$$
+
+As $k_n/n \to p^*$ and $n \to \infty$, both posterior Betas concentrate around $p^*$ with variance of order $1/n$, so $d_n \to 0$.
+
+The following code implements the Beta–Bernoulli updating, predictive probabilities, TV distance, and likelihood-ratio computations described above.
+
+```{code-cell} ipython3
+def beta_bernoulli_update(data, a0, b0):
+ """
+ Sequential Beta-Bernoulli Bayesian updating.
+ """
+ n = len(data)
+ cum_k = np.concatenate([[0], np.cumsum(data)]) # cumulative successes
+ ns = np.arange(n + 1) # 0, 1, ..., n
+ a_post = a0 + cum_k
+ b_post = b0 + (ns - cum_k)
+ return a_post, b_post
+
+
+def predictive_prob(a_post, b_post):
+ """One-step-ahead predictive probability P(X=1 | data)."""
+ return a_post / (a_post + b_post)
+
+
+def tv_distance_beta(a1, b1, a2, b2, n_grid=2000):
+ """
+ TV distance between Beta(a1,b1) and Beta(a2,b2) via grid quadrature.
+ Uses a fine grid on (0,1).
+ """
+ x = np.linspace(1e-8, 1 - 1e-8, n_grid)
+ dx = x[1] - x[0]
+ p1 = beta_dist.pdf(x, a1, b1)
+ p2 = beta_dist.pdf(x, a2, b2)
+ return 0.5 * np.sum(np.abs(p1 - p2)) * dx
+
+
+def log_likelihood_ratio(data, a1, b1, a2, b2):
+ """
+ Log likelihood ratio log Z_n = log P1_n(data) - log P2_n(data)
+ for every prefix of `data`.
+
+ Returns an array of length len(data) + 1, starting at 0 (before data).
+ """
+ a1p, b1p = beta_bernoulli_update(data, a1, b1)
+ a2p, b2p = beta_bernoulli_update(data, a2, b2)
+ log_P1 = betaln(a1p, b1p) - betaln(a1, b1)
+ log_P2 = betaln(a2p, b2p) - betaln(a2, b2)
+ return log_P1 - log_P2
+
+
+def run_simulation(p_true, a1, b1, a2, b2, n_steps, seed=0):
+ """
+ Simulate one realisation of the merging experiment.
+
+    Returns a dict; posterior-based arrays have length n_steps + 1
+    (index 0 = prior), while `data` has length n_steps.
+ """
+ rng = np.random.default_rng(seed)
+ data = rng.binomial(1, p_true, n_steps)
+
+ a1p, b1p = beta_bernoulli_update(data, a1, b1)
+ a2p, b2p = beta_bernoulli_update(data, a2, b2)
+
+ pred1 = predictive_prob(a1p, b1p)
+ pred2 = predictive_prob(a2p, b2p)
+ tv_1step = np.abs(pred1 - pred2)
+
+ # TV between posterior Betas; in this model this equals d_n
+ tv_beta = np.array([
+ tv_distance_beta(a1p[i], b1p[i], a2p[i], b2p[i])
+ for i in range(n_steps + 1)
+ ])
+
+ log_Z = log_likelihood_ratio(data, a1, b1, a2, b2)
+
+ return dict(data=data, pred1=pred1, pred2=pred2,
+ tv_1step=tv_1step, tv_beta=tv_beta, log_Z=log_Z)
+```
+
+### Simulation
+
+We choose two agents with very different beliefs about the bias of a coin whose true probability of heads is $p^* = 0.65$.
+
+- Agent 1 (skeptic): prior $\mathrm{Beta}(1, 8)$, so
+ $\hat{p}_1^0 = 1/9 \approx 0.11$.
+- Agent 2 (optimist): prior $\mathrm{Beta}(8, 1)$, so
+ $\hat{p}_2^0 = 8/9 \approx 0.89$.
+
+Both priors are supported on all of $(0,1)$, so $P_1 \sim P_2$.
+
+Blackwell–Dubins guarantees merging.
+
+The figure below shows what that merging looks like.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Merging in the Beta–Bernoulli example.
+ The four panels show posterior predictive means, the total-variation distance $d_n$, the log likelihood ratio $\log Z_n$, and posterior densities at selected horizons.
+ name: fig-merging-of-opinions-beta-bernoulli
+---
+p_true = 0.65
+a1, b1 = 1.0, 8.0 # skeptic
+a2, b2 = 8.0, 1.0 # optimist
+n_steps = 600
+
+sim = run_simulation(p_true, a1, b1, a2, b2, n_steps, seed=7)
+steps = np.arange(n_steps + 1)
+
+fig, axes = plt.subplots(2, 2, figsize=(11, 7))
+ax = axes[0, 0]
+ax.plot(steps, sim['pred1'], color='steelblue', lw=2,
+ label=r'Agent 1 $\hat p_1^n$ (prior: skeptic)')
+ax.plot(steps, sim['pred2'], color='firebrick', lw=2,
+ label=r'Agent 2 $\hat p_2^n$ (prior: optimist)')
+ax.axhline(p_true, color='black', lw=1.0, ls='--',
+ label=f'Truth $p^*={p_true}$')
+ax.set_xlabel('observations $n$')
+ax.set_ylabel('predictive probability')
+ax.set_title('(a) posterior predictive means')
+ax.legend(fontsize=8)
+ax.set_ylim(0, 1)
+
+ax = axes[0, 1]
+ax.semilogy(steps, sim['tv_beta'] + 1e-10, color='mediumpurple', lw=2)
+ax.set_xlabel('observations $n$')
+ax.set_ylabel(
+ r'$d_n = \|P(\cdot|\mathscr{F}_n)'
+ r' - Q(\cdot|\mathscr{F}_n)\|_{\mathrm{TV}}$'
+)
+ax.set_title(r'(b) total-variation distance $d_n$')
+ax.set_ylim(bottom=1e-4)
+
+ax = axes[1, 0]
+ax.plot(steps, sim['log_Z'], color='darkorange', lw=2)
+ax.axhline(0, color='black', lw=0.8, ls=':')
+ax.set_xlabel('observations $n$')
+ax.set_ylabel(r'$\log Z_n$')
+ax.set_title(r'(c) log likelihood ratio')
+
+ax = axes[1, 1]
+xs = np.linspace(0.01, 0.99, 500)
+epochs = [0, 20, 100, n_steps]
+colors = plt.cm.viridis(np.linspace(0.2, 0.85, len(epochs)))
+
+for epoch, col in zip(epochs, colors):
+ k_e = int(np.sum(sim['data'][:epoch]))
+ pdf1 = beta_dist.pdf(xs, a1 + k_e, b1 + epoch - k_e)
+ pdf2 = beta_dist.pdf(xs, a2 + k_e, b2 + epoch - k_e)
+ ax.plot(xs, pdf1, color=col, lw=2, ls='-')
+ ax.plot(xs, pdf2, color=col, lw=2, ls='--')
+
+ax.axvline(p_true, color='black', lw=1.0, ls=':', label=f'$p^*={p_true}$')
+ax.set_xlabel('$p$')
+ax.set_ylabel('posterior density')
+ax.set_title('(d) posterior densities')
+
+from matplotlib.lines import Line2D
+handles = [
+ Line2D([0], [0], color='black', lw=2, label='agent 1'),
+ Line2D([0], [0], color='black', lw=2, ls='--', label='agent 2'),
+]
+for epoch, col in zip(epochs, colors):
+ handles.append(Line2D([0], [0], color=col, lw=2, label=f'$n={epoch}$'))
+handles.append(
+ Line2D([0], [0], color='black', lw=1.0, ls=':', label=f'$p^*={p_true}$')
+)
+ax.legend(handles=handles, fontsize=8)
+ax.set_ylim(bottom=0)
+
+plt.tight_layout()
+plt.show()
+```
+
+The four panels show:
+
+- Panel (a): Starting from $\hat{p}_1^0 \approx 0.11$ and
+ $\hat{p}_2^0 \approx 0.89$, both agents' predictive probabilities
+ converge to $p^* = 0.65$.
+- Panel (b): The total-variation distance $d_n$ decays toward zero
+  (plotted on a logarithmic scale), consistent with the theorem.
+- Panel (c): The log likelihood ratio $\log Z_n$ converges to a finite
+ value, which is consistent with mutual absolute continuity in this example.
+- Panel (d): The posterior Beta densities for the two agents start far
+ apart (one near 0, one near 1) and progressively concentrate to the same
+ distribution centred on the truth.
+
+
+### Almost-sure convergence across many paths
+
+To illustrate the almost-sure character of the theorem, we run many independent replications.
+
+The theorem concerns almost every path under the reference measure, not just averages across paths.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Almost-sure merging across many sample paths.
+ The left panel plots the total-variation distance and the right panel plots the log likelihood ratio $\log Z_n$.
+ name: fig-merging-of-opinions-many-paths
+---
+N_paths = 80
+n_steps = 500
+
+fig, axes = plt.subplots(1, 2, figsize=(11, 4))
+
+ax_tv = axes[0]
+ax_log = axes[1]
+
+tv_all = np.empty((N_paths, n_steps + 1))
+logZ_all = np.empty((N_paths, n_steps + 1))
+steps = np.arange(n_steps + 1)
+
+for i in range(N_paths):
+ s = run_simulation(p_true, a1, b1, a2, b2, n_steps, seed=i)
+ tv_all[i] = s['tv_beta']
+ logZ_all[i] = s['log_Z']
+
+for i in range(N_paths):
+ ax_tv.semilogy(steps, tv_all[i] + 1e-10, color='steelblue',
+ lw=0.8, alpha=0.3)
+ax_tv.semilogy(steps, tv_all.mean(axis=0) + 1e-10,
+ color='black', lw=2, label='mean across paths')
+ax_tv.set_xlabel('observations $n$')
+ax_tv.set_ylabel(r'$d_n$ (log scale)')
+ax_tv.legend()
+
+for i in range(N_paths):
+ ax_log.plot(steps, logZ_all[i], color='firebrick',
+ lw=0.8, alpha=0.3)
+ax_log.plot(steps, logZ_all.mean(axis=0),
+ color='black', lw=2, label='mean across paths')
+ax_log.axhline(0, color='gray', lw=0.8, ls=':')
+ax_log.set_xlabel('observations $n$')
+ax_log.set_ylabel(r'$\log Z_n$')
+ax_log.legend()
+
+plt.tight_layout()
+plt.show()
+
+# Finite-horizon summary
+frac_below = np.mean(tv_all[:, -1] < 0.30)
+mean_final = tv_all[:, -1].mean()
+print(f"Fraction of paths with d_n < 0.30 at n = {n_steps}: {frac_below:.2f}")
+print(f"Mean distance at n = {n_steps}: {mean_final:.3f}")
+```
+
+At this finite horizon, the distances have moved down substantially from their initial levels, but they are not yet close to zero.
+
+That is still consistent with the theorem, because almost-sure convergence is an asymptotic statement.
+
+
+### The supermartingale property of $d_n$
+
+The proof relies on $\{d_n\}$ being a non-negative supermartingale.
+
+We can illustrate this numerically by looking at average increments across many paths.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ An illustration of the supermartingale property.
+ The plots show average increments of $d_n$ and their cumulative sum across many simulated paths.
+ name: fig-merging-of-opinions-supermartingale
+---
+diffs = np.diff(tv_all, axis=1) # shape (N_paths, n_steps)
+mean_diffs = diffs.mean(axis=0) # average increment at each step
+cum_sum = np.cumsum(mean_diffs) # cumulative average change
+
+fig, axes = plt.subplots(1, 2, figsize=(10, 4))
+
+ax = axes[0]
+ax.plot(mean_diffs[:200], color='purple', lw=2)
+ax.axhline(0, color='black', lw=0.8, ls='--')
+ax.fill_between(range(200), mean_diffs[:200], 0,
+ where=(mean_diffs[:200] < 0), alpha=0.25,
+ color='purple', label='negative increments')
+ax.fill_between(range(200), mean_diffs[:200], 0,
+ where=(mean_diffs[:200] > 0), alpha=0.25,
+ color='red', label='positive increments')
+ax.set_xlabel('observations $n$')
+ax.set_ylabel(r'$\mathbb{E}[d_{n+1} - d_n]$')
+ax.legend(fontsize=8)
+
+ax = axes[1]
+ax.plot(cum_sum[:200], color='darkorange', lw=2)
+ax.axhline(0, color='black', lw=0.8, ls='--')
+ax.set_xlabel('observations $n$')
+ax.set_ylabel(r'cumulative average change in $d_n$')
+
+plt.tight_layout()
+plt.show()
+
+frac_decrease = np.mean(mean_diffs < 0)
+print(f"Fraction of steps with average decrement: {frac_decrease:.2%}")
+```
+
+The average increment is negative at most steps, and the cumulative drift is downward.
+
+This is only an illustration, not a proof, because it uses unconditional averages rather than the full conditional expectation in the theorem.
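The supermartingale property of $d_n$ is inherited from the martingale property of the likelihood ratio: the ratio has constant conditional expectation under the measure in its denominator. Here is a minimal self-contained check of that underlying identity in a one-step Bernoulli example (the values $p = 0.3$, $q = 0.6$ are illustrative and unrelated to the simulations above):

```python
import numpy as np

# One-step likelihood ratio dP/dQ for Bernoulli(p) versus Bernoulli(q)
p, q = 0.3, 0.6

def lr(x, p, q):
    return (p / q) ** x * ((1 - p) / (1 - q)) ** (1 - x)

# Exact conditional expectation under Q:
#   q * (p/q) + (1 - q) * ((1 - p)/(1 - q)) = p + (1 - p) = 1
exact = q * lr(1, p, q) + (1 - q) * lr(0, p, q)
print(f"exact E_Q[dP/dQ] = {exact:.12f}")

# Monte Carlo confirmation under Q
rng = np.random.default_rng(0)
x = (rng.random(200_000) < q).astype(float)
mc = lr(x, p, q).mean()
print(f"Monte Carlo estimate = {mc:.4f}")
```

Because each one-step factor has conditional expectation one, the product $Z_n$ is a martingale, which is the ingredient behind the supermartingale property of $d_n$.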
+
+
+## Failure of merging: mutual singularity
+
+What happens when the hypothesis $Q \ll P$ fails?
+
+The singular case is the cleanest counterexample.
+
+### Point-mass priors
+
+Suppose both agents hold degenerate (point-mass) priors:
+
+- Agent P: certain that $p = p_P = 0.30$.
+- Agent Q: certain that $p = p_Q = 0.75$.
+
+Since $P$ charges only sequences whose empirical frequency converges to $0.30$, and $Q$ charges only sequences whose empirical frequency converges to $0.75$, the two measures are mutually singular: $P \perp Q$.
+
+The conditional distributions do not update, because both agents are already certain of their model.
+
+For the theorem's object, namely the conditional law of the entire future path,
+
+$$
+\|P(\,\cdot\,|\,\mathscr{F}_n) - Q(\,\cdot\,|\,\mathscr{F}_n)\|_{\mathrm{TV}}
+= \|P - Q\|_{\mathrm{TV}} = 1
+\quad \text{for all } n.
+$$
+
+This equality holds because the infinite-product Bernoulli measures with distinct success probabilities are singular.
+
+If we look only one step ahead, the predictive distance is $|p_P - p_Q| = 0.45$.
+
+That is smaller than one, but it is not the quantity that appears in Blackwell–Dubins.
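We can also compute the full future-path distance at finite horizons directly. Because the sample sum is a sufficient statistic, the total-variation distance between the $n$-fold Bernoulli products equals the distance between the corresponding binomial distributions; a small self-contained check with the same $p_P = 0.30$ and $p_Q = 0.75$:

```python
import numpy as np
from scipy.stats import binom

p_P, p_Q = 0.30, 0.75

def tv_n(n):
    # TV between the n-fold Bernoulli products, via the sufficient statistic:
    # it equals the TV between Binomial(n, p_P) and Binomial(n, p_Q)
    k = np.arange(n + 1)
    return 0.5 * np.abs(binom.pmf(k, n, p_P) - binom.pmf(k, n, p_Q)).sum()

for n in [1, 5, 20, 100]:
    print(f"n = {n:>3}:  TV = {tv_n(n):.6f}")
```

At $n = 1$ this reproduces the one-step gap $0.45$; by $n = 100$ the distance is essentially $1$, consistent with the mutual singularity of the infinite products.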
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Failure of merging under singular priors.
+ The full future-path distance stays at one,
+ while the one-step predictive gap stays
+ at $|p_P - p_Q|$.
+ name: fig-merging-of-opinions-singular-priors
+---
+p_P = 0.30
+p_Q = 0.75
+n_steps = 500
+
+tv_singular_full = np.ones(n_steps + 1)
+tv_singular_1step = np.full(n_steps + 1, np.abs(p_P - p_Q))
+
+sim_abs_cont = run_simulation(
+ p_Q, 1.0, 8.0, 8.0, 1.0, n_steps, seed=1
+)
+
+fig, ax = plt.subplots(figsize=(8, 4))
+ax.plot(np.arange(n_steps + 1), tv_singular_full,
+ color='firebrick', lw=2,
+ label=r'singular: full-path $d_n = 1$')
+ax.plot(np.arange(n_steps + 1), tv_singular_1step,
+ color='gray', lw=2, ls=':',
+ label=r'one-step gap $= |p_P - p_Q|$')
+ax.plot(np.arange(n_steps + 1),
+ sim_abs_cont['tv_beta'],
+ color='steelblue', lw=2,
+ label=(r'$\mathrm{Beta}(1,8)$ vs'
+ r' $\mathrm{Beta}(8,1)$'))
+ax.set_xlabel('observations $n$')
+ax.set_ylabel(r'$d_n$')
+ax.legend(fontsize=8)
+ax.set_ylim(0, 1.05)
+
+plt.tight_layout()
+plt.show()
+```
+
+The contrast is sharp.
+
+With mutually absolutely continuous priors, $d_n$ decays to zero.
+
+With singular point-mass priors, the full future-path distance stays at one forever.
+
+More data does not reconcile the agents, because each rules out paths to which the other assigns positive probability.
+
+
+## Kakutani's theorem: when does merging hold?
+
+A natural question is: for which product measures does the Blackwell–Dubins
+hypothesis $Q \ll P$ hold?
+
+For infinite product measures, the answer is
+given by a classical result of {cite:t}`kakutani1948`.
+
+### Hellinger affinities
+
+```{prf:definition} Hellinger Affinity
+:label: hellinger_affinity
+
+For probability measures $P_n$ and $Q_n$ on $(S, \mathscr{S})$ with common
+dominating measure $\lambda$, the **Hellinger affinity** is
+
+$$
+\rho_n = \int_S \sqrt{\frac{dP_n}{d\lambda} \cdot \frac{dQ_n}{d\lambda}}\,d\lambda
+\;\in\; [0, 1].
+$$
+
+$\rho_n = 1$ if and only if $P_n = Q_n$; $\rho_n = 0$ if and only if $P_n \perp Q_n$.
+```
+
+For two specific one-dimensional families:
+
+- Gaussian: $P_n = \mathcal{N}(\mu_n, 1)$ vs $Q_n = \mathcal{N}(0,1)$:
+
+$$
+\rho_n^{\text{Gauss}} = \exp\!\left(-\frac{\mu_n^2}{8}\right).
+$$
+
+- Bernoulli: $P_n = \mathrm{Bernoulli}(p)$ vs $Q_n = \mathrm{Bernoulli}(q)$:
+
+$$
+\rho_n^{\text{Bern}} = \sqrt{pq} + \sqrt{(1-p)(1-q)}.
+$$
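Both closed forms are easy to verify numerically. Here is a quick check of the Gaussian formula by quadrature, with an illustrative $\mu = 1.3$, together with a direct evaluation of the Bernoulli affinity:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Gaussian affinity by quadrature versus the closed form exp(-mu^2 / 8)
μ = 1.3
rho_num, _ = quad(lambda x: np.sqrt(norm.pdf(x, μ, 1) * norm.pdf(x, 0, 1)),
                  -np.inf, np.inf)
rho_closed = np.exp(-μ**2 / 8)
print(rho_num, rho_closed)

# Bernoulli affinity: direct evaluation of the closed form
p, q = 0.3, 0.75
rho_bern = np.sqrt(p * q) + np.sqrt((1 - p) * (1 - q))
print(rho_bern)
```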
+
+### Kakutani's dichotomy
+
+```{prf:theorem} Kakutani (1948)
+:label: kakutani_dichotomy
+
+Let $P = \bigotimes_{n=1}^\infty P_n$ and $Q = \bigotimes_{n=1}^\infty Q_n$
+be infinite product measures whose factors are pairwise equivalent: $P_n \sim Q_n$ for every $n$.
+
+Then either $P \sim Q$ or $P \perp Q$; there
+is no intermediate case.
+
+Specifically,
+
+$$
+P \sim Q
+\quad \iff \quad
+\prod_{n=1}^\infty \rho_n > 0
+\quad \iff \quad
+\sum_{n=1}^\infty (1 - \rho_n) < \infty.
+$$
+
+If $\prod_{n=1}^\infty \rho_n = 0$, then $P \perp Q$.
+
+*Proof idea.*
+A standard proof studies the likelihood-ratio martingale
+$Z_N = \prod_{n=1}^N (dP_n/dQ_n)(x_n)$ together with the identity
+$\mathbb{E}_Q[\sqrt{Z_N}] = \prod_{n=1}^N \rho_n$.
+
+The product staying positive corresponds to equivalence, while the product collapsing to zero corresponds to singularity.
+
+$\square$
+```
+
+### Implication for merging
+
+For IID-type sequences, Kakutani's theorem gives the following picture:
+
+| Scenario | $\sum_n (1-\rho_n)$ | Conclusion | Merging? |
+|---|---|---|---|
+| $P_n = Q_n$ for all $n$ | $0$ | $P = Q$ | Trivially yes |
+| $P_n \ne Q_n$ with $\sum_n (1-\rho_n) < \infty$ | Finite | $P \sim Q$ | Yes; Blackwell–Dubins applies |
+| $P_n = P \ne Q = Q_n$ fixed, $n \ge 1$ | $\infty$ | $P \perp Q$ | No |
+
+The IID case with different fixed marginals is the standard no-merging example.
+
+If two agents assign permanently different distributions to each observation, they end up in disjoint probability worlds.
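The table can be checked numerically in the Bernoulli case. The following sketch (with illustrative parameters) contrasts fixed distinct marginals, where $\sum_n (1 - \rho_n)$ grows linearly, with marginals that approach each other at rate $1/n$, where the sum stays finite:

```python
import numpy as np

def bern_affinity(p, q):
    return np.sqrt(p * q) + np.sqrt((1 - p) * (1 - q))

N = 10_000
n = np.arange(1, N + 1)

# Fixed distinct marginals: each term 1 - rho_n is a positive constant
s_fixed = N * (1 - bern_affinity(0.30, 0.75))

# Marginals converging at rate 1/n: 1 - rho_n is of order 1/n^2, summable
p_n = 0.5 + 0.3 / n
s_conv = np.sum(1 - bern_affinity(p_n, 0.5))

print(f"fixed p != q:       partial sum over {N} terms = {s_fixed:.1f}")
print(f"p_n = 1/2 + 0.3/n:  partial sum over {N} terms = {s_conv:.4f}")
```

The first partial sum grows without bound as $N$ increases, placing the pair in the singular regime, while the second is already close to its finite limit.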
+
+### A Gaussian product-measure example
+
+We illustrate Kakutani's dichotomy with Gaussian product measures.
+
+Take $Q = \mathcal{N}(0,1)^{\otimes\mathbb{N}}$ as the reference measure and $P = \bigotimes_n \mathcal{N}(\mu_n,1)$ as the alternative.
+
+Three choices of $\mu_n$:
+
+1. $\mu_n = \mu > 0$ constant ($\sum (1-\rho_n) = \infty$) $\Rightarrow P \perp Q$.
+2. $\mu_n = c/\!\sqrt{n}$ ($\sum (1-\rho_n) \approx \sum c^2/(8n) = \infty$) $\Rightarrow P \perp Q$.
+3. $\mu_n = c/n$ ($\sum (1-\rho_n) \approx \sum c^2/(8n^2) < \infty$) $\Rightarrow P \sim Q$.
+
+```{code-cell} ipython3
+N_max = 2000
+ns = np.arange(1, N_max + 1)
+c = 2.0
+N_plot = 400
+rng = np.random.default_rng(0)
+
+cases = [
+ (r'$\mu_n = c$ (constant)', np.full(N_max, c)),
+ (r'$\mu_n = c/\sqrt{n}$', c / np.sqrt(ns)),
+ (r'$\mu_n = c/n$', c / ns),
+]
+```
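Before simulating paths, we can evaluate the Kakutani criterion itself for these three drift sequences, using $1 - \rho_n = 1 - \exp(-\mu_n^2/8)$. Truncated partial sums can only illustrate, not prove, divergence, but the contrast is already clear:

```python
import numpy as np

c = 2.0
n = np.arange(1, 2001)

μ_seqs = {
    'constant':  np.full(n.size, c),
    'c/sqrt(n)': c / np.sqrt(n),
    'c/n':       c / n,
}

for label, μs in μ_seqs.items():
    partial = np.sum(1 - np.exp(-μs**2 / 8))
    print(f"μ_n = {label:>9}:  partial sum of (1 - ρ_n), n <= 2000: {partial:.3f}")
```

The first two sums keep growing as the truncation point increases (the series diverge), while the third has essentially converged, matching the classification above.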
+
+With constant drift, $\log Z_N$ drifts to $-\infty$ under $Q$, so $Z_N \to 0$ $Q$-a.s. and $P \perp Q$.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Constant drift $\mu_n = c$: the
+ likelihood ratio collapses ($P \perp Q$).
+ name: fig-kakutani-constant
+---
+label, μ_seq = cases[0]
+x = rng.standard_normal(N_plot)
+log_Z_inc = μ_seq[:N_plot] * x - μ_seq[:N_plot]**2 / 2
+log_Z = np.concatenate([[0], np.cumsum(log_Z_inc)])
+
+fig, ax = plt.subplots(figsize=(8, 3))
+ax.plot(np.arange(N_plot + 1), log_Z,
+ color='darkorange', lw=2, label=label)
+ax.axhline(0, color='black', lw=0.8, ls=':')
+ax.set_xlabel('horizon $N$')
+ax.set_ylabel(r'$\log Z_N$ under $Q$')
+ax.legend(fontsize=8)
+plt.tight_layout()
+plt.show()
+```
+
+The $\mu_n = c/\sqrt{n}$ case shows the same qualitative picture: the drift vanishes, but too slowly for $\prod_n \rho_n$ to remain positive.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Drift $\mu_n = c/\sqrt{n}$: still
+ singular ($P \perp Q$).
+ name: fig-kakutani-sqrt
+---
+label, μ_seq = cases[1]
+x = rng.standard_normal(N_plot)
+log_Z_inc = μ_seq[:N_plot] * x - μ_seq[:N_plot]**2 / 2
+log_Z = np.concatenate([[0], np.cumsum(log_Z_inc)])
+
+fig, ax = plt.subplots(figsize=(8, 3))
+ax.plot(np.arange(N_plot + 1), log_Z,
+ color='purple', lw=2, label=label)
+ax.axhline(0, color='black', lw=0.8, ls=':')
+ax.set_xlabel('horizon $N$')
+ax.set_ylabel(r'$\log Z_N$ under $Q$')
+ax.legend(fontsize=8)
+plt.tight_layout()
+plt.show()
+```
+
+Only with $\mu_n = c/n$ does $\sum (1-\rho_n) < \infty$ hold, so the likelihood ratio remains nondegenerate and $P \sim Q$.
+
+Blackwell–Dubins applies only in this case.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ Drift $\mu_n = c/n$: the likelihood
+ ratio stabilises ($P \sim Q$).
+ name: fig-kakutani-inv-n
+---
+label, μ_seq = cases[2]
+x = rng.standard_normal(N_plot)
+log_Z_inc = μ_seq[:N_plot] * x - μ_seq[:N_plot]**2 / 2
+log_Z = np.concatenate([[0], np.cumsum(log_Z_inc)])
+
+fig, ax = plt.subplots(figsize=(8, 3))
+ax.plot(np.arange(N_plot + 1), log_Z,
+ color='steelblue', lw=2, label=label)
+ax.axhline(0, color='black', lw=0.8, ls=':')
+ax.set_xlabel('horizon $N$')
+ax.set_ylabel(r'$\log Z_N$ under $Q$')
+ax.legend(fontsize=8)
+plt.tight_layout()
+plt.show()
+```
+
+
+## Extension to continuous time
+
+The same logic extends to continuous time.
+
+### Girsanov's theorem and the likelihood-ratio process
+
+On the canonical Wiener space with $Q$ the Wiener measure (standard
+Brownian motion $W$), suppose agent $P$ believes the process has an
+additional drift $\theta = \{\theta_s\}_{s \geq 0}$:
+
+$$
+W_t = \widetilde{W}_t + \int_0^t \theta_s\, ds,
+$$
+
+where $\widetilde{W}$ is a $P$-Brownian motion.
+
+The Girsanov–Cameron–Martin theorem {cite:p}`girsanov1960` gives the
+likelihood-ratio process as the stochastic exponential
+
+$$
+Z_t
+= \exp\!\left(\int_0^t \theta_s\, dW_s - \frac{1}{2}\int_0^t \theta_s^2\, ds\right).
+$$
+
+$Z_t$ is always a non-negative $Q$-local martingale; it is a true martingale
+if and only if $\mathbb{E}_Q[Z_t] = 1$ for all $t$.
+
+Novikov's condition {cite:p}`novikov1972`,
+$\mathbb{E}_Q\!\left[\exp\!\left(\tfrac{1}{2}\int_0^T \theta_s^2\,ds\right)\right] < \infty$ for all $T$,
+is sufficient.
+
+### The dichotomy at infinity
+
+A key subtlety on $[0,+\infty)$ is that local absolute continuity does *not* imply global absolute continuity on $\mathscr{F}_\infty$.
+
+```{prf:remark} Infinite-Horizon Subtlety
+:label: dichotomy_at_infinity
+
+Suppose $Z_t$ is a true $Q$-martingale for every finite horizon and let $Z_t \to Z_\infty$ $Q$-a.s.
+
+If $\{Z_t\}$ is uniformly integrable on $[0,\infty)$, then $P \ll Q$ on $\mathscr{F}_\infty$ with $dP/dQ = Z_\infty$.
+
+For the Blackwell–Dubins conclusion we need $Q \ll P$ on $\mathscr{F}_\infty$ (the reverse direction).
+
+In many standard settings, including deterministic drifts satisfying the energy condition below, the measures are in fact *equivalent* ($P \sim Q$) on $\mathscr{F}_\infty$, so both directions hold.
+
+If uniform integrability fails, then global absolute continuity on $\mathscr{F}_\infty$ can fail.
+
+In many standard examples, including a non-zero constant drift, the measures are in fact singular on $\mathscr{F}_\infty$.
+```
+
+A convenient sufficient condition in deterministic-drift examples is the **energy condition**
+
+$$
+\int_0^\infty \theta_s^2\,ds < \infty \quad Q\text{-a.s.}
+$$
+
+Informally, this says the total amount of information separating the two measures over the infinite horizon is finite.
+
+Under the energy condition, $P \sim Q$ on $\mathscr{F}_\infty$, so Blackwell–Dubins applies and merging holds under both measures.
+
+When $\theta$ is a non-zero constant, the condition fails, the measures are singular on $\mathscr{F}_\infty$, and merging does not occur.
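These two regimes can be visualized with a simple Euler discretization of $\log Z_t = \int_0^t \theta_s\,dW_s - \frac{1}{2}\int_0^t \theta_s^2\,ds$ under $Q$. This is only an illustrative sketch with made-up parameters: a constant drift $\theta \equiv 1$, which violates the energy condition, versus $\theta_s = 1/(1+s)$, which satisfies it:

```python
import numpy as np

rng = np.random.default_rng(42)
T, dt = 200.0, 0.05
t = np.arange(0.0, T, dt)
dW = rng.standard_normal(t.size) * np.sqrt(dt)   # Brownian increments under Q

def log_Z(θ_vals):
    # Euler sums approximating ∫θ dW - (1/2) ∫θ² ds
    return np.cumsum(θ_vals * dW - 0.5 * θ_vals**2 * dt)

logZ_const = log_Z(np.ones(t.size))     # θ ≡ 1: energy condition fails
logZ_decay = log_Z(1.0 / (1.0 + t))     # ∫θ² ds < ∞: energy condition holds

print(f"constant drift: log Z_T = {logZ_const[-1]:.1f}")
print(f"decaying drift: log Z_T = {logZ_decay[-1]:.2f}")
```

Under $Q$, the constant-drift log likelihood ratio has mean $-T/2$ and collapses toward $-\infty$, while the decaying-drift ratio fluctuates around a finite value.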
+
+Whenever $Q \ll P$ on $\mathscr{F}_\infty$ is established, the proof of the
+continuous-time Blackwell–Dubins result is identical to the discrete-time
+proof.
+
+$\{d_t, \mathscr{F}_t\}$ is a non-negative $Q$-supermartingale in
+$[0,1]$, so $d_t \to d_\infty$ $Q$-a.s.
+
+The $L^1$ bound
+$\mathbb{E}_Q[d_t] = \tfrac{1}{2}\mathbb{E}_P[|Z_t - Z_\infty|] \to 0$
+forces $d_\infty = 0$.
+
+
+## Applications
+
+### Bayesian learning
+
+The most direct application is Bayesian inference.
+
+Suppose data $(x_1, x_2, \ldots)$ are drawn from the true measure $Q^*$.
+
+An agent holds a prior $\pi$ over a family $\{Q_\theta : \theta \in \Theta\}$, inducing a marginal $P = \int Q_\theta\,\pi(d\theta)$.
+
+If $Q^* \ll P$ (i.e., the agent's marginal model dominates the truth), then Blackwell–Dubins gives
+
+$$
+\bigl\|P(\,\cdot\,|\,x_1,\ldots,x_n) - Q^*(\,\cdot\,|\,x_1,\ldots,x_n)\bigr\|_{\mathrm{TV}}
+\to 0 \quad Q^*\text{-a.s.}
+$$
+
+This is a strong form of Bayesian consistency: the agent's predictions merge with the truth under the true measure.
+
+A prior assigning positive mass to a neighbourhood of the true parameter typically guarantees *local* absolute continuity $Q^*_n \ll P_n$ for every finite horizon $n$, but not the global condition $Q^* \ll P$ on $\mathscr{F}_\infty$ that Blackwell–Dubins requires.
+
+For example, in the Beta–Bernoulli model with a non-atomic prior $\pi$, the mixture $P = \int \mathrm{Bernoulli}(p)^{\infty}\,\pi(dp)$ satisfies $Q^*_n \ll P_n$ for every $n$, yet $Q^* \not\ll P$ globally because the set $\{\lim k_n/n = p^*\}$ has $Q^*$-measure one but $P$-measure zero (different Bernoulli product measures are mutually singular).
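The divergence behind this failure of global domination can be made concrete. Under a uniform prior the mixture likelihood of a sample with $k$ successes in $n$ draws is $B(k+1, n-k+1)$, so $\log\bigl(dQ^*_n/dP_n\bigr)$ grows like $\frac{1}{2}\log n$ along typical paths. A sketch using the deterministic "typical" sample $k = np^*$, with $p^* = 0.3$ and hypothetical horizons:

```python
import numpy as np
from scipy.special import betaln

p_star = 0.3

def log_ratio(n):
    # log(dQ*_n / dP_n) at a sample with k = n * p_star successes,
    # where P_n is the uniform-prior Bernoulli mixture with mass B(k+1, n-k+1)
    k = int(round(n * p_star))
    log_Qn = k * np.log(p_star) + (n - k) * np.log(1 - p_star)
    log_Pn = betaln(k + 1, n - k + 1)   # betaln(1, 1) = 0 for the prior
    return log_Qn - log_Pn

for n in [100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>6}:  log(dQ*_n/dP_n) = {log_ratio(n):.3f}")
```

Each tenfold increase in $n$ adds roughly $\frac{1}{2}\log 10 \approx 1.15$ to the log ratio, so $dQ^*_n/dP_n \to \infty$ and no integrable global density $dQ^*/dP$ can exist.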
+
+Global absolute continuity does hold under additional structure, for instance when the parameter space is finite or the model is sufficiently regular to admit a Doob-consistency argument.
+
+{cite:t}`DiaconisFreedman1986` study the consistency of Bayes estimates and show, among other results, that the interplay between local and global absolute continuity plays a central role in ensuring posterior convergence.
+
+When $P \perp Q^*$, there are events of probability one under $Q^*$ that have probability zero under $P$, so the agent's beliefs remain fundamentally misspecified.
+
+### Rational expectations and heterogeneous priors
+
+In macroeconomics, rational-expectations models typically impose a common prior.
+
+Blackwell–Dubins gives a dynamic justification for weaker initial agreement.
+
+If two agents start with equivalent priors and observe the same history, their conditional forecasts eventually agree on every event.
+
+{cite:t}`aumann1976`'s agreement theorem strengthens this: agents with
+a common prior cannot "agree to disagree" on posterior probabilities.
+
+Blackwell–Dubins complements Aumann by showing that equivalent priors are enough for eventual agreement.
+
+### Ergodic Markov chains
+
+For a Markov chain with transition kernel $\Pi$ and two initial
+distributions $\mu$ and $\nu$, the $n$-step distributions are $\mu\Pi^n$
+and $\nu\Pi^n$.
+
+If $\Pi$ is ergodic with unique stationary distribution
+$\pi$, both converge to $\pi$, so
+
+$$
+\|\mu\Pi^n - \nu\Pi^n\|_{\mathrm{TV}}
+\leq \|\mu\Pi^n - \pi\|_{\mathrm{TV}} + \|\nu\Pi^n - \pi\|_{\mathrm{TV}}
+\to 0.
+$$
+
+This is a special form of merging that does *not* require absolute continuity, because ergodicity already forces both distributions to the same limit.
+
+Blackwell–Dubins is the right analogue for non-ergodic or non-Markovian environments, where no single invariant distribution need exist.
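A two-state chain with illustrative transition probabilities makes the geometric rate explicit: the gap $(\mu - \nu)\Pi^n$ contracts by the modulus of the second eigenvalue of $\Pi$ at every step:

```python
import numpy as np

# An illustrative two-state chain; its second eigenvalue is 0.9 + 0.8 - 1 = 0.7
Π = np.array([[0.9, 0.1],
              [0.2, 0.8]])

μ = np.array([1.0, 0.0])
ν = np.array([0.0, 1.0])

tv = []
for n in range(11):
    tv.append(0.5 * np.abs(μ - ν).sum())   # TV between the n-step laws
    μ, ν = μ @ Π, ν @ Π

tv = np.array(tv)
print(tv)                # ≈ 0.7 ** n for this chain
print(tv[1:] / tv[:-1])  # constant contraction factor ≈ 0.7
```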
+
+
+## The rate of merging
+
+Blackwell–Dubins is qualitative.
+
+It tells us that $d_n \to 0$, but not how fast.
+
+The bound
+
+$$
+\mathbb{E}_Q[d_n] = \tfrac{1}{2}\,\mathbb{E}_P[|Z_n - Z_\infty|]
+$$
+
+shows that the rate of merging is controlled by the $L^1(P)$ convergence rate of the likelihood-ratio martingale.
+
+In regular parametric examples, one often sees $n^{-1/2}$-type behavior.
+
+The next figure checks that heuristic in the Beta–Bernoulli model.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: |
+ A log-log plot of the average merging distance in the Beta–Bernoulli model.
+ The fitted slope is close to $-1/2$, which is consistent with square-root decay in this experiment.
+ name: fig-merging-of-opinions-rate
+---
+N_paths_rate = 200
+n_steps_rate = 800
+
+tv_rate = np.empty((N_paths_rate, n_steps_rate + 1))
+for i in range(N_paths_rate):
+ s = run_simulation(p_true, a1, b1, a2, b2, n_steps_rate, seed=100 + i)
+ tv_rate[i] = s['tv_beta']
+
+ns_rate = np.arange(1, n_steps_rate + 1)
+mean_tv = tv_rate[:, 1:].mean(axis=0) # mean d_n, n = 1, ..., n_steps_rate
+
+# Fit a reference line d_n ~ C / sqrt(n) using the later part of the sample
+fit_start = 200
+log_ns = np.log(ns_rate[fit_start:])
+log_tv = np.log(mean_tv[fit_start:] + 1e-12)
+coeffs = np.polyfit(log_ns, log_tv, 1)
+slope = coeffs[0]
+
+# Reference curve C/sqrt(n)
+C_ref = np.exp(coeffs[1])
+ref_curve = C_ref / np.sqrt(ns_rate)
+
+fig, ax = plt.subplots(figsize=(8, 4))
+ax.loglog(ns_rate, mean_tv, color='steelblue', lw=2,
+ label=r'$\mathbb{E}_Q[d_n]$ (Monte Carlo)')
+ax.loglog(ns_rate, ref_curve, color='firebrick', lw=2, ls='--',
+ label=(rf'Reference $C/\sqrt{{n}}$'
+ rf' (fitted slope $\approx {slope:.2f}$)'))
+ax.set_xlabel('sample size $n$')
+ax.set_ylabel(r'$\mathbb{E}_Q[d_n]$')
+ax.legend()
+plt.tight_layout()
+plt.show()
+
+print(f"Fitted log-log slope: {slope:.3f} (predicted: -0.50)")
+```
+
+Fitting the later part of the sample gives a slope close to $-0.5$.
+
+That is consistent with $n^{-1/2}$ scaling in this simulation.
+
+
+## Summary and extensions
+
+The logical flow underlying the Blackwell–Dubins theorem is:
+
+$$
+Q \ll P
+\;\Longrightarrow\;
+Z = \frac{dQ}{dP} \in L^1(P)
+\;\Longrightarrow\;
+Z_n = \mathbb{E}_P[Z \,|\, \mathscr{F}_n]
+\xrightarrow{L^1(P)}
+Z_\infty
+\;\Longrightarrow\;
+d_n \xrightarrow{Q\text{-a.s.}} 0.
+$$
+
+Takeaways:
+
+1. One-sided absolute continuity $Q \ll P$ gives merging $Q$-almost surely. For merging under *both* measures, one needs mutual absolute continuity $P \sim Q$.
+
+2. The likelihood-ratio martingale $Z_n = \mathbb{E}_P[Z|\mathscr{F}_n]$ and its $L^1(P)$ convergence drive the result.
+
+3. More data can only reduce, in expectation, the total-variation distance between the two agents' predictions: $\{d_n\}$ is a non-negative supermartingale.
+
+4. For infinite product measures, Kakutani's theorem gives a sharp equivalence-versus-singularity dichotomy: either $P \sim Q$ (when $\sum_n (1 - \rho_n) < \infty$) or $P \perp Q$ (when the sum diverges), with no intermediate case.
+
+5. When $P \sim Q$, Blackwell–Dubins applies and merging occurs under both measures; when $P \perp Q$, disagreement persists forever.
+
+### Applications in economics
+
+Some influential applications and extensions are:
+
+- {cite}`KalaiLehrer1993Nash`: repeated-game learning drives play toward Nash behavior when priors are absolutely continuous with respect to the truth.
+- {cite}`KalaiLehrer1993Subjective`: subjective and objective equilibria coincide asymptotically under the same condition.
+- {cite}`KalaiLehrer1994Merging`: weak and strong notions of merging are introduced for environments where full total-variation convergence is too strong.
+- {cite}`KalaiLehrerSmorodinsky1999`: merging is linked to calibrated forecasting.
+- {cite}`JacksonKalaiSmorodinsky1999`: de Finetti-style representations are connected to Bayesian learning and posterior convergence.
+- {cite}`JacksonKalai1999`: social learning erodes reputational effects that rely on persistent disagreement across cohorts.
+- {cite}`Sandroni1998Nash`: near-absolute-continuity conditions are shown to suffice for Nash-type convergence in repeated games.
+- {cite}`MillerSanchirico1999`: gives an alternative proof and an economic interpretation of persistent disagreement in terms of mutually favorable bets.
+- {cite}`LehrerSmorodinsky1996Compatible`: studies broader compatibility notions beyond Blackwell--Dubins absolute continuity.
+- {cite}`LehrerSmorodinsky1996Learning`: surveys merging and learning in repeated strategic environments.
+- {cite}`Nyarko1994`: relates Bayesian learning under absolute continuity to convergence toward correlated equilibrium.
+- {cite}`PomattoAlNajjarSandroni2014`: extends the theorem to finitely additive probabilities and connects merging to test manipulability.
+- {cite}`AcemogluChernozhukovYildiz2016`: shows how disagreement can persist when agents are uncertain about the signal structure itself.
+
+### A companion result from probability
+
+{cite}`DiaconisFreedman1986` study the consistency of Bayes estimates, proving equivalences involving posterior convergence and providing counterexamples that highlight the role of the prior.
+
+Their work is in the same intellectual tradition as Blackwell–Dubins and is routinely co-cited with the merging theorem in the economics learning literature.
diff --git a/lectures/organization_capital.md b/lectures/organization_capital.md
new file mode 100644
index 000000000..f6aa4e467
--- /dev/null
+++ b/lectures/organization_capital.md
@@ -0,0 +1,898 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+ format_version: 0.13
+ jupytext_version: 1.11.1
+kernelspec:
+ display_name: Python 3
+ language: python
+ name: python3
+---
+
+# Organization Capital
+
+```{index} single: Organization Capital
+```
+
+## Overview
+
+This lecture describes a theory of **organization capital** proposed by
+{cite:t}`Prescott_Visscher_1980`.
+
+Prescott and Visscher define organization capital as information that a firm accumulates
+about its employees, teams, and production processes.
+
+This information is an *asset* to the firm because it affects the production possibility set
+and is produced jointly with output.
+
+Costs of adjusting the stock of organization capital constrain the firm's growth rate,
+providing an explanation for
+
+1. why firm growth rates are independent of firm size (Gibrat's Law)
+1. why adjustment costs for rapid growth arise endogenously rather than being assumed
+
+The paper offers three examples of organization capital:
+
+* *Personnel information*: knowledge about the match between workers and tasks
+* *Team information*: knowledge about how well groups of workers mesh
+* *Firm-specific human capital*: skills of employees enhanced by on-the-job training
+
+In each case, the investment possibilities lead firms to grow at a common rate,
+yielding constant returns to scale together with increasing costs of rapid size adjustment.
+
+```{note}
+The theory is related to ideas of {cite:t}`Coase_1937` and {cite:t}`Williamson_1975` about the nature of the firm.
+
+Prescott and Visscher stress the firm's role as a storehouse of information and argue that
+incentives within the firm are created for efficient accumulation and use of that information.
+```
+
+Let's start with some imports:
+
+```{code-cell} ipython3
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.stats import norm
+from scipy.optimize import brentq
+```
+
+## The basic idea
+
+The firm is a storehouse of information.
+
+Within the firm, incentives are created for the efficient accumulation and use of that information.
+
+Prescott and Visscher exploit this concept to explain certain facts about firm growth and
+size distribution.
+
+The key insight: the process by which information is accumulated naturally leads to
+
+1. *constant returns to scale*, and
+2. *increasing costs to rapid firm size adjustment*
+
+Constant returns to scale explain the absence of an observed unique optimum firm size
+(see {cite:t}`Stigler_1958`).
+
+Without costs of adjustment, the pattern of investment
+by firms in the face of a change in market demand would exhibit
+discontinuities we do not observe.
+
+Further, without a cost penalty to rapid growth, the first firm to
+discover a previously untapped market would preempt competition by
+usurping all profitable investments as they appear, implying that
+monopoly would be more prevalent than it actually is.
+
+
+## Personnel information as organization capital
+
+```{index} single: Organization Capital; Personnel Information
+```
+
+The first example of organization capital is information about the
+match between workers and tasks.
+
+### Setup
+
+Workers have different sets of skills and talents.
+
+A variable $\theta$ measures the aptitude of a worker for a particular kind of work.
+
+* Workers with high $\theta$ have comparative advantage in tasks requiring repeated attention to detail
+* Workers with low $\theta$ have comparative advantage in work requiring broadly defined duties
+
+The population distribution of $\theta$ is normal with mean zero and precision (inverse of variance) $\pi$:
+
+$$
+\theta \sim N(0, 1/\pi)
+$$
+
+When a worker is hired from the labor pool, neither the worker nor the employer knows $\theta$.
+
+Both know only the population distribution.
+
+### Three tasks
+
+If $q$ units of output are produced, assume:
+
+* $\varphi_1 q$ workers are assigned to *task 1* (screening)
+* $\varphi q$ workers are assigned to *task 2*
+* the remaining workers are assigned to *task 3*
+
+where $\varphi_1 + 2\varphi = 1$.
+
+```{note}
+The fixed coefficients technology requires a constant ratio between the number of
+personnel in jobs 2 and 3 and the number assigned to job 1.
+```
+
+For task 1, the screening task, per unit cost of production is *invariant* to the $\theta$-values of the individuals assigned.
+
+However, the larger a worker's $\theta$, the larger is his product in task 2 relative to
+his product in task 3.
+
+Consequently:
+
+* a worker with a highly positive $\theta$ is much better suited for task 2
+* a worker with a highly negative $\theta$ is much better suited for task 3
+
+### Bayesian learning
+
+Performance in tasks 2 or 3 cannot be observed at the individual level.
+
+But information about a worker's $\theta$-value can be obtained from observing
+performance in task 1, the screening task.
+
+The expert supervising the apprentice determines a value of $z$ each period:
+
+$$
+z_{it} = \theta_i + \epsilon_{it}
+$$ (eq:signal)
+
+where $\epsilon_{it} \sim N(0, 1)$ are independently distributed over both workers $i$ and periods $t$.
+
+After $n$ observations on a worker in the screening job, the *posterior distribution* of $\theta$ is normal with
+
+*posterior mean:*
+
+$$
+m = \frac{1}{\pi + n} \sum_{k=1}^{n} z_k
+$$ (eq:post_mean)
+
+*posterior precision:*
+
+$$
+h = \pi + n
+$$ (eq:post_prec)
+
+Knowledge of an individual is thus completely characterized by the pair $(m, h)$.
+
+```{code-cell} ipython3
+def bayesian_update(z_observations, prior_precision):
+ """
+ Compute posterior mean and precision after observing signals.
+ """
+ n = len(z_observations)
+ h = prior_precision + n
+ m = np.sum(z_observations) / h
+ return m, h
+```
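As a sanity check on the closed-form update, we can compare it with a brute-force grid posterior, using a handful of hypothetical signals and prior precision $\pi = 2$:

```python
import numpy as np
from scipy.stats import norm

π_prior = 2.0
z = np.array([0.5, 1.2, -0.3])   # hypothetical signals

# Closed-form posterior mean and precision
n = len(z)
m_closed = z.sum() / (π_prior + n)
h_closed = π_prior + n

# Brute force: posterior ∝ prior × likelihood, evaluated on a fine grid
θ_grid = np.linspace(-5, 5, 20_001)
dθ = θ_grid[1] - θ_grid[0]
log_post = norm.logpdf(θ_grid, 0, 1 / np.sqrt(π_prior))
log_post += norm.logpdf(z[:, None], θ_grid, 1).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dθ

m_grid = (θ_grid * post).sum() * dθ
v_grid = ((θ_grid - m_grid)**2 * post).sum() * dθ

print(m_closed, m_grid)          # posterior means agree
print(1 / h_closed, v_grid)      # posterior variances agree
```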
+
+Let's visualize how the posterior evolves as we observe a worker whose true $\theta = 0.8$:
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Posterior mean convergence and uncertainty
+ name: fig-posterior-evolution
+---
+np.random.seed(0)
+
+θ_true = 0.8
+π = 1.0
+
+T = 20
+ε = np.random.randn(T)
+z_signals = θ_true + ε
+
+posterior_means = []
+posterior_stds = []
+
+for n in range(1, T + 1):
+ m, h = bayesian_update(z_signals[:n], π)
+ posterior_means.append(m)
+ posterior_stds.append(1 / np.sqrt(h))
+
+fig, axes = plt.subplots(1, 2, figsize=(12, 5))
+
+ax = axes[0]
+ax.plot(range(1, T + 1), posterior_means, '-o', markersize=4, lw=2,
+ label='Posterior mean $m$')
+ax.axhline(θ_true, color='r', linestyle='--',
+ label=fr'True $\theta = {θ_true}$')
+ax.set_xlabel('number of observations $n$')
+ax.set_ylabel('posterior mean $m$')
+ax.legend()
+
+ax = axes[1]
+ax.plot(range(1, T + 1), posterior_stds, '-o', markersize=4, lw=2,
+ label=r'Posterior std $1/\sqrt{h}$')
+ax.set_xlabel('number of observations $n$')
+ax.set_ylabel('posterior standard deviation')
+ax.legend()
+
+plt.tight_layout()
+plt.show()
+```
+
+As the number of screening observations $n$ increases, the posterior mean converges
+to the true $\theta$, and the posterior uncertainty shrinks at rate $1/\sqrt{n}$.
+
+### Per unit costs of production
+
+Under the nonsequential assignment rule, employees with the greatest seniority
+are assigned to jobs 2 and 3, while newer employees remain in the screening task.
+
+Workers with $m > 0$ are assigned to task 2, and those with $m \leq 0$ to task 3.
+
+Per unit costs of production, assuming this assignment after $n$ screening periods, are:
+
+$$
+c(n) = c_1 + c_2 + c_3 - E\{\theta \mid m > 0\} + E\{\theta \mid m \leq 0\}
+$$ (eq:unit_cost)
+
+Because $m$ is normally distributed, evaluation of the conditional expectation in
+{eq}`eq:unit_cost` yields per unit costs as a function of $n$:
+
+$$
+c(n) = c - 0.7978 \frac{n}{\pi(\pi + n)}
+$$ (eq:cost_n)
+
+where $c = c_1 + c_2 + c_3$ and $0.7978 = 2 \int_0^{\infty} \frac{t}{\sqrt{2\pi}} e^{-t^2/2} dt$.
+
+```{note}
+The constant $0.7978 \approx \sqrt{2/\pi}$ is the mean of the standard half-normal distribution.
+
+It arises from computing $E[\theta \mid m > 0] - E[\theta \mid m \leq 0]$ for a normal distribution.
+```
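The constant can be checked directly by quadrature:

```python
import numpy as np
from scipy.integrate import quad

# 2 ∫_0^∞ t φ(t) dt  =  E|X| for X ~ N(0, 1)  =  sqrt(2/π)
val, _ = quad(lambda t: 2 * t * np.exp(-t**2 / 2) / np.sqrt(2 * np.pi),
              0, np.inf)
print(val, np.sqrt(2 / np.pi))
```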
+
+The function $c(n)$ decreases at a *decreasing rate* in $n$.
+
+More screening observations reduce costs but with diminishing returns.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Per unit costs by screening time
+ name: fig-cost-screening
+---
+def cost_per_unit(n_vals, π, c_bar=1.0):
+ """
+ Per unit cost as a function of screening periods n.
+ """
+ n_vals = np.asarray(n_vals, dtype=float)
+ return c_bar - 0.7978 * n_vals / (π * (π + n_vals))
+
+
+fig, ax = plt.subplots(figsize=(10, 6))
+
+n_vals = np.linspace(0.1, 50, 200)
+
+for π in [0.5, 1.0, 2.0, 5.0]:
+ costs = cost_per_unit(n_vals, π)
+ ax.plot(n_vals, costs, lw=2, label=fr'$\pi = {π}$')
+
+ax.set_xlabel('screening periods $n$')
+ax.set_ylabel('per unit cost $c(n)$')
+ax.legend()
+ax.set_xlim(0, 50)
+plt.tight_layout()
+plt.show()
+```
+
+The figure shows that:
+
+* costs decrease with more screening time $n$
+* the decrease is at a declining rate (diminishing returns to screening)
+* for smaller prior precision $\pi$ (more initial uncertainty about worker types), the gains from screening are larger
+
+This diminishing-returns structure is the source of the *increasing costs of rapid adjustment*.
+
+
+### Growth rate and screening time
+
+The greater the growth rate, the smaller must be $n$ --- the time spent in the screening
+task before assignment to job 2 or 3.
+
+If $\gamma$ is the growth rate of output and $\rho$ is the quit rate, and $y_i$ is the current number
+of vintage $i$ employees, then
+
+$$
+(1 + \gamma) y_{i+1} = (1 - \rho) y_i
+$$
+
+Letting $\xi = (1 - \rho)/(1 + \gamma)$, from the above $y_i = \xi^i y_0$.
+
+For the fixed coefficients technology, the fraction of present personnel with vintage
+greater than $n$ must equal $2\varphi / (\varphi_1 + 2\varphi)$, which gives:
+
+$$
+\xi^{n+1} = \frac{2\varphi}{\varphi_1 + 2\varphi}
+$$ (eq:cutoff)
+
+Solving for $n$ as a function of $\gamma$:
+
+$$
+n(\gamma) = \frac{\log(2\varphi) - \log(\varphi_1 + 2\varphi)}{\log(1 - \rho) - \log(1 + \gamma)} - 1 \quad \text{for } \gamma > -\rho
+$$ (eq:n_gamma)
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Screening time vs. growth rate
+ name: fig-screening-growth
+---
+def screening_time(γ, ρ, φ1, φ):
+ """
+ Screening time n as a function of growth rate γ.
+ """
+ γ = np.asarray(γ, dtype=float)
+ numerator = np.log(2 * φ) - np.log(φ1 + 2 * φ)
+ denominator = np.log(1 - ρ) - np.log(1 + γ)
+ return numerator / denominator - 1
+
+
+ρ = 0.1
+φ1 = 0.5
+φ = 0.25
+
+γ_vals = np.linspace(-0.05, 0.30, 200)
+
+valid = γ_vals > -ρ
+γ_valid = γ_vals[valid]
+n_vals = screening_time(γ_valid, ρ, φ1, φ)
+mask = n_vals > 0
+γ_plot = γ_valid[mask]
+n_plot = n_vals[mask]
+
+fig, ax = plt.subplots(figsize=(10, 6))
+ax.plot(γ_plot, n_plot, lw=2)
+ax.set_xlabel(r'growth rate $\gamma$')
+ax.set_ylabel(r'screening periods $n(\gamma)$')
+ax.set_xlim(γ_plot[0], γ_plot[-1])
+plt.tight_layout()
+plt.show()
+```
+
+The figure shows the key trade-off: *faster growth forces shorter screening periods*.
+
+When growth is rapid, new workers must be promoted from the screening task to
+productive tasks more quickly, so less information is gathered about each worker
+before assignment.
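
As a quick sanity check, the closed form {eq}`eq:n_gamma` can be plugged back into the cutoff condition {eq}`eq:cutoff`; the parameter values below are the same illustrative ones used above:

```{code-cell} ipython3
import numpy as np

ρ, φ1, φ = 0.1, 0.5, 0.25   # quit rate and technology coefficients (illustrative)
γ = 0.05                     # growth rate

# n(γ) from the closed form
n = (np.log(2 * φ) - np.log(φ1 + 2 * φ)) / (np.log(1 - ρ) - np.log(1 + γ)) - 1

# the two sides of the cutoff condition ξ^(n+1) = 2φ / (φ1 + 2φ)
ξ = (1 - ρ) / (1 + γ)
print(ξ**(n + 1), 2 * φ / (φ1 + 2 * φ))
```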
+
+
+### Combined effect: growth rate and per unit costs
+
+Composing the functions $c(n)$ and $n(\gamma)$ reveals how per unit costs depend on the
+growth rate:
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Per unit costs vs. growth rate
+ name: fig-cost-growth
+---
+fig, ax = plt.subplots(figsize=(10, 6))
+
+π = 1.0
+c_bar = 1.0
+
+n_of_γ = screening_time(γ_plot, ρ, φ1, φ)
+costs_of_γ = cost_per_unit(n_of_γ, π, c_bar)
+
+ax.plot(γ_plot, costs_of_γ, lw=2)
+ax.set_xlabel(r'growth rate $\gamma$')
+ax.set_ylabel(r'per unit cost $c(n(\gamma))$')
+ax.set_xlim(γ_plot[0], γ_plot[-1])
+plt.tight_layout()
+plt.show()
+```
+
+This establishes the key result: *increasing costs of rapid adjustment arise endogenously*
+from the trade-off between screening and growth.
+
+The faster the firm grows, the less time it has to screen workers, the poorer the
+match between workers and tasks, and the higher the per unit production costs.
+
+
+## Industry equilibrium
+
+```{index} single: Organization Capital; Industry Equilibrium
+```
+
+Firm growth rates are independent of firm size in this model because the
+mathematical structure of the technology constraint is the same as that
+considered in {cite:t}`lucas1967adjustment`, except that the stock of organization capital
+is a vector rather than a scalar.
+
+The technology set facing price-taking firms is a **convex cone**: there are
+constant returns to scale.
+
+Constant returns and internal adjustment costs, along with some costs of
+transferring capital between firms, yield an optimum rate of firm growth
+*independent of the firm's size* --- this is Gibrat's Law.
+
+The bounded, downward-sloping, inverse industry demand function is
+
+$$
+P_t = p(Q_t, u_t)
+$$
+
+where $Q_t$ is the sum of output over all firms and $u_t$ is a demand shock
+that follows a stationary Markov process.
+
+Prescott and Visscher show that a competitive equilibrium exists using the
+framework of {cite:t}`Lucas_Prescott_1971`.
+
+The discounted consumer surplus to be maximized is
+
+$$
+\sum_{t=0}^{\infty} \beta^t \left\{ \int_0^{Q_t} p(y, u_t) \, dy - Bw - Q_t \frac{\sum_i (A_{i2t} + A_{i3t}) c(i)}{\sum_i (A_{i2t} + A_{i3t})} \right\}
+$$ (eq:surplus)
+
+where $A_{i2t}, A_{i3t}$, and $B$ are obtained by summing $a_{i2t}$, $a_{i3t}$, and $b$,
+respectively, over all firms in the industry.
+
+
+### Key property: growth rates independent of size
+
+If two firms have organization capital vectors $\underline{k}$ that are proportional at a point in time,
+they will be proportional in all future periods.
+
+That is, *growth rates are independent of firm size*.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Firm output levels and growth rates
+ name: fig-firm-growth
+---
+def simulate_firm_growth(T, γ, ρ, q0, seed=42):
+ """
+ Simulate firm output growth with stochastic shocks.
+ """
+ rng = np.random.default_rng(seed)
+ output = np.zeros(T)
+ output[0] = q0
+ for t in range(1, T):
+ shock = rng.normal(0, 0.02)
+ output[t] = output[t-1] * (1 + γ + shock)
+ return output
+
+
+T = 50
+γ_eq = 0.05
+ρ = 0.1
+
+fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+ax = axes[0]
+for q0, label in [(10, 'Small firm'), (50, 'Medium firm'),
+ (200, 'Large firm')]:
+ output = simulate_firm_growth(T, γ_eq, ρ, q0,
+ seed=int(q0))
+ ax.plot(range(T), output, lw=2, label=f'{label} ($q_0={q0}$)')
+ax.set_xlabel('period')
+ax.set_ylabel('output $q_t$')
+ax.legend()
+
+ax = axes[1]
+for q0, label in [(10, 'Small firm'), (50, 'Medium firm'),
+ (200, 'Large firm')]:
+ output = simulate_firm_growth(T, γ_eq, ρ, q0,
+ seed=int(q0))
+ ax.plot(range(T), np.log(output), lw=2,
+ label=f'{label} ($q_0={q0}$)')
+ax.set_xlabel('period')
+ax.set_ylabel(r'$\log(q_t)$')
+ax.legend()
+
+plt.tight_layout()
+plt.show()
+```
+
+The right panel shows that all firms grow at the same average rate regardless of initial size ---
+up to i.i.d. shocks, the log output paths are parallel.
+
+This is **Gibrat's Law**: growth rates are independent of firm size.
+
+## Bayesian screening simulation
+
+```{index} single: Organization Capital; Bayesian Screening
+```
+
+Let's simulate the full screening and assignment process for a single firm.
+
+We draw workers from the population, observe their signals in the screening task,
+and then assign them to the appropriate productive task based on the posterior mean.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Screening and worker assignment accuracy
+ name: fig-screening-assignment
+---
+def simulate_screening(n_workers, n_screen, π, seed=123):
+ """
+ Simulate screening and assignment of workers.
+ """
+ rng = np.random.default_rng(seed)
+
+ θ = rng.normal(0, 1/np.sqrt(π), n_workers)
+ signals = (θ[:, None]
+ + rng.normal(0, 1, (n_workers, n_screen)))
+ posterior_means = signals.sum(axis=1) / (π + n_screen)
+
+ assignment = np.where(posterior_means > 0, 2, 3)
+ correct_assignment = np.where(θ > 0, 2, 3)
+ misassignment_rate = np.mean(assignment != correct_assignment)
+
+ return {
+ 'theta': θ,
+ 'posterior_means': posterior_means,
+ 'assignment': assignment,
+ 'correct_assignment': correct_assignment,
+ 'misassignment_rate': misassignment_rate
+ }
+
+
+π = 1.0
+n_workers = 5000
+screening_periods = [1, 3, 5, 10, 20, 50]
+
+fig, axes = plt.subplots(2, 3, figsize=(15, 10))
+axes = axes.flatten()
+
+misassignment_rates = []
+
+for idx, n_screen in enumerate(screening_periods):
+ results = simulate_screening(n_workers, n_screen, π)
+ misassignment_rates.append(results['misassignment_rate'])
+
+ ax = axes[idx]
+ θ = results['theta']
+ m = results['posterior_means']
+
+ correct = results['assignment'] == results['correct_assignment']
+ ax.scatter(θ[correct], m[correct], alpha=0.1, s=5,
+ color='blue', label='Correct')
+ ax.scatter(θ[~correct], m[~correct], alpha=0.3, s=5,
+ color='red', label='Misassigned')
+ ax.axhline(0, color='k', linewidth=0.5)
+ ax.axvline(0, color='k', linewidth=0.5)
+    mis = results['misassignment_rate']
+    ax.set_title(f'$n = {n_screen}$, misassignment rate {mis:.1%}')
+    ax.set_xlabel(r'true $\theta$')
+    ax.set_ylabel('posterior mean $m$')
+ if idx == 0:
+ ax.legend(markerscale=5, loc='upper left')
+
+plt.tight_layout()
+plt.show()
+```
+
+Red dots are workers who are *misassigned* --- placed in the wrong productive task
+because the posterior mean had the wrong sign relative to their true $\theta$.
+
+As $n$ increases:
+* The posterior mean $m$ becomes more strongly correlated with $\theta$
+* Misassignment rates fall
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Misassignment rate by screening time
+ name: fig-misassignment-rate
+---
+fig, ax = plt.subplots(figsize=(10, 6))
+
+n_range = np.arange(1, 51)
+mis_rates = []
+for n_screen in n_range:
+ results = simulate_screening(n_workers, n_screen, π)
+ mis_rates.append(results['misassignment_rate'])
+
+ax.plot(n_range, mis_rates, '-o', markersize=3, lw=2)
+ax.set_xlabel('screening periods $n$')
+ax.set_ylabel('misassignment rate')
+plt.tight_layout()
+plt.show()
+```
+
+This confirms the theoretical prediction: the cost savings from better assignment
+exhibit *diminishing returns* in the screening time $n$.
+
+## Team information
+
+```{index} single: Organization Capital; Team Information
+```
+
+Personnel information need not be valuable only because it facilitates the matching of
+workers to tasks.
+
+Another equally valuable use of personnel information is in the *matching of workers to workers*.
+
+What is important to performance in many activities within the firm is not just
+the aptitude of an individual assigned to a task, but also how well the
+characteristics of the individual mesh with those of others performing related duties.
+
+### Structure
+
+Suppose workers are grouped into teams, and team $i$ assigned to a screening task
+has an observed productivity indicator
+
+$$
+z_{it} = \theta_i + \epsilon_{it}
+$$
+
+where:
+* $\theta_i$ is a deterministic component directly related to how well team workers are paired
+* $\epsilon_{it} \sim N(0, 1)$ are i.i.d. stochastic components
+
+The $\theta_i$ across all possible teams are approximately independent and normally distributed
+$N(\mu, 1/\pi)$.
+
+After $n$ observations on team $i$, the posterior distribution on $\theta_i$ is normal with
+
+$$
+m = \mu + \frac{1}{\pi + n} \sum_{k=1}^{n} (z_k - \mu)
+$$
+
+and precision $h = \pi + n$.
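
The posterior mean above is the standard normal-normal conjugate update written in deviation form; a quick check with made-up (illustrative) numbers confirms the algebra:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(2)
μ, π, n = 0.5, 1.0, 7        # prior mean, prior precision, observations (illustrative)
θ = 1.2                      # a hypothetical true team quality
z = θ + rng.normal(0.0, 1.0, n)   # unit-variance signals

# posterior mean in the deviation form used in the text
m_text = μ + (z - μ).sum() / (π + n)

# equivalent precision-weighted average from the conjugate normal-normal update
m_conj = (π * μ + n * z.mean()) / (π + n)

print(m_text, m_conj)   # identical up to floating point
```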
+
+If dissolution of a team also dissolves the accrued information, the team information
+model has the *same mathematical structure* as the personnel information model.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Team quality estimates by screening periods
+ name: fig-team-screening
+---
+def simulate_team_screening(n_teams, n_screen, π, μ=0.5,
+ seed=456):
+ """
+ Simulate team screening with Bayesian updating.
+ """
+ rng = np.random.default_rng(seed)
+
+ θ = rng.normal(μ, 1/np.sqrt(π), n_teams)
+ signals = (θ[:, None]
+ + rng.normal(0, 1, (n_teams, n_screen)))
+ z_bar = signals.mean(axis=1)
+ post_means = μ + n_screen * (z_bar - μ) / (π + n_screen)
+ post_prec = π + n_screen
+
+ return {
+ 'theta': θ,
+ 'posterior_means': post_means,
+ 'posterior_precision': post_prec
+ }
+
+
+fig, axes = plt.subplots(1, 3, figsize=(15, 5))
+
+for idx, n_screen in enumerate([1, 5, 20]):
+ results = simulate_team_screening(500, n_screen, π=1.0, μ=0.5)
+
+ ax = axes[idx]
+ ax.scatter(results['theta'], results['posterior_means'],
+ alpha=0.4, s=10)
+ lims = [-1.5, 2.5]
+ ax.plot(lims, lims, 'r--', alpha=0.5, lw=2, label='45° line')
+ ax.set_xlabel(r'true team quality $\theta$')
+ ax.set_ylabel('posterior mean $m$')
+ ax.set_xlim(lims)
+ ax.set_ylim(lims)
+ ax.legend()
+ ax.set_aspect('equal')
+
+plt.tight_layout()
+plt.show()
+```
+
+As with individual screening, more observations improve the precision of team quality
+estimates.
+
+Rapid growth forces fewer observations before team assignments must be finalized, leading
+to higher costs.
+
+
+## Firm-specific human capital
+
+```{index} single: Organization Capital; Human Capital
+```
+
+In the third example, organization capital consists of the **human capital** of the firm's employees.
+
+The capacity of the organization to function effectively as a production unit is
+determined largely by the level and meshing of the skills of the employees.
+
+```{note}
+The case for the human capital of employees being part of the capital stock of the firm
+is well established (see {cite:t}`Becker_1975`).
+
+Productivity in the future depends on levels of human capital in the future, but to acquire
+human capital for the future, a sacrifice in real resources is required in the present.
+```
+
+The key features are:
+
+* Output and skill enhancement are **joint products** resulting from the combination of
+ labor inputs possessing different skill levels
+
+* Experienced and inexperienced workers are combined in one of several available technical
+ processes to generate the firm's product, and in the process, the overall competence
+ of the work force is improved
+
+* The transformation frontier between current output and future human capital is
+ *concave* and linearly homogeneous
+
+This gives the technology set the structure of a closed convex cone with a vertex at the
+origin --- sufficient for optimal proportional growth by firms.
+
+### Concave transformation frontier
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Concave transformation frontier
+ name: fig-transformation-frontier
+---
+def transformation_frontier(q, α=2.0):
+    """
+    Transformation frontier q**α + h**α = 1 between current output
+    and future human capital; concave for α >= 1.
+    """
+    q = np.asarray(q, dtype=float)
+    return (1 - q**α)**(1/α)
+
+
+fig, ax = plt.subplots(figsize=(8, 8))
+
+q_vals = np.linspace(0, 1, 200)
+
+for α in [1.0, 1.5, 2.0, 3.0]:
+ hk = transformation_frontier(q_vals, α)
+ ax.plot(q_vals, hk,
+ label=fr'$\alpha = {α}$', lw=2)
+
+ax.set_xlabel('current output $q$ (fraction of capacity)')
+ax.set_ylabel('future human capital increment $\\Delta h$')
+ax.legend()
+ax.set_xlim(0, 1.05)
+ax.set_ylim(0, 1.05)
+ax.set_aspect('equal')
+plt.tight_layout()
+plt.show()
+```
+
+The concavity of the transformation frontier means that moving from an extremely
+unbalanced bundle of production and learning activity to a more balanced bundle
+entails little sacrifice.
+
+But a workday consisting primarily of learning also has diminishing returns,
+creating the cost of rapid adjustment.
+
+
+## Costs of transferring organization capital
+
+```{index} single: Organization Capital; Transfer Costs
+```
+
+If there were no cost to transferring organization capital from one firm to another,
+the model would not place constraints on the firm's growth rate.
+
+Firms could then merge, divest, or pirate each other's personnel without a cost penalty
+and thus produce a pattern of growth not restricted by the model.
+
+Organization capital is *not* costlessly moved, however:
+
+1. *Moving is disruptive*: relocating from one locale to another is disruptive to both
+ employee and family
+
+2. *Information is firm-specific*: the information set that makes a person productive
+ in one organization may not make that person as productive in another, even if both
+ firms produce identical output
+
+ * Facility with a computer system at one firm
+ * Knowing whom to ask when problems arise
+ * Rapport with buyers or sellers
+
+These are types of organization capital in one firm that *cannot be transferred costlessly*
+to another.
+
+
+## Summary and implications
+
+The Prescott-Visscher model provides a unified framework in which:
+
+* The firm exists as an entity because it is an efficient structure for accumulating,
+ storing, and using information
+
+* *Constant returns to scale* arise because once the best combinations of worker types
+ are discovered, nothing prevents the firm from replicating those combinations with
+ proportional gains in product
+
+* *Increasing adjustment costs* arise endogenously from the trade-off between
+ current production and investment in organization capital
+
+* *Gibrat's Law* --- growth rates independent of firm size --- is a natural implication
+
+* Large firms should have growth rates that display *less variance* than small firms
+ because large firms are essentially portfolios of smaller production units
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Growth rate distributions by firm size
+ name: fig-growth-rate-dist
+---
+def simulate_growth_rate_distribution(n_firms, n_subunits, γ,
+ σ, T=100, seed=789):
+ """
+ Simulate growth rate distributions for firms of different sizes.
+ """
+ rng = np.random.default_rng(seed)
+ subunit_growth = rng.normal(γ, σ,
+ (n_firms, n_subunits, T))
+ firm_growth = subunit_growth.mean(axis=1)
+ return firm_growth.mean(axis=1)
+
+
+fig, ax = plt.subplots(figsize=(10, 6))
+
+sizes = {'Small (1 unit)': 1,
+ 'Medium (5 units)': 5,
+ 'Large (20 units)': 20}
+
+γ = 0.05
+σ = 0.10
+
+for label, n_sub in sizes.items():
+ rates = simulate_growth_rate_distribution(
+ 2000, n_sub, γ, σ)
+ ax.hist(rates, bins=50, alpha=0.5, density=True,
+ label=f'{label}: std={rates.std():.4f}')
+
+ax.axvline(γ, color='k', linestyle='--',
+           label=r'$\gamma$', alpha=0.5)
+ax.set_xlabel('average growth rate')
+ax.set_ylabel('density')
+ax.legend()
+plt.tight_layout()
+plt.show()
+```
+
+The figure shows that although all firms have the *same mean growth rate* (Gibrat's Law),
+large firms display *less variance* in realized growth rates because they are effectively
+portfolios of independent subunits.
+
+This is consistent with the empirical findings of {cite:t}`Mansfield_1962` and {cite:t}`Hymer_Pashigian_1962`.
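
The shrinking dispersion is the usual portfolio-variance scaling: averaging $N$ independent subunit shocks over $T$ periods cuts the standard deviation of the firm-level growth rate by $\sqrt{NT}$. A quick check, using the same illustrative parameters as above:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(4)
γ, σ, T = 0.05, 0.10, 100

for n_sub in [1, 5, 20]:
    rates = rng.normal(γ, σ, (2000, n_sub, T)).mean(axis=(1, 2))
    # empirical std vs the analytic value σ / sqrt(n_sub * T)
    print(n_sub, rates.std(), σ / np.sqrt(n_sub * T))
```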
+
+The essence of the Prescott-Visscher theory is that the nature of the firm is tied to
+*organization capital*.
+
+What distinguishes the firm from other relationships is that it is a structure within which
+agents have the incentive to acquire and reveal information in a manner that is less
+costly than in possible alternative institutions.
+
diff --git a/lectures/rational_expectations.md b/lectures/rational_expectations.md
index c153ca79b..eaade4f14 100644
--- a/lectures/rational_expectations.md
+++ b/lectures/rational_expectations.md
@@ -42,7 +42,7 @@ tags: [hide-output]
This lecture introduces the concept of a *rational expectations equilibrium*.
To illustrate it, we describe a linear quadratic version of a model
-due to Lucas and Prescott {cite}`LucasPrescott1971`.
+due to Lucas and Prescott {cite}`Lucas_Prescott_1971`.
That 1971 paper is one of a small number of research articles that ignited a *rational expectations revolution*.
@@ -203,7 +203,7 @@ This type of outcome provides an intellectual justification for liking a competi
References for this lecture include
-* {cite}`LucasPrescott1971`
+* {cite}`Lucas_Prescott_1971`
* {cite}`Sargent1987`, chapter XIV
* {cite}`Ljungqvist2012`, chapter 7
@@ -439,7 +439,7 @@ Fortunately, another method works here.
The method exploits a connection between equilibrium and Pareto optimality expressed in
the fundamental theorems of welfare economics (see, e.g, {cite}`MCWG1995`).
-Lucas and Prescott {cite}`LucasPrescott1971` used this method to construct a rational expectations equilibrium.
+Lucas and Prescott {cite}`Lucas_Prescott_1971` used this method to construct a rational expectations equilibrium.
Some details follow.
diff --git a/lectures/survival_recursive_preferences.md b/lectures/survival_recursive_preferences.md
new file mode 100644
index 000000000..6d1053307
--- /dev/null
+++ b/lectures/survival_recursive_preferences.md
@@ -0,0 +1,1248 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+ format_version: 0.13
+ jupytext_version: 1.11.1
+kernelspec:
+ display_name: Python 3
+ language: python
+ name: python3
+---
+
+(survival_recursive_preferences)=
+```{raw} jupyter
+
+```
+
+# Survival and Long-Run Dynamics under Recursive Preferences
+
+```{index} single: Survival; Recursive Preferences
+```
+
+```{contents} Contents
+:depth: 2
+```
+
+## Overview
+
+This lecture studies the theory of long-run survival in {cite:t}`Borovicka2020`.
+
+The classical **market selection hypothesis** says that agents with less accurate beliefs are driven out of the market in the long run.
+
+This result was established rigorously by {cite:t}`Sandroni2000Markets` and {cite:t}`Blume_Easley2006` for economies with separable CRRA preferences.
+
+{cite:t}`Borovicka2020` shows that the conclusion can fail under Epstein-Zin recursive preferences.
+
+With recursive preferences, agents with distorted beliefs can survive and can even dominate.
+
+The key mechanism is that recursive preferences separate risk aversion from the intertemporal elasticity of substitution.
+
+That separation creates three channels that matter for survival:
+
+1. The *risk premium channel* rewards the more optimistic agent for holding more of the risky asset.
+1. The *speculative volatility channel* penalizes aggressive positions through log-return volatility.
+1. The *saving channel* changes consumption and saving decisions when the IES differs from one.
+
+Under separable preferences, only the first two channels remain.
+
+Under recursive preferences, the saving channel can overturn market selection.
+
+```{note}
+The paper builds on the continuous-time recursive utility formulation of {cite:t}`Duffie_Epstein1992a`,
+using the planner's problem approach of {cite:t}`Dumas_Uppal_Wang2000`.
+
+Important foundations for the market selection hypothesis were laid by
+{cite:t}`DeLong_etal1991` and {cite:t}`Blume_Easley1992`.
+```
+
+We start with some imports.
+
+```{code-cell} ipython3
+import numpy as np
+import matplotlib.pyplot as plt
+```
+
+## Environment
+
+The economy contains two infinitely lived agents, indexed by $n \in \{1, 2\}$.
+
+The agents have identical recursive preferences but different beliefs about aggregate endowment growth.
+
+We write Borovička's belief distortions $u^n$ as $\omega^n$.
+
+### Aggregate endowment
+
+Under the true probability measure $P$, aggregate endowment satisfies
+
+$$
+d \log Y_t = \mu_Y dt + \sigma_Y dW_t, \quad Y_0 > 0
+$$ (eq:srp_endowment)
+
+where $W$ is a standard Brownian motion, $\mu_Y$ is the drift, and $\sigma_Y > 0$ is the volatility.
+
+### Heterogeneous beliefs
+
+Agent $n$ believes that the drift is $\mu_Y + \omega^n \sigma_Y$ instead of $\mu_Y$.
+
+The parameter $\omega^n$ measures optimism when $\omega^n > 0$ and pessimism when $\omega^n < 0$.
+
+Agent $n$'s subjective probability measure $Q^n$ is defined by the Radon–Nikodym derivative
+
+$$
+M_t^n = \frac{dQ^n}{dP}\bigg|_t = \exp\left(-\frac{1}{2} |\omega^n|^2 t + \omega^n W_t\right)
+$$ (eq:radon_nikodym)
+
+Under $Q^n$, the process $W_t^n = W_t - \omega^n t$ is a Brownian motion, and agent $n$ perceives
+
+$$
+d \log Y_t = (\mu_Y + \omega^n \sigma_Y) dt + \sigma_Y dW_t^n .
+$$
+
+An agent with $\omega^n > 0$ is optimistic about endowment growth, while an agent with $\omega^n < 0$ is pessimistic.
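
Two defining properties of the change of measure {eq}`eq:radon_nikodym` can be checked by Monte Carlo: $M_t^n$ has unit expectation under $P$, and reweighting by $M_t^n$ shifts the mean of $W_t$ to $\omega^n t$. The parameter values below are illustrative:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(0)
t, ω = 1.0, 0.3                      # horizon and belief distortion (illustrative)
W_t = rng.normal(0.0, np.sqrt(t), 500_000)

# Radon-Nikodym derivative evaluated path by path
M_t = np.exp(-0.5 * ω**2 * t + ω * W_t)

print(M_t.mean())          # ≈ 1: M is a positive P-martingale
print((M_t * W_t).mean())  # ≈ ω t: the mean of W_t under Q^n
```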
+
+### Recursive preferences
+
+Both agents have Epstein-Zin recursive preferences.
+
+We use $\gamma > 0$ for relative risk aversion, $\rho > 0$ for the inverse of the IES, and $\beta > 0$ for the time-preference rate.
+
+The Duffie-Epstein-Zin felicity function is
+
+$$
+F(C, \nu)
+= \beta \frac{C^{1-\gamma}}{1-\gamma}
+\cdot
+\left(\frac{(1-\gamma) - (1-\rho)\nu / \beta}{\rho - \gamma}\right)^{(\gamma - \rho)/(1-\rho)}
+$$ (eq:felicity)
+
+where $\nu$ is the endogenous discount rate.
+
+```{note}
+In discrete time, Epstein-Zin preferences aggregate current consumption with a certainty equivalent of future utility via a CES aggregator (see {doc}`doubts_or_variability`).
+
+In continuous time there is no "next-period $V_{t+1}$," so {cite:t}`Duffie_Epstein1992a` recast the recursion as a felicity function $F(C,\nu)$ that depends on the agent's own continuation-value rate $\nu$.
+
+The two formulations encode the same separation of risk aversion $\gamma$ from the inverse IES $\rho$.
+
+When $\gamma = \rho$, preferences reduce to the standard separable CRRA case.
+```
+
+## Planner's problem
+
+Following {cite:t}`Dumas_Uppal_Wang2000`, we study equilibrium allocations through a social planner's problem.
+
+The planner chooses consumption shares $z^1$ and $z^2 = 1 - z^1$ and discount-rate processes $\nu^n$ for the two agents.
+
+### Modified discount factors
+
+It is convenient to absorb belief distortions into the modified discount factors $\tilde{\lambda}^n = \lambda^n M^n$, where $M^n$ is the Radon-Nikodym derivative {eq}`eq:radon_nikodym`.
+
+These processes satisfy
+
+$$
+d \log \tilde{\lambda}_t^n
+= -\left(\nu_t^n + \frac{1}{2} (\omega^n)^2\right) dt + \omega^n dW_t .
+$$ (eq:modified_discount)
+
+```{exercise}
+:label: ex_modified_discount
+
+Derive {eq}`eq:modified_discount`.
+
+*Hint:* Use $\log \tilde{\lambda}^n = \log \lambda^n + \log M^n$. The Pareto weight $\lambda^n$ evolves as $d\log \lambda_t^n = -\nu_t^n \, dt$, and $\log M_t^n$ is given by {eq}`eq:radon_nikodym`.
+```
+
+```{solution-start} ex_modified_discount
+:class: dropdown
+```
+
+From the definition $\tilde{\lambda}^n = \lambda^n M^n$, we have
+
+$$
+\log \tilde{\lambda}_t^n = \log \lambda_t^n + \log M_t^n.
+$$
+
+The Pareto weight satisfies $d\log \lambda_t^n = -\nu_t^n \, dt$.
+
+From {eq}`eq:radon_nikodym`, $\log M_t^n = -\frac{1}{2}|\omega^n|^2 t + \omega^n W_t$, so
+
+$$
+d \log M_t^n = -\tfrac{1}{2}(\omega^n)^2 \, dt + \omega^n \, dW_t.
+$$
+
+Adding the two:
+
+$$
+d \log \tilde{\lambda}_t^n = -\nu_t^n \, dt - \tfrac{1}{2}(\omega^n)^2 \, dt + \omega^n \, dW_t = -\left(\nu_t^n + \tfrac{1}{2}(\omega^n)^2\right) dt + \omega^n \, dW_t.
+$$
+
+```{solution-end}
+```
+
+### State variable: Pareto share
+
+The key state variable is the Pareto share of agent 1:
+
+$$
+\upsilon = \frac{\tilde{\lambda}^1}{\tilde{\lambda}^1 + \tilde{\lambda}^2} \in (0, 1)
+$$ (eq:pareto_share)
+
+It captures the relative weight of agent 1 in the planner's allocation.
+
+Define the log-odds ratio $\vartheta = \log(\upsilon / (1 - \upsilon))$.
+
+Its dynamics are
+
+$$
+d\vartheta_t = \underbrace{\left[\nu_t^2 + \frac{1}{2}(\omega^2)^2 - \nu_t^1 - \frac{1}{2}(\omega^1)^2\right]}_{m_{\vartheta}(\upsilon_t)} dt + (\omega^1 - \omega^2) dW_t
+$$ (eq:log_odds)
+
+The drift $m_\vartheta(\upsilon)$ determines the long-run behavior of the Pareto share.
+
+```{exercise}
+:label: ex_log_odds
+
+Derive {eq}`eq:log_odds` from {eq}`eq:modified_discount` and the definition $\vartheta = \log(\upsilon/(1-\upsilon))$.
+
+*Hint:* First show that $\vartheta = \log \tilde{\lambda}^1 - \log \tilde{\lambda}^2$, then subtract the two SDEs.
+```
+
+```{solution-start} ex_log_odds
+:class: dropdown
+```
+
+Since $\upsilon = \tilde{\lambda}^1 / (\tilde{\lambda}^1 + \tilde{\lambda}^2)$, we have $1 - \upsilon = \tilde{\lambda}^2 / (\tilde{\lambda}^1 + \tilde{\lambda}^2)$, so
+
+$$
+\vartheta = \log\frac{\upsilon}{1-\upsilon} = \log \tilde{\lambda}^1 - \log \tilde{\lambda}^2.
+$$
+
+From {eq}`eq:modified_discount`, the two log-discount-factor SDEs are
+
+$$
+d\log \tilde{\lambda}^1_t = -\left(\nu_t^1 + \tfrac{1}{2}(\omega^1)^2\right)dt + \omega^1 dW_t,
+$$
+
+$$
+d\log \tilde{\lambda}^2_t = -\left(\nu_t^2 + \tfrac{1}{2}(\omega^2)^2\right)dt + \omega^2 dW_t.
+$$
+
+Subtracting the second from the first:
+
+$$
+d\vartheta_t = \left[\nu_t^2 + \tfrac{1}{2}(\omega^2)^2 - \nu_t^1 - \tfrac{1}{2}(\omega^1)^2\right]dt + (\omega^1 - \omega^2)dW_t.
+$$
+
+```{solution-end}
+```
+
+### HJB equation
+
+Homotheticity reduces the planner's problem to a nonlinear ODE in the single state variable $\upsilon$.
+
+Because each agent's utility is homogeneous of degree $1-\gamma$ in consumption, the planner's value function factors as $J(\upsilon, Y) = \tilde{J}(\upsilon) \cdot Y^{1-\gamma}/(1-\gamma)$, eliminating $Y$ as a state variable.
+
+#### From discrete to continuous time
+
+In discrete time, a planner maximizes a weighted sum of agents' utilities by choosing allocations at each date.
+
+The Bellman equation is
+
+$$
+\tilde{J}(\upsilon) = \max_{z^1, z^2} \left\{ \upsilon \, u(z^1) + (1-\upsilon) \, u(z^2) + \beta \, \mathbb{E}\left[\tilde{J}(\upsilon')\right] \right\}.
+$$
+
+In continuous time, the period length shrinks to $dt$.
+
+The "flow payoff" over $[t, t+dt)$ becomes $\left[\upsilon F(z^1, \nu^1) + (1-\upsilon)F(z^2, \nu^2)\right] dt$, where $F$ is the Duffie-Epstein-Zin felicity {eq}`eq:felicity`.
+
+The expected change in the value function over $dt$ is captured by the **infinitesimal generator** $\mathcal{L}$.
+
+For a diffusion $d\upsilon = m \, dt + s \, dW$, Itô's lemma gives
+
+$$
+\mathcal{L}\tilde{J}(\upsilon) = m(\upsilon)\,\tilde{J}'(\upsilon) + \tfrac{1}{2} s(\upsilon)^2 \, \tilde{J}''(\upsilon),
+$$
+
+where $m$ and $s$ are the drift and diffusion of the Pareto share.
+
+This is the continuous-time analogue of $\beta \, \mathbb{E}[\tilde{J}(\upsilon')] - \tilde{J}(\upsilon)$: it measures how the value function drifts and fluctuates as $\upsilon$ evolves.
+
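Before assembling the HJB equation, the generator formula itself can be checked numerically: for a smooth test function, the Monte Carlo estimate of $(\mathbb{E}[\tilde{J}(\upsilon_{t+dt})] - \tilde{J}(\upsilon_t))/dt$ should match $m \tilde{J}' + \tfrac{1}{2}s^2 \tilde{J}''$. Here is a minimal sketch with constant, purely illustrative drift and diffusion and test function $J(x) = \sin x$:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(3)
m, s = 0.1, 0.3                  # constant drift and diffusion (illustrative)
x0, dt = 0.7, 1e-3
n_draws = 2_000_000

# one Euler step of dx = m dt + s dW starting from x0
x1 = x0 + m * dt + s * np.sqrt(dt) * rng.normal(size=n_draws)

# Monte Carlo estimate of (E[J(x_{t+dt})] - J(x_t)) / dt for J = sin
J = np.sin
mc = (J(x1).mean() - J(x0)) / dt

# generator applied to J: m J'(x0) + (1/2) s^2 J''(x0)
exact = m * np.cos(x0) - 0.5 * s**2 * np.sin(x0)

print(mc, exact)
```
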
+Setting flow payoff plus expected capital gain equal to zero gives the schematic HJB equation:
+
+$$
+0 = \sup_{(z^1,z^2,\nu^1,\nu^2)} \left\{ \upsilon F(z^1, \nu^1) + (1-\upsilon) F(z^2, \nu^2) + \mathcal{L} \tilde{J}(\upsilon) \right\}
+$$ (eq:hjb_sketch)
+
+subject to $z^1 + z^2 \leq 1$.
+
+#### Exact reduced ODE
+
+Proposition 2.3 of {cite:t}`Borovicka2020` gives the exact HJB equation after substituting the homogeneity reduction $J(\tilde{\lambda}, Y) = (\tilde{\lambda}^1 + \tilde{\lambda}^2) Y^{1-\gamma} \tilde{J}(\upsilon)$ and the dynamics of $\upsilon$ and $Y$:
+
+$$
+0 = \sup_{(z^1, z^2, \nu^1, \nu^2)} \;
+\upsilon \, F(z^1, \nu^1) + (1 - \upsilon) \, F(z^2, \nu^2)
+$$ (eq:hjb)
+
+$$
++ \left[
+-\upsilon \nu^1 - (1-\upsilon)\nu^2
++ \bigl(\upsilon \omega^1 + (1-\upsilon)\omega^2\bigr)(1-\gamma)\sigma_Y
++ (1-\gamma)\mu_Y
++ \tfrac{1}{2}(1-\gamma)^2 \sigma_Y^2
+\right] \tilde{J}(\upsilon)
+$$
+
+$$
++ \upsilon(1-\upsilon)
+\left[\nu^2 - \nu^1 + (\omega^1 - \omega^2)(1-\gamma)\sigma_Y\right]
+\tilde{J}'(\upsilon)
+$$
+
+$$
++ \tfrac{1}{2}\upsilon^2(1-\upsilon)^2 (\omega^1 - \omega^2)^2 \,
+\tilde{J}''(\upsilon)
+$$
+
+subject to $z^1 + z^2 \leq 1$.
+
+The first line is the flow payoff from the two agents' felicity functions.
+
+The second line multiplies $\tilde{J}(\upsilon)$ by a term that combines the agents' discount rates, belief-weighted endowment drift, and a variance correction --- these arise from absorbing the $Y^{1-\gamma}$ factor via Itô's lemma.
+
+The third line multiplies $\tilde{J}'(\upsilon)$ by the drift of the Pareto share, which depends on the difference in discount rates and the belief-weighted response to endowment risk.
+
+The fourth line multiplies $\tilde{J}''(\upsilon)$ by the squared diffusion of the Pareto share.
+
+The boundary conditions are $\tilde{J}(0) = \tilde{V}^2$ and $\tilde{J}(1) = \tilde{V}^1$, where $\tilde{V}^n$ is the continuation value in the homogeneous economy populated by agent $n$ alone.
+
+This is the continuous-time counterpart of the discrete-time planner's problem in {cite:t}`Blume_Easley2006` (see also {doc}`likelihood_ratio_process_2`).
+
+
+## Survival conditions
+
+The central result characterizes survival by the boundary behavior of $m_\vartheta(\upsilon)$.
+
+```{prf:proposition}
+:label: survival_conditions
+
+Define the following repelling conditions (i) and (ii) and their attracting
+counterparts (i') and (ii'):
+
+$$
+\text{(i)} \lim_{\upsilon \searrow 0} m_\vartheta(\upsilon) > 0, \qquad
+\text{(i')} \lim_{\upsilon \searrow 0} m_\vartheta(\upsilon) < 0
+$$
+
+$$
+\text{(ii)} \lim_{\upsilon \nearrow 1} m_\vartheta(\upsilon) < 0, \qquad
+\text{(ii')} \lim_{\upsilon \nearrow 1} m_\vartheta(\upsilon) > 0
+$$
+
+Then:
+
+*(a)* If (i) and (ii) hold, both agents survive under $P$.
+
+*(b)* If (i) and (ii') hold, agent 1 dominates in the long run under $P$.
+
+*(c)* If (i') and (ii) hold, agent 2 dominates in the long run under $P$.
+
+*(d)* If (i') and (ii') hold, each agent dominates with strictly positive probability.
+```
+
+The proof uses the Feller classification of boundary behavior for diffusions, as in {cite:t}`Karlin_Taylor1981`.
+
+Condition (i) says that when agent 1 is close to extinction, there is a force pushing her share back up.
+
+Condition (ii) says that when agent 1 is close to absorbing the whole economy, there is a force pushing her share back down.
+
+When both forces are present, the Pareto share is recurrent and both agents survive.
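
A toy simulation illustrates case *(a)*: take a drift that is positive near $\upsilon = 0$ and negative near $\upsilon = 1$ --- here the ad hoc choice $m_\vartheta(\upsilon) = a(1 - 2\upsilon)$, not derived from the model --- and simulate the log-odds process {eq}`eq:log_odds` by Euler-Maruyama:

```{code-cell} ipython3
import numpy as np

rng = np.random.default_rng(1)
a, s = 0.5, 0.4              # illustrative drift scale and diffusion (ω¹ - ω²)
dt, T = 0.01, 200.0
n_steps = int(T / dt)

ϑ = 0.0
path = np.empty(n_steps)
for t in range(n_steps):
    υ = 1.0 / (1.0 + np.exp(-ϑ))     # Pareto share from log-odds
    m = a * (1.0 - 2.0 * υ)          # repelling at both boundaries
    ϑ += m * dt + s * np.sqrt(dt) * rng.normal()
    path[t] = υ

print(path.min(), path.mean(), path.max())
```

The Pareto share fluctuates but keeps returning to the interior of $(0, 1)$, so neither agent vanishes: the recurrence that {prf:ref}`survival_conditions` describes.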
+
+## Wealth dynamics decomposition
+
+We now rewrite the survival conditions from {prf:ref}`survival_conditions` in terms of equilibrium wealth dynamics.
+
+Agent 1 survives near extinction if and only if her wealth grows faster than agent 2's when she is negligibly small.
+
+When $\upsilon \searrow 0$, prices are set entirely by agent 2, as if the economy were homogeneous.
+
+Agent 1 is a price-taker in agent 2's economy.
+
+Let $m_A^n(\upsilon)$ denote the expected log growth rate of agent $n$'s wealth.
+
+The difference decomposes into two channels:
+
+$$
+\lim_{\upsilon \searrow 0} [m_A^1(\upsilon) - m_A^2(\upsilon)]
+= \underbrace{\lim_{\upsilon \searrow 0} [m_R^1(\upsilon) - m_R^2(\upsilon)]}_{\text{portfolio returns}}
++ \underbrace{\lim_{\upsilon \searrow 0} [(y^2(\upsilon))^{-1} - (y^1(\upsilon))^{-1}]}_{\text{consumption-wealth ratios}}
+$$ (eq:wealth_decomp)
+
+The first term measures how much faster agent 1's portfolio grows.
+
+The second measures how much less agent 1 consumes out of wealth — a lower consumption-wealth ratio means more saving and faster wealth accumulation.
+
+When this total difference is positive, agent 1 survives; when negative, she shrinks toward extinction.
+
+```{exercise}
+:label: ex_wealth_decomp
+
+Derive {eq}`eq:wealth_decomp`.
+
+Let $A^n$ denote agent $n$'s wealth and $C^n$ her consumption.
+
+The budget constraint is $dA^n = A^n dR^n - C^n dt$, where $dR^n$ is the return on agent $n$'s portfolio.
+
+Define the consumption-wealth ratio $c^n = C^n / A^n = (y^n)^{-1}$.
+
+Show that $d\log A^n = m_R^n \, dt - (y^n)^{-1} dt + \ldots$, so the difference in expected log wealth growth is $m_A^1 - m_A^2 = (m_R^1 - m_R^2) + [(y^2)^{-1} - (y^1)^{-1}]$.
+```
+
+```{solution-start} ex_wealth_decomp
+:class: dropdown
+```
+
+Dividing the budget constraint by $A^n$:
+
+$$
+\frac{dA^n}{A^n} = dR^n - (y^n)^{-1} dt.
+$$
+
+By Itô's lemma, $d\log A^n = \frac{dA^n}{A^n} - \frac{1}{2}\left(\frac{dA^n}{A^n}\right)^2$.
+
+Write $dR^n = m_R^n \, dt + \sigma_R^n \, dW$ (the portfolio return under $P$).
+
+Then
+
+$$
+d\log A^n = \left(m_R^n - (y^n)^{-1} - \tfrac{1}{2}(\sigma_R^n)^2\right) dt + \sigma_R^n \, dW.
+$$
+
+Taking the difference for agents 1 and 2:
+
+$$
+m_A^1 - m_A^2 = (m_R^1 - m_R^2) + \left[(y^2)^{-1} - (y^1)^{-1}\right] - \tfrac{1}{2}\left[(\sigma_R^1)^2 - (\sigma_R^2)^2\right].
+$$
+
+The volatility terms $\tfrac{1}{2}[(\sigma_R^1)^2 - (\sigma_R^2)^2]$ are absorbed into $m_R^1 - m_R^2$ when we define $m_R^n$ as the expected log portfolio return (i.e., the drift of $\log R^n$ rather than the arithmetic return), giving {eq}`eq:wealth_decomp`.
+
+```{solution-end}
+```
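+
+The Itô correction in this solution can be checked numerically.
+
+The next cell is a toy check (not part of the model): it simulates wealth under the budget constraint with a constant arithmetic return drift `m_R`, volatility `σ`, and consumption-wealth ratio `c`, then compares the average log growth rate with $m_R - c - \sigma^2/2$.
+
+```{code-cell} ipython3
+import numpy as np
+
+rng = np.random.default_rng(0)
+m_R, c, σ = 0.06, 0.03, 0.2      # illustrative values, not calibrated
+T, dt, n_paths = 50.0, 0.01, 4_000
+n_steps = int(T / dt)
+
+A = np.ones(n_paths)
+for _ in range(n_steps):
+    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
+    # budget constraint dA = A dR - C dt with dR = m_R dt + σ dW and C = c A
+    A *= 1.0 + (m_R - c) * dt + σ * dW
+
+growth = np.log(A).mean() / T
+print(f"average log growth {growth:.4f} vs m_R - c - σ²/2 = {m_R - c - 0.5 * σ**2:.4f}")
+```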
+
+### Portfolio returns
+
+At the boundary $\upsilon \searrow 0$, the difference in expected log portfolio returns is
+
+$$
+\lim_{\upsilon \searrow 0} [m_R^1 - m_R^2]
+= \underbrace{\frac{\omega^1 - \omega^2}{\gamma \sigma_Y}}_{\text{difference in risky shares}}
+\cdot \underbrace{(\gamma \sigma_Y^2 - \omega^2 \sigma_Y)}_{\text{risk premium}}
+- \underbrace{\frac{\omega^1 - \omega^2}{\gamma}
+\left(\sigma_Y + \frac{\omega^1 - \omega^2}{2\gamma}\right)}_{\text{volatility term}}
+$$ (eq:portfolio_returns)
+
+An optimistic agent ($\omega^1 > \omega^2$) overweights the risky asset by $(\omega^1 - \omega^2)/(\gamma \sigma_Y)$ relative to agent 2 and earns the equity risk premium on that extra exposure.
+
+The subtracted *volatility penalty* reflects the cost of holding a more extreme portfolio: higher variance of log returns drags down expected log wealth growth.
+
+This term depends on risk aversion $\gamma$ but not on the IES, because portfolio choice is determined by risk aversion alone.
+
+```{exercise}
+:label: ex_portfolio_returns
+
+Derive {eq}`eq:portfolio_returns`.
+
+At the boundary $\upsilon \searrow 0$, agent $n$'s optimal risky-asset share is $\pi^n = 1 + (\omega^n - \omega^2)/(\gamma \sigma_Y)$ (see {eq}`eq:portfolio`).
+
+Let $\bar{\mu}_R = \mu_Y + \gamma \sigma_Y^2 - \omega^2 \sigma_Y$ denote the expected return on the risky asset under $P$, and $r$ the risk-free rate.
+
+The continuously rebalanced portfolio has expected log return $m_R^n = r + \pi^n(\bar{\mu}_R - r) - \frac{1}{2}(\pi^n)^2 \sigma_Y^2$.
+
+Compute $m_R^1 - m_R^2$ and simplify.
+```
+
+```{solution-start} ex_portfolio_returns
+:class: dropdown
+```
+
+Using $m_R^n = r + \pi^n(\bar{\mu}_R - r) - \frac{1}{2}(\pi^n)^2 \sigma_Y^2$, the difference is
+
+$$
+m_R^1 - m_R^2 = (\pi^1 - \pi^2)(\bar{\mu}_R - r) - \tfrac{1}{2}[(\pi^1)^2 - (\pi^2)^2]\sigma_Y^2.
+$$
+
+The difference in risky shares is $\pi^1 - \pi^2 = (\omega^1 - \omega^2)/(\gamma \sigma_Y)$.
+
+The arithmetic equity premium is $\bar{\mu}_R - r = \gamma \sigma_Y^2 - \omega^2 \sigma_Y$, so:
+
+$$
+(\pi^1 - \pi^2)(\bar{\mu}_R - r) = \frac{\omega^1 - \omega^2}{\gamma \sigma_Y} \cdot (\gamma \sigma_Y^2 - \omega^2 \sigma_Y).
+$$
+
+For the volatility term, write $(\pi^1)^2 - (\pi^2)^2 = (\pi^1 - \pi^2)(\pi^1 + \pi^2)$ and note that $\pi^2 = 1$, so $\pi^1 + \pi^2 = 2 + (\omega^1 - \omega^2)/(\gamma \sigma_Y)$.
+
+After simplification:
+
+$$
+\tfrac{1}{2}[(\pi^1)^2 - (\pi^2)^2]\sigma_Y^2 = \frac{\omega^1 - \omega^2}{\gamma}\left(\sigma_Y + \frac{\omega^1 - \omega^2}{2\gamma}\right).
+$$
+
+Combining the two pieces gives {eq}`eq:portfolio_returns`.
+
+```{solution-end}
+```
+
+### Consumption-wealth ratios
+
+The difference in consumption-wealth ratios at the boundary is
+
+$$
+\lim_{\upsilon \searrow 0} [(y^2)^{-1} - (y^1)^{-1}]
+= \frac{1-\rho}{\rho} \left[(\omega^1 - \omega^2)\sigma_Y + \frac{(\omega^1 - \omega^2)^2}{2\gamma}\right]
+$$ (eq:consumption_rates)
+
+The term in brackets is the difference in *subjective* expected portfolio returns — what agent 1 believes she earns relative to agent 2.
+
+The factor $(1-\rho)/\rho$ translates this perceived return advantage into a saving response.
+
+- When IES $> 1$ ($\rho < 1$), the factor is positive: a higher perceived return makes the agent save more, because the substitution effect dominates the income effect.
+- When IES $< 1$ ($\rho > 1$), the factor is negative: the income effect dominates and the agent saves less, working against survival.
+- When IES $= 1$ ($\rho = 1$), the two effects cancel and the saving channel vanishes entirely.
+
+This is the channel through which recursive preferences alter survival outcomes by separating $\gamma$ from $\rho$.
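+
+A one-line function makes the sign flip explicit (illustrative values only):
+
+```{code-cell} ipython3
+def saving_factor(ρ):
+    # factor multiplying the subjective return advantage in the saving channel
+    return (1 - ρ) / ρ
+
+
+[saving_factor(ρ) for ρ in (0.5, 1.0, 2.0)]  # IES = 2, 1, 0.5 respectively
+```
+
+The factor is positive for $\rho < 1$, zero at $\rho = 1$, and negative for $\rho > 1$, matching the three bullet points above.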
+
+```{exercise}
+:label: ex_consumption_wealth
+
+Derive {eq}`eq:consumption_rates`.
+
+In the homogeneous economy populated by agent 2, the consumption-wealth ratio is $(y(0))^{-1} = \beta - (1-\rho)\mu_V^2$, where $\mu_V^2$ is agent 2's expected log return on wealth.
+
+Agent 1, as a negligible price-taker, has consumption-wealth ratio $(y^1)^{-1} = \beta - (1-\rho)\mu_V^1$, where $\mu_V^1$ is her own expected log return.
+
+Use $(y^2)^{-1} - (y^1)^{-1} = (1-\rho)(\mu_V^1 - \mu_V^2)$ and express $\mu_V^1 - \mu_V^2$ in terms of agent 1's *subjective* expected excess return.
+
+*Hint:* Under agent 1's beliefs, her portfolio earns an extra $(\omega^1 - \omega^2)\sigma_Y + (\omega^1 - \omega^2)^2/(2\gamma)$ in expected log returns relative to agent 2's portfolio.
+```
+
+```{solution-start} ex_consumption_wealth
+:class: dropdown
+```
+
+The consumption-wealth ratio for agent $n$ satisfies $(y^n)^{-1} = \beta - (1-\rho)\mu_V^n$, where $\mu_V^n$ is the expected log return on agent $n$'s wealth under her own subjective measure.
+
+Taking the difference:
+
+$$
+(y^2)^{-1} - (y^1)^{-1} = (1-\rho)(\mu_V^1 - \mu_V^2).
+$$
+
+Agent 1's subjective expected log portfolio return exceeds agent 2's by the amount she believes she gains from tilting toward the risky asset.
+
+Her extra risky share is $\pi^1 - 1 = (\omega^1 - \omega^2)/(\gamma\sigma_Y)$, and under her subjective measure $Q^1$ the risky asset's expected excess log return is $(\gamma\sigma_Y^2 + (\omega^1 - \omega^2)\sigma_Y - \omega^2\sigma_Y) - r - \frac{1}{2}\sigma_Y^2$.
+
+After simplification, the subjective expected log return difference is
+
+$$
+\mu_V^1 - \mu_V^2 = (\omega^1 - \omega^2)\sigma_Y + \frac{(\omega^1 - \omega^2)^2}{2\gamma}.
+$$
+
+Substituting and dividing through by $\rho$ (from the relationship between $(y^n)^{-1}$ and $\beta$):
+
+$$
+(y^2)^{-1} - (y^1)^{-1} = \frac{1-\rho}{\rho}\left[(\omega^1 - \omega^2)\sigma_Y + \frac{(\omega^1 - \omega^2)^2}{2\gamma}\right].
+$$
+
+```{solution-end}
+```
+
+### Two comparative statics
+
+Survival depends on $\gamma$, $\rho$, and the signal-to-noise ratios $\omega^1 / \sigma_Y$ and $\omega^2 / \sigma_Y$, not on $\omega^1$, $\omega^2$, and $\sigma_Y$ separately.
+
+The survival conditions do not depend on $\beta$ or $\mu_Y$, which affect the level of consumption and prices but not relative wealth dynamics at the boundary.
+
+```{code-cell} ipython3
+def portfolio_return_diff(ω_1, ω_2, γ, σ_y):
+ """
+ Difference in expected log portfolio returns at the boundary.
+ """
+ Δω = ω_1 - ω_2
+ risky_share_diff = Δω / (γ * σ_y)
+ risk_premium = γ * σ_y**2 - ω_2 * σ_y
+ volatility_term = (Δω / γ) * (σ_y + 0.5 * Δω / γ)
+ return risky_share_diff * risk_premium - volatility_term
+
+
+def saving_channel(ω_1, ω_2, γ, ρ, σ_y):
+ """
+ Difference in consumption-wealth ratios at the boundary.
+ """
+ Δω = ω_1 - ω_2
+ subjective_return_diff = Δω * σ_y + Δω**2 / (2 * γ)
+ return (1 - ρ) / ρ * subjective_return_diff
+
+
+def boundary_drift(ω_1, ω_2, γ, ρ, σ_y):
+ """
+ Boundary drift m_ϑ when agent 1 becomes negligible.
+
+ Positive drift means agent 1 survives (repelling boundary).
+ """
+ return γ * (
+ portfolio_return_diff(ω_1, ω_2, γ, σ_y)
+ + saving_channel(ω_1, ω_2, γ, ρ, σ_y)
+ )
+```
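+
+As a quick sanity check, the next cell evaluates the drift by hand at illustrative parameter values, duplicating the algebra in `boundary_drift` term by term.
+
+```{code-cell} ipython3
+ω_1, ω_2, γ, ρ, σ_y = 0.25, 0.0, 5.0, 0.67, 0.02
+Δω = ω_1 - ω_2
+
+portfolio = (Δω / (γ * σ_y)) * (γ * σ_y**2 - ω_2 * σ_y) \
+            - (Δω / γ) * (σ_y + 0.5 * Δω / γ)
+saving = (1 - ρ) / ρ * (Δω * σ_y + Δω**2 / (2 * γ))
+m_0 = γ * (portfolio + saving)
+m_0
+```
+
+The drift is positive, so with these parameters the boundary $\upsilon = 0$ repels and the optimistic agent survives near extinction.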
+
+## Survival regions
+
+A central contribution of {cite:t}`Borovicka2020` is the characterization of survival regions in the $(\gamma, \rho)$ plane.
+
+Under separable preferences, $\gamma = \rho$, the agent with more accurate beliefs always dominates.
+
+Under recursive preferences, all four outcomes in {prf:ref}`survival_conditions` can occur.
+
+Figure 2 in the paper studies the case where agent 2 has correct beliefs, so $\omega^2 = 0$.
+
+The next cell follows that figure.
+
+```{code-cell} ipython3
+def compute_survival_boundary(ω_1, ω_2, σ_y, γ_grid, boundary="lower"):
+ """
+ Compute the curve in (γ, ρ) space where the boundary drift is zero.
+
+ For boundary='lower', agent 1 is the small agent.
+ For boundary='upper', agent 2 is the small agent.
+ """
+ ρ_boundary = []
+
+ if boundary == "lower":
+ small_agent = (ω_1, ω_2)
+ else:
+ small_agent = (ω_2, ω_1)
+
+ ω_small, ω_large = small_agent
+
+ for γ in γ_grid:
+ pr = portfolio_return_diff(ω_small, ω_large, γ, σ_y)
+ Δω = ω_small - ω_large
+ subj_ret = Δω * σ_y + Δω**2 / (2 * γ)
+
+ if abs(subj_ret) < 1e-14:
+ ρ_boundary.append(np.nan)
+ continue
+
+ denom = subj_ret - pr
+ if abs(denom) < 1e-14:
+ ρ_boundary.append(np.nan)
+ else:
+ ρ_boundary.append(subj_ret / denom)
+
+ return np.asarray(ρ_boundary)
+
+
+def compute_limit_boundary(γ_grid, boundary="lower"):
+ """
+ Boundary curves for the limit |ω_1| / σ_y -> ∞.
+
+ This is equivalent to the constant-endowment case discussed in the paper.
+ """
+ if boundary == "lower":
+ return γ_grid / (1 + γ_grid)
+
+ ρ = np.full_like(γ_grid, np.nan, dtype=float)
+ mask = γ_grid < 1
+ ρ[mask] = γ_grid[mask] / (1 - γ_grid[mask])
+ return ρ
+```
+
+```{code-cell} ipython3
+---
+tags: [hide-input]
+mystnb:
+ figure:
+ caption: Survival regions corresponding to Figure 2 in Borovicka (2020)
+ name: fig-survival-regions
+---
+σ_y = 0.02
+γ_vals = np.linspace(0.01, 6.0, 500)
+ρ_vals = np.linspace(0.01, 2.0, 400)
+G, R = np.meshgrid(γ_vals, ρ_vals)
+
+panel_specs = [
+ ("finite", 0.10, r"$\omega^1 = 0.10$"),
+ ("finite", 0.20, r"$\omega^1 = 0.20$"),
+ ("limit", None, r"$|\omega^1| / \sigma_Y \to \infty$"),
+ ("finite", -0.25, r"$\omega^1 = -0.25$"),
+]
+
+fig, axes = plt.subplots(2, 2, figsize=(13, 10), sharex=True, sharey=True)
+
+for idx, (case, value, label) in enumerate(panel_specs):
+ ax = axes.flat[idx]
+
+ if case == "limit":
+ ρ_1 = compute_limit_boundary(γ_vals, boundary="lower")
+ ρ_2 = compute_limit_boundary(γ_vals, boundary="upper")
+ # Limit boundary drifts: use closed-form expressions
+ # m0 > 0 (agent 1 survives) when ρ < γ/(1+γ)
+ m0 = G - (1 + G) * R
+ # m1 < 0 (agent 2 survives) when ρ < γ/(1-γ) for γ<1, always for γ>=1
+ m1 = (1 - G) * R - G
+ else:
+ ρ_1 = compute_survival_boundary(value, 0.0, σ_y, γ_vals,
+ boundary="lower")
+ ρ_2 = compute_survival_boundary(value, 0.0, σ_y, γ_vals,
+ boundary="upper")
+ # Evaluate boundary drifts on the grid
+ m0 = boundary_drift(value, 0.0, G, R, σ_y)
+ m1 = -boundary_drift(0.0, value, G, R, σ_y)
+
+ # Classify all four regions
+ both = (m0 > 0) & (m1 < 0)
+ ag1_dom = (m0 > 0) & (m1 > 0)
+ ag2_dom = (m0 < 0) & (m1 < 0)
+ either = (m0 < 0) & (m1 > 0)
+
+ # Shade coexistence region
+ ax.contourf(G, R, both.astype(float), levels=[0.5, 1.5],
+ colors=["C2"], alpha=0.18)
+ if idx == 0:
+ ax.fill_between([], [], color="C2", alpha=0.18,
+ label="both survive")
+
+ # Plot boundary curves
+ ax.contour(G, R, m0, levels=[0], colors=["C0"],
+ linestyles="--", linewidths=2)
+ ax.contour(G, R, m1, levels=[0], colors=["C3"],
+ linestyles="-", linewidths=2)
+ if idx == 0:
+ ax.plot([], [], "--", color="C0", lw=2, label="agent 1 boundary")
+ ax.plot([], [], "-", color="C3", lw=2, label="agent 2 boundary")
+
+ ax.plot(
+ γ_vals, γ_vals, ":", color="black", lw=2,
+ label=r"$\gamma = \rho$" if idx == 0 else None
+ )
+
+ tkw = dict(ha="center", va="center", style="italic", color="0.15")
+ if case == "finite" and value == 0.10:
+ ax.text(0.31, 1.05, "either agent dominates", rotation=90,
+ fontsize=10, **tkw)
+ ax.text(1.8, 1.55, "agent 2\ndominates", fontsize=11, **tkw)
+ ax.text(3.5, 0.75, "both\nsurvive", fontsize=11, **tkw)
+ if ag1_dom.any():
+ ax.text(5.0, 0.25, "agent 1\ndominates", fontsize=11, **tkw)
+ elif case == "finite" and value == 0.20:
+ ax.text(0.31, 1.05, "either agent dominates", rotation=90,
+ fontsize=10, **tkw)
+ ax.text(2.5, 1.55, "agent 2\ndominates", fontsize=11, **tkw)
+ ax.text(3.8, 0.55, "both\nsurvive", fontsize=11, **tkw)
+ if ag1_dom.any():
+ ax.text(5.2, 0.08, "agent 1\ndominates", fontsize=9, **tkw)
+ elif case == "limit":
+ ax.text(0.31, 1.05, "either agent dominates", rotation=90,
+ fontsize=10, **tkw)
+ ax.text(3.0, 1.40, "agent 2\ndominates", fontsize=11, **tkw)
+ ax.text(3.5, 0.30, "both\nsurvive", fontsize=11, **tkw)
+ elif case == "finite" and value == -0.25:
+ ax.text(0.31, 1.05, "either agent dominates", rotation=90,
+ fontsize=10, **tkw)
+ ax.text(3.5, 1.20, "agent 2\ndominates", fontsize=11, **tkw)
+ ax.text(2.5, 0.18, "both\nsurvive", fontsize=11, **tkw)
+
+ ax.set_title(label, fontsize=12)
+ ax.set_xlim(0, 6)
+ ax.set_ylim(0, 2)
+ ax.set_xlabel(r"$\gamma$")
+ ax.set_ylabel(r"$\rho$")
+
+axes[0, 0].legend(loc="upper left", fontsize=9)
+plt.tight_layout()
+plt.show()
+```
+
+Each panel plots two curves in the $(\gamma, \rho)$ plane for a different value of agent 1's belief distortion $\omega^1$ (agent 2 has correct beliefs, $\omega^2 = 0$).
+
+- The dashed curve (blue) is where the boundary drift at $\upsilon = 0$ equals zero — condition (i) in {prf:ref}`survival_conditions`.
+- The solid curve (red) is where the boundary drift at $\upsilon = 1$ equals zero — condition (ii).
+- The shaded region between the two curves is where both agents survive.
+- The dotted diagonal $\gamma = \rho$ is the separable CRRA case, along which the agent with more accurate beliefs always dominates.
+
+Moderate optimism ($\omega^1 = 0.10$) produces a wide coexistence region that extends across most of the $\gamma$ range.
+
+Stronger optimism ($\omega^1 = 0.20$) narrows the region: the agent 2 boundary shifts out of the plotted range for moderate and large $\gamma$, shrinking the set of $(\gamma, \rho)$ pairs where both agents coexist.
+
+In the limit $|\omega^1|/\sigma_Y \to \infty$ (bottom-left), the boundaries simplify to closed-form expressions.
+
+The coexistence region narrows but extends to large $\gamma$ values below the agent 2 boundary curve.
+
+Pessimistic distortions ($\omega^1 = -0.25$, bottom-right) can also survive, but only in a much narrower part of the parameter space.
+
+## Three survival channels
+
+The decomposition above can be visualized directly.
+
+```{code-cell} ipython3
+def decompose_survival(ω_1, ω_2, γ_grid, ρ, σ_y):
+ """
+ Decompose the wealth-growth differential in proposition 3.4.
+ """
+ Δω = ω_1 - ω_2
+ risk_premium_term = Δω * (γ_grid * σ_y - ω_2) / γ_grid
+ volatility_term = -(Δω / γ_grid) * (σ_y + 0.5 * Δω / γ_grid)
+ saving_term = (1 - ρ) / ρ * (Δω * σ_y + Δω**2 / (2 * γ_grid))
+ total = risk_premium_term + volatility_term + saving_term
+ return risk_premium_term, volatility_term, saving_term, total
+
+
+ω_1 = 0.25
+ω_2 = 0.0
+ρ = 0.67
+σ_y = 0.02
+γ_grid = np.linspace(0.5, 15.0, 300)
+
+risk_term, vol_term, save_term, total = decompose_survival(
+ ω_1, ω_2, γ_grid, ρ, σ_y
+)
+
+fig, ax = plt.subplots(figsize=(11, 6))
+ax.plot(γ_grid, risk_term, color="C0", lw=2, label="risk premium term")
+ax.plot(γ_grid, vol_term, "--", color="C3", lw=2, label="volatility term")
+ax.plot(γ_grid, save_term, "-.", color="C2", lw=2, label="saving term")
+ax.plot(γ_grid, total, color="black", lw=2, label="total")
+ax.axhline(0, color="gray", lw=1)
+ax.set_xlabel(r"risk aversion $\gamma$")
+ax.set_ylabel("contribution to wealth-growth differential")
+ax.legend()
+plt.tight_layout()
+plt.show()
+```
+
+This figure decomposes the boundary drift at $\upsilon = 0$ into three terms for an optimistic agent ($\omega^1 = 0.25$, $\omega^2 = 0$) with IES $= 1/\rho \approx 1.49$ and $\sigma_Y = 0.02$.
+
+- The risk premium term (blue) is positive throughout because the optimistic agent overweights the risky asset and earns the equity premium.
+- The volatility term (red dashed) is negative and large at low $\gamma$, reflecting the cost of holding a volatile portfolio.
+- The saving term (green dash-dot) is positive when IES $> 1$ because the optimistic agent perceives a high return on wealth and saves more aggressively.
+- The total (black) crosses zero at the critical $\gamma$ below which the volatility penalty dominates and the agent cannot survive.
+
+## Varying the IES
+
+The sign of the saving term is pinned down by the IES.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Boundary decomposition for different IES values
+ name: fig-survival-ies-panels
+---
+fig, axes = plt.subplots(1, 3, figsize=(16, 4.5), sharey=True)
+
+ω_1 = 0.25
+ω_2 = 0.0
+σ_y = 0.02
+γ_grid = np.linspace(0.5, 25.0, 300)
+
+ies_values = [0.5, 1.0, 1.5]
+
+for idx, ies in enumerate(ies_values):
+ ρ = 1.0 / ies
+ risk_term, vol_term, save_term, total = decompose_survival(
+ ω_1, ω_2, γ_grid, ρ, σ_y
+ )
+
+ ax = axes[idx]
+ ax.plot(γ_grid, risk_term, color="C0", lw=2, label="risk premium")
+ ax.plot(γ_grid, vol_term, "--", color="C3", lw=2, label="volatility")
+ ax.plot(γ_grid, save_term, "-.", color="C2", lw=2, label="saving")
+ ax.plot(γ_grid, total, color="black", lw=2, label="total")
+ ax.axhline(0, color="gray", lw=1)
+ ax.set_title(f"IES = {ies:.1f}", fontsize=12)
+ ax.set_xlabel(r"risk aversion $\gamma$")
+ ax.set_ylabel("contribution")
+
+axes[0].legend(fontsize=9)
+plt.tight_layout()
+plt.show()
+```
+
+Each panel shows the same three-term decomposition as the previous figure, but now for three different values of the IES ($\omega^1 = 0.25$, $\omega^2 = 0$, $\sigma_Y = 0.02$).
+
+- Left panel (IES $= 0.5$): the saving term is negative, so the optimistic agent actually saves less, working against survival.
+- Center panel (IES $= 1.0$): the saving term vanishes entirely, so only the portfolio return and volatility channels remain.
+
+ - This eliminates the saving channel but does not by itself reproduce the full separable CRRA benchmark, which requires $\gamma = \rho$ (i.e., IES $= 1/\gamma$), not merely $\rho = 1$.
+- Right panel (IES $= 1.5$): the saving term is positive and shifts the total drift upward, expanding the range of $\gamma$ values for which the optimistic agent survives.
+
+## Asymptotic results
+
+{cite:t}`Borovicka2020` derives several useful asymptotic results.
+
+1. As $\gamma \searrow 0$, each agent dominates with strictly positive probability.
+1. As $\gamma \nearrow \infty$, the relatively more optimistic agent dominates.
+1. As $\rho \searrow 0$, the relatively more optimistic agent always survives.
+ - The relatively more pessimistic agent can also survive when risk aversion is sufficiently low.
+1. As $\rho \nearrow \infty$, a nondegenerate long-run equilibrium cannot exist.
+
+The next figure illustrates the first result by plotting both boundary drifts as $\gamma$ becomes small.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Boundary drifts for small risk aversion
+ name: fig-boundary-drifts-small-gamma
+---
+ω_1 = 0.25
+ω_2 = 0.0
+ρ = 0.67
+σ_y = 0.02
+γ_grid = np.linspace(0.05, 5.0, 300)
+
+drift_at_0 = np.array([boundary_drift(ω_1, ω_2, γ, ρ, σ_y) for γ in γ_grid])
+drift_at_1 = np.array([-boundary_drift(ω_2, ω_1, γ, ρ, σ_y) for γ in γ_grid])
+
+fig, ax = plt.subplots(figsize=(10, 5))
+ax.plot(γ_grid, drift_at_0, color="C0", lw=2, label=r"$\upsilon \to 0$")
+ax.plot(γ_grid, drift_at_1, "--", color="C3", lw=2, label=r"$\upsilon \to 1$")
+ax.axhline(0, color="gray", lw=1)
+ax.set_xlabel(r"risk aversion $\gamma$")
+ax.set_ylabel("boundary drift")
+ax.legend()
+plt.tight_layout()
+plt.show()
+```
+
+This figure plots the two boundary drifts as a function of $\gamma$ ($\omega^1 = 0.25$, $\omega^2 = 0$, IES $\approx 1.49$).
+
+- The solid blue curve is the drift $m_\vartheta$ at $\upsilon \to 0$ (agent 1 near extinction); coexistence requires this to be positive (condition (i)).
+- The dashed red curve is the drift $m_\vartheta$ at $\upsilon \to 1$ (agent 2 near extinction); coexistence requires this to be negative (condition (ii)).
+
+The figure illustrates asymptotic result 1.
+
+For small $\gamma$, the blue curve is negative and the red curve is positive.
+
+Both boundaries are attracting: near $\upsilon = 0$ the negative drift pulls $\upsilon$ toward 0, and near $\upsilon = 1$ the positive drift pushes $\upsilon$ toward 1.
+
+This is outcome (d) in {prf:ref}`survival_conditions`: neither boundary is repelling, so whichever agent happens to get ahead early will dominate, with each agent having strictly positive probability of dominance depending on the realized Brownian path.
+
+As $\gamma$ increases past roughly 1, the blue curve crosses zero and becomes positive while the red curve stays negative.
+
+Now both boundaries are repelling and we enter the coexistence region — outcome (a).
+
+## The separable case
+
+When $\gamma = \rho$, the model collapses to the separable CRRA benchmark.
+
+In that case, the log-odds process becomes
+
+$$
+d\vartheta_t = \frac{1}{2}\left[(\omega^2)^2 - (\omega^1)^2\right] dt + (\omega^1 - \omega^2) dW_t .
+$$
+
+The drift is constant and depends only on the relative entropy of the two belief distortions.
+
+The agent with the smaller $|\omega^n|$ dominates under $P$.
+
+If the two agents have equal magnitudes of belief distortions, neither becomes extinct almost surely, but no nondegenerate stationary wealth distribution exists.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Pareto-share paths in the separable benchmark
+ name: fig-crra-pareto-paths
+---
+def simulate_crra_pareto(ω_1, ω_2, T, dt, n_paths, seed=42):
+ """
+ Simulate Pareto-share dynamics in the separable benchmark.
+ """
+ rng = np.random.default_rng(seed)
+ n_steps = int(T / dt)
+ t_grid = np.linspace(0, T, n_steps + 1)
+
+ drift = 0.5 * (ω_2**2 - ω_1**2)
+ volatility = ω_1 - ω_2
+
+ θ = np.zeros((n_paths, n_steps + 1))
+ dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
+
+ for t in range(n_steps):
+ θ[:, t + 1] = θ[:, t] + drift * dt + volatility * dW[:, t]
+
+ υ_paths = 1.0 / (1.0 + np.exp(-θ))
+ return t_grid, υ_paths
+
+
+ω_1 = 0.10
+ω_2 = 0.0
+t_grid, υ_paths = simulate_crra_pareto(ω_1, ω_2, T=200, dt=0.01, n_paths=50)
+
+fig, ax = plt.subplots(figsize=(11, 5))
+
+for i in range(20):
+ ax.plot(t_grid, υ_paths[i], color="C0", alpha=0.25, lw=1)
+
+ax.axhline(0.5, color="gray", linestyle=":", lw=1)
+ax.set_xlabel("time")
+ax.set_ylabel(r"Pareto share $\upsilon_t$")
+ax.set_ylim(0, 1)
+plt.tight_layout()
+plt.show()
+```
+
+This figure shows 20 simulated sample paths of the Pareto share $\upsilon_t$ under separable CRRA preferences ($\gamma = \rho$) with $\omega^1 = 0.10$ and $\omega^2 = 0$.
+
+Agent 2 has correct beliefs, so the log-odds drift is negative and all paths trend toward $\upsilon = 0$.
+
+Agent 1 is driven to extinction — the classical market-selection result of {cite:t}`Blume_Easley2006`.
+
+## Asset pricing implications
+
+As one agent becomes negligible, current prices converge to those of the homogeneous economy populated by the large agent.
+
+When agent 2 is the large agent, Proposition 5.1 in {cite:t}`Borovicka2020` implies
+
+$$
+\lim_{\upsilon \searrow 0} r(\upsilon)
+= \beta + \rho \left(\mu_Y + \omega^2 \sigma_Y
++ \frac{1}{2} (1 - \gamma) \sigma_Y^2\right)
+- \frac{1}{2} \gamma \sigma_Y^2
+$$ (eq:riskfree)
+
+and
+
+$$
+\lim_{\upsilon \searrow 0} y(\upsilon)
+= \left[
+\beta - (1 - \rho)
+\left(
+\mu_Y + \omega^2 \sigma_Y + \frac{1}{2} (1 - \gamma) \sigma_Y^2
+\right)
+\right]^{-1} .
+$$ (eq:wc_ratio)
+
+The aggregate wealth dynamics also converge to those of the homogeneous economy:
+
+$$
+\lim_{\upsilon \searrow 0} m_A(\upsilon) = \mu_Y,
+\qquad
+\lim_{\upsilon \searrow 0} \sigma_A(\upsilon) = \sigma_Y .
+$$
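+
+To get a feel for magnitudes, the next cell evaluates {eq}`eq:riskfree` and {eq}`eq:wc_ratio` at illustrative parameter values ($\beta$ and $\mu_Y$ are hypothetical choices, not taken from the paper):
+
+```{code-cell} ipython3
+β, μ_y, σ_y = 0.03, 0.02, 0.02   # hypothetical preference and endowment parameters
+γ, ρ, ω_2 = 5.0, 0.67, 0.0
+
+# the term μ_Y + ω² σ_Y + (1 - γ) σ_Y² / 2 appearing in both formulas
+g = μ_y + ω_2 * σ_y + 0.5 * (1 - γ) * σ_y**2
+
+r = β + ρ * g - 0.5 * γ * σ_y**2     # limiting risk-free rate
+y = 1 / (β - (1 - ρ) * g)            # limiting wealth-consumption ratio
+
+print(f"r = {r:.4f}, y = {y:.1f}")
+```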
+
+Proposition 5.3 then gives the negligible agent's own consumption-saving and portfolio choices.
+
+Her consumption-wealth ratio converges to
+
+$$
+\lim_{\upsilon \searrow 0} (y^1(\upsilon))^{-1}
+= (y(0))^{-1}
+- \frac{1-\rho}{\rho}
+\left[
+(\omega^1 - \omega^2)\sigma_Y
++ \frac{(\omega^1 - \omega^2)^2}{2 \gamma}
+\right] .
+$$
+
+The small agent's risky-asset share converges to
+
+$$
+\lim_{\upsilon \searrow 0} \pi^1(\upsilon)
+= 1 + \frac{\omega^1 - \omega^2}{\gamma \sigma_Y} .
+$$ (eq:portfolio)
+
+Hence optimism implies leverage, while sufficiently strong pessimism implies shorting.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Limiting risky-asset shares of the small agent
+ name: fig-limiting-portfolio-shares
+---
+ω_2 = 0.0
+σ_y = 0.02
+ω_grid = np.linspace(-0.5, 1.0, 300)
+
+fig, ax = plt.subplots(figsize=(10, 5))
+
+for γ in [2, 5, 10, 20]:
+ π_1 = 1 + (ω_grid - ω_2) / (γ * σ_y)
+ ax.plot(ω_grid, π_1, lw=2, label=rf"$\gamma = {γ}$")
+
+ax.axhline(1.0, color="gray", linestyle=":", lw=1)
+ax.axhline(0.0, color="gray", linestyle=":", lw=1)
+ax.axvline(0.0, color="gray", linestyle=":", lw=1)
+ax.set_xlabel(r"belief distortion $\omega^1$")
+ax.set_ylabel(r"risky share $\pi^1$")
+ax.legend()
+plt.tight_layout()
+plt.show()
+```
+
+This figure plots the limiting risky-asset share $\pi^1$ of the negligible agent as a function of her belief distortion $\omega^1$ ($\omega^2 = 0$, $\sigma_Y = 0.02$), for four levels of risk aversion.
+
+At $\omega^1 = 0$ the agent agrees with agent 2 and holds the market portfolio ($\pi^1 = 1$).
+
+Optimism ($\omega^1 > 0$) leads to leverage ($\pi^1 > 1$), while sufficient pessimism ($\omega^1 < 0$) leads to shorting ($\pi^1 < 0$).
+
+Higher risk aversion compresses these deviations toward one.
+
+## Optimistic and pessimistic distortions
+
+Optimistic and pessimistic beliefs affect survival asymmetrically.
+
+An optimistic agent benefits from the risk premium term and, when IES $> 1$, from the saving term as well.
+
+A pessimistic agent gives up the risk premium and can survive only if the saving effect is strong enough to offset that loss.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: Total boundary drift for optimistic and pessimistic distortions
+ name: fig-optimistic-pessimistic-drifts
+---
+fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
+
+σ_y = 0.02
+ω_2 = 0.0
+ρ = 0.67
+γ_grid = np.linspace(0.5, 25.0, 300)
+
+ax = axes[0]
+for ω_1 in [0.1, 0.25, 0.5, 1.0]:
+ _, _, _, total = decompose_survival(ω_1, ω_2, γ_grid, ρ, σ_y)
+ ax.plot(γ_grid, total, lw=2, label=rf"$\omega^1 = {ω_1}$")
+ax.axhline(0, color="gray", lw=1)
+ax.set_title("optimistic", fontsize=12)
+ax.set_xlabel(r"risk aversion $\gamma$")
+ax.set_ylabel("boundary drift")
+ax.legend(fontsize=9)
+
+ax = axes[1]
+for ω_1 in [-0.1, -0.25, -0.5, -1.0]:
+ _, _, _, total = decompose_survival(ω_1, ω_2, γ_grid, ρ, σ_y)
+ ax.plot(γ_grid, total, lw=2, label=rf"$\omega^1 = {ω_1}$")
+ax.axhline(0, color="gray", lw=1)
+ax.set_title("pessimistic", fontsize=12)
+ax.set_xlabel(r"risk aversion $\gamma$")
+ax.legend(fontsize=9)
+
+plt.tight_layout()
+plt.show()
+```
+
+Both panels plot the total boundary drift at $\upsilon = 0$ as a function of $\gamma$ (IES $\approx 1.49$, $\omega^2 = 0$).
+
+Where the curve is positive, agent 1 survives near extinction.
+
+- Left panel (optimistic agent): larger $\omega^1$ means a bigger bet on the risky asset, so the volatility penalty dominates at low $\gamma$ but the drift turns positive once $\gamma$ is large enough.
+- Right panel (pessimistic agent): a pessimistic agent gives up the risk premium by underweighting the risky asset, so the drift is negative for most of the parameter space and survival requires saving motives strong enough to offset the portfolio losses.
+
+## Long-run consumption distribution
+
+When both agents survive, the Pareto share keeps moving across the whole interval $(0, 1)$.
+
+The next simulation is only a toy approximation.
+
+It interpolates the drift between its two boundary values, so it illustrates the recurrence logic without solving the full equilibrium ODE.
+
+```{code-cell} ipython3
+---
+mystnb:
+ figure:
+ caption: A toy stationary Pareto-share simulation
+ name: fig-toy-stationary-pareto-share
+---
+def simulate_pareto_share_toy(ω_1, ω_2, γ, ρ, σ_y, T, dt, n_paths=20, seed=42):
+ """
+ Simulate a toy Pareto-share process by interpolating boundary drifts.
+ """
+ rng = np.random.default_rng(seed)
+ n_steps = int(T / dt)
+ t_grid = np.linspace(0, T, n_steps + 1)
+
+ volatility = ω_1 - ω_2
+ m_0 = boundary_drift(ω_1, ω_2, γ, ρ, σ_y)
+ m_1 = -boundary_drift(ω_2, ω_1, γ, ρ, σ_y)
+
+ θ = np.zeros((n_paths, n_steps + 1))
+ dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
+
+ for t in range(n_steps):
+ υ = 1.0 / (1.0 + np.exp(-θ[:, t]))
+ drift = m_0 * (1 - υ) + m_1 * υ
+ θ[:, t + 1] = θ[:, t] + drift * dt + volatility * dW[:, t]
+
+ υ_paths = 1.0 / (1.0 + np.exp(-θ))
+ return t_grid, υ_paths
+
+
+ω_1 = 0.25
+ω_2 = 0.0
+γ = 5.0
+ρ = 0.67
+σ_y = 0.02
+
+t_grid, υ_paths = simulate_pareto_share_toy(
+ ω_1, ω_2, γ, ρ, σ_y, T=500, dt=0.05, n_paths=50, seed=42
+)
+
+fig, axes = plt.subplots(1, 2, figsize=(14, 5))
+
+ax = axes[0]
+for i in range(20):
+ ax.plot(t_grid, υ_paths[i], color="C0", alpha=0.25, lw=1)
+ax.axhline(0.5, color="gray", linestyle=":", lw=1)
+ax.set_title("sample paths", fontsize=12)
+ax.set_xlabel("time")
+ax.set_ylabel(r"Pareto share $\upsilon_t$")
+ax.set_ylim(0, 1)
+
+ax = axes[1]
+_, υ_long = simulate_pareto_share_toy(
+ ω_1, ω_2, γ, ρ, σ_y, T=2000, dt=0.05, n_paths=5, seed=123
+)
+υ_stationary = υ_long[:, υ_long.shape[1] // 2:].ravel()
+ax.hist(υ_stationary, bins=80, density=True, color="steelblue",
+ edgecolor="white", alpha=0.7)
+ax.set_title("approximate stationary density", fontsize=12)
+ax.set_xlabel(r"Pareto share $\upsilon$")
+ax.set_ylabel("density")
+ax.set_xlim(0, 1)
+
+plt.tight_layout()
+plt.show()
+```
+
+The left panel shows 20 sample paths of the Pareto share $\upsilon_t$ under parameters inside the coexistence region ($\omega^1 = 0.25$, $\omega^2 = 0$, $\gamma = 5$, IES $\approx 1.49$).
+
+Unlike the separable case in {numref}`fig-crra-pareto-paths`, the paths do not drift to zero — they repeatedly visit a wide range of values, bouncing between the two repelling boundaries.
+
+The right panel approximates the stationary density by pooling the second half of longer simulations.
+
+The interior mode is consistent with neither agent being driven to extinction.
+
+However, this toy interpolation only illustrates the recurrence logic; it does not reproduce the quantitative stationary consumption-share density in Figure 4 of {cite:t}`Borovicka2020`, which requires solving the full interior equilibrium ODE.
+
+## Summary
+
+Recursive preferences weaken the classical market-selection result.
+
+The portfolio return channel still rewards more optimistic beliefs.
+
+The volatility channel still penalizes aggressive positions.
+
+But when IES $> 1$, the saving channel can be strong enough to keep a distorted-belief agent alive.
+
+This is why recursive-preference economies can support stationary long-run wealth distributions with persistent heterogeneity in beliefs and portfolio positions.
diff --git a/lectures/theil_2.md b/lectures/theil_2.md
index cd844df39..b342cda1f 100644
--- a/lectures/theil_2.md
+++ b/lectures/theil_2.md
@@ -268,7 +268,7 @@ His optimal rule takes the form
u_t = \tilde{H}(x_t, z_t, Y_t).
```
-{cite:t}`bacsar2008h` and {cite:t}`hansen2008robustness` establish that at
+{cite:t}`bacsar2008h` and {cite:t}`HansenSargent2008` establish that at
equilibrium (with "big $K$ = little $k$" imposed) this collapses to
```{math}
@@ -642,7 +642,7 @@ mystnb:
caption: Standard vs robust consumption paths
name: fig-std-vs-robust-paths
---
-np.random.seed(42)
+np.random.seed(0)
T_sim = 100
def simulate_ar1(φ, ν, shocks, mu0=0.0):