Merged
`changelog.d/added/uk-geography-assets.md` (1 change: 1 addition, 0 deletions)

```diff
@@ -0,0 +1 @@
+UK constituency and local-authority output helpers can now resolve standard geography files locally or download them from GCS by default.
```
`docs/outputs.md` (6 changes: 2 additions, 4 deletions)

````diff
@@ -242,22 +242,20 @@ for row in impacts.district_results:

 ### UK constituencies / local authorities

-Constituency and local-authority breakdowns require externally-supplied weight matrices:
+Constituency and local-authority breakdowns use externally-maintained weight matrices. The convenience helpers first look for the standard files locally, then download them from the PolicyEngine UK GCS bucket:

 ```python
 from policyengine.outputs import compute_uk_constituency_impacts

 impacts = compute_uk_constituency_impacts(
     baseline_simulation=baseline,
     reform_simulation=reform,
-    weight_matrix_path="parliamentary_constituency_weights.h5",
-    constituency_csv_path="constituencies_2024.csv",
     year="2025",
 )
 impacts.constituency_results
 ```

-`compute_uk_local_authority_impacts` follows the same pattern. See [Regions](regions.md).
+`compute_uk_local_authority_impacts` follows the same pattern. Pass explicit paths to use specific local files instead of the default local/GCS lookup; missing explicit paths raise `FileNotFoundError` without falling back to GCS. Pass `download_missing_assets=False` to require the canonical files to exist locally or in the cache. Set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR` to choose the local lookup and download cache directory. See [Regions](regions.md).

 ## Writing your own

````
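The updated docs describe a lookup order for the geography files. The sketch below walks through that order:

1. Explicit paths are strict.
2. Otherwise the helper checks the data directory or cache.
3. As a last step it downloads, unless downloading is disabled.

The sketch is only an illustration and is not the code of `policyengine.data.uk_geography_assets`, which this diff does not show. Several details are hypothetical:

- the function `resolve_asset`;
- the injected `download` callable;
- the `default_dir` fallback;
- the assumption that the data directory doubles as the download cache.

Only the environment variable name comes from the docs.

```python
import os
from pathlib import Path
from typing import Callable, Optional

# Environment variable named in the docs; everything else here is illustrative.
ENV_VAR = "POLICYENGINE_UK_GEOGRAPHY_DATA_DIR"


def resolve_asset(
    filename: str,
    *,
    explicit_path: Optional[str] = None,
    download_missing_assets: bool = True,
    download: Optional[Callable[[str, Path], None]] = None,
    default_dir: Path = Path("uk_geography_data"),
) -> Path:
    """Hypothetical resolver for one geography file, following the documented order."""
    # 1. An explicit path wins; if it is missing, fail instead of falling back to GCS.
    if explicit_path is not None:
        path = Path(explicit_path)
        if not path.exists():
            raise FileNotFoundError(f"Explicit geography file not found: {path}")
        return path

    # 2. Look in the configured data directory, which this sketch also uses
    #    as the download cache.
    data_dir = Path(os.environ.get(ENV_VAR, default_dir))
    cached = data_dir / filename
    if cached.exists():
        return cached

    # 3. Download only when allowed and a downloader is supplied.
    if not download_missing_assets or download is None:
        raise FileNotFoundError(f"{filename} is not available in {data_dir}")
    data_dir.mkdir(parents=True, exist_ok=True)
    download(filename, cached)  # e.g. copy the object from the GCS bucket to `cached`
    return cached
```

The downloader is injected so the lookup logic can be exercised without network access. In a real setup it might wrap `download_gcs_file(bucket=..., file_path=...)`, the helper the old strategy code called directly.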
`docs/regions.md` (10 changes: 5 additions, 5 deletions)

````diff
@@ -48,21 +48,21 @@ for row in impacts.district_results:

 ## UK parliamentary constituencies

-Constituency-level impacts reweight every household to each constituency's demographic profile using a pre-computed weight matrix, so both the weight file and a constituency metadata CSV are required inputs:
+Constituency-level impacts reweight every household to each constituency's demographic profile using a pre-computed weight matrix. By default, PolicyEngine looks for the standard constituency files locally and downloads them from the PolicyEngine UK GCS bucket if they are not present:

 ```python
 from policyengine.outputs import compute_uk_constituency_impacts

 impacts = compute_uk_constituency_impacts(
     baseline_simulation=baseline,
     reform_simulation=reform,
-    weight_matrix_path="parliamentary_constituency_weights.h5",
-    constituency_csv_path="constituencies_2024.csv",
     year="2025",
 )
 impacts.constituency_results
 ```

+To force specific local files, pass `weight_matrix_path` and `constituency_csv_path`. If either provided path is missing, the helper raises `FileNotFoundError` and does not fall back to GCS. To require the canonical files to be available locally or in the cache, pass `download_missing_assets=False`. To set a reusable local data directory and download cache, set `POLICYENGINE_UK_GEOGRAPHY_DATA_DIR`.
+
 ## UK local authorities

 ```python
@@ -71,13 +71,13 @@ from policyengine.outputs import compute_uk_local_authority_impacts
 impacts = compute_uk_local_authority_impacts(
     baseline_simulation=baseline,
     reform_simulation=reform,
-    weight_matrix_path="local_authority_weights.h5",
-    local_authority_csv_path="local_authorities_2021.csv",
     year="2025",
 )
 impacts.local_authority_results
 ```

+`compute_uk_local_authority_impacts` accepts explicit paths with `weight_matrix_path` and `local_authority_csv_path` when callers need to use specific local files instead of the default local/GCS lookup. It also accepts `download_missing_assets=False` for local-only canonical asset resolution.
+
 ## Region registries

 `pe.us.model.region_registry` and `pe.uk.model.region_registry` enumerate supported sub-national units:
````
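The regions docs say constituency impacts "reweight every household to each constituency's demographic profile". In other words, every region reuses the same household sample, each with its own row of a weight matrix of shape (N_regions x N_households). The NumPy sketch below computes one illustrative statistic from such a matrix: a weighted mean household income change per region. The function name and the metric are hypothetical and are not taken from `compute_uk_constituency_impacts` or `constituency_results`.

```python
import numpy as np


def regional_mean_change(
    weights: np.ndarray,
    baseline_income: np.ndarray,
    reform_income: np.ndarray,
) -> np.ndarray:
    """Weighted mean household income change in every region.

    `weights` has shape (n_regions, n_households): row k holds the weight each
    household in the shared sample receives when reweighted to region k.
    """
    if weights.shape[1] != baseline_income.shape[0]:
        raise ValueError("each weight row needs one entry per household")
    change = reform_income - baseline_income  # shape (n_households,)
    totals = weights @ change  # weighted total change, shape (n_regions,)
    return totals / weights.sum(axis=1)  # divide by each region's total weight
```

Take a toy matrix `[[1, 3, 0], [0, 1, 1]]` and household changes `[10, 0, 30]`. The first region's mean is 10 / 4 = 2.5 and the second's is 30 / 2 = 15.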
`src/policyengine/core/scoping_strategy.py` (46 changes: 25 additions, 21 deletions)

```diff
@@ -6,12 +6,11 @@
    a specific value (e.g., UK countries by 'country' field, US places by 'place_fips').

 2. WeightReplacementStrategy: Replaces household weights from a pre-computed weight
-   matrix stored in GCS (e.g., UK constituencies and local authorities).
+   matrix resolved locally or from GCS (e.g., UK constituencies and local authorities).
 """

 import logging
 from abc import abstractmethod
-from pathlib import Path
 from typing import Annotated, Literal, Optional, Union

 import numpy as np
@@ -93,7 +92,7 @@ class WeightReplacementStrategy(RegionScopingStrategy):

     Used for UK constituencies and local authorities. Instead of removing
     households, this strategy keeps all households but replaces their weights
-    with region-specific values from a weight matrix stored in GCS.
+    with region-specific values from a locally cached or downloaded weight matrix.

     The weight matrix is an HDF5 file with shape (N_regions x N_households),
     where each row contains household weights for a specific region.
@@ -106,42 +105,47 @@ class WeightReplacementStrategy(RegionScopingStrategy):
     lookup_csv_bucket: str
     lookup_csv_key: str
     region_code: str
+    download_missing_assets: bool = True

     def apply(
         self,
         entity_data: dict[str, MicroDataFrame],
         group_entities: list[str],
         year: int,
     ) -> dict[str, MicroDataFrame]:
-        from policyengine_core.tools.google_cloud import download_gcs_file
+        from policyengine.data.uk_geography_assets import (
+            UKGeographyAssetSpec,
+            resolve_uk_geography_asset_paths,
+        )

-        # Download lookup CSV and find region index
-        lookup_path = Path(
-            download_gcs_file(
-                bucket=self.lookup_csv_bucket,
-                file_path=self.lookup_csv_key,
-            )
+        paths = resolve_uk_geography_asset_paths(
+            UKGeographyAssetSpec(
+                geography_type="weight replacement",
+                weight_matrix_filename=self.weight_matrix_key,
+                lookup_csv_filename=self.lookup_csv_key,
+                bucket=self.weight_matrix_bucket,
+                weight_matrix_bucket=self.weight_matrix_bucket,
+                lookup_csv_bucket=self.lookup_csv_bucket,
+            ),
+            download_missing_assets=self.download_missing_assets,
         )
-        lookup_df = pd.read_csv(lookup_path)
+
+        lookup_df = pd.read_csv(paths.lookup_csv_path)

         region_id = self._find_region_index(lookup_df, self.region_code)

-        # Download weight matrix and extract weights for this region.
+        # Load weight matrix and extract weights for this region.
         # h5py is only needed here, so import lazily to keep
         # `from policyengine.core import ...` light.
         import h5py

-        weights_path = download_gcs_file(
-            bucket=self.weight_matrix_bucket,
-            file_path=self.weight_matrix_key,
-        )
-        with h5py.File(weights_path, "r") as f:
+        with h5py.File(paths.weight_matrix_path, "r") as f:
             weights = f[str(year)][...]

         region_weights = weights[region_id]

         # Validate weight row length matches household count
-        household_df = pd.DataFrame(entity_data["household"])
+        household_df = pd.DataFrame(entity_data["household"]).copy()
         if len(region_weights) != len(household_df):
             raise ValueError(
                 f"Weight matrix row length ({len(region_weights)}) does not match "
@@ -152,9 +156,9 @@ def apply(
         # Replace household weights
         result = {}
         for entity_name, mdf in entity_data.items():
-            df = pd.DataFrame(mdf)
+            df = pd.DataFrame(mdf).copy()
             if entity_name == "household":
-                df["household_weight"] = region_weights
+                df.loc[:, "household_weight"] = region_weights
                 result[entity_name] = MicroDataFrame(df, weights="household_weight")
             else:
                 weight_col = f"{entity_name}_weight"
@@ -174,7 +178,7 @@ def apply(
                         for hh_id in df[person_hh_col].values
                     ]
                 )
-                df[weight_col] = new_weights
+                df.loc[:, weight_col] = new_weights

                 result[entity_name] = MicroDataFrame(
                     df,
```
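The weight-replacement step in `WeightReplacementStrategy.apply` can be sketched with plain pandas as follows:

1. Find the region's row in the lookup table.
2. Take that row of the weight matrix.
3. Check that its length matches the household count.
4. Write it into `household_weight`.
5. Give each person their household's new weight.

Parts of `apply` are elided in this diff, so the sketch rests on several assumptions:

- the lookup table has a `code` column, and the region index is the row position of the matching code;
- person rows carry a hypothetical `person_household_id` column;
- plain DataFrames stand in for `MicroDataFrame`;
- the matrix is passed in directly rather than read from HDF5.

The `.copy()` calls mirror the change above, so the caller's frames keep their original weights.

```python
import numpy as np
import pandas as pd


def find_region_index(lookup: pd.DataFrame, region_code: str) -> int:
    """Row position of `region_code` in the lookup table (assumes a 'code' column)."""
    matches = np.flatnonzero(lookup["code"].to_numpy() == region_code)
    if len(matches) == 0:
        raise ValueError(f"Region code {region_code!r} not found in lookup table")
    return int(matches[0])


def replace_weights(
    household: pd.DataFrame,
    person: pd.DataFrame,
    weight_matrix: np.ndarray,
    region_index: int,
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return copies of both tables carrying the chosen region's weights."""
    region_weights = weight_matrix[region_index]  # one weight per household
    if len(region_weights) != len(household):
        raise ValueError(
            f"Weight matrix row length ({len(region_weights)}) does not match "
            f"household count ({len(household)})"
        )

    # Work on copies so the input frames are never modified.
    household = household.copy()
    household.loc[:, "household_weight"] = region_weights

    # Each person inherits the new weight of the household they belong to.
    weight_by_household = dict(zip(household["household_id"], region_weights))
    person = person.copy()
    person.loc[:, "person_weight"] = [
        weight_by_household[hh_id] for hh_id in person["person_household_id"]
    ]
    return household, person
```

Every household stays in the sample. A household outside the region simply receives a weight of zero (or near zero) in that region's row, which is what distinguishes this strategy from row filtering.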
`src/policyengine/countries/uk/regions.py` (31 changes: 18 additions, 13 deletions)

```diff
@@ -19,14 +19,19 @@
     RowFilterStrategy,
     WeightReplacementStrategy,
 )
+from policyengine.data.uk_geography_assets import (
+    CONSTITUENCY_ASSET_SPEC,
+    LOCAL_AUTHORITY_ASSET_SPEC,
+    UK_GEOGRAPHY_BUCKET_URI,
+)
 from policyengine.provenance.manifest import resolve_region_dataset_path

 if TYPE_CHECKING:
     pass

 logger = logging.getLogger(__name__)

-UK_DATA_BUCKET = "gs://policyengine-uk-data-private"
+UK_DATA_BUCKET = UK_GEOGRAPHY_BUCKET_URI

 # UK countries
 UK_COUNTRIES = {
@@ -54,8 +59,8 @@ def _load_constituencies_from_csv() -> list[dict]:

     try:
         csv_path = download(
-            gcs_bucket="policyengine-uk-data-private",
-            gcs_key="constituencies_2024.csv",
+            gcs_bucket=CONSTITUENCY_ASSET_SPEC.bucket,
+            gcs_key=CONSTITUENCY_ASSET_SPEC.lookup_csv_filename,
         )
         import pandas as pd

@@ -86,8 +91,8 @@ def _load_local_authorities_from_csv() -> list[dict]:

     try:
         csv_path = download(
-            gcs_bucket="policyengine-uk-data-private",
-            gcs_key="local_authorities_2021.csv",
+            gcs_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket,
+            gcs_key=LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename,
         )
         import pandas as pd

@@ -159,10 +164,10 @@ def build_uk_region_registry(
                 region_type="constituency",
                 parent_code="uk",
                 scoping_strategy=WeightReplacementStrategy(
-                    weight_matrix_bucket="policyengine-uk-data-private",
-                    weight_matrix_key="parliamentary_constituency_weights.h5",
-                    lookup_csv_bucket="policyengine-uk-data-private",
-                    lookup_csv_key="constituencies_2024.csv",
+                    weight_matrix_bucket=CONSTITUENCY_ASSET_SPEC.bucket,
+                    weight_matrix_key=CONSTITUENCY_ASSET_SPEC.weight_matrix_filename,
+                    lookup_csv_bucket=CONSTITUENCY_ASSET_SPEC.bucket,
+                    lookup_csv_key=CONSTITUENCY_ASSET_SPEC.lookup_csv_filename,
                     region_code=const["code"],
                 ),
             )
@@ -180,10 +185,10 @@ def build_uk_region_registry(
                 region_type="local_authority",
                 parent_code="uk",
                 scoping_strategy=WeightReplacementStrategy(
-                    weight_matrix_bucket="policyengine-uk-data-private",
-                    weight_matrix_key="local_authority_weights.h5",
-                    lookup_csv_bucket="policyengine-uk-data-private",
-                    lookup_csv_key="local_authorities_2021.csv",
+                    weight_matrix_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket,
+                    weight_matrix_key=LOCAL_AUTHORITY_ASSET_SPEC.weight_matrix_filename,
+                    lookup_csv_bucket=LOCAL_AUTHORITY_ASSET_SPEC.bucket,
+                    lookup_csv_key=LOCAL_AUTHORITY_ASSET_SPEC.lookup_csv_filename,
                     region_code=la["code"],
                 ),
             )
```
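`regions.py` now reads bucket and file names from shared spec objects instead of repeating string literals. The module `policyengine/data/uk_geography_assets.py` is not shown in this diff, so the frozen dataclass below is only a hypothetical reconstruction. It uses the field names visible in the diff and the string literals the diff removes. Three details are assumptions:

- the `geography_type` values;
- the rule that per-file buckets fall back to `bucket` when unset;
- the `resolved_*` helper properties.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the literals this diff removes; the layout is hypothetical.
UK_GEOGRAPHY_BUCKET = "policyengine-uk-data-private"
UK_GEOGRAPHY_BUCKET_URI = f"gs://{UK_GEOGRAPHY_BUCKET}"


@dataclass(frozen=True)
class UKGeographyAssetSpec:
    geography_type: str
    weight_matrix_filename: str
    lookup_csv_filename: str
    bucket: str = UK_GEOGRAPHY_BUCKET
    # Assumption: per-file buckets fall back to the shared bucket when unset.
    weight_matrix_bucket: Optional[str] = None
    lookup_csv_bucket: Optional[str] = None

    @property
    def resolved_weight_matrix_bucket(self) -> str:
        return self.weight_matrix_bucket or self.bucket

    @property
    def resolved_lookup_csv_bucket(self) -> str:
        return self.lookup_csv_bucket or self.bucket


CONSTITUENCY_ASSET_SPEC = UKGeographyAssetSpec(
    geography_type="constituency",  # hypothetical label
    weight_matrix_filename="parliamentary_constituency_weights.h5",
    lookup_csv_filename="constituencies_2024.csv",
)
LOCAL_AUTHORITY_ASSET_SPEC = UKGeographyAssetSpec(
    geography_type="local_authority",  # hypothetical label
    weight_matrix_filename="local_authority_weights.h5",
    lookup_csv_filename="local_authorities_2021.csv",
)
```

Making the dataclass frozen means no importer can accidentally mutate these module-level specs. That matters because the registry builder and the scoping strategy both read from them.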
`src/policyengine/data/__init__.py` (1 change: 1 addition, 0 deletions)

```diff
@@ -0,0 +1 @@
+"""Static metadata and packaged data helpers."""
```