20 commits
c5f0c43
NRL-721 Create clone and delete scripts WIP
sandyforresternhs Jan 29, 2026
8a876b3
NRL-721 Reseed pointer scripts and tests
sandyforresternhs Feb 6, 2026
321d709
NRL-721 Add failed attempts to output
sandyforresternhs Feb 6, 2026
f10dba1
NRL-721 Implement lambda and eventbridge
sandyforresternhs Feb 6, 2026
9dae7ae
NRL-721 Get lambda working successfully
sandyforresternhs Feb 9, 2026
1c7c0ab
NRL-721 Make seed lambda ac wide and enable ac wide layers
sandyforresternhs Feb 12, 2026
3b56521
NRL-721 Add build to ac wide wf and deploy only when tables are listed
sandyforresternhs Feb 12, 2026
5aebcd6
NRL-721 Remove unused param
sandyforresternhs Feb 12, 2026
02b98bf
NRL-721 Correct make command
sandyforresternhs Feb 12, 2026
221c9bc
NRL-721 Add build dependencies and perms
sandyforresternhs Feb 12, 2026
f13c88e
NRL-721 Add build seed lamda
sandyforresternhs Feb 12, 2026
3f43b96
NRL-721 Save lambda artifacts for apply and move lambda build to dist
sandyforresternhs Feb 13, 2026
678477e
Merge branch 'develop' into feature/SAFO6-NRL-721-seed-sandbox-data
sandyforresternhs Feb 13, 2026
f630602
NRL-721 Increase schedule, correct table name & sqube fixes
sandyforresternhs Feb 13, 2026
476c4c5
NRL-721 Add kms perms to lambda policy
sandyforresternhs Feb 13, 2026
30cfd57
NRL-721 Add second table
sandyforresternhs Feb 13, 2026
f30fb60
NRL-721 Sonarqube fixes
sandyforresternhs Feb 13, 2026
490c0c0
NRL-721 Temp remove clone db script
sandyforresternhs Feb 13, 2026
1160b49
NRL-721 Add retrieval mechanism to seed pointers
sandyforresternhs Feb 13, 2026
b494013
NRL-721 Sonarqube fixes
sandyforresternhs Feb 13, 2026
25 changes: 25 additions & 0 deletions .github/workflows/deploy-account-wide-infra.yml
@@ -51,13 +51,27 @@ jobs:
echo "${HOME}/.asdf/bin" >> $GITHUB_PATH
poetry install --no-root

- name: Build Lambda Layers
run: |
make build-layers
make build-dependency-layer

- name: Build Seed Sandbox Lambda
run: make build-seed-sandbox-lambda

- name: Configure Management Credentials
uses: aws-actions/configure-aws-credentials@7474bc4690e29a8392af63c5b98e7449536d5c3a #v4.3.1
with:
aws-region: eu-west-2
role-to-assume: ${{ secrets.MGMT_ROLE_ARN }}
role-session-name: github-actions-ci-${{ inputs.environment }}-${{ github.run_id }}

- name: Add S3 Permissions to Lambda Layer
env:
ACCOUNT_NAME: ${{ vars.ACCOUNT_NAME }}
run: |
make get-s3-perms ENV=${ACCOUNT_NAME}

- name: Retrieve Server Certificates
env:
ACCOUNT_NAME: ${{ vars.ACCOUNT_NAME }}
@@ -92,6 +106,11 @@ jobs:
aws s3 cp terraform/account-wide-infrastructure/$ACCOUNT_NAME/tfplan.txt s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/tfplan.txt
aws s3 cp terraform/account-wide-infrastructure/modules/glue/files/src.zip s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/glue-src.zip

aws s3 cp dist/nrlf.zip s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/nrlf.zip
aws s3 cp dist/dependency_layer.zip s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/dependency_layer.zip
aws s3 cp dist/nrlf_permissions.zip s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/nrlf_permissions.zip
aws s3 cp dist/seed_sandbox.zip s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/seed_sandbox.zip

terraform-apply:
name: Terraform Apply - ${{ inputs.environment }}
needs: [terraform-plan]
@@ -126,6 +145,12 @@ jobs:
mkdir -p terraform/account-wide-infrastructure/modules/glue/files
aws s3 cp s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/glue-src.zip terraform/account-wide-infrastructure/modules/glue/files/src.zip

mkdir -p dist
aws s3 cp s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/nrlf.zip dist/nrlf.zip
aws s3 cp s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/dependency_layer.zip dist/dependency_layer.zip
aws s3 cp s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/nrlf_permissions.zip dist/nrlf_permissions.zip
aws s3 cp s3://nhsd-nrlf--mgmt--github-ci-logging/acc-$ACCOUNT_NAME/${{ github.run_id }}/seed_sandbox.zip dist/seed_sandbox.zip

- name: Retrieve Server Certificates
env:
ACCOUNT_NAME: ${{ vars.ACCOUNT_NAME }}
6 changes: 5 additions & 1 deletion Makefile
@@ -58,7 +58,11 @@ check-deploy: ## check the deploy environment is setup correctly
check-deploy-warn:
@SHOULD_WARN_ONLY=true ./scripts/check-deploy-environment.sh

build: check-warn build-api-packages build-layers build-dependency-layer ## Build the project
build: check-warn build-api-packages build-layers build-dependency-layer build-seed-sandbox-lambda ## Build the project

build-seed-sandbox-lambda:
@echo "Building seed_sandbox Lambda"
@cd lambdas/seed_sandbox && make build

build-dependency-layer:
@echo "Building Lambda dependency layer"
5 changes: 3 additions & 2 deletions README.md
@@ -375,8 +375,9 @@ In order to deploy to a sandbox environment (`dev-sandbox`, `qa-sandbox`, `int-s

### Sandbox database clear and reseed

Any workspace suffixed with `-sandbox` has a small amount of additional infrastructure deployed to clear and reseed the DynamoDB tables (auth and document pointers) using a Lambda running
on a cron schedule that can be found in the `cron/seed_sandbox` directory in the root of this project. The data used to seed the DynamoDB tables can found in the `cron/seed_sandbox/data` directory.
<!-- TODO Update this -->
<!-- Any workspace suffixed with `-sandbox` has a small amount of additional infrastructure deployed to clear and reseed the DynamoDB tables (auth and document pointers) using a Lambda running
on a cron schedule that can be found in the `cron/seed_sandbox` directory in the root of this project. The data used to seed the DynamoDB tables can found in the `cron/seed_sandbox/data` directory. -->

### Sandbox authorisation

29 changes: 29 additions & 0 deletions lambdas/seed_sandbox/Makefile
@@ -0,0 +1,29 @@
.PHONY: build clean

build: clean
@echo "Building Lambda deployment package..."
mkdir -p build

# Copy the handler
cp index.py build/

# Copy the required scripts
mkdir -p build/scripts
cp ../../scripts/delete_all_table_items.py build/scripts/
cp ../../scripts/seed_sandbox_table.py build/scripts/
cp ../../scripts/seed_utils.py build/scripts/

# Copy the pointer template data
mkdir -p build/tests/data/samples
cp -r ../../tests/data/samples/*.json build/tests/data/samples/

# Create the zip file in root dist
mkdir -p ../../dist
cd build && zip -r ../../../dist/seed_sandbox.zip . -x "*.pyc" -x "__pycache__/*" -x ".DS_Store"

@echo "✓ Lambda package created: ../../dist/seed_sandbox.zip"

clean:
@echo "Cleaning build artifacts..."
rm -rf build
@echo "✓ Clean complete"
122 changes: 122 additions & 0 deletions lambdas/seed_sandbox/index.py
@@ -0,0 +1,122 @@
"""
Lambda handler for resetting specified DynamoDB tables with seed test data.

This Lambda function runs on a schedule to clear and reseed the specified
pointer tables with fresh test data.
"""

# flake8: noqa: T201

import json
import os

from scripts.delete_all_table_items import delete_all_table_items
from scripts.seed_sandbox_table import seed_sandbox_table


def handler(event, context):
"""
Lambda handler that orchestrates the reset of specified tables

The tables to be reset are specified by the TABLE_NAMES environment variable
as a comma-separated list

Args:
event: Lambda event (from EventBridge schedule)
context: Lambda context

Returns:
dict: Response with status and details for each table
"""
table_names_str = os.environ.get("TABLE_NAMES", "")
pointers_per_type = int(os.environ.get("POINTERS_PER_TYPE", "2"))

if not table_names_str:
error_msg = "TABLE_NAMES environment variable is required"
print(f"ERROR: {error_msg}")
return {"statusCode": 500, "body": json.dumps({"error": error_msg})}

table_names = [name.strip() for name in table_names_str.split(",") if name.strip()]

if not table_names:
error_msg = "No valid table names provided in TABLE_NAMES"
print(f"ERROR: {error_msg}")
return {"statusCode": 500, "body": json.dumps({"error": error_msg})}

print(
f"Starting table reset for {len(table_names)} table(s): {', '.join(table_names)}"
)
print(f"Pointers per type: {pointers_per_type}")

results = []
failed_tables = []

for table_name in table_names:
print(f"\n{'='*60}")
print(f"Processing table: {table_name}")
print(f"{'='*60}")

try:
print("Step 1: Deleting all items from table...")
pointers_deleted_count = delete_all_table_items(table_name=table_name)
print(f"✓ Deleted {pointers_deleted_count} items")

print("Step 2: Seeding table with fresh data...")
seed_result = seed_sandbox_table(
table_name=table_name,
pointers_per_type=pointers_per_type,
force=True,
write_csv=False,
)
print(f"✓ Created {seed_result['successful']} pointers")

results.append(
{
"table_name": table_name,
"status": "success",
"pointers_deleted": pointers_deleted_count,
"pointers_created": seed_result["successful"],
"pointers_attempted": seed_result["attempted"],
"pointers_failed": seed_result["failed"],
}
)

except Exception as e:
error_msg = f"Failed to reset table {table_name}: {str(e)}"
print(f"ERROR: {error_msg}")
failed_tables.append(table_name)
results.append(
{
"table_name": table_name,
"status": "failed",
"error": str(e),
}
)

if failed_tables:
status_code = 500 if len(failed_tables) == len(table_names) else 207
message = (
f"Failed to reset {len(failed_tables)} table(s): {', '.join(failed_tables)}"
)
else:
status_code = 200
message = f"Successfully reset {len(table_names)} table(s)"

result = {
"statusCode": status_code,
"body": json.dumps(
{
"message": message,
"tables_processed": len(table_names),
"tables_succeeded": len(table_names) - len(failed_tables),
"tables_failed": len(failed_tables),
"results": results,
"pointers_per_type": pointers_per_type,
}
),
}

print(f"\n{'='*60}")
print(f"RESULT: {message}")
print(f"{'='*60}")
return result
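The handler's partial-failure behaviour (200 when every table resets, 207 Multi-Status when some fail, 500 when all fail) can be sketched in isolation. `summarise` below is a hypothetical helper mirroring that logic, not part of the diff:

```python
import json


def summarise(table_names, failed_tables):
    """Mirror the handler's status logic: 200 if all tables reset,
    207 (Multi-Status) on partial failure, 500 if every table failed."""
    if failed_tables:
        status_code = 500 if len(failed_tables) == len(table_names) else 207
        message = (
            f"Failed to reset {len(failed_tables)} table(s): {', '.join(failed_tables)}"
        )
    else:
        status_code = 200
        message = f"Successfully reset {len(table_names)} table(s)"
    return {"statusCode": status_code, "body": json.dumps({"message": message})}


print(summarise(["auth", "pointers"], [])["statusCode"])        # 200
print(summarise(["auth", "pointers"], ["auth"])["statusCode"])  # 207
print(summarise(["auth"], ["auth"])["statusCode"])              # 500
```

Using 207 rather than raising means EventBridge does not retry the whole invocation when only one table failed; the per-table detail lives in the response body.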
1 change: 1 addition & 0 deletions layer/nrlf/core/constants.py
@@ -5,6 +5,7 @@ class Source(Enum):
NRLF = "NRLF"
LEGACY = "NRL" # not actually used
PERFTEST = "NFT-SEED"
SANDBOX = "SANDBOX-SEED"


VALID_SOURCES = frozenset(item.value for item in Source.__members__.values())
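The `VALID_SOURCES` construction above can be illustrated standalone (a minimal sketch with a trimmed-down enum; the real `Source` has more members):

```python
from enum import Enum


class Source(Enum):
    NRLF = "NRLF"
    PERFTEST = "NFT-SEED"
    SANDBOX = "SANDBOX-SEED"  # the value added in this change


# Collect every member's value into an immutable set for fast validation.
VALID_SOURCES = frozenset(item.value for item in Source.__members__.values())

print("SANDBOX-SEED" in VALID_SOURCES)  # True
```

Because seeded pointers are stamped with the new `SANDBOX-SEED` source, they pass the same source validation as real pointers while remaining identifiable as sandbox data.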
88 changes: 88 additions & 0 deletions scripts/delete_all_table_items.py
@@ -0,0 +1,88 @@
#!/usr/bin/env python
import sys

import boto3
from botocore.exceptions import ClientError

# Needed for when the script is run in Lambda where modules are in scripts subdirectory
try:
import fire
except ImportError:
fire = None


def _handle_table_access_error(e, table_name):
error_code = e.response["Error"]["Code"]
if error_code == "ResourceNotFoundException":
print(f"Error: Table '{table_name}' does not exist")
elif error_code == "AccessDeniedException":
print(f"Error: No permission to access table '{table_name}'")
else:
print(f"Error accessing table: {e}")
sys.exit(1)


def _scan_and_delete_batch(table, scan_kwargs, deleted_count):
try:
response = table.scan(**scan_kwargs)
except ClientError as e:
if e.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
print(f"\nWarning: Throttled at {deleted_count} items. Retrying...")
return scan_kwargs.get("ExclusiveStartKey"), deleted_count, True
raise

with table.batch_writer() as batch:
for item in response["Items"]:
batch.delete_item(Key=item)
deleted_count += 1

if deleted_count % 100 == 0:
print(f"Deleted {deleted_count} items...", end="\r")

return response.get("LastEvaluatedKey"), deleted_count, False


def delete_all_table_items(table_name):
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(table_name)

try:
key_names = [key["AttributeName"] for key in table.key_schema]
except ClientError as e:
_handle_table_access_error(e, table_name)

scan_kwargs = {"ProjectionExpression": ",".join(key_names)}
deleted_count = 0

try:
while True:
last_key, deleted_count, was_throttled = _scan_and_delete_batch(
table, scan_kwargs, deleted_count
)

if was_throttled:
if last_key:
scan_kwargs["ExclusiveStartKey"] = last_key
elif "ExclusiveStartKey" in scan_kwargs:
del scan_kwargs["ExclusiveStartKey"]
continue

if not last_key:
break

scan_kwargs["ExclusiveStartKey"] = last_key

except Exception as e:
print(f"\nError during deletion: {e}")
print(f"Successfully deleted {deleted_count} items before error")
sys.exit(1)

print(f"\n✓ Cleared {deleted_count} items from {table_name}")
return deleted_count


if __name__ == "__main__":
if fire is None:
print("Error: fire module not available")
sys.exit(1)
fire.Fire(delete_all_table_items)
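The script above pages through the table via `LastEvaluatedKey` / `ExclusiveStartKey`. A minimal in-memory stub (hypothetical, standing in for the boto3 `Table` resource; the real script additionally uses `batch_writer` and retries on throttling) shows the pagination contract the loop relies on:

```python
class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table: scan() returns pages
    of key items plus a LastEvaluatedKey until the table is exhausted."""

    def __init__(self, keys, page_size=2):
        self._order = list(keys)                 # frozen scan order
        self.live = {k["pk"] for k in keys}      # keys not yet deleted
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_):
        pks = [k["pk"] for k in self._order]
        start = pks.index(ExclusiveStartKey["pk"]) + 1 if ExclusiveStartKey else 0
        page = self._order[start : start + self.page_size]
        resp = {"Items": list(page)}
        if start + self.page_size < len(self._order):
            resp["LastEvaluatedKey"] = page[-1]  # more pages remain
        return resp

    def delete_item(self, Key):
        self.live.discard(Key["pk"])


def delete_all(table):
    """The same paging loop as delete_all_table_items, minus error handling."""
    deleted, scan_kwargs = 0, {}
    while True:
        resp = table.scan(**scan_kwargs)
        for item in resp["Items"]:
            table.delete_item(Key=item)
            deleted += 1
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            return deleted
        scan_kwargs["ExclusiveStartKey"] = last_key


table = FakeTable([{"pk": f"item-{i}"} for i in range(5)])
print(delete_all(table))  # 5
print(table.live)         # set()
```

Scanning only the key attributes (`ProjectionExpression` in the real script) keeps each page small, since delete requests need nothing but the key.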
55 changes: 55 additions & 0 deletions scripts/reset_sandbox_table.py
@@ -0,0 +1,55 @@
#!/usr/bin/env python
"""
Resets a sandbox table by clearing all items and reseeding with fresh data

This script is intended for manual CLI use to reset a sandbox table.

A separate Lambda function (../lambdas/seed_sandbox) performs the same reset operation on a weekly schedule; this script allows on-demand resets without waiting for the scheduled job.
"""
import sys

import fire
from delete_all_table_items import delete_all_table_items
from seed_sandbox_table import seed_sandbox_table


def reset_sandbox_table(table_name: str, pointers_per_type: int = 2):
"""
Reset a sandbox table by clearing all items and reseeding with fresh data.

Args:
table_name: Name of the DynamoDB table to reset
pointers_per_type: Number of pointers per type per custodian (default: 2)
"""
print(f"=== Resetting Sandbox Table: {table_name} ===\n")

print("Step 1: Deleting all existing items...")
try:
delete_all_table_items(table_name)
print()
except SystemExit as e:
print("✗ Failed to delete items. Aborting reset.")
sys.exit(e.code)
except Exception as e:
print(f"✗ Unexpected error during deletion: {e}")
sys.exit(1)

print("Step 2: Seeding with fresh pointer data...")
try:
result = seed_sandbox_table(table_name, pointers_per_type, force=True)
print("\n=== ✓ Reset Complete ===")
print(
f"Table '{table_name}' has been reset with {result['successful']} fresh pointers"
)
if result["failed"] > 0:
print(f"⚠️ {result['failed']} pointer(s) failed to create")
except SystemExit as e:
print("✗ Failed to seed table after deletion.")
sys.exit(e.code)
except Exception as e:
print(f"✗ Unexpected error during seeding: {e}")
sys.exit(1)


if __name__ == "__main__":
fire.Fire(reset_sandbox_table)