Add notebooks for AML investigation use case by fcogidi · Pull Request #63 · VectorInstitute/eval-agents

fcogidi · 2026-02-20T16:16:15Z

Summary

Add notebooks for AML investigation use case.

Clickup Ticket(s): N/A

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📝 Documentation update
🔧 Refactoring (no functional changes)
⚡ Performance improvement
🧪 Test improvements
🔒 Security fix

Changes Made

Add notebook to explore AML dataset and database tools.
Add notebook to explain how cases are built and how to run the agent on a case file.
Add notebook to showcase evaluation pipeline and explain metrics.

Testing

Tests pass locally (uv run pytest tests/)
Type checking passes (uv run mypy <src_dir>)
Linting passes (uv run ruff check <src_dir>/)
Manual testing performed (describe below)

Manual testing details:
Ran the notebooks.

Screenshots/Recordings

N/A

Related Issues

N/A

Deployment Notes

N/A

Checklist

Code follows the project's style guidelines
Self-review of code completed
Documentation updated (if applicable)
No sensitive information (API keys, credentials) exposed

Copilot

Pull request overview

This PR adds three comprehensive Jupyter notebooks that document and demonstrate the AML (Anti-Money Laundering) investigation use case for evaluating AI agents. The notebooks provide a step-by-step walkthrough from data exploration to running evaluations.

Changes:

Added notebook 01 to explore the IBM AML dataset, build a SQLite database, and demonstrate the ReadOnlySqlDatabase tool
Added notebook 02 to explain case file structures, case generation, and running the agent on individual cases
Added notebook 03 to showcase the full evaluation pipeline with item-level, trace-level, and run-level graders
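As a rough illustration of the read-only database idea mentioned for notebook 01, the sketch below opens a SQLite file in read-only mode using only the standard library. It is not the project's ReadOnlySqlDatabase implementation, whose internals are not shown in this PR; the `transactions` table, its columns, and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt fails."""
    # mode=ro in a URI filename asks SQLite for a read-only connection.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def demo() -> tuple[list[tuple], bool]:
    """Build a tiny hypothetical table, then query it through a read-only handle."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # Populate the database with a normal read-write connection first.
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
    writer.execute("INSERT INTO transactions VALUES (1, 250.0)")
    writer.commit()
    writer.close()

    reader = open_read_only(db_path)
    try:
        rows = reader.execute("SELECT id, amount FROM transactions").fetchall()
        try:
            reader.execute("INSERT INTO transactions VALUES (2, 10.0)")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite rejects writes on a read-only connection.
            write_blocked = True
    finally:
        reader.close()
    return rows, write_blocked
```

Enforcing read-only access at the connection level means an agent-generated query cannot modify the data even if it contains a write statement.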

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
implementations/aml_investigation/01_data_and_tools.ipynb	Introduces the AML dataset, demonstrates database setup and schema, and explains the ReadOnlySqlDatabase safety tool
implementations/aml_investigation/02_running_the_agent.ipynb	Documents the case file data structures, explains the four case types (TP/TN/FP/FN), and demonstrates running a single case through the agent
implementations/aml_investigation/03_evaluation.ipynb	Demonstrates the full evaluation pipeline including dataset upload to Langfuse, explains the three-tier grading system, and shows how to inspect results
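Assuming TP/TN/FP/FN carry their usual confusion-matrix meaning, the sketch below shows how such outcomes are commonly tallied and turned into precision and recall. It is not the grading code from the notebooks; the function names and the boolean "flagged" representation are assumptions made for illustration.

```python
def confusion_counts(expected: list[bool], predicted: list[bool]) -> dict[str, int]:
    """Tally TP/TN/FP/FN from ground-truth and predicted laundering flags."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(expected, predicted):
        if truth and guess:
            counts["TP"] += 1
        elif not truth and not guess:
            counts["TN"] += 1
        elif guess:
            # Flagged, but the ground truth says the activity was legitimate.
            counts["FP"] += 1
        else:
            # Not flagged, but the ground truth says it was laundering.
            counts["FN"] += 1
    return counts


def precision_recall(counts: dict[str, int]) -> tuple[float, float]:
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    flagged = counts["TP"] + counts["FP"]
    actual = counts["TP"] + counts["FN"]
    precision = counts["TP"] / flagged if flagged else 0.0
    recall = counts["TP"] / actual if actual else 0.0
    return precision, recall
```

For example, `precision_recall(confusion_counts([True, False], [True, True]))` compares two hypothetical cases and reports how many flags were correct and how many true cases were caught.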

amrit110

Great work @fcogidi! Can you actually try running everything from your implementations once on a coder workspace? I think if it all works, then merge this PR, and we should be good :)

fcogidi · 2026-02-20T20:02:39Z

Great work @fcogidi! Can you actually try running everything from your implementations once on a coder workspace? I think if it all works, then merge this PR, and we should be good :)

It works in Coder. I just had to change the OPENAI_API_KEY environment variable to GOOGLE_API_KEY for the agent calls to work. I think the key is not passed in explicitly, so it is read internally under its canonical name.

I can do this as a workaround: os.environ["GOOGLE_API_KEY"] = Configs().openai_api_key.get_secret_value()

I'd recommend changing to GOOGLE_API_KEY in the .env file and updating Firestore.
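The workaround above can also be written as a small guard that copies the key only when GOOGLE_API_KEY is not already set. This is a minimal sketch using plain environment variables rather than the project's Configs class; the helper name `mirror_api_key` is hypothetical.

```python
import os
from collections.abc import MutableMapping


def mirror_api_key(
    env: MutableMapping[str, str],
    source: str = "OPENAI_API_KEY",
    target: str = "GOOGLE_API_KEY",
) -> bool:
    """Copy env[source] into env[target] if target is unset.

    Returns True when a copy happened, False when target was already set.
    """
    if env.get(target):
        # Respect an explicitly configured target key.
        return False
    value = env.get(source)
    if not value:
        raise KeyError(f"Neither {target} nor {source} is set")
    env[target] = value
    return True
```

Calling `mirror_api_key(os.environ)` before creating the agent would have the same effect as the one-line assignment, without overwriting a GOOGLE_API_KEY that is already configured.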

Add notebooks for AML investigation use case

cccbf83

fcogidi requested review from amrit110, Copilot and lotif February 20, 2026 16:16

fcogidi self-assigned this Feb 20, 2026

fcogidi added the enhancement New feature or request label Feb 20, 2026

Copilot started reviewing on behalf of fcogidi February 20, 2026 16:16

Copilot AI reviewed Feb 20, 2026

fcogidi and others added 2 commits February 20, 2026 11:34

Update variable names and remove dtype_backend for consistency

a1f98b8

Merge branch 'main' into fco/aml_notebooks

084e0ef

amrit110 approved these changes Feb 20, 2026

fcogidi wants to merge 3 commits into main from fco/aml_notebooks


2 participants
