Pathfinder test runner #34

Merged
Changes from all commits (21 commits, all by uhbrar):

- `912676b` add pathfinder models
- `d7ae86c` update requirements
- `2b48560` add pathfinder testing examples and download
- `1827b52` pathfinder query generation
- `d57a205` test runer
- `89fad42` fid test case id
- `fef4572` black
- `29e5f93` remove graph validation test runner
- `9a54506` black and remove import
- `644edf8` fix tests
- `0c90263` black
- `262ff1c` fix test reporting
- `0bb363d` update pathfinder test runner
- `24c7c7f` bump model version
- `553b1a6` update field name
- `bcb57f5` update test
- `331c6a3` black
- `f4812c1` remove base query runner
- `d127101` change report
- `052277a` fix queryrunner
- `9a8f2b8` black
Requirements change (the removed/added markers are reconstructed from the hunk's line counts):

```diff
@@ -1,6 +1,4 @@
 ARS_Test_Runner==0.2.4
 # benchmarks-runner==0.1.3
 # ui-test-runner==0.0.2
-graph-validation-test-runners==0.1.5
-# we need to manually pin the right reasoner-validator version
-reasoner-validator==4.2.5
+# graph-validation-test-runners==0.1.5
```
New module `test_harness/pathfinder_test_runner.py` (path taken from the import added in the runner diff below), added in full (`@@ -0,0 +1,44 @@`); indentation is reconstructed, since the page flattened the diff:

```python
from typing import Dict, Union, List


async def pathfinder_pass_fail_analysis(
    report: Dict[str, any],
    agent: str,
    message: Dict[str, any],
    path_nodes: List[List[str]],
    minimum_required_path_nodes: int,
) -> Dict[str, any]:
    found_path_nodes = set()
    unmatched_paths = set()
    for analysis in message["results"][0]["analyses"]:
        for path_bindings in analysis["path_bindings"].values():
            for path_binding in path_bindings:
                path_id = path_binding["id"]
                matching_path_nodes = set()
                for edge_id in message["auxiliary_graphs"][path_id]["edges"]:
                    edge = message["knowledge_graph"]["edges"][edge_id]
                    for node_curies in path_nodes:
                        unhit_node = True
                        for curie in node_curies:
                            if curie in matching_path_nodes:
                                unhit_node = False
                        if unhit_node:
                            if edge["subject"] in node_curies:
                                matching_path_nodes.add(edge["subject"])
                            if edge["object"] in node_curies:
                                matching_path_nodes.add(edge["object"])
                if len(matching_path_nodes) >= minimum_required_path_nodes:
                    found_path_nodes.add(",".join(matching_path_nodes))
                elif len(matching_path_nodes) > 0:
                    unmatched_paths.add(",".join(matching_path_nodes))

    if len(found_path_nodes) > 0:
        report[agent]["status"] = "PASSED"
        report[agent]["expected_nodes_found"] = "; ".join(found_path_nodes)
    elif len(unmatched_paths) > 0:
        report[agent]["status"] = "FAILED"
        report[agent]["expected_nodes_found"] = "; ".join(unmatched_paths)
    else:
        report[agent]["status"] = "FAILED"

    return report
```
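To make the pass/fail behavior concrete, here is a self-contained demo that copies `pathfinder_pass_fail_analysis` as defined above and runs it against a toy TRAPI-style message; the CURIEs, edge IDs, and agent name are invented for illustration.

```python
import asyncio
from typing import Dict, List


# Copied verbatim from pathfinder_test_runner above so this snippet runs standalone.
async def pathfinder_pass_fail_analysis(
    report: Dict[str, any],
    agent: str,
    message: Dict[str, any],
    path_nodes: List[List[str]],
    minimum_required_path_nodes: int,
) -> Dict[str, any]:
    found_path_nodes = set()
    unmatched_paths = set()
    for analysis in message["results"][0]["analyses"]:
        for path_bindings in analysis["path_bindings"].values():
            for path_binding in path_bindings:
                path_id = path_binding["id"]
                matching_path_nodes = set()
                for edge_id in message["auxiliary_graphs"][path_id]["edges"]:
                    edge = message["knowledge_graph"]["edges"][edge_id]
                    for node_curies in path_nodes:
                        unhit_node = True
                        for curie in node_curies:
                            if curie in matching_path_nodes:
                                unhit_node = False
                        if unhit_node:
                            if edge["subject"] in node_curies:
                                matching_path_nodes.add(edge["subject"])
                            if edge["object"] in node_curies:
                                matching_path_nodes.add(edge["object"])
                if len(matching_path_nodes) >= minimum_required_path_nodes:
                    found_path_nodes.add(",".join(matching_path_nodes))
                elif len(matching_path_nodes) > 0:
                    unmatched_paths.add(",".join(matching_path_nodes))
    if len(found_path_nodes) > 0:
        report[agent]["status"] = "PASSED"
        report[agent]["expected_nodes_found"] = "; ".join(found_path_nodes)
    elif len(unmatched_paths) > 0:
        report[agent]["status"] = "FAILED"
        report[agent]["expected_nodes_found"] = "; ".join(unmatched_paths)
    else:
        report[agent]["status"] = "FAILED"
    return report


# Toy message: one result whose bound path (auxiliary graph "aux1")
# passes through GO:0008150 on its way from CHEBI:0001 to MONDO:0005148.
message = {
    "results": [{"analyses": [{"path_bindings": {"p0": [{"id": "aux1"}]}}]}],
    "auxiliary_graphs": {"aux1": {"edges": ["e1", "e2"]}},
    "knowledge_graph": {
        "edges": {
            "e1": {"subject": "CHEBI:0001", "object": "GO:0008150"},
            "e2": {"subject": "GO:0008150", "object": "MONDO:0005148"},
        }
    },
}
report = asyncio.run(
    pathfinder_pass_fail_analysis(
        {"ara1": {}},
        "ara1",
        message,
        path_nodes=[["GO:0008150"], ["HP:0000001"]],  # second group never appears
        minimum_required_path_nodes=1,
    )
)
print(report["ara1"]["status"], report["ara1"]["expected_nodes_found"])
# → PASSED GO:0008150
```

Raising `minimum_required_path_nodes` to 2 would flip this run to FAILED, with the partial match still recorded under `expected_nodes_found`.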
Test-harness runner diff (imports and the `run_tests` signature); indentation reconstructed:

```diff
@@ -5,26 +5,31 @@
 import time
 from tqdm import tqdm
 import traceback
-from typing import Dict
+from typing import Dict, Union
 
 from ARS_Test_Runner.semantic_test import pass_fail_analysis
-from standards_validation_test_runner import StandardsValidationTest
+
+# from standards_validation_test_runner import StandardsValidationTest
 
 # from benchmarks_runner import run_benchmarks
 
-from translator_testing_model.datamodel.pydanticmodel import TestCase
+from translator_testing_model.datamodel.pydanticmodel import (
+    TestCase,
+    PathfinderTestCase,
+)
 
 from test_harness.runner.query_runner import QueryRunner
 from test_harness.reporter import Reporter
 from test_harness.slacker import Slacker
 from test_harness.result_collector import ResultCollector
 from test_harness.utils import get_tag, hash_test_asset
+from test_harness.pathfinder_test_runner import pathfinder_pass_fail_analysis
 
 
 async def run_tests(
     reporter: Reporter,
     slacker: Slacker,
-    tests: Dict[str, TestCase],
+    tests: Dict[str, Union[TestCase, PathfinderTestCase]],
     logger: logging.Logger = logging.getLogger(__name__),
     args: Dict[str, any] = {},
 ) -> Dict:
```
```diff
@@ -71,7 +76,7 @@ async def run_tests(
         try:
             test_id = await reporter.create_test(test, asset)
             test_ids.append(test_id)
-        except Exception:
+        except Exception as e:
             logger.error(f"Failed to create test: {test.id}")
             continue
```
```diff
@@ -91,9 +96,24 @@ async def run_tests(
                 "pks": test_query["pks"],
                 "result": {},
             }
+            if isinstance(test, PathfinderTestCase):
+                report["test_details"] = {
+                    "minimum_required_path_nodes": asset.minimum_required_path_nodes,
+                    "expected_path_nodes": "; ".join(
+                        [
+                            ",".join(
+                                [
+                                    normalized_curies[path_node_id]
+                                    for path_node_id in path_node.ids
+                                ]
+                            )
+                            for path_node in asset.path_nodes
+                        ]
+                    ),
+                }
             for agent, response in test_query["responses"].items():
                 report["result"][agent] = {
-                    "trapi_validation": "NA",
+                    # "trapi_validation": "NA",
                 }
                 agent_report = report["result"][agent]
                 try:
```
```diff
@@ -117,28 +137,28 @@ async def run_tests(
                     logger.warning(
                         f"Failed to parse basic response fields from {agent}: {e}"
                     )
-                try:
-                    svt = StandardsValidationTest(
-                        test_asset=asset,
-                        environment=test.test_env,
-                        component=agent,
-                        trapi_version=args["trapi_version"],
-                        biolink_version="suppress",
-                        runner_settings="Inferred",
-                    )
-                    results = svt.test_case_processor(
-                        trapi_response=response["response"]
-                    )
-                    agent_report["trapi_validation"] = results[
-                        next(iter(results.keys()))
-                    ][agent]["status"]
-                    if agent_report["trapi_validation"] == "FAILED":
-                        agent_report["status"] = "FAILED"
-                        agent_report["message"] = "TRAPI Validation Error"
-                        continue
-                except Exception as e:
-                    logger.warning(f"Failed to run TRAPI validation with {e}")
-                    agent_report["trapi_validation"] = "ERROR"
+                # try:
+                #     svt = StandardsValidationTest(
+                #         test_asset=asset,
+                #         environment=test.test_env,
+                #         component=agent,
+                #         trapi_version=args["trapi_version"],
+                #         biolink_version="suppress",
+                #         runner_settings="Inferred",
+                #     )
+                #     results = svt.test_case_processor(
+                #         trapi_response=response["response"]
+                #     )
+                #     agent_report["trapi_validation"] = results[
+                #         next(iter(results.keys()))
+                #     ][agent]["status"]
+                #     if agent_report["trapi_validation"] == "FAILED":
+                #         agent_report["status"] = "FAILED"
+                #         agent_report["message"] = "TRAPI Validation Error"
+                #         continue
+                # except Exception as e:
+                #     logger.warning(f"Failed to run TRAPI validation with {e}")
+                #     agent_report["trapi_validation"] = "ERROR"
                 try:
                     if (
                         response["response"]["message"].get("results") is None
```

> **Collaborator** (inline comment on the commented-out validation block, truncated in the page): Let's also take out where …
```diff
@@ -147,13 +167,28 @@ async def run_tests(
                         agent_report["status"] = "DONE"
                         agent_report["message"] = "No results"
                         continue
-                    await pass_fail_analysis(
-                        report["result"],
-                        agent,
-                        response["response"]["message"]["results"],
-                        normalized_curies[asset.output_id],
-                        asset.expected_output,
-                    )
+                    if isinstance(test, PathfinderTestCase):
+                        await pathfinder_pass_fail_analysis(
+                            report["result"],
+                            agent,
+                            response["response"]["message"],
+                            [
+                                [
+                                    normalized_curies[path_node_id]
+                                    for path_node_id in path_node.ids
+                                ]
+                                for path_node in asset.path_nodes
+                            ],
+                            asset.minimum_required_path_nodes,
+                        )
+                    else:
+                        await pass_fail_analysis(
+                            report["result"],
+                            agent,
+                            response["response"]["message"]["results"],
+                            normalized_curies[asset.output_id],
+                            asset.expected_output,
+                        )
                 except Exception as e:
                     logger.error(
                         f"Failed to run acceptance test analysis on {agent}: {e}"
```
Review discussion:

**Reviewer:** A couple things here: … should we include `expected_path_nodes` if someone passed the test? Let's rethink what all should be included in the report so that someone could get just about all the information they need just by looking at the test output.

**Author:** Yeah, that's true. So I guess the best thing to do would be to include the minimum required path node number, as well as the expected intermediate nodes, regardless of whether or not the test is passed? I do think providing the `expected_path_nodes` is useful even if the test is passed. That way the test output makes clear exactly how the test is passed, by showing what pairs are actually found. Maybe the name should be changed? `expected_paths_found`?

**Reviewer:** This all sounds good. I would maybe even include the `expected_nodes_found` (note the updated name) for failed tests, to show how close/far they were from passing.

**Reviewer:** Instead of including the minimum number and expected intermediate nodes for every ARA, you could include it as a higher field in the whole report. Here's an example of the full report output: …

**Author:** I've added both of these outside of the individual agent reports, and also added more details to the agent reports as well.
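Putting the thread's conclusions together with the diff, the pathfinder report presumably ends up shaped roughly like the sketch below. Only the field names and their placement are taken from the PR; the agent name and values are invented for illustration.

```python
# Hypothetical report shape after the review changes: shared test details
# live at the top level of the report, while each agent's entry records
# its own status plus the expected_nodes_found it actually matched.
report = {
    "pks": {},  # query primary keys, omitted in this sketch
    "test_details": {
        "minimum_required_path_nodes": 2,
        "expected_path_nodes": "GO:0008150; HP:0000001",
    },
    "result": {
        "example-ara": {  # invented agent name
            "status": "FAILED",
            # one of the two expected nodes was found, below the minimum of 2,
            # which shows how close this agent came to passing
            "expected_nodes_found": "GO:0008150",
        },
    },
}
print(report["test_details"]["minimum_required_path_nodes"])
# → 2
```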