Fix: Manifest content hash computation times out (#6123) #7258
base: develop
Branch updated from b5212ca to 7ed3673
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7258      +/-   ##
===========================================
- Coverage    84.82%   84.82%   -0.01%
===========================================
  Files          157      157
  Lines        23060    23084      +24
===========================================
+ Hits         19561    19581      +20
- Misses        3499     3503       +4
Branch updated from 7ed3673 to d7e9024
Branch updated from 2a9c76f to e4693d5
achave11-ucsc left a comment:
LGTM ✅
src/azul/service/manifest_service.py (outdated)
    """
    return self._manifest_hash('bundles')

def _manifest_hash(self, base: str) -> int:
PL (modal parameter smell, literal string)
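The "modal parameter" objection is that a string literal passed as `base` (for example `'bundles'`) silently selects one of two behaviours. A minimal sketch of the smell and two common alternatives follows; every name in it (`hash_inputs_modal`, `Base`, `bundle_hash_inputs`, `source_hash_inputs` and the sample tuples) is hypothetical and not part of Azul.

```python
from typing import Literal

# Hypothetical sample inputs standing in for the two possible hash inputs.
BUNDLES = ('bundle-a', 'bundle-b')
SOURCES = ('source-1',)


def hash_inputs_modal(base: str) -> tuple[str, ...]:
    # Modal parameter: a string literal selects the behaviour, so a typo
    # such as 'bundle' is only detected when this code runs.
    if base == 'bundles':
        return BUNDLES
    elif base == 'sources':
        return SOURCES
    raise ValueError(f'Unexpected base {base!r}')


# Alternative 1: keep a single function, but constrain the mode with a
# Literal type so a static type checker can reject invalid values.
Base = Literal['bundles', 'sources']


def hash_inputs_typed(base: Base) -> tuple[str, ...]:
    return hash_inputs_modal(base)


# Alternative 2: drop the mode argument and give each behaviour its own name.
def bundle_hash_inputs() -> tuple[str, ...]:
    return BUNDLES


def source_hash_inputs() -> tuple[str, ...]:
    return SOURCES
```

The revised patch later in this thread takes a keyword-only `by_bundle: bool` instead, which removes the string literal while keeping a single entry point.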
Branch updated from bd5d131 to 81a2b78
test/azul_test_case.py (outdated)
@classmethod
def _patch_enable_bundle_notifications(cls):
    cls.addClassPatch(patch.object(target=type(config),
Can you use patch_config here?
test/azul_test_case.py (outdated)
    cls._patch_enable_bundle_notifications()

@classmethod
def _patch_enable_bundle_notifications(cls):
Inline this method, please. I don't see any other callsites or other reasons to have it.
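Both test comments concern patching a configuration property for an entire test class. Azul's own `addClassPatch` and `patch_config` helpers are not shown in this thread, so this sketch uses only the standard library (`unittest.mock.patch.object` with `PropertyMock`, started in `setUpClass` and undone via `addClassCleanup`), with the patch applied inline rather than through a separate helper method. `Config` and `config` are hypothetical stand-ins for Azul's configuration object.

```python
import unittest
from unittest.mock import PropertyMock, patch


class Config:
    # Hypothetical stand-in for Azul's config object.
    @property
    def enable_bundle_notifications(self) -> bool:
        return False


config = Config()


class BundleNotificationsTestCase(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # A property is defined on the class, so the patch must target
        # type(config), not the instance. PropertyMock returns the
        # configured value whenever the attribute is read.
        patcher = patch.object(type(config),
                               'enable_bundle_notifications',
                               new_callable=PropertyMock,
                               return_value=True)
        patcher.start()
        # Restore the original property after the last test in the class.
        cls.addClassCleanup(patcher.stop)

    def test_bundle_notifications_enabled(self):
        self.assertTrue(config.enable_bundle_notifications)
```

Once the class has finished running, the cleanup restores the unpatched property, so other test classes see the original value.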
Branch updated from db43db3 to 7a07d72
hannes-ucsc left a comment:
No fixups next time, please. Push commits individually.
Index: src/azul/service/manifest_service.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/service/manifest_service.py b/src/azul/service/manifest_service.py
--- a/src/azul/service/manifest_service.py (revision 7d93c67e2b241feab2bf219b0dd247dc9be8cafe)
+++ b/src/azul/service/manifest_service.py (date 1769619165737)
@@ -104,6 +104,7 @@
)
from azul.indexer.document import (
DocumentType,
+ EntityType,
FieldPath,
)
from azul.indexer.field import (
@@ -851,7 +852,7 @@
@property
@abstractmethod
- def entity_type(self) -> str:
+ def entity_type(self) -> EntityType:
"""
The type of the index entities this generator consumes. This controls
which aggregate Elasticsearch index is queried to fetch the aggregate
@@ -993,11 +994,7 @@
# The explicit filters are already normalized so we don't to do anything
# special to desensitize the hash to insignificat differences
filter_string = json.dumps(self.filters.explicit)
- # If incremental index changes are disabled, we don't need to worry
- # about individual bundles, only sources.
- content_hash = str(
- self.manifest_hash(by_bundle=config.enable_bundle_notifications)
- )
+ content_hash = self._content_hash(by_bundle=config.enable_bundle_notifications)
catalog = self.catalog
format = self.format()
manifest_hash_input = [
@@ -1047,9 +1044,7 @@
file_name = atlas + '-manifest-' + self.s3_object_key_base(manifest_key)
return file_name
- def _create_request(self, entity_type: str | None = None) -> Search:
- if entity_type is None:
- entity_type = self.entity_type
+ def _create_request(self, entity_type: EntityType) -> Search:
pipeline = self._create_pipeline()
request = self.service.create_request(self.catalog, entity_type)
request = pipeline.prepare_request(request)
@@ -1171,7 +1166,7 @@
return self.mirror_service.mirror_uri(source, file_cls, file)
@cache
- def manifest_hash(self, *, by_bundle: bool) -> int:
+ def _content_hash(self, *, by_bundle: bool) -> str:
"""
Return a hash of the input this generator builds the manifest from. The
input is the set of ES documents from the files index. For two generator
@@ -1204,14 +1199,14 @@
filters. This mode should *not* be used if the index is changing or is
likely to change due to the incremental incorporation of bundles.
"""
- log.debug('Computing content hash for manifest from %s using %r ...',
+ log.debug('Computing content hash from %s matching %r ...',
'bundles' if by_bundle else 'sources', self.filters)
start_time = time.time()
if by_bundle:
- request = self._create_request()
+ entity_type = self.entity_type
else:
- root_entity_type = self.metadata_plugin.root_entity_type
- request = self._create_request(entity_type=root_entity_type)
+ entity_type = self.metadata_plugin.root_entity_type
+ request = self._create_request(entity_type)
request.aggs.metric(
'hash',
'scripted_metric',
@@ -1244,8 +1239,8 @@
request = request.extra(size=0)
response = request.execute()
assert len(response.hits) == 0
- hash_value = response.aggregations.hash.value
- log.info('Manifest content hash %i was computed in %.3fs using filters %r.',
+ hash_value = str(response.aggregations.hash.value)
+ log.info('Content hash %r was computed in %.3fs using filters %r.',
hash_value, time.time() - start_time, self.filters)
return hash_value
@@ -1450,7 +1445,7 @@
return 'curlrc'
@property
- def entity_type(self) -> str:
+ def entity_type(self) -> EntityType:
return 'files'
@cached_property
@@ -1704,7 +1699,7 @@
return 'tsv'
@property
- def entity_type(self) -> str:
+ def entity_type(self) -> EntityType:
return 'files'
@cached_property
@@ -1819,7 +1814,7 @@
return None
def _all_docs_sorted(self) -> Iterable[JSON]:
- request = self._create_request()
+ request = self._create_request(self.entity_type)
request = request.params(preserve_order=True).sort('entity_id.keyword')
for hit in request.scan():
doc = self._hit_to_doc(hit)
@@ -1849,7 +1844,7 @@
metaclass=ABCMeta):
@property
- def entity_type(self) -> str:
+ def entity_type(self) -> EntityType:
# Orphans only have projects/datasets as hubs, so we need to retrieve
# aggregates of those types in order to join against orphan replicas
root_entity_type = self.metadata_plugin.root_entity_type
Index: src/azul/plugins/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/plugins/__init__.py b/src/azul/plugins/__init__.py
--- a/src/azul/plugins/__init__.py (revision 7d93c67e2b241feab2bf219b0dd247dc9be8cafe)
+++ b/src/azul/plugins/__init__.py (date 1769618757589)
@@ -496,7 +496,7 @@
raise NotImplementedError
@property
- def root_entity_type(self) -> str:
+ def root_entity_type(self) -> EntityType:
"""
The type of entity that sits at the root of the entity graph, and that
all other entities are directly or indirectly associated with.
@@ -509,8 +509,10 @@
"""
raise NotImplementedError
+ # REVIEW: Separate commit for the type hint changes
+
@property
- def hot_entity_types(self) -> Iterable[str]:
+ def hot_entity_types(self) -> Iterable[EntityType]:
"""
The types of inner entities that do not explicitly track their hubs in
replica documents in order to avoid a large list of hub references in
Branch updated from 7a07d72 to 448bff6
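In the suggested patch, `_create_request` no longer defaults `entity_type` to `self.entity_type`, so each caller has to pass an entity type explicitly. Here is a minimal sketch of that calling convention. It assumes `EntityType` is a plain alias for `str`, because its real definition in `azul.indexer.document` is not shown. It also uses a hypothetical `Generator` class whose `_create_request` just returns a dict.

```python
# Assumption: EntityType is treated as a plain alias for str; the real alias
# lives in azul.indexer.document and is not shown in this thread.
EntityType = str


class Generator:
    # Hypothetical minimal stand-in for a manifest generator.

    def __init__(self,
                 entity_type: EntityType,
                 root_entity_type: EntityType) -> None:
        self.entity_type = entity_type
        self.root_entity_type = root_entity_type

    def _create_request(self, entity_type: EntityType) -> dict[str, str]:
        # entity_type is required: there is no implicit fallback to
        # self.entity_type, so every caller names the index it queries.
        return {'entity_type': entity_type}

    def _content_hash_request(self, *, by_bundle: bool) -> dict[str, str]:
        if by_bundle:
            # The same index the generator itself reads from.
            entity_type = self.entity_type
        else:
            # The root of the entity graph, e.g. projects or datasets.
            entity_type = self.root_entity_type
        return self._create_request(entity_type)


generator = Generator(entity_type='files', root_entity_type='projects')
```

When hashing by bundle, the sketch passes the generator's own entity type, which is the value the removed default in `_create_request` used to supply.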
@cached_property
def manifest_content_hash(self) -> int:
    log.debug('Computing content hash for manifest using filters %r ...', self.filters)
@cache
Check warning: Code scanning / CodeQL
Use of the return value of a procedure (Warning): cache
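The switch from `@cached_property` to `@cache` presumably follows from the hash now depending on the `by_bundle` argument, since a property cannot take arguments. A minimal sketch of `functools.cache` on a method with a keyword-only argument, using a hypothetical `Generator` class that counts how often the method body actually runs:

```python
from functools import cache


class Generator:
    # Hypothetical stand-in; calls counts how often the body actually runs.

    def __init__(self) -> None:
        self.calls = 0

    @cache
    def _content_hash(self, *, by_bundle: bool) -> str:
        # functools.cache keys on every argument, including self and the
        # keyword-only by_bundle, so each (instance, mode) pair is computed
        # at most once.
        self.calls += 1
        mode = 'bundles' if by_bundle else 'sources'
        return 'hash-of-' + mode
```

One documented trade-off of caching methods this way is that the cache holds a strong reference to every `self` it has seen, so instances stay alive as long as the cache does.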
Branch updated from 18618c9 to 557c290
hannes-ucsc left a comment:
Your commit titles tend to be a little too specific. You typically don't need to identify the artifacts that you are modifying. The title should document the intent, specify the refactoring, or call out the issue being fixed. Which artifacts are being modified is usually immediately apparent from the diff. Please change the titles to this:
Add FIXME (#7183)
Use type alias for entity type
[A] Workaround: Manifest content hash computation times out (#6123)
Fix method type hint
Refactor unit test
src/azul/service/manifest_service.py (outdated)
- hash_value = response.aggregations.hash.value
- log.info('Manifest content hash %i was computed in %.3fs using filters %r.',
+ hash_value = str(response.aggregations.hash.value)
+ log.info('Content hash %r was computed in %.3fs using filters %r.',
'Computed content hash %r from %s matching %r'
Not sure about period at the end. Match existing conventions.
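The suggested log message reports the hash, its input (bundles or sources) and the filters in a single line. The thread does not show the body of the `scripted_metric` aggregation that Elasticsearch uses to compute the hash. This sketch therefore substitutes a plain-Python, order-independent hash: the XOR of the first 8 bytes of each document ID's SHA-256 digest. It serves only to illustrate returning the value as a string and logging it with the suggested format. `content_hash` and its parameters are hypothetical.

```python
import hashlib
import logging

log = logging.getLogger(__name__)


def content_hash(doc_ids: list[str], *, by_bundle: bool, filters: dict) -> str:
    # Hypothetical order-independent hash: hash each document ID separately
    # and XOR the first 8 bytes of each digest, so the visiting order does
    # not matter. Assumes IDs are distinct; a duplicated ID would cancel out.
    value = 0
    for doc_id in doc_ids:
        digest = hashlib.sha256(doc_id.encode()).digest()
        value ^= int.from_bytes(digest[:8], 'big')
    # Return a string, as the revised patch does.
    hash_value = str(value)
    # Lazy %-style arguments: the message is only formatted if INFO is enabled.
    log.info('Computed content hash %r from %s matching %r',
             hash_value, 'bundles' if by_bundle else 'sources', filters)
    return hash_value
```

Because XOR is commutative, two queries that return the same documents in a different order produce the same hash.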
hannes-ucsc left a comment:
Oh, and no FIXUPS please next time and push commits individually so that there is a status check for each one.
Branch updated from 557c290 to b5e74ab
Linked issues: #6123
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
Author (partiality)
p tag to titles of partial commits
partial or completely resolves all linked issues
partial label
Author (reindex)
r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
reindex:dev or the changes introduced by it will not require reindexing of dev
reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
reindex:prod or the changes introduced by it will not require reindexing of prod
reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod
Author (API changes)
API or this PR does not modify a REST API
a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
app.py or this PR does not modify a REST API
Author (upgrading deployments)
make docker_images.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
u tag to commit title or this PR does not require upgrading deployments
upgrade or does not require upgrading deployments
deploy:shared or does not modify docker_images.json, and does not require deploying the shared component for any other reason
deploy:gitlab or does not require deploying the gitlab component
deploy:runner or does not require deploying the runner image
Author (hotfixes)
F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
anvilprod and prod) have temporary hotfixes for any of the issues linked to this PR
Author (before every review)
develop, squashed fixups from prior reviews
make requirements_update or this PR does not modify requirements*.txt, common.mk, Makefile, Dockerfile or environment.boot
R tag to commit title or this PR does not modify requirements*.txt
reqs or does not modify requirements*.txt
make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
Peer reviewer (after approval)
Note that when requesting changes, the PR must be assigned back to the author.
System administrator (after approval)
demo or no demo
no demo
no sandbox
N reviews label is accurate
Operator
reindex:… labels and r commit title tag
no demo
develop
Operator (deploy .shared and .gitlab components)
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator (post-deploy of .gitlab component)
dev.gitlab are complete or this PR is not labeled deploy:gitlab
anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
Operator (deploy runner image)
_select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
Operator (sandbox build)
sandbox label or PR is labeled no sandbox
dev or PR is labeled no sandbox
anvildev or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox or this PR does not remove catalogs or otherwise causes unreferenced indices in dev
anvilbox or this PR does not remove catalogs or otherwise causes unreferenced indices in anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
Operator (merge the branch)
p if the PR is also labeled partial
Operator (main build)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
_select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
dev
anvildev
Operator (reindex)
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
deploy_browser job in the GitLab pipeline for this PR in dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
deploy_browser job in the GitLab pipeline for this PR in anvildev or this PR does not require reindexing anvildev
Operator (mirroring)
dev or this PR does not require mirroring dev
anvildev or this PR does not require mirroring anvildev
dev or this PR does not require mirroring dev
anvildev or this PR does not require mirroring anvildev
dev or this PR does not require mirroring dev
anvildev or this PR does not require mirroring anvildev
Operator
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs or this PR carries none of these labels
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
Shorthand for review comments
L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem