AutoMLForecastingTrainingJob fails with error 404 Model does not exist #6581

@Crowiant

Description

Environment details

  • OS type and version: Not OS-specific; first observed in Cloud Composer.
  • Python version: 3.11.8
  • pip version: 23.2.1
  • google-cloud-aiplatform version: 1.142.0

Steps to reproduce

  1. Call AutoMLForecastingTrainingJob.run several times, sequentially rather than concurrently, so as not to exceed quota.
  2. Some runs fail with the error: google.api_core.exceptions.NotFound: 404 The Model does not exist. In my case, 1 to 2 out of every 10 jobs failed this way.
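The reproduction procedure boils down to a sequential loop that tallies NotFound failures. The harness below is a hypothetical sketch, not code from the report: `run_once` stands in for a call such as `AutoMLForecastingTrainingJob(...).run(...)`, and `NotFoundError` stands in for `google.api_core.exceptions.NotFound`, so the sketch runs without Google Cloud credentials.

```python
from typing import Callable


class NotFoundError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.NotFound."""


def count_not_found(run_once: Callable[[int], object], attempts: int) -> int:
    """Call run_once(i) for i in range(attempts), one after another.

    Runs are strictly sequential (no threads or async), so only one job is in
    flight at a time, matching the "not simultaneously" constraint above.
    Returns how many runs raised NotFoundError; other exceptions propagate.
    """
    failures = 0
    for i in range(attempts):
        try:
            run_once(i)
        except NotFoundError as exc:
            failures += 1
            print(f"run {i} failed: {exc}")
    return failures


if __name__ == "__main__":
    # Fake job that fails on runs 3 and 7, mimicking the observed 1-2 in 10.
    def fake_run(i: int) -> str:
        if i in (3, 7):
            raise NotFoundError("404 The Model does not exist.")
        return f"model-{i}"

    print(count_not_found(fake_run, 10))  # prints the failure count: 2
```

Keeping the loop single-threaded is deliberate: it isolates the intermittent 404 from any quota or concurrency effects.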

Code example

I used the Airflow system test at: https://github.com/apache/airflow/blob/main/providers/google/tests/system/google/cloud/vertex_ai/example_vertex_ai_batch_prediction_job.py

Stack trace

  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/hooks/vertex_ai/auto_ml.py", line 706, in create_auto_ml_forecasting_training_job
    model = self._job.run(
            ^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/training_jobs.py", line 2212, in run
    return self._run(
           ^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/base.py", line 862, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/training_jobs.py", line 2666, in _run
    new_model = self._run_job(
                ^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/training_jobs.py", line 855, in _run_job
    model = self._get_model(block=block)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/training_jobs.py", line 953, in _get_model
    return models.Model(
           ^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/models.py", line 5238, in __init__
    self._gca_resource = self._get_gca_resource(resource_name=versioned_model_name)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform/base.py", line 691, in _get_gca_resource
    return getattr(self.api_client, self._getter_method)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/cloud/aiplatform_v1/services/model_service/client.py", line 1145, in get_model
    response = rpc(
               ^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 294, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 156, in retry_target
    next_sleep = _retry_error_helper(
                 ^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 214, in _retry_error_helper
    raise final_exc from source_exc
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 147, in retry_target
    result = target()
             ^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 77, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.NotFound: 404 The Model does not exist.

I checked the model in Google Cloud: it exists and shows no error or failed status. Perhaps the Model Registry needs more time before a newly trained model becomes retrievable.
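If the cause really is a short delay between the training pipeline finishing and the model becoming readable (an unconfirmed hypothesis), one possible client-side workaround is to retry only the model lookup on NotFound with exponential backoff. `job.run()` itself should not be retried, since it launches a training pipeline. The stdlib sketch below assumes `fetch` would wrap something like `aiplatform.Model(model_name=...)` and uses `NotFoundError` as a stand-in for `google.api_core.exceptions.NotFound`; the attempt count and delays are illustrative, not tuned values. In real code, `google.api_core.retry.Retry` with `predicate=google.api_core.retry.if_exception_type(NotFound)` expresses the same policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.NotFound."""


def retry_on_not_found(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 2.0,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff while it raises NotFoundError.

    Sleeps initial_delay, then initial_delay * multiplier, and so on between
    attempts. Re-raises the last NotFoundError after max_attempts failures.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except NotFoundError:
            if attempt == max_attempts:
                raise  # give up: the model still is not visible
            sleep(delay)
            delay *= multiplier


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_get_model() -> str:
        # Hypothetical lookup that 404s twice before the model "appears".
        calls["n"] += 1
        if calls["n"] < 3:
            raise NotFoundError("404 The Model does not exist.")
        return "model-ready"

    print(retry_on_not_found(flaky_get_model, sleep=lambda _: None))  # prints: model-ready
```

Injecting `sleep` keeps the backoff testable without real waiting; in production the default `time.sleep` applies.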

Metadata

Assignees

No one assigned

Labels

api: vertex-ai (Issues related to the googleapis/python-aiplatform API.)
