gcp_aiplatform_endpoint

ancestors

Type: UNORDERED_LIST_STRING

client_connection_config

Type: STRUCT
Provider name: clientConnectionConfig
Description: Configurations that are applied to the endpoint for online prediction.

  • inference_timeout
    Type: STRING
    Provider name: inferenceTimeout
    Description: Customizable online prediction request timeout.

create_time

Type: TIMESTAMP
Provider name: createTime
Description: Output only. Timestamp when this Endpoint was created.

dedicated_endpoint_dns

Type: STRING
Provider name: dedicatedEndpointDns
Description: Output only. DNS of the dedicated endpoint. Will only be populated if dedicated_endpoint_enabled is true. Depending on the features enabled, uid might be a random number or a string. For example, if fast_tryout is enabled, uid will be fasttryout. Format: https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog.
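
Example: a minimal Python sketch of how the documented format string can be filled in. Only the URL template comes from the description above; the function name and the sample values are illustrative assumptions, not part of any API.

```python
def build_dedicated_endpoint_dns(endpoint_id: str, region: str, uid: str) -> str:
    """Fill in https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog."""
    for label, value in (("endpoint_id", endpoint_id), ("region", region), ("uid", uid)):
        # Assumption: each part is a single DNS label fragment, so it must be
        # non-empty and must not contain "." or "/".
        if not value or "." in value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog"


# Hypothetical values; per the description, uid is "fasttryout" when fast_tryout is enabled.
example_dns = build_dedicated_endpoint_dns("1234567890", "us-central1", "fasttryout")
```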

dedicated_endpoint_enabled

Type: BOOLEAN
Provider name: dedicatedEndpointEnabled
Description: If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Requests to the dedicated DNS are isolated from other users’ traffic and have better performance and reliability. Note: Once you enable the dedicated endpoint, you can no longer send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon.

deployed_models

Type: UNORDERED_LIST_STRUCT
Provider name: deployedModels
Description: Output only. The models deployed in this Endpoint. To add or remove DeployedModels use EndpointService.DeployModel and EndpointService.UndeployModel respectively.

  • automatic_resources
    Type: STRUCT
    Provider name: automaticResources
    Description: A description of resources that are, to a large degree, decided by Vertex AI and require only modest additional configuration.
    • max_replica_count
      Type: INT32
      Provider name: maxReplicaCount
      Description: Immutable. The maximum number of replicas that the DeployedModel may be deployed on when traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic is assumed, though Vertex AI may be unable to scale beyond a certain replica number.
    • min_replica_count
      Type: INT32
      Provider name: minReplicaCount
      Description: Immutable. The minimum number of replicas that the DeployedModel is always deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas up to max_replica_count, and as traffic decreases, some of these extra replicas may be freed. If the requested value is too large, the deployment will error.
  • create_time
    Type: TIMESTAMP
    Provider name: createTime
    Description: Output only. Timestamp when the DeployedModel was created.
  • dedicated_resources
    Type: STRUCT
    Provider name: dedicatedResources
    Description: A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration.
    • autoscaling_metric_specs
      Type: UNORDERED_LIST_STRUCT
      Provider name: autoscalingMetricSpecs
      Description: Immutable. The metric specifications that override a resource utilization metric (CPU utilization, accelerator’s duty cycle, and so on) target value (defaults to 60 if not set). At most one entry is allowed per metric. If machine_spec.accelerator_count is above 0, the autoscaling is based on both the CPU utilization and accelerator’s duty cycle metrics: it scales up when either metric exceeds its target value and scales down when both metrics are under their target values. The default target value is 60 for both metrics. If machine_spec.accelerator_count is 0, the autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set. For example, in the case of Online Prediction, to override the target CPU utilization to 80, set autoscaling_metric_specs.metric_name to aiplatform.googleapis.com/prediction/online/cpu/utilization and autoscaling_metric_specs.target to 80.
      • metric_name
        Type: STRING
        Provider name: metricName
        Description: Required. The resource metric name. Supported metrics for Online Prediction: aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle and aiplatform.googleapis.com/prediction/online/cpu/utilization.
      • target
        Type: INT32
        Provider name: target
        Description: The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided.
    • machine_spec
      Type: STRUCT
      Provider name: machineSpec
      Description: Required. Immutable. The specification of a single machine being used.
      • accelerator_count
        Type: INT32
        Provider name: acceleratorCount
        Description: The number of accelerators to attach to the machine.
      • accelerator_type
        Type: STRING
        Provider name: acceleratorType
        Description: Immutable. The type of accelerator(s) that may be attached to the machine as per accelerator_count.
        Possible values:
        • ACCELERATOR_TYPE_UNSPECIFIED - Unspecified accelerator type, which means no accelerator.
        • NVIDIA_TESLA_K80 - Deprecated: Nvidia Tesla K80 GPU has reached end of support, see https://cloud.google.com/compute/docs/eol/k80-eol.
        • NVIDIA_TESLA_P100 - Nvidia Tesla P100 GPU.
        • NVIDIA_TESLA_V100 - Nvidia Tesla V100 GPU.
        • NVIDIA_TESLA_P4 - Nvidia Tesla P4 GPU.
        • NVIDIA_TESLA_T4 - Nvidia Tesla T4 GPU.
        • NVIDIA_TESLA_A100 - Nvidia Tesla A100 GPU.
        • NVIDIA_A100_80GB - Nvidia A100 80GB GPU.
        • NVIDIA_L4 - Nvidia L4 GPU.
        • NVIDIA_H100_80GB - Nvidia H100 80GB GPU.
        • NVIDIA_H100_MEGA_80GB - Nvidia H100 Mega 80GB GPU.
        • NVIDIA_H200_141GB - Nvidia H200 141GB GPU.
        • TPU_V2 - TPU v2.
        • TPU_V3 - TPU v3.
        • TPU_V4_POD - TPU v4.
        • TPU_V5_LITEPOD - TPU v5.
      • machine_type
        Type: STRING
        Provider name: machineType
        Description: Immutable. The type of the machine. See the list of machine types supported for prediction and the list of machine types supported for custom training. For DeployedModel this field is optional, and the default value is n1-standard-2. For BatchPredictionJob or as part of WorkerPoolSpec this field is required.
      • reservation_affinity
        Type: STRUCT
        Provider name: reservationAffinity
        Description: Optional. Immutable. Configuration controlling how this resource pool consumes reservation.
        • key
          Type: STRING
          Provider name: key
          Description: Optional. Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use compute.googleapis.com/reservation-name as the key and specify the name of your reservation as its value.
        • reservation_affinity_type
          Type: STRING
          Provider name: reservationAffinityType
          Description: Required. Specifies the reservation affinity type.
          Possible values:
          • TYPE_UNSPECIFIED - Default value. This should not be used.
          • NO_RESERVATION - Do not consume from any reserved capacity, only use on-demand.
          • ANY_RESERVATION - Consume any reservation available, falling back to on-demand.
          • SPECIFIC_RESERVATION - Consume from a specific reservation. When chosen, the reservation must be identified via the key and values fields.
        • values
          Type: UNORDERED_LIST_STRING
          Provider name: values
          Description: Optional. Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block.
      • tpu_topology
        Type: STRING
        Provider name: tpuTopology
        Description: Immutable. The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpu_topology: “2x2x1”).
    • max_replica_count
      Type: INT32
      Provider name: maxReplicaCount
      Description: Immutable. The maximum number of replicas that the DeployedModel may be deployed on when traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, min_replica_count is used as the default value. The value of this field impacts the charge against Vertex CPU and GPU quotas. Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type).
    • min_replica_count
      Type: INT32
      Provider name: minReplicaCount
      Description: Required. Immutable. The minimum number of machine replicas that the DeployedModel is always deployed on. This value must be greater than or equal to 1. If traffic increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
    • required_replica_count
      Type: INT32
      Provider name: requiredReplicaCount
      Description: Optional. Number of required available replicas for the deployment to succeed. This field is only needed when partial deployment/mutation is desired. If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. If not set, the default required_replica_count will be min_replica_count.
    • spot
      Type: BOOLEAN
      Provider name: spot
      Description: Optional. If true, schedule the deployment workload on spot VMs.
  • disable_container_logging
    Type: BOOLEAN
    Provider name: disableContainerLogging
    Description: For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances will send stderr and stdout streams to Cloud Logging by default. Note that these logs incur costs, which are subject to Cloud Logging pricing. You can disable container logging by setting this flag to true.
  • disable_explanations
    Type: BOOLEAN
    Provider name: disableExplanations
    Description: If true, deploy the model without the explanation feature, regardless of the existence of Model.explanation_spec or explanation_spec.
  • enable_access_logging
    Type: BOOLEAN
    Provider name: enableAccessLogging
    Description: If true, online prediction access logs are sent to Cloud Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request. Note that logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.
  • explanation_spec
    Type: STRUCT
    Provider name: explanationSpec
    Description: Explanation configuration for this DeployedModel. When deploying a Model using EndpointService.DeployModel, this value overrides the value of Model.explanation_spec. All fields of explanation_spec are optional in the request. If a field of explanation_spec is not populated, the value of the same field of Model.explanation_spec is inherited. If the corresponding Model.explanation_spec is not populated, all fields of the explanation_spec will be used for the explanation configuration.
    • metadata
      Type: STRUCT
      Provider name: metadata
      Description: Optional. Metadata describing the Model’s input and output for explanation.
      • feature_attributions_schema_uri
        Type: STRING
        Provider name: featureAttributionsSchemaUri
        Description: Points to a YAML file stored on Google Cloud Storage describing the format of the feature attributions. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML tabular Models always have this field populated by Vertex AI. Note: The URI given on output may be different, including the URI scheme, than the one given on input. The output URI will point to a location where the user has only read access.
      • latent_space_source
        Type: STRING
        Provider name: latentSpaceSource
        Description: Name of the source to generate embeddings for example based explanations.
    • parameters
      Type: STRUCT
      Provider name: parameters
      Description: Required. Parameters that configure explaining of the Model’s predictions.
      • examples
        Type: STRUCT
        Provider name: examples
        Description: Example-based explanations that return the nearest neighbors from the provided dataset.
        • example_gcs_source
          Type: STRUCT
          Provider name: exampleGcsSource
          Description: The Cloud Storage input instances.
          • data_format
            Type: STRING
            Provider name: dataFormat
            Description: The format in which instances are given. If not specified, JSONL format is assumed. Currently only JSONL format is supported.
            Possible values:
            • DATA_FORMAT_UNSPECIFIED - Format unspecified, used when unset.
            • JSONL - Examples are stored in JSONL files.
          • gcs_source
            Type: STRUCT
            Provider name: gcsSource
            Description: The Cloud Storage location for the input instances.
        • neighbor_count
          Type: INT32
          Provider name: neighborCount
          Description: The number of neighbors to return when querying for examples.
        • presets
          Type: STRUCT
          Provider name: presets
          Description: Simplified preset configuration, which automatically sets configuration values based on the desired query speed-precision trade-off and modality.
          • modality
            Type: STRING
            Provider name: modality
            Description: The modality of the uploaded model, which automatically configures the distance measurement and feature normalization for the underlying example index and queries. If your model does not precisely fit one of these types, it is okay to choose the closest type.
            Possible values:
            • MODALITY_UNSPECIFIED - Should not be set. Added as a recommended best practice for enums
            • IMAGE - IMAGE modality
            • TEXT - TEXT modality
            • TABULAR - TABULAR modality
          • query
            Type: STRING
            Provider name: query
            Description: Preset option controlling parameters for speed-precision trade-off when querying for examples. If omitted, defaults to PRECISE.
            Possible values:
            • PRECISE - More precise neighbors as a trade-off against slower response.
            • FAST - Faster response as a trade-off against less precise neighbors.
      • integrated_gradients_attribution
        Type: STRUCT
        Provider name: integratedGradientsAttribution
        Description: An attribution method that computes Aumann-Shapley values taking advantage of the model’s fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1703.01365
        • blur_baseline_config
          Type: STRUCT
          Provider name: blurBaselineConfig
          Description: Config for IG with blur baseline. When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383
          • max_blur_sigma
            Type: FLOAT
            Provider name: maxBlurSigma
            Description: The standard deviation of the blur kernel for the blurred baseline. The same blurring parameter is used for both the height and the width dimension. If not set, the method defaults to the zero (i.e. black for images) baseline.
        • smooth_grad_config
          Type: STRUCT
          Provider name: smoothGradConfig
          Description: Config for SmoothGrad approximation of gradients. When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf
          • feature_noise_sigma
            Type: STRUCT
            Provider name: featureNoiseSigma
            Description: This is similar to noise_sigma, but provides additional flexibility. A separate noise sigma can be provided for each feature, which is useful if their distributions are different. No noise is added to features that are not set. If this field is unset, noise_sigma will be used for all features.
            • noise_sigma
              Type: UNORDERED_LIST_STRUCT
              Provider name: noiseSigma
              Description: Noise sigma per feature. No noise is added to features that are not set.
              • name
                Type: STRING
                Provider name: name
                Description: The name of the input feature for which noise sigma is provided. The features are defined in explanation metadata inputs.
              • sigma
                Type: FLOAT
                Provider name: sigma
                Description: This represents the standard deviation of the Gaussian kernel that will be used to add noise to the feature prior to computing gradients. Similar to noise_sigma but represents the noise added to the current feature. Defaults to 0.1.
          • noise_sigma
            Type: FLOAT
            Provider name: noiseSigma
            Description: This is a single float value and will be used to add noise to all the features. Use this field when all features are normalized to have the same distribution: scale to range [0, 1], [-1, 1] or z-scoring, where features are normalized to have 0-mean and 1-variance. Learn more about normalization. For best results the recommended value is about 10% - 20% of the standard deviation of the input feature. Refer to section 3.2 of the SmoothGrad paper: https://arxiv.org/pdf/1706.03825.pdf. Defaults to 0.1. If the distribution is different per feature, set feature_noise_sigma instead for each feature.
          • noisy_sample_count
            Type: INT32
            Provider name: noisySampleCount
            Description: The number of gradient samples to use for approximation. The higher this number, the more accurate the gradient is, but the runtime complexity increases by this factor as well. Valid range of its value is [1, 50]. Defaults to 3.
        • step_count
          Type: INT32
          Provider name: stepCount
          Description: Required. The number of steps for approximating the path integral. A good starting value is 50; gradually increase it until the sum to diff property is met within the desired error range. Valid range of its value is [1, 100], inclusive.
      • sampled_shapley_attribution
        Type: STRUCT
        Provider name: sampledShapleyAttribution
        Description: An attribution method that approximates Shapley values for features that contribute to the label being predicted. A sampling strategy is used to approximate the value rather than considering all subsets of features. Refer to this paper for more details: https://arxiv.org/abs/1306.4265.
        • path_count
          Type: INT32
          Provider name: pathCount
          Description: Required. The number of feature permutations to consider when approximating the Shapley values. Valid range of its value is [1, 50], inclusively.
      • top_k
        Type: INT32
        Provider name: topK
        Description: If populated, returns attributions for top K indices of outputs (defaults to 1). Only applies to Models that predict more than one output (e.g., multi-class Models). When set to -1, returns explanations for all outputs.
      • xrai_attribution
        Type: STRUCT
        Provider name: xraiAttribution
        Description: An attribution method that redistributes Integrated Gradients attribution to segmented regions, taking advantage of the model’s fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1906.02825. XRAI currently performs better on natural images, like a picture of a house or an animal. If the images are taken in artificial environments, like a lab or manufacturing line, or from diagnostic equipment, like x-rays or quality-control cameras, use Integrated Gradients instead.
        • blur_baseline_config
          Type: STRUCT
          Provider name: blurBaselineConfig
          Description: Config for XRAI with blur baseline. When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383
          • max_blur_sigma
            Type: FLOAT
            Provider name: maxBlurSigma
            Description: The standard deviation of the blur kernel for the blurred baseline. The same blurring parameter is used for both the height and the width dimension. If not set, the method defaults to the zero (i.e. black for images) baseline.
        • smooth_grad_config
          Type: STRUCT
          Provider name: smoothGradConfig
          Description: Config for SmoothGrad approximation of gradients. When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf
          • feature_noise_sigma
            Type: STRUCT
            Provider name: featureNoiseSigma
            Description: This is similar to noise_sigma, but provides additional flexibility. A separate noise sigma can be provided for each feature, which is useful if their distributions are different. No noise is added to features that are not set. If this field is unset, noise_sigma will be used for all features.
            • noise_sigma
              Type: UNORDERED_LIST_STRUCT
              Provider name: noiseSigma
              Description: Noise sigma per feature. No noise is added to features that are not set.
              • name
                Type: STRING
                Provider name: name
                Description: The name of the input feature for which noise sigma is provided. The features are defined in explanation metadata inputs.
              • sigma
                Type: FLOAT
                Provider name: sigma
                Description: This represents the standard deviation of the Gaussian kernel that will be used to add noise to the feature prior to computing gradients. Similar to noise_sigma but represents the noise added to the current feature. Defaults to 0.1.
          • noise_sigma
            Type: FLOAT
            Provider name: noiseSigma
            Description: This is a single float value and will be used to add noise to all the features. Use this field when all features are normalized to have the same distribution: scale to range [0, 1], [-1, 1] or z-scoring, where features are normalized to have 0-mean and 1-variance. Learn more about normalization. For best results the recommended value is about 10% - 20% of the standard deviation of the input feature. Refer to section 3.2 of the SmoothGrad paper: https://arxiv.org/pdf/1706.03825.pdf. Defaults to 0.1. If the distribution is different per feature, set feature_noise_sigma instead for each feature.
          • noisy_sample_count
            Type: INT32
            Provider name: noisySampleCount
            Description: The number of gradient samples to use for approximation. The higher this number, the more accurate the gradient is, but the runtime complexity increases by this factor as well. Valid range of its value is [1, 50]. Defaults to 3.
        • step_count
          Type: INT32
          Provider name: stepCount
          Description: Required. The number of steps for approximating the path integral. A good starting value is 50; gradually increase it until the sum to diff property is met within the desired error range. Valid range of its value is [1, 100], inclusive.
  • faster_deployment_config
    Type: STRUCT
    Provider name: fasterDeploymentConfig
    Description: Configuration for faster model deployment.
    • fast_tryout_enabled
      Type: BOOLEAN
      Provider name: fastTryoutEnabled
      Description: If true, enable fast tryout feature for this deployed model.
  • gcp_display_name
    Type: STRING
    Provider name: displayName
    Description: The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used.
  • gcp_status
    Type: STRUCT
    Provider name: status
    Description: Output only. Runtime status of the deployed model.
    • available_replica_count
      Type: INT32
      Provider name: availableReplicaCount
      Description: Output only. The number of available replicas of the deployed model.
    • last_update_time
      Type: TIMESTAMP
      Provider name: lastUpdateTime
      Description: Output only. The time at which the status was last updated.
    • message
      Type: STRING
      Provider name: message
      Description: Output only. The latest deployed model’s status message (if any).
  • id
    Type: STRING
    Provider name: id
    Description: Immutable. The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID. This value should be 1-10 characters, and valid characters are /[0-9]/.
  • model
    Type: STRING
    Provider name: model
    Description: Required. The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel’s Endpoint. The resource name may contain a version ID or version alias to specify the version. Example: projects/{project}/locations/{location}/models/{model}@2 or projects/{project}/locations/{location}/models/{model}@golden. If no version is specified, the default version will be deployed.
  • model_version_id
    Type: STRING
    Provider name: modelVersionId
    Description: Output only. The version ID of the model that is deployed.
  • private_endpoints
    Type: STRUCT
    Provider name: privateEndpoints
    Description: Output only. Provide paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if network is configured.
    • explain_http_uri
      Type: STRING
      Provider name: explainHttpUri
      Description: Output only. Http(s) path to send explain requests.
    • health_http_uri
      Type: STRING
      Provider name: healthHttpUri
      Description: Output only. Http(s) path to send health check requests.
    • predict_http_uri
      Type: STRING
      Provider name: predictHttpUri
      Description: Output only. Http(s) path to send prediction requests.
    • service_attachment
      Type: STRING
      Provider name: serviceAttachment
      Description: Output only. The name of the service attachment resource. Populated if private service connect is enabled.
  • service_account
    Type: STRING
    Provider name: serviceAccount
    Description: The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.
  • shared_resources
    Type: STRING
    Provider name: sharedResources
    Description: The resource name of the shared DeploymentResourcePool to deploy on. Format: projects/{project}/locations/{location}/deploymentResourcePools/{deployment_resource_pool}
  • speculative_decoding_spec
    Type: STRUCT
    Provider name: speculativeDecodingSpec
    Description: Optional. Spec for configuring speculative decoding.
    • draft_model_speculation
      Type: STRUCT
      Provider name: draftModelSpeculation
      Description: Draft model speculation.
      • draft_model
        Type: STRING
        Provider name: draftModel
        Description: Required. The resource name of the draft model.
    • ngram_speculation
      Type: STRUCT
      Provider name: ngramSpeculation
      Description: N-Gram speculation.
      • ngram_size
        Type: INT32
        Provider name: ngramSize
        Description: The number of last N input tokens used as ngram to search/match against the previous prompt sequence. This is equal to the N in N-Gram. The default value is 3 if not specified.
    • speculative_token_count
      Type: INT32
      Provider name: speculativeTokenCount
      Description: The number of speculative tokens to generate at each step.
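
Example: the autoscaling rules described for dedicated_resources.autoscaling_metric_specs above can be sketched in Python as follows. This is a simplified illustration, not the service's implementation: the function names and the plain-dict spec layout are assumptions, and the real autoscaler reacts only once usage deviates from the target by a certain percentage, which is not modeled here.

```python
CPU_METRIC = "aiplatform.googleapis.com/prediction/online/cpu/utilization"
GPU_METRIC = "aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle"
DEFAULT_TARGET = 60  # documented default target for both metrics


def effective_targets(specs, accelerator_count):
    """Resolve the target per metric from a list of {metric_name, target} dicts."""
    # CPU utilization is always considered; duty cycle only when accelerators exist.
    targets = {CPU_METRIC: DEFAULT_TARGET}
    if accelerator_count > 0:
        targets[GPU_METRIC] = DEFAULT_TARGET

    seen = set()
    for spec in specs:
        name = spec["metric_name"]
        if name not in (CPU_METRIC, GPU_METRIC):
            raise ValueError(f"unsupported metric: {name}")
        if name in seen:
            raise ValueError(f"at most one entry is allowed per metric: {name}")
        seen.add(name)
        target = spec.get("target", DEFAULT_TARGET)
        if not 1 <= target <= 100:
            raise ValueError(f"target must be within 1-100, got {target}")
        # Assumption: an override for a metric that is not in play is ignored.
        if name in targets:
            targets[name] = target
    return targets


def scaling_direction(usage, targets):
    """Return 'up' if any metric exceeds its target, 'down' if all are under it."""
    if any(usage[metric] > target for metric, target in targets.items()):
        return "up"
    if all(usage[metric] < target for metric, target in targets.items()):
        return "down"
    return "hold"


# Override the CPU target to 80 on a machine with one accelerator.
targets = effective_targets([{"metric_name": CPU_METRIC, "target": 80}], accelerator_count=1)
```

With these targets, a replica at 70% CPU and 65% duty cycle would scale up (duty cycle exceeds 60), while 70% CPU and 50% duty cycle would scale down (both under target).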

description

Type: STRING
Provider name: description
Description: The description of the Endpoint.

enable_private_service_connect

Type: BOOLEAN
Provider name: enablePrivateServiceConnect
Description: Deprecated: If true, expose the Endpoint via private service connect. Only one of the fields, network or enable_private_service_connect, can be set.

encryption_spec

Type: STRUCT
Provider name: encryptionSpec
Description: Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.

  • kms_key_name
    Type: STRING
    Provider name: kmsKeyName
    Description: Required. The Cloud KMS resource identifier of the customer managed encryption key used to protect a resource. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created.
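
Example: a small Python sketch that parses the documented kms_key_name form and checks the same-region requirement. The helper names and sample values are hypothetical; only the resource path layout comes from the description above.

```python
import re

# Layout taken from the documented form:
# projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key
_KMS_KEY_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<crypto_key>[^/]+)"
)


def parse_kms_key_name(kms_key_name: str) -> dict:
    """Split a KMS key resource name into its components."""
    match = _KMS_KEY_RE.fullmatch(kms_key_name)
    if match is None:
        raise ValueError(f"not a valid KMS key name: {kms_key_name!r}")
    return match.groupdict()


def key_is_in_region(kms_key_name: str, resource_region: str) -> bool:
    """Check that the key's location matches the region of the compute resource."""
    return parse_kms_key_name(kms_key_name)["location"] == resource_region


parts = parse_kms_key_name(
    "projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key"
)
```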

etag

Type: STRING
Provider name: etag
Description: Used to perform consistent read-modify-write updates. If not set, a blind “overwrite” update happens.

gcp_display_name

Type: STRING
Provider name: displayName
Description: Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

gen_ai_advanced_features_config

Type: STRUCT
Provider name: genAiAdvancedFeaturesConfig
Description: Optional. Configuration for GenAiAdvancedFeatures. If the endpoint is serving GenAI models, advanced features like native RAG integration can be configured. Currently, only Model Garden models are supported.

  • rag_config
    Type: STRUCT
    Provider name: ragConfig
    Description: Configuration for Retrieval Augmented Generation feature.
    • enable_rag
      Type: BOOLEAN
      Provider name: enableRag
      Description: If true, enable Retrieval Augmented Generation in ChatCompletion requests. Once enabled, the endpoint will be identified as a GenAI endpoint and the Arthedain router will be used.

labels

Type: UNORDERED_LIST_STRING

model_deployment_monitoring_job

Type: STRING
Provider name: modelDeploymentMonitoringJob
Description: Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by JobService.CreateModelDeploymentMonitoringJob. Format: projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{model_deployment_monitoring_job}

name

Type: STRING
Provider name: name
Description: Output only. The resource name of the Endpoint.

network

Type: STRING
Provider name: network
Description: Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Only one of the fields, network or enable_private_service_connect, can be set. Format: projects/{project}/global/networks/{network}. Where {project} is a project number, as in 12345, and {network} is network name.
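
Example: a Python sketch that checks the documented network format and the rule that only one of network or enable_private_service_connect can be set. The function name and sample values are illustrative assumptions.

```python
import re

# projects/{project}/global/networks/{network}, where {project} is a project number.
_NETWORK_RE = re.compile(r"projects/(?P<project>\d+)/global/networks/(?P<network>[^/]+)")


def check_network_settings(network=None, enable_private_service_connect=False):
    """Validate the two mutually exclusive connectivity fields.

    Returns the parsed network parts, or None when no network is set.
    """
    if network and enable_private_service_connect:
        raise ValueError("only one of network or enable_private_service_connect can be set")
    if not network:
        return None  # the Endpoint is not peered with any network
    match = _NETWORK_RE.fullmatch(network)
    if match is None:
        raise ValueError(f"unexpected network format: {network!r}")
    return match.groupdict()


peering = check_network_settings("projects/12345/global/networks/my-vpc")
```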

organization_id

Type: STRING

parent

Type: STRING

predict_request_response_logging_config

Type: STRUCT
Provider name: predictRequestResponseLoggingConfig
Description: Configures the request-response logging for online prediction.

  • bigquery_destination
    Type: STRUCT
    Provider name: bigqueryDestination
    Description: BigQuery table for logging. If only given a project, a new dataset will be created with name logging_<endpoint-display-name>_<endpoint-id>, where <endpoint-display-name> will be made BigQuery-dataset-name compatible (e.g. most special characters will become underscores). If no table name is given, a new table will be created with name request_response_logging.
    • output_uri
      Type: STRING
      Provider name: outputUri
      Description: Required. BigQuery URI to a project or table, up to 2000 characters long. When only the project is specified, the Dataset and Table are created. When the full table reference is specified, the Dataset must exist and the Table must not exist. Accepted form: a BigQuery path, for example bq://projectId, bq://projectId.bqDatasetId, or bq://projectId.bqDatasetId.bqTableId.
  • enabled
    Type: BOOLEAN
    Provider name: enabled
    Description: Whether logging is enabled.
  • sampling_rate
    Type: DOUBLE
    Provider name: samplingRate
    Description: Percentage of requests to be logged, expressed as a fraction in the range (0, 1].
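
Example: a Python sketch that validates the two constrained fields in this struct, output_uri and sampling_rate. The function names are hypothetical, and the parser assumes that project, dataset, and table IDs contain no dots.

```python
def parse_bigquery_output_uri(uri: str) -> dict:
    """Split bq://projectId[.bqDatasetId[.bqTableId]] into its parts."""
    prefix = "bq://"
    if len(uri) > 2000:
        raise ValueError("output_uri may be at most 2000 characters long")
    if not uri.startswith(prefix):
        raise ValueError("output_uri must start with bq://")
    parts = uri[len(prefix):].split(".")
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError(f"unexpected BigQuery path: {uri!r}")
    # zip stops at the shorter sequence, so missing levels are simply absent.
    return dict(zip(("project", "dataset", "table"), parts))


def validate_sampling_rate(rate: float) -> float:
    """Accept fractions in the half-open range (0, 1]."""
    if not 0 < rate <= 1:
        raise ValueError(f"sampling_rate must be in (0, 1], got {rate}")
    return rate


destination = parse_bigquery_output_uri("bq://projectId.bqDatasetId")
```

Here destination holds only the project and dataset keys, which corresponds to the case where the table is created with the default name.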

private_service_connect_config

Type: STRUCT
Provider name: privateServiceConnectConfig
Description: Optional. Configuration for private service connect. network and private_service_connect_config are mutually exclusive.

  • enable_private_service_connect
    Type: BOOLEAN
    Provider name: enablePrivateServiceConnect
    Description: Required. If true, expose the IndexEndpoint via private service connect.
  • project_allowlist
    Type: UNORDERED_LIST_STRING
    Provider name: projectAllowlist
    Description: A list of Projects from which the forwarding rule will target the service attachment.
  • service_attachment
    Type: STRING
    Provider name: serviceAttachment
    Description: Output only. The name of the generated service attachment resource. This is only populated if the endpoint is deployed with PrivateServiceConnect.

project_id

Type: STRING

project_number

Type: STRING

resource_name

Type: STRING

satisfies_pzi

Type: BOOLEAN
Provider name: satisfiesPzi
Description: Output only. Reserved for future use.

satisfies_pzs

Type: BOOLEAN
Provider name: satisfiesPzs
Description: Output only. Reserved for future use.

tags

Type: UNORDERED_LIST_STRING

update_time

Type: TIMESTAMP
Provider name: updateTime
Description: Output only. Timestamp when this Endpoint was last updated.
