Deploy a model by using the gcloud CLI or the Vertex AI API

To deploy a model to a public endpoint by using the gcloud CLI or the Vertex AI API, get the endpoint ID of an existing endpoint, and then deploy the model to that endpoint.

Get the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

  gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME

Replace the following:

  • LOCATION_ID: The region where you are using Vertex AI.
  • ENDPOINT_NAME: The display name for the endpoint.

Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
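For example, for a hypothetical endpoint named my-endpoint in us-central1, the command and its output might look like the following (the ID value is illustrative):

  gcloud ai endpoints list \
    --region=us-central1 \
    --filter=display_name=my-endpoint

  ENDPOINT_ID          DISPLAY_NAME
  1234567890123456789  my-endpoint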

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

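To send the request, you can use a tool such as curl. The following is one sketch; it assumes the gcloud CLI is installed and authenticated, and uses it to supply the access token:

  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"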

You should receive a JSON response similar to the following:

 {   "endpoints": [     {       "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",       "displayName": "ENDPOINT_NAME",       "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",       "createTime": "2020-04-17T18:31:11.585169Z",       "updateTime": "2020-04-17T18:35:08.568959Z"     }   ] } 
Note the ENDPOINT_ID.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
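For reference, the SDK is distributed as the google-cloud-aiplatform package and is typically installed or upgraded with pip:

  pip install --upgrade google-cloud-aiplatform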

Replace the following:

  • PROJECT_ID: Your project ID.
  • LOCATION_ID: The region where you are using Vertex AI.
  • ENDPOINT_NAME: The display name for the endpoint.
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

# Endpoint.list returns a list of matching endpoints; take the first match.
endpoints = aiplatform.Endpoint.list(
    filter=f'display_name="{ENDPOINT_NAME}"',
)
endpoint_id = endpoints[0].name.split("/")[-1]

Deploy the model

When you deploy a model, you specify an ID for the deployed model so that you can distinguish it from other models that are deployed to the same endpoint.

Select the tab for your language or environment:

gcloud

The following example uses the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources:

Before using any of the command data below, make the following replacements:

  • ENDPOINT_ID: The ID for the endpoint.
  • LOCATION_ID: The region where you are using Vertex AI.
  • MODEL_ID: The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
  • MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.

Run the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

Split traffic

The --traffic-split=0=100 flag in the preceding examples sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacements:

  • OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.

Run the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
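After the operation completes, you can verify the resulting split by describing the endpoint. This is a sketch that assumes the gcloud CLI is configured; the --format projection only narrows the output to the trafficSplit field:

gcloud ai endpoints describe ENDPOINT_ID \
  --region=LOCATION_ID \
  --format="yaml(trafficSplit)"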

REST

Deploy the model.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_ID: The ID for the endpoint.
  • MODEL_ID: The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
  • ACCELERATOR_TYPE: The type of accelerator to be attached to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that use non-GPU images. Learn more.
  • ACCELERATOR_COUNT: The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that use non-GPU images.
  • MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
  • MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
  • REQUIRED_REPLICA_COUNT: Optional. The number of nodes required for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, it defaults to the minimum number of nodes.
  • TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint that is routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
  • DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
  • TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
  • PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

 {   "deployedModel": {     "model": "projects/PROJECT/locations/us-central1/models/MODEL_ID",     "displayName": "DEPLOYED_MODEL_NAME",     "dedicatedResources": {        "machineSpec": {          "machineType": "MACHINE_TYPE",          "acceleratorType": "ACCELERATOR_TYPE",          "acceleratorCount": "ACCELERATOR_COUNT"        },        "minReplicaCount": MIN_REPLICA_COUNT,        "maxReplicaCount": MAX_REPLICA_COUNT,        "requiredReplicaCount": REQUIRED_REPLICA_COUNT      },   },   "trafficSplit": {     "0": TRAFFIC_SPLIT_THIS_MODEL,     "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,     "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2   }, } 

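To send the request, you can use a tool such as curl. This sketch assumes the request body above is saved in a file named request.json and that the gcloud CLI is available to supply the access token:

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"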

You should receive a JSON response similar to the following:

 {   "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",   "metadata": {     "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",     "genericMetadata": {       "createTime": "2020-10-19T17:53:16.502088Z",       "updateTime": "2020-10-19T17:53:16.502088Z"     }   } } 

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

Learn how to change the default settings for inference logging.

Get the operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
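For example, the deploy response shown earlier contains an operation name ending in OPERATION_ID; with the gcloud CLI, a sketch for checking its status is:

  gcloud ai operations describe OPERATION_ID \
    --region=LOCATION_ID

The output includes a done field once the operation has finished.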

What's next