Deploy a model by using the gcloud CLI or the Vertex AI API

To deploy a model to a public endpoint by using the gcloud CLI or the Vertex AI API, get the endpoint ID of an existing endpoint, and then deploy the model to that endpoint.

Get the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

  gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME

Replace the following:

  • LOCATION_ID: The region where you are using Vertex AI.
  • ENDPOINT_NAME: The display name for the endpoint.

Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
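For example, for a hypothetical endpoint named my-endpoint in us-central1, the command and its output might look like the following (the ID value is illustrative):

  gcloud ai endpoints list \
    --region=us-central1 \
    --filter=display_name=my-endpoint

  ENDPOINT_ID          DISPLAY_NAME
  1234567890123456789  my-endpoint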

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

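To send the request, you can use a tool such as curl. The following is one sketch; it assumes the gcloud CLI is installed and authenticated, and uses it to supply the access token:

  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"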

You should receive a JSON response similar to the following:

 {   "endpoints": [     {       "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",       "displayName": "ENDPOINT_NAME",       "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",       "createTime": "2020-04-17T18:31:11.585169Z",       "updateTime": "2020-04-17T18:35:08.568959Z"     }   ] } 
Note the ENDPOINT_ID.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
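For reference, the SDK is distributed as the google-cloud-aiplatform package and is typically installed or upgraded with pip:

  pip install --upgrade google-cloud-aiplatform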

Replace the following:

  • PROJECT_ID: Your project ID.
  • LOCATION_ID: The region where you are using Vertex AI.
  • ENDPOINT_NAME: The display name for the endpoint.
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

# Endpoint.list returns a list of matching endpoints; take the first match.
endpoints = aiplatform.Endpoint.list(
    filter=f'display_name="{ENDPOINT_NAME}"',
)
endpoint_id = endpoints[0].name.split("/")[-1]

Deploy the model

When you deploy a model, you specify an ID for the deployed model so that you can distinguish it from other models that are deployed to the same endpoint.

Select the tab for your language or environment:

gcloud

The following example uses the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources:

Before using any of the command data below, make the following replacements:

  • ENDPOINT_ID: The ID for the endpoint.
  • LOCATION_ID: The region where you are using Vertex AI.
  • MODEL_ID: The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
  • MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.

Run the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

Split traffic

The --traffic-split=0=100 flag in the preceding examples sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacements:

  • OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.

Run the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
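After the operation completes, you can verify the resulting split by describing the endpoint. This is a sketch that assumes the gcloud CLI is configured; the --format projection only narrows the output to the trafficSplit field:

gcloud ai endpoints describe ENDPOINT_ID \
  --region=LOCATION_ID \
  --format="yaml(trafficSplit)"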

REST

Deploy the model.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_ID: The ID for the endpoint.
  • MODEL_ID: The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
  • ACCELERATOR_TYPE: The type of accelerator to be attached to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that use non-GPU images. Learn more.
  • ACCELERATOR_COUNT: The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that use non-GPU images.
  • MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
  • MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
  • REQUIRED_REPLICA_COUNT: Optional. The number of nodes required for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, it defaults to the minimum number of nodes.
  • TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint that is routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
  • DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
  • TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
  • PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

 {   "deployedModel": {     "model": "projects/PROJECT/locations/us-central1/models/MODEL_ID",     "displayName": "DEPLOYED_MODEL_NAME",     "dedicatedResources": {        "machineSpec": {          "machineType": "MACHINE_TYPE",          "acceleratorType": "ACCELERATOR_TYPE",          "acceleratorCount": "ACCELERATOR_COUNT"        },        "minReplicaCount": MIN_REPLICA_COUNT,        "maxReplicaCount": MAX_REPLICA_COUNT,        "requiredReplicaCount": REQUIRED_REPLICA_COUNT      },   },   "trafficSplit": {     "0": TRAFFIC_SPLIT_THIS_MODEL,     "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,     "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2   }, } 

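To send the request, you can use a tool such as curl. This sketch assumes the request body above is saved in a file named request.json and that the gcloud CLI is available to supply the access token:

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"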

You should receive a JSON response similar to the following:

 {   "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",   "metadata": {     "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",     "genericMetadata": {       "createTime": "2020-10-19T17:53:16.502088Z",       "updateTime": "2020-10-19T17:53:16.502088Z"     }   } } 

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

Learn how to change the default settings for inference logging.

Get the operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
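For example, the deploy response shown earlier contains an operation name ending in OPERATION_ID; with the gcloud CLI, a sketch for checking its status is:

  gcloud ai operations describe OPERATION_ID \
    --region=LOCATION_ID

The output includes a done field once the operation has finished.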

What's next