
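Deploy a model by using the gcloud CLI or the Vertex AI API

To deploy a model to a public endpoint by using the gcloud CLI or the Vertex AI API, get the endpoint ID of an existing endpoint, and then deploy the model to that endpoint.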

Get the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

gcloud ai endpoints list \
    --region=LOCATION_ID \
    --filter=display_name=ENDPOINT_NAME

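Replace the following:

LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name of the endpoint.

Note the number that appears in the ENDPOINT_ID column. Use this ID in the next step.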

REST

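Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Vertex AI.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name of the endpoint.

HTTP method and URL: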

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

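To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: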

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell (Windows)

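Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: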

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

Note the ENDPOINT_ID, which is the last segment of the name field.
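If you script this step, the endpoint ID can be read from the response rather than copied by hand. The following is a minimal sketch, not part of any Vertex AI library: the function name extract_endpoint_ids and the sample values are hypothetical, and the sketch assumes that a response with no matches may omit the endpoints key.

```python
import json
from typing import List

# Hypothetical sample shaped like an endpoints.list response; values are placeholders.
SAMPLE_RESPONSE = """
{
  "endpoints": [
    {
      "name": "projects/123456789/locations/us-central1/endpoints/987654321",
      "displayName": "my-endpoint"
    }
  ]
}
"""


def extract_endpoint_ids(response_text: str) -> List[str]:
    """Return the last path segment (the ENDPOINT_ID) of each endpoint name."""
    payload = json.loads(response_text)
    # Assumption: an empty result may omit "endpoints", so default to an empty list.
    endpoints = payload.get("endpoints", [])
    return [endpoint["name"].rsplit("/", 1)[-1] for endpoint in endpoints]


endpoint_ids = extract_endpoint_ids(SAMPLE_RESPONSE)
print(endpoint_ids)
```

Because display names are not necessarily unique, the sketch returns every match and leaves the choice to the caller.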

Python

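To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following:

PROJECT_ID: Your project ID.
LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name of the endpoint.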

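from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

# Endpoint.list returns a list of matching endpoints, not a single endpoint.
endpoints = aiplatform.Endpoint.list(
    filter=f"display_name={ENDPOINT_NAME}",
)
# Use the first match; the ID is the last segment of its name.
endpoint_id = endpoints[0].name.split("/")[-1]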

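Deploy the model

When you deploy a model, you give the deployed model an ID to distinguish it from any other models deployed to the endpoint.

Follow the instructions for your language or environment:

gcloud

The following examples use the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources: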

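Before using any of the command data below, make the following replacements:

ENDPOINT_ID: The ID of the endpoint.
LOCATION_ID: The region where you are using Vertex AI.
MODEL_ID: The ID of the model to deploy.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can increase or decrease as the inference load requires, up to the maximum number of nodes and never fewer than this number.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can increase or decrease as the inference load requires, up to this number and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, the maximum number of nodes is set to the value of --min-replica-count.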
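The relationship between the two replica counts can be checked before you run the command. The following sketch is illustrative only: resolve_replica_counts is a hypothetical helper, not a gcloud or SDK function. It mirrors the defaulting rule described above (an omitted maximum takes the value of the minimum) and assumes a minimum of at least 1.

```python
from typing import Optional, Tuple


def resolve_replica_counts(
    min_replica_count: int, max_replica_count: Optional[int] = None
) -> Tuple[int, int]:
    """Return the effective (min, max) node counts for a deployment."""
    if min_replica_count < 1:
        raise ValueError("min_replica_count must be greater than or equal to 1")
    # An omitted maximum defaults to the minimum, which disables scaling up.
    if max_replica_count is None:
        max_replica_count = min_replica_count
    if max_replica_count < min_replica_count:
        raise ValueError("max_replica_count must not be less than min_replica_count")
    return min_replica_count, max_replica_count


print(resolve_replica_counts(2))
print(resolve_replica_counts(1, 5))
```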

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

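Splitting traffic

The --traffic-split=0=100 flag in the preceding examples sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, you can split traffic between the new DeployedModel and the existing ones. For example, to send 20% of the traffic to the new DeployedModel and 80% to an existing one, use the following command.

Before using any of the command data below, make the following replacements:

OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell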

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
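When several DeployedModel resources share an endpoint, it can help to compute the flag value and confirm that the percentages add up to 100 before running the command. The following is a small sketch under that assumption: build_traffic_split_flag is a hypothetical helper, and the deployed model ID in the usage line is a placeholder.

```python
from typing import Dict


def build_traffic_split_flag(new_model_percent: int, existing: Dict[str, int]) -> str:
    """Build a --traffic-split flag; key "0" stands for the model being deployed."""
    split = {"0": new_model_percent, **existing}
    if any(percent < 0 for percent in split.values()):
        raise ValueError("traffic percentages must not be negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages must add up to 100, got {total}")
    pairs = ",".join(f"{model_id}={percent}" for model_id, percent in split.items())
    return f"--traffic-split={pairs}"


# 20% to the new model, 80% to an existing DeployedModel (placeholder ID).
print(build_traffic_split_flag(20, {"1234567890": 80}))
```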

REST

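Deploy the model.

Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Vertex AI.
PROJECT_ID: Your project ID.
ENDPOINT_ID: The ID of the endpoint.
MODEL_ID: The ID of the model to deploy.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. The default setting is n1-standard-2. Learn more about machine types.
ACCELERATOR_TYPE: The type of accelerator to attach to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that use non-GPU images.
ACCELERATOR_COUNT: Optional. The number of accelerators for each replica to use. Should be zero or unspecified for AutoML models or custom-trained models that use non-GPU images.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can increase or decrease as the inference load requires, up to the maximum number of nodes and never fewer than this number. This value must be greater than or equal to 1.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can increase or decrease as the inference load requires, up to this number and never fewer than the minimum number of nodes.
REQUIRED_REPLICA_COUNT: Optional. The number of nodes required for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, the default value is the minimum number of nodes.
TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to route to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL: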

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": ACCELERATOR_COUNT
      },
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT,
      "requiredReplicaCount": REQUIRED_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
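Generating the request body from code avoids hand-editing mistakes such as trailing commas or percentages that do not add up. The following sketch assembles a CPU-only request body and prints it as JSON that could be saved as request.json. The helper build_deploy_model_body and all sample values are hypothetical, and the accelerator fields and requiredReplicaCount are left out for brevity.

```python
import json
from typing import Dict, Optional


def build_deploy_model_body(
    project_id: str,
    location_id: str,
    model_id: str,
    deployed_model_name: str,
    min_replica_count: int,
    max_replica_count: int,
    machine_type: str = "n1-standard-2",
    this_model_split: int = 100,
    other_models_split: Optional[Dict[str, int]] = None,
) -> dict:
    """Assemble a deployModel request body without accelerators."""
    if not 1 <= min_replica_count <= max_replica_count:
        raise ValueError("expected 1 <= min_replica_count <= max_replica_count")
    # Key "0" refers to the model being deployed by this request.
    traffic_split = {"0": this_model_split, **(other_models_split or {})}
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    return {
        "deployedModel": {
            "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
            "displayName": deployed_model_name,
            "dedicatedResources": {
                "machineSpec": {"machineType": machine_type},
                "minReplicaCount": min_replica_count,
                "maxReplicaCount": max_replica_count,
            },
        },
        "trafficSplit": traffic_split,
    }


body = build_deploy_model_body(
    "my-project", "us-central1", "456", "my-deployed-model", 1, 2,
    this_model_split=20, other_models_split={"1234567890": 80},
)
print(json.dumps(body, indent=2))
```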

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell (Windows)

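Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: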

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

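Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.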

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave the map empty if the endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    # Block until the deployment finishes (relevant when sync=False).
    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

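Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.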

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

Learn how to change the default settings for inference logging.

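Get the operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or to cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.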
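As an illustration of how the returned operation name might be used, the following sketch derives a status URL from it. The URL shape is an assumption: it combines the regional host pattern used elsewhere on this page with a GET on the operation name, so confirm it against the long-running operations documentation before relying on it. The helper operation_status_url and the sample name are hypothetical.

```python
import re

# Matches names such as
# projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID
_OPERATION_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"(?:[^/]+/[^/]+/)*operations/(?P<operation>[^/]+)$"
)


def operation_status_url(operation_name: str) -> str:
    """Build a URL for polling an operation (assumed pattern, see lead-in)."""
    match = _OPERATION_NAME.match(operation_name)
    if match is None:
        raise ValueError(f"unrecognized operation name: {operation_name!r}")
    # The regional host is derived from the location segment of the name.
    host = f"{match.group('location')}-aiplatform.googleapis.com"
    return f"https://{host}/v1/{operation_name}"


sample_name = "projects/my-project/locations/us-central1/endpoints/123/operations/456"
print(operation_status_url(sample_name))
```

A GET request to the resulting URL, authorized in the same way as the other requests on this page, would return the operation resource.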

What's next

Learn how to get online inferences.
Learn about private endpoints.