Train an image classification model

This page shows you how to train an AutoML classification model from an image dataset by using the Google Cloud console or the Vertex AI API.

Train an AutoML model

Google Cloud console

  1. In the Google Cloud console, in the Vertex AI section, go to the Datasets page.

    Go to the Datasets page

  2. Click the name of the dataset you want to use to train your model to open its details page.

  3. Click Train new model.

  4. For the training method, select AutoML.

  5. Click Continue.

  6. Enter a name for the model.

  7. If you want to manually set how your training data is split, expand Advanced options and select a data split option. Learn more.

  8. Click Start training.

    Model training can take many hours, depending on the size and complexity of your data and your training budget, if you specified one. You can close this tab and return to it later. You will receive an email when your model has finished training.

API

Select the tab below for your objective:

Classification (single-label)

Select the tab below for your language or environment:

REST

Before using any of the request data, make the following replacements:

  • LOCATION: the region where the dataset is located and where the model will be created. For example, us-central1.
  • PROJECT: your project ID.
  • TRAININGPIPELINE_DISPLAYNAME: Required. A display name for the trainingPipeline.
  • DATASET_ID: the ID number of the dataset to use for training.
  • fractionSplit: Optional. One of several possible ML use split options for your data. For fractionSplit, the values must sum to 1. For example:
    • {"trainingFraction": "0.7","validationFraction": "0.15","testFraction": "0.15"}
  • MODEL_DISPLAYNAME*: a display name for the model uploaded (created) by the TrainingPipeline.
  • MODEL_DESCRIPTION*: a description for the model.
  • modelToUpload.labels*: any set of key-value pairs to organize your models. For example:
    • "env": "prod"
    • "tier": "backend"
  • MODELTYPE: the type of cloud-hosted model to train. The options are:
    • CLOUD (default)
  • NODE_HOUR_BUDGET: the actual training cost will be equal to or less than this value. For Cloud models, the budget must be between 8,000 and 800,000 milli node hours (inclusive). The default value is 192,000, which represents one day in wall time, assuming 8 nodes are used.
  • PROJECT_NUMBER: your project's automatically generated project number

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines

Request JSON body:

{
  "displayName": "TRAININGPIPELINE_DISPLAYNAME",
  "inputDataConfig": {
    "datasetId": "DATASET_ID",
    "fractionSplit": {
      "trainingFraction": "DECIMAL",
      "validationFraction": "DECIMAL",
      "testFraction": "DECIMAL"
    }
  },
  "modelToUpload": {
    "displayName": "MODEL_DISPLAYNAME",
    "description": "MODEL_DESCRIPTION",
    "labels": {
      "KEY": "VALUE"
    }
  },
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml",
  "trainingTaskInputs": {
    "multiLabel": "false",
    "modelType": ["MODELTYPE"],
    "budgetMilliNodeHours": NODE_HOUR_BUDGET
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines" | Select-Object -Expand Content

The response contains information about specifications, as well as the TRAININGPIPELINE_ID.
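
Once you have the TRAININGPIPELINE_ID, you can use it to check on the pipeline. The following is a minimal sketch (an addition to this page, not an official sample) that fetches the pipeline's current state with the Python v1 client; the project, region, and pipeline ID are placeholders:

from google.cloud import aiplatform_v1

# Placeholders: substitute your own project, region, and pipeline ID.
client = aiplatform_v1.PipelineServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)
name = client.training_pipeline_path(
    project="PROJECT",
    location="us-central1",
    training_pipeline="TRAININGPIPELINE_ID",
)
pipeline = client.get_training_pipeline(name=name)
print(pipeline.state)  # for example, PipelineState.PIPELINE_STATE_RUNNING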

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.DeployedModelRef;
import com.google.cloud.aiplatform.v1.EnvVar;
import com.google.cloud.aiplatform.v1.FilterSplit;
import com.google.cloud.aiplatform.v1.FractionSplit;
import com.google.cloud.aiplatform.v1.InputDataConfig;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.Model;
import com.google.cloud.aiplatform.v1.Model.ExportFormat;
import com.google.cloud.aiplatform.v1.ModelContainerSpec;
import com.google.cloud.aiplatform.v1.PipelineServiceClient;
import com.google.cloud.aiplatform.v1.PipelineServiceSettings;
import com.google.cloud.aiplatform.v1.Port;
import com.google.cloud.aiplatform.v1.PredefinedSplit;
import com.google.cloud.aiplatform.v1.PredictSchemata;
import com.google.cloud.aiplatform.v1.TimestampSplit;
import com.google.cloud.aiplatform.v1.TrainingPipeline;
import com.google.cloud.aiplatform.v1.schema.trainingjob.definition.AutoMlImageClassificationInputs;
import com.google.cloud.aiplatform.v1.schema.trainingjob.definition.AutoMlImageClassificationInputs.ModelType;
import com.google.rpc.Status;
import java.io.IOException;

public class CreateTrainingPipelineImageClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String trainingPipelineDisplayName = "YOUR_TRAINING_PIPELINE_DISPLAY_NAME";
    String project = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String modelDisplayName = "YOUR_MODEL_DISPLAY_NAME";
    createTrainingPipelineImageClassificationSample(
        project, trainingPipelineDisplayName, datasetId, modelDisplayName);
  }

  static void createTrainingPipelineImageClassificationSample(
      String project, String trainingPipelineDisplayName, String datasetId, String modelDisplayName)
      throws IOException {
    PipelineServiceSettings pipelineServiceSettings =
        PipelineServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PipelineServiceClient pipelineServiceClient =
        PipelineServiceClient.create(pipelineServiceSettings)) {
      String location = "us-central1";
      String trainingTaskDefinition =
          "gs://google-cloud-aiplatform/schema/trainingjob/definition/"
              + "automl_image_classification_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      AutoMlImageClassificationInputs autoMlImageClassificationInputs =
          AutoMlImageClassificationInputs.newBuilder()
              .setModelType(ModelType.CLOUD)
              .setMultiLabel(false)
              .setBudgetMilliNodeHours(8000)
              .setDisableEarlyStopping(false)
              .build();

      InputDataConfig trainingInputDataConfig =
          InputDataConfig.newBuilder().setDatasetId(datasetId).build();
      Model model = Model.newBuilder().setDisplayName(modelDisplayName).build();
      TrainingPipeline trainingPipeline =
          TrainingPipeline.newBuilder()
              .setDisplayName(trainingPipelineDisplayName)
              .setTrainingTaskDefinition(trainingTaskDefinition)
              .setTrainingTaskInputs(ValueConverter.toValue(autoMlImageClassificationInputs))
              .setInputDataConfig(trainingInputDataConfig)
              .setModelToUpload(model)
              .build();

      TrainingPipeline trainingPipelineResponse =
          pipelineServiceClient.createTrainingPipeline(locationName, trainingPipeline);

      System.out.println("Create Training Pipeline Image Classification Response");
      System.out.format("Name: %s\n", trainingPipelineResponse.getName());
      System.out.format("Display Name: %s\n", trainingPipelineResponse.getDisplayName());

      System.out.format(
          "Training Task Definition: %s\n", trainingPipelineResponse.getTrainingTaskDefinition());
      System.out.format(
          "Training Task Inputs: %s\n", trainingPipelineResponse.getTrainingTaskInputs());
      System.out.format(
          "Training Task Metadata: %s\n", trainingPipelineResponse.getTrainingTaskMetadata());
      System.out.format("State: %s\n", trainingPipelineResponse.getState());

      System.out.format("Create Time: %s\n", trainingPipelineResponse.getCreateTime());
      System.out.format("Start Time: %s\n", trainingPipelineResponse.getStartTime());
      System.out.format("End Time: %s\n", trainingPipelineResponse.getEndTime());
      System.out.format("Update Time: %s\n", trainingPipelineResponse.getUpdateTime());
      System.out.format("Labels: %s\n", trainingPipelineResponse.getLabelsMap());

      InputDataConfig inputDataConfig = trainingPipelineResponse.getInputDataConfig();
      System.out.println("Input Data Config");
      System.out.format("Dataset Id: %s\n", inputDataConfig.getDatasetId());
      System.out.format("Annotations Filter: %s\n", inputDataConfig.getAnnotationsFilter());

      FractionSplit fractionSplit = inputDataConfig.getFractionSplit();
      System.out.println("Fraction Split");
      System.out.format("Training Fraction: %s\n", fractionSplit.getTrainingFraction());
      System.out.format("Validation Fraction: %s\n", fractionSplit.getValidationFraction());
      System.out.format("Test Fraction: %s\n", fractionSplit.getTestFraction());

      FilterSplit filterSplit = inputDataConfig.getFilterSplit();
      System.out.println("Filter Split");
      System.out.format("Training Filter: %s\n", filterSplit.getTrainingFilter());
      System.out.format("Validation Filter: %s\n", filterSplit.getValidationFilter());
      System.out.format("Test Filter: %s\n", filterSplit.getTestFilter());

      PredefinedSplit predefinedSplit = inputDataConfig.getPredefinedSplit();
      System.out.println("Predefined Split");
      System.out.format("Key: %s\n", predefinedSplit.getKey());

      TimestampSplit timestampSplit = inputDataConfig.getTimestampSplit();
      System.out.println("Timestamp Split");
      System.out.format("Training Fraction: %s\n", timestampSplit.getTrainingFraction());
      System.out.format("Validation Fraction: %s\n", timestampSplit.getValidationFraction());
      System.out.format("Test Fraction: %s\n", timestampSplit.getTestFraction());
      System.out.format("Key: %s\n", timestampSplit.getKey());

      Model modelResponse = trainingPipelineResponse.getModelToUpload();
      System.out.println("Model To Upload");
      System.out.format("Name: %s\n", modelResponse.getName());
      System.out.format("Display Name: %s\n", modelResponse.getDisplayName());
      System.out.format("Description: %s\n", modelResponse.getDescription());

      System.out.format("Metadata Schema Uri: %s\n", modelResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", modelResponse.getMetadata());
      System.out.format("Training Pipeline: %s\n", modelResponse.getTrainingPipeline());
      System.out.format("Artifact Uri: %s\n", modelResponse.getArtifactUri());

      System.out.format(
          "Supported Deployment Resources Types: %s\n",
          modelResponse.getSupportedDeploymentResourcesTypesList());
      System.out.format(
          "Supported Input Storage Formats: %s\n",
          modelResponse.getSupportedInputStorageFormatsList());
      System.out.format(
          "Supported Output Storage Formats: %s\n",
          modelResponse.getSupportedOutputStorageFormatsList());

      System.out.format("Create Time: %s\n", modelResponse.getCreateTime());
      System.out.format("Update Time: %s\n", modelResponse.getUpdateTime());
      System.out.format("Labels: %s\n", modelResponse.getLabelsMap());

      PredictSchemata predictSchemata = modelResponse.getPredictSchemata();
      System.out.println("Predict Schemata");
      System.out.format("Instance Schema Uri: %s\n", predictSchemata.getInstanceSchemaUri());
      System.out.format("Parameters Schema Uri: %s\n", predictSchemata.getParametersSchemaUri());
      System.out.format("Prediction Schema Uri: %s\n", predictSchemata.getPredictionSchemaUri());

      for (ExportFormat exportFormat : modelResponse.getSupportedExportFormatsList()) {
        System.out.println("Supported Export Format");
        System.out.format("Id: %s\n", exportFormat.getId());
      }

      ModelContainerSpec modelContainerSpec = modelResponse.getContainerSpec();
      System.out.println("Container Spec");
      System.out.format("Image Uri: %s\n", modelContainerSpec.getImageUri());
      System.out.format("Command: %s\n", modelContainerSpec.getCommandList());
      System.out.format("Args: %s\n", modelContainerSpec.getArgsList());
      System.out.format("Predict Route: %s\n", modelContainerSpec.getPredictRoute());
      System.out.format("Health Route: %s\n", modelContainerSpec.getHealthRoute());

      for (EnvVar envVar : modelContainerSpec.getEnvList()) {
        System.out.println("Env");
        System.out.format("Name: %s\n", envVar.getName());
        System.out.format("Value: %s\n", envVar.getValue());
      }

      for (Port port : modelContainerSpec.getPortsList()) {
        System.out.println("Port");
        System.out.format("Container Port: %s\n", port.getContainerPort());
      }

      for (DeployedModelRef deployedModelRef : modelResponse.getDeployedModelsList()) {
        System.out.println("Deployed Model");
        System.out.format("Endpoint: %s\n", deployedModelRef.getEndpoint());
        System.out.format("Deployed Model Id: %s\n", deployedModelRef.getDeployedModelId());
      }

      Status status = trainingPipelineResponse.getError();
      System.out.println("Error");
      System.out.format("Code: %s\n", status.getCode());
      System.out.format("Message: %s\n", status.getMessage());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
/*
const datasetId = 'YOUR DATASET';
const modelDisplayName = 'NEW MODEL NAME';
const trainingPipelineDisplayName = 'NAME FOR TRAINING PIPELINE';
const project = 'YOUR PROJECT ID';
const location = 'us-central1';
*/
// Imports the Google Cloud Pipeline Service Client library
const aiplatform = require('@google-cloud/aiplatform');

const {definition} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.trainingjob;
const ModelType = definition.AutoMlImageClassificationInputs.ModelType;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const {PipelineServiceClient} = aiplatform.v1;
const pipelineServiceClient = new PipelineServiceClient(clientOptions);

async function createTrainingPipelineImageClassification() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;

  // Values should match the input expected by your model.
  const trainingTaskInputsMessage =
    new definition.AutoMlImageClassificationInputs({
      multiLabel: false, // single-label objective
      modelType: ModelType.CLOUD,
      budgetMilliNodeHours: 8000,
      disableEarlyStopping: false,
    });

  const trainingTaskInputs = trainingTaskInputsMessage.toValue();

  const trainingTaskDefinition =
    'gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml';

  const modelToUpload = {displayName: modelDisplayName};
  const inputDataConfig = {datasetId};
  const trainingPipeline = {
    displayName: trainingPipelineDisplayName,
    trainingTaskDefinition,
    trainingTaskInputs,
    inputDataConfig,
    modelToUpload,
  };
  const request = {parent, trainingPipeline};

  // Create training pipeline request
  const [response] =
    await pipelineServiceClient.createTrainingPipeline(request);

  console.log('Create training pipeline image classification response');
  console.log(`Name : ${response.name}`);
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}

createTrainingPipelineImageClassification();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Optional

from google.cloud import aiplatform


def create_training_pipeline_image_classification_sample(
    project: str,
    location: str,
    display_name: str,
    dataset_id: str,
    model_display_name: Optional[str] = None,
    model_type: str = "CLOUD",
    multi_label: bool = False,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    disable_early_stopping: bool = False,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    job = aiplatform.AutoMLImageTrainingJob(
        display_name=display_name,
        model_type=model_type,
        prediction_type="classification",
        multi_label=multi_label,
    )

    my_image_ds = aiplatform.ImageDataset(dataset_id)

    model = job.run(
        dataset=my_image_ds,
        model_display_name=model_display_name,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        budget_milli_node_hours=budget_milli_node_hours,
        disable_early_stopping=disable_early_stopping,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model
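
For example, a hypothetical invocation of the sample above (the project, dataset ID, and display names are placeholder values, not values from this page):

# Hypothetical values; replace them with your own before running.
model = create_training_pipeline_image_classification_sample(
    project="my-project",
    location="us-central1",
    display_name="flowers-training-pipeline",
    dataset_id="1234567890",
    model_display_name="flowers-model",
)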

Classification (multi-label)

Select the tab below for your language or environment:

REST

Before using any of the request data, make the following replacements:

  • LOCATION: the region where the dataset is located and where the model will be created. For example, us-central1.
  • PROJECT: your project ID.
  • TRAININGPIPELINE_DISPLAYNAME: Required. A display name for the trainingPipeline.
  • DATASET_ID: the ID number of the dataset to use for training.
  • fractionSplit: Optional. One of several possible ML use split options for your data. For fractionSplit, the values must sum to 1. For example:
    • {"trainingFraction": "0.7","validationFraction": "0.15","testFraction": "0.15"}
  • MODEL_DISPLAYNAME*: a display name for the model uploaded (created) by the TrainingPipeline.
  • MODEL_DESCRIPTION*: a description for the model.
  • modelToUpload.labels*: any set of key-value pairs to organize your models. For example:
    • "env": "prod"
    • "tier": "backend"
  • MODELTYPE: the type of cloud-hosted model to train. The options are:
    • CLOUD (default)
  • NODE_HOUR_BUDGET: the actual training cost will be equal to or less than this value. For Cloud models, the budget must be between 8,000 and 800,000 milli node hours (inclusive). The default value is 192,000, which represents one day in wall time, assuming 8 nodes are used.
  • PROJECT_NUMBER: your project's automatically generated project number

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines

Request JSON body:

{
  "displayName": "TRAININGPIPELINE_DISPLAYNAME",
  "inputDataConfig": {
    "datasetId": "DATASET_ID",
    "fractionSplit": {
      "trainingFraction": "DECIMAL",
      "validationFraction": "DECIMAL",
      "testFraction": "DECIMAL"
    }
  },
  "modelToUpload": {
    "displayName": "MODEL_DISPLAYNAME",
    "description": "MODEL_DESCRIPTION",
    "labels": {
      "KEY": "VALUE"
    }
  },
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml",
  "trainingTaskInputs": {
    "multiLabel": "true",
    "modelType": ["MODELTYPE"],
    "budgetMilliNodeHours": NODE_HOUR_BUDGET
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines" | Select-Object -Expand Content

The response contains information about specifications, as well as the TRAININGPIPELINE_ID.
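
As in the single-label case, the TRAININGPIPELINE_ID can be used to poll the pipeline. A minimal sketch (an addition to this page, with placeholder IDs) that blocks until training reaches a terminal state:

import time

from google.cloud import aiplatform_v1

# Placeholders: substitute your own project, region, and pipeline ID.
client = aiplatform_v1.PipelineServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)
name = client.training_pipeline_path(
    project="PROJECT",
    location="us-central1",
    training_pipeline="TRAININGPIPELINE_ID",
)

# Poll once a minute until training succeeds, fails, or is cancelled.
terminal_states = {
    aiplatform_v1.PipelineState.PIPELINE_STATE_SUCCEEDED,
    aiplatform_v1.PipelineState.PIPELINE_STATE_FAILED,
    aiplatform_v1.PipelineState.PIPELINE_STATE_CANCELLED,
}
while client.get_training_pipeline(name=name).state not in terminal_states:
    time.sleep(60)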

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.DeployedModelRef;
import com.google.cloud.aiplatform.v1.EnvVar;
import com.google.cloud.aiplatform.v1.FilterSplit;
import com.google.cloud.aiplatform.v1.FractionSplit;
import com.google.cloud.aiplatform.v1.InputDataConfig;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.Model;
import com.google.cloud.aiplatform.v1.Model.ExportFormat;
import com.google.cloud.aiplatform.v1.ModelContainerSpec;
import com.google.cloud.aiplatform.v1.PipelineServiceClient;
import com.google.cloud.aiplatform.v1.PipelineServiceSettings;
import com.google.cloud.aiplatform.v1.Port;
import com.google.cloud.aiplatform.v1.PredefinedSplit;
import com.google.cloud.aiplatform.v1.PredictSchemata;
import com.google.cloud.aiplatform.v1.TimestampSplit;
import com.google.cloud.aiplatform.v1.TrainingPipeline;
import com.google.cloud.aiplatform.v1.schema.trainingjob.definition.AutoMlImageClassificationInputs;
import com.google.cloud.aiplatform.v1.schema.trainingjob.definition.AutoMlImageClassificationInputs.ModelType;
import com.google.rpc.Status;
import java.io.IOException;

public class CreateTrainingPipelineImageClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String trainingPipelineDisplayName = "YOUR_TRAINING_PIPELINE_DISPLAY_NAME";
    String project = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String modelDisplayName = "YOUR_MODEL_DISPLAY_NAME";
    createTrainingPipelineImageClassificationSample(
        project, trainingPipelineDisplayName, datasetId, modelDisplayName);
  }

  static void createTrainingPipelineImageClassificationSample(
      String project, String trainingPipelineDisplayName, String datasetId, String modelDisplayName)
      throws IOException {
    PipelineServiceSettings pipelineServiceSettings =
        PipelineServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PipelineServiceClient pipelineServiceClient =
        PipelineServiceClient.create(pipelineServiceSettings)) {
      String location = "us-central1";
      String trainingTaskDefinition =
          "gs://google-cloud-aiplatform/schema/trainingjob/definition/"
              + "automl_image_classification_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      AutoMlImageClassificationInputs autoMlImageClassificationInputs =
          AutoMlImageClassificationInputs.newBuilder()
              .setModelType(ModelType.CLOUD)
              .setMultiLabel(true) // multi-label objective
              .setBudgetMilliNodeHours(8000)
              .setDisableEarlyStopping(false)
              .build();

      InputDataConfig trainingInputDataConfig =
          InputDataConfig.newBuilder().setDatasetId(datasetId).build();
      Model model = Model.newBuilder().setDisplayName(modelDisplayName).build();
      TrainingPipeline trainingPipeline =
          TrainingPipeline.newBuilder()
              .setDisplayName(trainingPipelineDisplayName)
              .setTrainingTaskDefinition(trainingTaskDefinition)
              .setTrainingTaskInputs(ValueConverter.toValue(autoMlImageClassificationInputs))
              .setInputDataConfig(trainingInputDataConfig)
              .setModelToUpload(model)
              .build();

      TrainingPipeline trainingPipelineResponse =
          pipelineServiceClient.createTrainingPipeline(locationName, trainingPipeline);

      System.out.println("Create Training Pipeline Image Classification Response");
      System.out.format("Name: %s\n", trainingPipelineResponse.getName());
      System.out.format("Display Name: %s\n", trainingPipelineResponse.getDisplayName());

      System.out.format(
          "Training Task Definition: %s\n", trainingPipelineResponse.getTrainingTaskDefinition());
      System.out.format(
          "Training Task Inputs: %s\n", trainingPipelineResponse.getTrainingTaskInputs());
      System.out.format(
          "Training Task Metadata: %s\n", trainingPipelineResponse.getTrainingTaskMetadata());
      System.out.format("State: %s\n", trainingPipelineResponse.getState());

      System.out.format("Create Time: %s\n", trainingPipelineResponse.getCreateTime());
      System.out.format("Start Time: %s\n", trainingPipelineResponse.getStartTime());
      System.out.format("End Time: %s\n", trainingPipelineResponse.getEndTime());
      System.out.format("Update Time: %s\n", trainingPipelineResponse.getUpdateTime());
      System.out.format("Labels: %s\n", trainingPipelineResponse.getLabelsMap());

      InputDataConfig inputDataConfig = trainingPipelineResponse.getInputDataConfig();
      System.out.println("Input Data Config");
      System.out.format("Dataset Id: %s\n", inputDataConfig.getDatasetId());
      System.out.format("Annotations Filter: %s\n", inputDataConfig.getAnnotationsFilter());

      FractionSplit fractionSplit = inputDataConfig.getFractionSplit();
      System.out.println("Fraction Split");
      System.out.format("Training Fraction: %s\n", fractionSplit.getTrainingFraction());
      System.out.format("Validation Fraction: %s\n", fractionSplit.getValidationFraction());
      System.out.format("Test Fraction: %s\n", fractionSplit.getTestFraction());

      FilterSplit filterSplit = inputDataConfig.getFilterSplit();
      System.out.println("Filter Split");
      System.out.format("Training Filter: %s\n", filterSplit.getTrainingFilter());
      System.out.format("Validation Filter: %s\n", filterSplit.getValidationFilter());
      System.out.format("Test Filter: %s\n", filterSplit.getTestFilter());

      PredefinedSplit predefinedSplit = inputDataConfig.getPredefinedSplit();
      System.out.println("Predefined Split");
      System.out.format("Key: %s\n", predefinedSplit.getKey());

      TimestampSplit timestampSplit = inputDataConfig.getTimestampSplit();
      System.out.println("Timestamp Split");
      System.out.format("Training Fraction: %s\n", timestampSplit.getTrainingFraction());
      System.out.format("Validation Fraction: %s\n", timestampSplit.getValidationFraction());
      System.out.format("Test Fraction: %s\n", timestampSplit.getTestFraction());
      System.out.format("Key: %s\n", timestampSplit.getKey());

      Model modelResponse = trainingPipelineResponse.getModelToUpload();
      System.out.println("Model To Upload");
      System.out.format("Name: %s\n", modelResponse.getName());
      System.out.format("Display Name: %s\n", modelResponse.getDisplayName());
      System.out.format("Description: %s\n", modelResponse.getDescription());

      System.out.format("Metadata Schema Uri: %s\n", modelResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", modelResponse.getMetadata());
      System.out.format("Training Pipeline: %s\n", modelResponse.getTrainingPipeline());
      System.out.format("Artifact Uri: %s\n", modelResponse.getArtifactUri());

      System.out.format(
          "Supported Deployment Resources Types: %s\n",
          modelResponse.getSupportedDeploymentResourcesTypesList());
      System.out.format(
          "Supported Input Storage Formats: %s\n",
          modelResponse.getSupportedInputStorageFormatsList());
      System.out.format(
          "Supported Output Storage Formats: %s\n",
          modelResponse.getSupportedOutputStorageFormatsList());

      System.out.format("Create Time: %s\n", modelResponse.getCreateTime());
      System.out.format("Update Time: %s\n", modelResponse.getUpdateTime());
      System.out.format("Labels: %s\n", modelResponse.getLabelsMap());

      PredictSchemata predictSchemata = modelResponse.getPredictSchemata();
      System.out.println("Predict Schemata");
      System.out.format("Instance Schema Uri: %s\n", predictSchemata.getInstanceSchemaUri());
      System.out.format("Parameters Schema Uri: %s\n", predictSchemata.getParametersSchemaUri());
      System.out.format("Prediction Schema Uri: %s\n", predictSchemata.getPredictionSchemaUri());

      for (ExportFormat exportFormat : modelResponse.getSupportedExportFormatsList()) {
        System.out.println("Supported Export Format");
        System.out.format("Id: %s\n", exportFormat.getId());
      }

      ModelContainerSpec modelContainerSpec = modelResponse.getContainerSpec();
      System.out.println("Container Spec");
      System.out.format("Image Uri: %s\n", modelContainerSpec.getImageUri());
      System.out.format("Command: %s\n", modelContainerSpec.getCommandList());
      System.out.format("Args: %s\n", modelContainerSpec.getArgsList());
      System.out.format("Predict Route: %s\n", modelContainerSpec.getPredictRoute());
      System.out.format("Health Route: %s\n", modelContainerSpec.getHealthRoute());

      for (EnvVar envVar : modelContainerSpec.getEnvList()) {
        System.out.println("Env");
        System.out.format("Name: %s\n", envVar.getName());
        System.out.format("Value: %s\n", envVar.getValue());
      }

      for (Port port : modelContainerSpec.getPortsList()) {
        System.out.println("Port");
        System.out.format("Container Port: %s\n", port.getContainerPort());
      }

      for (DeployedModelRef deployedModelRef : modelResponse.getDeployedModelsList()) {
        System.out.println("Deployed Model");
        System.out.format("Endpoint: %s\n", deployedModelRef.getEndpoint());
        System.out.format("Deployed Model Id: %s\n", deployedModelRef.getDeployedModelId());
      }

      Status status = trainingPipelineResponse.getError();
      System.out.println("Error");
      System.out.format("Code: %s\n", status.getCode());
      System.out.format("Message: %s\n", status.getMessage());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
/*
const datasetId = 'YOUR DATASET';
const modelDisplayName = 'NEW MODEL NAME';
const trainingPipelineDisplayName = 'NAME FOR TRAINING PIPELINE';
const project = 'YOUR PROJECT ID';
const location = 'us-central1';
*/
// Imports the Google Cloud Pipeline Service Client library
const aiplatform = require('@google-cloud/aiplatform');

const {definition} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.trainingjob;
const ModelType = definition.AutoMlImageClassificationInputs.ModelType;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const {PipelineServiceClient} = aiplatform.v1;
const pipelineServiceClient = new PipelineServiceClient(clientOptions);

async function createTrainingPipelineImageClassification() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;

  // Values should match the input expected by your model.
  const trainingTaskInputsMessage =
    new definition.AutoMlImageClassificationInputs({
      multiLabel: true, // multi-label objective
      modelType: ModelType.CLOUD,
      budgetMilliNodeHours: 8000,
      disableEarlyStopping: false,
    });

  const trainingTaskInputs = trainingTaskInputsMessage.toValue();

  const trainingTaskDefinition =
    'gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml';

  const modelToUpload = {displayName: modelDisplayName};
  const inputDataConfig = {datasetId};
  const trainingPipeline = {
    displayName: trainingPipelineDisplayName,
    trainingTaskDefinition,
    trainingTaskInputs,
    inputDataConfig,
    modelToUpload,
  };
  const request = {parent, trainingPipeline};

  // Create training pipeline request
  const [response] =
    await pipelineServiceClient.createTrainingPipeline(request);

  console.log('Create training pipeline image classification response');
  console.log(`Name : ${response.name}`);
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}

createTrainingPipelineImageClassification();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Optional

from google.cloud import aiplatform


def create_training_pipeline_image_classification_sample(
    project: str,
    location: str,
    display_name: str,
    dataset_id: str,
    model_display_name: Optional[str] = None,
    model_type: str = "CLOUD",
    multi_label: bool = False,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    disable_early_stopping: bool = False,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    job = aiplatform.AutoMLImageTrainingJob(
        display_name=display_name,
        model_type=model_type,
        prediction_type="classification",
        multi_label=multi_label,
    )

    my_image_ds = aiplatform.ImageDataset(dataset_id)

    model = job.run(
        dataset=my_image_ds,
        model_display_name=model_display_name,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        budget_milli_node_hours=budget_milli_node_hours,
        disable_early_stopping=disable_early_stopping,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model
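
For example, a hypothetical invocation of the sample above (placeholder values only); passing multi_label=True is what distinguishes this objective from single-label classification:

# Hypothetical values; replace them with your own before running.
model = create_training_pipeline_image_classification_sample(
    project="my-project",
    location="us-central1",
    display_name="fashion-tags-pipeline",
    dataset_id="1234567890",
    model_display_name="fashion-tags-model",
    multi_label=True,
)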

Control the data split using REST

You can control how your training data is split between the training, validation, and test sets. When you use the Vertex AI API, use the Split object to determine your data split. The Split object can be included in the InputConfig object as one of several object types, each of which provides a different way to split the training data. You can select only one method.

  • FractionSplit:
    • TRAINING_FRACTION: the fraction of the training data to be used for the training set.
    • VALIDATION_FRACTION: the fraction of the training data to be used for the validation set. Not used for video data.
    • TEST_FRACTION: the fraction of the training data to be used for the test set.

    If any of the fractions are specified, all of them must be specified. The fractions must sum to 1.0. The default values for the fractions differ depending on your data type. Learn more.

     "fractionSplit": {   "trainingFraction": TRAINING_FRACTION,   "validationFraction": VALIDATION_FRACTION,   "testFraction": TEST_FRACTION }, 
  • FilterSplit:
    • TRAINING_FILTER: data items that match this filter are used for the training set.
    • VALIDATION_FILTER: data items that match this filter are used for the validation set. Must be "-" for video data.
    • TEST_FILTER: data items that match this filter are used for the test set.

    These filters can be used with the ml_use label, or with any other labels you apply to your data. Learn more about using the ml_use label and other labels to filter your data.

    The following example shows how to use the filterSplit object with the ml_use label, including the validation set:

     "filterSplit": { "trainingFilter": "labels.aiplatform.googleapis.com/ml_use=training", "validationFilter": "labels.aiplatform.googleapis.com/ml_use=validation", "testFilter": "labels.aiplatform.googleapis.com/ml_use=test" }