

Generate and edit images using Gemini (aka "nano banana")

You can ask a Gemini model to generate and edit images using both text-only prompts and text-and-image prompts. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Iteratively generate images through natural-language conversation, adjusting the images while maintaining consistency and context.
Generate images with high-quality text rendering, including long strings of text.
Generate interleaved text-and-image output, for example a blog post with text and images in a single turn. Previously, this required chaining multiple models together.
Generate images using Gemini's world knowledge and reasoning capabilities.

You can find a complete list of supported modalities and capabilities (along with example prompts) later on this page.

Important: When you use a Gemini model to generate images, the model cannot return images alone; it always returns both text and images.
You must also include responseModalities: ["TEXT", "IMAGE"] in your model configuration.
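
As a rough illustration of what this means for response handling, the sketch below separates the parts of a response into text and image payloads. It assumes the part shapes that the Web sample on this page reads (`text`, and `inlineData` with `mimeType` and `data`). The helper name `splitParts` and the sample data are hypothetical and are not part of the SDK.

```javascript
// Assumed part shapes, mirroring what the Web sample on this page reads:
//   { text: "..." }  or  { inlineData: { mimeType: "image/png", data: "<base64>" } }
// `splitParts` is a hypothetical helper, not an SDK function.
function splitParts(parts) {
  const texts = [];
  const images = [];
  for (const part of parts ?? []) {
    if (typeof part.text === "string") texts.push(part.text);
    if (part.inlineData) images.push(part.inlineData);
  }
  return { texts, images };
}

// Made-up sample data standing in for the parts of a real response
const sampleParts = [
  { text: "Here is your image." },
  { inlineData: { mimeType: "image/png", data: "iVBORw0KGgo=" } },
];
const { texts, images } = splitParts(sampleParts);
console.log(texts.length, images.length); // prints: 1 1
```

Because text always accompanies the image, code that only looks at the first part of a response may miss the image; collecting parts by type avoids that.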


See other guides for additional options for working with images: Analyze images, Analyze images on-device, and Generate structured output.

Choosing between Gemini and Imagen models

The Firebase AI Logic SDKs support image generation and editing using either a Gemini model or an Imagen model.

For most use cases, start with Gemini, and choose Imagen only for specialized tasks where image quality is critical.

Choose Gemini when you want to:

Use world knowledge and reasoning to generate contextually relevant images.
Seamlessly blend text and images, or interleave text and image output.
Embed accurate visuals within long text sequences.
Edit images conversationally while maintaining context.

Choose Imagen when you want to:

Prioritize image quality, photorealism, artistic detail, or specific styles (for example, impressionism or anime).
Infuse branding or style, or generate logos and product designs.
Explicitly specify the aspect ratio or format of generated images.

Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Models that support this capability

gemini-2.5-flash-image (aka "nano banana").

Note that the SDKs also support image generation using Imagen models.

Generate and edit images

You can generate and edit images using a Gemini model.

Generate images (text-only input)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate images by prompting with text.

Make sure to create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.

Swift

import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate an image
let prompt = "Generate an image of the Eiffel Tower with fireworks in the background."

// To generate an image, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide a text prompt instructing the model to generate an image
val prompt = "Generate an image of the Eiffel Tower with fireworks in the background."

// To generate image output, call `generateContent` with the text input
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated image
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

Java

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate an image
Content prompt = new Content.Builder()
        .addText("Generate an image of the Eiffel Tower with fireworks in the background.")
        .build();

// To generate an image, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        // Iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                // The returned image as a bitmap
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate an image
const prompt = 'Generate an image of the Eiffel Tower with fireworks in the background.';

// To generate an image, call `generateContent` with the text input
// (`generateContent` returns a promise, so it must be awaited)
const result = await model.generateContent(prompt);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash-image',
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(
    responseModalities: [ResponseModalities.text, ResponseModalities.image],
  ),
);

// Provide a text prompt instructing the model to generate an image
final prompt = [
  Content.text('Generate an image of the Eiffel Tower with fireworks in the background.'),
];

// To generate an image, call `generateContent` with the text input
final response = await model.generateContent(prompt);

// Handle the generated image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts.first.bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity

using System.Linq;
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate an image
var prompt = "Generate an image of the Eiffel Tower with fireworks in the background.";

// To generate an image, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the Image into a Unity Texture2D object
  UnityEngine.Texture2D texture2D = new(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}
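
The Web sample above only logs the image's MIME type and base64 data. One common next step, sketched here under the assumption that `inlineData` carries a `mimeType` string and base64-encoded `data` as that sample reads them, is to turn the payload into a data URL that an `<img>` element can display. The helper name `inlineDataToDataUrl` is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: builds a data URL from an inline image payload.
// Assumes `data` is already base64-encoded, as in the Web sample's `inlineData`.
function inlineDataToDataUrl(inlineData) {
  if (!inlineData?.mimeType || !inlineData?.data) {
    throw new TypeError("Expected an object with `mimeType` and base64 `data`.");
  }
  return `data:${inlineData.mimeType};base64,${inlineData.data}`;
}

const url = inlineDataToDataUrl({ mimeType: "image/png", data: "iVBORw0KGgo=" });
console.log(url); // prints: data:image/png;base64,iVBORw0KGgo=

// In a browser, the URL could then be assigned to an image element, for example:
// document.querySelector("img").src = url;
```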

Generate interleaved images and text

You can ask a Gemini model to generate images interleaved with its text responses. For example, you can generate an image of what each step of a generated recipe might look like along with that step's instructions, without making separate requests to the model or to different models.

Make sure to create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.

Swift

import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate interleaved text and images
let prompt = """
Generate an illustrated recipe for a paella.
Create images to go alongside the text as you generate the recipe
"""

// To generate interleaved text and images, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated text and image
guard let candidate = response.candidates.first else {
  fatalError("No candidates in response.")
}
for part in candidate.content.parts {
  switch part {
  case let textPart as TextPart:
    // Do something with the generated text
    let text = textPart.text
  case let inlineDataPart as InlineDataPart:
    // Do something with the generated image
    guard let uiImage = UIImage(data: inlineDataPart.data) else {
      fatalError("Failed to convert data to UIImage.")
    }
  default:
    fatalError("Unsupported part type: \(part)")
  }
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide a text prompt instructing the model to generate interleaved text and images
val prompt = """
    Generate an illustrated recipe for a paella.
    Create images to go alongside the text as you generate the recipe
    """.trimIndent()

// To generate interleaved text and images, call `generateContent` with the text input
val responseContent = model.generateContent(prompt).candidates.first().content

// The response will contain image and text parts interleaved
for (part in responseContent.parts) {
    when (part) {
        is ImagePart -> {
            // ImagePart as a bitmap
            val generatedImageAsBitmap: Bitmap? = part.asImageOrNull()
        }
        is TextPart -> {
            // Text content from the TextPart
            val text = part.text
        }
    }
}

Java

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate interleaved text and images
Content prompt = new Content.Builder()
        .addText("Generate an illustrated recipe for a paella.\n" +
                 "Create images to go alongside the text as you generate the recipe")
        .build();

// To generate interleaved text and images, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        Content responseContent = result.getCandidates().get(0).getContent();
        // The response will contain image and text parts interleaved
        for (Part part : responseContent.getParts()) {
            if (part instanceof ImagePart) {
                // ImagePart as a bitmap
                Bitmap generatedImageAsBitmap = ((ImagePart) part).getImage();
            } else if (part instanceof TextPart) {
                // Text content from the TextPart
                String text = ((TextPart) part).getText();
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        System.err.println(t);
    }
}, executor);

Web

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate interleaved text and images
const prompt = 'Generate an illustrated recipe for a paella.\n' +
  'Create images to go alongside the text as you generate the recipe';

// To generate interleaved text and images, call `generateContent` with the text input
const result = await model.generateContent(prompt);

// Handle the generated text and image
try {
  const response = result.response;
  const parts = response.candidates?.[0].content?.parts;
  if (parts) {
    for (const part of parts) {
      if (part.text) {
        // Do something with the text
        console.log(part.text);
      }
      if (part.inlineData) {
        // Do something with the image
        const image = part.inlineData;
        console.log(image.mimeType, image.data);
      }
    }
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash-image',
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(
    responseModalities: [ResponseModalities.text, ResponseModalities.image],
  ),
);

// Provide a text prompt instructing the model to generate interleaved text and images
final prompt = [
  Content.text(
    'Generate an illustrated recipe for a paella\n ' +
    'Create images to go alongside the text as you generate the recipe',
  ),
];

// To generate interleaved text and images, call `generateContent` with the text input
final response = await model.generateContent(prompt);

// Handle the generated text and image
final parts = response.candidates.firstOrNull?.content.parts;
if (parts != null && parts.isNotEmpty) {
  for (final part in parts) {
    if (part is TextPart) {
      // Do something with text part
      final text = part.text;
    }
    if (part is InlineDataPart) {
      // Process image
      final imageBytes = part.bytes;
    }
  }
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity

using System.Linq;
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate interleaved text and images
var prompt = "Generate an illustrated recipe for a paella \n" +
  "Create images to go alongside the text as you generate the recipe";

// To generate interleaved text and images, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

// Handle the generated text and image
foreach (var part in response.Candidates.First().Content.Parts) {
  if (part is ModelContent.TextPart textPart) {
    if (!string.IsNullOrWhiteSpace(textPart.Text)) {
      // Do something with the text
    }
  } else if (part is ModelContent.InlineDataPart dataPart) {
    if (dataPart.MimeType == "image/png") {
      // Load the Image into a Unity Texture2D object
      UnityEngine.Texture2D texture2D = new(2, 2);
      if (texture2D.LoadImage(dataPart.Data.ToArray())) {
        // Do something with the image
      }
    }
  }
}
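
When rendering interleaved output, the order of the parts matters, because each image belongs next to the text around it. The sketch below shows one possible way to turn a list of parts into an ordered list of render blocks, merging consecutive text parts. It assumes the part shapes read by the Web sample above (`text` and `inlineData`). `toRenderBlocks` and the block format are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: converts interleaved parts into ordered render blocks.
// Consecutive text parts are merged; each image becomes its own block.
function toRenderBlocks(parts) {
  const blocks = [];
  for (const part of parts) {
    if (part.text) {
      const last = blocks[blocks.length - 1];
      if (last && last.kind === "text") {
        last.text += part.text;
      } else {
        blocks.push({ kind: "text", text: part.text });
      }
    } else if (part.inlineData) {
      const { mimeType, data } = part.inlineData;
      blocks.push({ kind: "image", src: `data:${mimeType};base64,${data}` });
    }
  }
  return blocks;
}

// Made-up sample data standing in for a recipe response
const blocks = toRenderBlocks([
  { text: "Step 1: " },
  { text: "Heat the oil." },
  { inlineData: { mimeType: "image/png", data: "iVBORw0KGgo=" } },
  { text: "Step 2: Add the rice." },
]);
console.log(blocks.map((block) => block.kind).join(",")); // prints: text,image,text
```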

Edit images (text-and-image input)

You can ask a Gemini model to edit images by prompting with text and one or more images.

Make sure to create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call generateContent.

Swift

import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide an image for the model to edit
guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide a text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To edit the image, call `generateContent` with the image and text input
let response = try await model.generateContent(image, prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Provide a text prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// To edit the image, call `generateContent` with the prompt (image and text input)
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated text and image
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

Java

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Provide a text prompt instructing the model to edit the image
Content promptcontent = new Content.Builder()
        .addImage(bitmap)
        .addText("Edit this image to make it look like a cartoon")
        .build();

// To edit the image, call `generateContent` with the prompt (image and text input)
ListenableFuture<GenerateContentResponse> response = model.generateContent(promptcontent);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        // Iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

// Provide a text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// To edit the image, call `generateContent` with the image and text input
const result = await model.generateContent([prompt, imagePart]);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart

import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash-image',
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(
    responseModalities: [ResponseModalities.text, ResponseModalities.image],
  ),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide a text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// To edit the image, call `generateContent` with the image and text input
final response = await model.generateContent([
  Content.multi([prompt, imagePart]),
]);

// Handle the generated image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts.first.bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity

using System.Linq;
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide a text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// To edit the image, call `GenerateContentAsync` with the image and text input
var response = await model.GenerateContentAsync(new [] { prompt, image });

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the Image into a Unity Texture2D object
  UnityEngine.Texture2D texture2D = new UnityEngine.Texture2D(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}
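
The Web sample above reads the input image from a file input with `FileReader`. When the image is already in memory as raw bytes (for example, fetched as an `ArrayBuffer`), the same `inlineData` shape can be built directly. This is a minimal sketch that assumes a global `btoa` is available (browsers and recent Node.js versions). `bytesToInlineDataPart` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: wraps raw image bytes in the `inlineData` part shape
// that the Web sample's `fileToGenerativePart` returns.
function bytesToInlineDataPart(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  // `btoa` base64-encodes a binary string (assumed to be available globally)
  return { inlineData: { data: btoa(binary), mimeType } };
}

// Two placeholder bytes stand in for real image data
const imagePart = bytesToInlineDataPart(new Uint8Array([104, 105]), "image/jpeg");
console.log(imagePart.inlineData.data); // prints: aGk=

// Same request shape as the Web sample: the text prompt, then the image part
const request = ["Edit this image to make it look like a cartoon", imagePart];
console.log(request.length); // prints: 2
```

The loop builds the binary string one byte at a time instead of spreading the whole array into `String.fromCharCode`, which can exceed the argument limit for large images.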

Iterate and edit images using multi-turn chat

Using multi-turn chat, you can iterate with a Gemini model on the images that it generates or that you supply.

Make sure to create a GenerativeModel instance, include responseModalities: ["TEXT", "IMAGE"] in your model configuration, and call startChat() and sendMessage() to send new user messages.
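
Follow-up messages in a chat do not need to include the image again because earlier turns stay in the conversation history. The sketch below is a conceptual illustration only, not the SDK's implementation: it models history as a list of turns with a `role` and `parts`, and shows that a text-only follow-up still has the earlier image available as context. `appendTurn`, `hasImage`, and the sample data are hypothetical.

```javascript
// Conceptual sketch only: `appendTurn` and `hasImage` are hypothetical helpers
// that model a chat history; the SDK's chat object manages history for you.
function appendTurn(history, role, parts) {
  return [...history, { role, parts }];
}

function hasImage(history) {
  return history.some((turn) => turn.parts.some((part) => Boolean(part.inlineData)));
}

let history = [];
// First user turn: image plus editing instruction
history = appendTurn(history, "user", [
  { inlineData: { mimeType: "image/jpeg", data: "<base64 of the input image>" } },
  { text: "Edit this image to make it look like a cartoon" },
]);
// Model turn: text plus the edited image
history = appendTurn(history, "model", [
  { text: "Here is the cartoon version." },
  { inlineData: { mimeType: "image/png", data: "<base64 of the edited image>" } },
]);
// Follow-up user turn: text only
history = appendTurn(history, "user", [
  { text: "But make it old-school line drawing style" },
]);

console.log(history.map((turn) => turn.role).join(" > ")); // prints: user > model > user
console.log(hasImage(history)); // prints: true
```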

Swift

import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Initialize the chat
let chat = model.startChat()

guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide an initial text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To generate an initial response, send a user message with the image and text prompt
let response = try await chat.sendMessage(image, prompt)

// Inspect the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

// Follow up requests do not need to specify the image again
let followUpResponse = try await chat.sendMessage("But make it old-school line drawing style")

// Inspect the edited image after the follow up request
guard let followUpInlineDataPart = followUpResponse.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let followUpUIImage = UIImage(data: followUpInlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Create the initial prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// Initialize the chat
val chat = model.startChat()

// To generate an initial response, send a user message with the image and text prompt
var response = chat.sendMessage(prompt)
// Inspect the returned image
var generatedImageAsBitmap = response
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

// Follow up requests do not need to specify the image again
response = chat.sendMessage("But make it old-school line drawing style")
generatedImageAsBitmap = response
    .candidates.first().content.parts.filterIsInstance<ImagePart>().firstOrNull()?.image

Java

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.5-flash-image",
    // Configure the model to respond with text and images (required)
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Initialize the chat
ChatFutures chat = model.startChat();

// Create the initial prompt instructing the model to edit the image
Content prompt = new Content.Builder()
        .setRole("user")
        .addImage(bitmap)
        .addText("Edit this image to make it look like a cartoon")
        .build();

// To generate an initial response, send a user message with the image and text prompt
ListenableFuture<GenerateContentResponse> response = chat.sendMessage(prompt);
// Extract the image from the initial response
ListenableFuture<@Nullable Bitmap> initialRequest = Futures.transform(response, result -> {
    for (Part part : result.getCandidates().get(0).getContent().getParts()) {
        if (part instanceof ImagePart) {
            ImagePart imagePart = (ImagePart) part;
            return imagePart.getImage();
        }
    }
    return null;
}, executor);

// Follow up requests do not need to specify the image again
ListenableFuture<GenerateContentResponse> modelResponseFuture = Futures.transformAsync(
        initialRequest,
        generatedImage -> {
            Content followUpPrompt = new Content.Builder()
                    .addText("But make it old-school line drawing style")
                    .build();
            return chat.sendMessage(followUpPrompt);
        },
        executor);

// Add a final callback to check the reworked image
Futures.addCallback(modelResponseFuture, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// Provide an initial text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

// Initialize the chat
const chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
const result = await chat.sendMessage([prompt, imagePart]);

// Request and inspect the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    // Inspect the generated image
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

// Follow up requests do not need to specify the image again
const followUpResult = await chat.sendMessage("But make it old-school line drawing style");

// Request and inspect the returned image
try {
  const followUpInlineDataParts = followUpResult.response.inlineDataParts();
  if (followUpInlineDataParts?.[0]) {
    // Inspect the generated image
    const followUpImage = followUpInlineDataParts[0].inlineData;
    console.log(followUpImage.mimeType, followUpImage.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart

import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash-image',
  // Configure the model to respond with text and images (required)
  generationConfig: GenerationConfig(
    responseModalities: [ResponseModalities.text, ResponseModalities.image],
  ),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide an initial text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// Initialize the chat
final chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
// (`sendMessage` takes a single `Content`, so the parts are combined with `Content.multi`)
final response = await chat.sendMessage(Content.multi([prompt, imagePart]));

// Inspect the returned image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

// Follow up requests do not need to specify the image again
final followUpResponse = await chat.sendMessage(
  Content.text("But make it old-school line drawing style"),
);

// Inspect the returned image (read from the follow-up response, not the first one)
if (followUpResponse.inlineDataParts.isNotEmpty) {
  final followUpImageBytes = followUpResponse.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity

using System.Linq;
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.5-flash-image",
  // Configure the model to respond with text and images (required)
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide an initial text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// Initialize the chat
var chat = model.StartChat();

// To generate an initial response, send a user message with the image and text prompt
var response = await chat.SendMessageAsync(new [] { prompt, image });

// Inspect the returned image
var imageParts = response.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imageParts.First().Data.ToArray())) {
  // Do something with the image
}

// Follow up requests do not need to specify the image again
var followUpResponse = await chat.SendMessageAsync("But make it old-school line drawing style");

// Inspect the returned image
var followUpImageParts = followUpResponse.Candidates.First().Content.Parts
                         .OfType<ModelContent.InlineDataPart>()
                         .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D followUpTexture2D = new(2, 2);
if (followUpTexture2D.LoadImage(followUpImageParts.First().Data.ToArray())) {
  // Do something with the image
}
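
The Web sample above only logs the `mimeType` and base64 `data` of each returned image. As a minimal sketch of one way to display such an image, the helper below builds a data URL from those two fields. The name `inlineDataToDataUrl` is hypothetical and not part of the Firebase SDK, and the sketch assumes `data` is already a base64 string, as in the sample.

```javascript
// Hypothetical helper (not part of the Firebase SDK): converts an inline-data
// object of the shape { mimeType, data } into a data URL string.
// Assumption: `data` is a base64-encoded string, as logged in the Web sample.
function inlineDataToDataUrl(inlineData) {
  if (!inlineData || typeof inlineData.data !== "string" || !inlineData.mimeType) {
    throw new TypeError("Expected an object with `mimeType` and base64 string `data`.");
  }
  return `data:${inlineData.mimeType};base64,${inlineData.data}`;
}

// Example with a placeholder payload (not a real image).
const sample = { mimeType: "image/png", data: "AAAA" };
console.log(inlineDataToDataUrl(sample)); // data:image/png;base64,AAAA
```

In a browser, the resulting string could be assigned to the `src` of an `<img>` element. For large images, converting the bytes to a Blob and using an object URL may be a better fit.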

Supported features, limitations, and best practices

Supported modalities and capabilities

The following are the supported modalities and capabilities for image output from a Gemini model. Each capability is listed with an example prompt and has a corresponding code sample above.

Text to image(s) (text-only to image)
- Generate an image of the Eiffel tower with fireworks in the background.
Text to image(s) (text rendering within the image)
- Generate a cinematic photo of a large building with giant text projected onto its facade.
Text to image(s) and text (interleaved)
- Generate an illustrated recipe for a paella. Create images alongside the text as you generate the recipe.
- Generate a story about a dog in a 3D cartoon animation style. For each scene, generate an image.
Image(s) and text to image(s) and text (interleaved)
- [image of a furnished room] + What other color sofas would work in my space? Can you update the image?
Image editing (text and image to image)
- [image of scones] + Edit this image to make it look like a cartoon
- [image of a cat] + [image of a pillow] + Create a cross-stitch of my cat on this pillow.
Multi-turn image editing (chat)
- [image of a blue car] + Turn this car into a convertible. Then, as a follow-up: Now change the color to yellow.
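
In a multi-turn editing flow like the last item, an app usually wants the most recent image the model produced. The sketch below walks a chat history backwards to find it. It assumes each history entry has the shape `{ role, parts }` and that an image part looks like `{ inlineData: { mimeType, data } }`, which matches the part shape used in the Web sample. The function name `latestModelImage` is hypothetical.

```javascript
// Hypothetical helper: returns the inline data of the most recent image
// produced by the model, or null if the model has not returned one yet.
// Assumed shapes: history entries are { role: "user" | "model", parts: [...] }
// and image parts are { inlineData: { mimeType, data } }.
function latestModelImage(history) {
  for (let i = history.length - 1; i >= 0; i--) {
    const entry = history[i];
    if (entry.role !== "model") continue; // skip images the user supplied
    const parts = entry.parts ?? [];
    for (let j = parts.length - 1; j >= 0; j--) {
      const inlineData = parts[j].inlineData;
      if (inlineData?.mimeType?.startsWith("image/")) return inlineData;
    }
  }
  return null;
}

// Example history with placeholder payloads (not real image data).
const exampleHistory = [
  { role: "user", parts: [{ text: "Turn this car into a convertible." },
                          { inlineData: { mimeType: "image/jpeg", data: "input" } }] },
  { role: "model", parts: [{ text: "Here you go." },
                           { inlineData: { mimeType: "image/png", data: "first-edit" } }] },
  { role: "user", parts: [{ text: "Now change the color to yellow." }] },
  { role: "model", parts: [{ inlineData: { mimeType: "image/png", data: "second-edit" } }] },
];
console.log(latestModelImage(exampleHistory).data); // second-edit
```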

Limitations and best practices

The following are limitations and best practices for image output from a Gemini model.

Image-generating Gemini models support the following:
- Generating PNG images with a maximum dimension of 1024 px.
- Generating and editing images of people.
- Using safety filters that provide a flexible and less restrictive user experience.
Image-generating Gemini models do not support the following:
- Including audio or video inputs.
- Generating only images.
  The models always return both text and images, and you must include responseModalities: ["TEXT", "IMAGE"] in your model configuration.
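
Because a response can mix text and images, client code typically has to separate the two. The following is a minimal sketch, assuming response parts of the shape `{ text }` or `{ inlineData: { mimeType, data } }` as used in the Web sample. The helper name `splitParts` is hypothetical and not an SDK function.

```javascript
// Hypothetical helper: separates a mixed list of response parts into the
// concatenated text and the list of image payloads.
// Assumed part shapes: { text: string } or { inlineData: { mimeType, data } }.
function splitParts(parts) {
  const texts = [];
  const images = [];
  for (const part of parts) {
    if (typeof part.text === "string") {
      texts.push(part.text);
    } else if (part.inlineData?.mimeType?.startsWith("image/")) {
      images.push(part.inlineData);
    }
    // Any other part type is ignored in this sketch.
  }
  return { text: texts.join(""), images };
}

// Example with placeholder payloads (not real image data).
const { text, images } = splitParts([
  { text: "Here is " },
  { inlineData: { mimeType: "image/png", data: "AAAA" } },
  { text: "your cartoon." },
]);
console.log(text);          // Here is your cartoon.
console.log(images.length); // 1
```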
For best performance, use the following languages: en, es-mx, ja-jp, zh-cn, hi-in.
Image generation may not always trigger. Here are some known issues:
- The model may output text only.
  Try asking for image output explicitly (for example, "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through.
  Try again or try a different prompt.
- The model may generate text as an image.
  Try asking for text output explicitly. For example, "generate narrative text along with illustrations".
When generating text for an image, Gemini works best if you first generate the text and then ask for an image that contains that text.
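
One way to apply this two-step approach is to build the image prompt from text produced in an earlier turn. The sketch below only assembles the second prompt string. The function name `buildImagePromptWithText` and the exact wording of the prompt are illustrative assumptions, not recommended phrasing from the Gemini documentation.

```javascript
// Hypothetical helper: builds the second-step prompt that asks for an image
// containing text generated in a previous step.
// Assumption: the prompt wording below is illustrative only.
function buildImagePromptWithText(sceneDescription, generatedText) {
  const text = generatedText.trim();
  if (text.length === 0) {
    throw new Error("Generate the text first; it must not be empty.");
  }
  return `${sceneDescription} Render exactly this text in the image: "${text}"`;
}

// Example: the slogan would come from an earlier text-generation turn.
const slogan = "  Fresh scones daily  ";
console.log(buildImagePromptWithText("A vintage bakery poster.", slogan));
// A vintage bakery poster. Render exactly this text in the image: "Fresh scones daily"
```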
