Document understanding

Gemini models can process documents in PDF format, using native vision to understand the context of entire documents. This goes beyond simple text extraction and lets Gemini:

  • Analyze and interpret content, including text, images, diagrams, charts, and tables, even in long documents of up to 1000 pages.
  • Extract information into structured output formats.
  • Summarize and answer questions based on both the visual and textual elements of a document.
  • Transcribe document content (for example, to HTML), preserving layouts and formatting, for use in downstream applications.

Passing inline PDF data

You can pass PDF data inline in the request to generateContent. For PDF payloads under 20 MB, you can choose between uploading base64-encoded documents or directly uploading locally stored files.

The following example shows how to fetch a PDF from a URL and convert it to bytes for processing:

Python

from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"

# Retrieve the PDF bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

async function main() {
    const pdfResp = await fetch('https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf')
        .then((response) => response.arrayBuffer());

    const contents = [
        { text: "Summarize this document" },
        {
            inlineData: {
                mimeType: 'application/pdf',
                data: Buffer.from(pdfResp).toString("base64")
            }
        }
    ];

    const response = await ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: contents
    });
    console.log(response.text);
}

main();

Go

package main

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "os"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()
    client, _ := genai.NewClient(ctx, &genai.ClientConfig{
        APIKey:  os.Getenv("GEMINI_API_KEY"),
        Backend: genai.BackendGeminiAPI,
    })

    pdfResp, _ := http.Get("https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf")
    var pdfBytes []byte
    if pdfResp != nil && pdfResp.Body != nil {
        pdfBytes, _ = io.ReadAll(pdfResp.Body)
        pdfResp.Body.Close()
    }

    parts := []*genai.Part{
        &genai.Part{
            InlineData: &genai.Blob{
                MIMEType: "application/pdf",
                Data:     pdfBytes,
            },
        },
        genai.NewPartFromText("Summarize this document"),
    }

    contents := []*genai.Content{
        genai.NewContentFromParts(parts, genai.RoleUser),
    }

    result, _ := client.Models.GenerateContent(
        ctx,
        "gemini-2.5-flash",
        contents,
        nil,
    )

    fmt.Println(result.Text())
}

REST

DOC_URL="https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"
PROMPT="Summarize this document"
DISPLAY_NAME="base64_pdf"

# Download the PDF
wget -O "${DISPLAY_NAME}.pdf" "${DOC_URL}"

# Check for FreeBSD base64 and set flags accordingly
if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

# Base64 encode the PDF
ENCODED_PDF=$(base64 $B64FLAGS "${DISPLAY_NAME}.pdf")

# Generate content using the base64 encoded PDF.
# The prompt is expanded inside double quotes so that its spaces do not split the -d argument.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[
          {"inline_data": {"mime_type": "application/pdf", "data": "'"$ENCODED_PDF"'"}},
          {"text": "'"$PROMPT"'"}
        ]
      }]
    }' 2> /dev/null > response.json

cat response.json
echo

jq ".candidates[].content.parts[].text" response.json

# Clean up the downloaded PDF
rm "${DISPLAY_NAME}.pdf"

You can also read a PDF from a local file for processing:

Python

from google import genai
from google.genai import types
import pathlib

client = genai.Client()

# Path to the local PDF
filepath = pathlib.Path('file.pdf')

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from 'fs';

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

async function main() {
    const contents = [
        { text: "Summarize this document" },
        {
            inlineData: {
                mimeType: 'application/pdf',
                data: Buffer.from(fs.readFileSync("content/343019_3_art_0_py4t4l_convrt.pdf")).toString("base64")
            }
        }
    ];

    const response = await ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: contents
    });
    console.log(response.text);
}

main();

Go

package main

import (
    "context"
    "fmt"
    "os"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()
    client, _ := genai.NewClient(ctx, &genai.ClientConfig{
        APIKey:  os.Getenv("GEMINI_API_KEY"),
        Backend: genai.BackendGeminiAPI,
    })

    pdfBytes, _ := os.ReadFile("path/to/your/file.pdf")

    parts := []*genai.Part{
        &genai.Part{
            InlineData: &genai.Blob{
                MIMEType: "application/pdf",
                Data:     pdfBytes,
            },
        },
        genai.NewPartFromText("Summarize this document"),
    }
    contents := []*genai.Content{
        genai.NewContentFromParts(parts, genai.RoleUser),
    }

    result, _ := client.Models.GenerateContent(
        ctx,
        "gemini-2.5-flash",
        contents,
        nil,
    )

    fmt.Println(result.Text())
}

Uploading PDFs using the File API

You can use the File API to upload larger documents. Always use the File API when the total request size (including files, text prompt, system instructions, and so on) exceeds 20 MB.
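
As a rough illustration of the 20 MB rule above, the following sketch picks between inline data and the File API based on the combined size of the request parts. The helper name choose_upload_method, the reading of 20 MB as 20 * 1024 * 1024 bytes, and the choice to measure raw bytes are assumptions made for this example, not part of the Gemini API. Base64 encoding adds overhead, so an actual inline request body is larger than the raw file.

```python
# Assumption: 20 MB is interpreted as 20 * 1024 * 1024 bytes.
MAX_INLINE_REQUEST_BYTES = 20 * 1024 * 1024


def choose_upload_method(file_sizes, prompt, system_instruction=""):
    """Return "inline" or "file_api" for a planned request (hypothetical helper).

    file_sizes is a list of raw file sizes in bytes.
    """
    text_bytes = len(prompt.encode("utf-8")) + len(system_instruction.encode("utf-8"))
    total_bytes = sum(file_sizes) + text_bytes
    return "file_api" if total_bytes > MAX_INLINE_REQUEST_BYTES else "inline"


if __name__ == "__main__":
    print(choose_upload_method([5 * 1024 * 1024], "Summarize this document"))
    print(choose_upload_method([25 * 1024 * 1024], "Summarize this document"))
```

When a request is close to the limit, the File API is the safer choice, because the encoded request is larger than the sum of the raw sizes used here.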

Call media.upload to upload a file using the File API. The following code uploads a document file and then uses it in a call to models.generateContent.

Large PDFs from URLs

Use the File API to simplify uploading and processing large PDF files from URLs:

Python

from google import genai
from google.genai import types
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"

# Retrieve and upload the PDF using the File API
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
  # You can pass a path or a file-like object here
  file=doc_io,
  config=dict(
    mime_type='application/pdf')
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[sample_doc, prompt])
print(response.text)

JavaScript

import { createPartFromUri, GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

async function main() {

    const pdfBuffer = await fetch("https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf")
        .then((response) => response.arrayBuffer());

    const fileBlob = new Blob([pdfBuffer], { type: 'application/pdf' });

    const file = await ai.files.upload({
        file: fileBlob,
        config: {
            displayName: 'A17_FlightPlan.pdf',
        },
    });

    // Wait for the file to be processed.
    let getFile = await ai.files.get({ name: file.name });
    while (getFile.state === 'PROCESSING') {
        getFile = await ai.files.get({ name: file.name });
        console.log(`current file status: ${getFile.state}`);
        console.log('File is still processing, retrying in 5 seconds');

        await new Promise((resolve) => {
            setTimeout(resolve, 5000);
        });
    }
    // Check the refreshed state, not the state returned by the upload call.
    if (getFile.state === 'FAILED') {
        throw new Error('File processing failed.');
    }

    // Add the file to the contents.
    const content = [
        'Summarize this document',
    ];

    if (file.uri && file.mimeType) {
        const fileContent = createPartFromUri(file.uri, file.mimeType);
        content.push(fileContent);
    }

    const response = await ai.models.generateContent({
        model: 'gemini-2.5-flash',
        contents: content,
    });

    console.log(response.text);

}

main();

Go

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()
    client, _ := genai.NewClient(ctx, &genai.ClientConfig{
        APIKey:  os.Getenv("GEMINI_API_KEY"),
        Backend: genai.BackendGeminiAPI,
    })

    pdfURL := "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"
    localPdfPath := "A17_FlightPlan_downloaded.pdf"

    // Stop early if the download fails, so the deferred Close never sees a nil response.
    respHttp, err := http.Get(pdfURL)
    if err != nil {
        log.Fatal(err)
    }
    defer respHttp.Body.Close()

    outFile, _ := os.Create(localPdfPath)
    defer outFile.Close()

    _, _ = io.Copy(outFile, respHttp.Body)

    uploadConfig := &genai.UploadFileConfig{MIMEType: "application/pdf"}
    uploadedFile, _ := client.Files.UploadFromPath(ctx, localPdfPath, uploadConfig)

    promptParts := []*genai.Part{
        genai.NewPartFromURI(uploadedFile.URI, uploadedFile.MIMEType),
        genai.NewPartFromText("Summarize this document"),
    }
    contents := []*genai.Content{
        genai.NewContentFromParts(promptParts, genai.RoleUser), // Specify role
    }

    result, _ := client.Models.GenerateContent(
        ctx,
        "gemini-2.5-flash",
        contents,
        nil,
    )

    fmt.Println(result.Text())
}

REST

BASE_URL="https://generativelanguage.googleapis.com"
PDF_PATH="https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf"
DISPLAY_NAME="A17_FlightPlan"
PROMPT="Summarize this document"

# Download the PDF from the provided URL
wget -O "${DISPLAY_NAME}.pdf" "${PDF_PATH}"

MIME_TYPE=$(file -b --mime-type "${DISPLAY_NAME}.pdf")
NUM_BYTES=$(wc -c < "${DISPLAY_NAME}.pdf")

echo "MIME_TYPE: ${MIME_TYPE}"
echo "NUM_BYTES: ${NUM_BYTES}"

tmp_header_file=upload-header.tmp

# Initial resumable request defining metadata.
# The upload URL is in the response headers; dump them to a file.
curl "${BASE_URL}/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
  -D "${tmp_header_file}" \
  -H "X-Goog-Upload-Protocol: resumable" \
  -H "X-Goog-Upload-Command: start" \
  -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
  -H "Content-Type: application/json" \
  -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"

# Upload the actual bytes.
curl "${upload_url}" \
  -H "Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Offset: 0" \
  -H "X-Goog-Upload-Command: upload, finalize" \
  --data-binary "@${DISPLAY_NAME}.pdf" 2> /dev/null > file_info.json

# jq keeps the surrounding double quotes, so file_uri is already a JSON string.
file_uri=$(jq ".file.uri" file_info.json)
echo "file_uri: ${file_uri}"

# Now generate content using that file
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[
          {"text": "'"$PROMPT"'"},
          {"file_data":{"mime_type": "application/pdf", "file_uri": '"$file_uri"'}}]
        }]
      }' 2> /dev/null > response.json

cat response.json
echo

jq ".candidates[].content.parts[].text" response.json

# Clean up the downloaded PDF
rm "${DISPLAY_NAME}.pdf"

Large PDFs stored locally

Python

from google import genai
import pathlib

client = genai.Client()

# Path to the local PDF
file_path = pathlib.Path('large_file.pdf')

# Upload the PDF using the File API
sample_file = client.files.upload(
  file=file_path,
)

prompt = "Summarize this document"

response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[sample_file, prompt])
print(response.text)

JavaScript

import { createPartFromUri, GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

async function main() {
    const file = await ai.files.upload({
        file: 'path-to-localfile.pdf',
        config: {
            displayName: 'A17_FlightPlan.pdf',
        },
    });

    // Wait for the file to be processed.
    let getFile = await ai.files.get({ name: file.name });
    while (getFile.state === 'PROCESSING') {
        getFile = await ai.files.get({ name: file.name });
        console.log(`current file status: ${getFile.state}`);
        console.log('File is still processing, retrying in 5 seconds');

        await new Promise((resolve) => {
            setTimeout(resolve, 5000);
        });
    }
    // Check the refreshed state, not the state returned by the upload call.
    if (getFile.state === 'FAILED') {
        throw new Error('File processing failed.');
    }

    // Add the file to the contents.
    const content = [
        'Summarize this document',
    ];

    if (file.uri && file.mimeType) {
        const fileContent = createPartFromUri(file.uri, file.mimeType);
        content.push(fileContent);
    }

    const response = await ai.models.generateContent({
        model: 'gemini-2.5-flash',
        contents: content,
    });

    console.log(response.text);

}

main();

Go

package main

import (
    "context"
    "fmt"
    "os"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()
    client, _ := genai.NewClient(ctx, &genai.ClientConfig{
        APIKey:  os.Getenv("GEMINI_API_KEY"),
        Backend: genai.BackendGeminiAPI,
    })
    localPdfPath := "/path/to/file.pdf"

    uploadConfig := &genai.UploadFileConfig{MIMEType: "application/pdf"}
    uploadedFile, _ := client.Files.UploadFromPath(ctx, localPdfPath, uploadConfig)

    promptParts := []*genai.Part{
        genai.NewPartFromURI(uploadedFile.URI, uploadedFile.MIMEType),
        genai.NewPartFromText("Give me a summary of this pdf file."),
    }
    contents := []*genai.Content{
        genai.NewContentFromParts(promptParts, genai.RoleUser),
    }

    result, _ := client.Models.GenerateContent(
        ctx,
        "gemini-2.5-flash",
        contents,
        nil,
    )

    fmt.Println(result.Text())
}

REST

BASE_URL="https://generativelanguage.googleapis.com"
# Placeholder: set PDF_PATH to the location of your local PDF.
PDF_PATH="path/to/your/file.pdf"
NUM_BYTES=$(wc -c < "${PDF_PATH}")
DISPLAY_NAME=TEXT
tmp_header_file=upload-header.tmp

# Initial resumable request defining metadata.
# The upload URL is in the response headers; dump them to a file.
curl "${BASE_URL}/upload/v1beta/files?key=${GEMINI_API_KEY}" \
  -D "${tmp_header_file}" \
  -H "X-Goog-Upload-Protocol: resumable" \
  -H "X-Goog-Upload-Command: start" \
  -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Header-Content-Type: application/pdf" \
  -H "Content-Type: application/json" \
  -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"

# Upload the actual bytes.
curl "${upload_url}" \
  -H "Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Offset: 0" \
  -H "X-Goog-Upload-Command: upload, finalize" \
  --data-binary "@${PDF_PATH}" 2> /dev/null > file_info.json

# jq keeps the surrounding double quotes, so file_uri is already a JSON string.
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri

# Now generate content using that file
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[
          {"text": "Summarize this document"},
          {"file_data":{"mime_type": "application/pdf", "file_uri": '"$file_uri"'}}]
        }]
      }' 2> /dev/null > response.json

cat response.json
echo

jq ".candidates[].content.parts[].text" response.json

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get. Only the name (and by extension, the uri) are unique.

Python

from google import genai
import pathlib

client = genai.Client()

fpath = pathlib.Path('example.txt')
fpath.write_text('hello')

file = client.files.upload(file='example.txt')

file_info = client.files.get(name=file.name)
print(file_info.model_dump_json(indent=4))

REST

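# Read the resource name from the upload response.
# The name already includes the files/ prefix, and -r strips the JSON quotes.
name=$(jq -r ".file.name" file_info.json)

# Get the file of interest to check state
curl "https://generativelanguage.googleapis.com/v1beta/${name}?key=${GEMINI_API_KEY}" > file_info.json

# Print some information about the file you got.
# files.get returns the file resource itself, so the fields are at the top level.
name=$(jq ".name" file_info.json)
echo name=$name
file_uri=$(jq ".uri" file_info.json)
echo file_uri=$file_uri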
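
The JavaScript examples on this page poll files.get until the file leaves the PROCESSING state. A minimal Python sketch of the same loop follows. To keep it independent of any SDK, it takes a caller-supplied fetch_state function that returns the state as a string. That function, the helper name wait_until_processed, and the timeout are assumptions made for illustration. The state names PROCESSING and FAILED are the ones the JavaScript examples check, and ACTIVE appears here only as simulated data.

```python
import time


def wait_until_processed(fetch_state, poll_seconds=5.0, timeout_seconds=300.0,
                         sleep=time.sleep):
    """Poll fetch_state() until it stops returning "PROCESSING" (hypothetical helper).

    fetch_state is any zero-argument callable that returns the file state as a
    string, for example a thin wrapper around a files.get call.
    """
    waited = 0.0
    state = fetch_state()
    while state == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError("File is still processing after the timeout.")
        sleep(poll_seconds)
        waited += poll_seconds
        state = fetch_state()
    if state == "FAILED":
        raise RuntimeError("File processing failed.")
    return state


if __name__ == "__main__":
    # Simulated states stand in for real files.get responses.
    states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
    print(wait_until_processed(lambda: next(states), sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.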

Passing multiple PDFs

The Gemini API can process multiple PDF documents (up to 1000 pages) in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.

Python

from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805"
doc_url_2 = "https://arxiv.org/pdf/2403.05530"

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
  file=doc_data_1,
  config=dict(mime_type='application/pdf')
)
sample_pdf_2 = client.files.upload(
  file=doc_data_2,
  config=dict(mime_type='application/pdf')
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
  model="gemini-2.5-flash",
  contents=[sample_pdf_1, sample_pdf_2, prompt])
print(response.text)

JavaScript

import { createPartFromUri, GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

async function uploadRemotePDF(url, displayName) {
    const pdfBuffer = await fetch(url)
        .then((response) => response.arrayBuffer());

    const fileBlob = new Blob([pdfBuffer], { type: 'application/pdf' });

    const file = await ai.files.upload({
        file: fileBlob,
        config: {
            displayName: displayName,
        },
    });

    // Wait for the file to be processed.
    let getFile = await ai.files.get({ name: file.name });
    while (getFile.state === 'PROCESSING') {
        getFile = await ai.files.get({ name: file.name });
        console.log(`current file status: ${getFile.state}`);
        console.log('File is still processing, retrying in 5 seconds');

        await new Promise((resolve) => {
            setTimeout(resolve, 5000);
        });
    }
    // Check the refreshed state, not the state returned by the upload call.
    if (getFile.state === 'FAILED') {
        throw new Error('File processing failed.');
    }

    return file;
}

async function main() {
    const content = [
        'What is the difference between each of the main benchmarks between these two papers? Output these in a table.',
    ];

    let file1 = await uploadRemotePDF("https://arxiv.org/pdf/2312.11805", "PDF 1");
    if (file1.uri && file1.mimeType) {
        const fileContent = createPartFromUri(file1.uri, file1.mimeType);
        content.push(fileContent);
    }
    let file2 = await uploadRemotePDF("https://arxiv.org/pdf/2403.05530", "PDF 2");
    if (file2.uri && file2.mimeType) {
        const fileContent = createPartFromUri(file2.uri, file2.mimeType);
        content.push(fileContent);
    }

    const response = await ai.models.generateContent({
        model: 'gemini-2.5-flash',
        contents: content,
    });

    console.log(response.text);
}

main();

Go

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()
    client, _ := genai.NewClient(ctx, &genai.ClientConfig{
        APIKey:  os.Getenv("GEMINI_API_KEY"),
        Backend: genai.BackendGeminiAPI,
    })

    docUrl1 := "https://arxiv.org/pdf/2312.11805"
    docUrl2 := "https://arxiv.org/pdf/2403.05530"
    localPath1 := "doc1_downloaded.pdf"
    localPath2 := "doc2_downloaded.pdf"

    // Stop early if a download fails, so the deferred Close never sees a nil response.
    respHttp1, err := http.Get(docUrl1)
    if err != nil {
        log.Fatal(err)
    }
    defer respHttp1.Body.Close()

    outFile1, _ := os.Create(localPath1)
    _, _ = io.Copy(outFile1, respHttp1.Body)
    outFile1.Close()

    respHttp2, err := http.Get(docUrl2)
    if err != nil {
        log.Fatal(err)
    }
    defer respHttp2.Body.Close()

    outFile2, _ := os.Create(localPath2)
    _, _ = io.Copy(outFile2, respHttp2.Body)
    outFile2.Close()

    uploadConfig1 := &genai.UploadFileConfig{MIMEType: "application/pdf"}
    uploadedFile1, _ := client.Files.UploadFromPath(ctx, localPath1, uploadConfig1)

    uploadConfig2 := &genai.UploadFileConfig{MIMEType: "application/pdf"}
    uploadedFile2, _ := client.Files.UploadFromPath(ctx, localPath2, uploadConfig2)

    promptParts := []*genai.Part{
        genai.NewPartFromURI(uploadedFile1.URI, uploadedFile1.MIMEType),
        genai.NewPartFromURI(uploadedFile2.URI, uploadedFile2.MIMEType),
        genai.NewPartFromText("What is the difference between each of the " +
            "main benchmarks between these two papers? " +
            "Output these in a table."),
    }
    contents := []*genai.Content{
        genai.NewContentFromParts(promptParts, genai.RoleUser),
    }

    modelName := "gemini-2.5-flash"
    result, _ := client.Models.GenerateContent(
        ctx,
        modelName,
        contents,
        nil,
    )

    fmt.Println(result.Text())
}

REST

BASE_URL="https://generativelanguage.googleapis.com"
DOC_URL_1="https://arxiv.org/pdf/2312.11805"
DOC_URL_2="https://arxiv.org/pdf/2403.05530"
DISPLAY_NAME_1="Gemini_paper"
DISPLAY_NAME_2="Gemini_1.5_paper"
PROMPT="What is the difference between each of the main benchmarks between these two papers? Output these in a table."

# Function to download and upload a PDF.
# It prints only the file URI on stdout; progress messages go to stderr
# so that command substitution captures the URI alone.
upload_pdf() {
  local doc_url="$1"
  local display_name="$2"

  # Download the PDF
  wget -O "${display_name}.pdf" "${doc_url}"

  local MIME_TYPE=$(file -b --mime-type "${display_name}.pdf")
  local NUM_BYTES=$(wc -c < "${display_name}.pdf")

  echo "MIME_TYPE: ${MIME_TYPE}" >&2
  echo "NUM_BYTES: ${NUM_BYTES}" >&2

  local tmp_header_file=upload-header.tmp

  # Initial resumable request
  curl "${BASE_URL}/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
    -D "${tmp_header_file}" \
    -H "X-Goog-Upload-Protocol: resumable" \
    -H "X-Goog-Upload-Command: start" \
    -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
    -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
    -H "Content-Type: application/json" \
    -d "{'file': {'display_name': '${display_name}'}}" > /dev/null 2>&1

  local upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
  rm "${tmp_header_file}"

  # Upload the PDF
  curl "${upload_url}" \
    -H "Content-Length: ${NUM_BYTES}" \
    -H "X-Goog-Upload-Offset: 0" \
    -H "X-Goog-Upload-Command: upload, finalize" \
    --data-binary "@${display_name}.pdf" 2> /dev/null > "file_info_${display_name}.json"

  # jq keeps the surrounding double quotes, so file_uri is already a JSON string.
  local file_uri=$(jq ".file.uri" "file_info_${display_name}.json")
  echo "file_uri for ${display_name}: ${file_uri}" >&2

  # Clean up the downloaded PDF
  rm "${display_name}.pdf"

  echo "${file_uri}"
}

# Upload the first PDF
file_uri_1=$(upload_pdf "${DOC_URL_1}" "${DISPLAY_NAME_1}")

# Upload the second PDF
file_uri_2=$(upload_pdf "${DOC_URL_2}" "${DISPLAY_NAME_2}")

# Now generate content using both files
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts":[
          {"file_data": {"mime_type": "application/pdf", "file_uri": '"$file_uri_1"'}},
          {"file_data": {"mime_type": "application/pdf", "file_uri": '"$file_uri_2"'}},
          {"text": "'"$PROMPT"'"}
        ]
      }]
    }' 2> /dev/null > response.json

cat response.json
echo

jq ".candidates[].content.parts[].text" response.json

Technical details

Gemini supports a maximum of 1000 document pages. Each document page is equivalent to 258 tokens.
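
Taken together, these two figures give a quick way to estimate how much of the context window a PDF will use. The sketch below is a back-of-the-envelope calculation only. The function name and the way the page limit is enforced are assumptions made for illustration, and the token count reported by the API may differ.

```python
TOKENS_PER_PAGE = 258  # per-page figure stated above
MAX_PAGES = 1000       # maximum number of document pages stated above


def estimate_pdf_tokens(page_count):
    """Estimate the tokens used by a PDF with page_count pages (hypothetical helper)."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
    return page_count * TOKENS_PER_PAGE


if __name__ == "__main__":
    print(estimate_pdf_tokens(10))    # 10 * 258 = 2580
    print(estimate_pdf_tokens(1000))  # 1000 * 258 = 258000
```

The text prompt and any system instructions add their own tokens on top of this estimate.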

While there are no specific limits on the number of pixels in a document besides the model's context window, larger pages are scaled down to a maximum resolution of 3072 x 3072 while preserving their original aspect ratio, and smaller pages are scaled up to 768 x 768 pixels. There is no cost reduction for smaller pages, other than bandwidth, and no performance improvement for pages at higher resolution.
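
The downscaling rule can be illustrated with a little arithmetic. The following sketch fits a page inside a 3072 x 3072 box while keeping its aspect ratio. It is an assumption about how such a resize could be computed, not the service's actual implementation, and it deliberately leaves out the upscaling of small pages because the exact rule for that case is not specified here. The function name fit_within is hypothetical.

```python
MAX_SIDE = 3072  # maximum resolution per side stated above


def fit_within(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down to fit in a max_side square, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already fits; no downscaling needed
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(6144, 3072))  # (3072, 1536)
    print(fit_within(1000, 500))   # (1000, 500)
```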

Document types

Technically, you can pass other MIME types for document understanding, such as TXT, Markdown, HTML, and XML. However, document vision only meaningfully understands PDFs. Other types are extracted as plain text, and the model cannot interpret what you would see in the rendering of those files. Any file-type specifics, such as charts, diagrams, HTML tags, and Markdown formatting, are lost.

Best practices

For best results:

  • Rotate pages to the correct orientation before uploading.
  • Avoid blurry pages.
  • If using a single page, place the text prompt after the page.

What's next

To learn more, see the following resources:

  • File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
  • System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.