How LLMs stream responses

Published: January 21, 2025

A streamed LLM response consists of data emitted incrementally and continuously. Streaming data looks different on the server than it does on the client.

From the server

To understand what a streamed response looks like, I prompted Gemini to tell a long joke using the command-line tool curl. Consider the following call to the Gemini API. If you try it, be sure to replace {GOOGLE_API_KEY} in the URL with your Gemini API key.

$ curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:streamGenerateContent?alt=sse&key={GOOGLE_API_KEY}" \
  -H 'Content-Type: application/json' \
  --no-buffer \
  -d '{ "contents":[{"parts":[{"text": "Tell me a long T-rex joke, please."}]}]}'

This request logs the following (truncated) output, in event stream format. Each line starts with data: followed by the message payload. The exact format isn't actually important; what matters are the chunks of text.

data: {
  "candidates": [{
    "content": {
      "parts": [{"text": "A T-Rex"}],
      "role": "model"
    },
    "finishReason": "STOP",
    "index": 0,
    "safetyRatings": [
      {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE"}
    ]
  }],
  "usageMetadata": {"promptTokenCount": 11, "candidatesTokenCount": 4, "totalTokenCount": 15}
}

data: {
  "candidates": [{
    "content": {
      "parts": [{"text": " walks into a bar and orders a drink. As he sits there, he notices a"}],
      "role": "model"
    },
    "finishReason": "STOP",
    "index": 0,
    "safetyRatings": [
      {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE"},
      {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE"}
    ]
  }],
  "usageMetadata": {"promptTokenCount": 11, "candidatesTokenCount": 21, "totalTokenCount": 32}
}
After you run the command, the result chunks stream in.
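Each event can be consumed by stripping the data: prefix and parsing the remainder as JSON. The following is a minimal sketch; the helper name extractChunkText is my own, and it assumes each data: line carries one complete JSON payload in the shape shown above.

```javascript
// Extract the text chunk from one "data: ..." line of the event stream.
// Assumes the line holds a complete JSON payload, as in the output above.
function extractChunkText(line) {
  if (!line.startsWith('data: ')) return '';
  const payload = JSON.parse(line.slice('data: '.length));
  return payload.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
}
```

In a real client you would read the response body incrementally and call a helper like this for each complete line.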

The first payload is JSON. Take a closer look at candidates[0].content.parts[0].text.

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "A T-Rex"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "candidatesTokenCount": 4,
    "totalTokenCount": 15
  }
}

The first text entry is the beginning of Gemini's response. When you extract further text entries, the response is newline-delimited.

The following snippet shows multiple text entries, which together represent the model's final response.

"A T-Rex"
" was walking through the prehistoric jungle when he came across a group of Triceratops. "
"\n\n\"Hey, Triceratops!\" the T-Rex roared. \"What are"
" you guys doing?\"\n\nThe Triceratops, a bit nervous, mumbled, \"Just... just hanging out, you know? Relaxing.\"\n\n\"Well, you"
" guys look pretty relaxed,\" the T-Rex said, eyeing them with a sly grin. \"Maybe you could give me a hand with something.\"\n\n\"A hand?\""
...
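Because each text entry is a fragment of one continuous string, reconstructing the plain-text response is simple concatenation. A minimal sketch:

```javascript
// Text entries arrive in order; concatenating them yields the full response.
const chunks = [
  'A T-Rex',
  ' was walking through the prehistoric jungle when he came across a group of Triceratops. ',
];
const fullResponse = chunks.join('');
```

No separator is inserted between chunks; any line breaks the model produced are already embedded in the text as \n characters.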

But what happens if you ask the model for something slightly more complex than a T-rex joke? For example, ask Gemini to generate a JavaScript function that determines whether a number is even or odd. The text: chunks look slightly different.

The output now contains Markdown formatting, starting with a JavaScript code block. The following sample includes the same preprocessing steps as before.

"```javascript\nfunction"
" isEven(number) {\n  // Check if the number is an integer.\n"
"  if (Number.isInteger(number)) {\n  // Use the modulo operator"
" (%) to check if the remainder after dividing by 2 is 0.\n  return number % 2 === 0; \n  } else {\n  "
"// Return false if the number is not an integer.\n    return false;\n }\n}\n\n// Example usage:\nconsole.log(isEven("
"4)); // Output: true\nconsole.log(isEven(7)); // Output: false\nconsole.log(isEven(3.5)); // Output: false\n```\n\n**Explanation:**\n\n1. **`isEven("
"number)` function:**\n   - Takes a single argument `number` representing the number to be checked.\n   - Checks if the `number` is an integer using `Number.isInteger()`.\n   - If it's an"
...

To make things trickier, some of the marked-up items begin in one chunk and end in another, and some of the markup is nested. In the preceding example, the highlighted function name is split between two chunks: **`isEven( and number)` function:**. Combined, the output is **`isEven(number)` function:**. This means that if you want to output formatted Markdown, you can't just process each chunk individually with a Markdown parser.
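One common workaround is to accumulate the chunks into a growing buffer and re-process the entire buffer on every update, rather than parsing each chunk in isolation. The following is a minimal sketch; the render callback is a stand-in for whatever Markdown parser you use.

```javascript
// Accumulate streamed chunks and re-render the whole buffer on each update,
// so markup that is split across chunk boundaries is always parsed in full.
function createStreamRenderer(render) {
  let buffer = '';
  return (chunk) => {
    buffer += chunk;
    render(buffer); // e.g., feed the complete buffer to a Markdown parser
    return buffer;
  };
}
```

Feeding the two fragments "**`isEven(" and "number)` function:**" into such a renderer yields the combined "**`isEven(number)` function:**", which a Markdown parser can handle correctly. Re-parsing the whole buffer is simple but does redundant work; the best-practices guide linked below covers more efficient approaches.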

From the client

If you run a model such as Gemma on the client with a framework like MediaPipe LLM, streamed data arrives through a callback function.

For example:

llmInference.generateResponse(
  inputPrompt,
  (chunk, done) => {
    console.log(chunk);
  }
);

With the Prompt API, you get streamed data as chunks by iterating over a ReadableStream.

const languageModel = await LanguageModel.create();
const stream = languageModel.promptStreaming(inputPrompt);
for await (const chunk of stream) {
  console.log(chunk);
}
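The same loop can assemble the full response as it streams. The helper below is a generic sketch that consumes any async-iterable stream of string chunks, such as the one returned by promptStreaming:

```javascript
// Collect every chunk from an async-iterable stream into one string.
async function collectStream(stream) {
  let result = '';
  for await (const chunk of stream) {
    result += chunk;
  }
  return result;
}
```

In practice you would typically update the UI inside the loop as each chunk arrives, rather than waiting for the complete string.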

Next steps

Wondering how to render streamed data performantly and securely? Read the best practices for displaying LLM responses.