Create voice audio files

Text-to-Speech converts words and sentences into base64-encoded audio data of natural human speech. You can then decode the base64 data into a playable audio file, such as an MP3. The Text-to-Speech API accepts raw text or Speech Synthesis Markup Language (SSML) as input.

This document describes how to use Text-to-Speech to create an audio file from text or SSML input. If you aren't familiar with concepts such as speech synthesis or SSML, you may also want to read Text-to-Speech basics.

These samples require that you have installed and initialized the Google Cloud CLI. For information about setting up the gcloud CLI, see Authenticate to TTS.

Convert text to synthetic voice audio

The following code samples demonstrate how to convert a string into audio data.

You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volume, speaking rate, and sample rate.
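As a minimal sketch of what that configuration looks like with the Python client library (used later on this page), the following audioConfig adjusts those properties; the specific values are illustrative only:

from google.cloud import texttospeech

# Illustrative values only; every field below is optional.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=0.9,        # 1.0 is the voice's normal speed
    pitch=2.0,                # semitones above (positive) or below (negative) normal
    volume_gain_db=0.0,       # gain in dB relative to normal volume
    sample_rate_hertz=24000,  # sample rate of the returned audio
)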

Protocol

Refer to the text:synthesize API endpoint for complete details.

To synthesize audio from text, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the text to synthesize in the text field of the input section, and specify the type of audio to create in the audioConfig section.

The following code snippet sends a synthesize request to the text:synthesize endpoint and saves the results to a file named synthesize-text.txt. Replace PROJECT_ID with your project ID.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'input':{
      'text':'Android is a mobile operating system developed by Google,
         based on the Linux kernel and designed primarily for
         touchscreen mobile devices such as smartphones and tablets.'
    },
    'voice':{
      'languageCode':'en-gb',
      'name':'en-GB-Standard-A',
      'ssmlGender':'FEMALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt

The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-text.txt file looks similar to the following code snippet.

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

To decode the results from the Text-to-Speech API into an MP3 audio file, run the following command from the same directory as the synthesize-text.txt file.

cat synthesize-text.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-text-audio.mp3 && \
rm tmp.txt
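If the jq utility and GNU coreutils happen to be installed, an equivalent one-liner is jq -r .audioContent synthesize-text.txt | base64 --decode > synthesize-text-audio.mp3. This is only a convenience; the API response is the same either way.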

Go

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Go API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// SynthesizeText synthesizes plain text and saves the output to outputFile.
func SynthesizeText(w io.Writer, text, outputFile string) error {
    ctx := context.Background()

    client, err := texttospeech.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    req := texttospeechpb.SynthesizeSpeechRequest{
        Input: &texttospeechpb.SynthesisInput{
            InputSource: &texttospeechpb.SynthesisInput_Text{Text: text},
        },
        // Note: the voice can also be specified by name.
        // Names of voices can be retrieved with client.ListVoices().
        Voice: &texttospeechpb.VoiceSelectionParams{
            LanguageCode: "en-US",
            SsmlGender:   texttospeechpb.SsmlVoiceGender_FEMALE,
        },
        AudioConfig: &texttospeechpb.AudioConfig{
            AudioEncoding: texttospeechpb.AudioEncoding_MP3,
        },
    }

    resp, err := client.SynthesizeSpeech(ctx, &req)
    if err != nil {
        return err
    }

    err = os.WriteFile(outputFile, resp.AudioContent, 0644)
    if err != nil {
        return err
    }
    fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)
    return nil
}

Java

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Java API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Demonstrates using the Text to Speech client to synthesize text or ssml.
 *
 * @param text the raw text to be synthesized. (e.g., "Hello there!")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static ByteString synthesizeText(String text) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the text input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // languageCode = "en_us"
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
      return audioContents;
    }
  }
}

Node.js

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Node.js API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

const client = new textToSpeech.TextToSpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const text = 'Text to synthesize, eg. hello';
// const outputFile = 'Local path to save audio file to, e.g. output.mp3';

const request = {
  input: {text: text},
  voice: {languageCode: 'en-US', ssmlGender: 'FEMALE'},
  audioConfig: {audioEncoding: 'MP3'},
};

const [response] = await client.synthesizeSpeech(request);
const writeFile = util.promisify(fs.writeFile);
await writeFile(outputFile, response.audioContent, 'binary');
console.log(`Audio content written to file: ${outputFile}`);

Python

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def synthesize_text():
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech

    text = "Hello there."
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Chirp3-HD-Charon",
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=input_text,
        voice=voice,
        audio_config=audio_config,
    )

    # The response's audio_content is binary.
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')

Additional languages

C#: Follow the C# setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for Ruby.

Convert SSML to synthetic voice audio

Using SSML in your audio synthesis request can produce audio that sounds more like natural human speech. Specifically, SSML gives you finer-grained control over how the audio output represents pauses in the speech, and over how it pronounces dates, times, acronyms, and abbreviations.

For more information about the SSML elements supported by the Text-to-Speech API, see the SSML reference.
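As an illustrative sketch of that kind of markup (the element names come from the SSML reference; the sentence content is made up), the following snippet inserts a pause and controls how a date and an acronym are spoken:

<speak>
  Your appointment is on
  <say-as interpret-as="date" format="yyyymmdd" detail="1">20250101</say-as>.
  <break time="500ms"/>
  The acronym <say-as interpret-as="characters">SSML</say-as> is short for
  <sub alias="Speech Synthesis Markup Language">SSML</sub>.
</speak>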

Protocol

Refer to the text:synthesize API endpoint for complete details.

To synthesize audio from SSML, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the SSML to synthesize in the ssml field of the input section, and specify the type of audio to create in the audioConfig section.

The following code snippet sends a synthesize request to the text:synthesize endpoint and saves the results to a file named synthesize-ssml.txt. Replace PROJECT_ID with your project ID.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" --data "{
    'input':{
     'ssml':'<speak>The <say-as interpret-as=\"characters\">SSML</say-as> standard
          is defined by the <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>'
    },
    'voice':{
      'languageCode':'en-us',
      'name':'en-US-Standard-B',
      'ssmlGender':'MALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-ssml.txt

The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-ssml.txt file looks similar to the following code snippet.

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

To decode the results from the Text-to-Speech API into an MP3 audio file, run the following command from the same directory as the synthesize-ssml.txt file.

cat synthesize-ssml.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-ssml-audio.mp3 && \
rm tmp.txt

Go

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Go API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// SynthesizeSSML synthesizes ssml and saves the output to outputFile.
//
// ssml must be well-formed according to:
//
//	https://www.w3.org/TR/speech-synthesis/
//
// Example: <speak>Hello there.</speak>
func SynthesizeSSML(w io.Writer, ssml, outputFile string) error {
    ctx := context.Background()

    client, err := texttospeech.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    req := texttospeechpb.SynthesizeSpeechRequest{
        Input: &texttospeechpb.SynthesisInput{
            InputSource: &texttospeechpb.SynthesisInput_Ssml{Ssml: ssml},
        },
        // Note: the voice can also be specified by name.
        // Names of voices can be retrieved with client.ListVoices().
        Voice: &texttospeechpb.VoiceSelectionParams{
            LanguageCode: "en-US",
            SsmlGender:   texttospeechpb.SsmlVoiceGender_FEMALE,
        },
        AudioConfig: &texttospeechpb.AudioConfig{
            AudioEncoding: texttospeechpb.AudioEncoding_MP3,
        },
    }

    resp, err := client.SynthesizeSpeech(ctx, &req)
    if err != nil {
        return err
    }

    err = os.WriteFile(outputFile, resp.AudioContent, 0644)
    if err != nil {
        return err
    }
    fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)
    return nil
}

Java

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Java API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Demonstrates using the Text to Speech client to synthesize text or ssml.
 *
 * <p>Note: ssml must be well-formed according to: (https://www.w3.org/TR/speech-synthesis/
 * Example: <speak>Hello there.</speak>
 *
 * @param ssml the ssml document to be synthesized. (e.g., "<?xml...")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static ByteString synthesizeSsml(String ssml) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the ssml input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setSsml(ssml).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // languageCode = "en_us"
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
      return audioContents;
    }
  }
}

Node.js

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Node.js API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

const client = new textToSpeech.TextToSpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const ssml = '<speak>Hello there.</speak>';
// const outputFile = 'Local path to save audio file to, e.g. output.mp3';

const request = {
  input: {ssml: ssml},
  voice: {languageCode: 'en-US', ssmlGender: 'FEMALE'},
  audioConfig: {audioEncoding: 'MP3'},
};

const [response] = await client.synthesizeSpeech(request);
const writeFile = util.promisify(fs.writeFile);
await writeFile(outputFile, response.audioContent, 'binary');
console.log(`Audio content written to file: ${outputFile}`);

Python

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def synthesize_ssml():
    """Synthesizes speech from the input string of ssml.

    Note: ssml must be well-formed according to:
        https://www.w3.org/TR/speech-synthesis/
    """
    from google.cloud import texttospeech

    ssml = "<speak>Hello there.</speak>"
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(ssml=ssml)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')

Additional languages

C#: Follow the C# setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for Ruby.

Try it for yourself

If you're new to Google Cloud, create an account to evaluate how Text-to-Speech performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Try Text-to-Speech free